
dalek2point3’s Diary

Recent diary entries

Who uses OpenStreetMap?

Posted by dalek2point3 on 24 October 2014 in English

I have recently been interested in measuring how OpenStreetMap is being used in different services around the world. Now obviously, this is a very hard question to answer because, OSM being an open project, its data can be downloaded at any point in time, and you can start playing around with it. We don’t require any permission for this, and while the ODbL license does require attribution if you use the data in production, such attribution is hard to track. OpenStreetMap data can be found on planes and in disaster relief – not to mention the thousands of web and mobile applications that use it for different purposes.

Alright, having convinced you that it’s quite hard to track all possible uses of OpenStreetMap, perhaps it is possible to track usage of OSM tiles in web applications online? While still difficult, this is easier to accomplish, because at the very least the question is well defined and, in theory, answerable. If we could survey each and every website out there and see whether it uses tiles from an OpenStreetMap server (or Mapbox server), we might be able to say something about OSM usage. This still would not cover cases where folks have set up their own tile server with OSM data – which one might argue is quite a common way to use OSM data.

Either way, I recently discovered HTTPArchive and thought it would be a cool project to track the usage of different mapping APIs online, including Mapbox and folks using OpenStreetMap tiles (which you’re not supposed to do for heavy usage!). What HTTPArchive does is crawl about the top million websites, and for each website it records the HTTP requests that the site makes. This, it turns out, is a great way to measure the usage of JavaScript frameworks, Google Analytics etc. – except no one so far has used it to look at mapping APIs, and OpenStreetMap in particular!
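As a rough illustration of the idea, here is a minimal sketch that classifies recorded request URLs by mapping provider. It assumes the crawl data has already been exported as (site, request URL) pairs; the host names in `PROVIDER_HOSTS` and the function names are assumptions for illustration, not a vetted list and not part of HTTPArchive itself.

```python
from urllib.parse import urlparse

# Assumed host suffixes per provider; extend or correct as needed.
PROVIDER_HOSTS = {
    "openstreetmap": ("tile.openstreetmap.org",),
    "mapbox": ("tiles.mapbox.com", "api.mapbox.com"),
    "google": ("maps.googleapis.com", "maps.google.com"),
}


def classify_request(url):
    """Return the provider whose host matches the URL, or None."""
    host = (urlparse(url).hostname or "").lower()
    for provider, suffixes in PROVIDER_HOSTS.items():
        for suffix in suffixes:
            # Match the host itself or any subdomain (a., b., c. tile servers).
            if host == suffix or host.endswith("." + suffix):
                return provider
    return None


def providers_by_site(requests):
    """Map each crawled site to the set of mapping providers it requests.

    `requests` is an iterable of (site, request_url) pairs.
    """
    result = {}
    for site, url in requests:
        provider = classify_request(url)
        if provider is not None:
            result.setdefault(site, set()).add(provider)
    return result
```

Counting the sites per provider in the returned dictionary would then give a first estimate of usage, with the caveat already noted above that self-hosted tile servers are invisible to this approach.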

See full entry

Location: East Cambridge, Cambridge, Middlesex County, Massachusetts, 02141, United States

NOTE: This post is mostly for my reference, might not make a lot of sense for other folks – but thought I’d put this out there in case you’re interested in the intricacies of TIGER data in the US.

Quick post to highlight some charts as I’m doing research on this topic:

  • Number of ways with a highway tag, counting each unique “name” tag as one way. TIGER counties are the ones that got complete, “good” TIGER data, while control counties are those that got missing TIGER data.


  • Same chart, but only for highway class = 1 (i.e. motorway and trunk). Seems like someone added a lot of new class 1 highways in 2010q4

See full entry

Location: East Cambridge, Cambridge, Middlesex County, Massachusetts, 02141, United States

The Missing Mappers Problem?

Posted by dalek2point3 on 2 July 2014 in English. Last updated on 3 July 2014.

While playing around with the changeset data I noticed an interesting pattern. There were some users who had made a lot of contributions to the map, who were nowhere to be seen. A lot of the talk in the community has been on attracting new mappers, but if we’re losing existing mappers that is surely a problem, no?

Wondering if this was a big problem, I decided to dig in. Here is what I found – it doesn’t seem to be a HUGE problem at the very top, but there are many mappers who make 10 or 100 changesets and then never come back. Note that this only uses data from the USA at the moment.
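One way to make “never come back” concrete is sketched below. It assumes the changeset data has been reduced to (user, date) pairs and treats a user as missing when their last changeset is older than a cutoff; the 365-day window and the function name are assumptions, not the definition used for the chart.

```python
from collections import defaultdict
from datetime import date, timedelta


def missing_mappers(changesets, snapshot, inactive_days=365):
    """Return {user: changeset_count} for users with no recent activity.

    `changesets` is an iterable of (user, date) pairs and `snapshot` is the
    date the data was extracted. The inactivity window is an assumption.
    """
    count = defaultdict(int)
    last_seen = {}
    for user, day in changesets:
        count[user] += 1
        if user not in last_seen or day > last_seen[user]:
            last_seen[user] = day
    cutoff = snapshot - timedelta(days=inactive_days)
    return {user: count[user] for user in count if last_seen[user] < cutoff}
```

Bucketing the returned counts (for example 10 or more, 100 or more) would show how many once-active mappers have dropped out at each level of contribution.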

missingmappers

See full entry

Location: East Cambridge, Cambridge, Middlesex County, Massachusetts, 02141, United States

Some Notes from Analyzing OSM US History Data

Posted by dalek2point3 on 16 June 2014 in English. Last updated on 28 June 2014.

The Data

Today, I’ve finally gotten a chance to play with data from the OSM History extract that I created using the parser that I wrote about last time.

This is what the data contains:

  • Every node that contains either an amenity, addr:housenumber or place tag.
  • For each node, I record basic metadata, “name” and gnis* tags
  • Every way that contains either an amenity, highway, building or parking tag.
  • For each way, I also record basic metadata and the following tags:
    • name
    • tiger:cfcc, tiger:county, tiger:reviewed
    • access
    • oneway
    • maxspeed
    • lanes

The resulting flatfiles are large, partly because I’m parsing the “history” data, so I’m including every past version of every node and way, in addition to the most current version. There are about 6.3 million node entries and about 48.7 million way entries. For each of these nodes and ways, I ran them through my point-in-poly program to code the county and the MSA that each way / node lies in.

The next big step was to drop imported data. I really don’t care about this – obviously this includes data from the TIGER import but also many other major edits in the US. Interpret the numbers below as the contribution of OSM editors, excluding major national-level imports. I’ve not removed smaller county-level imports, because I see them as being relevant to my analyses, but also because they’re harder to pin down. So the data includes any way or node with a relevant tag that was touched by an account other than the TIGER importer (and some other importer accounts).

Some Highlights

Way data

How many items do we have for each of the 4 types?

  • Highway = 15.9 million versions, 7.9 million uniques
  • Building = 6 million versions, 4.9 uniques
  • Amenity = 460k versions, 331 uniques
  • Parking = 66k versions, 52.6k uniques
  • All data = 22.3 million versions, 13.1 million uniques

See full entry

Location: East Cambridge, Cambridge, Middlesex County, Massachusetts, 02141, United States

One of the most prominent users of OpenStreetMap is Craigslist. Craigslist users often use OpenStreetMap to indicate the location of the house / item they are selling. When they don’t find the street they’re looking for, Craigslist users have the option to submit a note to the OSM.org notes system.

I’ve found these notes to be useful, quite often containing information about subdivisions that are missing from the map. I wanted to visualize all the notes submitted through this system. Even though Craigslist does not submit notes using a dedicated URL (although I think they should!), they use a peculiar notes system and notes from Craigslist almost always look like this one:

bounds: (38.0118,-121.943 - 37.9966,-121.9013) osm.org/?box=yes&notes=yes&bbox=-121.943%2C37.9966%2C-121.9013%2C38.0118 Map is missing data here. Freshwater Court in Pittsburg CA is not showing up

Notice how they begin with “bounds”. This suggests that using the OSM Notes API to search for the text “bounds” should give a reasonably accurate picture of notes from the CL system.

I wrote a script that uses the API to get this data and parses it into a CSV ready for visualization – you can check out the code on GitHub. I then visualized the 2980 notes that I found using CartoDB. Each dot contains a link that will take you to the Notes page where you can read the full text of the comment.
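The parsing step might look roughly like the sketch below. This is not the script linked above: it only assumes that each note’s text follows the “bounds: (lat,lon - lat,lon) osm.org/… message” shape of the example, and the function and column names are made up for illustration.

```python
import csv
import io
import re

NUMBER = r"(-?\d+(?:\.\d+)?)"
# Matches "bounds: (38.0118,-121.943 - 37.9966,-121.9013)".
BOUNDS_RE = re.compile(
    r"bounds:\s*\(\s*" + NUMBER + r"\s*,\s*" + NUMBER
    + r"\s*-\s*" + NUMBER + r"\s*,\s*" + NUMBER + r"\s*\)"
)


def parse_note(text):
    """Extract the box center and free-text message, or None if no bounds."""
    match = BOUNDS_RE.search(text)
    if match is None:
        return None
    lat1, lon1, lat2, lon2 = (float(group) for group in match.groups())
    rest = text[match.end():].strip()
    # Drop the leading osm.org link, if present, to keep only the message.
    rest = re.sub(r"^osm\.org/\S*\s*", "", rest)
    return {
        "lat": (lat1 + lat2) / 2,
        "lon": (lon1 + lon2) / 2,
        "message": rest,
    }


def notes_to_csv(texts):
    """Serialize all parseable notes to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["lat", "lon", "message"])
    writer.writeheader()
    for text in texts:
        row = parse_note(text)
        if row is not None:
            writer.writerow(row)
    return buffer.getvalue()
```

Notes that merely mention the word “bounds” without the coordinate pattern are skipped, which filters out some false positives from the text search.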

And this is what we get! Click on the image to be taken to the CartoDB page (I can’t figure out how to embed IFRAMEs in diary entries). You can even download the raw data here.

See full entry

Location: East Cambridge, Cambridge, Middlesex County, Massachusetts, 02141, United States

Welcome to OpenStreetMap Telangana!

Posted by dalek2point3 on 2 June 2014 in English. Last updated on 6 June 2014.

June 2 is a monumental day for residents of the new state of Telangana in southern India. After decades of struggle, the government of India decided to create an independent state of Telangana, separate from the state of Andhra Pradesh. The region has historically been backward and poor compared to the rest of Andhra Pradesh, and the hope is that a more empowered state can bring development to the people of Telangana.

Meanwhile, in the digital world, OpenStreetMap also welcomed Telangana with open arms. User PlaneMad created the state boundary relation, and it went live today – exactly the day that Telangana came into existence on the ground! Before it went live, the OpenStreetMap community had a chance to discuss this change and prepare for the impending arrival of the state!

Here are the two state boundaries on OpenStreetMap today – Telangana


and the new, smaller Andhra Pradesh

See full entry

Location: East Cambridge, Cambridge, Middlesex County, Massachusetts, 02141, United States

One of the mysteries of OpenStreetMap not known to the new user is the issue of imports. I’ve been pondering for a while what the best way was to identify what user accounts are related to imports, what they have been importing, where is the data coming from, and what portion of the data comes from imports and what is “purely” from contributors.

Now, my sense is that initially there was a lot of importing going on informally, until someone instituted the formal process. The Import Catalogue, where all the imports are supposed to be documented, is sorely in need of some cleaning up and fixing. That is, there are many imports that are not recorded there. Hopefully we can use the data to fix the page as well.

In my own research, I’m interested in identifying imports so as to get rid of them! I want to understand contributor activity, and your analysis can get seriously skewed if you include imports. One example of this is Dennis’ SOTM-US 2014 talk, where they found that there was a lot of activity in North Dakota, but most of this was coming from imports (or so we think!).

Here, I wanted to write some notes about what I’ve found to be the best way to identify imports in the changesets data. The changesets data contains a field called “num_changes” that records the number of changes in any given changeset. A feature of most imports is that they cram as many features as they can into one changeset (the max is 50000). So what you can do is look at all the changesets for a given user, and if an extraordinarily high share of them (say 80%) have more than 5000 changes, then it’s likely that the account is being used for imports.

Using this method, I calculated “import accounts” (at least 50% of their changesets have above 5000 changes and overall they have at least 50 changesets) to get this list of large import accounts in the US. Here “mean” is the percent of edits that are above 5000 changes, and N is the total number of changesets for that user.
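The rule described above can be sketched in a few lines. The thresholds (more than 5000 changes, at least 50% of changesets, at least 50 changesets) come from the text; the input format of (user, num_changes) pairs and the function name are assumptions.

```python
from collections import defaultdict


def find_import_accounts(changesets, big=5000, min_share=0.5, min_changesets=50):
    """Return {user: (share_of_big_changesets, total_changesets)}.

    `changesets` is an iterable of (user, num_changes) pairs. A user is
    flagged when they have at least `min_changesets` changesets and at
    least `min_share` of them contain more than `big` changes.
    """
    total = defaultdict(int)
    large = defaultdict(int)
    for user, num_changes in changesets:
        total[user] += 1
        if num_changes > big:
            large[user] += 1

    flagged = {}
    for user, n in total.items():
        share = large[user] / n
        if n >= min_changesets and share >= min_share:
            flagged[user] = (share, n)
    return flagged
```

The first element of each tuple corresponds to the “mean” column and the second to N in the list that follows.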

See full entry

I’ve been processing changesets inside the BBOX {-125, 24.34, -66.9, 49.4} see code.

I then ran the center of each changeset through a point-in-polygon algorithm, and that produces this rather interesting map of changesets in and around the US, but not actually IN the continental US. That’s about 300k changesets.
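For reference, here is a minimal sketch of the two geometric steps: taking the center of a changeset’s bounding box and testing it against a polygon with ray casting. The bounding box values are the ones quoted above, read as (min_lon, min_lat, max_lon, max_lat); the ray-casting routine is a generic textbook version and is an assumption, not necessarily the algorithm used for the map.

```python
# Bounding box from the text: min_lon, min_lat, max_lon, max_lat.
US_BBOX = (-125.0, 24.34, -66.9, 49.4)


def center(min_lon, min_lat, max_lon, max_lat):
    """Approximate a changeset's location by the center of its bounding box."""
    return ((min_lon + max_lon) / 2, (min_lat + max_lat) / 2)


def in_bbox(point, bbox=US_BBOX):
    """True if the (lon, lat) point lies inside the bounding box."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def in_polygon(point, polygon):
    """Ray casting: count how many edges a ray going east from the point crosses.

    `polygon` is a list of (x, y) vertices; an odd crossing count means inside.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

A changeset whose center passes `in_bbox` but fails `in_polygon` against the continental US outline is exactly the kind that ends up on the map described above.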

changeset map

Location: East Cambridge, Cambridge, Middlesex County, Massachusetts, 02141, United States

Extracting Data from OpenStreetMap History Files Using the *New* Osmium

Posted by dalek2point3 on 24 May 2014 in English. Last updated on 25 May 2014.

TL;DR Version

I wrote a small script to get some data out of OpenStreetMap history files using Osmium. You can find it here: https://github.com/dalek2point3/osmium-tools/

The Story So Far …

I’m working on a project to analyze OSM contribution history in the United States. One way to do this is to use the changeset dumps – the changeset dumps contain fields for the username of the contributor, the time of the contribution and the bounding-box of all the edits made in a particular changeset. Using this data, and approximating the location of the edit as the center of the bounding box it is possible to do a lot of analysis about how contribution activity has changed and evolved in different regions around the country. My previous diary entry is one example of such an analysis.

History Files Here We Come!

However the approach of analyzing changesets comes quickly to a dead end if you want to understand the type of contributions. What are people actually doing when they edit in a certain area? Are they adding new subdivisions (likely to be lots of ways and “highway” tags here), are they adding useful metadata to existing streets (lots of maxspeed and oneway tags here) or are they adding POIs (amenity tags) and natural features like water bodies, hiking trails etc?

The changeset files are absolutely useless for understanding this kind of activity. If you want to do such an analysis you have to look to the history files. History files are exactly like planet XML files, but with every past version of a particular node, way or relation also recorded. Like the planet XML, each feature comes with a changeset id, so you can reconstruct changesets from this file, and then look into what is actually going on in the data.
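To illustrate what reconstructing changesets from a history file means, here is a small sketch using only Python’s standard XML parser rather than Osmium. It assumes an uncompressed OSM XML history file where each node, way or relation version carries a `changeset` attribute and `tag` children; the function name is hypothetical, and a real history file would be far too large to process comfortably this way.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tag_keys_by_changeset(source):
    """Count tag keys per changeset across all object versions.

    `source` is a filename or binary file object containing OSM XML.
    Returns {changeset_id: Counter({tag_key: count})}.
    """
    per_changeset = defaultdict(Counter)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            changeset = elem.get("changeset")
            for tag in elem.findall("tag"):
                per_changeset[changeset][tag.get("k")] += 1
            # Release the element's children; history files are large.
            elem.clear()
    return per_changeset
```

A changeset dominated by “highway” keys then looks like new streets, while one full of “maxspeed” or “oneway” keys looks like metadata cleanup.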

Great! So how do we do this exactly?

See full entry

Location: East Cambridge, Cambridge, Middlesex County, Massachusetts, 02141, United States

Battlegrid -- What still needs to be fixed?

Posted by dalek2point3 on 12 March 2014 in English. Last updated on 13 March 2014.

Martijn Van Exel’s Battlegrid has been a fun resource for me to fix TIGER errors. The one problem however is that it was hard for me to identify regions that really needed some love, because Battlegrid does not allow you to zoom out and do a visual overview of what cities might have the most potential issues with the TIGER data.

For example, Chicago seems to have had a lot of errors, but these errors seem to have been mostly resolved:

Chicago

While there are still regions, like this part of Charleston, NC, that need a lot of work:

See full entry

I see the map as a garden. It needs love and care from local mappers who know the area and can look after it regularly. This means editing regularly in the area, as against armchair mapping – of which I do quite a bit (!), but there is nothing like a community of locals taking care of an area.

What areas of the US receive such community love and what areas do not? Digging into the raw changeset dump produces some interesting results.

Methodology

This is what I did :

  1. For all changesets, approximate the location of the changeset as the center of the bounding box and geocode the point to the country level – throw out all changesets which are not geocoded to the US or which seem to be over 2 degrees of latitude or longitude wide (these are mostly programmatic edits).

  2. Remove changesets by special users, davehansentiger, milenko, OSMFRedactionbot, bot-mode, woodpeck_fixbot and nhd-import (these were the big import accounts I identified, did I miss any?)

  3. Divide up the country into .1 X .1 latitude / longitude squares (this is approximately an 11km X 11km box) and treat each box as a little garden of its own. For each of these gardens ask the following questions:

     a) how many unique users have committed a changeset in this area?
     b) how many unique users have made more than 5 changesets in this area?
     c) how many unique users have made more than 10 changesets in this area?

  4. Once I calculated these three metrics, I was ready to generate some pretty maps and analyses! I’m presenting some results below.
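The steps above can be sketched roughly as follows. The excluded user names, the 2-degree width limit and the 0.1-degree grid come from the text; the input format, the function names and the use of a simple floor to assign cells are assumptions, and the country-level geocoding step is left out.

```python
import math
from collections import Counter, defaultdict

# Import and bot accounts listed in step 2.
SPECIAL_USERS = {
    "davehansentiger", "milenko", "OSMFRedactionbot",
    "bot-mode", "woodpeck_fixbot", "nhd-import",
}


def is_candidate(user, min_lon, min_lat, max_lon, max_lat, max_width=2.0):
    """Keep changesets by ordinary users whose bounding box is not too wide."""
    if user in SPECIAL_USERS:
        return False
    return (max_lon - min_lon) <= max_width and (max_lat - min_lat) <= max_width


def cell(lon, lat, size=0.1):
    """Index of the grid square ("garden") containing the point."""
    return (math.floor(lon / size), math.floor(lat / size))


def garden_stats(changesets, size=0.1):
    """Return {cell: (unique users, users with >5, users with >10 changesets)}.

    `changesets` is an iterable of (user, lon, lat), where lon/lat is the
    center of the changeset's bounding box.
    """
    per_cell = defaultdict(Counter)
    for user, lon, lat in changesets:
        per_cell[cell(lon, lat, size)][user] += 1

    stats = {}
    for key, counts in per_cell.items():
        stats[key] = (
            len(counts),
            sum(1 for n in counts.values() if n > 5),
            sum(1 for n in counts.values() if n > 10),
        )
    return stats
```

Because of floating-point rounding, points that fall exactly on a cell boundary may land in the neighbouring square, which should not matter at this resolution.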

Findings

  1. Here are the top OSM Gardens in the US that are well taken care of.

See full entry

Fixing Tiger Deserts : The Progress So Far ...

Posted by dalek2point3 on 1 March 2014 in English. Last updated on 2 March 2014.

A History of TIGER in OSM

TIGER (http://www.census.gov/geo/www/tiger/) data serves as the base data for much of the US map data of all the major US map providers, including Google, Nokia and TomTom. Much of the OpenStreetMap data for the US is also based off of the 2005 version of TIGER data; the import was completed between 2007 and 2008. Here is an animation of the import process thanks to Scurio

TIGER import animation

TODO: Go through the notes on the TIGER/Line website and figure out the major changes in the data collection process. The Wikipedia page is also helpful.

Unfortunately, the TIGER data was never designed to be used as an accurate map of the US that could be used reliably for things like GPS routing – it was a Census project with more limited objectives. The consensus is that major improvements were made to TIGER between 2000 and 2010. For OSM, however, because the import was made with the 2005 data, it “caught TIGER halfway through the update cycle” ref

See full entry