Scaling multilingual name tags with Wikidata

Posted by PlaneMad on 8 November 2016 in English. Last updated on 9 November 2016.

Wikidata, the crowdsourced database of structured knowledge run by the Wikimedia movement, has grown to over 24 million entries and by now has structured information for every major settlement on earth. This includes extremely useful properties such as multilingual labels, statistics like population and GDP, and other related information about the place such as politics, history and media (see London, New York City, Timbuktu).

Geolocated articles on Wikidata, with those added in the last year highlighted in pink. Source: Wikidata Map

Current state of multilingual tags

One of the great strengths of OSM is that its data can be used to create multilingual maps, making the map accessible to many more readers than just the local population. Since the beginning of the project, the community has been adding various name:code tags for this purpose, and this has resulted in map features with an ever-growing list of multilingual names, e.g. the node for London has 171 properties, of which 155 (90%) are name tags in various languages.

A more scalable approach would be to leverage the Wikidata entry for London, which has the translated name in 248 languages, a number that grows automatically with every Wikipedia page for the city that is created in a new language.

This would also enable translating the map into languages that would be considered non-local on OSM and not worth adding to the map, e.g. Ukrainian labels for cities and towns in the UK.

The first step to start leveraging the power of Wikidata from OSM is adding a simple wikidata property to the feature on OSM with the QID of the corresponding concept on Wikidata, e.g. wikidata=Q84 for London. Check out this video by user:polyglot on doing this via the JOSM Wikipedia plugin or via the iD editor.
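To illustrate how a map renderer might use this link, here is a minimal Python sketch. It assumes the Wikidata entity has already been fetched and follows the shape of Wikidata's entity JSON, where `labels` maps a language code to an object with a `value`. The function name `pick_label`, the fallback order and the abridged sample data are all hypothetical, chosen only for illustration.

```python
from typing import Optional

# Hand-written sample in the shape of Wikidata's entity JSON (abridged, illustrative).
SAMPLE_ENTITY = {
    "id": "Q84",
    "labels": {
        "en": {"language": "en", "value": "London"},
        "fr": {"language": "fr", "value": "Londres"},
        "uk": {"language": "uk", "value": "Лондон"},
    },
}

# Tags as they might appear on the OSM node (abridged, illustrative).
SAMPLE_OSM_TAGS = {"name": "London", "name:fr": "Londres", "wikidata": "Q84"}


def pick_label(osm_tags: dict, entity: Optional[dict], lang: str) -> Optional[str]:
    """Choose a display name for `lang`.

    Assumed fallback order: OSM name:<lang> tag, then the Wikidata label
    of the linked entity, then the plain OSM name tag.
    """
    tagged = osm_tags.get(f"name:{lang}")
    if tagged:
        return tagged
    # Only trust the entity if it is the one the OSM feature links to.
    if entity is not None and entity.get("id") == osm_tags.get("wikidata"):
        label = entity.get("labels", {}).get(lang)
        if label:
            return label["value"]
    return osm_tags.get("name")
```

With this fallback order, a name tagged explicitly on OSM still wins, and Wikidata only fills in languages that nobody has tagged, such as `pick_label(SAMPLE_OSM_TAGS, SAMPLE_ENTITY, "uk")` in the sample above.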

Matching Wikidata items to OSM

Just like OSM, Wikidata items for places have tags describing the feature and coordinates, which make it possible to automatically match a feature on OSM to the corresponding feature on Wikidata. Unfortunately the geographical accuracy of Wikidata entries cannot be trusted, as many of the coordinates are derived from Wikipedia pages, which in turn are usually derived from Google Maps. Moreover, entries for lesser known places may not be tagged correctly on Wikidata and might result in ambiguous matches to an OSM feature. For this reason manual confirmation of a match is necessary.

On the Mapbox data team, we have been experimenting with adding Wikidata tags to cities and towns on OSM based on an exact name and location match. The possible matches were loaded into a spreadsheet with the match distance and the Wikidata description of the corresponding item. After a manual review, it's easy to confirm the match with a very high degree of confidence based on the name, distance and description of the match. With this approach we have found that just an exact name and location match can give a 99% success rate for places.
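The candidate generation step could look something like the Python sketch below. This is not the actual Mapbox tooling, only an illustration under stated assumptions: the `Place` record, the `candidate_rows` function and the row layout are hypothetical, and the distance is the standard haversine great-circle distance.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass
class Place:
    """Hypothetical record for either an OSM place or a Wikidata item."""
    ident: str          # OSM element id or Wikidata QID
    name: str
    lat: float
    lon: float
    description: str = ""


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def candidate_rows(osm_places, wikidata_items):
    """Pair entries with exactly equal names and return rows for manual review.

    Each row is (osm id, wikidata id, name, distance in km, description),
    sorted by distance so the most plausible matches come first.
    """
    by_name = {}
    for item in wikidata_items:
        by_name.setdefault(item.name, []).append(item)

    rows = []
    for place in osm_places:
        for item in by_name.get(place.name, []):
            dist = haversine_km(place.lat, place.lon, item.lat, item.lon)
            rows.append((place.ident, item.ident, place.name, round(dist, 1), item.description))
    return sorted(rows, key=lambda row: row[3])
```

The rows can then be written to a spreadsheet, where a reviewer checks the name, distance and description before any tag is added.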

Over 5,300 cities and towns have been updated with corresponding wikidata tags in the last two weeks: http://overpass-turbo.eu/s/jGy

There are two cases when the name matching happens:

- Unique matches: one OSM feature matches one Wikidata feature
- Duplicate matches: one OSM feature matches multiple Wikidata features with the same name

Unique matches

In most cases, the matched feature on Wikidata is located within a few kilometres of the OSM feature, and by confirming from the description that the feature is also a city or town, it's possible to confirm that this is the correct match. It is important to be careful about the feature description, as in some cases Wikidata may have ambiguous entries that represent multiple concepts, such as both a city and a province with the same name, as one object.

For unique matches with a large match distance (>10 km), it is likely that the match was to another place with the same name and is incorrect. In a few rare cases, the match was actually correct and it was the Wikidata location that was wrong.

Duplicate matches

When an OSM feature matches to multiple Wikidata entries with the same name, it is considered a duplicate match. In most cases a distance filter of around 10km enables a unique match, and a further look at the description can confirm the match is correct.

In a few rare cases multiple OSM features with the same name and location match to a single Wikidata feature. These are places with duplicate nodes on OSM itself and need to be merged.
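The triage of unique and duplicate matches described above can be sketched in Python as follows. The function name `triage`, the status strings and the input shape are hypothetical. The 10 km threshold is the rough figure mentioned in this post, not a tuned value, and every outcome still needs a manual look at the Wikidata description.

```python
REVIEW_DISTANCE_KM = 10.0  # rough threshold from the post; adjust as needed


def triage(candidates):
    """Classify the name matches found for one OSM feature.

    `candidates` is a list of (wikidata_id, distance_km) tuples that share
    the OSM feature's exact name. Returns (status, wikidata_id or None).
    """
    if not candidates:
        return ("no_match", None)

    near = [c for c in candidates if c[1] <= REVIEW_DISTANCE_KM]

    if len(near) == 1:
        # Exactly one nearby item: either the only name match, or a
        # duplicate-name case that the distance filter resolved.
        status = "unique" if len(candidates) == 1 else "duplicate_resolved"
        return (status, near[0][0])
    if not near:
        # Probably another place with the same name, or bad coordinates.
        return ("too_far", None)
    # Several nearby items with the same name: needs a closer look.
    return ("ambiguous", None)
```

For example, `triage([("wd-a", 1.2), ("wd-b", 840.0)])` returns `("duplicate_resolved", "wd-a")`, where the identifiers are made up for the example.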

What next?

Large scale map features like countries, cities, towns and water bodies are great candidates to start matching with Wikidata, as they are fairly well defined on both projects and can be matched without ambiguity. Doing this will allow us to better understand the value that Wikidata can add to OSM, and help pave the way for more interesting map services that can be built on open data.

There’s been some amazing work from EdwardBetts on matching all of Wikidata to OSM. You can see the results, and this can give a good push to the efforts of contributors like User:Pigsonthewing to bring the two biggest crowdsourced open data projects in humanity closer together.

Discussion

Comment from d1g on 8 November 2016 at 13:17

Thank you to everyone involved in Wikidata and OpenStreetMap integration!

Currently Russian administrative divisions (relations) are well curated in OpenStreetMap, but often we don’t have a link back to an OSM relation in Wikidata items. We would appreciate bots in this area.

Comment from DenisCarriere on 8 November 2016 at 14:42

Awesome work! Keep it up!

Comment from pavanvijjapu on 9 November 2016 at 07:57

Good analysis, and the location matching limitations are rightly pointed out.

Comment from ff5722 on 9 November 2016 at 09:32

Good initiative! I will add wikidata links manually when I come across a place that lacks them.

Comment from SimonPoole on 10 November 2016 at 13:25

The tiny weeny issue with this is naturally that there is the underlying assumption that wikidata is correct and that the data meets our quality criteria (as in actually being in use and not invented).

Comment from PlaneMad on 10 November 2016 at 13:54

Since the matching is based on the name, location and description in two databases being coherent, the chances of invented data being added are really low, unless of course the same invented data made it into both databases, and we found this did happen with the GNIS place data in the US. Check out this discussion: osm.org/changeset/43187605

We are still figuring out what the scale of the issue is, since it looks like nobody really reviewed whether all these towns were tagged correctly on the map in the last 9 years.

Comment from LogicalViolinist on 14 November 2016 at 16:58

Canada should be mostly complete…those that don’t have wikidata tags don’t have a Wikidata page

Comment from pigsonthewing on 14 November 2016 at 17:30

Great post; and great work! Thank you for the namecheck.

Comment from Skippern on 25 November 2016 at 13:59

I like the idea of using Wikidata to link different platforms of information, but I miss information about the API and development tools needed to properly make use of it. The problem is obviously not on the OpenStreetMap side, but rather on the Wikidata side. A use example: a tool that gets border relations from OSM and collects the names of the city/state/country from name:* tags could also call a Wikidata API for the same purpose. It would then be able to get the names from a broader selection and be less prone to miss names due to limited tagging, e.g. several Chinese provinces have no latinised name tags, though these might exist in Wikidata.

Comment from gorn on 26 November 2016 at 11:10

Great initiative! I see one danger though. Usually a city or village is also connected to a surrounding region. Both the city and the region can be mapped in OSM (and in Czechia they always are), having the same or a similar name. One needs to be careful to attach the Wikidata label to the right area then. If needed I can easily find an example.
