
New quality checks in the Osmose QA tool for links from OpenStreetMap to Wikidata

Posted by Geonick on 7 July 2022 in English. Last updated on 11 July 2022.

Wikidata is a free knowledge base for linked open data designed to support Wikipedia and its sister projects, such as Wikivoyage. It contains over 97 million entries structured as a “Labeled Property Graph,” which is more powerful than RDF-based graphs. Like OpenStreetMap (OSM), Wikidata (WD) is an open crowdsourcing project with a large and active community.

Since 2014, OSM objects can be linked to WD through their tags. There are currently about 5.5 million such Wikidata tags, and their number is growing steadily. These links can be used to create interesting products, for example a map of castles enriched with factual data from WD. However, the quality of these manually entered links in OSM has so far been unknown and untested. Note also that in the opposite direction, from WD to OSM, the preferred way to link is by coordinates only (WD property P625); WD properties such as P402 should not be used, because P402 covers only OSM relations.
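As a minimal illustration of what such links look like on the OSM side, the sketch below pulls the primary `wikidata` tag and any secondary `*:wikidata` tags (such as `brand:wikidata`) out of an object's tag dictionary. The function name `wikidata_links` and the sample tags are made up for this example, and the item ID is a placeholder.

```python
def wikidata_links(tags):
    """Return (primary, secondary) Wikidata links found in an OSM tag dict."""
    # The primary link lives in the plain "wikidata" key, if present.
    primary = tags.get("wikidata")
    # Secondary links use a prefixed key such as "brand:wikidata".
    secondary = {key: value for key, value in tags.items()
                 if key.endswith(":wikidata")}
    return primary, secondary


# Hypothetical tags of a castle; "Q12345" is a placeholder, not the real item.
castle_tags = {"historic": "castle", "name": "Example Castle", "wikidata": "Q12345"}
primary, secondary = wikidata_links(castle_tags)
print(primary, secondary)
```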

Now two computer science students, Jari Elmer and Timon Erhart, from the University of Applied Sciences of Eastern Switzerland (OST), with the help of Sascha Brawer, a young software engineer in “un-retirement” and Wikipedian, have developed an application called “osm wikidata quality checker”. Its goal is to check the existing links from OSM to WD. The errors found, for example invalid WD entries in OSM, are also sent to Osmose together with a suggested correction. Osmose is a quality assurance tool for detecting problems in OSM data. The application is intended to become an integral part of OSM’s quality assurance ecosystem, and it handles the large amounts of data in the two databases (about 1.5 TB each).

The result of the thesis is a data processing pipeline that finds diverse types of erroneous Wikidata links in OSM with an accuracy of over 95%. By using multiprocessing and a purpose-built database model into which only the relevant data is extracted, the tool can handle the large amount of data and check the whole world on a weekly basis. The difficulties of dealing with crowdsourced data, where unforeseen data errors are to be expected, were also mastered, resulting in robust software. Documentation and an easy-to-understand architecture allow the tool to be extended and additional checks to be implemented. The optional configuration provides the necessary flexibility in operation and helps with further development.
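The general idea of extracting only the relevant data and then checking it in parallel can be sketched roughly as follows. This is not the tool's actual code: the function names, the chunk size, and the in-memory set of known item IDs are assumptions chosen to keep the example small, and the only check shown is "Wikidata item does not exist".

```python
from multiprocessing import Pool


def extract_links(osm_objects):
    """Keep only what the check needs: (OSM id, wikidata tag value) pairs."""
    return [(obj["id"], obj["tags"]["wikidata"])
            for obj in osm_objects
            if "wikidata" in obj.get("tags", {})]


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def check_chunk(args):
    """Worker: report links whose target is not among the known items."""
    links, known_items = args
    return [(osm_id, qid) for osm_id, qid in links if qid not in known_items]


def find_missing_items(links, known_items, workers=2, chunk_size=1000):
    """Run check_chunk over all chunks, in parallel if workers > 1."""
    jobs = [(chunk, known_items) for chunk in chunked(links, chunk_size)]
    if workers <= 1:
        results = [check_chunk(job) for job in jobs]
    else:
        with Pool(workers) as pool:
            results = pool.map(check_chunk, jobs)
    # Flatten the per-chunk result lists into one list of errors.
    return [error for chunk_errors in results for error in chunk_errors]
```

When run with more than one worker, `find_missing_items` should be called from under an `if __name__ == "__main__":` guard so that it also works on platforms that start worker processes by re-importing the script.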

Currently, a total of over 30,000 errors are found in the following nine categories:

  1. Incorrect value for Wikidata tag
  2. Wikidata item does not exist
  3. Redirected value for Wikidata tag
  4. The distance between OSM object and linked Wikidata item is unusually large
  5. Characteristics of the OSM tags and linked Wikidata item do not match
  6. The secondary Wikidata tag and the linked Wikidata item do not match
  7. The OSM object is linked to an unpermitted Wikidata item
  8. Unpermitted link to an instance of living organism on Wikidata
  9. The OSM object does not match the Wikidata item
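Two of the simpler categories above, an incorrect tag value (1) and an unusually large distance (4), can be illustrated with a short sketch. It assumes that a valid tag value is a single Wikidata item ID ("Q" followed by a positive integer) and that both sides provide a latitude/longitude pair; the function names and the 20 km threshold are arbitrary choices for this example, not the values used by the tool.

```python
import math
import re

# A Wikidata item ID: "Q" followed by a positive integer without leading zeros.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_valid_qid(value):
    """Check 1: does the tag value look like a single Wikidata item ID?"""
    return QID_PATTERN.fullmatch(value) is not None


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def is_too_far(osm_coord, wd_coord, max_km=20.0):
    """Check 4: is the OSM object farther from the WD item than max_km?

    Both coordinates are (lat, lon) tuples; max_km is an assumed threshold.
    """
    return haversine_km(*osm_coord, *wd_coord) > max_km
```

A single fixed threshold is a simplification: large features such as lakes or regions can legitimately lie far from the coordinate stored in WD, so a real check would probably need to take the kind of object into account.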

We are happy that these categories have already been incorporated into Osmose (see e.g. this Tweet) and are also ready to be integrated elsewhere, e.g. in the iD editor.

The tool has an OSM Wiki page: osm.wiki/OpenStreetMap_Wikidata_Quality_Checker. We are now searching for a permanent place to host this data processing pipeline.

Location: Rapperswil, Rapperswil-Jona, Wahlkreis See-Gaster, St. Gallen, 8640, Switzerland

Discussion

Comment from Claudius Henrichs on 18 July 2022 at 12:11

Thanks for creating this report 👏 Great QA tool to use in combination with https://osm.wikidata.link/ to complete the mapping OSM->WD

Comment from Mateusz Konieczny on 23 December 2022 at 19:18

osm.wiki/OpenStreetMap_Wikidata_Quality_Checker

Project Repository: (tba.)

Can you publish it?

Comment from Geonick on 26 December 2022 at 13:24

Hi Mateusz: The code repo has now been published and is linked as “Project repository” on the mentioned Wiki page osm.wiki/OpenStreetMap_Wikidata_Quality_Checker.

Comment from Mateusz Konieczny on 26 December 2022 at 16:39

Thanks! I will look at it!

I wonder whether it would be feasible for you to detect the problem of “wikipedia/wikidata linked multiple times, and it is not a waterway/pipeline/railway/road/etc.[1] feature where linking multiple times may be correct” (or maybe you detect this kind of issue already?)

[1] I have a list of such features and a Ruby script that tries to detect features invalidly linked multiple times.

BTW, I tried editing using Osmose and ran into https://github.com/osm-fr/osmose-frontend/issues/437

Comment from Geonick on 26 December 2022 at 17:20

Hi Mateusz: I’ll forward your comments to Timon, the maintainer of the OSM Wikidata Quality Checker, who is now working in my institute IFS at OST.

Comment from turbotimon on 26 December 2022 at 22:43

@Geonick: thanks for this post and also for adding the repo link to the OSM Wiki page!

@Mateusz: “detect problem of wikipedia/wikidata linked multiple times”: Interesting idea, I created an issue for that. Please feel free to add more details there, like your list!
