Лагатып OpenStreetMap OpenStreetMap

Дзённік tchaddad

Свежыя запісы ў дзённіку

End of Project Summary

Апублікавана карыстальнікам tchaddad 2 Верасень 2019 на мове English

Summer has come to an end, and so this post is to wrap up the progress made over the course of the “Add Wikidata to Nominatim” project. Overall, the main contributions are documented in the 4 preceding diary posts, and in:

  • updated steps for extracting Wikipedia data and calculating importance scores
  • a new script for extracting Wikidata items and place types

These new processes have made big improvements in several OSM-to-Wikipedia comparison metrics as compared to equivalent numbers from 2013 (when the previous Wikipedia snapshot was taken).

Improved Numbers

For context, the number of Wikipedia articles in the top 40 languages in 2013 was 80,007,141, and the number of Wikipedia articles for the same 40 languages in 2019 was 142,620,084 - an increase of ~78%.

Within these article records, in 2013 it was possible under the old processing steps to attach latitude and longitude numbers to 692,541 articles, while in 2019 it was possible to enrich 7,755,392 records with location information - an increase of ~1,020%. This particular statistic largely reflects an improvement in the source Wikipedia / Wikidata projects.

More exciting, with the old method of linking Wikipedia articles to osm_ids, it was possible to link 313,606 Wikipedia article importance scores to osm_ids, but with the new method that uses both Wikidata item ids, and Wikipedia pages together, the number of Wikipedia article importance scores that can be linked has risen to 4,730,972 - an increase of ~1,409%. This increase is due to both the large number of Wikipedia and WIkidata tags added by OSM contributors since 2013, as well as the inclusion of Wikidata item ids in the linking process for the first time via this project.

Future Work

Although the project technically concludes today, there are obviously always areas of future work where more gains can be made. These include:

See full entry

Wikidata Queries

Апублікавана карыстальнікам tchaddad 30 Жнівень 2019 на мове English

The last post introduced Wikidata, and covered how to extract basic information from wikidata dumps using a postgres database and regular SQL queries. This post explains some simple SPARQL queries and the Wikidata Query System and related API, and how they relate to the improvements desired in Nominatim. The following examples, along with a few other tools from the Wikidata ecosystem, are a good start for any OSM contributor that is Wikidata-curious. SPARQL sur Wikidata CC-BY Wikimedia Commons, Jorge Abellán, Berlekemp

Explaining SPARQL

For a new user, probably the most foreign aspect of using Wikidata is the need to know something about SPARQL. So what is SPARQL?, and why do you need it?

See full entry

Digging into Wikidata

Апублікавана карыстальнікам tchaddad 30 Ліпень 2019 на мове English

The last post explained some of the background on the use of Wikipedia page links and other information in Nominatim. This post covers looking into Wikidata as another source of information that may be of use. Wikidata logo Wikidata is the knowledgebase of the Wikimedia foundation. It was founded in 2012 with the goal of collecting factual data used in Wikipedia across all languages. The project is maintained by over 20,000 active community contributors.

The Wikidata repository consists mainly of items, each one having a label, a description, and any number of aliases. Items are uniquely identified by a Q followed by a number, such as:

Statements describe detailed characteristics of an Item and consist of a property and a value. Properties in Wikidata have a P followed by a number, such as:

Properties can point to values, such as:

Or properties can point to values that represent other concepts, such as:

See full entry

Wikipedia Deep Dive

Апублікавана карыстальнікам tchaddad 29 Чэрвень 2019 на мове English

The GSoC project to add Wikidata to Nominatim is now well underway. This post will focus on the first phases that have centered on updating Wikipedia extraction scripts in order to build a modern Wikipedia extract for use in Nominatim. English Language Wikipedia logo

A little background

Why is Wikipedia data important to Nominatim? What many OpenStreetMap users might not know is that Wikipedia can be used as an optional auxiliary data source to help indicate the importance of OSM features. Nominatim will work without this information but it will improve the quality of the results if this is installed. To augment the accuracy of geocoded rankings, Nominatim uses page ranking of Wikipedia pages to help indicate the relative importance of OSM features. This is done by calculating an importance score between 0 and 1 based on the number of inlinks to a Wikipedia article for a given location. If two places have the same name and one is more important than the other, the Wikipedia score often points to the correct place.

See full entry

Adding Wikidata to Nominatim

Апублікавана карыстальнікам tchaddad 24 Травень 2019 на мове English

OpenStreetMap-Wikidata Semantic Bridge, credit:Krausst

Nominatim is the geocoder / search engine that powers the search box on the main OpenStreetMap site. The software indexes all named features in OSM and certain points of interest, and assigns ranks of importance scored between 0 and 30 (where 0 is most important). To augment the accuracy of rankings, Nominatim also uses page ranking of Wikipedia pages to help indicate the relative importance of osm features. This is done by calculating an importance score between 0 and 1 based on the number of inlinks to an article for a location. If two places have the same name and one is more important than the other, the wikipedia score often points to the correct place.

See full entry