Summer has come to an end, so this post wraps up the progress made over the course of the “Add Wikidata to Nominatim” project. The main contributions are documented in the four preceding diary posts and in:
- updated steps for extracting Wikipedia data and calculating importance scores
- a new script for extracting Wikidata items and place types
These new processes have substantially improved several OSM-to-Wikipedia comparison metrics relative to the equivalent figures from 2013, when the previous Wikipedia snapshot was taken.
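The post does not describe how the Wikidata extraction script works. The Python below is a rough sketch of the core of such an extraction. It assumes the input is the Wikidata JSON entity dump, that "place types" means the values of P31 (instance of), and that coordinates come from P625 (coordinate location). The function names and chosen fields are hypothetical. The project's own script may well work from other dumps, such as the SQL tables, and select different properties.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Entity = Dict[str, Any]


def iter_entities(lines: Iterable[str]) -> Iterator[Entity]:
    """Yield entities from a Wikidata JSON dump, which stores one entity per
    line inside a top-level JSON array ('[' ... ']', lines ending in ',')."""
    for raw in lines:
        line = raw.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        yield json.loads(line)


def _value_snaks(entity: Entity, prop: str) -> Iterator[Dict[str, Any]]:
    """Main snaks of `prop` statements that carry an actual value
    (skipping 'novalue' / 'somevalue' statements)."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value" and "datavalue" in snak:
            yield snak


def instance_of(entity: Entity) -> List[str]:
    """Q-ids from P31 ('instance of'), treated here as the item's place types."""
    return [snak["datavalue"]["value"]["id"] for snak in _value_snaks(entity, "P31")]


def coordinates(entity: Entity) -> Optional[Tuple[float, float]]:
    """First P625 ('coordinate location') value as (lat, lon), if present."""
    for snak in _value_snaks(entity, "P625"):
        value = snak["datavalue"]["value"]
        return value["latitude"], value["longitude"]
    return None


def extract_items(lines: Iterable[str], wiki: str = "enwiki") -> Iterator[Dict[str, Any]]:
    """One row per item: Q-id, linked page title on `wiki`, place types, coordinates."""
    for entity in iter_entities(lines):
        if entity.get("type") != "item":
            continue
        sitelink = entity.get("sitelinks", {}).get(wiki, {})
        yield {
            "qid": entity["id"],
            "title": sitelink.get("title"),
            "instance_of": instance_of(entity),
            "coords": coordinates(entity),
        }


# A heavily trimmed, illustrative dump with a single item.
sample_dump = [
    "[",
    '{"type": "item", "id": "Q64", "claims": {'
    '"P31": [{"mainsnak": {"snaktype": "value", "datavalue": {"value": {"id": "Q515"}}}}], '
    '"P625": [{"mainsnak": {"snaktype": "value", "datavalue": '
    '{"value": {"latitude": 52.52, "longitude": 13.405}}}}]}, '
    '"sitelinks": {"enwiki": {"site": "enwiki", "title": "Berlin"}}},',
    "]",
]

for row in extract_items(sample_dump):
    print(row)
```

Each row pairs a Wikidata item id with a Wikipedia page title. Those two keys are what later make it possible to link importance scores to OSM objects tagged with either one.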
Improved Numbers
For context, the top 40 Wikipedia languages contained 80,007,141 articles in 2013 and 142,620,084 articles in 2019, an increase of ~78%.
Of these article records, 692,541 could be given latitude and longitude coordinates under the old processing steps in 2013, whereas 7,755,392 records could be enriched with location information in 2019, an increase of ~1,020%. This statistic largely reflects improvements in the source Wikipedia and Wikidata projects themselves.
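The percentages quoted for these figures are relative increases, computed as (new - old) / old × 100. The quick Python check below uses the numbers given above. It also makes clear that an increase of ~1,020% means roughly 11 times as many records with coordinates, not 10 times.

```python
def percent_increase(old: int, new: int) -> float:
    """Relative change from `old` to `new`, expressed as a percentage."""
    return (new - old) / old * 100


# 2013 vs. 2019 figures quoted in the text.
articles = percent_increase(80_007_141, 142_620_084)
with_coords = percent_increase(692_541, 7_755_392)

print(f"articles:           +{articles:.1f}%")
print(f"articles w/ coords: +{with_coords:.1f}%  ({7_755_392 / 692_541:.1f}x)")
```

This prints +78.3% for the article count and +1019.8% (11.2x) for records with coordinates, matching the rounded figures in the text.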
More excitingly, the old method of linking Wikipedia articles to osm_ids could link 313,606 Wikipedia article importance scores to osm_ids. The new method uses Wikidata item ids and Wikipedia pages together and raises this to 4,730,972, an increase of ~1,409%. The gain comes both from the large number of Wikipedia and Wikidata tags added by OSM contributors since 2013 and from this project's inclusion of Wikidata item ids in the linking process for the first time.
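The post doesn't spell out the linking logic, so the Python below is only a minimal sketch of the idea. It assumes importance scores are available in two lookup tables: one keyed by Wikidata Q-id and one keyed by (language, page title). It also assumes OSM objects carry the usual wikidata=* and wikipedia=* tags. The sketch tries the Q-id first and falls back to the page title. For simplicity it keys objects by a plain integer osm_id. The function names, data shapes and scores are hypothetical and illustrative, not Nominatim's actual code or data.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple

PageKey = Tuple[str, str]  # (language code, page title with underscores)


def parse_wikipedia_tag(value: str) -> Optional[PageKey]:
    """Split an OSM wikipedia=* value like 'en:River Thames' into
    ('en', 'River_Thames'); return None if it has no 'lang:' prefix."""
    lang, sep, title = value.partition(":")
    lang, title = lang.strip().lower(), title.strip()
    if not sep or not lang or not title:
        return None
    return lang, title.replace(" ", "_")


def link_importance(
    osm_objects: Iterable[Tuple[int, Mapping[str, str]]],
    by_qid: Mapping[str, float],
    by_page: Mapping[PageKey, float],
) -> Dict[int, float]:
    """Map osm_id -> importance, preferring the wikidata=* tag and
    falling back to the wikipedia=* tag when no Q-id match is found."""
    linked: Dict[int, float] = {}
    for osm_id, tags in osm_objects:
        score = None
        qid = tags.get("wikidata", "").strip().upper()
        if qid:
            score = by_qid.get(qid)
        if score is None and "wikipedia" in tags:
            key = parse_wikipedia_tag(tags["wikipedia"])
            if key is not None:
                score = by_page.get(key)
        if score is not None:
            linked[osm_id] = score
    return linked


# Illustrative lookup tables and OSM objects (made-up scores).
by_qid = {"Q64": 0.9, "Q84": 0.95}
by_page = {("en", "Berlin"): 0.9, ("en", "River_Thames"): 0.7}
osm_objects = [
    (1, {"wikidata": "Q64", "wikipedia": "en:Berlin"}),  # both tags present
    (2, {"wikipedia": "en:River Thames"}),               # Wikipedia tag only
    (3, {"wikidata": "Q84"}),                            # reachable only via Wikidata
    (4, {"name": "Unnamed pond"}),                       # no link possible
]

print(link_importance(osm_objects, by_qid, by_page))  # {1: 0.9, 2: 0.7, 3: 0.95}
```

Trying the Q-id first is a design choice in this sketch. A Q-id stays the same when an article is renamed and is shared across language editions, so it is usually the more robust key. The wikipedia=* fallback still links objects that carry only the older tag. Object 3 shows the kind of match that a Wikipedia-title-only approach would miss.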
Future Work
Although the project technically concludes today, there are always areas where further gains can be made. These include:
…