OpenStreetMap

Google Summer of Code 2022

Posted by tareqpi on 24 May 2022 in English

Hi everyone, my name is Tareq Al-Ahdal. I am a computer science undergraduate student at Universiti Teknologi Malaysia. Recently, I got accepted into Google Summer of Code 2022 as an open source contributor with OpenStreetMap. This summer I will work on enhancing Nominatim, OpenStreetMap’s geocoding software, which lets us search for locations by name or address and, in the other direction, look up the address of a given location.

Nominatim currently ranks search results using a computed importance value that reflects each location’s perceived importance. This value is derived from the popularity of the location’s Wikipedia article. However, not every location on earth has its own Wikipedia article. Locations without one have no importance value, so the ranking of search results that include them can be inaccurate. OpenStreetMap has data on the number of times users accessed each location on the map, which is a good indicator of how popular a place is. The aim of my work is to integrate this data into Nominatim’s computation of the importance value so that search results become more accurate, helping users find the places they are looking for in less time.
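To make the idea concrete, here is a minimal sketch of how a view count could be folded into an importance score. Everything in it is an assumption for illustration only: the function names, the logarithmic scaling, and the 0.75 weight are hypothetical and are not Nominatim’s actual formula.

```python
import math
from typing import Optional


def views_to_score(views: int, max_views: int) -> float:
    """Map a raw view count to the range [0, 1] on a log scale.

    The log scaling is an assumption: view counts are usually heavily
    skewed, so a linear scale would flatten almost every place to ~0.
    """
    if views <= 0 or max_views <= 0:
        return 0.0
    return min(1.0, math.log1p(views) / math.log1p(max_views))


def combined_importance(
    wiki_importance: Optional[float],
    views: int,
    max_views: int,
    wiki_weight: float = 0.75,  # hypothetical weight, not a Nominatim setting
) -> float:
    """Blend a Wikipedia-based importance with a view-based score.

    If the place has no Wikipedia-based value, fall back to the
    view-based score alone instead of leaving it unranked.
    """
    view_score = views_to_score(views, max_views)
    if wiki_importance is None:
        return view_score
    return wiki_weight * wiki_importance + (1.0 - wiki_weight) * view_score


if __name__ == "__main__":
    # A place with a Wikipedia article and a place without one.
    print(combined_importance(0.8, views=5_000, max_views=1_000_000))
    print(combined_importance(None, views=5_000, max_views=1_000_000))
```

The point of the fallback branch is that a place without a Wikipedia article still receives a usable score, which is the gap described above.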

I will use this diary to keep you updated about my work. Please feel free to reach out if you have any questions regarding my work or anything else you have in mind.

Location: Taman Tun Dr Ismail, Kuala Lumpur, 60000, Malaysia

Discussion

Comment from bryceco on 24 May 2022 at 18:01

If anyone else is curious about the map access statistics referenced here, it’s the number of times each map tile has been served by the tile server: https://planet.openstreetmap.org/tile_logs/

Comment from tareqpi on 24 May 2022 at 21:46

Yes, thank you @bryceco 🙌

Comment from mmd on 25 May 2022 at 15:11

Tiles are mostly served by Fastly CDN these days. IIRC, tile_logs only includes tiles which haven’t been cached by the CDN and need to be re-rendered. This might result in quite skewed data. I’d highly recommend getting in touch with @pnorman to discuss those topics before starting actual implementation work.

Comment from tareqpi on 25 May 2022 at 22:57

OK, I will talk to him. Thank you @mmd

Comment from pnorman on 27 May 2022 at 05:10

No, the tile logs are from the CDN and are an accurate count of successful requests for tiles. Prior to 2021-04-13 the logs were only from the second layer of the old CDN.

The logs include successful requests on tiles where there were at least 10 requests, and the requests came from at least 3 distinct IPs. Most of them represent real views, but there are some artifacts.
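As a rough illustration of that inclusion rule, here is a sketch only, not the actual log-processing code. The function name and the input shape (one `(tile, ip)` pair per successful request) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

MIN_REQUESTS = 10      # a tile needs at least this many successful requests
MIN_DISTINCT_IPS = 3   # ...coming from at least this many distinct IPs


def publishable_tile_counts(
    requests: Iterable[Tuple[str, str]],
) -> Dict[str, int]:
    """Return request counts for tiles that pass both thresholds.

    `requests` yields one (tile, ip) pair per successful request,
    e.g. ("12/2048/1361", "203.0.113.7"). This input shape is assumed.
    """
    counts: Dict[str, int] = defaultdict(int)
    ips = defaultdict(set)
    for tile, ip in requests:
        counts[tile] += 1
        ips[tile].add(ip)
    return {
        tile: n
        for tile, n in counts.items()
        if n >= MIN_REQUESTS and len(ips[tile]) >= MIN_DISTINCT_IPS
    }
```

A tile requested 20 times by a single IP would be dropped, as would a tile requested 5 times by 5 different IPs; only tiles meeting both conditions appear in the result.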
