With the new release of more than 59 million points of interest (POIs) from Overture, consisting of Microsoft and Meta POI datasets combined, the natural question arises: how can this be useful for OpenStreetMap?
Challenges to consider
The most important challenge in getting this data into OSM is making sure the place labels in Overture have an equivalent in OSM. This is mostly doable with automation, but many cases require context.
Validation of these is a forthcoming challenge: street-level imagery from Mapillary will be especially helpful, but being there in person to validate is also a big advantage. That aside, even if the data can be added to OSM one-by-one (not imported) with validation, the tags need to have a proper format.
Loading up the data to analyze
I got started by referencing Feye Andal’s great and succinct guide on viewing the data in AWS Athena. I found a slight lack of clarity in the instructions: you need to make sure your Athena instance, and your S3 bucket where queries are saved, are on us-west-2 region, same as the Overture dataset, unless you copy the dataset first to a bucket in your other region. So make sure the regions are the same, and the instructions should work flawlessly!
Analyzing the data
Exploring the dataset, there are 1037 unique place labels in it. 86,000+ are structure_and_geography
which can refer to a wide range of natural geography or built structures in OSM, difficult to match with any specific tag without context. Others translate directly, such as a laundromat.
Some example tags include: "forest", "stadium_arena", "farm", "professional_services", "baptist_church", "park", "print_media", "spas", "passport_and_visa_services", "restaurant", "dentist"
To get most of the tags matched, I used Python to import the OpenAI module, and connect to my OpenAI account, which charges a few fractions of a penny per request.
I set a system message, which defines the role the AI should play or assume. My message was:
… 查看完整日記項目