Finding SEO spam in OSM
Posted by Friendly_Ghost on 18 June 2023 in English. Last updated on 23 June 2023.
After I came across some business descriptions in OSM that were of dubious quality, I decided to hunt them down systematically. OSM is, after all, not a place for advertisements. Now, about half a year and hundreds of POI tag fixes later, it is time to reflect on this project and to share my observations.
Introduction
People who map their business on OSM usually have a single changeset in which they put their business on the map. They often get the opening hours syntax and the international formatting of phone numbers wrong, and there is usually a lot of info still missing. This is fine, since OSM data in general follows the trend where basic map data receives details, corrections and improvements from different mappers over time.
An issue arises when companies try to sneak in their brochure texts and other SEO spam. We want OSM to stay objective and neutral and we want data that relates to the real world, so this information is unwelcome. We can’t stop people from mapping their company details, but moderation is clearly needed if we are to uphold these principles.
I started looking for a way to detect the unwanted spam. The result is an Overpass query that searches for buzzwords in the description tag. Think of phrases like “award winning”, “reliable service” and “conveniently located”. This is a dynamic process: I regularly add new buzzwords that I encounter alongside the ones the query finds, and I remove words that produce false positives.
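To make the idea concrete, here is a minimal sketch of how such a query could be generated from a buzzword list. It is not my actual query: the function name `build_query`, the three-item list and the timeout value are assumptions chosen for illustration. The generated text uses standard Overpass QL, where `nwr` selects nodes, ways and relations and the `,i` suffix makes the regular expression case-insensitive.

```python
import re

# Hypothetical sample list; a real list would be longer and change over time.
BUZZWORDS = ["award winning", "reliable service", "conveniently located"]

# Only allow characters that have no special meaning in the regex or the
# quoted Overpass QL string, so no escaping is needed in this sketch.
_SAFE = re.compile(r"[A-Za-z0-9 \-]+")


def build_query(buzzwords, timeout=60):
    """Return Overpass QL that matches any buzzword in the description tag."""
    if not buzzwords:
        raise ValueError("at least one buzzword is required")
    for phrase in buzzwords:
        if not _SAFE.fullmatch(phrase):
            raise ValueError(f"unsupported characters in buzzword: {phrase!r}")
    # Join the phrases into one alternation: "a|b|c".
    pattern = "|".join(buzzwords)
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'nwr["description"~"{pattern}",i];\n'
        "out tags center;"
    )


if __name__ == "__main__":
    print(build_query(BUZZWORDS))
```

Keeping the buzzwords in a plain list makes it easy to add a new phrase or drop one that causes false positives without touching the query itself. A regex search over every description tag worldwide is heavy, so in practice it may be worth restricting the query to an area or bounding box.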