Finding SEO spam in OSM

Posted by Friendly_Ghost on 18 June 2023 in English. Last updated on 23 June 2023.

After I came across some business descriptions in OSM that were of dubious quality, I decided to hunt them down systematically. OSM is, after all, not a place for advertisements. Now, about half a year and hundreds of POI tag fixes later, it is time to reflect on this project and to share my observations.

Introduction

People who map their business on OSM usually have a single changeset in which they put their business on the map. They often foul up the opening hours syntax and the international formatting for phone numbers, and there is usually a lot of info still missing. This is fine, since OSM data in general follows the trend where basic map data receives details, corrections and improvements from different mappers over time.

An issue arises when companies try to sneak in their brochure texts and other SEO spam. We want OSM to stay objective and neutral and we want data that relates to the real world, so this information is unwelcome. We can’t stop people from mapping their company details, but moderation is clearly needed if we are to uphold these principles.

I started looking for a way to detect the unwanted spam. The result is this Overpass query for buzzwords in the description tag. Think of phrases like “award winning”, “reliable service” and “conveniently located”. This is a dynamic process: I regularly add new buzzwords that I encounter alongside the ones that I find through the query, and I remove words that result in false positives.
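To give an idea of what such a query can look like, here is a minimal Python sketch that assembles an Overpass QL query from a list of buzzwords. It is not my actual query: the function name `build_buzzword_query`, the timeout value and the restriction to plain letters, digits, spaces and hyphens are assumptions made for illustration, and the buzzword list only contains the three phrases mentioned above.

```python
import re

# Illustrative list only: the three phrases named in the text above.
BUZZWORDS = ["award winning", "reliable service", "conveniently located"]


def build_buzzword_query(buzzwords, timeout=180):
    """Return an Overpass QL query for description tags containing any buzzword.

    Hypothetical helper. To avoid regex and string escaping problems, it only
    accepts buzzwords made of letters, digits, spaces and hyphens.
    """
    if not buzzwords:
        raise ValueError("at least one buzzword is required")
    for word in buzzwords:
        if not re.fullmatch(r"[A-Za-z0-9 -]+", word):
            raise ValueError(f"unsupported characters in buzzword: {word!r}")
    # Join the buzzwords into one regular expression with alternation.
    pattern = "|".join(buzzwords)
    return (
        f"[out:json][timeout:{timeout}];\n"
        # nwr = nodes, ways and relations; ",i" makes the match case-insensitive.
        f'nwr["description"~"{pattern}",i];\n'
        "out tags center;"
    )


if __name__ == "__main__":
    print(build_buzzword_query(BUZZWORDS))
```

The printed text can be pasted into Overpass Turbo. Without a bounding box or area filter this is a worldwide query, so it may run for a long time.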

Image 1: Distribution of the results in Overpass Turbo

Results

Many businesses are properly tagged apart from the questionable description, and for those it’s a quick and easy process to delete the description and move on. Example

The more complicated cases can be categorised as follows:

Places that are missing a main feature tag

Some businesses are not tagged as anything, but instead they just have a name, address, description and with some luck a website. A close inspection is needed to figure out how to tag these businesses. Example

Descriptions with additional information to tag

Some descriptions contain both the unwanted spam and some useful information. It might be a hotel that offers free wi-fi, a pet-friendly café or an insurance company that mentions its phone number in the description tag. With enough understanding of OSM’s tagging practices it’s possible to turn this SEO spam into proper tags like internet_access, dog=yes or (contact:)phone=*. Example
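As a rough illustration of this conversion, the sketch below scans a description for a few keywords and proposes tags for a human to review. The rules, the function name `suggest_tags`, the phone number pattern and the sample description are all made up for this example. This is not a tool I use, and every suggestion still needs to be checked by hand.

```python
import re

# Hypothetical keyword rules: each pattern maps to one suggested OSM tag.
RULES = [
    (re.compile(r"\bwi-?fi\b", re.IGNORECASE), ("internet_access", "wlan")),
    (re.compile(r"\b(pet|dog)[- ]friendly\b", re.IGNORECASE), ("dog", "yes")),
]

# Very rough pattern for phone numbers written in international format.
PHONE = re.compile(r"\+\d[\d ()-]{6,}\d")


def suggest_tags(description):
    """Return a dict of tag suggestions found in a description text."""
    suggestions = {}
    for pattern, (key, value) in RULES:
        if pattern.search(description):
            suggestions[key] = value
    match = PHONE.search(description)
    if match:
        suggestions["phone"] = match.group(0)
    return suggestions


if __name__ == "__main__":
    # Made-up description, not taken from a real OSM object.
    text = "Award winning pet-friendly café with free Wi-Fi, call +31 20 123 4567"
    print(suggest_tags(text))
```

A keyword match only says that a topic is mentioned. A description that reads “no Wi-Fi available” would trigger the same rule, which is why the output is a suggestion and not an edit.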

Chaos

Sometimes a person manages to fit too much SEO spam into an OSM object. There might be emojis, inviting messages in bold text, names fully capitalised, 15+ payment options, cuisine tags packed with all the drinks that are offered, an image that’s just the logo, website tags that lead to a review page and address tags that link to Google Maps, all on top of the usual shenanigans. There is no way to speedrun a cleanup of these objects; they need to be inspected one tag at a time. I have seen too many of these and am now contemplating a position as a monk at the nearest monastery. Example

False positives

Sometimes buzzwords like “famous for” and “the best” are not intended to allure potential customers, but are somehow part of neutral descriptions of places. I saved the IDs of the false positives I found to exclude them from the query. Example
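I handle this exclusion inside the query itself. As an alternative illustration, the same idea can be applied after downloading: the sketch below filters the `elements` list of an Overpass JSON response against a saved set of object types and IDs. The IDs, the sample data and the function name `drop_false_positives` are invented for this example.

```python
# Hypothetical saved false positives, stored as (element type, ID) pairs.
# The type is needed because nodes, ways and relations have separate ID ranges.
FALSE_POSITIVES = {("node", 1001), ("way", 2002)}


def drop_false_positives(elements, false_positives):
    """Return the elements whose (type, id) pair is not a known false positive."""
    return [
        element
        for element in elements
        if (element["type"], element["id"]) not in false_positives
    ]


if __name__ == "__main__":
    # Made-up elements shaped like the "elements" list of an Overpass JSON response.
    elements = [
        {"type": "node", "id": 1001, "tags": {"description": "Famous for its old mill"}},
        {"type": "node", "id": 1002, "tags": {"description": "The best prices in town!"}},
    ]
    for element in drop_false_positives(elements, FALSE_POSITIVES):
        print(element["type"], element["id"])
```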

Editing

To edit the tags of these objects I mainly use Overpass Turbo in conjunction with the OSM tags editor, which is an extension for Google Chrome that lets you edit tags with a minimalistic UI directly on osm.org. My main considerations are speed and simplicity, but for more versatility, like the ability to remove duplicate POIs or to have a validator tool, it pays off to choose JOSM or iD/Rapid instead.

MapRoulette

I have created MapRoulette challenges to ask for help with reviewing and removing business descriptions. So far, some helpful mappers have removed roughly 700 unwanted descriptions globally. These challenges only feature nodes for now. I just uploaded a new version of the challenge here.

Conclusion

OpenStreetMap is becoming an increasingly interesting medium for firms to make their presence known to the world. We generally welcome their contributions to the map, but since these people usually don’t return to OSM after their initial effort to map their businesses, we need to have a good look at their work to ensure that it meets community standards. I am taking a deep dive into the descriptions they add, and after working my way through hundreds of them I can conclude that there is a lot of room for improvement, either through removing spam or through converting it to other useful tags. As with everything else in OSM, it is an effort to which anyone can contribute.

Congratulations on making it to the end of my essay. Thank you for reading this.

P.S. I created a forum thread in which we can discuss this topic.

Discussion

Comment from Endres Pelka on 18 June 2023 at 21:11

Businesses that stuff their advertising rubbish-text anywhere they can, even on OpenStreetMap, are not trustworthy in any aspect. Their object on OSM might be misplaced many kilometers, or the business might already be bankrupt or moved somewhere else, long before we notice the dubious edit.

I’d just delete such objects right away (preserving the address or building outline, if it looks plausible). If the business cares, they would respond then. If not, why bother?

Comment from ivanbranco on 18 June 2023 at 22:49

Cool query! Some of it could be a cool Osmose check imho

Comment from b-unicycling on 18 June 2023 at 23:28

Thanks for taking the time to put so much effort into it.

Comment from adreamy on 19 June 2023 at 03:56

What a wonderful article.
Please submit it to weeklyOSM.

Comment from 快乐的老鼠宝宝 on 19 June 2023 at 08:48

I usually remove the promotional information from such elements, because it is shameful to advertise your business through a community that is editable by everyone. An element should only retain the most basic and neutral information, like name=* and phone=* (etc.).

Comment from Glassman on 19 June 2023 at 15:33

One easy method of finding SEO spam is to review new users’ contributions. Since the main culprits seem to limit their changesets to one or two, they show up as new users. Around me, they show up with a changeset comment of “updated” and have a username much like the business being added. I would encourage everyone to start welcoming new users in their area using the Welcome tool.

I would also like to thank user_53959 for their worldwide work of cleaning up SEO spam.

When the changeset has many errors, I usually just revert the edit. Otherwise I try to fix minor issues.

Over the years I’ve tried contacting businesses to find out who is adding their business to OSM. So far no luck.

Comment from Kai Johnson on 21 June 2023 at 16:57

Nice work! I don’t see much of this spam where I’m mapping, but I’ll keep an eye out for it!

Comment from Friendly_Ghost on 21 June 2023 at 22:05

Thanks for all the nice comments and for your perspectives on the subject.

Comment from Msiipola on 22 June 2023 at 06:24

I tried to open the Overpass query, but got an error. Maybe you could copy the query text here? I assume it must be adjusted for a specific country’s language.

One problem with these POIs is how to verify whether the business is still there. If you can’t verify it yourself by visiting the address, you can try Google. But if Google doesn’t return anything useful, should you delete the POI or not?

Comment from Friendly_Ghost on 22 June 2023 at 15:24

I have no idea what happened to that link, but here is my latest version: https://overpass-turbo.eu/s/1woJ

Comment from Friendly_Ghost on 22 June 2023 at 15:25

Also, you’re not allowed to use Google as a source for OSM mapping because of copyright.

Comment from Msiipola on 22 June 2023 at 17:17

About Google: you can see whether the business is still going by looking at the business’s pages. Are they updated, or are they several years old? Is the address given there the same as in OSM, etc.?

Comment from hfs on 23 June 2023 at 12:26

Great initiative!

I think you can remove amenity=telephone as false positives. Someone copied and pasted the same description, which contains “comfort” and matches the query, onto 1500 (!) public telephones in Germany.

Comment from Friendly_Ghost on 23 June 2023 at 12:38

Thank you! I updated the query to exclude these results.

Comment from Nearby0051 on 25 June 2023 at 15:22

Interesting. Would it also be possible to output and then check POIs that were created by a user whose name is the same as the name of the POI?

It seems like many businesses sign up using the same name as the POI they end up creating, at least here where I live.