
Finding SEO spam in OSM

Posted by Friendly_Ghost on 18 June 2023 in English. Last updated on 23 June 2023.

After I came across some business descriptions in OSM that were of dubious quality, I decided to hunt them down systematically. OSM is, after all, not a place for advertisements. Now, about half a year and hundreds of POI tag fixes later, it is time to reflect on this project and to share my observations.

Introduction

People who map their business on OSM usually have a single changeset in which they put their business on the map. They often foul up the opening hours syntax and the international formatting for phone numbers, and there is usually a lot of info still missing. This is fine, since OSM data in general follows the trend where basic map data receives details, corrections and improvements by different mappers over time.

An issue arises when companies try to sneak in their brochure texts and other SEO spam. We want OSM to stay objective and neutral and we want data that relates to the real world, so this information is unwelcome. We can’t stop people from mapping their company details, but moderation is clearly needed if we are to uphold these principles.

I started looking for a way to detect the unwanted spam. The result is this Overpass query for buzzwords in the description tag. Think of phrases like “award winning”, “reliable service” and “conveniently located”. This is a dynamic process: I regularly add new buzzwords that I encounter alongside the ones that I find through the query, and I remove words that result in false positives.
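As a rough illustration of the idea (not the actual query, which lives behind the Overpass Turbo link), here is a minimal Python sketch that assembles an Overpass QL query from a buzzword list. The function name `build_query`, the timeout value and the restriction to plain letters, spaces and hyphens are assumptions made for this example; only the three phrases come from this post.

```python
import re

# Only these three phrases are taken from the post; a real list would be longer.
BUZZWORDS = ["award winning", "reliable service", "conveniently located"]


def build_query(buzzwords, timeout=60):
    """Return an Overpass QL query matching any buzzword in description=*.

    Hypothetical helper: it assumes every buzzword consists of letters,
    spaces and hyphens only, so no regex or string escaping is needed.
    """
    for word in buzzwords:
        if not re.fullmatch(r"[A-Za-z][A-Za-z -]*", word):
            raise ValueError(f"unsupported characters in buzzword: {word!r}")
    # Join the phrases into one alternation for a case-insensitive tag filter.
    pattern = "|".join(buzzwords)
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'nwr["description"~"{pattern}",i];\n'
        "out center;"
    )


print(build_query(BUZZWORDS))
```

The printed text can be pasted into Overpass Turbo. A worldwide regex search is heavy, so in practice it may need a bounding box or a longer timeout.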

Image 1: Distribution of the results in Overpass Turbo

Results

Many businesses are properly tagged apart from the questionable description, and for those it’s a quick and easy process to delete the description and move on. Example

The more complicated cases can be categorised as follows:

Places that are missing a main feature tag

Some businesses are not tagged as anything, but instead they just have a name, address, description and with some luck a website. A close inspection is needed to figure out how to tag these businesses. Example

Descriptions with additional information to tag

Some descriptions contain both the unwanted spam and some useful information. It might be a hotel that offers free wi-fi, a pet-friendly café or an insurance company that mentions its phone number in the description tag. With enough understanding of OSM’s tagging practices it’s possible to turn this SEO spam into proper tags like internet_access, dog=yes or (contact:)phone=*. Example
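To make the conversion concrete, here is a small Python sketch that scans a description for a few phrases and suggests tags. The rules, the phone pattern and the name `suggest_tags` are hypothetical and only cover the examples mentioned above; I do this step by hand, and any suggestion still needs a human check before it goes into OSM.

```python
import re

# Hypothetical phrase-to-tag rules, covering only the examples in this post.
RULES = [
    (re.compile(r"\bfree wi-?fi\b", re.I),
     {"internet_access": "wlan", "internet_access:fee": "no"}),
    (re.compile(r"\b(?:pet|dog)[- ]friendly\b", re.I), {"dog": "yes"}),
]

# Very rough pattern for a phone number in international format (assumption).
PHONE = re.compile(r"\+\d[\d ()-]{6,}\d")


def suggest_tags(description):
    """Return a dict of tag suggestions found in a description text."""
    suggestions = {}
    for pattern, tags in RULES:
        if pattern.search(description):
            suggestions.update(tags)
    match = PHONE.search(description)
    if match:
        suggestions["phone"] = match.group(0)
    return suggestions


print(suggest_tags("Award winning hotel with free Wi-Fi. Call +31 20 123 4567 today!"))
```

The output is only a list of candidates. Whether a number belongs in phone=* or contact:phone=*, for example, depends on what the object already uses.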

Chaos

Sometimes a person manages to fit too much SEO spam into an OSM object. There might be emojis, inviting messages in bold text, names fully capitalised, 15+ payment options, cuisine tags packed with all the drinks that are offered, an image that’s just the logo, website tags that lead to a review page and address tags that link to Google Maps, all on top of the usual shenanigans. There is no way to speedrun a cleanup of these objects; they need to be inspected one tag at a time. I have seen too many of these and am now contemplating a position as a monk at the nearest monastery. Example

False positives

Sometimes buzzwords like “famous for” and “the best” are not intended to lure potential customers, but are simply part of neutral descriptions of places. I saved the IDs of the false positives I found to exclude them from the query. Example
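The exclusion itself happens inside the query, but the same idea can be sketched in Python on the JSON that Overpass returns. This is an alternative illustration, not how the query does it: the helper `drop_false_positives` and the IDs below are placeholders, and the only assumption about the data is that each element carries a `type` and an `id`, as Overpass JSON output does.

```python
# Hypothetical exclusion list of (element type, id) pairs; the IDs are placeholders.
FALSE_POSITIVES = {("node", 1), ("way", 2)}


def drop_false_positives(elements, excluded=FALSE_POSITIVES):
    """Keep only the Overpass JSON elements that are not on the exclusion list."""
    return [el for el in elements if (el["type"], el["id"]) not in excluded]


# Made-up sample in the shape of the "elements" array of an Overpass JSON response.
sample = [
    {"type": "node", "id": 1, "tags": {"description": "famous for its old oak tree"}},
    {"type": "node", "id": 3, "tags": {"description": "the best prices in town!"}},
    {"type": "way", "id": 2, "tags": {"description": "the best preserved mill"}},
]

for element in drop_false_positives(sample):
    print(element["type"], element["id"])
```

Keying on both type and ID matters because nodes, ways and relations have separate ID ranges that can collide.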

Editing

To edit the tags of these objects I mainly use Overpass Turbo in conjunction with the OSM tags editor, which is an extension for Google Chrome that lets you edit tags with a minimalistic UI directly on osm.org. My main considerations are speed and simplicity, but for more versatility, like the ability to remove duplicate POIs or to have a validator tool, it pays off to choose JOSM or iD/Rapid instead.

MapRoulette

I have created MapRoulette challenges to ask for help with reviewing and removing business descriptions. So far, some helpful mappers have removed roughly 700 unwanted descriptions globally. These challenges only feature nodes for now. I just uploaded a new version of the challenge here.

Conclusion

OpenStreetMap is becoming an increasingly interesting medium for firms to make their presence known to the world. We generally welcome their contributions to the map, but since these people usually don’t return to OSM after their initial effort to map their businesses, we need to have a good look at their work to ensure that it meets community standards. I am taking a deep dive into the descriptions they add, and having worked my way through hundreds of them I can conclude that there is a lot of room for improvement, either through removing spam or through converting it to other useful tags. As with everything else in OSM, it is an effort to which anyone can contribute.

Congratulations on making it to the end of my essay. Thank you for reading this.

P.S. I created a forum thread in which we can discuss this topic.


Discussion

Comment from Endres Pelka on 18 June 2023 at 21:11

Businesses that stuff their advertising rubbish-text anywhere they can, even on OpenStreetMap, are not trustworthy in any aspect. Their object on OSM might be misplaced by many kilometers, or the business might already be bankrupt or have moved somewhere else, long before we notice the dubious edit.

I’d just delete such objects right away (preserving the address or building outline, if it looks plausible). If the business cares, they would respond then. If not, why bother?

Comment from ivanbranco on 18 June 2023 at 22:49

Cool query! Some of it could be a cool Osmose check imho

Comment from b-unicycling on 18 June 2023 at 23:28

Thanks for taking the time to put so much effort into it.

Comment from adreamy on 19 June 2023 at 03:56

What a wonderful article.
Please introduce it to Weekly OSM.

Comment from 快乐的老鼠宝宝 on 19 June 2023 at 08:48

I usually remove the promotional information from such elements, because it is shameful to advertise your business through a community that is editable by everyone. They should only retain the most basic and neutral information, like name=* and phone=* (etc.).

Comment from Glassman on 19 June 2023 at 15:33

One easy method of finding SEO spam is to review new users’ contributions. Since the main culprits seem to limit their changesets to one or two, they show up as new users. Around me, they show up with a changeset comment of “updated” and have a username much like the business being added. I would encourage everyone to start welcoming new users in their area using the Welcome tool.

I would also like to thank user_53959 for their worldwide work of cleaning up SEO spam.

When the changeset has many errors, I usually just revert the edit. Otherwise I try to fix minor issues.

Over the years I’ve tried contacting businesses to find out who is adding their business to OSM. So far no luck.

Comment from Kai Johnson on 21 June 2023 at 16:57

Nice work! I don’t see much of this spam where I’m mapping, but I’ll keep an eye out for it!

Comment from Friendly_Ghost on 21 June 2023 at 22:05

Thanks for all the nice comments and for your perspectives on the subject.

Comment from Msiipola on 22 June 2023 at 06:24

I tried to open the Overpass query, but got an error. Maybe you could copy the query text here? I assume it must be adjusted for a specific country’s language.

One problem with these POIs is how to verify whether the business is still there. If you can’t verify it yourself by visiting the address, you can try to google. But if Google doesn’t return anything useful, should you delete the POI or not?

Comment from Friendly_Ghost on 22 June 2023 at 15:24

I have no idea what happened to that link, but here is my latest version: https://overpass-turbo.eu/s/1woJ

Comment from Friendly_Ghost on 22 June 2023 at 15:25

Also, you’re not allowed to use Google as a source for OSM mapping because of copyright.

Comment from Msiipola on 22 June 2023 at 17:17

About Google, you can see if the business is still going by looking at the business’s pages. Are they updated or are they several years old? Is the address given there the same as in OSM, etc.?

Comment from hfs on 23 June 2023 at 12:26

Great initiative!

I think you can remove amenity=telephone as false positives. Someone copied and pasted the same description, which contains “comfort” and matches the query, onto 1500 (!) public telephones in Germany.

Comment from Friendly_Ghost on 23 June 2023 at 12:38

Thank you! I updated the query to exclude these results.

Comment from Nearby0051 on 25 June 2023 at 15:22

Interesting. Would it also be possible to output and then check POIs that were created by a user with the same name as the POI?

It seems like many businesses sign up using the same name as the POI they end up creating, at least here where I live.
