
What is an import?

Posted by jremillard on 17 April 2018 in English.

I have been working on some code to detect whether a changeset is an import or SPAM, or whether it has a tagging error.

https://github.com/jremillard/osm-changeset-classification

Detecting SPAM and tagging errors is pretty straightforward. However, detecting imports is much more challenging. Before I started, I thought I knew what an import was: I was looking for large changesets that only added one or two kinds of data. In practice, however, this criterion performs poorly. In OSM many import changesets are not large, and it is not uncommon for the imported data to have some hand editing mixed in.
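For reference, the naive rule described above ("large changeset, only one or two kinds of data") can be sketched as follows. This is a minimal illustration, not the classifier in the linked repository. The thresholds, the `PRIMARY_KEYS` tuple and the function names are all hypothetical. As the paragraph notes, this rule misses small imports and misclassifies large hand-traced uploads.

```python
from collections import Counter

# Hypothetical thresholds for the naive "large and homogeneous" rule.
MIN_CREATED = 500
MAX_FEATURE_KINDS = 2

# Hypothetical choice of keys that define the "kind" of a created object.
PRIMARY_KEYS = ("building", "highway", "amenity", "landuse",
                "natural", "waterway", "addr:housenumber")


def feature_kind(tags):
    """Return the first primary key present, or None (e.g. untagged way nodes)."""
    for key in PRIMARY_KEYS:
        if key in tags:
            return key
    return None


def looks_like_import_naive(created_tags):
    """created_tags: one tag dict per object the changeset created."""
    if len(created_tags) < MIN_CREATED:
        return False  # small changesets are never flagged: a known blind spot
    kinds = Counter(k for k in map(feature_kind, created_tags) if k is not None)
    # Flag only when the tagged objects fall into one or two kinds.
    return 0 < len(kinds) <= MAX_FEATURE_KINDS
```

For example, 600 objects tagged only `building=yes` would be flagged. A 50-building import would not be flagged, which is exactly the false negative described above.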

My new definition of an import

An import is any addition to OSM that directly derives from other digital map sources.


Discussion

Comment from Glassman on 17 April 2018 at 05:39

If I use the TIGER background image, provided in both iD and JOSM, to determine geometry as well as road name, is this an import?

Comment from DevonshireBoy42 on 17 April 2018 at 08:25

What do you want to do with flagged imports? If I do a small import of one village or town and manually check, conflate and edit every building, then it lacks the issues that large or automated imports have.

Comment from Zverik on 17 April 2018 at 10:19

There are no imports. Import is an invented construct made by Germans to try to keep their map in check. That’s why no matter what algorithm you choose, you’d get tons of false positives and false negatives.

Comment from Stereo on 17 April 2018 at 15:07

I think it’s very interesting that the import guidelines don’t actually define what the term means.

Comment from Nakaner on 17 April 2018 at 17:24

I agree that size alone is not helpful. I regularly check my OSMCha filters for changesets with more than 9000 additions, and many of them are HOT mappers who trace buildings and upload them all once they have finished editing.
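The size filter mentioned here comes down to counting created objects in a changeset's diff. The sketch below parses an osmChange document of the kind returned by the OSM API's `/api/0.6/changeset/{id}/download` endpoint. It then counts the nodes, ways and relations inside `<create>` blocks. It does not reproduce OSMCha's own logic. The function names and the default threshold are illustrative assumptions, and fetching the diff is left out.

```python
import xml.etree.ElementTree as ET

ADDITION_THRESHOLD = 9000  # the figure from the comment above; adjust as needed


def count_additions(osmchange_xml):
    """Count nodes, ways and relations inside <create> blocks of an osmChange document."""
    root = ET.fromstring(osmchange_xml)
    counts = {"node": 0, "way": 0, "relation": 0}
    for create in root.findall("create"):
        for element in create:  # direct children only, so <nd>/<tag> are not counted
            if element.tag in counts:
                counts[element.tag] += 1
    return counts


def exceeds_threshold(osmchange_xml, threshold=ADDITION_THRESHOLD):
    """True when the total number of created objects is above the threshold."""
    return sum(count_additions(osmchange_xml).values()) > threshold
```

As the comment points out, a large HOT tracing upload and a bulk import both pass this filter. The count is a triage aid, not a classifier.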

jremillard wrote:

> An import is any addition to OSM that directly derives from other digital map sources.

I would append:

without or with limited use of ground surveys and aerial/satellite imagery.

Otherwise people will try to define Bing imagery as a “digital map source”. :-)

However, that criteria is difficult to translate into rules a computer can apply. Here is my personal list of criteria that define a bad import:

  • use of strange tags
  • uppercase tags
  • coordinates as tags
  • no tag is longer than 10 characters
  • no discussion on imports@ mailing list
  • too short time between first posting to imports@ and start of the import
  • obvious and large copyright violation
  • no entry in the imports catalogue, no documentation on the wiki
  • no usage of a dedicated account for imports
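The first items on this list can be checked mechanically, one object at a time. Below is a minimal sketch for two of them: uppercase tags and coordinates stored as tags. The `COORD_KEYS` set is a hypothetical guess at common coordinate field names. Some legitimate OSM keys contain uppercase letters (for example `ISO3166-1`), so a hit is a reason to look, not proof of an import.

```python
# Hypothetical set of key names that suggest raw coordinates were copied into tags.
COORD_KEYS = {"lat", "lon", "long", "latitude", "longitude",
              "x", "y", "x_coord", "y_coord"}


def suspicious_keys(tags):
    """Return (key, reasons) pairs for keys matching the 'uppercase' or 'coordinates' signals."""
    flagged = []
    for key in tags:
        reasons = []
        if any(ch.isupper() for ch in key):
            reasons.append("uppercase")
        if key.lower() in COORD_KEYS:
            reasons.append("coordinates")
        if reasons:
            flagged.append((key, reasons))
    return flagged
```

For example, `suspicious_keys({"NAME": "Main St", "lat": "42.1", "highway": "residential"})` flags `NAME` as uppercase and `lat` as a coordinate key, and leaves `highway` alone.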

Unfortunately, our rules don’t require users to add a tag to the changeset indicating the documentation and discussion of the import. If they did, we could look for changesets that look like imports but lack those tags. I would call these tags:

  • import:documentation=<page title at wiki>
  • import:discussed:<mailing_list>=<date of first posting on imports@ mailing list>
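If such tags existed, the "looks like an import but is undocumented" check would be a simple lookup on the changeset's tags. The sketch below assumes the two keys proposed above, which are not an established convention. It also assumes that some separate heuristic supplies the `looks_like_import` flag. Both function names are hypothetical.

```python
def import_documentation_tags(changeset_tags):
    """Collect the proposed import:* documentation tags present on a changeset."""
    found = {}
    doc = changeset_tags.get("import:documentation")
    if doc:
        found["import:documentation"] = doc
    for key, value in changeset_tags.items():
        # Proposed form: import:discussed:<mailing_list>=<date of first posting>
        if key.startswith("import:discussed:"):
            found[key] = value
    return found


def undocumented_import(changeset_tags, looks_like_import):
    """Flag changesets judged import-like that carry none of the proposed tags."""
    return looks_like_import and not import_documentation_tags(changeset_tags)
```

Keeping the documentation check separate from the import heuristic means either side can be replaced without touching the other.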

Comment from Glassman on 17 April 2018 at 18:18

@Nakaner - At a minimum, having an import= tag should be sufficient. Or it could even hold the URL of the import page, to make it easier to get to the details of the import.

I applaud the effort to use software to detect imports. However, we need to be careful: false positives could lead to angry comments directed at an editor who did nothing wrong.

Clifford

Comment from Zverik on 17 April 2018 at 18:48

> that criteria is difficult to translate into rules a computer can apply

Well, this applies to all but the first four items on your list. And the fourth one is questionable.

And you are starting to discuss imports, not their detection.

Again, I am pretty sure you cannot tell a proper import from a regular edit. As for the source criteria, you never know what a mapper used for tracing or tagging, the same as with imports.

Comment from Nakaner on 17 April 2018 at 18:56

> Well, this applies to all but the first four items on your list. And the fourth one is questionable.

The fourth item (I should have written “key”, not “tag”) is an easy way to find users importing shapefiles. As you might know, field names in shapefiles are limited to 10 characters. Sometimes things go completely wrong and people end up uploading objects with uppercase keys or keys ending in ~1.
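The shapefile signals described here can be gathered across all keys in a changeset. These are keys no longer than the 10-character field-name limit, keys that are entirely uppercase, and keys ending in `~1`. The sketch below returns which signals fired instead of a verdict. On its own, "no key longer than 10 characters" also matches ordinary edits using `building` or `highway`, which is Zverik's objection. The function name and the reason strings are hypothetical.

```python
FIELD_NAME_LIMIT = 10  # shapefile (dBase) field names are capped at 10 characters


def shapefile_key_signals(objects_tags):
    """objects_tags: one tag dict per object in the changeset. Returns the signals that fired."""
    keys = {k for tags in objects_tags for k in tags}
    reasons = []
    if keys and all(len(k) <= FIELD_NAME_LIMIT for k in keys):
        reasons.append("no key longer than 10 characters")
    if any(k.endswith("~1") for k in keys):
        reasons.append("key ending in ~1")
    if keys and all(k.isupper() for k in keys):
        reasons.append("all keys uppercase")
    return reasons
```

A changeset containing only `{"NAME": ..., "STREETNA~1": ...}` fires all three signals. A changeset using `addr:housenumber` fires none, because that key is longer than 10 characters and lowercase.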

> Again, I am pretty sure you cannot tell a proper import from a regular edit. As for the source criteria, you never know what a mapper used for tracing or tagging, the same as with imports.

That’s not wrong. I find it difficult too, and I write changeset comments even when I am sure. There are HOT mappers uploading thousands of buildings in one large changeset.

Comment from dieterdreist on 17 April 2018 at 22:22

jremillard wrote:

> An import is any addition to OSM that directly derives from other digital map sources.

I think this definition has to be extended, because you can also import other information if you are able to assign positions to it (or relate it to OSM objects).

Comment from dieterdreist on 17 April 2018 at 22:24

For me, an import is adding data from somewhere when you didn’t check every part individually.

Comment from jremillard on 18 April 2018 at 02:20

Thanks for all the comments!

@Zverik - The vast majority of imports (probably over 95%) are detectable. However, a knowledgeable person who wants to make an import hard to detect certainly can. Obviously, it is impossible to know how often this happens.

@Stereo - I agree that the fact that the term isn’t clearly defined is interesting.

@DevonshireBoy42 - I have no plans yet for what to do with the detector; we will see if it goes anywhere useful.

@Glassman - Pulling road names from TIGER is a kind of import, but it doesn’t need to follow the import guidelines; we all know that it is OK because TIGER is public domain. However, pulling road names from Google isn’t OK. For small imports we skip the import guidelines and deal with them after the fact, by reverting them if they have problems.

Finally, the word “directly” would exclude tracing over an image layer.
