The TIGER import in the USA is one of the largest and messiest imports in OSM, but of course it is getting cleaned up over time. It can be hard to estimate progress, but one rough metric is the number of nodes and ways that haven’t been changed since import. This number goes down over time. For the past 3 years, I’ve been tracking these numbers in a spreadsheet.
- For nodes, it is the count of nodes last modified by the accounts ‘woodpeck_fixbot’ and ‘TIGERcnl’
- For ways, it is the count of ways last modified by the accounts ‘bot-mode’ and ‘DaveHansenTiger’
Over the last 3 years (since June 2015):
- Nodes have decreased from 139.6 to 127.6 million (average 11k/day)
- Ways have decreased from 8.00 to 6.08 million (average 1688/day)
What’s remarkable to me, as you can see from the trendlines, is how steady the rates are. At this rate, all of TIGER won’t be cleaned up (or at least touched) for another 31.7 years (for nodes) or 9.9 years (for ways).
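The projection above is a straight-line extrapolation: remaining count divided by the average daily decrease. A minimal sketch of that arithmetic, using the rounded figures quoted in this post (the function name and the 365.25-day year are assumptions for illustration). Because the node rate is rounded to 11k/day here, the node result lands close to, but not exactly on, the 31.7 years quoted above.

```python
# Linear extrapolation of the TIGER "burndown", using the rounded
# figures quoted in the post. DAYS_PER_YEAR is an assumption.
DAYS_PER_YEAR = 365.25


def years_remaining(remaining: float, per_day: float) -> float:
    """Years until `remaining` objects reach zero at a constant daily rate."""
    return remaining / per_day / DAYS_PER_YEAR


# Remaining untouched objects and average daily decrease, from the post.
nodes_years = years_remaining(127.6e6, 11_000)  # 11k/day is a rounded rate
ways_years = years_remaining(6.08e6, 1_688)

print(f"nodes: {nodes_years:.1f} years, ways: {ways_years:.1f} years")
```

This only holds if the rates stay as steady as the trendlines suggest; any speed-up or slowdown in editing changes the result proportionally.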
Discussion
Comment from iandees on 19 June 2018 at 16:45
There’s plenty of TIGER data that is fine as-is, and this measure won’t take that into account. Maybe a bit of spatial analysis could be done to mark TIGER nodes and ways as “touched” if other activity was made in the immediate vicinity (within a few dozen meters)?
Comment from SimonPoole on 19 June 2018 at 17:57
While I would agree with iandees that there is lots of “good” TIGER (aka geometry and topology more or less correct), even good TIGER didn’t have speed limits, turn restrictions, lane tags and so on, so we should expect even good TIGER to get touched at some point in time.
The other side of the story is that we know that a fair bit of the data is nonsense. Unfortunately, the hoarding disorder that often manifests itself around imports is stopping us from taking the sane actions to clean up those bits.
Comment from iandees on 19 June 2018 at 18:10
By “sane actions” you mean “delete everything and remap”?
Comment from SimonPoole on 19 June 2018 at 19:38
I was thinking more of removing stuff that is silly, as in stuff that doesn’t have any value, even residual, like the masses of residentials in rural Texas where there is literally nothing. For example, any untouched residential without a name should be a deletion candidate (that could likely be further refined by excluding counties that had already been part of the TIGER improvement program).
Comment from Omnific on 20 June 2018 at 01:15
I think the burndown for nodes is pretty unimportant except in certain states/counties (looking at West Virginia).
However, 10 years is still a long time just for someone to touch all the ways.
Comment from Mateusz Konieczny on 23 June 2018 at 08:56
“like the masses of residentials in rural Texas where there is literally nothing” - is it doable by an automatic process?
Because mass-purging of faulty imported data, when done manually, seems to be accepted.
At least I have made some edits like that based on aerial images. I do it rarely, as I try to limit armchair edits and I live on a different continent, but so far nobody has complained.
Comment from Mateusz Konieczny on 23 June 2018 at 08:59
“untouched residential without a name should be a deletion candidate” - so someone would still need to look at it…
Is there an Overpass Query to find such candidates? I am sometimes in a mood to go through faulty imports and delete stuff :)
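One possible shape for such a query, as a sketch rather than a vetted tool: a small Python helper that builds an Overpass QL string for unnamed residential ways whose last editor is one of the TIGER way accounts named in the post. The function name and the example bounding box are hypothetical, and the query only finds candidates; each one would still need a look against imagery.

```python
def untouched_tiger_residentials_query(south, west, north, east):
    """Build an Overpass QL query for unnamed residential ways that were
    last modified by the TIGER import accounts, inside a bounding box."""
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json][timeout:180];\n"
        # highway=residential, no name tag, last touched by an import account
        'way["highway"="residential"][!"name"]'
        '(user:"DaveHansenTiger","bot-mode")'
        f"({bbox});\n"
        "out geom;"
    )


# Hypothetical bounding box, chosen only for illustration.
query = untouched_tiger_residentials_query(31.0, -101.0, 31.5, -100.5)
print(query)
```

Keeping the bounding box small matters here, since a filter on the last editor over a whole state can be slow or time out.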
Comment from SimonPoole on 23 June 2018 at 10:07
Note I wasn’t proposing to automatically delete anything, but simply having fewer qualms about stuff that is clearly nonsense (and could be included, say, in a MapRoulette task or similar for relatively fast review).
Comment from Mateusz Konieczny on 23 June 2018 at 10:48
I see nothing wrong with mass deleting low or zero quality imports. In fact, I consider such deletions very valuable edits.
And I encourage everybody to do that; it is quite hard to find something more frustrating than editing in an area spammed with useless data.
I encountered
Cases like those mentioned above may still be present, but only in other places, as I deleted all of the ones I found. In some cases where the imported data was useful I used it; in other cases (barcode forest) I simply deleted everything and remapped from scratch.