I’ve just computed some statistics on a planet file from the 14th of October. Here they are:
This table shows the number of nodes, both tagged and untagged, that are referenced by ways and relations. You can see that nearly 97% of the 3.5 billion nodes are untagged, and most of these (88%) are part of exactly one way or relation. For example, when you trace a building, you add four untagged nodes that belong only to that closed way.
98.4% of all nodes are part of something, but only 12% (424 million) have two or more parent objects. This could help with designing data storage for nodes.
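To make the breakdown concrete, here is a minimal Python sketch of how such a tally could be computed. It is not the script used for the numbers in this post: the function name `tally_nodes` and the in-memory input format (a dict of node tags, lists of way node references, lists of relation members) are assumptions made for illustration. It also assumes that a node repeated within one way, such as the closing node of a building outline, counts as one parent.

```python
from collections import Counter


def tally_nodes(nodes, ways, relations):
    """Count nodes by (tagged/untagged, number of parent objects).

    nodes:     dict of node_id -> dict of tags (empty dict if untagged)
    ways:      iterable of lists of node ids
    relations: iterable of lists of (member_type, ref) tuples, "n" = node
    All three input shapes are hypothetical, chosen for this sketch.
    """
    parents = Counter()
    for refs in ways:
        # A closed way lists its first node twice; set() counts the way once.
        for node_id in set(refs):
            parents[node_id] += 1
    for members in relations:
        for node_id in {ref for mtype, ref in members if mtype == "n"}:
            parents[node_id] += 1

    table = Counter()
    for node_id, tags in nodes.items():
        n = parents[node_id]  # Counter returns 0 for nodes with no parents
        bucket = "0" if n == 0 else "1" if n == 1 else "2+"
        table[("tagged" if tags else "untagged", bucket)] += 1
    return table


# Tiny made-up data set: a four-node building, two short ways, one relation.
nodes = {
    1: {}, 2: {}, 3: {}, 4: {}, 6: {}, 7: {},
    5: {"highway": "crossing"},
    8: {"amenity": "bench"},
    9: {"place": "city"},
}
ways = [[1, 2, 3, 4, 1], [4, 5], [5, 6]]
relations = [[("n", 9), ("w", 1)]]

table = tally_nodes(nodes, ways, relations)
for key in sorted(table):
    print(key, table[key])
```

A plain dict and Counter are fine for a toy example, but with 3.5 billion nodes a real run would need a far more compact structure, which is exactly where knowing that most nodes have a single parent becomes useful.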
There are equal numbers of tagged nodes that are not part of anything and tagged nodes that are part of an element. The interesting ones are the 9 million tagged nodes that are part of two or more ways. Taginfo says there are 2.5 million crossings and 860 thousand traffic signals, so those account for about ⅓ of them.
Finally, we have a million nodes with no tags that are not part of anything. I wonder when someone will put on their OSM saviour cape and a programmer’s hat and rid us of these.
Discussion
Comment from ff5722 on 18 October 2016 at 22:20
I haven’t bothered to learn Overpass syntax yet, but I found these two scripts:
Combining these should give all nodes without tags and not part of a way…
Comment from SimonPoole on 18 October 2016 at 22:39
The redaction process created a large number of orphaned untagged nodes. A typical example would be a road that was redacted away while its nodes were not (because they were created or moved by somebody who accepted the CTs). As a result, the nodes may still carry residual geometry information (in how they are arranged) and should only be removed once that aspect has been checked.
The other source of such nodes is, naturally, (broken) imports; unfortunately, there is no penalty for not cleaning up after you have messed up.
Comment from ImreSamu on 18 October 2016 at 22:57
The Taginfo version: http://taginfo.openstreetmap.org/reports/database_statistics
A few days later (2016-10-18 00:58 UTC)
Comment from Zverik on 19 October 2016 at 08:23
Thanks Imre, I didn’t know Taginfo had those statistics. I did this because of the number of references, though.
Simon, thanks for reminding me of the redaction; I forgot how many orphaned nodes it left. Of course, my last remark about removing these was sarcasm: I certainly do not want anybody to do mass deletions.
ff5722, nice scripts, thanks for sharing!
Comment from SK53 on 20 October 2016 at 12:47
Only a million lonely nodes seems quite small by older standards. When Cadastre first came out I was cleaning up a hundred thousand or so at a time. Matt (zere) used to have a duplicated node map too; duplicated nodes were a big problem, particularly with TIGER, NHD & landuse imports in the US (more or less until ogr2osm fixed most of those issues).
Comment from ianlopez1115 on 21 October 2016 at 08:00
@ff5722, I did a bit of research and some tweaking based on previous examples, and here’s what I was able to come up with: an Overpass query looking for nodes without tags not belonging to ways or areas here.