Shape file to minimal ways, preparation for the relations
Posted by h4ck3rm1k3 on 30 December 2009 in English.
WaysToRelations program:
This is a new tool, written in C++, that makes a fast, low-memory, two-pass sweep over the XML using SAX.
It does not produce the relations just yet. All I have to do now is add each way that I process in the second pass to a relation. I already build up a vector of nodes, and each node can be looked up in maps of nodes by coordinates and by id.
A way is not stored any longer than it takes to process it.
The array of nodes is needed first, to collect the tagged attributes.
The first pass tags each node with how many ways reference it, declares one way as its owner, and points duplicate points (same location) to the first declaration of the node.
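The node bookkeeping described above could be sketched roughly like this. It is only an illustration, not the checked-in code: the names NodeInfo, NodeTable, addNode, addWayRef and canonical are made up for the example, and it assumes that two nodes are duplicates only when their parsed coordinates compare exactly equal.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-node record (names are assumptions, not from the tool).
struct NodeInfo {
    std::int64_t id;        // node id as read from the file
    double lat, lon;        // location
    int wayRefs;            // how many ways reference this node
    std::int64_t ownerWay;  // first way that referenced it (0 = none yet)
    std::size_t canonical;  // index of the first node declared at this location
};

class NodeTable {
public:
    // Register a node. If the location was seen before, the new node
    // points at the first declaration instead of itself.
    void addNode(std::int64_t id, double lat, double lon) {
        std::size_t index = nodes_.size();
        // emplace() keeps the existing entry when the key is already present.
        auto inserted = byCoord_.emplace(std::make_pair(lat, lon), index);
        nodes_.push_back({id, lat, lon, 0, 0, inserted.first->second});
        byId_[id] = index;
    }

    // Count one reference from a way. The count is kept on the canonical
    // node, and the first referencing way is recorded as the owner.
    void addWayRef(std::int64_t wayId, std::int64_t nodeId) {
        auto it = byId_.find(nodeId);
        if (it == byId_.end()) return;  // unknown node: ignore in this sketch
        NodeInfo &n = nodes_[nodes_[it->second].canonical];
        if (n.wayRefs == 0) n.ownerWay = wayId;
        ++n.wayRefs;
    }

    // Resolve a node id to the first node declared at the same location.
    // Throws std::out_of_range for an unknown id.
    const NodeInfo &canonical(std::int64_t nodeId) const {
        return nodes_[nodes_[byId_.at(nodeId)].canonical];
    }

private:
    std::vector<NodeInfo> nodes_;
    std::map<std::pair<double, double>, std::size_t> byCoord_;
    std::map<std::int64_t, std::size_t> byId_;
};
```

For example, if nodes -1 and -3 are added with the same coordinates and two different ways then reference them, canonical(-3) resolves to node -1, with a reference count of 2 and the first of the two ways as its owner.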
I am thinking about how to do all of this in one pass, because it should be possible to process each way the first time it is seen. When a way is split, the relations are created. But when a later way references a node of a way that was already emitted, the relation has to split that line after the fact; that is why we need a second pass. Otherwise we would have to store the ways and split them in memory. I have not considered that in detail, and you would still have to process them again, but it could take less processing time than two passes.
The second pass emits new ways based on contiguous blocks of nodes.
The connections to the other paths are also established.
The ways are given new ID numbers, counting down.
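A minimal sketch of that second-pass step might look like the following. It is a guess at the idea rather than the actual implementation: NewWay and splitWay are hypothetical names, the reference counts are assumed to come from the first pass, and a "contiguous block" is assumed to end at any node that more than one way references.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical output record for one emitted way.
struct NewWay {
    std::int64_t id;                  // newly assigned negative id
    std::vector<std::int64_t> nodes;  // contiguous run of node ids
};

// Split one way at every interior node referenced by more than one way.
// The shared node ends one piece and starts the next, so neighbouring
// pieces stay connected. nextId is decremented once per emitted way,
// which gives ids that count down.
std::vector<NewWay> splitWay(const std::vector<std::int64_t> &wayNodes,
                             const std::map<std::int64_t, int> &refCount,
                             std::int64_t &nextId) {
    std::vector<NewWay> out;
    std::vector<std::int64_t> current;
    for (std::size_t i = 0; i < wayNodes.size(); ++i) {
        current.push_back(wayNodes[i]);
        auto it = refCount.find(wayNodes[i]);
        bool shared = (it != refCount.end() && it->second > 1);
        bool last = (i + 1 == wayNodes.size());
        // Emit only pieces with at least two nodes.
        if (current.size() >= 2 && (shared || last)) {
            out.push_back({nextId--, current});
            current.clear();
            if (!last) current.push_back(wayNodes[i]);  // start next piece here
        }
    }
    return out;
}
```

With a way over nodes 1 to 5 where only node 3 is shared and nextId starting at -1, this yields two ways: id -1 with nodes 1, 2, 3 and id -2 with nodes 3, 4, 5. Because each way is handled on its own, nothing but the current way has to be kept in memory.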
Invoked like this:
./WaysToRelations ../nj_zip/new/tl_2009_34_zcta3.osm > tl_2009_34_zcta3.osm 2>err.txt
The file produced is uploaded here to :
time ./WaysToRelations ../nj_zip/new/tl_2009_34_zcta5.osm > tl_2009_34_zcta5.osm 2>err.txt
real 2m0.315s
user 0m48.235s
sys 1m11.832s
time wc ../nj_zip/new/tl_2009_34_zcta5.osm
1,249,669 lines 5,598,205 words 59,986,381 bytes ../nj_zip/new/tl_2009_34_zcta5.osm
real 0m3.638s
user 0m3.536s
sys 0m0.040s
So you can see that it takes a lot of time. Two minutes is a lot, but that is 1.2 million lines of XML processed, roughly 10,000 lines per second of wall-clock time. The resulting file:
http://www.archive.org/details/Tl_2009_34_zcta5.osmSplit
I am uploading to archive.org
wc tl_2009_34_zcta5.osm
3,667,814 lines 13,740,608 words 126,942,198 bytes tl_2009_34_zcta5.osm
The output is even bigger, because there are more ways, plus comments.
The code is checked in here :
Using saved push location: bzr+ssh://bazaar.launchpad.net/~jamesmikedupont/%2Bjunk/EPANatReg/
Pushed up to revision 40.
The dependencies are :
apt-get install libxerces-c-dev
apt-get install gcc
apt-get install g++
I am also using libsdbx from roadnav:
Path: ../libsdbx
URL: https://roadnav.svn.sourceforge.net/svnroot/roadnav/libsdbx/trunk
Repository Root: https://roadnav.svn.sourceforge.net/svnroot/roadnav
To get my code:
bzr branch lp:~jamesmikedupont/+junk/EPANatReg
If you want to update the source:
bzr revert
bzr merge --pull
See also the README and the writeup here:
http://flossk.tuxfamily.org/foswiki/bin/view.pl/Projects/OsmTagger