
Very fast osm processing in C++

Posted by h4ck3rm1k3 on 24 December 2009 in English.

Hi there, I have started to process the OSM files in C++.
Here is an example of a very fast processor that looks for duplicate coords and IDs:
http://bazaar.launchpad.net/~jamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/CheckDuplicates.cpp

For the new_jersey_admin.osm extract from CloudMade, it processes the file in about 5 seconds:
time ./a.out ../new_jersey_admin.osm 2> report.txt
real 0m5.127s
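The linked program is not reproduced here, but the core idea of a duplicate checker can be sketched with hash tables. The coordinate quantization below is my assumption for illustration, not necessarily the original's rule:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Track node IDs and coordinates seen so far; report any repeats.
struct DuplicateChecker {
    std::unordered_set<std::int64_t> ids;
    std::unordered_set<std::uint64_t> coords;
    std::vector<std::int64_t> dup_ids;
    std::size_t dup_coord_count = 0;

    // Quantize to ~1e-7 degrees and pack both coordinates into one key,
    // so "duplicate" here means equal after rounding (an assumption).
    static std::uint64_t coord_key(double lat, double lon) {
        std::uint64_t a = static_cast<std::uint32_t>(static_cast<std::int32_t>(lat * 1e7));
        std::uint64_t b = static_cast<std::uint32_t>(static_cast<std::int32_t>(lon * 1e7));
        return (a << 32) | b;
    }

    void add_node(std::int64_t id, double lat, double lon) {
        if (!ids.insert(id).second)
            dup_ids.push_back(id);                       // same ID seen twice
        if (!coords.insert(coord_key(lat, lon)).second)
            ++dup_coord_count;                           // same position seen twice
    }
};
```

One hash-table probe per node keeps the whole pass linear in the file size, which is consistent with the timings above.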

With wc: 2.2 seconds (393773 lines):
time wc ../new_jersey_admin.osm
393773 1974640 30893709 ../new_jersey_admin.osm
real 0m2.196s

With xmllint: 3.2 seconds:
time xmllint ../new_jersey_admin.osm > lint.osm
real 0m3.192s

I have started on the algorithm for processing the ways and creating relations, but will be writing it in C++ instead of Perl.
Here is the start; I have only tested it on two counties so far:

http://bazaar.launchpad.net/~jamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/WayToRelation.pl

It just looks for points in a row that are shared. The first version was brute force, comparing all the points; this one scans ascending in one way and descending in the other. I did not test on many counties because I would like to do this with a large OSM file in C++.
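The script itself is in Perl, but the matching described above, ascending in one way and descending in the other, might look like this in C++ (a sketch of the idea, not a port of the script):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Find a run of consecutive shared node refs between two ways.  Way `a` is
// scanned ascending and way `b` descending, since a boundary shared by two
// adjacent polygons is usually traversed in opposite directions.
// Returns {start_index_in_a, length}; length 0 means no shared run found.
std::pair<std::size_t, std::size_t>
shared_run(const std::vector<std::int64_t>& a,
           const std::vector<std::int64_t>& b) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = b.size(); j-- > 0; ) {
            if (a[i] != b[j]) continue;
            // Extend the match forward in a, backward in b.
            std::size_t len = 1;
            while (i + len < a.size() && j >= len && a[i + len] == b[j - len])
                ++len;
            return {i, len};
        }
    }
    return {0, 0};
}
```

The extension step is what avoids comparing every point against every other once the first shared point is found.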

In Perl, with the SAX parser, it takes a minute to process the file:
time perl testsax.pl ../new_jersey_admin.osm 2> report.txt
real 1m1.225s

http://bazaar.launchpad.net/~jamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/testsax.pl

Merry Christmas,
mike


Discussion

Comment from !i! on 24 December 2009 at 15:37

Wow, great! I always asked myself how much optimization would happen if we used a "real" coding language for OSM :) Very impressive!

Comment from h4ck3rm1k3 on 24 December 2009 at 15:59

Yes, I am rewriting that Perl script in C++ now.
In the end you will be able to define filters on which attributes you want to collect and then get them in a callback.

I don't want to build up any huge memory structure in the parser; the client should be able to do that.
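As a sketch of what such a filter-plus-callback interface might look like (all names here are hypothetical, not from the repository):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical client-facing interface: the client registers the attribute
// names it cares about plus a callback; the parser keeps no large structures
// of its own and just forwards matching key/value pairs as they stream past.
struct OsmFilter {
    std::vector<std::string> wanted;                       // e.g. {"id", "lat", "lon"}
    std::function<void(const std::string&,                 // element name
                       const std::string&,                 // attribute name
                       const std::string&)> on_attribute;  // attribute value

    // Called by the (not shown) SAX handler for every attribute it sees.
    void offer(const std::string& elem,
               const std::string& key, const std::string& value) const {
        for (const auto& w : wanted)
            if (w == key) { on_attribute(elem, key, value); return; }
    }
};
```

The parser stays memory-flat; whatever accumulation happens is the client's decision inside the callback.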

mike

Comment from h4ck3rm1k3 on 24 December 2009 at 17:31

OK, I have a Perl script to generate XML constants here:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/makenames.pl

A schema file here:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/schema.txt

The latest version has a Makefile, and I have also generated a list of fields:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/OSMAttributes.h

This is just a first version; it will need more work to create an optimal recogniser for the schema. It should be possible to generate a lex-like structure to process the rest.

But for now, I am doing switches based on the field names.
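Since C++ cannot switch on strings directly, a common pattern is to map each attribute name to a generated constant first and then switch on that. A minimal sketch with stand-in names (the actual generated constants in OSMAttributes.h are not reproduced here):

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for generated constants like those in OSMAttributes.h.
enum OsmField { FIELD_ID, FIELD_LAT, FIELD_LON, FIELD_UNKNOWN };

// Map the attribute name to an enum once per attribute, then switch on
// the enum everywhere else in the handler.
OsmField classify(const char* name) {
    if (std::strcmp(name, "id")  == 0) return FIELD_ID;
    if (std::strcmp(name, "lat") == 0) return FIELD_LAT;
    if (std::strcmp(name, "lon") == 0) return FIELD_LON;
    return FIELD_UNKNOWN;
}
```

A lex-like recogniser, as suggested above, would replace this chain of strcmp calls with a single character-by-character pass over the name.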

Now, this version looks up each node reference in the id -> coords table and also outputs the entire name database of the nodes, ways and relations.
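A minimal sketch of such an id -> coords table (names are illustrative; a ref that is missing from the extract is reported rather than crashing the pass):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct Coord { double lat, lon; };

// Filled while reading <node> elements, then used to resolve each
// <nd ref="..."> of a way back to a coordinate.
struct NodeTable {
    std::unordered_map<std::int64_t, Coord> coords;

    void add(std::int64_t id, double lat, double lon) {
        coords[id] = {lat, lon};
    }

    // Returns false when the referenced node is not in the file,
    // which happens at the edges of an extract.
    bool resolve(std::int64_t ref, Coord& out) const {
        auto it = coords.find(ref);
        if (it == coords.end()) return false;
        out = it->second;
        return true;
    }
};
```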

It runs in 10 seconds on my computer, on a larger version of the OSM file with some duplicates, where I tried to resolve the missing nodes in the extract file.
real 0m10.667s

For comparison, wc is about five times faster:
time wc lint.osm
393773 1974640 30893704 lint.osm
real 0m1.896s

So it is still fast, even though it is doing much more processing.
I think this is a real winner, folks.

I am going to make some template classes for processing fields and defining structures... here is a start that I have not even compiled:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/Field.h
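As an illustration of the direction such field templates could take (entirely hypothetical; this is not the contents of Field.h):

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical sketch of a typed field: each Field<T> knows its attribute
// name and how to parse its text value, so a struct of fields can describe
// one OSM element declaratively.
template <typename T> struct Field;

template <> struct Field<long long> {
    const char* name;
    long long value = 0;
    void parse(const char* text) { value = std::atoll(text); }
};

template <> struct Field<double> {
    const char* name;
    double value = 0.0;
    void parse(const char* text) { value = std::atof(text); }
};

// Example element description: a node has an id and two coordinates.
struct NodeFields {
    Field<long long> id{"id"};
    Field<double> lat{"lat"};
    Field<double> lon{"lon"};
};
```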

Comment from !i! on 25 December 2009 at 09:35

What compiler do you use? What about multithreading, SIMD, ... and further optimisations?

Comment from h4ck3rm1k3 on 25 December 2009 at 09:54

I have checked in the Makefile, using gcc -O4; I now have the bounding box calculation and have refactored the classes.
Timing make test produces:
real 0m5.927s
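A running bounding box over a streamed file needs no stored node list; a minimal sketch of that calculation (the real class in the repository may differ):

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Running bounding box: extend with every node as it streams past, so
// nothing beyond four doubles has to be kept in memory.
struct BBox {
    double min_lat =  std::numeric_limits<double>::infinity();
    double min_lon =  std::numeric_limits<double>::infinity();
    double max_lat = -std::numeric_limits<double>::infinity();
    double max_lon = -std::numeric_limits<double>::infinity();

    void extend(double lat, double lon) {
        min_lat = std::min(min_lat, lat);
        min_lon = std::min(min_lon, lon);
        max_lat = std::max(max_lat, lat);
        max_lon = std::max(max_lon, lon);
    }
};
```

Starting from infinities means the first extend() call sets all four bounds without any special-case code.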

here are my system details :
Ubuntu 9.10
Architecture: i386
Date: Fri Dec 25 10:52:00 2009
DistroRelease: Ubuntu 9.10
InstallationMedia: Ubuntu-Netbook-Remix 9.10 "Karmic Koala" - Release i386 (20091028.4)
NonfreeKernelModules: nvidia
Package: gcc 4:4.4.1-1ubuntu2
ProcEnviron:
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-17.54-generic
SourcePackage: gcc-defaults
Tags: ubuntu-unr
Uname: Linux 2.6.31-17-generic i686

processor : 3
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 265
stepping : 2
cpu MHz : 1808.403
cache size : 1024 KB
physical id : 1
siblings : 2
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips : 3616.92
clflush size : 64
power management: ts fid vid ttp

Comment from h4ck3rm1k3 on 25 December 2009 at 11:07

Now, to be fair, osm2pgsql does process the OSM files in a very similar way, but it is not intended to be a generic processor.

I am working on processing a much larger file now, the entire state file, and have also started looking into inline bzip2 processing and the SAX reader.

We should be able to fetch just parts of a world.osm.bz2 file and process it while downloading (using the blocks as they complete).
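Whatever the decompressor, the parser then has to accept arbitrary-sized chunks as blocks complete. A sketch of only the buffering side, with a line-oriented stand-in for the real handler (the libbz2/network calls are omitted):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Feed arbitrary-sized chunks (e.g. decompressed bzip2 blocks as they
// arrive) to a line-oriented consumer; the partial line at the end of
// each chunk is carried over to the next call.
struct ChunkFeeder {
    std::string carry;
    std::vector<std::string> lines;  // stand-in for the real per-line handler

    void feed(const std::string& chunk) {
        carry += chunk;
        std::size_t pos = 0, nl;
        while ((nl = carry.find('\n', pos)) != std::string::npos) {
            lines.push_back(carry.substr(pos, nl - pos));
            pos = nl + 1;
        }
        carry.erase(0, pos);  // keep only the unfinished tail
    }

    void finish() {
        if (!carry.empty()) { lines.push_back(carry); carry.clear(); }
    }
};
```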

But that is for the future, for now I will focus on the county processing.

Well, here are the results of wc on the uncompressed NJ file: 18.6 million lines, counted in about 1.5 minutes.

wget http://downloads.cloudmade.com/north_america/united_states/new_jersey/new_jersey.osm.bz2
bunzip2 new_jersey.osm.bz2

time wc new_jersey.osm
18612601 78918048 1249629139 new_jersey.osm
real 1m21.038s

One thing I have observed: the processing takes up an entire processor, but only one of four. That is why we need these splitting routines in general, so that we can process on multiple processors easily. Osmosis is nice, but I don't feel comfortable using it; it is pretty complex.
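For independent per-node work, the splitting can be as simple as handing each thread a contiguous slice; a sketch using std::thread (which post-dates the 2009 toolchain described here, where pthreads would have been the option):

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Split independent per-item work across all cores: each thread takes a
// contiguous slice, so no locking is needed while processing.
void process_parallel(std::vector<long long>& items,
                      unsigned nthreads = std::thread::hardware_concurrency()) {
    if (nthreads == 0) nthreads = 1;
    std::vector<std::thread> pool;
    std::size_t per = (items.size() + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t begin = t * per;
        std::size_t end = std::min(items.size(), begin + per);
        if (begin >= end) break;
        pool.emplace_back([&items, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                items[i] *= 2;  // stand-in for the real per-node work
        });
    }
    for (auto& th : pool) th.join();
}
```

This only works once the file has been split into independent pieces, which is exactly the routine argued for above.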

Now, just running my county extractor on that takes a long time. I need to find out why....

mike

Comment from h4ck3rm1k3 on 25 December 2009 at 13:15

Memory leaks: Xerces' transcode was making copies of all the strings. I have replaced/removed all the unneeded string copies used for comparison and for parsing ints.

I used valgrind to debug the memory leaks; they are gone now. I also turned off the reverse lookup of a Point by its string and removed it from the memory representation. I will look into using an R-tree or a quadtree for that later.
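The usual cure for transcode leaks is an RAII wrapper that releases the buffer when it goes out of scope. The sketch below uses stand-in functions in place of Xerces' XMLString::transcode / XMLString::release so it is self-contained:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Stand-ins for Xerces' transcode/release pair, which hand out heap
// buffers the caller must free -- the source of the leaks described above.
static int live_buffers = 0;

char* fake_transcode(const char* s) {
    ++live_buffers;
    char* p = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    std::strcpy(p, s);
    return p;
}

void fake_release(char** p) {
    --live_buffers;
    std::free(*p);
    *p = nullptr;
}

// RAII wrapper: the transcoded copy is released when the object goes out
// of scope, so no call site can forget the release.
struct Transcoded {
    char* str;
    explicit Transcoded(const char* xml) : str(fake_transcode(xml)) {}
    ~Transcoded() { fake_release(&str); }
    Transcoded(const Transcoded&) = delete;
    Transcoded& operator=(const Transcoded&) = delete;
};
```

For hot comparison paths, avoiding the transcode entirely, by comparing against constants transcoded once at startup, is cheaper still, which matches the replace/remove approach described above.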

mike
