Very fast osm processing in C++

OK, I have a Perl script to generate XML constants here:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/makenames.pl

A schema file here:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/schema.txt

The latest version has a makefile, and I have also generated a list of fields:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/OSMAttributes.h

This is just a first version; I will need to put more work into building an optimal recognizer for the schema. It should be possible to generate a lex-like structure to process the rest.

But for now, I am doing switches based on the field names.
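Roughly like this (the enum values and names here are stand-ins; the real constants come out of the generated OSMAttributes.h):

#include <cstdlib>
#include <cstring>

// Stand-in constants; the generated OSMAttributes.h defines the real ones.
enum FieldId { FIELD_ID, FIELD_LAT, FIELD_LON, FIELD_UNKNOWN };

static FieldId lookupField(const char* name) {
    if (!std::strcmp(name, "id"))  return FIELD_ID;
    if (!std::strcmp(name, "lat")) return FIELD_LAT;
    if (!std::strcmp(name, "lon")) return FIELD_LON;
    return FIELD_UNKNOWN;
}

struct NodeData { long long id; double lat, lon; };

// Dispatch one XML attribute into the node record being built.
static void handleAttribute(NodeData& node, const char* name, const char* value) {
    switch (lookupField(name)) {
        case FIELD_ID:  node.id  = std::atoll(value); break;
        case FIELD_LAT: node.lat = std::atof(value);  break;
        case FIELD_LON: node.lon = std::atof(value);  break;
        default: break; // unrecognized attribute, skip it
    }
}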

Now, this version looks up each node reference in the id -> coords table and also outputs the entire names database of the nodes, ways, and relations.
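The id -> coords table is the simple part; a minimal sketch of the idea (the names are mine, not the actual code):

#include <cstdio>
#include <unordered_map>

struct Coord { double lat, lon; };

// One flat hash table from node id to coordinates; way processing
// resolves each <nd ref="..."> against it.
static std::unordered_map<long long, Coord> nodeCoords;

static void onNode(long long id, double lat, double lon) {
    nodeCoords[id] = Coord{lat, lon};
}

static void onWayNodeRef(long long ref) {
    auto it = nodeCoords.find(ref);
    if (it == nodeCoords.end()) {
        std::fprintf(stderr, "missing node %lld\n", ref); // dangling ref in the extract
        return;
    }
    std::printf("%lld %f %f\n", ref, it->second.lat, it->second.lon);
}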

It runs in 10 seconds on my computer on a larger version of the OSM file (with some duplicates, from where I tried to resolve the missing nodes in the extract file):
real 0m10.667s

For comparison, wc needs about 5x less time:
time wc lint.osm
393773 1974640 30893704 lint.osm
real 0m1.896s

So it is still fast, even though it is doing much more processing.
I think this is a real winner, folks.

I am going to make some template classes for processing fields and defining structures... here is a start that I have not even compiled:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/EPANatReg/annotate/head%3A/Field.h
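Since that file is only linked, here is just a guess at the shape such field templates might take (nothing here is the actual Field.h):

#include <cstdlib>
#include <string>

// One template per field type; a record is then composed of typed fields.
template <typename T>
struct Field {
    std::string name;
    T value{};
    void parse(const char* raw); // specialized per type below
};

template <> inline void Field<long long>::parse(const char* raw)   { value = std::atoll(raw); }
template <> inline void Field<double>::parse(const char* raw)      { value = std::atof(raw); }
template <> inline void Field<std::string>::parse(const char* raw) { value = raw; }

// Example record built out of typed fields.
struct NodeRecord {
    Field<long long>   id{"id"};
    Field<double>      lat{"lat"};
    Field<double>      lon{"lon"};
    Field<std::string> user{"user"};
};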

Very fast osm processing in C++

Yes, I am rewriting that Perl script in C++ now.
In the end you will be able to define filters on which attributes you want to collect, and then get them in a callback.

I don't want to build any huge in-memory structure in the parser; the client should be able to do that.
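To give an idea of the interface I have in mind (a hypothetical shape, not the actual code): you register the attribute names you care about plus a callback, and the parser hands over only the matching tags.

#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Tags = std::map<std::string, std::string>;
using Callback = std::function<void(long long id, const Tags&)>;

class OsmParser {
public:
    // Only attributes named in `wanted` are passed on; the parser
    // itself keeps no big structures around.
    void addFilter(std::set<std::string> wanted, Callback cb) {
        filters.push_back({std::move(wanted), std::move(cb)});
    }

    // Called internally once a <node>/<way>/<relation> element is complete.
    void emit(long long id, const Tags& tags) {
        for (auto& f : filters) {
            Tags matched;
            for (auto& kv : tags)
                if (f.wanted.count(kv.first)) matched.insert(kv);
            if (!matched.empty()) f.cb(id, matched);
        }
    }

private:
    struct Filter { std::set<std::string> wanted; Callback cb; };
    std::vector<Filter> filters;
};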

mike

New version of osm2poly.pl to extract from the CloudMade admin borders

The upload is finished:
http://www.archive.org/details/NJ_Counties

Polygon files for NJ ZCTA on the way

Here is the second part!
http://www.archive.org/details/ZCTA_NJ2

Polygon files for NJ ZCTA on the way

You can see the difference between the ZCTAs and the "ZIP codes".

Here are the ZCTAs:
http://maps.huge.info/zcta.htm

You can see differences between the two versions.
I understand the panic better now.

But we have to start somewhere!

Polygon files for NJ ZCTA on the way

The first part has finished uploading:
http://ia341335.us.archive.org/2/items/ZCTA_NJ/
07001-07878

Polygon files for NJ ZCTA on the way

I found a mashup that shows just what I am planning on doing:
http://maps.huge.info/zip.htm

Here is more info on ZIP codes:
http://en.wikipedia.org/wiki/ZIP_Code_Tabulation_Area
http://en.wikipedia.org/wiki/ZIP_code

So if anyone wants to add any information about them, do it there.

mike

Hacking the OSM tools today: Osm2PgSql and Osm2Poly

Here is a nice tool to double-check a ZIP code if there are any questions: http://zip4.usps.com/zip4/welcome.jsp

New Host for OSM data, archive.org

I have been playing with QGIS, and it looks like there is a feature to create a convex hull based on an attribute value.

So you could take these attribute values (post codes), create a convex hull for each, and then compare it to the ZCTA. That would give you a good start, because you could compare the areas with the biggest differences first.
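QGIS does this through its GUI, but to sketch the comparison in code (I am using Andrew's monotone chain here as an assumption; QGIS may compute hulls differently):

#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct Pt { double x, y; };

static double cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain: returns hull vertices in counter-clockwise order.
static std::vector<Pt> convexHull(std::vector<Pt> pts) {
    std::sort(pts.begin(), pts.end(), [](const Pt& a, const Pt& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    if (pts.size() < 3) return pts;
    std::vector<Pt> hull(2 * pts.size());
    size_t k = 0;
    for (size_t i = 0; i < pts.size(); ++i) {                // lower hull
        while (k >= 2 && cross(hull[k-2], hull[k-1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    for (size_t i = pts.size() - 1, t = k + 1; i-- > 0; ) {  // upper hull
        while (k >= t && cross(hull[k-2], hull[k-1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    hull.resize(k - 1);
    return hull;
}

// Group points by their post code tag, then hull each group for
// comparison against the matching ZCTA polygon.
static std::map<std::string, std::vector<Pt>> hullsByPostcode(
        const std::map<std::string, std::vector<Pt>>& pointsByPostcode) {
    std::map<std::string, std::vector<Pt>> out;
    for (const auto& kv : pointsByPostcode) out[kv.first] = convexHull(kv.second);
    return out;
}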

The other thing is that you can flag the nodes and ways that fall outside the ZCTA; that is what I was doing to check them. Maybe other states have more problems with their ZIP codes, but NJ looks very stable.
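The flagging itself can be a plain even-odd point-in-polygon test; a sketch, assuming the ZCTA polygon is already loaded as a coordinate list:

#include <vector>

struct Pt2 { double x, y; };

// Classic even-odd ray casting: true if p lies inside poly.
static bool insidePolygon(const Pt2& p, const std::vector<Pt2>& poly) {
    bool in = false;
    for (size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        bool crosses = (poly[i].y > p.y) != (poly[j].y > p.y);
        if (crosses &&
            p.x < (poly[j].x - poly[i].x) * (p.y - poly[i].y) /
                  (poly[j].y - poly[i].y) + poly[i].x)
            in = !in;
    }
    return in;
}

// Collect the indices of all nodes falling outside their supposed ZCTA.
static std::vector<size_t> flagOutside(const std::vector<Pt2>& nodes,
                                       const std::vector<Pt2>& zcta) {
    std::vector<size_t> flagged;
    for (size_t i = 0; i < nodes.size(); ++i)
        if (!insidePolygon(nodes[i], zcta)) flagged.push_back(i);
    return flagged;
}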

mike

New Host for OSM data, archive.org

I was just following the wiki. Personally I would like zipcode, but see here:
osm.wiki/Key:postal_code

New Host for OSM data, archive.org

Yes, of course. In Germany I found power lines, security cameras and trees.
But we still need a better staging system. Why should we throw it all into a single database? We could have many databases for various layers. This is a design issue. In fact, why do we need a monster database at all? Can't we deal with lots of small files and a smart editor that commits them the right way, so that we don't need anything more than a smart distributed version control system?

New Host for OSM data, archive.org

I have hacked osm2pgsql so that it imports the data from my feeds:
http://fmtyewtk.blogspot.com/2009/12/osm2pgsql-hack-for-importing-id-ways.html

The data is loaded in qgis.

I will be creating some Postgres queries to split up the data and process it. That, at least, is my plan.

I don't care whether the monolithic OSM database stores this data or not. In fact, I think it would be better to keep it separate until we find a better way to add in layers.

Ideally the chunks of data will be usable directly from some Git repository, and we will split them into very small but useful pieces.

mike

New Host for OSM data, archive.org

Yes, well, we will be able to check them all out.
My plan is to create a hierarchy of data, where each region (state) contains another region (county), and so forth (relations and ways that contain each other).

If we find data that does not match, or that crosses a border, it can be split up or marked to be fixed manually.

Given a hierarchy of data, we would then match it against the attributes of the EPA datapoints. Does the county match the county from TIGER? Does the zipcode match the zipcode from the census?
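As a sketch of that cross-check (the record shapes here are hypothetical):

#include <cstdio>
#include <map>
#include <string>

struct EpaPoint { std::string county, zip; };  // tags carried by the datapoint
struct Resolved { std::string county, zip; };  // what the containment hierarchy says

// Compare each point's own tags against the regions it actually
// falls in, and report every disagreement for review.
static void crossCheck(const std::map<long long, EpaPoint>& points,
                       const std::map<long long, Resolved>& resolved) {
    for (const auto& kv : points) {
        auto it = resolved.find(kv.first);
        if (it == resolved.end()) continue;
        if (kv.second.county != it->second.county)
            std::printf("%lld: county mismatch (%s vs %s)\n", kv.first,
                        kv.second.county.c_str(), it->second.county.c_str());
        if (kv.second.zip != it->second.zip)
            std::printf("%lld: zip mismatch (%s vs %s)\n", kv.first,
                        kv.second.zip.c_str(), it->second.zip.c_str());
    }
}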

The census said that they will not update this data, but we can. Given enough test data (ZIP code attributes) we can find all the points that break the model and fix them.

Anyway, there is a huge market for this type of processing and I think that OSM or something like it is the right way to go.

I will not commit this data to OSM, but will keep the OSM files on archive.org.

If we get enough updates, we can put them into a Git repository...

I am starting to think that the monster database idea is not a very good one anyway...

mike

If it turns out that the zip code from the zcta produces bad data,

New Host for OSM data, archive.org

Yes, I have been looking at the data. There are cases where the boundaries do not match exactly; this will all have to be reviewed.

My idea is to make a program that looks for containment hierarchies in the data (this region contains that one) and flags errors...
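As a first pass the check could be as crude as bounding boxes; a sketch (a real border check would need full polygon tests):

#include <cstdio>
#include <string>
#include <vector>

struct Region {
    std::string name;
    double minLat, minLon, maxLat, maxLon;
};

// Cheap first filter: anything failing even the bounding-box test
// definitely needs a manual look.
static bool contains(const Region& outer, const Region& inner) {
    return inner.minLat >= outer.minLat && inner.maxLat <= outer.maxLat &&
           inner.minLon >= outer.minLon && inner.maxLon <= outer.maxLon;
}

static void checkHierarchy(const Region& state, const std::vector<Region>& counties) {
    for (const auto& c : counties)
        if (!contains(state, c))
            std::printf("FLAG: %s crosses the border of %s\n",
                        c.name.c_str(), state.name.c_str());
}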

mike

New Host for OSM data, archive.org

Yes, we have two levels: 3-digit ones and 5-digit ones.
The 3-digit ones contain the 5-digit ones.
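Since the containment is just a prefix relationship, grouping them is trivial; for example:

#include <map>
#include <string>
#include <vector>

// The 5-digit ZCTAs nest under their 3-digit parent by prefix.
static std::map<std::string, std::vector<std::string>>
groupByPrefix(const std::vector<std::string>& zcta5) {
    std::map<std::string, std::vector<std::string>> parents;
    for (const auto& z : zcta5)
        parents[z.substr(0, 3)].push_back(z);  // "07001" -> "070"
    return parents;
}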

Of course I can import them... But I will first send a mail to the list.
mike

EPA Bulk Import

I am going to remove all the data that has not been manually updated by someone.
I am currently working on downloading and processing all the points, and will set up separate hosting for the data files.

mike

open letter to the EPA

It is not junk. If you think it is junk, then revert the changesets and we don't need to talk about it anymore.

With a modify command, I will modify and adjust the data. But I think the data is still usable as it is: not perfect, but a good start.

mike

open letter to the EPA

I have started a wiki page: osm.wiki/EPAGeospatial. Please add your comments there.

EPA Bulk Import

Hi tomh,

I understand your concerns. We will see how the community reacts. I have gotten mixed messages.

mike

Next Project for the EPA and Mine data

Well, of course. I am thinking about just using a standard module.

There are other things to do with these nodes:

1. Looking for duplicates (pre-existing); a rough sketch of this follows below.
2. Looking for out-of-date information.
3. Looking for better ways to render them.
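For the duplicate search in item 1, one approach is to bucket the nodes on a coarse coordinate grid (1e-5 degrees, about a meter) and inspect any bucket holding more than one node. Pairs straddling a bucket edge are missed here, but it is a start:

#include <cmath>
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Node { long long id; double lat, lon; };

static void findDuplicateCandidates(const std::vector<Node>& nodes) {
    std::unordered_map<long long, std::vector<const Node*>> grid;
    for (const auto& n : nodes) {
        // Combine the two grid indices into one hashable key.
        long long key = std::llround(n.lat * 1e5) * 40000000LL
                      + std::llround(n.lon * 1e5);
        grid[key].push_back(&n);
    }
    for (const auto& kv : grid)
        if (kv.second.size() > 1)
            for (const Node* n : kv.second)
                std::printf("candidate duplicate: node %lld\n", n->id);
}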