baditaflorin's Diary

PostGIS question - find out the unnecessary points that exist on the map

Posted by baditaflorin on 16 September 2015 in English.

question

My hypothesis is that we could reduce the planet file by 10-100 MB just by removing the unnecessary points that exist on the map.

I am trying to figure out the correct PostGIS query to find exactly these points.

I am trying to compare the points of a linestring: if three adjacent points form an angle of 0 degrees (that is, the line does not turn at the middle point), then the point in the middle can be deleted.

The only checks I see as necessary are that the point in the middle is not connected to another point and that the hstore tags of that node are empty, meaning that the node adds no value (pedestrian crossing, motorway_junction, exit_to, etc.).
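The check described above can be sketched in plain Python before it is translated into a PostGIS query. This is only an illustration under assumptions: the in-memory layout (`ways`, `coords`, `tags`) and the function names are hypothetical, coordinates are treated as planar, and "not connected" is interpreted as "the node is referenced only once across all ways".

```python
import math
from collections import Counter

def turn_angle_deg(a, b, c):
    """Deviation from a straight line at b, in degrees (0 means no turn)."""
    heading_in = math.atan2(b[1] - a[1], b[0] - a[0])
    heading_out = math.atan2(c[1] - b[1], c[0] - b[0])
    diff = abs(math.degrees(heading_out - heading_in)) % 360.0
    return min(diff, 360.0 - diff)

def removable_nodes(ways, coords, tags, tolerance_deg=0.0):
    """Return ids of untagged, unshared interior nodes that lie on a straight line.

    ways:   {way_id: [node_id, ...]}   hypothetical layout, not an OSM schema
    coords: {node_id: (x, y)}          planar coordinates (an assumption)
    tags:   {node_id: {key: value}}    missing or empty means untagged
    """
    # Count how often each node is referenced; more than once means it is shared.
    usage = Counter(node for nodes in ways.values() for node in nodes)
    result = []
    for nodes in ways.values():
        for prev, mid, nxt in zip(nodes, nodes[1:], nodes[2:]):
            if tags.get(mid) or usage[mid] > 1:
                continue  # tagged or connected nodes carry information
            if turn_angle_deg(coords[prev], coords[mid], coords[nxt]) <= tolerance_deg:
                result.append(mid)
    return result
```

With a tolerance of exactly 0, floating-point rounding will hide many nodes that are straight for all practical purposes, so a small non-zero tolerance is probably more useful for counting.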

Discussion

Comment from gileri on 16 September 2015 at 08:53

Good idea, sounds doable!

However, watch out: some people hate automated changes and would revert your edit, even if done perfectly. So, to avoid having your work thrown out by such people, maybe you could make a map that displays such nodes and allows their manual deletion, even if this means slowing the correction process.

Comment from baditaflorin on 16 September 2015 at 09:21

I don't want to do it, especially not in an automated way and without consulting the community. I have not yet found a way to import the whole planet, only part of it.

But I want to have a proof of concept to show.

Example: 53405432 nodes could be removed, saving us XX MB, reducing the import time of the planet file by XXX seconds …

Comment from SK53 on 16 September 2015 at 09:48

I did actually do a calculation on this for part of the USA (TIGER data suffers particularly from over-noding).

To do so you need a snapshot database. Only nodes without tags that appear only once in the way_nodes table need to be checked. I just checked the length of the shortest line from the node to the straight line between its adjacent nodes. I can’t find the code I ran, but there’s certainly great scope for reducing the size of the data in the US.
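Since the original code is lost, here is a minimal sketch of the distance measure described above, not a reconstruction of what was actually run. It assumes planar coordinates; the function name `offset_from_chord` is hypothetical.

```python
import math

def offset_from_chord(a, p, b):
    """Shortest distance from node p to the straight segment between a and b."""
    ax, ay = a
    px, py = p
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Neighbours coincide, so fall back to the plain point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))
```

A node would then be a removal candidate when this offset is below some chosen threshold, which is easier to reason about than an angle because it is expressed in the same units as the coordinates.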

Comment from Sanderd17 on 16 September 2015 at 10:07

This shouldn’t be done inside the OSM database; instead, OSM editors should make sure this doesn’t happen (for example, by showing a warning when there’s such a node).

Also, with regard to your example, it depends on how you measure angles. To take an extreme example, if you have a straight line from London to Beijing, the shortest path runs through Finland, not through central Europe. So if you have a centre point somewhere in Ukraine, it might be a zero-angle point on a Mercator map, but it won’t be a zero-angle point when using exact shortest-distance lines. This effect also shows up on shorter lines, so every point in the database will have a non-zero angle according to some projection at a certain precision.
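A small sketch can make the projection effect concrete. It compares the midpoint of the shortest path on a sphere with the naive average of the two latitudes. The spherical Earth model, the rounded city coordinates, and the function name are assumptions made for illustration only.

```python
import math

def great_circle_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint of the shortest path between two points on a sphere, in degrees.

    Assumes a spherical Earth; the result is undefined for antipodal points.
    """
    def to_xyz(lat, lon):
        lat, lon = math.radians(lat), math.radians(lon)
        return (math.cos(lat) * math.cos(lon),
                math.cos(lat) * math.sin(lon),
                math.sin(lat))

    x1, y1, z1 = to_xyz(lat1, lon1)
    x2, y2, z2 = to_xyz(lat2, lon2)
    # The midpoint lies in the direction of the sum of the two unit vectors.
    x, y, z = x1 + x2, y1 + y2, z1 + z2
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon

# Rounded coordinates, assumed for illustration.
london = (51.5, -0.13)
beijing = (39.9, 116.4)
mid_lat, mid_lon = great_circle_midpoint(*london, *beijing)
naive_lat = (london[0] + beijing[0]) / 2
print(f"great-circle midpoint latitude: {mid_lat:.1f}, naive average: {naive_lat:.1f}")
```

The great-circle midpoint lands well north of the naive average, so a node that looks collinear in one representation is not collinear in the other.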

By the way, many data consumers already use the Douglas-Peucker algorithm to remove points up to a certain precision when processing the data. For example, OsmAnd applies Douglas-Peucker when compiling its obf files in order to shrink them.
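For readers who have not met it, the following is a minimal textbook-style sketch of Douglas-Peucker on planar points. It is not the code OsmAnd or any other consumer uses, and the function names are hypothetical.

```python
import math

def _distance_to_line(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (a[1] - p[1]) - (a[0] - p[0]) * dy) / length

def douglas_peucker(points, epsilon):
    """Return a simplified copy of points, dropping those within epsilon of the line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the first-last chord.
    max_dist, index = -1.0, 0
    for i in range(1, len(points) - 1):
        dist = _distance_to_line(points[i], first, last)
        if dist > max_dist:
            max_dist, index = dist, i
    if max_dist > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = douglas_peucker(points[:index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    return [first, last]
```

With `epsilon` set to 0 only exactly collinear points are dropped, which matches the question in this post. Note that this version knows nothing about tags or shared nodes, and that the recursion could get deep on very long, wiggly ways.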

Comment from baditaflorin on 16 September 2015 at 10:24

Thanks, I did not know about the Douglas-Peucker algorithm. I will check it out and see whether it can be used for this simple task.

I used this in the past, http://pastebin.com/2kH0mAWG, to calculate in QGIS the angles that are more than 50 degrees, when I wanted to create a MapRoulette challenge for Romania and

Now it's kind of the same idea, but with a different threshold.

Romanian discussion about the topic

Comment from Vincent de Phily on 16 September 2015 at 10:37

In case you hadn’t found it already: http://postgis.net/docs/ST_Simplify.html uses Douglas-Peucker, and you should be able to do the conversion using a simple SQL UPDATE. Remember to run VACUUM FULL before and after if you want to know exactly how much space this saves.

Comment from Vincent de Phily on 16 September 2015 at 10:39

Hmm, thinking more about it, ST_Simplify probably doesn’t pay attention to connected and tagged nodes, so it’s not that straightforward to use.

Comment from SimonPoole on 16 September 2015 at 10:40

Besides the fact that reducing the size of the planet file by such a small amount wouldn’t justify anything, removing the nodes creates a new version of the way, adding to general database bloat.

Further, neither the editors nor the API support downloading elements that have no nodes in the requested bounding box. As a result, it makes sense to have nodes that are not too far apart, even on completely straight ways.

It should be noted that we have massively (as in multiple nodes per meter) overnoded ways from imports, which naturally can be simplified when detected.

Comment from baditaflorin on 16 September 2015 at 11:56

@SimonPoole, that is a valid point of view.

Anyhow, I think I should change the initial hypothesis to the first thing that I am interested in, which is to be able to detect and count these nodes and then publish the metrics. Then we will have some numbers.

I am trying to develop different metrics that could help us find errors on the map. One of the aims is to be able to detect the overnoded ways from imports.

repetitious script

Comment from baditaflorin on 16 September 2015 at 13:30

I will leave the code here and try to work on it after I find some PostGIS gurus who are able to help me :P

https://gist.github.com/anonymous/3669b78e0898bde4f638

Comment from butrus_butrus on 16 September 2015 at 18:30

Hi!

I’m not sure this is such a good idea. I sometimes leave such points as preparation for future mapping.

Maybe you can add a condition that the point to be deleted is older than (at least) two weeks?

Comment from baditaflorin on 16 September 2015 at 19:28

Butrus, sure, this can be filtered in the query.

Anyhow, this is a hobby project, and I don't yet have the PostGIS skills to do it, so no worries.

Also, I just want to see the result; I will not act upon it.

When I am able to, I will just highlight the extreme cases, and maybe do a MapRoulette challenge for roads that have more than 10 nodes that can be deleted, so that users can check them.

Comment from pnorman on 16 September 2015 at 19:57

Doing a way simplification with a threshold of 0 is unlikely to yield any practical speed improvements.

It takes about one to two days to import the planet with osm2pgsql, assuming reasonable hardware. The node parsing stage takes about 15 minutes. It’s unlikely to speed up multipolygon-related computations, the slowest part of the import. It won’t speed up clustering or rendering table index creation.

My guess is that there would be no detectable speed increase.

Comment from AndersAndersson on 17 September 2015 at 10:09

I don’t like this idea, I’m afraid.

Sometimes you leave a node for a not yet mapped crossing road.

You also reduce the trustworthiness of the data. On a straight way without nodes, you don’t know what the real road does between the mapped nodes. But if you have a node in the middle, you know that the road is probably straight, and that it is not just a lack of “resolution”.

Data storage and computer speed will increase in parallel with a growing database. So I can’t see the problem.

Comment from karussell on 19 September 2015 at 20:28

For routing we do the same in the GraphHopper import, as we only need the junctions and end points. But I doubt this will reduce much of the data, as ways are not always this straight in the real world. And even if you want to reduce this, which difference via Douglas-Peucker is acceptable: 1m, 0.1m or 0.0000m? Reducing the data when the difference is not 0.00000m will reduce the quality in certain cases.
