
Post Time Comment
A New File Format for OSM Data

Do you know GeoDesk (https://www.geodesk.com)? I think they are trying to achieve something similar. They also have a custom file format that groups OSM data by region for faster access. But their files are actually bigger than the original PBF files, not smaller as yours are.

A New File Format for OSM Data

A very impressive design for what is essentially a query-optimized database for OSM data. Thanks a lot for your efforts and for making it public!

You currently have an option to compress parts of the file (zip_chunks in the source code), which uses Java’s DeflaterOutputStream. Essentially, this is the same deflate compression that gzip uses. While it is simple to use because it ships with Java, deflate is known to be comparatively slow, with low throughput. More modern alternatives are Zstandard (https://facebook.github.io/zstd/) and LZ4 (https://github.com/lz4/lz4-java), which are much faster for both compression and decompression, and Zstandard might even compress better than the default deflate.

The library might lose a bit of its appeal because it would no longer be self-contained and would depend on other code, but I think it might be worth it if it becomes even faster and creates even smaller files.

So, the general feedback from my lines above might be:
- include some additional bits or bytes in the header for future needs, just to future-proof your format
- do not only have 1 bit for compression = true|false, but maybe 3 bits (giving values 0 to 7): 0=uncompressed, 1=deflate, 2=zstandard, 3=lz4, 4-7=future use
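To make the 3-bit idea concrete, here is a minimal sketch of how such a codec field could be packed into a flags byte. Everything in it is an assumption for illustration: the names ChunkCodec and ChunkCompression, the choice of the lowest three bits, and the flags byte itself are hypothetical and not taken from the Oma source. Only deflate is wired up, using classes from java.util.zip; Zstandard and LZ4 would need a third-party library and are left as placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical codec IDs for a 3-bit header field (values 0-7, 4-7 reserved). */
enum ChunkCodec {
    NONE(0), DEFLATE(1), ZSTD(2), LZ4(3);

    /** Assumption: the codec ID lives in the lowest 3 bits of a flags byte. */
    static final int MASK = 0b111;

    final int id;

    ChunkCodec(int id) {
        this.id = id;
    }

    /** Stores this codec's ID in the low 3 bits and leaves all other bits untouched. */
    int writeTo(int flags) {
        return (flags & ~MASK) | id;
    }

    /** Reads the codec from the low 3 bits; reserved IDs (4-7) are rejected. */
    static ChunkCodec readFrom(int flags) {
        int id = flags & MASK;
        for (ChunkCodec codec : values()) {
            if (codec.id == id) {
                return codec;
            }
        }
        throw new IllegalArgumentException("Unsupported codec id: " + id);
    }
}

/** Dispatches on the codec; only NONE and DEFLATE are implemented in this sketch. */
final class ChunkCompression {
    private ChunkCompression() {}

    static byte[] compress(ChunkCodec codec, byte[] raw) {
        switch (codec) {
            case NONE:
                return raw.clone();
            case DEFLATE: {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
                    out.write(raw);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return buffer.toByteArray();
            }
            default:
                // ZSTD and LZ4 need an external library, so they are not wired up here.
                throw new UnsupportedOperationException(codec + " is not wired up in this sketch");
        }
    }

    static byte[] decompress(ChunkCodec codec, byte[] stored) {
        switch (codec) {
            case NONE:
                return stored.clone();
            case DEFLATE:
                try (InflaterInputStream in =
                         new InflaterInputStream(new ByteArrayInputStream(stored))) {
                    return in.readAllBytes();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            default:
                throw new UnsupportedOperationException(codec + " is not wired up in this sketch");
        }
    }
}
```

A reader would call ChunkCodec.readFrom(flags) first and then pass the result to decompress. Rejecting the reserved IDs means an older reader fails loudly on a file written with a newer codec instead of misreading it.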

Using the Oma Library

Very nice, and very impressive from a performance point of view! I think it looks very promising and I love that there is already a Java API/Library for this!

As you requested feedback in your first blog post, here are some of my thoughts as a software developer (based only on reading your posts; I have not yet tested the library):

TypeFilter

r.setFilter(new AndFilter(power_of_town, new TypeFilter("A")));
while (true) { … }

In the TypeFilter, what are the possible parameter values? A (area), W (way), N (node). Are there others? What happens if I pass a, Q, # or ?? Instead of taking a string as the argument, you could probably use an enum with a fixed set of options.
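As a rough sketch of the enum idea: the type below only covers the three codes mentioned above (N, W, A), since I don't know whether the format defines others. The names OsmElementType and EnumTypeFilter are hypothetical and not part of the Oma library.

```java
import java.util.Objects;

/** Hypothetical fixed set of element types; only the three codes named above are included. */
enum OsmElementType {
    NODE('N'), WAY('W'), AREA('A');

    final char code;

    OsmElementType(char code) {
        this.code = code;
    }

    /** Converts a one-letter code to a type and rejects anything unknown, such as 'a', 'Q' or '#'. */
    static OsmElementType fromCode(char code) {
        for (OsmElementType type : values()) {
            if (type.code == code) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown element type code: '" + code + "'");
    }
}

/** Hypothetical filter that takes the enum, so invalid values cannot be passed at all. */
final class EnumTypeFilter {
    private final OsmElementType type;

    EnumTypeFilter(OsmElementType type) {
        this.type = Objects.requireNonNull(type, "type");
    }

    boolean accepts(OsmElementType candidate) {
        return candidate == type;
    }
}
```

With this shape, new EnumTypeFilter(OsmElementType.AREA) is checked by the compiler, and the question of what Q or # means only arises at the single place where codes are parsed.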

Multiple Queries with same reader

In your example, you seem to re-use your reader r three times, each time setting a filter and then just calling r.next(). For me, this is a bit confusing. Traditionally in Java, I have a query method or a method returning an iterator, and then I can iterate over the found values (e.g. with iterator.hasNext() and iterator.next()).

In your code, the reader has a next() method that seems to automatically reset when you set a filter. But this “reset” is not really visible in the code.

Maybe something like the following would be a nicer API design?

OmaIterator iter = r.query(new AndFilter(power_of_town, new TypeFilter("A")));
while (true) {
  Object o = iter.next();
  if (o == null) break;
  // ...
}

Although then people might try to run two queries in parallel on the same reader, and I don’t know if that is supported or not.
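To illustrate what I mean by the traditional Java style, here is a small sketch in which query() hands back a standard java.util.Iterator with its own cursor. SketchReader is a hypothetical in-memory stand-in and not the Oma reader: it keeps elements in a list and uses a plain Predicate as the filter, so it says nothing about whether the real file-backed reader could support two open queries at once.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical stand-in for a reader: holds elements in memory instead of reading a file. */
final class SketchReader<T> {
    private final List<T> elements;

    SketchReader(List<T> elements) {
        // List.copyOf rejects null elements, so null can serve as "nothing pending" below.
        this.elements = List.copyOf(elements);
    }

    /** Each call returns a fresh iterator with its own cursor, so queries share no state. */
    Iterator<T> query(Predicate<? super T> filter) {
        return new Iterator<T>() {
            private int cursor = 0;
            private T pending = null;

            @Override
            public boolean hasNext() {
                // Advance until a matching element is found or the list is exhausted.
                while (pending == null && cursor < elements.size()) {
                    T candidate = elements.get(cursor++);
                    if (filter.test(candidate)) {
                        pending = candidate;
                    }
                }
                return pending != null;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T result = pending;
                pending = null;
                return result;
            }
        };
    }
}
```

Here the point where the query starts is visible in the code (the call to query()), and two iterators obtained from the same reader can be advanced independently. A file-backed implementation would need a separate read position per iterator to offer the same guarantee.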

I think what confuses me is that I don’t see where or when the query happens: I set a filter, and suddenly I can call next() to get results. This does not seem intuitive to me (but others might have a different opinion on that).