How do we streamline the import proposal/data quality assessment flow?

I like the general idea of what you are talking about. However, it’s a long journey to implement it: the obstacle is sometimes not merely a lack of tools, but the fact that different data silos use different ontologies to describe their data (making automation not viable at all). On the good side, focusing on one area (such as what can be added to OpenStreetMap) may reduce a lot of the more philosophical problems (like how to categorize abstract concepts such as humans or organizations); at the same time, plotting the data on a map somewhat “allows seeing” errors that are not obvious in more abstract categorizations.

If you are not in them yet, please join groups, both inside OSM and Wikidata, related to terms like ontology, schema, RDF, and “OSM tagging”:

  • https://m.wikidata.org/wiki/Wikidata:WikiProject_Ontology (click the invite link “Telegram Group”) - this one is more focused on Wikidata ontology, but worth reading; structuring information is really challenging
  • https://meta.m.wikimedia.org/wiki/Wikimaps_User_Group (click the link “Wikimaps Telegram Group”) - this one has people from both Wikipedia and OpenStreetMap
  • On OpenStreetMap, I’m unsure if there’s any group focused more on ontology/schema, but most people interested in “OSM Data Items”, who talk about RDF, or who do data integration with Wikidata are good starting points.

(Also feel free to add me on Telegram; it’s the same username I use on OpenStreetMap, just mention you came from this post.)

On Notes and Local Knowledge

I tend to look for OSM Notes too. After some time, most notes that can be solved without very specific knowledge (not merely living nearby, but knowing the specific place in person) tend to get solved.

Also, some notes could be solved remotely (obviously, being in person would speed things up, but it can be done with open data and news reports); however, these are mostly informational notes such as “soon this building will be finished”, for a building notable enough to be mentioned on some website.

And yes, I do agree (and already do this myself) with your idea of avoiding adding notes that require local knowledge if I do not plan to visit the place myself. The number of notes that could be created (by remote mapping) would be far higher than the capacity to close them.

PS (for the author and other readers of this diary): which tools do you personally use to find OSM Notes? I will start: I use iD with the notes layer enabled; however, I also use https://ent8r.github.io/NotesReview/ to search notes in a tabular format, in particular sorted by the most recent ones.

Early feedback welcomed: open source tool for Spatial Data Matching with OpenStreetMap Schema

I attempted to load the entire AllThePlaces dataset into the app, and it worked. (screenshot: 2024-08-06_alltheplaces-run.png)

Some of the GeoJSONs from the dump were ignored (I mention a bit more at https://github.com/fititnt/spatial-data-maching/issues/1#issuecomment-2272324591) because they could not be parsed.

BTW, I see that my comment is the first one - have you submitted this to OSM Weekly/created OSM forums thread/maybe posted on imports mailing list?

Yes, it was mentioned in one of the OSM Weekly issues (likely they read the diaries). But other than that, since I was busy at the time I published, I didn’t mention it anywhere.

Also, this kind of subject is very, very niche, so I’m not surprised. Note the following: while (at least in theory) the interface doesn’t require knowledge of any programming language, creating the files to load into it does require some help (for example, having them generated on demand and made available somewhere, like you already do with the HTML versions).

If we ignore GeoJSON that can be exported from Overpass, almost any dataset that could be compared/conflated into OSM needs preprocessing. The approach I was trying to make more friendly was to implement a CSV importer format; however, even this would require the user to name the columns as closely as possible to the column names of the other files (in the case of comparing against Overpass, the user would need to convert them to the OSM schema). A rough sketch of that kind of preprocessing is below.
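
To make the column-naming requirement concrete, here is a minimal sketch (not the actual SDM importer) of renaming CSV header columns to OSM-style keys before loading; the COLUMN_MAP entries are hypothetical and depend on the source dataset:

```typescript
// Hypothetical column mapping: external CSV names -> OSM-style keys.
const COLUMN_MAP: Record<string, string> = {
  store_name: "name",
  telephone: "phone",
  site: "website",
};

// Rename only the header row, so the CSV's columns match the names
// used by the other files being compared.
function renameCsvHeader(csvText: string): string {
  const lines = csvText.split("\n");
  // Naive split: assumes the header row contains no quoted commas.
  const header = lines[0]
    .split(",")
    .map((col) => COLUMN_MAP[col.trim()] ?? col.trim())
    .join(",");
  return [header, ...lines.slice(1)].join("\n");
}
```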

In the big picture, I think this SDM tool (or any fork of it, if I become inactive) could be improved over time to allow an easy-to-use preview (without, for example, needing to install complex tools). However, it necessarily needs other tools (or publicly generated datasets) as input, and whatever it is programmed to export MUST be usable in editors.

Early feedback welcomed: open source tool for Spatial Data Matching with OpenStreetMap Schema

Oh, sorry for the delay; I missed the notification by mail. The license was already AGPL (trivia: inspired by Overpass and some repos from Mateusz), and the all-in-one HTML page could simply be downloaded; however, I also added a dedicated repository at https://github.com/fititnt/spatial-data-maching .

You can use either GitHub issues or this diary (in particular, over the next few days I will check here for updates).

By the way, I discovered a bug in the asynchronous loading process (not related to the total number of elements, but to data divided into many files, like the AllThePlaces dump). I will see if I can fix it soon, then reply to your other comments.
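
I have not confirmed the root cause yet, but as a generic illustration of the safe pattern for this kind of multi-file load (the URLs and the result shape below are illustrative only, not the tool's actual code):

```typescript
// Generic sketch of loading a dump that is split across many files.
async function loadAllFiles(urls: string[]) {
  const collections = await Promise.all(
    urls.map(async (url) => {
      const response = await fetch(url);
      if (!response.ok) throw new Error(`Failed to load ${url}`);
      return response.json();
    })
  );
  // Only total things up after *every* file has been parsed; summing
  // inside the individual callbacks is where partial or duplicated
  // counts tend to sneak in.
  const totalFeatures = collections.reduce(
    (sum, fc) => sum + (fc.features?.length ?? 0),
    0
  );
  return { collections, totalFeatures };
}
```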

Some quick comments upfront

I definitely prefer not to reinvent the wheel, so I came here to review what already exists and hopefully use this tool or contribute to it.

(From https://codeberg.org/matkoniecz/improving_openstreetmap_using_alltheplaces_dataset/issues/8#issuecomment-2121177 ) I suspect that it is not ideal as ATP-OSM matching needs to be more advanced than merely based on location or 1:1 name matching to work acceptably

On a quick look at AllThePlaces, it is less likely to have non-1:1 relationships than the datasets I have been using (I really got stuck on how to design a preview of complex links). I believe in the next few weeks I will do another round of updates.

Considering the title of issue #8 (“review data for cases where Maproulette is potentially viable”), I’m openly interested in making it easier to be compatible with other tools (especially those already related to data import/conflation). However (even in the long term), making the same codebase able to edit in some sort of “microtasking mode” (e.g. deciding concepts one by one) would be better done by other tools (even if nothing specific exists yet). The closest to this that might be interesting to do in the future would be to export files with documented conventions on how to read/write them (generic GeoJSON alone is insufficient); and if apps don’t know how to upload the final result, I could write a converter to .osm/.osc files (a rough sketch of such a converter is below).
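
As a rough sketch of what that converter could look like for the simplest case (new point features only; this is illustrative, not the tool’s actual exporter), using the .osm convention that negative ids mark elements to be created:

```typescript
// Illustrative sketch: GeoJSON point features -> .osm-style XML of
// new nodes. Negative ids follow the convention for elements that
// do not exist yet on OpenStreetMap.
type PointFeature = {
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: Record<string, string>;
};

// Escape the characters that are not allowed in XML attributes.
const esc = (s: string) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");

function toOsmXml(features: PointFeature[]): string {
  const nodes = features.map((f, i) => {
    const [lon, lat] = f.geometry.coordinates;
    const tags = Object.entries(f.properties)
      .map(([k, v]) => `    <tag k="${esc(k)}" v="${esc(v)}"/>`)
      .join("\n");
    return `  <node id="${-(i + 1)}" lat="${lat}" lon="${lon}">\n${tags}\n  </node>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<osm version="0.6" generator="sketch">',
    nodes.join("\n"),
    "</osm>",
  ].join("\n");
}
```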

The text of the first topic on #8

(…) (say, convenience shops are not plausible to map based on aerial imagery - but maybe there is way to detect cases where there is recent Bing Street Side or Mapillary imagery for them…)

I thought of something similar (maybe inspired by an issue on the iD repo that mentions this); however, I didn’t investigate how to get the locations that have street-level imagery. I agree that this is highly reusable; please ping me if you get a dataset with this kind of information. I could implement a filter (or better document the existing one) to pre-select values that are near something (just another specific file with this data). So, let’s say, the matches for data not yet on OpenStreetMap (but near positions from a dataset that represents the existence of street-level imagery) would be displayed/exportable for manual human checking, to help with focus (see the sketch below).
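
A minimal sketch of that kind of “near street imagery” pre-filter; the radius, the data shapes, and the imagery-coverage file are all assumptions, and a real implementation would use a spatial index instead of the O(n·m) scan:

```typescript
type LonLat = [number, number];

// Great-circle distance between two [lon, lat] points, in meters.
function haversineMeters([lon1, lat1]: LonLat, [lon2, lat2]: LonLat): number {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Keep only the candidates within radiusM of any street-imagery
// position, so human review can focus on verifiable spots.
function nearImagery(
  candidates: LonLat[],
  imagery: LonLat[],
  radiusM = 100
): LonLat[] {
  return candidates.filter((c) =>
    imagery.some((p) => haversineMeters(c, p) <= radiusM)
  );
}
```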

Automated suggestions are not always feasible: unless the reason follows some predictable pattern and happens in a lot of repeated cases, the time spent on automating may be higher than fixing things manually. I found myself using previous versions of this visual tool for quick searches far more often than designing specific hardcoded strategies where users change the algorithm (also, different data might call for different optimizations).

OpenStreetMap NextGen Benchmark 1 of 4: Static and unauthenticated requests

(which is not the case here, web frameworks are unlikely to be exactly identical and have different features, etc…).

Yep, this part is relevant. Simplistic benchmarks, such as serving static content, can vary more between frameworks of the same language than between a typical framework in each of two languages (with a reasonable effort to optimize how they run in production). Django (Python) and rails-api (Ruby), for example, are both among the most popular with developers, but in benchmarks they are among the slowest.

The typical production use of web applications always involves some kind of web framework (even a minimalistic one), especially if the language is interpreted (which is the case for Python and Ruby, but less so for C, Go, Rust, etc.). One reason for this is that the alternative of not using any framework means that not only the developer but also the contributors must know how to safely implement things such as authentication, session management, tokens, and so on. It also means documenting (and keeping up to date) the code conventions on how to organize the code.

I tried to find comparisons between the two frameworks, and the difference is small (maybe there are too few comparisons; however, in the ones I found, Ruby on Rails was a bit faster than Django in the over-simplistic tests). So I think the assumption that a full application written in modern Python is necessarily faster than one in modern Ruby may not hold as more features are added. While the current version of his code does not use Django, even if we take as a baseline any higher-performance Python framework, the more features he adds, the less prominent the difference will be.

That is not true. There can be massive differences in speed between programming languages (…)

Trivia: at least on https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/ruby-python3.html (even without a web framework, database access, etc.), some of the Ruby algorithm implementations (using Ruby 3.2.0 with YJIT) are more efficient than their Python equivalents.

OpenStreetMap NextGen Benchmark 1 of 4: Static and unauthenticated requests

the benchmark is mostly measuring docker overhead, not the ruby code in production.

I cannot confirm this. When running the Rails server in development mode, you will notice a similar increased runtime even without Docker in place. This is expected behavior as mentioned before. Rails developer mode is not suitable for performance testing.

Hmmm. So the benchmark is running both inside Docker and in development mode.

One reason such a difference felt strange to me is that, in general, the same algorithm gives similar performance across similar programming languages (e.g. interpreted vs interpreted), so unless one alternative is doing more work, Ruby vs Python on recent versions would likely give similar results. That is why I assumed it would be Docker.

By the way, this benchmark is testing something too simple (a static page). As soon as the application starts to work with real data, a heavy part of the work will come from the database (which I assume will be the same for Ruby and Python), which means any performance difference is likely to be smaller. And, if it is not smaller, it might be easy to optimize the queries in the Rails port.

OpenStreetMap NextGen Benchmark 1 of 4: Static and unauthenticated requests

There’s at least one flaw in the methodology: the benchmark is mostly measuring Docker overhead, not the Ruby code in production.

In cases such as fast-running operations, this overhead becomes significant.

But I can understand this might be the first time you are doing this type of benchmark, so it’s okay to make this mistake.

Help purchase 1:50k Topographic series of Swaziland?

Bump! (There’s no like button here, so I will leave a comment.)

Perhaps this type of map is unusual among mappers more accustomed to aerial imagery, but they are useful!!

Here in Brazil I often use the “Cartas Topográficas do Exército Brasileiro”. Although they are outdated (some parts as old as 30 years; they are not easy to keep updated because that requires field surveys), there is a lot that does not usually change (although it still needs to be reconfirmed with more sources, such as aerial imagery). Also, sometimes it is not evident from aerial imagery what a feature is, and this kind of map helps a lot.

So, yes, this kind of map is useful!

Numeração 100%

And I noticed you are even adding operator:wikidata=, for example osm.org/way/1217672075 . How beautiful!

Besides Overpass queries, there are also some Wikidata queries at https://www.wikidata.org/wiki/Wikidata:WikiProject_Civil_Defense. For the police agencies in Brazil I am using this one: https://www.wikidata.org/wiki/Wikidata:WikiProject_Civil_Defense/List_of_law_enforcement_agencies/Brazil

In the future I may try to find some way to make it easier to interlink things better. But in general, it is much more manual work per exact point on OpenStreetMap than it would be on Wikidata. On OpenStreetMap, positioning errors also tend to be more obvious than on Wikidata (which, for example, accepts items with no address/coordinates at all, or with only a plain-text address).

Numeração avenida angelica

If you have Telegram, add me there and say you found me through this link (my username there is fititnt). You can also send me a direct message here on OSM. I have been helping out with these things.

About your comment, then: there are things that could perhaps be helped with tooling, but even when it is just about approving (a quick look, one by one), the tasks related to adding an address or a reference to something important are somewhat manual.

For example, if the address is already almost perfect, it is possible to speed things up using something an external source says is at that address (a minimal sketch of this idea is below). But sometimes the address belongs to a building, while what a human would add is a point inside the building (instead of editing more tags on the building itself).
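
A minimal sketch of that address-based matching, assuming the standard OSM keys addr:street and addr:housenumber; the normalization is illustrative, and a real matcher would also handle abbreviations and typos:

```typescript
type OsmElement = { id: number; tags: Record<string, string> };

// Strip accents, lowercase and trim, so "Avenida Angélica" matches
// "avenida angelica". Illustrative normalization only.
const normalize = (s: string) =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().trim();

// Find OSM elements whose addr:* tags match an external record.
function matchByAddress(
  external: { street: string; housenumber: string },
  elements: OsmElement[]
): OsmElement[] {
  return elements.filter(
    (el) =>
      normalize(el.tags["addr:street"] ?? "") === normalize(external.street) &&
      (el.tags["addr:housenumber"] ?? "") === external.housenumber
  );
}
```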

In the US, where they work a lot with POIs, they prefer to keep the amenity separate from the buildings, since it makes things much easier.