How do we streamline the import proposal/data quality assessment flow?

I like the general idea of what you are talking about. However, it's a long journey to implement it: sometimes the problem is not merely a lack of tools, but that information in different data silos uses different ontologies to describe its data (making automation not viable at all). On the good side: focusing on one area (such as what can be added to OpenStreetMap) may reduce a lot of the more philosophical problems (like how to categorize abstract concepts such as humans or organizations), but at the same time plotting the data onto a map somewhat "allows seeing" errors that are not obvious in more abstract categorizations.

If you have not yet, please join groups both inside OSM and Wikidata related to terms like ontology, schema, RDF, and "OSM tagging".

  • https://m.wikidata.org/wiki/Wikidata:WikiProject_Ontology (click the invite link "Telegram Group") - this is more focused on the Wikidata ontology, but worth reading. Structuring information is really challenging.
  • https://meta.m.wikimedia.org/wiki/Wikimaps_User_Group (click the link "Wikimaps Telegram Group") - this one has people from both Wikipedia and OpenStreetMap.
  • On OpenStreetMap, I'm unsure if there's any group focused more on ontology/schema, but most people interested in "OSM Data Items", who talk about RDF, or who do data integration with Wikidata are good starting points.

(Also feel free to add me on Telegram; it's the same username I use on OpenStreetMap, just mention that you came from this post.)

On Notes and Local Knowledge

I tend to look for OSM Notes too. After some time, most notes that can be solved without very specific knowledge (not merely living nearby, but knowing the specific place in person) tend to get solved.

Also, some notes could be solved remotely (obviously, being in person would speed things up, but it can be done with open data and news reports), but these are mostly informational notes such as "soon this building will be finished", for a building that is notable enough to be mentioned on some website.

And yes, I do agree (and already do this myself) with your idea of avoiding adding notes that require local knowledge if I do not plan to visit the place myself. The number of notes that could be created (by remote mapping) would be far higher than the capacity to close them.

PS (for the author and other readers of this diary): which tools do you personally use to find OSM Notes? I will start: I use iD with the notes layer opened, but I also use https://ent8r.github.io/NotesReview/ to search notes in a tabular format, in particular to sort by the most recent ones.
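For anyone who prefers scripting over a UI, here is a minimal sketch that pulls open notes for a bounding box straight from the OSM Notes API (the bbox values below are just an example):

```python
import requests

# OSM Notes API: open notes inside a bounding box (left,bottom,right,top).
# The bbox below (roughly central Porto Alegre) is only an example value.
resp = requests.get(
    "https://api.openstreetmap.org/api/0.6/notes.json",
    params={"bbox": "-51.25,-30.05,-51.15,-29.95", "closed": 0, "limit": 100},
    timeout=30,
)
resp.raise_for_status()

# The response is a GeoJSON FeatureCollection; each feature is one note.
for feature in resp.json()["features"]:
    props = feature["properties"]
    first_comment = props["comments"][0]["text"] if props["comments"] else ""
    print(props["id"], props["date_created"], first_comment[:80])
```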

Early feedback welcomed: open source tool for Spatial Data Matching with OpenStreetMap Schema

I attempted to load the entire AllThePlaces dataset into the app, and it worked. (Screenshot: 2024-08-06_alltheplaces-run.png)

Some of the GeoJSONs from the dump were ignored because they could not be parsed (I mention a bit more at https://github.com/fititnt/spatial-data-maching/issues/1#issuecomment-2272324591).

BTW, I see that my comment is the first one - have you submitted this to OSM Weekly/created OSM forums thread/maybe posted on imports mailing list?

Yes, it was mentioned in one of the OSM Weekly issues (likely they read the diaries). But other than that, since I was busy at the time I published it, I didn't mention it anywhere else.

Also, this kind of subject is very, very niche, so I'm not surprised. Note the following as well: while (at least in theory) the interface doesn't require knowledge of a programming language, creating the files to load into it does require some help (for example, having them created on demand and available somewhere, similar to what you already do with the HTML versions).

If we ignore GeoJSON that could be exported from Overpass, almost any dataset that could be compared/conflated into OSM needs preprocessing. The approach I was trying to make more friendly was to implement a CSV importer format; however, even this would require the user to name the columns as closely as possible to the column names of the other files (in the case of comparing against Overpass, the user would need to convert them to the OSM schema).
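To illustrate the kind of preprocessing meant here, a minimal sketch (with hypothetical source column names) that renames CSV columns to the names expected on the OSM side before loading the file:

```python
import csv

# Hypothetical mapping from the source dataset's column names to the column
# names expected on the OSM side (tag keys plus coordinate columns).
COLUMN_MAP = {
    "store_name": "name",
    "phone_number": "phone",
    "website_url": "website",
    "lat": "latitude",
    "lon": "longitude",
}

with open("source.csv", newline="", encoding="utf-8") as src, \
        open("osm_schema.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(COLUMN_MAP.values()))
    writer.writeheader()
    for row in reader:
        # Keep only the mapped columns, renamed to the target names.
        writer.writerow({new: row[old] for old, new in COLUMN_MAP.items()})
```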

In the big picture, I think this SDM tool (or any fork of it, if I become inactive) could be improved over time to allow easy-to-use previews (without, for example, needing to install complex tools). However, it necessarily needs other tools (or publicly generated datasets) as input, and whatever it is programmed to export MUST be usable in editors.

Early feedback welcomed: open source tool for Spatial Data Matching with OpenStreetMap Schema

Oh, sorry for the delay. I missed the email notification. The license was already AGPL (trivia: inspired by Overpass and some repos from Matheuz), and the all-in-one HTML page could simply be downloaded; however, I also added a dedicated repository at https://github.com/fititnt/spatial-data-maching .

You can use either GitHub issues or this diary (in particular, over the next few days I will check for updates here).

By the way, I discovered a bug in the asynchronous loading process (not related to the total number of elements, but to data divided into many files, like the AllThePlaces dump). I will see if I can fix this soon, then reply to your other comments.

Some quick comments upfront

I definitely prefer not to reinvent the wheel, so I went here to review what exists already and hopefully use this tool or contribute to it.

(From https://codeberg.org/matkoniecz/improving_openstreetmap_using_alltheplaces_dataset/issues/8#issuecomment-2121177 ) I suspect that it is not ideal as ATP-OSM matching needs to be more advanced than merely based on location or 1:1 name matching to work acceptably

On a quick look at AllThePlaces, it's less likely to have non-1:1 relationships than the datasets I've been using (I really got stuck on how to design a preview for complex links). I believe in the next few weeks I will do another round of updates.

Considering the title of issue #8 ("review data for cases where Maproulette is potentially viable #8"), I'm openly interested in making it easier to be compatible with other tools (especially those already related to data import/conflation). However (even in the long term), making the same codebase able to edit in some sort of "microtasking mode" (e.g. deciding concepts one by one) would be better done by other tools (even if something specific does not exist yet). The closest thing to this that I think might be interesting to do in the future would be exporting files with documented conventions on how to read/write them (generic GeoJSON alone is insufficient), and, if the apps don't know how to upload the final result, I could write a converter to .osm/.osc files.

The text of the first topic on #8

(…) (say, convenience shops are not plausible to map based on aerial imagery - but maybe there is a way to detect cases where there is recent Bing Street Side or Mapillary imagery for them…)

I thought of something similar (maybe inspired by an issue on the iD repo that mentions this); however, I didn't investigate how to get locations that have street-level imagery. I agree that this is highly reusable. Please ping me if a dataset with this kind of information turns up. I could implement a filter (or better document the existing one) to pre-select values that are near something (just another specific file with this data). So, let's say, the matches for data not yet on OpenStreetMap (but near positions from a dataset that represents the existence of street-level imagery) should be displayed/exportable for a manual human check, to help with focus.
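To illustrate the kind of pre-filter described above, here is a minimal sketch (file names and the distance threshold are hypothetical): keep only candidate points that are within some radius of a position where street-level imagery is known to exist.

```python
import json
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in metres."""
    r = 6371000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

MAX_DISTANCE_M = 100  # hypothetical threshold

# Both file names are hypothetical; they stand in for "matches not yet on OSM"
# and "positions where street-level imagery exists".
with open("candidates.geojson", encoding="utf-8") as f:
    candidates = json.load(f)["features"]
with open("street_imagery_positions.geojson", encoding="utf-8") as f:
    imagery = [feat["geometry"]["coordinates"] for feat in json.load(f)["features"]]

# Keep only candidates with street-level imagery nearby (naive O(n*m) scan;
# a spatial index would be needed for large datasets).
near_imagery = [
    feat for feat in candidates
    if any(
        haversine_m(
            feat["geometry"]["coordinates"][1], feat["geometry"]["coordinates"][0],
            lon_lat[1], lon_lat[0],
        ) <= MAX_DISTANCE_M
        for lon_lat in imagery
    )
]

with open("candidates_near_imagery.geojson", "w", encoding="utf-8") as f:
    json.dump({"type": "FeatureCollection", "features": near_imagery}, f)
```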

Automated suggestions are not always feasible: unless the reason follows some predictable pattern and appears in many repeated cases, the time spent on automating may be higher than fixing things manually. I found myself using previous versions of this visual tool for quick searches several times more often than thinking about how to focus on specific hardcoded strategies where users change the algorithm (also, different data might call for different optimizations).

OpenStreetMap NextGen Benchmark 1 of 4: Static and unauthenticated requests

(which is not the case here, web frameworks are unlikely to be exactly identical and have different features, etc…).

Yep. This part is relevant. Simplistic benchmarks, such as outputting static content, can vary more between frameworks of the same language than between typical frameworks in two different languages (assuming a reasonable effort to optimize how they run in production). Both Django (Python) and rails-api (Ruby), for example, are among the most popular with developers, but in benchmarks they are the slowest.

Typical production web applications always use some kind of web framework (even if minimalistic), especially if the language is interpreted (which is the case for Python and Ruby, but less so for C, Go, Rust, etc.). One reason for this is that the alternative of not using any framework means that not only the developer, but also contributors, must know how to safely implement things such as authentication, session management, tokens, and so on. It also means documenting (and keeping up to date) the code conventions on how to organise the code.

I tried to find comparisons between the two frameworks, and the difference is small (maybe there are too few comparisons, but in the ones I found Ruby on Rails was a bit faster than Django on the overly simplistic tests). So, I think the assumption that a full application written in modern Python is necessarily faster than one in modern Ruby may not hold as more features are added. While the current version of his code is not using Django, even if we take any higher-performance Python framework as the baseline, the more features he adds, the less prominent the difference will be.

That is not true. There can be massive differences in speed between programming languages (…)

Trivia: at least on https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/ruby-python3.html (even without a web framework, database access, etc.), some of the Ruby algorithm implementations (using Ruby 3.2.0 + YJIT) are more efficient than their Python equivalents.

OpenStreetMap NextGen Benchmark 1 of 4: Static and unauthenticated requests

the benchmark is mostly measuring docker overhead, not the ruby code in production.

I cannot confirm this. When running the Rails server in development mode, you will notice a similar increased runtime even without Docker in place. This is expected behavior as mentioned before. Rails developer mode is not suitable for performance testing.

Hmmm. So the benchmark is running both inside Docker and in development mode.

One reason such a difference felt strange to me is that, in general, the same algorithm gives similar performance across similar programming languages (e.g. interpreted vs interpreted), so unless one of the alternatives is doing more work, Ruby vs Python on recent versions would likely show similar results. So I assumed it was Docker.

By the way, this benchmark is doing something too simple (a static page). As soon as it starts to work with real data, a heavy part of the work will come from the database (which I assume will be the same for Ruby and Python), which means any performance difference is likely to be smaller. And, if not smaller, it might be easy to optimize the queries in the rails port.

OpenStreetMap NextGen Benchmark 1 of 4: Static and unauthenticated requests

There’s at least one flaw in the methodology: the benchmark is mostly measuring docker overhead, not the ruby code in production.

In cases such as fast-running operations, this overhead becomes significant.

But I can understand this might be the first time you are doing this type of benchmark, so it's okay to make this mistake.

Help purchase 1:50k Topographic series of Swaziland?

Bump! (There's no like button here, so I will leave a comment.)

Perhaps this type of map is unusual among mappers more accustomed to aerial imagery, but they are useful!!

Here in Brazil I often use the "Cartas Topográficas do Exército Brasileiro". Although they are outdated (some parts as old as 30 years; not easy to keep updated, because it requires field surveys), a lot of what they show does not usually change (although it still needs to be reconfirmed with more sources, such as aerial imagery). Also, sometimes it is not evident from aerial imagery what a feature is, and this kind of map helps a lot.

So, yes, this kind of map is useful!

Numeração 100%

And I noticed that you are even adding operator:wikidata=, for example osm.org/way/1217672075 . How nice!

Look, besides Overpass queries, there are also some Wikidata queries at https://www.wikidata.org/wiki/Wikidata:WikiProject_Civil_Defense. For police in Brazil I am using this one: https://www.wikidata.org/wiki/Wikidata:WikiProject_Civil_Defense/List_of_law_enforcement_agencies/Brazil

In the future I can try to find some way to make it easier to interlink things better. But in general, getting the exact point right is much more manual on OpenStreetMap than it would be on Wikidata. In fact, on OpenStreetMap positioning errors tend to be more obvious than they would be on Wikidata (which, for example, accepts items both without any address/coordinates and with only a text-form address).
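For readers who want to script this kind of cross-check, a rough sketch of querying the Wikidata Query Service for law enforcement agencies in Brazil with coordinates (the class Q-id is an assumption to verify; P31/P279/P17/P625 are instance of, subclass of, country, and coordinate location):

```python
import requests

# Rough sketch: list law enforcement agencies in Brazil with coordinates from
# the Wikidata Query Service. The class Q-id below is an assumption; check the
# WikiProject pages above for the exact classes they track.
SPARQL = """
SELECT ?agency ?agencyLabel ?coord WHERE {
  ?agency wdt:P31/wdt:P279* wd:Q732717 ;  # assumed Q-id for "law enforcement agency"; verify before use
          wdt:P17 wd:Q155 .               # country: Brazil
  OPTIONAL { ?agency wdt:P625 ?coord . }  # coordinate location, if any
  SERVICE wikibase:label { bd:serviceParam wikibase:language "pt,en". }
}
LIMIT 200
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "osm-wikidata-interlink-sketch/0.1"},
    timeout=60,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["agencyLabel"]["value"], row.get("coord", {}).get("value", ""))
```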

Numeração avenida angelica

Look, if you have Telegram, add me there and say you found me through this link (my username there is fititnt). You can also use a direct message here via OSM. I am helping out with these things.

But about your comment, then. There are things that might be helped with tooling, but even if it is just to approve them (a quick look, one by one), the tasks related to adding addresses or references to something important are somewhat manual.

For example, if the address is already almost perfect, it is possible to speed things up with something that an external source says is at that address. But sometimes the address is a building, while what a human would add is a point inside the building (instead of editing more tags on the building itself).

In the USA, where they work a lot with POIs, they prefer to keep the amenity separate from the buildings because it makes things much easier.

Numeração avenida angelica

Fantastic!

Address data is very useful!

Moving Python scripts to OAuth2

By the way, looking at https://github.com/Zverik/cli-oauth2/blob/main/src/oauthcli/providers.py, it is clear there are already some providers.

Do you know if the OpenStreetMap Wiki (MediaWiki) and Wikidata (MediaWiki/Wikibase) have OAuth2? If yes, maybe consider implementing it.
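For what it's worth, MediaWiki wikis with the OAuth extension expose OAuth 2.0 endpoints under rest.php/oauth2/. A rough sketch of the authorization-code flow against Wikidata using requests-oauthlib follows (the client ID/secret and redirect URI are placeholders, and whether a given wiki, such as the OSM Wiki, has the extension enabled still needs checking):

```python
from requests_oauthlib import OAuth2Session

# MediaWiki's OAuth extension exposes OAuth 2.0 endpoints under rest.php/oauth2/;
# whether a specific wiki (e.g. wiki.openstreetmap.org) has it enabled must be checked.
BASE = "https://www.wikidata.org/w/rest.php/oauth2"
CLIENT_ID = "YOUR_CLIENT_ID"          # placeholder: register a consumer on the wiki first
CLIENT_SECRET = "YOUR_CLIENT_SECRET"  # placeholder
REDIRECT_URI = "http://localhost:8000/callback"  # placeholder: must match the registration

session = OAuth2Session(CLIENT_ID, redirect_uri=REDIRECT_URI)
auth_url, _state = session.authorization_url(f"{BASE}/authorize")
print("Open this URL, authorize, then paste the 'code' parameter from the redirect:")
print(auth_url)
code = input("code: ").strip()

session.fetch_token(
    f"{BASE}/access_token",
    client_secret=CLIENT_SECRET,
    code=code,
    include_client_id=True,
)

# The session now sends the bearer token automatically on API calls.
resp = session.get(
    "https://www.wikidata.org/w/api.php",
    params={"action": "query", "meta": "userinfo", "format": "json"},
)
print(resp.json())
```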

Moving Python scripts to OAuth2

Having this kind of thing already done is helpful, especially when it is in the same programming language that a dev like me would likely use for CLI tools.

(All my tools are still read-only and, if anything, they export files to be used with OSM editors.)

Numeração avenida angelica

That's cool!

How did you go about collecting the numbers?

헤어질 결심[Decision To Leave] OSM...

Oh, stay here! There is no problem in resting for a while.

You are one of the most fantastic people I know within OpenStreetMap, and you are super active. We need more people like you.

Generalization of extraction of example codes, tabular data and Infoboxes from MediaWikis such as OSM.wiki

Wikitext is only one of the page content models that MediaWiki supports.

Good to know about other content models! Maybe I will also create some syntactic sugar (e.g. instead of a raw string, return something else).

But for data-like content beyond wikitext (especially the tabular data), Wikibase JSON could be abstracted to return at least the labels, which could later be used, for example, for translations. Note that it is complex to convert an RDF-like dataset from/to other datasets, but the translation part of items might be so common that it could be worth an abstraction.
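As a small illustration of that label abstraction, a sketch using the wbgetentities module of the MediaWiki API (the item ID and languages are only example values):

```python
import requests

def wikibase_labels(item_id, languages=("en", "pt"),
                    api="https://www.wikidata.org/w/api.php"):
    """Return {language: label} for one Wikibase item, ignoring the rest of the JSON."""
    resp = requests.get(
        api,
        params={
            "action": "wbgetentities",
            "ids": item_id,
            "props": "labels",
            "languages": "|".join(languages),
            "format": "json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    labels = resp.json()["entities"][item_id].get("labels", {})
    return {lang: data["value"] for lang, data in labels.items()}

# Example call: Q42 is only an illustrative item ID.
print(wikibase_labels("Q42"))
```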

(…) you can write a wiki page using simple wikitext syntax as long as you avoid breaking several lightly documented tools that place arbitrary constraints on exactly how you write (e.g., whitespace and capitalization) it due to assumptions they make. Writing for the renderer, in other words. (new emphasis mine)

Yes, the challenging part of parsing wikitext is exactly this. This is one of the reasons why (at least when using the tool to extract data) it is more forgiving to whoever writes, and strict about what output it generates.

(…) I also appreciate your emphasis on reusing existing content without creating extra maintenance overhead. However, we should view this kind of tooling as being complementary to structured data, not in competition with it.

I agree about it being complementary. In fact, the implementation cited here is intentionally lower level, without the database part (it does have an SQLite database, but only to cache requests). The fun part is done outside.

On reusing existing content without creating extra maintenance overhead: this is really a focus. While the tool is not self-sufficient as a full solution, making it focused on the parsing (and allowing it to be reusable with other wikis) could encourage some extra conventions where wikitext alone is insufficient.

Generalization of extraction of example codes, tabular data and Infoboxes from MediaWikis such as OSM.wiki

Hmm, interesting that the first comment is about how Wikibase abstracts the data! And yes, I found it relevant to explain this internal part, because it really depends on external storage to add the "true" linked data storage.

I mean, when Wikibase stores item data as JSON on a single page, these pages are kept as one big text blob, so by default the SQL database cannot really understand their internals. And it is not even like MongoDB, where there is native support for JSON fields.

Another bit of trivia: when trying to do data mining through the Wikibase API on top of MediaWiki, it is likely to be item by item (maybe it allows pre-fetching related items, so it is still very useful); however, if you somewhat brute-force the wikitext (which will be JSON) with the vanilla MediaWiki API, then even without a special user account (admin or bot) it is possible to fetch 50 pages at once. I know this may sound a bit low level, but it matters if we're talking about synchronizing content, as long as the content stored on the MediaWiki can be exported without always needing to work with the full wiki dumps available at https://wiki.openstreetmap.org/dump/.
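A sketch of that batched fetch with the vanilla MediaWiki API (the page titles are only example values; the 50-title limit applies to regular accounts, bots get a higher one):

```python
import requests

# Vanilla MediaWiki API: up to 50 titles per request for regular users.
# The titles below are only example values.
TITLES = ["Key:highway", "Key:amenity", "Tag:amenity=school"]

resp = requests.get(
    "https://wiki.openstreetmap.org/w/api.php",
    params={
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "|".join(TITLES),
        "format": "json",
        "formatversion": 2,
    },
    timeout=60,
)
resp.raise_for_status()

for page in resp.json()["query"]["pages"]:
    content = page["revisions"][0]["slots"]["main"]["content"]
    print(page["title"], len(content), "bytes of wikitext")
```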

About your comment comparing it with VisualEditor: I guess the Wikibase interface is more of a form-like interface (it enforces some structure; the integrity checks are not very advanced, but there are some). I have not fully tested the alternatives, but I'm sure there are other MediaWiki extensions that could enforce form-like entry, to restrict what users can do. So the analogy with VisualEditor is not perfect, because based on the link you shared about VisualEditor, it still allows more freedom for the user, with greater parsing challenges (compared to any form-like interface).

Maybe a closer analogy than the MediaWiki VisualEditor: the Wikibase editing page is similar to how the iD editor lets users edit an already well-detailed item (depending on the tag, the field changes appearance, suggests different values, etc.).

(…) It’s JSON, which explains just how disconnected it actually is to the MediaWiki experience. That’s why it feels so foreign and disorienting, and functions like the completely tacked-onto experience it provides.

I think that from the perspective of a "MediaWiki experience", even when trying not to break the mental flow of editing as text (while still being fully machine readable), at least some kinds of trade-offs are necessary. Wikibase (and any other MediaWiki with a form-like UI) explicitly enforces how to add/edit data (sometimes too much, or sometimes without allowing full strict validation; I know, both ideals are contradictory), but even if we parse wikitext directly, the parser could still benefit from hints (such as the suggested filename of a code sample) which might not be worth showing to a user who only cares about the visible text, not the metadata.

This part is briefly commented on in the diary, but both conversion tables (e.g. {{yes}} => true) and explanations of what the parameters of the most important infoboxes mean may not be on the same page (that would also be too redundant), yet they would still live somewhere (preferably in the same wiki). And where these instructions cannot be expressed in any formal syntax, the only option left is natural language.
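A minimal sketch of what such a conversion table could look like in code (only {{yes}} => true comes from the text above; the other entries are purely illustrative):

```python
# Hypothetical conversion table from wiki template calls to machine-readable values.
# Only {{yes}} => True is taken from the text above; the others are illustrative.
TEMPLATE_TO_VALUE = {
    "{{yes}}": True,
    "{{no}}": False,
    "{{partial}}": "partial",
}

def normalize_cell(raw: str):
    """Map a raw infobox/table cell to a machine-readable value, if known."""
    return TEMPLATE_TO_VALUE.get(raw.strip().lower(), raw.strip())

print(normalize_cell("{{Yes}}"))  # -> True
```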

But it isn’t a duck.

Yes, I also liked the analogy! But again, "Wikidata" is the project (formal explanation: https://www.wikidata.org/wiki/Wikidata:Introduction), while "Wikibase" is an extension for MediaWiki (see also: https://wikiba.se/). So Wikidata as a project actually is full linked data. Another interesting fact is that Wikibase (even without a triplestore) is still somewhat linked data, because it exposes persistent URLs and is still fast. So the self-description on wikiba.se, "Wikibase is open-source software for creating collaborative knowledge bases, opening the door to the Linked Open Data web.", is still very true.

The diary could get more complex, but in theory, a future proxy for each page on OSM.wiki could still be somewhat linked data as soon as the person requests a format like RDF/Turtle. The same principle could apply to the main API, which today returns XML, although a pull request started in 2019 by the Overpass main developer added JSON output (link: https://github.com/openstreetmap/openstreetmap-website/pull/2485), so in theory even the main API, the rails-port, could also be explicitly "linked data". I started an early draft for that in 2022 here: https://wiki.openstreetmap.org/wiki/User:EmericusPetro/sandbox/RFC_OpenStreetMap_RDF_2023_conventions_(proof_of_concept) , which does a very rudimentary conversion from the XML to RDF/Turtle as a proxy (if the person requests XML or JSON, it does nothing, just outputs the true output of the de facto API). That was the very easy part; the true challenge (beyond the slow process of agreeing on a good enough schema) would be to start building the endpoints for every tag, so that if a tool tries to fetch PREFIX osmt: <https://wiki.openstreetmap.org/wiki/Key:>, it would work.
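To make that "very rudimentary conversion" concrete, here is a toy sketch (not the actual proxy, and not an agreed schema) that fetches one node from the main API as XML and prints Turtle, reusing the osmt: prefix idea for tag keys:

```python
import requests
import xml.etree.ElementTree as ET

# Toy sketch, not an agreed schema: fetch one node from the main API (XML) and
# emit RDF/Turtle, using the osmt: prefix for tag keys as in the draft above.
NODE_ID = 1  # example only; replace with an existing node ID
resp = requests.get(f"https://api.openstreetmap.org/api/0.6/node/{NODE_ID}", timeout=30)
resp.raise_for_status()
node = ET.fromstring(resp.text).find("node")

lines = [
    "@prefix osmnode: <https://www.openstreetmap.org/node/> .",
    "@prefix osmt: <https://wiki.openstreetmap.org/wiki/Key:> .",
    "@prefix geo: <http://www.opengis.net/ont/geosparql#> .",
    "",
    f"osmnode:{node.get('id')}",
    f'    geo:asWKT "POINT({node.get("lon")} {node.get("lat")})" ;',
]
for tag in node.findall("tag"):
    # Naive output; a real converter would escape values and handle keys
    # containing characters that are invalid in a Turtle local name.
    lines.append(f'    osmt:{tag.get("k")} "{tag.get("v")}" ;')
lines[-1] = lines[-1].rstrip(" ;") + " ."

print("\n".join(lines))
```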

Olha quem voltou!

Hey, even though you didn't stop for that long, welcome back.

I'm from Brazil too. I started less than a year ago.

Notas do OpenStreetMap

Not all heroes wear capes.

Congratulations!!!

Problema em visualizar a camada Maxar. Mais alguém?

Use ESRI. Maxar usually did not show metadata or the date of the photo (Bing and ESRI do), but sometimes it was noticeable that ESRI and Maxar were the same background imagery (though Maxar did not cite the date).

And depending on the state where you live, there may be a state government department that provides online background layers.