
OpenStreetMap Isn't Unicode

I tried the previous example on Python 3.5.3 on Win10, and I’m unable to reproduce your error message. It works just fine here, even without “backslashreplace”. Ubuntu 20.04 behaves pretty much the same.

>>> "𫙮魚坑溪".encode(encoding="utf-8").decode(encoding="UTF-8")
'𫙮魚坑溪'
>>> "𫙮魚坑溪".encode(encoding="utf-8", errors="backslashreplace").decode(encoding="UTF-8")
'𫙮魚坑溪'

TBH, I’m running a bit out of ideas here.

OpenStreetMap Isn't Unicode

I also tried this Overpass query on my patched instance with ICU Regexp support, looking for any signs of surrogate characters: https://overpass-turbo.eu/s/1eNy (this won’t run on any other public instance)

I couldn’t find anything, though.

OpenStreetMap Isn't Unicode

By the way, U+D800 and U+DFFF would be rejected as invalid already: https://coliru.stacked-crooked.com/a/1cf38225f38a1acf
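
For comparison, Python's strict UTF-8 codec rejects the same byte sequences. This is a minimal sketch, not the code behind the link above; the helper name `is_valid_utf8` is made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ED A0 80 and ED BF BF are what U+D800 and U+DFFF would look like if
# they were pushed through the generic 3-byte UTF-8 bit pattern.
surrogate_low_end = b"\xed\xa0\x80"   # would-be U+D800
surrogate_high_end = b"\xed\xbf\xbf"  # would-be U+DFFF

print(is_valid_utf8(surrogate_low_end))
print(is_valid_utf8(surrogate_high_end))
print(is_valid_utf8("魚坑溪".encode("utf-8")))
```

Both surrogate byte sequences are refused, while the ordinary CJK string passes.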

OpenStreetMap Isn't Unicode

So, the fact that OSM contains these surrogates, is at least discouraged from the point of view of UTF-8 conformance (“…have to be rejected.”).

Do you have an example for this? Our previous example https://decodeunicode.org/en/u+2B66E is not part of the U+D800—U+DFFF range.
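
To make the distinction concrete, here is a small Python sketch (helper names are illustrative only). U+2B66E is a regular code point outside the Basic Multilingual Plane: UTF-8 stores it as four bytes, and surrogates only show up when the same character is written as UTF-16 code units.

```python
SURROGATE_RANGE = range(0xD800, 0xE000)  # U+D800 .. U+DFFF inclusive


def is_surrogate(ch: str) -> bool:
    """True if the single code point in ch lies in the surrogate range."""
    return ord(ch) in SURROGATE_RANGE


ch = "\U0002B66E"  # the first character of the example name

print(hex(ord(ch)), is_surrogate(ch))

# UTF-8: one four-byte sequence, no surrogates involved.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.hex())

# UTF-16: the same character becomes a pair of 16-bit code units,
# and those two units are the surrogates.
utf16 = ch.encode("utf-16-be")
units = [utf16[i:i + 2].hex() for i in range(0, len(utf16), 2)]
print(units)
```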

OpenStreetMap Isn't Unicode

Thanks. I guess this must be the one with two question marks in front (??魚坑溪), which is probably 𫙮魚坑溪 in osm.org/api/0.6/way/196994995 (version 11)

For the avoidance of doubt, everything in OSM is UTF-8 encoded, even this example. I don’t know exactly what your Python code was doing before. Without actual code we’ll never find out.

OpenStreetMap Isn't Unicode

@mboeringa: apologies, we can’t really help you with your custom Python code. In case you want others to take a look at this, please include the exact OSM object id and version number, and ideally also a minimal code snippet to reproduce your issue.

Let's map some road names in Klang Valley, Malaysia!

@lyx: maybe you should take a look at the maproulette instructions:

Add road names in Klang Valley, from the locations provided. Each location contains a KartaView link, in one of the tags, that could represent a road name. Please verify the image, as well as other images in the location’s area and provide one of the tags: name, alt_name or other tags mentioned in the wiki page…

(emphasis mine)

Example KartaView link would be: https://kartaview.org/details/3859749/611/track-info

OpenStreetMap Isn't Unicode

Right, the API would only validate proper UTF-8 character encoding, but can’t validate if the data uses valid Unicode character codes. The issue I see is that the Unicode consortium keeps adding more and more characters over time (mostly Emojis), which is a massive pain for an API to keep up to date with.

However, I still think we’re not giving any guarantees beyond UTF-8 character encoding, for example that input data adheres to the Unicode 14.0 character codes (or whatever happens to be the current version).
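
The difference between the two levels can be sketched in Python. This is only an illustration: `is_assigned` is a hypothetical helper, and U+0378 is assumed to be unassigned in the Unicode version bundled with your Python.

```python
import unicodedata


def is_assigned(ch: str) -> bool:
    """True if the local Unicode database knows this code point.

    Category 'Cn' means unassigned (or noncharacter) in the Unicode
    version this Python build ships with.
    """
    return unicodedata.category(ch) != "Cn"


candidate = "\u0378"  # assumption: unassigned in current Unicode versions

# Encoding level: the code point round-trips through UTF-8 without complaint.
roundtrip = candidate.encode("utf-8").decode("utf-8")
print(roundtrip == candidate)

# Character level: the answer depends on the Unicode version in use.
print(unicodedata.unidata_version, is_assigned(candidate))
```

The second check is the moving target: its result changes whenever the Unicode database is updated, while the first one never does.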

OpenStreetMap Isn't Unicode

I pasted the code here so you can give it a try for yourself, and check out different byte strings: https://coliru.stacked-crooked.com/a/9c90312fc8222ea8

In a quick test, it seems to adhere to the table shown on https://lemire.me/blog/2018/05/09/how-quickly-can-you-check-that-a-string-is-valid-unicode-utf-8/

In case you think we need something else, please provide concrete examples, pointers to relevant documentation, ideally even reference implementations.
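
For readers who prefer Python, the byte ranges of well-formed UTF-8 (as in the table in the linked article) can be checked roughly as follows. This is an independent sketch, not the code from the link above, and `is_well_formed_utf8` is a made-up name.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Check data against the well-formed UTF-8 byte sequence ranges."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:  # ASCII, single byte
            i += 1
            continue
        # need = number of continuation bytes,
        # lo..hi = allowed range for the first continuation byte.
        if 0xC2 <= b0 <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF  # excludes overlong forms
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xED:
            need, lo, hi = 2, 0x80, 0x9F  # excludes U+D800..U+DFFF
        elif b0 == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF  # excludes overlong forms
        elif 0xF1 <= b0 <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F  # excludes values above U+10FFFF
        else:
            return False  # 0x80..0xC1 and 0xF5..0xFF never start a sequence
        if i + need >= n:
            return False  # sequence is truncated
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True


print(is_well_formed_utf8("𫙮魚坑溪".encode("utf-8")))
print(is_well_formed_utf8(b"\xc0\x80"))      # overlong encoding of NUL
print(is_well_formed_utf8(b"\xed\xa0\x80"))  # surrogate U+D800
```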

OpenStreetMap Isn't Unicode

I believe we have similar issues in other languages, like this one here osm.org/node/880591396 highlighting the incorrect é instead of é in the name tag.

It could be a result of copying data from Windows-1252 to UTF-8, as described in https://stackoverflow.com/questions/2014069/windows-1252-to-utf-8-encoding

That’s still all perfectly valid UTF-8 on the technical level, but the information itself is crap.
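
A short Python sketch of how this kind of damage typically arises, assuming the Windows-1252 explanation applies (I have not verified how this particular node ended up that way):

```python
original = "\u00e9"  # é, precomposed

# Step 1: the correct UTF-8 bytes (C3 A9) get misread as Windows-1252.
mojibake = original.encode("utf-8").decode("cp1252")
print(mojibake)

# Step 2: the misread text is saved as UTF-8 again. The result is valid
# UTF-8, so an encoding-level check has nothing to object to.
stored = mojibake.encode("utf-8")
print(stored.decode("utf-8") == mojibake)

# Reversing the two steps recovers the intended character.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The reverse step only works when the text really went through this exact mix-up, so it is not something to apply blindly to existing data.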

Query: https://overpass-turbo.eu/s/1cqm

OpenStreetMap Isn't Unicode

This one may be useful to start out with a small list of candidates, rather than processing an extract. Use “Export -> raw data directly from Overpass API”, and rename interpreter to data.osm.pbf (or similar): https://overpass-turbo.eu/s/1cq1

OpenStreetMap Isn't Unicode

Your examples include nodes which have been edited by both Rails port and CGImap ( osm.org/node/3890664806/history ). If those strings fulfil the technical requirement of being valid UTF-8 sequences, then that’s probably all the API guarantees at this time. I don’t think any syntax or semantic level validation checks for any language have ever been in scope for the API, hence your issue would be out of scope for the API.

If you’d like to discuss this topic further, my recommendation would be to start an issue on the Rails port and see how it goes. CGImap can be adjusted only after the Rails port has adopted the respective changes first (that’s a general rule, and applies irrespective of the topic at hand).

OpenStreetMap Isn't Unicode

CGImap does in fact validate whether the string is UTF-8: https://github.com/zerebubuth/openstreetmap-cgimap/blob/master/include/cgimap/util.hpp#L20-L34 and will refuse non-UTF-8 strings with an HTTP 400 error.

If that check isn’t working for some reason, please create an issue on https://github.com/zerebubuth/openstreetmap-cgimap/issues instead.

Please note that CGImap started to process changeset uploads in Jun 2019. Anything that has been last changed before that date is an issue in the Rails port (that may since have been fixed(!))

Anyway, if you find an issue with the Rails port, please create an issue https://github.com/openstreetmap/openstreetmap-website/issues instead.

When reporting an issue, please include a current example showing how to reproduce the issue. Ancient nodes that haven’t been changed for years would need some fixing for sure, though.

State of the Go Map!!

It’s hard to imagine that the code is even faster now. Editing Lake Huron, one of the larger relations around with about 13k members, worked flawlessly in my tests. That’s pretty amazing.

So JOSM.de now uses gender-inclusive language even in the main menu.

Most of the other translations go in the direction of “Erste Schritte” (“First steps”) or “Mit dem Editieren beginnen” (“Start editing”).

Something like this can also be adjusted quite easily on the following wiki page: https://josm.openstreetmap.de/wiki/StartupPageSource#de

By the way, I can’t relate to the way this topic is being raised here. That’s not my kind of language.

Triskaidekaphobia in Dublin

https://overpass-turbo.eu/s/1aEi might be good to fetch all addr:housenumber in Dublin along with their count, all in a single query.
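
If you would rather do the counting on your side, the same idea can be sketched in Python on Overpass JSON output. This is not the linked query; `count_housenumbers` is a hypothetical helper and the sample elements are invented.

```python
from collections import Counter


def count_housenumbers(elements):
    """Count how often each addr:housenumber value occurs."""
    return Counter(
        el["tags"]["addr:housenumber"]
        for el in elements
        if "addr:housenumber" in el.get("tags", {})
    )


# Invented sample in the shape of the "elements" list of an Overpass
# JSON response; real data would come from the API instead.
sample_elements = [
    {"type": "node", "id": 1, "tags": {"addr:housenumber": "12"}},
    {"type": "node", "id": 2, "tags": {"addr:housenumber": "12"}},
    {"type": "node", "id": 3, "tags": {"addr:housenumber": "14"}},
    {"type": "node", "id": 4, "tags": {"amenity": "pub"}},
    {"type": "node", "id": 5},
]

counts = count_housenumbers(sample_elements)
print(counts.most_common())
print(counts["13"])  # a Counter returns 0 for values that never occur
```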

Counting housenumber nodes

Indeed the focus on nodes was intentional. Counting ways typically requires more resources and would probably take even more time. This is because for each way you have to find out its geometry and check if it’s within a given country border.

A few pointers for ways:

It’s Easier To Contribute to OSM’s Website Now

(translation tool should be translatewiki.net rather than transifex)

It’s Easier To Contribute to OSM’s Website Now

In fact, Michal introduced a few new text labels in English. As always, those labels need to be translated to different languages in a separate step, which happens in an external application “transifex”. This might take a bit, depending on how fast translators complete their work on new text labels. As long as the translated texts are not yet available, you will see the English text instead.

It’s Easier To Contribute to OSM’s Website Now

French translation will probably be deployed shortly: https://github.com/openstreetmap/openstreetmap-website/commit/1fd5d7f4fece70dfae87ce27644ad31cf972e688#diff-2c5ab6165f7efe573a84107e0e51102ad47cefb0c65629759d7458eee14326e7