Adventures in Punycode

Posted by jmapb on 31 December 2022 in English. Last updated on 4 January 2023.

This year I mapped my first POI with a Punycode website: a Japanese cafe/ice cream shop in Brooklyn called Amai Bā (osm.org/node/9795586991).

photo of Amai Bā

The website is https://www.xn--amaib-jwa.com. It was posted as a QR code taped to the shop window before the cafe opened. I’ve been hoping it would reappear on the permanent signage, but nothing so far, possibly because the website is still an empty placeholder.

The Punycode form, https://www.xn--amaib-jwa.com, is the standard way (RFC 3492) to encode https://www.amaibā.com. The encoding is necessary because the Unicode character ā (a with macron, used mostly in romanisation of non-Latin writing systems) is not part of the letter/digit/hyphen character set supported by DNS, which makes amaibā.com an internationalized domain name. Under the hood, xn--amaib-jwa.com is the “real” domain name, but most modern browsers are configured to display the site as amaibā.com.
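To make the encoding a little less magical, here is a minimal Python sketch of the RFC 3492 encoder for a single label. The function names are my own, and it leaves out things a real implementation needs: overflow checks, the optional mixed-case annotation, and the IDNA layer (case mapping and normalization, plus the `xn--` prefix, which comes from the IDNA spec rather than from Punycode itself). Python already ships a `punycode` codec, so this is for illustration only.

```python
# Parameter values fixed by RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def _digit(d: int) -> str:
    """Map digit values 0..25 to 'a'..'z' and 26..35 to '0'..'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    """Encode one Unicode label with the Bootstring/Punycode algorithm (no xn-- prefix)."""
    code_points = [ord(c) for c in label]
    # Basic (ASCII) code points are copied through first, in order.
    output = [c for c in label if ord(c) < 0x80]
    b = h = len(output)
    if b > 0:
        output.append("-")  # delimiter between basic part and encoded deltas
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        # Next smallest non-basic code point still to be inserted.
        m = min(cp for cp in code_points if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp in code_points:
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)
```

For example, `punycode_encode("amaibā")` returns `"amaib-jwa"`; the IDNA layer then adds `xn--` to give the label in the website URL. The plain letters are copied through unchanged, and everything after the last hyphen encodes which non-ASCII character to insert and where.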

I found only one other POI with a Punycode website in all of North America: osm.org/way/712913728, “7R Events,XV & Bridal Shop & Decor” in Cleburne, Texas, with the website https://www.xn--quinceaerasoutlet-lxb.com, which is Punycode for https://www.quinceañerasoutlet.com. (Quinceañera is often abbreviated to the Roman numeral XV, which helps in parsing the name of this business.) That website isn’t live, though. The correct website is https://www.quinceanerasoutlet.com, with a plain Latin “n” instead of the eñe. It’s possible that quinceañerasoutlet.com was active at one time, or perhaps there was a typo or an overzealous Spanish autocorrect. At any rate, I updated 7R’s website, so as of now Amai Bā is North America’s only surviving Punycode website in OSM.

Punycode websites are much more popular abroad, as seen in this Overpass Turbo map (screenshotted before I fixed 7R’s website – the quick-n-dirty query is https://overpass-turbo.eu/s/1pyt):

screenshot of Overpass Turbo map
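I don’t know exactly what the linked query matches, but the simplest test for a Punycode website is whether any label of its host starts with the `xn--` prefix. Here is a minimal Python sketch of that check for a single website tag value; the helper name is hypothetical, and prepending `http://` to scheme-less values is my own assumption about how to handle them.

```python
from urllib.parse import urlsplit


def punycode_labels(website: str) -> list[str]:
    """Return the host labels of a website tag value that carry the xn-- prefix."""
    # Some website values omit the scheme; add one so urlsplit can find the host.
    if "://" not in website:
        website = "http://" + website
    host = urlsplit(website).hostname or ""  # hostname is already lowercased
    return [label for label in host.split(".") if label.startswith("xn--")]
```

A check like this only catches the Punycode form: a website tagged in Unicode, such as https://www.amaibā.com, has no `xn--` label, so a map built on it would miss those POIs.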

They’re rare in the Americas and unmapped in Africa and Australia… but there are plenty in Europe, encoding Spain’s eñes, France’s accents grave and aigu, Germany’s eszetts and umlauts, Denmark’s æ’s and ø’s, etc. (Spot-checking these revealed a few more incorrect website tags similar to 7R’s in Texas, where the correct URL can be found by replacing the accented letters with their plain Latin approximations. If I have time to fix some of these, I could boost my number-of-countries-mapped score.)
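The replace-with-plain-Latin step can be roughed out in code: decode the Punycode host back to Unicode, then drop the accents. This is a minimal Python sketch using only the standard library (function names hypothetical). Canonical decomposition (NFD) splits ñ into n plus a combining tilde, so removing combining marks handles eñes, accents and umlauts; letters like æ, ø and ß have no decomposition and pass through unchanged, so they would need a hand-made table. Either way the output is only a candidate, which still has to be checked against the live site, as with 7R.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Drop combining marks after canonical decomposition (ñ -> n, é -> e, ü -> u)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def plain_latin_candidate(host: str) -> str:
    """Decode an IDNA host such as www.xn--...com to Unicode, then fold away accents."""
    unicode_host = host.encode("ascii").decode("idna")
    return ascii_fold(unicode_host)
```

For the Texas case, `plain_latin_candidate("www.xn--quinceaerasoutlet-lxb.com")` gives `"www.quinceanerasoutlet.com"`, the working website.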

There are clusters of non-Latin Punycode websites mapped in Ukraine, Belarus, Russia, South Korea, and Japan. There are few to none mapped in other places that feature non-Latin scripts, like China, Greece, and the Middle East.


Unlike DNS, OpenStreetMap itself supports the full Unicode character set, encoded in UTF-8. By convention, all OSM tag names and many tag values use a much smaller subset that excludes letters outside the standard Latin alphabet, but the website tag can take Unicode values without problems. Tagging website=https://www.amaibā.com works just as well as tagging website=https://www.xn--amaib-jwa.com: the browser converts the host to Punycode as needed when making network requests, and (normally) back to Unicode for display. (An example in New Zealand: osm.org/node/5507224960.)
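The browser’s conversion can be approximated with Python’s built-in `idna` codec. This is a minimal sketch (helper names hypothetical) that rewrites only the host of a website value and assumes the URL has no user:password part. One caveat: the built-in codec implements the older IDNA 2003 rules, which for example map German ß to “ss”, so for some characters it can disagree with current browsers; the third-party `idna` package implements IDNA 2008.

```python
from collections.abc import Callable
from urllib.parse import urlsplit, urlunsplit


def _with_host(url: str, convert: Callable[[str], str]) -> str:
    """Rebuild url with its host passed through convert (assumes no user info)."""
    parts = urlsplit(url)
    host = convert(parts.hostname or "")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def website_to_ascii(url: str) -> str:
    """Host in Punycode (ACE) form, as used for DNS lookups."""
    return _with_host(url, lambda h: h.encode("idna").decode("ascii"))


def website_to_unicode(url: str) -> str:
    """Host in Unicode form, as most browsers display it."""
    return _with_host(url, lambda h: h.encode("ascii").decode("idna"))


def same_website(a: str, b: str) -> bool:
    """True if two website tag values differ only in how the host is written."""
    return website_to_ascii(a) == website_to_ascii(b)
```

With this, `same_website("https://www.amaibā.com", "https://www.xn--amaib-jwa.com")` is `True`, which is the sense in which the two tag values are interchangeable.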

The OSM wiki currently gives no advice on which form is preferred, and both have pros and cons. For Amai Bā I recorded the Punycode form because that’s how it was signed, via QR code; some QR encoders and decoders don’t handle Unicode URLs well, so Punycode may have the edge in that situation. Offhand, tagging UTF-8 Unicode URLs seems to be much more popular in Europe, slightly more popular in the USA, and slightly less popular in Japan.

By the way, if anyone reading this knows how to write “Amai Bā” in Japanese, please be a dear and tag it on node 9795586991 under name:ja. Thanks!

Happy New Mapping Year, everyone!

Location: Brooklyn Heights, Brooklyn, Kings County, New York, United States