
Character encoding in JOSM and OSM files

Posted by ArtyCarty479831 on 31 March 2009 in English.

As a US user, I don't run into many accented characters. I'm usually blissfully ignorant of character encoding issues, happily using the same ASCII codes as I did 25 years ago on my Apple II.

While loading the USGS points of interest, especially for Puerto Rico, I am encountering various accented vowels and ñ (n~, hope I got it right here :-). The source data is encoded as ISO-8859-1, while JOSM works in UTF-8. I noticed this when I loaded my OSM file into JOSM and got little squares instead of legible letters.

No problem, I thought, I can add an encoding line to the start of the OSM file, saying it is iso-8859-1. No variations of upper/lower case or with/without dashes made any difference. It still came up with little boxes for accented chars in JOSM. (This is JOSM version 1504). I think JOSM only takes UTF-8 from disk files, and doesn't obey the XML encoding specified in the file.
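For reference, the "encoding line" here is the XML declaration at the very top of the file. The following is a minimal Python sketch, using a made-up one-node fragment rather than real USGS data, of what a parser that honours the declaration does with ISO-8859-1 bytes. It says nothing about how JOSM itself reads files.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-node fragment (not real USGS data). The name is encoded
# as ISO-8859-1, so "ñ" is stored as the single byte 0xF1.
raw = (
    "<?xml version='1.0' encoding='ISO-8859-1'?>\n"
    "<osm><node id='-1' lat='18.0' lon='-66.7'>"
    "<tag k='name' v='Peñuelas'/></node></osm>"
).encode("iso-8859-1")

# Parsing from bytes lets the parser read the declared encoding itself.
root = ET.fromstring(raw)
name = root.find("node/tag").get("v")
print(name)
```

If a reader ignores the declaration and assumes UTF-8, the lone 0xF1 byte is not a valid sequence, which would be consistent with the little boxes described above.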

A hex dump of the file showed single-byte values for the special characters, so I was sure it was ISO-8859-1 and not UTF-8.
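The same check can be sketched in Python. The sample name and the helper `looks_like_utf8` are made up for illustration; the idea is only that ñ is one byte (f1) in ISO-8859-1 and two bytes (c3 b1) in UTF-8, and that a lone f1 fails strict UTF-8 decoding.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Illustrative sample, not taken from the actual POI file.
latin1_bytes = "Peñuelas".encode("iso-8859-1")
utf8_bytes = "Peñuelas".encode("utf-8")

print(latin1_bytes.hex(" "))  # ñ appears as the single byte f1
print(utf8_bytes.hex(" "))    # ñ appears as the two bytes c3 b1
print(looks_like_utf8(latin1_bytes), looks_like_utf8(utf8_bytes))
```

This is a heuristic only: a pure ASCII file passes the check as well, and it cannot tell ISO-8859-1 apart from other single-byte encodings.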

Plan B: make the file really UTF-8. In Vim, I loaded the file, then did

:set fileencoding=utf-8

and saved the file. That did the trick. Hexdump showed multi-byte characters, and JOSM showed them correctly on the screen. Problem solved.
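For anyone scripting this rather than using Vim, a rough Python equivalent might look like the sketch below. The function name `latin1_to_utf8` and the file names are assumptions for illustration, and the demo runs on a throwaway file rather than a real OSM export.

```python
import os
import tempfile

def latin1_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a text file from ISO-8859-1 to UTF-8 (hypothetical helper)."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# Demonstration on a throwaway file with a made-up tag line.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "poi_latin1.osm")
    dst = os.path.join(tmp, "poi_utf8.osm")
    with open(src, "wb") as f:
        f.write("<tag k='name' v='Peñuelas'/>\n".encode("iso-8859-1"))
    latin1_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)  # ñ is now the two-byte sequence c3 b1
```

If the file carries an XML declaration naming ISO-8859-1, it would presumably need to be updated or removed after conversion so that it no longer contradicts the actual bytes.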


Discussion

Comment from Firefishy on 31 March 2009 at 07:41

Another way...
iconv - codeset conversion

$ iconv -f ISO-8859-1 -t UTF-8 [file]
