Character encoding in JOSM and OSM files

Posted by ArtyCarty479831 on 31 March 2009 in English.

As a US user, I don't run into many accented characters. I'm usually blissfully ignorant of character encoding issues, happily using the same ASCII codes as I did 25 years ago on my Apple II.

While loading the USGS points of interest, especially for Puerto Rico, I am running into various accented vowels and ñ (n~, hope I got it right here :-). The source data is encoded as ISO-8859-1, while JOSM works in UTF-8. I noticed this when I loaded my OSM file into JOSM and got little squares instead of legible letters.
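For anyone curious what that mismatch looks like at the byte level, here is a small Python sketch (Python is just a convenient choice here, nothing JOSM uses). It prints the ISO-8859-1 and UTF-8 bytes for ñ and é, then decodes an ISO-8859-1 string as if it were UTF-8. The sample word is made up. A lenient decoder swaps the orphaned byte for the replacement character U+FFFD; that something similar is behind the squares in JOSM is an assumption, not something confirmed here.

```python
# Compare how the same accented characters are stored in each encoding.
pairs = {}
for ch in "ñé":
    latin1 = ch.encode("iso-8859-1")  # one byte per character
    utf8 = ch.encode("utf-8")         # two bytes for these characters
    pairs[ch] = (latin1.hex(), utf8.hex())
    print(ch, "ISO-8859-1:", latin1.hex(), " UTF-8:", utf8.hex())

# Reading ISO-8859-1 bytes as UTF-8: 0xF1 announces a multi-byte sequence
# that never completes, so a lenient decoder substitutes U+FFFD.
garbled = b"Espa\xf1a".decode("utf-8", errors="replace")
print(garbled)
```

It should show f1 versus c3b1 for ñ, and e9 versus c3a9 for é.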

No problem, I thought: I can add an XML encoding declaration to the start of the OSM file saying it is ISO-8859-1. No variation of upper/lower case or with/without dashes made any difference; JOSM still showed little boxes for the accented characters. (This is JOSM version 1504.) I think JOSM only reads UTF-8 from disk files and doesn't obey the XML encoding declared in the file.
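For comparison, a general-purpose XML parser does honor such a declaration. This Python sketch builds a tiny, made-up OSM-style document (the node values are invented for illustration), stores it as ISO-8859-1 bytes with a matching declaration, and parses it with the standard library's ElementTree. It says nothing about JOSM's internals; it only shows that the declaration itself is valid, which fits the suspicion that JOSM 1504 ignored it.

```python
import xml.etree.ElementTree as ET

# Illustrative OSM-style document, encoded as ISO-8859-1 bytes, with an
# XML declaration that names that encoding.
raw = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<osm><node id="-1" lat="18.2" lon="-66.5">'
    '<tag k="name" v="Añasco"/></node></osm>'
).encode("iso-8859-1")

# Passing bytes (not str) lets the parser read the declaration itself
# and decode the 0xF1 byte as ñ.
root = ET.fromstring(raw)
name = root.find("node/tag").get("v")
print(name)  # Añasco
```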

A hex dump of the file showed single-byte values for the special characters, so I was sure it was ISO-8859-1 and not UTF-8.
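The hex-dump reasoning can be automated. A minimal Python sketch, assuming the whole file fits in memory: pure ASCII means there are no accented characters at all; bytes that fail strict UTF-8 decoding point to a single-byte encoding such as ISO-8859-1 (Windows-1252 would look the same); bytes that decode cleanly are very probably UTF-8, since ISO-8859-1 text with accented letters rarely happens to form valid UTF-8 by accident. The function name classify is hypothetical.

```python
def classify(data: bytes) -> str:
    """Rough guess at a file's encoding, mirroring the hex-dump check."""
    if data.isascii():
        return "ascii"          # every byte < 0x80: no accented characters
    try:
        data.decode("utf-8")    # strict by default: bad sequences raise
    except UnicodeDecodeError:
        return "not utf-8"      # e.g. ISO-8859-1's single byte 0xF1 for ñ
    return "utf-8"


print(classify(b"Ponce"))
print(classify("Añasco".encode("iso-8859-1")))
print(classify("Añasco".encode("utf-8")))
```

For a real file, pass it the bytes from open(path, "rb").read().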

Plan B: make the file really UTF-8. In Vim, I loaded the file, then did

:set fileencoding=utf-8

and saved the file. That did the trick. Hexdump showed multi-byte characters, and JOSM showed them correctly on the screen. Problem solved.
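The same conversion can also be done outside Vim. Here is a Python sketch, assuming the input really is ISO-8859-1; running it on a file that is already UTF-8 would double-encode the accents into things like "Ã±". Unlike the Vim step, it also rewrites an ISO-8859-1 encoding declaration, if present, so the declaration matches the new bytes. A stale declaration is harmless to a reader that ignores it, as JOSM seemed to, but would mislead one that honors it. The names latin1_to_utf8, convert_file and the file names below are hypothetical.

```python
import re
from pathlib import Path

# Matches encoding="ISO-8859-1" and the case/dash variants (e.g. iso88591).
DECL = re.compile(r'(encoding=["\'])iso[-_]?8859[-_]?1(["\'])', re.IGNORECASE)


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8, fixing the declaration too."""
    text = raw.decode("iso-8859-1")           # every byte maps to one char
    text = DECL.sub(r"\1UTF-8\2", text, count=1)
    return text.encode("utf-8")               # ñ becomes the two bytes C3 B1


def convert_file(src: Path, dst: Path) -> None:
    dst.write_bytes(latin1_to_utf8(src.read_bytes()))
```

Usage would be something like convert_file(Path("pr_pois.osm"), Path("pr_pois_utf8.osm")).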