
Character encoding in JOSM and OSM files

Posted by ArtyCarty479831 on 31 March 2009 in English.

As a US user, I don't run into many accented characters. I'm usually blissfully ignorant of character encoding issues, happily using the same ASCII codes as I did 25 years ago on my Apple II.

In loading the USGS points of interest, especially for Puerto Rico, I am encountering various accented vowels and ñ (n~, hope I got it right here :-). The source data is encoded as iso-8859-1, while JOSM works in utf-8. I noticed this when I loaded my OSM file into JOSM, and got little squares instead of legible letters.

No problem, I thought: I can add an XML declaration to the start of the OSM file stating that it is iso-8859-1. No variation of upper/lower case, or with/without dashes, made any difference; JOSM (version 1504) still showed little boxes for the accented characters. I think JOSM only reads disk files as UTF-8 and ignores the encoding given in the XML declaration.
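For comparison, a conforming XML parser is expected to take the encoding from the declaration. Below is a minimal sketch using Python's standard-library xml.etree.ElementTree (not JOSM) on a made-up one-node OSM snippet; the node id, coordinates and the name "Caño" are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-node OSM document. The name is stored in ISO-8859-1,
# where "ñ" is the single byte 0xF1.
raw = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    b'<osm version="0.6">'
    b'<node id="-1" lat="18.0" lon="-66.0">'
    b'<tag k="name" v="Ca\xf1o"/>'
    b'</node>'
    b'</osm>'
)

# Passing bytes (not str) lets the parser read the encoding declaration
# itself and decode 0xF1 as "ñ".
root = ET.fromstring(raw)
name = root.find("node/tag").get("v")
print(name == "Ca\u00f1o")  # True
```

If the declaration is honored, the Latin-1 byte comes out as the right character; an editor that always assumes UTF-8 would instead hit an invalid byte at that spot.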

Doing a hex dump on the file showed single-byte values for the special characters, so I was sure it was iso-8859-1 and not utf-8.
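The same check can be sketched in Python: list the bytes at or above 0x80 (what stands out in a hex dump) and test whether the data decodes as UTF-8. The helper names high_bytes and looks_like_utf8 are hypothetical, made up for this example.

```python
def high_bytes(data: bytes):
    """Return (offset, hex) for every byte >= 0x80, as you'd spot in a hex dump."""
    return [(i, f"{b:02x}") for i, b in enumerate(data) if b >= 0x80]


def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8 (a heuristic, not proof)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1 = "Caño".encode("iso-8859-1")  # "ñ" is one byte: f1
utf8 = "Caño".encode("utf-8")         # "ñ" is two bytes: c3 b1

print(high_bytes(latin1))  # [(2, 'f1')]
print(high_bytes(utf8))    # [(2, 'c3'), (3, 'b1')]
print(looks_like_utf8(latin1), looks_like_utf8(utf8))  # False True
```

Pure ASCII passes both ways, and some Latin-1 byte sequences happen to be valid UTF-8, so a clean decode only suggests UTF-8; single high bytes like f1 followed by plain ASCII are a strong sign of ISO-8859-1.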

Plan B: make the file really UTF-8. In Vim, I loaded the file, then did

:set fileencoding=utf-8

and saved the file. That did the trick. Hexdump showed multi-byte characters, and JOSM showed them correctly on the screen. Problem solved.
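The Vim step amounts to decoding the bytes as ISO-8859-1 and writing them back out as UTF-8. A minimal Python sketch of that conversion follows; the function name latin1_to_utf8 and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def latin1_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode an ISO-8859-1 text file as UTF-8, writing to a new file."""
    # Every byte 0x00-0xFF maps to a character in ISO-8859-1, so this decode
    # never fails, even if the input was really some other encoding.
    text = src.read_bytes().decode("iso-8859-1")
    dst.write_bytes(text.encode("utf-8"))


# Usage on a throwaway file (hypothetical names).
with tempfile.TemporaryDirectory() as d:
    src = Path(d, "pois.osm")
    dst = Path(d, "pois-utf8.osm")
    src.write_bytes('<tag k="name" v="Caño"/>'.encode("iso-8859-1"))
    latin1_to_utf8(src, dst)
    result = dst.read_bytes()

print(result)  # b'<tag k="name" v="Ca\xc3\xb1o"/>'
```

If the file still begins with an XML declaration naming iso-8859-1, it should be changed to UTF-8 after conversion; otherwise a parser that does honor the declaration would misread the new multi-byte sequences.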


Discussion

Comment from Firefishy on 31 March 2009 at 07:41

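Another way: iconv, the codeset conversion tool. It writes the converted text to standard output, so redirect it to a new file rather than back onto the input:

$ iconv -f ISO-8859-1 -t UTF-8 [file] > [new file]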
