When working with OSM it’s generally fair to assume that textual data, like tag values, are encoded in UTF-8. Without this assumption, multilingual mapmaking would be almost impossible: custom fonts or browser settings would need to be specified for every language when displaying geocoding results, routing directions or map labels.
As part of the newly resurrected Engineering Working Group, I’m investigating ways to improve OSM’s software ecosystem. One of the top tasks for the EWG is localization, and standardized text encoding is a prerequisite for this, but OSM does not enforce any particular encoding as policy.
Where is the non-Unicode data?
The most obvious instance of non-Unicode text in OSM is the Zawgyi encoding for Burmese text. For background on Zawgyi, see this post on the civil war between fonts in Myanmar.
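To make the search for such tags concrete, here is a rough sketch of one heuristic; it is not a reliable detector. It assumes that Zawgyi stores the vowel sign E (U+1031) in visual order, before its consonant, while standard Unicode stores it after the consonant. The function name and the regular expression are my own illustration, not part of any OSM tool. A statistical detector, such as the one in Google's myanmar-tools, would be the better choice for real work.

```python
import re

# Assumption: in standard Unicode Burmese, the vowel sign E (U+1031) always
# follows a Myanmar-block character. Zawgyi stores it before the consonant,
# so it can appear at the start of a string or right after a non-Myanmar
# character such as a space.
LEADING_VOWEL_E = re.compile(r"(?:^|[^\u1000-\u109F])\u1031")


def looks_like_zawgyi(value: str) -> bool:
    """Return True if a tag value shows one strong sign of Zawgyi encoding.

    Hypothetical helper: it checks a single pattern, so it misses any
    Zawgyi text that happens not to contain a word-initial vowel sign E.
    """
    return bool(LEADING_VOWEL_E.search(value))


if __name__ == "__main__":
    samples = {
        "unicode order": "\u1014\u1031",  # consonant NA, then vowel sign E
        "zawgyi order": "\u1031\u1014",   # vowel sign E, then consonant NA
        "latin": "Yangon",
    }
    for label, text in samples.items():
        print(label, looks_like_zawgyi(text))
```

A value that returns False here may still be Zawgyi, so this is only useful as a first pass for finding candidate tags to inspect by hand.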
The default Mapnik-based rendering on OpenStreetMap.org, openstreetmap-carto, uses Unicode fonts. Zawgyi-encoded tags appear obviously garbled on the map, with the dotted-circle placeholder ◌ visible where a combining mark has no valid base character: