Hi everyone, this is the update on the final phase of my project to add transliteration support to Nominatim’s search results! A quick refresher: the project adds transliteration as a fallback for users who cannot read the local-language name of a place when no name tag in a language they understand is available.
Background
For background, you can check the overview of the project and the midterm report down below:
The bulk of the work can be found in these pull requests:
- Locales and result class refactorization
- Locales documentation update (for refactored code)
- Transliteration integration
Detailed Report of the Project
The detailed version of the report can be read here (version pending GitHub commit).
What I did
- Integrated transliteration into Nominatim so search results in unfamiliar scripts (e.g. 北京市) can be displayed in a user-readable form (e.g. Beijing).
- Built a pluggable transliteration framework supporting Latin script via unidecode, with prototypes for Cantonese, Simplified Chinese, and Traditional Chinese.
- Refactored the `Locales` class and results pipeline for clearer responsibilities, modularity, and maintainability.
- Introduced a `languages.yaml` configuration file for language normalization and country-language mapping.
- Implemented new logic for parsing browser language headers, including handling of ambiguous codes like zh.
- Wrote extensive unit tests and updated GitHub workflows for optional dependencies.
- Added documentation to explain the new localization and transliteration system.
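To make the flow above more concrete, here is a minimal sketch of how a browser language header and a transliteration fallback could fit together. It is not the code from the pull requests: the function names `parse_accept_language` and `choose_display_name`, the `TRANSLITERATORS` registry, and the idea of keying transliterators by target language are all assumptions made for illustration. The only real library call is `unidecode` from the third-party `unidecode` package, which is treated as optional here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical type for a pluggable transliterator: text in, text out.
Transliterator = Callable[[str], str]

try:
    # Optional third-party dependency, used here for Latin-script output.
    from unidecode import unidecode
    TRANSLITERATORS: Dict[str, Transliterator] = {"en": unidecode}
except ImportError:
    # Without the package, no transliteration is offered.
    TRANSLITERATORS = {}


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        params = params.strip()
        quality = 1.0  # a missing q-value means full preference
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        if tag and tag != "*" and quality > 0:
            # Sort by descending quality, then by order in the header.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_display_name(
    names: Dict[str, str],
    languages: List[str],
    transliterators: Dict[str, Transliterator],
) -> Optional[str]:
    """Pick a readable name: tagged name first, transliteration second."""
    # 1. Prefer an existing name tag in one of the requested languages.
    for lang in languages:
        for key in (f"name:{lang}", f"name:{lang.split('-')[0]}"):
            if key in names:
                return names[key]
    local_name = names.get("name")
    if local_name is None:
        return None
    # 2. Otherwise transliterate the local name for the first language
    #    that has a registered transliterator.
    for lang in languages:
        transliterate = transliterators.get(lang.split("-")[0])
        if transliterate is not None:
            return transliterate(local_name)
    # 3. Last resort: show the local name unchanged.
    return local_name


languages = parse_accept_language("en-US,en;q=0.9,zh;q=0.8")
print(choose_display_name({"name": "北京市"}, languages, TRANSLITERATORS))
```

Keeping the transliterators in a plain dictionary is one simple way to get the "pluggable" behaviour: a language-specific implementation can be registered under its own code without touching the selection logic. The sketch does not attempt to resolve ambiguous codes such as zh, which the actual implementation handles separately.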
Possible Next Steps
A few possible next steps are summarized below:
- Improve regionalization (e.g. Hong Kong and Macau, which Nominatim does not yet recognize as independent from China).
- Refine fallback logic when multiple languages are present.
- Extend the non-Latin transliteration framework with more language-specific implementations.
- Expand testing for robustness and reliability.