Transliteration Final Report: GSoC 2025
This entry was posted by anqixu on 30 August 2025 in English.
Hi everyone, this is the update on the final phase of my project to add transliteration support to Nominatim’s search results! A quick refresher: this project focused on offering transliteration as an option for users who cannot read the local language of a name when no tag in a language they understand is available.
Background
For background, you can check the overview of the project and the midterm report down below:
The bulk of the work can be found in these pull requests:
- Locales and result class refactoring
- Locales documentation update (for refactored code)
- Transliteration integration
Detailed Report of the Project
The detailed version of the report can be read here (version pending GitHub commit).
What I did
- Integrated transliteration into Nominatim so search results in unfamiliar scripts (e.g. 北京市) can be displayed in a user-readable form (e.g. Beijing).
- Built a pluggable transliteration framework supporting Latin script via unidecode, with prototypes for Cantonese, Simplified Chinese, and Traditional Chinese.
- Refactored the Locales class and results pipeline for clearer responsibilities, modularity, and maintainability.
- Introduced a languages.yaml configuration file for language normalization and country-language mapping.
- Implemented new logic for parsing browser language headers, including handling of ambiguous codes like zh.
- Wrote extensive unit tests and updated GitHub workflows for optional dependencies.
- Added documentation to explain the new localization and transliteration system.
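To make the header parsing and the pluggable fallback more concrete, here is a minimal sketch in Python. It is not Nominatim's actual code: the names TRANSLITERATORS, register, parse_accept_language and display_name are hypothetical, the name:xx keys only mimic OSM-style name tags, and the default transliterator is a standard-library stand-in that merely strips diacritics. In the real project, Latin transliteration goes through unidecode, and non-Latin scripts need language-specific handlers. The sketch also does not attempt to resolve ambiguous codes like zh.

```python
import unicodedata
from typing import Callable, Dict, List

# Hypothetical registry: language code -> transliteration function.
TRANSLITERATORS: Dict[str, Callable[[str], str]] = {}


def register(lang: str):
    """Decorator that plugs a transliterator into the registry."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        TRANSLITERATORS[lang] = func
        return func
    return wrap


@register("default")
def strip_diacritics(text: str) -> str:
    """Stdlib stand-in for a Latin transliterator (assumption, not unidecode)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag or tag == "*":
            continue
        quality = 1.0  # a missing q-value means full preference
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat malformed weights as "not wanted"
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def display_name(names: Dict[str, str], header: str) -> str:
    """Pick a name the user can read, transliterating only as a last resort."""
    for tag in parse_accept_language(header):
        # Try the full tag first (e.g. en-GB), then its primary subtag (en).
        for candidate in (tag, tag.split("-")[0]):
            key = f"name:{candidate}"
            if key in names:
                return names[key]
    # No understandable tag is available: fall back to transliteration.
    return TRANSLITERATORS["default"](names["name"])
```

With this sketch, a place tagged with both name and name:en is shown in English to a browser sending "en-US,en;q=0.9", while a place with only a local name falls through to the registered default transliterator. Keeping the transliterators in a registry means a language-specific handler can be added without touching the selection logic.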
Possible Next Steps
A few possible next steps are summarized below:
- Improve regionalization (e.g. Hong Kong and Macau, which Nominatim does not yet recognize as independent from China).
- Refine fallback logic when multiple languages are present.
- Extend the non-Latin transliteration framework with more language-specific implementations.
- Expand testing for robustness and reliability.