Made-up names and how to avoid them
Posted by SomeoneElse on 15 August 2017 in English. Last updated on 11 August 2021.
There’s recently been a thread on the talk-gb mailing list where someone has decided that, despite previous custom and practice there, the “name” field in both English- and Welsh-speaking areas of Wales should be a compound of both the English and Welsh names. No-one says “I’m climbing up Snowdon / Yr Wyddfa today”; they’ll use one name or the other, not both together.
In the Welsh-speaking areas the Welsh names are more likely to be used; in the English-speaking areas the English names. It’s not a hard-and-fast rule; this peak in the Black Mountains is referred to about equally by both the Welsh and English names, despite it being in a predominantly English-speaking area.
Wikipedia gives an idea of Welsh-language take-up here. That’s a bit broad-brush; for example I don’t think there’s an isogloss between Carmarthenshire and Swansea where people gain/lose the ability to speak Welsh.
So how is it possible to extract data from OSM with the Welsh name in the Welsh-speaking areas and the English name in English-speaking ones, both when creating e.g. a rendering database for the first time and when updating it as people update OSM? Firstly we’ll just consider the “loading the database” part.
There are a couple of possible solutions to the problem. I used “osmosis”, which has a handy “tag transform” feature. The Welsh transform file is here; the English one is similar.
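To give a feel for what the Welsh transform is meant to achieve, here is a rough Python sketch of the logic only. It is not the osmosis transform file itself; the function name apply_welsh_name is made up for illustration, and it assumes that objects without a “name:cy” tag simply keep their existing “name”.

```python
def apply_welsh_name(tags):
    """Return a copy of an object's tags with "name" taken from "name:cy".

    Assumption: when there is no "name:cy" tag, "name" is left alone.
    """
    result = dict(tags)  # copy, so the caller's tags are not modified
    welsh = result.get("name:cy")
    if welsh:
        result["name"] = welsh
    return result


# Hypothetical tags for illustration, not copied from OSM.
peak = {"natural": "peak", "name": "Snowdon", "name:cy": "Yr Wyddfa"}
print(apply_welsh_name(peak)["name"])
```

The English version would be the same idea with “name:en” in place of “name:cy”.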
Very roughly, the Welsh-speaking area of Wales corresponds to this area. That’s not perfect, but it’s not a bad approximation for a rectangle. I downloaded the latest Welsh data from Geofabrik and cut that area out of it:
osmosis --read-pbf wales-latest.osm.pbf --bounding-box left=-4.82 bottom=52.02 right=-3.34 top=53.69 --write-pbf wales_cy_before.pbf
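As a quick sanity check on those numbers, the rectangle can be sketched in Python. This only tests a single latitude/longitude point; it does not reproduce how osmosis decides which ways and relations to keep. The name in_welsh_box is made up, and the two test coordinates are approximate.

```python
# The same rectangle as in the osmosis command above.
WELSH_BOX = {"left": -4.82, "bottom": 52.02, "right": -3.34, "top": 53.69}


def in_welsh_box(lat, lon, box=WELSH_BOX):
    """True if the point lies inside (or on the edge of) the rectangle."""
    return (box["bottom"] <= lat <= box["top"]
            and box["left"] <= lon <= box["right"])


# Approximate coordinates, for illustration only.
print(in_welsh_box(53.07, -4.08))  # roughly the summit of Snowdon / Yr Wyddfa
print(in_welsh_box(51.48, -3.18))  # roughly central Cardiff
```

With these approximate positions the first point falls inside the rectangle and the second falls outside it, which matches the intent of the split.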
Convert the “Welsh-speaking” part to names based on “name:cy”:
osmosis --read-pbf wales_cy_before.pbf --tag-transform transform_cy.xml --write-pbf wales_cy_after.pbf
Create a copy of the larger file with names based on “name:en”: