Minh Nguyen's Comments

Post	When	Comment
Transliteration Midterm Update!	about 15 hours ago	(To clarify, OpenCC is for converting between traditional and simplified Chinese, not for pinyin.)
Transliteration Midterm Update!	about 15 hours ago	Wonderful to see this progress so far! Since you mentioned unidecode, have you given any thought to how Nominatim could eventually support alternative libraries for certain languages? The Any–Latin transform has very broad coverage, but, for example, OpenCC is designed specifically for Chinese text (assuming you can distinguish it from Japanese kanji). Some languages have multiple transliteration standards, and users might prefer one or another.
New York minor civil subdivisions - status and progress	2 months ago	Thanks for your patience. We’ve all been there. Arbitrary is a good word for it. We can’t quite get rid of the `place=` key, because nothing else would indicate the node represents a population center (even an informal one like a neighborhood). But I do think of it as a backwards compatibility shim for the most part. Kevin’s retagging from a few years ago reset what had previously been a chaotic, difficult to explain patchwork of tagging up to that point – not at all aligned with official designations either. We’ve had to do similar stuff in other parts of the country, especially New England and the Midwest where people came with their own ideas about how to shoehorn local terminology into the existing software-supported tags. The proposal I linked above would attempt to repurpose `place=` values based on Census Bureau concepts, which are their own ball of fun. The idea is to solve another problem we have, which is that you can’t determine based on OSM data whether a given place is a suburb in the North American sense of the word. (`place=suburb` means something different than what you’d expect, because keywords largely use British English for historical reasons.)
New York minor civil subdivisions - status and progress	2 months ago	Please don’t get too caught up in the literal word `place`. It’s just an unfortunate keyword in our tagging ontology, a database column name. A data consumer can choose to use or ignore that classification. Most renderers ignore it, only using the node for its location, name, and population. Most geocoders ignore it as long as the node is linked to a boundary relation. And yes, to some extent, any vaguely nationally or internationally harmonized classification system is going to be “imperialistic and uninformative”, to use your words. This particular set of keywords started in the UK, like the rest of OSM. You have no idea how problematic it is for OpenHistoricalMap, which has to shoehorn precolonial societies and more into OSM’s tagging system! Hence `border_type=*`, an utterly locally oriented key. We Americans are big fans of that key and are trying to position it as an alternative. Please use it.
New York minor civil subdivisions - status and progress	2 months ago	If you’re looking at a `place=` tag to understand official designations, please consider looking at the associated boundary relation’s `border_type=` tag instead. Tags like `place=town` are intended to represent population centers, somewhat irrespective of government structures. This may make place nodes less useful to you in orienting yourself on the map, but it’s a tradeoff in favor of harmonizing a system beyond state lines. If we are interested in refining the population-based heuristics to account for density and business activity, first we need to choose a geographic extent independent of official municipal boundaries. This proposal calls for a more rigorous basis for place classification that generalizes nationally without as many fudge factors. It is a result of many difficult debates on the forum and elsewhere. But you don’t need to wait for this to happen to start using `border_type=*` today.
OSMgo.org roof:type=gabled	3 months ago	Sorry, that was cryptic of me. F4Map apparently uses `crown_type=I` to mean palm tree. I only found out by accident when looking at the browser console, desperate to get a palm tree to come out.
OSMgo.org roof:type=gabled	3 months ago	Actually the solution is: “Make it like F4map.org”, because most existing edits are “tagged for the F4-renderer”. But as it is closed source, we have to find things out by examples. All I have to say about that is: 🌴
What do you need from a preprocessed MapLibre style editor?	3 months ago	“the pre-processed cascading was intended for the days before data-driven styling and is probably less crucial now, so could perhaps be dropped.” Yes, in general, you’d want to minimize the number of layers in the stylesheet for performance reasons. For example, if you can draw roads using only one layer, then the style’s size and power usage decrease dramatically and rendering becomes noticeably smoother. The two obstacles are some properties’ lack of support for data-driven styling and some tilesets being overstratified into too many layers.
What do you need from a preprocessed MapLibre style editor?	3 months ago	“Other style projects like openstreetmap-americana have taken a different route. Their developers have written a program in JavaScript that generates the style.” The Americana project has certainly run with this idea, but it started with some very specific pain points, which code generation or preprocessing helps to mitigate but doesn’t solve completely: (1) Some features like roads and railways need to be repeated, along with accessory layers like casing and dashes, in order to depict vertical order (i.e., `layer=`). (2) `line-dasharray` doesn’t support data-driven styling, so the very many kinds of railways need very many separate layers. (3) Route shields are implemented as icons generated on demand per route for more fine-grained text layout. (These modifications don’t make it into the generated style JSON.) (4) To conflate `name=` with a dynamically selected `name:*=` for our glossed labels, we needed to parse semicolon-delimited lists. This requires basic text processing functionality that was declined. Instead, we reimplemented string find and replace by generating a string tokenizer made of recursively nested `slice` and `case` expressions. 😱 In principle, some of this work could be offloaded to a tile server that returns personalized tiles, but that would reduce the tile server’s portability. The more we can streamline these steps in static style JSON, the more portable Americana would be, especially to other platforms. “Having to repeat definitions instead of using a variable. Something like a color or symbol definition might appear a dozen times in the style. If you want to change it, you need to make sure you got all the occurrences.” For what it’s worth, this MapLibre proposal would introduce the notion of global state, which effectively also allows for design-time consolidation as a side effect. If it goes in, it could simplify your preprocessor somewhat. It could also reduce the need for maintaining (or generating) separate styles based on color scheme.
Tagging For The Renderer	3 months ago	Lately I’ve been taking to calling it “fudging the data”. 😋 A bit less accusatory, but still gets the point across that one is playing games with the data out of expediency.
Tagging For The Renderer	4 months ago	The page was briefly renamed to “Lying to the renderer” in order to more clearly limit the admonition to data hacks. But it soon got renamed back to the original title because “lying” sounded too accusatory and most people had gotten used to saying “tagging”. Imagine commenting on a new mapper’s changeset, saying they’re “lying” – not a particularly welcoming message. That said, “Tagging for the renderer” remains a widely misunderstood phrase. The gist of the article is that we need to balance the needs of all kinds of data consumers, current and future. Essentially that means preferring semantically accurate mapping over something more presentational and shortsighted.
Please stop guessing about highway/waterway crossings	9 months ago	Good idea. “Ignore this issue” is an accurate description of what happens, but some users may perceive ignoring to be an act of laziness or even malice, whereas it can actually be more neutral than that. “I don’t know” would be similar to the options available in StreetComplete and MapRoulette, two alternative editing environments that prioritize human factors in their workflow designs. “Can’t be determined at this time” isn’t terribly verbose. There’s already a “Not the same x” option on the warning about missing brand tags, where x is an arbitrarily long brand name. If it gets long enough, it simply wraps to the next line. Please open an issue in the iD issue tracker so it can be triaged and given further consideration. Thanks!
Please stop guessing about highway/waterway crossings	9 months ago	Thank you for this important message. Validators work with very little context and even less intelligence. This goes for not only iD’s validator but also JOSM’s, Osmose, etc. “Ignore this issue” is absolutely a valid response. If something is so bad that you shouldn’t ignore it, it would be an error and iD would actively block you from saving your changeset. I’ve been concerned for years about the tendency for these issues to become “gamified” because mappers perceive HDYC as their permanent record, as if editcountitis wasn’t enough of a problem. (At this point, I wear my Osmose issue count as a badge of pride.) Aside from possible UX improvements to iD, we should figure out how to identify any bridges or culverts that were created hastily or carelessly. By default, the “Add a bridge” and “Add a tunnel” suggestions create bridges and tunnels of a certain length based on factors such as the tags on the crossing way and the angle of the crossing. When iD introduced this feature, the developers expected the mapper to manually adjust the bridge to match the real-world length. However, this is unlikely to happen if the mapper can’t see the bridge in imagery. Maybe we can reverse-engineer these heuristics in an Overpass or QLever query.
To name or not to name ...	10 months ago	I very belatedly noticed this diary post after you recently mentioned it on a forum thread and someone later pinged me. I’m the one who added the entry for Avalon to NSI. Avalon not only develops and owns apartment complexes but also heavily brands everything about them. The monument sign out front is just the start of it. When I briefly lived at an Avalon apartment complex, their then-ubiquitous fleur-de-lis-like logo really came to aggravate me. The only reason this brand goes with `landuse=residential` is that `landuse=residential` `residential=apartments` is the de facto tagging combination for a multi-structure apartment complex. Fortunately, only a few other “apartment chains” McDonaldize apartment living to such an extent. It’s nothing more than an annoying edge case.
Pathology 101 – A primer into a new science	10 months ago	Back in 2008, when the community simultaneously voted on `highway=road` and `highway=path`, we mapped these features based on field surveys, GPS traces, and Yahoo! imagery that was barely legible at zoom level 17 in the U.S., to say nothing of Europe. We couldn’t be as nitpicky about navigation mapping as we are today. Yet at the time, some countries’ tagging guidelines insisted that tags such as `highway=primary` and `unclassified` were strictly tied to official designations, which may or may not be signposted. This created a situation where someone could have valuable data to contribute to OSM but be unable to express it. In part, `highway=path` had the same motivation: someone could see a path in GPS traces and be sure that it wasn’t a road, given its location or curvature, but unsure whether it was intended for pedestrians, cyclists, or horseback riders. Neither of these proposals accounted for the idea that someone might need to trace some smudge in imagery without knowing whether it was a road or path… or aeroway or waterway. The advice would’ve been to leave a note in OpenStreetBugs or MapDust instead. `highway=path` was simultaneously approved for shared use paths. So even though most countries have relaxed their road classification rules and we can see more clearly in street-level imagery, `highway=path` has endured while `highway=road` was largely forgotten – until someone noticed that the latter got redefined to include paths. Anyways, sorry for digressing. As you were saying: the couple married and rode off into the sunset… on a golf cart path. 🤪
Pathology 101 – A primer into a new science	10 months ago	So much has been written about how to distinguish one kind of path from another, but the truth has been staring us in the face: everything is a footway. A set of steps is just a very bumpy footway on an incline. A cycleway is just a footway along which your feet push pedals. A busway is just a footway along which you stand, feet astride for stability, because all the seats are occupied. A motorway is just a footway along which your lead foot slams the gas pedal and never lets up. The only exception is bridleways: you don’t really do much of anything with your feet, other than sticking them in stirrups. And obviously the horse has hooves, not feet. Yet in my country, all of these ways – even the bridleways – are measured in feet.
When AI is (not) needed	over 1 year ago	Based on the `source` tag, that building probably came from the French cadaster import. Many government building datasets have errors of this sort because the data collection is based on remote sensing technologies like LiDAR. Cleaning up these errors is the very reason why imports are more difficult than simply loading the data and uploading. Whether an external building dataset comes from computer vision, machine learning classification, LiDAR, or other automated techniques, data consumers tend to prefer OSM data wherever it’s present because we’ve typically paid more individual attention and performed quality control on it. If you use an automated dataset in your product, you need to filter out low-confidence features or else you wind up with an impressive statistic but lots of junk. It’s not just buildings. Every now and then, a navigation software vendor gets the bright idea to detect one-way streets automatically based on whether they have telemetry of people mostly going in one direction along the street but hardly in the other. Great – finally solved the problem of routing people the wrong way down a street! Invariably, they have to back away from this approach, because it turns out that many one-way streets don’t have the traffic volume needed to make a confident prediction about the traffic direction. Instead, they get complaints about having to circle around the entire city just to turn right. This data still makes for a great QA tool with a human in the loop, but it’s only a matter of time before someone sees that QA tool and gets a bright idea…
When AI is (not) needed	over 1 year ago	I’m guessing this is the Microsoft building dataset, which applies computer vision to aerial imagery. Some data consumers like Mapbox and Overture Maps are using this dataset to backfill areas where OSM building coverage is lacking or nonexistent. From their perspective, the increase in coverage in places with fewer OSM mappers probably outweighs individual bloopers like this, and I guess from our perspective, we’d rather not face a bulk automated import of this dataset due to these bloopers. Another thing that commonly occurs is that a building has been demolished, so we’ve deleted the building from OSM. But a data consumer working off outdated aerial imagery can’t distinguish that from a never-before-mapped building, so it restores the building from the Microsoft dataset. Of course, a human mapper could make the same mistake if they happen to be using the same outdated imagery with no local knowledge. To address both cases, I’ve gotten into the habit of retagging buildings as `demolished:building=*`, at least until the local default imagery layer gets updated. These data consumers will omit any Microsoft building that intersects a `building` in OSM, so I hope they’ll do something similar with `demolished:building` in the future. This key also has the benefit of serving as a to-do list for OpenHistoricalMap and as a pre-cleanup step for any building import planned for the area. In theory, we could go around mapping `no:building=yes` for the buildings on wheels you spotted, but my hands are full already without worrying about something that Microsoft could fix by tuning their noise filter.
Restructure wiki page key:name?	over 1 year ago	Yes, this page and the main “Names” page could use a thorough rewrite. There are a lot of intentional nuances in the text that matter but need to be organized better in order for readers to come away with what they need. The article uses “primary name” in order to give an idea of when to use `name` versus some other name-related key such as `name:en` or `alt_name`. All of these keys hold proper nouns, or proper names as you put it, so replacing “primary name” with “proper name” would be correct but beside the point.
Minutely Shortbread tiles	over 1 year ago	Excited would be an understatement! I know it’s just a demonstration with no guarantees, but it just came in handy for the epic abandoned railway discussion we’re having. It was super simple to take your demo and extend it to demonstrate a mashup of minutely OpenStreetMap tiles and minutely OpenHistoricalMap tiles. There’s nothing quite like a live-updating, interactive map to convince others that it’s more than just talk.
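
The MapLibre style editor comment above mentions reimplementing string find and replace out of recursively nested `slice` and `case` expressions. As a rough illustration of why that gets unwieldy, here is a minimal Python sketch of a generator for such an expression. It is not Americana's actual code: the function names `replace_expr` and `evaluate` are hypothetical, the only style-spec operators assumed are `index-of`, `slice`, `case`, `concat`, `get`, `+`, and `>=`, and the tiny evaluator exists only to check the generated expression, not to mirror the renderer.

```python
import json


def replace_expr(haystack, needle, replacement, max_occurrences):
    """Build a nested expression replacing up to max_occurrences of needle.

    haystack is a literal string or another expression such as ["get", "name"].
    Expressions cannot loop, so each possible occurrence needs its own nesting level.
    """
    if max_occurrences <= 0:
        return haystack  # nesting budget spent; leave the remainder untouched
    idx = ["index-of", needle, haystack]
    rest = ["slice", haystack, ["+", idx, len(needle)]]
    return [
        "case",
        [">=", idx, 0],
        ["concat", ["slice", haystack, 0, idx], replacement,
         replace_expr(rest, needle, replacement, max_occurrences - 1)],
        haystack,  # needle not found: return the input unchanged
    ]


def evaluate(expr, props):
    """Toy evaluator for the handful of operators used above (illustration only)."""
    if not isinstance(expr, list):
        return expr  # literal string or number
    op, *args = expr
    if op == "get":
        return props[args[0]]
    if op == "case":
        # Condition/output pairs followed by a fallback, evaluated lazily.
        for i in range(0, len(args) - 1, 2):
            if evaluate(args[i], props):
                return evaluate(args[i + 1], props)
        return evaluate(args[-1], props)
    vals = [evaluate(a, props) for a in args]
    if op == "index-of":
        return vals[1].find(vals[0])  # -1 when absent
    if op == "slice":
        return vals[0][vals[1]:vals[2]] if len(vals) == 3 else vals[0][vals[1]:]
    if op == "concat":
        return "".join(vals)
    if op == "+":
        return sum(vals)
    if op == ">=":
        return vals[0] >= vals[1]
    raise ValueError(f"unsupported operator: {op}")


expr = replace_expr(["get", "name"], ";", " / ", 3)
print(evaluate(expr, {"name": "a;b;c"}))  # a / b / c
print(len(json.dumps(expr)))  # serialized size grows quickly with max_occurrences
```

Because the input expression is repeated at every nesting level, the generated JSON grows rapidly with the number of occurrences handled, which is one reason native text processing in the style language would be preferable to this kind of workaround.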

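The “When AI is (not) needed” comment above describes data consumers dropping any machine-generated footprint that intersects a `building` in OSM, and hopes the same could apply to `demolished:building`. A minimal Python sketch of that filtering rule follows. Everything here is an assumption for illustration: the `Feature` type, the `backfill` function, and the use of bounding-box overlap in place of true polygon intersection are hypothetical and do not reflect how Mapbox or Overture Maps actually conflate data.

```python
from dataclasses import dataclass

# Keys whose presence on an OSM feature should suppress an overlapping
# machine-generated footprint. Including "demolished:building" is the
# behavior the comment hopes for, not something known to be implemented.
BLOCKING_KEYS = ("building", "demolished:building")


@dataclass
class Feature:
    tags: dict
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat)


def bboxes_overlap(a, b):
    """Axis-aligned overlap test; a crude stand-in for polygon intersection."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def backfill(ml_footprints, osm_features, blocking_keys=BLOCKING_KEYS):
    """Return the machine-generated footprints that no blocking OSM feature overlaps."""
    blockers = [
        f.bbox for f in osm_features
        if any(key in f.tags for key in blocking_keys)
    ]
    return [
        bbox for bbox in ml_footprints
        if not any(bboxes_overlap(bbox, blocker) for blocker in blockers)
    ]


osm = [
    Feature({"building": "yes"}, (0, 0, 1, 1)),
    Feature({"demolished:building": "house"}, (5, 5, 6, 6)),
]
ml = [(0.5, 0.5, 1.5, 1.5), (5.2, 5.2, 5.8, 5.8), (10, 10, 11, 11)]

print(backfill(ml, osm))  # [(10, 10, 11, 11)]
print(backfill(ml, osm, blocking_keys=("building",)))  # the demolished site gets refilled
```

The second call shows the failure mode from the comment: when only `building` blocks a footprint, the demolished house is restored from the automated dataset.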