How to actually invent tags

Posted by BushmanK on 18 January 2016 in English. Last updated on 30 January 2017.

There is an article, How to invent tags, written mostly back in 2010, which serves as a kind of official guideline for creating new tags and tagging schemes. It is “official” at least in the sense that it is linked from the Any tags you like article, which describes one of the fundamental principles of OSM. But it is a very minimalist guide: it covers only a small fraction of common mistakes, and some of its tips are unclear enough to be easily misinterpreted. It also says a lot about “don’ts” but very little about “do’s”. I’d like to discuss it here, because writing something on the Discussion page of that article would not attract anyone’s attention.

The first thing anyone must do before introducing or proposing a new tag is research. There are tons of tags: officially approved, merely documented, poorly documented, and undocumented ones introduced by unknown contributors. Obviously, it is not easy to find a common entity that could be added to the OSM database but has no tag yet. Therefore, the best way to do the initial research is to use Taginfo, the Wiki search and regular search engine queries with site restrictions such as site:openstreetmap.org and site:wiki.openstreetmap.org for Google. It’s better to try different keywords, even ones barely related to the tag you want to introduce. Then it’s important to thoroughly examine everything the search returns: keys, values, descriptions, usage statistics and tagging schemes/methods.
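As a rough illustration of the search step, the site-restricted queries can be generated for a whole list of keywords at once. This is only a sketch: the function name and the sample keywords are my own assumptions, and the resulting strings are meant to be pasted into a search engine by hand.

```python
def build_search_queries(keywords,
                         sites=("openstreetmap.org", "wiki.openstreetmap.org")):
    """Combine every keyword with every site: restriction."""
    return [f"site:{site} {kw}" for kw in keywords for site in sites]

# Sample keywords (illustrative only), including loosely related ones.
queries = build_search_queries(["aquatics centre", "swimming pool", "water sports"])
for q in queries:
    print(q)
```

Trying several loosely related keywords this way makes it less likely that an existing tagging method is missed.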

A good example of skipping this step is the recent proposal of leisure=aquatics_centre. There is already a way to tag it: leisure=sports_centre with sport=swimming (it’s not ideal, but that doesn’t matter in this case, because the proposed tag doesn’t address any issues of the current scheme). The author either wasn’t aware of this existing method or ignored it (these are just the two possible reasons; I’m not speculating about his intentions here). In general, research should help anyone discover existing methods of tagging.

Speaking of different keywords, it’s important to understand that English is quite diverse. It’s not only about color and colour, or center and centre; sometimes the semantics differ too. A good example is shop=furnace. This tag was likely introduced by someone from the United States, because it was meant for a heating equipment store, not a store for industrial metal melting or heat treatment equipment. Should anyone be surprised that contributing to an international project by introducing new terms requires certain linguistic skills? I don’t think so. But for the people who invented building:facade:color= or megalith_type=grosssteingrab, it wasn’t that obvious.

For any new key, it should be required to explain exactly which property of a real-world object it describes. Otherwise, we will get more keys like memorial=, which specifies a “kind” of memorial (could anyone please explain what a “kind” is, exactly?), so we have values both for design (stele, plaque) and for the event it commemorates (war_memorial). Using “kind”, “type” or “sort” of anything in descriptions is extremely bad manners, even if you are giving an example.

A similar thing applies to values: any new value of a key should be homogeneous with the existing ones. Therefore, stele and war_memorial should not be values of the same key. Just think about it: how would you tag a stele built in memory of a war?
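The homogeneity requirement can be sketched as a small check. The mapping of values to properties below follows the memorial= example in this post; the function names are my own, and the split keys at the end are invented purely for illustration, not an existing OSM scheme.

```python
# Which property each existing value describes (taken from the memorial= example).
VALUE_PROPERTY = {
    "stele": "design",
    "plaque": "design",
    "war_memorial": "commemorated event",
}

def properties_described(values):
    """Return the set of distinct properties a list of values describes."""
    return {VALUE_PROPERTY[v] for v in values}

def is_homogeneous(values):
    """A value list is homogeneous if all values describe one property."""
    return len(properties_described(values)) <= 1

print(is_homogeneous(["stele", "plaque"]))        # True
print(is_homogeneous(["stele", "war_memorial"]))  # False

# Hypothetical split into one key per property; these key names are
# invented for illustration and are not an existing OSM scheme.
war_stele = {"memorial:design": "stele", "memorial:event": "war"}
```

With one key per property, a stele built in memory of a war no longer forces a choice between two values of the same key.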

The value war_memorial of the memorial= key is also an example of semantic redundancy, because “memorial” is repeated. This is another bad practice not covered by the original guide.
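This kind of redundancy is easy to detect mechanically. The following is a minimal sketch, assuming that keys and values are split into words on underscores (and keys on colons as well); the function name is hypothetical.

```python
def repeats_key(key, value):
    """Return True if any word of the key reappears as a word of the value."""
    key_words = set(key.replace(":", "_").split("_")) - {""}
    value_words = set(value.split("_")) - {""}
    return bool(key_words & value_words)

print(repeats_key("memorial", "war_memorial"))  # True
print(repeats_key("memorial", "stele"))         # False
```

A check like this could flag candidate values for review; it only compares words and says nothing about whether the value is otherwise well chosen.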

The same article also recommends using descriptive names like hiking_trail instead of just trail. This example looks misleading: there could well be trails for other purposes, so having a whole bunch of *_trail tags is not a good idea. Would it be better to have a separate key for the trail’s purpose? Probably, yes. Here, the preliminary research helps to find out whether other purposes exist.

The original article says: “Avoid elaborate classifications”. An elaborate classification is not bad if the objects it describes are complex enough. The example given there illustrates a redundant classification, not an elaborate one. Complex classifications may be required to describe certain things, simply because of the nature of those things. You can’t have fewer classes (values, for example) for something that actually has many different variants. Generalization works sometimes, but it’s not always an acceptable decision. Redundancy is certainly a bad thing, but it’s better to explain that properly.

This article also says nothing about extending existing schemes. Another (very common) issue not covered there is the problem of value lists. The key sport= gives a good example: if there is a leisure=sports_centre or leisure=pitch, it’s impossible to tell that you can play both volleyball and basketball there. The only option for multiple sports is multi. Theoretically, there is a way to have value lists by using a semicolon separator, but people rarely do that because it’s rarely supported by styles and converters. A good practice is to have several keys in the same namespace with yes/no/unset values, just as Healthcare 2.0 does for different medical specialists.
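To make the difference between the two approaches concrete, here is a minimal sketch that parses a semicolon-separated value and rewrites it as namespaced yes/no keys. Keys such as sport:volleyball are hypothetical and only illustrate the namespace idea; they are not an established scheme.

```python
def split_values(value):
    """Split a semicolon-separated value list, dropping empty items."""
    return [v.strip() for v in value.split(";") if v.strip()]

def to_namespaced(key, value):
    """Rewrite key=a;b as key:a=yes, key:b=yes (hypothetical namespaced keys)."""
    return {f"{key}:{v}": "yes" for v in split_values(value)}

tags = {"leisure": "pitch", "sport": "volleyball;basketball"}
expanded = to_namespaced("sport", tags["sport"])
print(expanded)  # {'sport:volleyball': 'yes', 'sport:basketball': 'yes'}

# With namespaced keys, a consumer needs only an exact key lookup,
# not a parser for the value string.
print(expanded.get("sport:basketball") == "yes")  # True
```

The design point is that every consumer of semicolon lists has to implement the splitting step itself, while namespaced keys can be matched by any tool that already understands plain key=value pairs.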

There are many other common issues, but I think I have already said enough to explain what I’m talking about.

As usual, I have to say that my intention is not to discuss specific tags (mentioning any bad scheme often leads to that), but to discuss the need for a better guide to making new tags.


Discussion

Comment from escada on 18 January 2016 at 12:29

Did you read all the mails on the tagging mailing list that were written before leisure=aquatics_centre was proposed [1]? And the mails after that [2]? There was quite a lot of research before the proposal was written.

[1] https://lists.openstreetmap.org/pipermail/tagging/2016-January/027949.html
[2] https://lists.openstreetmap.org/pipermail/tagging/2015-December/027824.html

Comment from BushmanK on 18 January 2016 at 19:49

@escada,

Research implies learning from its results. That didn’t happen, so it doesn’t count as research.

The proposal you’ve mentioned doesn’t address any of the reasonable counterarguments expressed in the mailing list discussions. People said that leisure=aquatics_centre is redundant and pointed to the existing way to tag it; however, the proposal description doesn’t have a word to explain why it is important to introduce this new value in spite of these counterarguments. Reading these mailing list threads, I’m getting more convinced that the author intentionally ignored everything he had been told and preferred to take his chances by proceeding to the proposal procedure.

There are currently two issues with swimming facilities and sport facilities in general:

  • swimming_pool exists as a value of two different keys, leisure and amenity
  • if you have leisure=pitch, leisure=sports_centre or any other general sport facility, you can’t tell specifically which sports it’s intended for if there is more than one.

This is a problem. Having a specific tag for a water sports center with a pool doesn’t solve either of these cases.
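The first issue shows up directly in data consumer code: any lookup has to check both keys. A minimal sketch, with a hypothetical function name and sample tag dictionaries:

```python
# swimming_pool appears as a value of both keys, so both must be checked.
POOL_KEYS = ("leisure", "amenity")

def is_swimming_pool(tags):
    """Return True if any of the known keys carries the swimming_pool value."""
    return any(tags.get(key) == "swimming_pool" for key in POOL_KEYS)

print(is_swimming_pool({"leisure": "swimming_pool"}))  # True
print(is_swimming_pool({"amenity": "swimming_pool"}))  # True
print(is_swimming_pool({"leisure": "pitch"}))          # False
```

Every additional coexisting way to tag the same thing adds another entry that each consumer has to discover and maintain.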

Comment from Severak on 18 January 2016 at 23:34

Yes, tags can be very confusing. My favorite confusion is *something*=no (e.g. tunnel=no, electrified=no).

Comment from BushmanK on 19 January 2016 at 00:15

@Severák,

Technically, *=no is okay if the default value for this property is yes. It just has to be clearly described in the Wiki.

But tunnel=no is just an example of bad mapping manners, not of bad tag design. In cases like this one we should probably blame JOSM presets and validation rules for allowing it (and people who put every tag they know on a simple road, which is quite a common habit).

Comment from Warin61 on 19 January 2016 at 10:44

Multiple sports has been using the ; as a separator …

e.g. sport=netball;basketball

I agree this is not ideal … but that is the history of it.


A ‘result’ can be negative. And learning from it can be more beneficial than a positive result.

The situation is a product of ‘anyone can make new tags’. While liberating, it can leave a mess behind.

The present situation is that OSM has:

  • ‘tree’ structured tags
  • ‘duck’ structured tags

And combinations of both!! I’d very much like to have a solution that would work. I don’t. If something strikes me … I’ll tell.

Comment from BushmanK on 19 January 2016 at 18:22

@Warin61,

I am aware of the semicolon as a separator for multiple values (and I have mentioned it), but it’s a technologically ugly scheme, rarely used and rarely supported by anything.

There is a solution:

  • more responsible voting for proposals
  • better communication (mailing lists are an awful, ancient technology) to improve our ability to discuss tagging
  • better notification (most people here are not even aware of new proposals)
  • better guidelines (I feel I could improve the current guide, but it’s a private page, not a general Wiki page, and I’m not completely sure my edits would be fixed rather than reverted on the formal grounds of a couple of grammar mistakes)
  • a tagging refactoring program (probably with some sort of issue tracker), where everyone can complain about uncertain, contradictory or redundant tags and so on.

Comment from escada on 20 January 2016 at 06:16

Just a thought:

Do we really need more structure in our tagging system? As you indicated in your reply above, some people already have problems determining whether a swimming pool is a leisure or an amenity. Or they do not agree with the majority. What if we could just say “this thing is a swimming pool” by adding a tag (a real tag, not a key-value pair as we currently do) “swimming_pool” to an object? Let the data consumer decide whether it is an amenity, a leisure or a thing for tourists.

Another question: why do we need to define an aquatics centre by a bunch of key-value pairs? Why can’t we just label it as “aquatics centre”? If you, as a data consumer, want to put it in some category, feel free to do so. Why should I, as a mapper, already decide which category it belongs to, or try to use a bunch of key-value pairs to express what I can simply express with two words?

Some people would like to see restaurants categorized as “shop”. Others would not. Simply labeling it as “restaurant” solves this.

This would also allow tagging a “thing” as both butcher and bakery.

I’ll admit that for words with two meanings or adjectives such as “primary” or “civic”, this does not work. It will also not work for names, refs, addresses, etc. But for the top level thing, it might help.

Structured tagging is fine for some people, not for others. Too much structure makes things complex. See dentist. “dentist” is very easy, “amenity=dentist” is less easy (I have to decide between office, amenity, …), and dentist tagging according to Healthcare 2.0 is complex.

If a word (or group of words) exists in a dictionary we should be able to use it as a tag. The documentation can then add the possible categories (amenity, leisure, …) to which the item belongs. Software can add those categories when needed. So Aquatics Centre would be a valid tag.

I haven’t thought about this for a very long time. There will probably be issues with this way of tagging as well.

Comment from BushmanK on 20 January 2016 at 06:43

@escada,

We probably don’t need more structure for the sake of structure itself. But it would definitely be good to have less chaos.

  • It is better for contributors because it makes tagging more straightforward and requires less guessing.
  • It is better for OSM developers, because it’s easier to make and support presets for editors, map styles, QA tools and so on.
  • It’s better for data consumers, because an entity doesn’t require extensive research to find the whole bunch of different tags that represent it in the OSM database. Therefore, it’s easier to make derivative products, support converters etc.

It is bad manners to make people dig through ambiguous stuff to get something working; that’s why it makes sense to get rid of ambiguity, uncertainty, semantic overlaps and semantic divergence (these are the main types of imperfections in tagging schemes I have identified for myself). Making a data consumer search through Taginfo to find all the coexisting ways to tag something, or making him solve a puzzle when he finds “hut” or “cabin”, is an act of ignorance. There are hundreds of categories, and the better (cascading) schemes we have, the easier it is to work with the data. And to contribute, at the same time.

If someone is a rebel, he can express himself in Wikimapia. OSM is a collaborative project, and collaboration means respect. Reasonable respect, not blind respect; therefore, we don’t have to respect anyone’s unreasonable desire (like seeing restaurants in the “shop” category). We currently cannot afford to let people use real-world terms without a more or less strict, but definite, classification; otherwise the data will become completely unusable.
