mboeringa's Comments
Post | Comment |
---|---|
OpenStreetMap Isn't Unicode | @mmd, thanks. None of the Python code I wrote actually does anything directly with the OSM ‘name’ tag. It just does general conversions of an entire table / record set, e.g. “PostgreSQL -> SQLite”. The issue may well be caused by one of the intermediate processing and conversion steps outside the database, but there is probably nothing I personally can do about it. So for now, the workaround I developed will have to do the job. I’d still be interested to hear a bit more about the “surrogate” mentioned in the error message (osm.org/user/bdon/diary/397922#comment51501), and whether that part of the error message makes any sense in this context and for the particular object you pointed out as the possible culprit of the processing error I experienced. |
|||
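For context on the “surrogate” part of the error message: in UTF-16, characters beyond U+FFFF (which includes many rarer CJK ideographs) are stored as a pair of 16-bit “surrogate” code units from the range U+D800 to U+DFFF. Those code units are not characters in their own right, so Python’s strict UTF-8 encoder refuses them when they turn up as separate code points in a string. The minimal sketch below reproduces that kind of error. It assumes, without having verified this against the actual data, that a character was split into its two surrogate halves somewhere in the pipeline. U+20000 is an arbitrary example character, and both helper names are hypothetical.

```python
def split_into_surrogates(ch: str) -> str:
    """Hypothetical helper: return a supplementary-plane character as its two
    UTF-16 surrogate code units, each stored as a separate code point."""
    cp = ord(ch)
    if cp < 0x10000:
        return ch  # characters in the Basic Multilingual Plane need no surrogates
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)   # top 10 bits of v
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits of v
    return chr(high) + chr(low)


def utf8_error(text: str):
    """Return the UnicodeEncodeError from strict UTF-8 encoding, or None if it succeeds."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as err:
        return err
    return None


ideograph = "\U00020000"                  # CJK Extension B, outside the BMP
broken = split_into_surrogates(ideograph)
print([hex(ord(c)) for c in broken])      # ['0xd840', '0xdc00']
print(utf8_error(ideograph))              # None: the intact character encodes fine
print(utf8_error(broken).reason)          # surrogates not allowed
```

Two consecutive surrogate positions in the error (such as “26202-26203”) fit this picture of one character split in two, although the sketch cannot prove that this is what happened.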
OpenStreetMap Isn't Unicode | @mmd, No problem, I wasn’t actually seeking help with the Python stuff. I just wanted to record here a real-world case where non-UTF-8 encoded strings in OpenStreetMap can cause issues, because I came across this thread while searching for a solution. As I wrote, the Python encoding/decoding workaround I have now implemented is OK for my particular use case, and it got the processing going again after the error had brought it to a full stop. I have tried to track down the particular object and OSM ‘name’ causing the issue, but so far can’t be more specific than the list below. The code INSERTs records in batches of 50, and these are the DISTINCT names of the OSM ‘waterway=river’ objects that were in the batch. Note that this list of OSM ‘name’ tags was generated with the encode/decode workaround in place, so the particular offending character is likely not in this list but has been replaced. Distinct ‘name’ values: 龍巒潭排水溝, 龙胜涌, 龍宮溪, 龙穴南水道, 龍坑支線, 龙岗河, 龙图河, 龙伏涌, 龙潭涌, 龙江, 龙江大涌, 龙湾河, 龙华江, 龙仙水, 龙溪河, 龙记河, 龙滚河, 龟咀涌, 龙潭细围涌, 龙山大涌(北段), 龙迳河, 龜頭坑, 龙华大涌, 龙母河, ??魚坑溪, 龜重溪, 龙潭河, 龙沙涌 |
|||
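A hedged observation about the list: with errors="replace", Python’s UTF-8 encoder writes one “?” for each code point it cannot encode. Suppose the offending value was a supplementary-plane character whose surrogate pair got split along the way. That is an assumption, not something verified against the OSM data. In that case the workaround would turn the character into exactly two question marks, which matches the ‘??魚坑溪’ entry. The sketch below builds such a split pair by hand. U+20000 is an arbitrary stand-in for the unknown original character, and `lone_surrogates` is a hypothetical helper.

```python
def lone_surrogates(cp: int) -> str:
    """Hypothetical helper: the two UTF-16 code units of a supplementary-plane
    code point, returned as two separate (lone) surrogate code points."""
    v = cp - 0x10000
    return chr(0xD800 + (v >> 10)) + chr(0xDC00 + (v & 0x3FF))


def workaround(text: str) -> str:
    """The encode/decode construct described in the thread: unencodable code points become '?'."""
    return text.encode(encoding="UTF-8", errors="replace").decode(encoding="UTF-8")


# Stand-in name: a split supplementary character followed by the visible rest of the name.
damaged = lone_surrogates(0x20000) + "魚坑溪"
print(workaround(damaged))   # ??魚坑溪
print(workaround("龙潭涌"))   # 龙潭涌 (valid text passes through unchanged)
```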
OpenStreetMap Isn't Unicode | @bdon, I think I’ve been bitten by this encoding issue as well, and I am actually a bit surprised it hasn’t turned up before. I use some custom Python code to process selected data from an OpenStreetMap PostGIS / PostgreSQL database (which uses UTF-8) created with osm2pgsql. The processing happens outside the database, so I first export the data to an external format, then run the Python code, and finally re-insert the data into the database. I have successfully used it to process Planet-level data before. This time, with a different selection of waterway data, the entire process halted at the INSERT stage (via ODBC with pyodbc, or with psycopg2) with an error of the type “‘utf-8’ codec can’t encode characters in position 26202-26203: surrogates not allowed”. To be honest, I am not too familiar with all the encoding stuff, which is quite difficult to comprehend at times, and I am not entirely sure what a “surrogate” means. Anyway, the SQL statement shown in the error message contained Asian characters, Chinese or Japanese, in the OSM ‘name’ tag. I finally solved the processing issue by applying a “.encode(encoding="UTF-8", errors="replace").decode(encoding="UTF-8")” construct in Python before including the data in the SQL statement. Of course, with the “replace” option this isn’t a real solution, but it at least got the processing past the frustrating halt. Since this probably affects only very few records, I think this is an acceptable solution for my particular use case. |
|||
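A possible lossless alternative to errors="replace" depends on an assumption: the damaged strings contain properly paired surrogates that were merely split into separate code points. Unpaired surrogates cannot be repaired this way. For split pairs, round-tripping the string through UTF-16 with Python’s standard “surrogatepass” error handler reassembles each pair into the original character. A minimal sketch with a hypothetical function name (not part of psycopg2 or pyodbc), falling back to the “replace” workaround when repair is impossible:

```python
def repair_surrogates(text: str) -> str:
    """Rejoin split UTF-16 surrogate pairs; fall back to '?' replacement.

    Hypothetical helper. The fallback is applied to the whole string, so a
    repairable pair next to an unpaired surrogate is also replaced.
    """
    try:
        # surrogatepass writes each surrogate code point as a raw 16-bit unit,
        # so a high+low pair becomes valid UTF-16 and decodes to one character.
        return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    except UnicodeDecodeError:
        # An unpaired surrogate stays undecodable: use the lossy workaround.
        return text.encode(encoding="UTF-8", errors="replace").decode(encoding="UTF-8")


split_pair = "\ud840\udc00"   # U+20000 split into two code points
lone = "\ud840" + "abc"       # high surrogate without its partner
print(repair_surrogates(split_pair) == "\U00020000")  # True
print(repair_surrogates(lone))                        # ?abc
print(repair_surrogates("龙江"))                       # 龙江
```

Whether this helps in practice depends on where the split happens in the export/import chain, which the thread does not establish.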
Wang–Müller line generalisation | Hi Tomas, interesting link to the Swiss society of cartographers website, which I hadn’t seen before. How, though, do you actually buy that specific document, No. 16 “Topografische Karten – Kartengrafik und Generalisierung”? I don’t see any link to a webshop for ordering. |
|||
Wang–Müller line generalisation | I now see in the PDF of the project you referenced, which I hadn’t read before posting my comment, that the exact combination of generalization and smoothing using ST_SimplifyVW and ST_ChaikinSmoothing was also evaluated in the research you mention. Looks like an interesting PDF; I will make some time to read it properly later on… |
|||
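For readers unfamiliar with the Visvalingam–Whyatt idea behind ST_SimplifyVW, each interior vertex is scored by the area of the triangle it forms with its two neighbours. The vertex with the smallest area is removed, repeatedly, until every remaining triangle is at least as large as the tolerance. As far as I know, PostGIS treats this tolerance as an area, not a distance. The pure-Python sketch below illustrates the idea on a plain list of (x, y) tuples. It is not PostGIS’s implementation: it recomputes all areas on each pass instead of using a priority queue, and it skips the refinements of the full algorithm.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of triangle a-b-c (half the absolute cross product)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify_vw(points: List[Point], min_area: float) -> List[Point]:
    """Drop the interior vertex with the smallest triangle area until every
    remaining interior vertex spans at least `min_area`. Endpoints are kept."""
    pts = list(points)
    while len(pts) > 2:
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(areas)
        if smallest >= min_area:
            break
        del pts[areas.index(smallest) + 1]  # +1: areas[0] belongs to pts[1]
    return pts


# A wiggly line: the tiny kink at (1, 0.1) goes, the large bend at (3, 5) stays.
line = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
print(simplify_vw(line, min_area=1.0))  # [(0, 0), (2, 0), (3, 5), (4, 0)]
```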
Wang–Müller line generalisation | Hi Tomas, Interesting to hear about a new generalization method I hadn’t heard of before. However, for the specific case of generalizing line elements representing waterways, especially those tagged as waterway=stream/river in OSM, I have found a combination of ST_SimplifyVW and ST_ChaikinSmoothing quite effective at producing small-scale results. I would not recommend this combination for road networks, though. Why not, and why does it work for waterways, in my opinion? There is a problem with using ST_ChaikinSmoothing on road networks: the lines are generalized and smoothed on a feature-by-feature basis, unaware of connectivity. Many highway=x elements in OSM are not split where e.g. alleys connect to a main road. In those cases the main road has no end node at the junction, so smoothing shifts its vertex there away from the end node of the connecting highway, and the line network can, and will, lose its connectivity. I have shown and explained this issue in these two posts on the openstreetmap-carto issue tracker: https://github.com/gravitystorm/openstreetmap-carto/issues/3551#issuecomment-452436841 https://github.com/gravitystorm/openstreetmap-carto/issues/3551#issuecomment-452650767 So why does it work better for waterways? Natural waterways tagged as waterway=stream/river seem to be more consistently split where waterways connect, especially in well-developed countries. This is likely because people want to use the waterway network for detailed hydrological analysis, which requires properly defined topology and tags on individual segments. As a result, far fewer network connectivity issues pop up if you use ST_ChaikinSmoothing on natural waterways tagged as streams and rivers.
In addition, the main waterway is not always split at a junction, and there smoothing can cause a visual artefact of disconnected lines like the one shown in the images on the openstreetmap-carto issue tracker. Even so, the issue is usually much less noticeable for waterways. The main waterway usually runs a relatively straight course where the line elements meet, and on a straight stretch the smoothed line still passes through the junction point. The side stream also usually joins the main stream at an angle of less than 90 degrees. In my experience, the visual results of generalization + smoothing on natural waterways like streams and rivers are thus pretty good. Of course, this is less so for human-made ditches and canals, especially the small ones in the dense artificial networks of e.g. “polders”. However, these are usually only displayed when zoomed in at large scales and do not really need generalization at all. |
|||
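The connectivity argument can be made concrete with a small sketch of Chaikin’s corner cutting. Every segment is replaced by points at 1/4 and 3/4 of its length, so interior vertices disappear. The pure-Python version below keeps the first and last point of each line. That is an assumption about the variant; PostGIS has its own parameters for endpoint handling, so check the ST_ChaikinSmoothing documentation. The coordinates are hypothetical: a main road bends at (10, 0), where a side road ends, but the main road is not split there.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def chaikin(points: List[Point], iterations: int = 1) -> List[Point]:
    """Chaikin corner cutting on an open line; first and last points are kept."""
    pts = list(points)
    for _ in range(iterations):
        if len(pts) < 3:
            break  # a single segment has no corner to cut
        smoothed = [pts[0]]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        smoothed.append(pts[-1])
        pts = smoothed
    return pts


junction = (10.0, 0.0)
main_road = [(0.0, 0.0), junction, (20.0, 10.0)]  # junction is an interior vertex
side_road = [junction, (10.0, -10.0)]             # side road ends at the junction

smoothed_main = chaikin(main_road)
print(junction in smoothed_main)   # False: the junction is no longer a vertex of the main road
print(chaikin(side_road)[0])       # (10.0, 0.0): the side road still ends at the old junction
```

With a straight main road, such as [(0, 0), (10, 0), (20, 0)], every smoothed point stays on y = 0, so the line still runs through (10, 0). This matches the observation that junctions on straight stretches of a main waterway suffer less.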
OSM is more up to date than the official sources | Unfortunately, the opposite situation, where it is better to trust the navigation than your eyes, is also quite common due to poor road design… In another incident here in the Netherlands last year, a bus got stuck in a brand new tunnel (an underpass of a massive viaduct). What happened? The bus driver mistook a 4 meter wide bidirectional bicycle path (the width of an ordinary motorway lane that can carry even the largest trucks) for a dedicated bus lane. When I looked at the pictures of the junction giving access to the bicycle path, I wasn’t surprised at what had happened. In most such situations, one or more reflective red/white or black/white poles cordon off the entrance to such a bicycle path. None were here… Even worse, no large bicycle symbol had been painted in bright white onto the path at the entrance. Two things marked it as a bicycle path: the typical red colouring, which is common but not universal in the Netherlands for distinguishing bicycle paths from ordinary roads, and a single road sign on a pole next to the entrance. Apart from those, nothing warned the bus driver of the impending danger and the mistake he could make. Looking at the photos taken from the opposite side of the junction, where the bus driver came from, it also felt perfectly natural to drive into this “bus lane”. The junction was also dozens of meters, maybe even 100 meters or more, from the actual tunnel entrance, so nothing gave away the mistake he had made. This part of the tunnel was only designed for the height of cyclists or modest maintenance vehicles, so with a full-size bus he was bound to get stuck. If he was (partially) colour-blind, the mistake would have been even more likely, since the red colouring of the bicycle path might have been indistinguishable to him.
This is simply poor road design, which could have been prevented with a few small adjustments, like reflective poles at the entrance. Fortunately, there was only material damage this time. It could have been much, much worse, with a bus accelerating down a slope into the “tunnel”. Just imagine cycling there and unexpectedly seeing a bus coming straight at you at 50 km/h with nowhere to go… |
|||
OSM is more up to date than the official sources | I am not surprised. And yes, people do end up in horrible situations due to outdated maps in navigation apps (and, more importantly, by not using their brains and eyes but blindly trusting their app ;-)). Here in the Netherlands, where I live, a road giving access to an underpass of a railway viaduct was closed permanently when a junction was reconstructed, and from that direction only the bicycle underpass next to it was left open. Guess what? Cars ended up in the bicycle(!) underpass of the viaduct, where cars are strictly prohibited. Multiple road signs explaining the situation, dedicated bicycle path colouring and a narrow passage (unfortunately just wide enough to let one car through in a single direction) couldn’t prevent it from happening multiple times. I updated the junction on OpenStreetMap just after its reconstruction. It took Google, Here and Bing more than a year to update the situation. Only then did the mess stop, helped by a few further adjustments to the junction, including planting, that made the situation more obvious to car drivers. Still, they should have understood the situation from the beginning had they trusted their eyes instead of just the navigation. |
|||
Improvements on "Palazzina di Caccia di Stupinigi" | Nice work. Always a joy to see this kind of formal garden on OSM with all its lovely detail. |
|||
Update of portuguese Track & Field stadiums | Nice work, but please be careful with the tagging of the individual elements, and avoid duplication. E.g. here, you’ve added essentially the same tags and the name of the track to two different objects:
So now we have duplicate objects (or at least two with the same name and tags). I would suggest removing the leisure=track and name tags from the inner member (possibly replacing them with leisure=pitch and sport=athletics), and only keeping these tags on the multipolygon. |
|||
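One way to spot this kind of duplication before uploading is to compare the tags of a multipolygon relation with those of its inner members and report keys that carry the same value on both. The tag dictionaries below are made up for illustration (the track name is a placeholder). The check itself is a plain-Python sketch with a hypothetical function name, not a feature of any OSM editor.

```python
from typing import Dict, List, Tuple

Tags = Dict[str, str]


def duplicated_tags(relation: Tags, member: Tags, ignore: Tuple[str, ...] = ("type",)) -> List[str]:
    """Keys that the member way repeats with the same value as its relation."""
    return sorted(k for k, v in relation.items()
                  if k not in ignore and member.get(k) == v)


# Hypothetical tagging resembling the situation described above.
multipolygon = {"type": "multipolygon", "leisure": "track", "name": "Example track"}
inner_way = {"leisure": "track", "name": "Example track"}
print(duplicated_tags(multipolygon, inner_way))  # ['leisure', 'name']

# Suggested fix: tag the inner area as the athletics field instead.
inner_way = {"leisure": "pitch", "sport": "athletics"}
print(duplicated_tags(multipolygon, inner_way))  # []
```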
A new digital twin of French power transmission network | Interesting work the French community did on the power network. A major achievement after so many years. |
|||
My Journey From Nigeria to Ivory coast by Road. | A humbling experience and story, and a reminder to myself of how lucky I am to live in Europe with good public transport. I should not take it for granted, because the majority of the world’s population has to do without such basic infrastructure and services. I hope this edition of SotM Africa proves worth the arduous journey for you. |
|||
Bye bye, mapping | If both names are officially recognized by a public authority, one of them could go in the “alt_name=x” tag. And yes, that again means making an arbitrary decision about which name goes in “name” and which in “alt_name”. That could again result in conflict, because one user would like to see name A displayed in the Standard rendering, and the other name B. Alternatively, concatenating the names might be an option: “Priolo Est” / “Bagali Est”. But I would personally only do this if an official record showed that both names are still valid. This Google Maps image shows exactly the ambiguity of the situation regarding this fuel station: Clearly, the small blue direction sign says “Bagali Est”, while the ENI fuel station sign says “Priolo Est”. Why you say this “isn’t the problem” eludes me; this is at the heart of the conflict you have with the other user. Only persuading ENI or the public authority to adjust the signs to a common name will ultimately solve this. |
|||
Bye bye, mapping | I understand your frustration, but I do think the situation, based on that changeset discussion linked by Vincent de Phily:
is quite nuanced. The problem here is that both OSM users are right to some extent. The specific place, an ENI fuel station, seems to have changed name. Unfortunately, the owner or the responsible public authority did not bother to change the name everywhere, so both the old and the new name still appear on boards and direction signs (translated from Italian): “Bagali Est is the name it was originally signposted with, but a few years later they installed the board with the name Priolo Est, without removing the direction sign with the old name.” Such problems cannot be solved by individual users in OSM. There is just no “right” or “wrong” here. Personally, instead of wasting time on such an unsolvable conflict, I would spend time urging the owner or public authority to remove the ambiguity by replacing all direction signs and boards with the updated name. I realize this last piece of advice may be a tough call in countries with lots of bureaucracy (which country hasn’t?), but it is the only thing that will ultimately solve such issues. |
|||
State of the Map 2019 (SoTM 2019): I Took All My Chances | Amazing story. Great that you finally managed to attend SoTM 2019 against all odds, give a talk on your own interests, and share your experiences. |
|||
Building generalisation: simplification | Well, for someone who claims not to be a coder, you write pretty advanced code ;-) In my case, the data I was writing about aren’t actually buildings (although I do use those as well), but processed forest polygons. Not that it really matters; the problems of validity checking are the same. However, one thing still confuses me. This page of the PostGIS online manual, https://postgis.net/docs/using_postgis_dbmanagement.html#OGC_Validity, says: “By definition, a POLYGON is always simple”. That suggests it doesn’t really make sense to check polygons for simplicity? ;-) Yet you (only) use ST_IsSimple? Or do you use both? My code currently only checks validity using ST_IsValid in some steps, plus the commands I wrote about before. No, I haven’t yet attempted to track down the exact geometry causing the issue. It is also kind of hard, because I know the original geometries are fine; they work well in a GIS. It is only after my processing that a few features fail (which is unfortunately a big issue, as a GIS generally requires all geometries to be fully valid). Still working on getting it right… |
|||
Building generalisation: simplification | Hi Tomas, by “Simplification for 10m”, do you mean you use a 10 meter tolerance? One other, unrelated question, since you seem very experienced with PostGIS: have you ever had a situation where “ST_CollectionExtract(ST_MakeValid(way),3)” returned invalid polygon geometries? I am kind of at a loss at the moment with a problem I have. I run the above command after a custom generalization process I developed. That process even includes a test for a polygon area of 0 (so collapsed geometries are dropped) and an “ST_Buffer(way,0)” step to potentially fix any other bad stuff. Despite all that, I still get an occasional bad geometry that causes an error in “A” ;-). Have you ever experienced similar issues, and how do you guarantee valid geometries after complex geometric processing? |
|||
Building generalisation: simplification | And does your code run properly for both lat/long (4326) and Web Mercator (3857)? |
|||
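A tolerance like the 10 m mentioned above means different numbers in EPSG:4326 and EPSG:3857. In 4326 the units are degrees. In 3857 the units are nominally metres, but they are stretched by a factor of 1/cos(latitude) away from the equator. The sketch below converts a ground distance into approximate units of each system using spherical approximations: a sphere with the WGS 84 semi-major axis, about 111,320 m per degree, and the Mercator scale factor sec φ. The results are rough and ignore ellipsoidal effects. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0                            # WGS 84 semi-major axis, also the Web Mercator sphere radius
METRES_PER_DEGREE = math.pi * EARTH_RADIUS_M / 180.0  # about 111,319.5 m along a great circle


def tolerance_in_3857(ground_m: float, lat_deg: float) -> float:
    """Approximate EPSG:3857 units spanning `ground_m` metres on the ground at `lat_deg`."""
    return ground_m / math.cos(math.radians(lat_deg))


def tolerance_in_4326(ground_m: float, lat_deg: float) -> tuple:
    """Approximate (degrees of latitude, degrees of longitude) spanning `ground_m` metres."""
    dlat = ground_m / METRES_PER_DEGREE
    dlon = ground_m / (METRES_PER_DEGREE * math.cos(math.radians(lat_deg)))
    return dlat, dlon


for lat in (0, 52, 60):  # equator, roughly the Netherlands, 60 degrees north
    print(lat, round(tolerance_in_3857(10, lat), 2), tolerance_in_4326(10, lat))
```

So code that hard-codes a tolerance will generalize more or less aggressively depending on the SRID and latitude, unless it converts the tolerance first.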
Building generalisation: simplification | Impressive. Do you have any figures on the performance of this? How much processing time for X million buildings? |
|||
Building generalisation: simplification | Thanks for the additional info. Admittedly, I haven’t had a look at the code repository yet, so I’m not yet aware of the exact process you used, but yes, I appreciate it is likely a lot more complex than just buffering. |