
asciipip's Diary

Recent diary entries

I want to talk about how to name the trails in Catoctin Mountain Park, a US national park in Maryland. The available information about trail names is a bit inconsistent. This post serves as a way for me to organize my thoughts and document the conclusions I’ve reached.

Background

Catoctin Mountain Park, as I mentioned, is part of the US National Park system. It’s located in Maryland, at the northern end of the Blue Ridge Mountains and the outer perimeter of the Appalachian Mountains. It has a number of hiking trails. The trails are split between the east and west sides of the park; each side’s trails are interconnected, but the two sides don’t connect directly to each other.

I recently hiked most of the east side trails. It’s those trails I’m primarily focused on. I haven’t (yet) been to the west side, so I don’t personally know the ground truth there.

Ground Truth

On the ground, trails are designated by colored blazes on trees. The blazes use a number of different colors and several different shapes. Some distinct trails use the same colors as each other, but use different blaze shapes. There are sections where two trails overlap; those sections use blazes that are half the color of one trail and half the color of the other trail.

There are no trail names posted, with one exception. A trail between the visitor center and the Lewis Property part of the park is both blazed with white rectangles and has regularly-placed signs saying “Gateway Trail”.

The park generally uses rectangular blazes for longer trails that form the core of the east side’s trail network. Triangular and circular blazes are used for shorter trails that either form shorter connections or have a specific purpose. For example, there’s a short nature trail with signs pointing out local plant species. That trail is blazed with triangles.

See full entry

Location: Frederick County, Maryland, United States

On Tracing from Poor Imagery

Posted by asciipip on 27 June 2012 in English.

I came across this section of an Interstate today:

The Effects of Non-Orthorectified Imagery

Notice how the Interstate looks a little like it’s been draped across the terrain’s ridges and valleys? That’s probably because the road was originally traced from non- (or not fully) orthorectified imagery. (From perusing the history of the way’s geometry, it looks like the culprit was someone in the US Census Bureau; the waviness was there in the TIGER import.)

I matched the geometry up to Maryland’s six-inch imagery (which is, in my experience, excellently aligned and rectified) with this result:

See full entry

Python Tile Expiration Class

Posted by asciipip on 10 June 2012 in English.

I have periodically needed a memory-efficient way to track tile expirations, so I wrote something to do it. Here, I make it available in case it can be useful to others.

The code is tileexpire; to use it, you just need to import the module, instantiate an OSMTileExpire object, call its expire() method for each expired tile, and then use its expiredAt() method for each zoom level you want to process.

It should be reasonably memory-efficient because it uses a quad tree that collapses its branches as they fill up, so it can handle a lot of tiles as long as they're reasonably dense (in my experience, that's a reasonable assumption for OSM tile expiration data).
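To illustrate the collapsing-quadtree idea (this is a toy sketch, not the actual tileexpire code; the class name and internals here are made up for illustration):

```python
# Toy sketch of a collapsing quadtree for tile expiration. Each node maps
# a quadrant (qx, qy) to a subtree; a branch whose four quadrants are all
# fully expired collapses to a single FULL marker.
FULL = 'full'

class ToyTileExpire:
    def __init__(self):
        self.root = {}

    def expire(self, z, x, y):
        """Mark tile (z, x, y) as expired."""
        if self.root == FULL:
            return
        if self._expire(self.root, z, x, y):
            self.root = FULL

    def _expire(self, node, z, x, y):
        if z == 0:
            return True  # this node is exactly the expired tile
        # Pick the quadrant one level down, using the next bit of x and y.
        key = ((x >> (z - 1)) & 1, (y >> (z - 1)) & 1)
        child = node.get(key)
        if child == FULL:
            return False  # an ancestor already covers this tile
        if child is None:
            child = node[key] = {}
        if self._expire(child, z - 1, x, y):
            node[key] = FULL  # the child branch filled up; collapse it
        return len(node) == 4 and all(v == FULL for v in node.values())

    def expired_at(self, z):
        """Set of (x, y) tiles at zoom z containing expired data."""
        out = set()

        def walk(node, depth, x, y):
            if depth == z:
                out.add((x, y))
            elif node == FULL:
                span = 1 << (z - depth)  # a full branch covers every descendant
                out.update((x * span + dx, y * span + dy)
                           for dx in range(span) for dy in range(span))
            else:
                for (qx, qy), child in node.items():
                    walk(child, depth + 1, x * 2 + qx, y * 2 + qy)

        if self.root or self.root == FULL:
            walk(self.root, 0, 0, 0)
        return out
```

Once a branch's four quadrants are all expired, the whole branch is replaced by a single marker, so a large contiguous expired region costs almost no memory no matter how many tiles it contains.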

It’s under a CC0 waiver, so you’re free to use it for anything you want, but if you make improvements, I’d love to hear about them.

Mapnik and loaded icons

Posted by asciipip on 10 April 2012 in English.

I’m posting mostly because the repetition will help me remember it, because I keep getting bitten by this.

Mapnik appears to cache images loaded by ShieldSymbolizers (and, presumably, also for LinePatternSymbolizers, PolygonSymbolizers, PointSymbolizers, and MarkersSymbolizers). This means that if you have a long-running rendering process (like Tirex or mod_tile’s renderd), that process uses an image in its rendering, and you then change the image file, Mapnik will continue rendering with the old image. You’ll have to reload the stylesheet in order to start using the new image.

Maryland and DC are ODbL-clean!

Posted by asciipip on 24 March 2012 in English.

Over the past several months, I’ve been working on replacing or accepting all of the data in Maryland marked as problematic by the OSM Inspector. As of today, I’m done (and I touched up DC, too, though others did most of the work there). I have looked at every problematic bit of data and either marked it clean (for mechanical edits like expanding name abbreviations), made it clean (by removing all data contributed by non-agreeing contributors), or deleted and replaced it (using only OSM-compatible data, of course). The only exceptions are standalone nodes with tags I couldn’t verify remotely that had been created by a non-agreeing contributor and never touched by anyone else. Those will probably be deleted when the license change is effected, but they won’t affect the surrounding data at all.

Next up is seeing what I can do to help clean up Virginia. The area next to DC is decidedly unhappy.

How I Map: TIGER Cleanup

Posted by asciipip on 16 February 2012 in English.

This is the first in what will probably be a very occasional series about how I do things in OpenStreetMap. In this post, I'll discuss how I improve the quality of TIGER-imported roads.

In general, I have at least a two-stage process for working on an area. In the first stage, I armchair-map: I use the USGS orthoimagery available for my state (which is 6-inch resolution and excellently rectified and georeferenced) to trace the roads and other major features in the area. I'll generally double-check that tracing with NAIP to make sure I'm not uploading old features. In the second stage, I use Walking Papers to get a printout of the area (usually in separate, page-sized chunks) and drive through it, verifying the road names.

What I'd like to talk about, though, are a couple things that are specific to how I work with TIGER-sourced ways and how they fit into my general workflow.

tiger:reviewed

There are many different approaches to this tag. I use it as a marker as to which stage of my editing a road is in. If its value is "no", either I haven't worked with the full length of the way or I did so before I settled on my current workflow. When I align the way to aerial imagery (using JOSM's excellent Improve Way Accuracy mode), I change the value to "position". Finally, when I've verified the name via ground survey, I remove the tag.

For roads that don't typically have names, like _link roads or service roads, I just delete the tiger:reviewed tag after aligning them to aerial imagery, unless TIGER appears to have given a name to the road anyway. I see the last case most often with personal named driveways.

I've written a couple of things to assist my particular use of the tiger:reviewed tag. For JOSM editing, I wrote a style that highlights "tiger:reviewed=position" in a light green, similar to the way "tiger:reviewed=no" ways are highlighted in yellow. I've made a screenshot of the highlighting and the CSS is in TIGER-aligned.css.
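For reference, a JOSM MapCSS rule along these lines produces that kind of highlight (a hypothetical sketch, not the actual contents of TIGER-aligned.css; the color and width values are made up):

```css
/* Highlight ways whose position has been aligned to imagery but whose
   name hasn't yet been verified on the ground. */
way[tiger:reviewed=position] {
    color: #b0f0b0;  /* light green, analogous to JOSM's yellow for tiger:reviewed=no */
    width: 4;
    opacity: 0.6;
}
```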

See full entry

Downloading the USGS's Data

Posted by asciipip on 8 July 2011 in English. Last updated on 13 June 2012.

The US Geological Survey has a lot of data, some of which is available via WMS, and some of which is not. They'll give you any of it for free if you ship them a hard drive to put it on, but that can be both inconvenient (waiting weeks for the drive's round trip) and imprecise (they will only send entire datasets, even if you're only interested in a small portion of one). You can also download data from their seamless server, but downloading is a pain: you have to select an area (which must contain less than 1.5 GB of data in all), then their servers will divide that data into sub-250MB chunks and put the chunks on a webserver with randomly-generated filenames which will be deleted after an hour.

Fortunately, the USGS has recently implemented a set of web services that let you write programs to download chunks of data from them.

I've written a simple Python program, get_usgs_data, to download all data for a dataset within a given region.

The program requires wget and the Suds and BeautifulSoup Python modules. To use it, run get_usgs_data --bbox=left,bottom,right,top product, where left, bottom, right, and top are bounding coordinates in WGS84 and product is a valid USGS product ID. You can query for product IDs with the online index service. Look for one of the "return_Attributes" calls; the string you need is in the PRODUCTKEY attribute.

Addendum:

More information on getting the right product key.

The return_Attribute_List SOAP call lists the informational fields available for each dataset. It takes no parameters, so just click through to its service page and click the "Invoke" button. I usually use a combination of AREA_NAME, RESOLUTION, TYPE, and of course PRODUCTKEY. Basically, you want enough information to tell which product key corresponds to the dataset you want.

See full entry

I'm happy with some of the tweaks I made to my tile generation setup. I mentioned it on IRC and one person asked for a writeup (not that it's all that complicated), so here it is.

I really like TileStache for tile generation (plus it's a nice tool for turning WMS and ArcGIS REST interfaces into tiles, too). It does on-demand tile rendering with a selection of caching strategies, including caching to S3.

The nice thing about S3 is that it's cheaper, both storage- and bandwidth-wise, than my webhost. The problem I ran into is that TileStache's caching, for flexibility reasons, doesn't know anything about the particulars of each cache mechanism. It just knows that if a tile is cached, it reads it out of the cache and serves it to the client; otherwise, it renders the tile and caches it. That ends up double-billing me for each cached tile: once from Amazon when the tile is read out of S3 and again from my webhost when the tile is sent to the client.

All I did was add a check for the tile on S3 before calling TileStache and, if it's already cached, send the client a redirect to fetch it from S3 directly. The script is on pastebin.
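Stripped of the S3 and TileStache specifics, the logic amounts to this (a sketch with stand-in callables, not the actual pastebin script):

```python
# Sketch of the check-then-redirect idea. tile_is_cached and render_tile
# are stand-ins for an S3 existence check and a TileStache render call.
def serve_tile(tile_path, s3_base_url, tile_is_cached, render_tile):
    """Return an HTTP-ish (status, headers, body) tuple for a tile request."""
    if tile_is_cached(tile_path):
        # Cached: let the client fetch straight from S3, so the webhost
        # never pays bandwidth for the tile body.
        return ('302 Found', {'Location': s3_base_url + tile_path}, b'')
    # Not cached: render the tile (the renderer also caches the result).
    return ('200 OK', {'Content-Type': 'image/png'}, render_tile(tile_path))
```

In the real setup, the cache check would be an S3 HEAD request (a few bytes of billing) and the render call would go into TileStache, which writes the freshly rendered tile back to the cache.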

I also hacked my copy of TileStache to use reduced redundancy storage for cached tiles. If they get lost, they can just be rerendered, and it's cheaper this way.

Incidentally, I really like my webhost. If you're looking for cheap, pay-for-what-you-use hosting for a small to medium site, you could do worse than NearlyFreeSpeech.NET.

Baltimore City Landuse Map

Posted by asciipip on 29 May 2011 in English.

After my bad experience with attempting to import Baltimore City landuse data (see previous diary entry), I took another tack and created a map rendering that can be traced from within an editor. I've now got that rendering available at http://aperiodic.net/osm/baltimore.html and documented at http://wiki.openstreetmap.org/wiki/Baltimore,_Maryland/Landuse.

Any suggestions on rendering improvements or other clarity-enhancing changes would be appreciated.

A Brief Dalliance with Imports

Posted by asciipip on 24 May 2011 in English.

A lot of people are, if not opposed, at least strongly skeptical of imports. Last week, there were a few opinions on the subject, including one that offered, "Never trust robots," as a policy statement.

Naturally, this was the same week I found Baltimore City's Open Data Catalog, which is full of public domain data, some of which could be very useful to OpenStreetMap. I decided I wanted to try importing the landuse data, since that would liven up the city's OSM data and wouldn't, I thought, need too much work to integrate it into existing data. I figured I'd convert the shapefile to an OSM file, tag every landuse with some "unprocessed" tag, then go through the whole city and make sure any existing landuses were incorporated into the process, so no one's data would be lost or ignored. I planned on testing out this process in a few areas of the city to make sure it was feasible, emailing the talk-us list and the other people who had contributed to Baltimore mapping, and then proceeding if there weren't any objections.

I didn't get that far. The shapefile was such a mess relative to the topological quality of data I would expect from OpenStreetMap that I decided it wasn't worth the effort it would take to clean it up. There were tons of places with pointlessly overlapping landuses, others with overlaps that might or might not have been pointless, nodes that ought to be shared in OSM but which weren't quite close enough in the shapefile to have been merged during the preprocessing, and quite a lot of tiny slivers of areas an inch or less wide.

See full entry

Location: Inner Harbor, Baltimore, Maryland, United States

Back in 2006, several state, federal, and private groups formed the Statewide Ortho Imagery Partnership, which funded the acquisition of high-resolution (6 inches per pixel) orthoimagery. Partly because the US Geological Survey was one of the funding partners, the imagery was put into the public domain. The images were recorded between fall 2007 and spring 2008, so most of it is leaf-off. On top of that, the georectification seems excellent. In every place where I have multiple GPS traces to consult, the imagery matches up extremely well with the traces, and there seems to be very little distortion in the images. I haven't found any documentation of error margins for the imagery, but I'd be surprised if the error were greater than a meter.

My mapping has benefited enormously from the availability of this imagery. The rectification means that I can largely rely on the images for precise feature alignment and focus my surveying on things not visible in the imagery like addresses and names of things. The completeness of coverage has allowed me to map things that are either difficult to survey in person (like streams that run through remote areas or private property) or impossible (like much of the high voltage power line network in the state).

See full entry

Minutiae about Tile Rendering Queues

Posted by asciipip on 20 April 2011 in English.

I do my own map rendering, primarily to preload my phone's mapping program with offline tiles, but also to have a rendering that highlights things of interest to me for my region. My rendering is based on TopOSM, but with a lot of personal hacks and changes (the more general of which I've contributed back).

Until recently, my update process was very manual: run scripts to render all tiles for my rendering areas (which include overlapping bboxes of different shapes), run another script to combine the tiles for the three TopOSM layers (color-relief, contours, features) into single tiles for my phone, run a third script to use the expire files from my automated minutely mapnik setup to delete any tiles with expired data, run the first script again to rerender any deleted files, run the second script again to recombine any changed files, etc., etc. I wanted something that I could set up and leave running. As far as I could tell, the other render daemons in use (Tirex and mod_tile's renderd) just have a one-to-one correspondence between mapnik stylesheets and rendered tilesets. TopOSM is a little more complex (but oh, so much prettier). So I assembled my own.

The core of my change was externalizing my render queue. Rather than having a python script create a Queue object, fill it with tiles to be rendered, and spinning off render threads to consume the queue, I set up RabbitMQ to manage the work queues. That lets me feed the queues from one process and consume them from completely separate processes. The bit that I'm proud of has been my queue setup, so that's what I'm going to talk about.

See full entry

One Year of OpenStreetMap

Posted by asciipip on 16 March 2011 in English.

Today is the one year anniversary of when I joined OpenStreetMap, so I figured I'd look back at some of the large-scale work I've done in Maryland.

I started out not even knowing about OpenStreetMap; I just wanted a program for my phone (a first generation Palm Pre) that would track my travels and draw pretty lines on a map. I found MapTool, which used OpenStreetMap tiles. At some point I figured out that I could edit the map it was showing and started to do so, beginning with the areas I knew well around my workplace and home.

My first large edits were to the county boundaries in Maryland. I noticed that they were in the database but weren't being rendered, which I realized was because they had no admin_level tag. Because I'm a completionist, rather than fix just the boundary that bugged me initially, I converted all the county boundaries into multipolygons with shared ways and made sure each way had an admin_level.

At some point, I found that the USGS had six inch resolution imagery for all of Maryland, even the really remote counties. That gave me the ability to "see" into places I couldn't walk into and places that were too far to visit and revisit in multiple surveying trips. In particular, the resolution was good enough to trace individual power lines through a messy substation that was very much access=no to me. I decided to try to map all the power lines in Maryland. After several months of work with the aerial imagery, I got the map to the point where every power line in Maryland (as of the 2007/2008 imagery) was mapped, and any data from the TIGER import that did not represent an existing power line was removed. (Or just reduced to a minor line. A lot of sub-100kV lines were tagged as power-line via TIGER.)

Here are the before and after renderings:

See full entry

Selective Tile Expiration

Posted by asciipip on 28 November 2010 in English.

In my experimentation with tile rendering, I often want to rerender only a subset of the tiles that I've previously rendered, usually because I've changed parameters just for secondary roads or something similar.

To facilitate that, I wrote a script: expire-query. It takes an SQL query (or queries) as a parameter, runs the query, and lists the tiles covered by the results of the query in the same format that osm2pgsql does (so any tools that operate on those files will work with this script, too).

To run it, you'll need Ruby, Ruby DBI, and the PostgreSQL Ruby DBI driver (which is included in the Ruby DBI distribution).

The --help option should cover any usage questions.
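For anyone writing similar tooling: the expire files name tiles by zoom/x/y in the usual slippy-map scheme, and the conversion from WGS84 coordinates to tile indices is short. Here's a Python version of the standard formula from the OSM wiki (the script itself is in Ruby, but the math is identical):

```python
import math

def deg2tile(lon, lat, zoom):
    """Convert a WGS84 lon/lat to slippy-map tile x/y at the given zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    # Spherical Mercator: y runs from 0 at the north edge to n-1 at the south.
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

# A point in central Maryland at zoom 8:
print(deg2tile(-77.0, 39.0, 8))  # (73, 97)
```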

Rendering Route Shields from Route Relations in Mapnik

Posted by asciipip on 13 October 2010 in English. Last updated on 14 October 2010.

[For some reason I can't get the command and config excerpts below to look like I want, which is basically what I'd get if I put them in a <pre> block. I've done the best I can.]

I've been getting back to map rendering, and one of the things that bothers me about pretty much everyone's rendering is that they either don't render US route shields or they go through hackery (in my opinion) to turn road ref tags into a shield plus a number.

I've wanted for a while to use the data in route relations for shield rendering. At least in the US, they use separate tags for network and route number, which makes picking the right shield easier. They also handle the case of a single road belonging to multiple routes better than the road's ref tag does.

The drawbacks are in the way that osm2pgsql puts relations into the database. When a way is a road and a member of a route relation, osm2pgsql creates one row with the road's tags and a separate row with the same path as the road but the tags from the relation. As far as I can tell, there's no simple link between them. This means that you can't use the road's highway key to determine visibility of the shield, which means, among other things, that you have the possibility of a shield being rendered at a lower zoom level than its road.

If the data was imported in slim mode, the planet_osm_rels table does contain a link between a road and its relations. I tried using it directly, but that proved to be too slow for everyday use. I ended up creating another table and running queries off of that. The rest of this diary is about what I did.

See full entry

Project Completion!

Posted by asciipip on 5 October 2010 in English.

Several months back, I was pleased to find the USGS's high resolution orthophotography; it's public domain, so it can be used in OSM, and almost* the entire state of Maryland is available in 6" (15 cm) resolution. As I was touching up some power lines near where I work, I realized that the resolution was high enough that I could follow individual lines, which let me sort out some things that I couldn't see well enough in the Yahoo! imagery and couldn't get close enough to the lines to see in person.

I decided to set a goal of mapping all of the power lines in the state (more precisely, to edge of the state's aerial photography, which extends a small amount into neighboring states), including cables, wires, and voltages. (The voltages came from some state documents available online.)

As of this evening, I'm done. Every major power line (>100kV) in the 2007 aerial photography is mapped, and every power line from TIGER is either verified to exist, retagged as a minor_line, or deleted (if nothing actually exists where TIGER had the lines).

I think my next project will be the railroads in Maryland. TIGER's not bad with them, but I've been adding all the tracks when there's more than one and giving them appropriate names. (TIGER mostly uses the railroad operators, and often past operators, in the name fields; I've been putting those into the operator tag and putting the lines' actual names into the name tags.)


*The lower half of Harford County is missing. I can only assume this is an oversight, since the entire rest of the state, including the sparsely-populated Garrett and Allegany Counties, is present, plus the layer for the northern half of the county is named "MD_HarfordCounty_0.5ft_Color_Feb_2008_01", but there's no "..._02" layer.

SRTM and NED elevation data

Posted by asciipip on 15 June 2010 in English.

In the comments on my previous diary entry, people pointed out that the CGIAR data has a no-commercial-use restriction on it, which would be incompatible with OSM's database license if I wanted to derive anything from it. I was only planning on using it for display purposes, but I decided to look into other sources of elevation data anyway. I found the National Elevation Dataset from the US Geological Survey.

What's nice is that there are actually three NED datasets: NED1, NED3, and NED9. NED1 has a resolution of one arc-second, the same as SRTM1. NED3, however, is 1/3 of an arc-second, and NED9 is 1/9 of an arc-second, which ends up being about 3 meters. NED9 isn't available everywhere, so I decided to use NED3 and NED9 together.
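Those resolution figures are easy to sanity-check: one arc-second of latitude is about 31 meters on the ground, so 1/9 arc-second comes out near 3.4 meters. A quick back-of-the-envelope check in Python:

```python
import math

# Ground distance of one arc-second of latitude, using the WGS84
# equatorial radius (6378137 m) as a round approximation.
earth_circumference = 2 * math.pi * 6378137  # meters
meters_per_arcsec = earth_circumference / (360 * 3600)

print(round(meters_per_arcsec, 1))      # about 30.9 m for NED1/SRTM1
print(round(meters_per_arcsec / 3, 1))  # about 10.3 m for NED3
print(round(meters_per_arcsec / 9, 1))  # about 3.4 m for NED9
```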

A very illustrative test rendering is the Marriottsville Quarry in Howard County. For reference, here's what it looks like with the SRTM data:

Marriottsville Quarry with SRTM1 hillshading

See full entry

Further Adventures in Hillshading

Posted by asciipip on 18 May 2010 in English.

At the suggestion of a comment on my last diary entry, I decided to experiment with putting the hillshading information in the alpha channel of an overlay for the rest of the map, more or less as described in osm.wiki/Hillshading_using_the_Alpha_Channel_of_an_Image.

I also tweaked some of the other settings. I toned down the relief colors. There's still a visual distinction between various elevations, but the background colors don't interfere as much with the foreground colors. I also reorganized some of the rendering layers. I'm rendering things in this order: color relief, landarea/landuse, contour lines, water bodies. I think this ordering gives a nice indication of both elevation and landuse without either interfering with the other.

Finally, I found the scaling=bilinear8 option for Mapnik's RasterSymbolizer, which gives much smoother hillshading; the shading now looks good down to zoom 16 or so, and passable all the way down to zoom 18.
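In Mapnik's XML syntax the option looks roughly like this (a sketch; the style name and opacity are made up, and older Mapnik versions may spell the parameter differently):

```xml
<Style name="hillshade">
  <Rule>
    <!-- bilinear8 resamples the single-band raster smoothly before compositing -->
    <RasterSymbolizer scaling="bilinear8" opacity="0.7"/>
  </Rule>
</Style>
```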

Here's one of the rendering areas from the last diary entry under the new rendering rules, both with and without the hillshading:

See full entry

Hillshading Plus Relief Coloring

Posted by asciipip on 10 May 2010 in English.

This weekend, I took a break from Cascadenik rules to play with rendering elevation data, mostly working off osm.wiki/Contours and osm.wiki/HikingBikingMaps. I'm keeping notes on my process, so when I've got something good I'll document everything in more detail, but I'm reasonably pleased with my results so far.

I'd previously set up contour lines and figured out how to have either a greyscale hillshaded or color relief underlay for my rendering, so I decided to work on combining the hillshading and color relief images. My rendering target is the US, so I started with the SRTM1 data and interpolated that from 30x30 meter pixels down to 15x15 meter pixels, then I used the PerryGeo utilities to generate separate hillshaded and colored relief images. Following that, I used the GIMP to combine them.
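The hillshading those utilities compute is essentially the standard formula combining the light's altitude and azimuth with each cell's slope and aspect. A minimal Python version (angles in degrees; the default 315°/45° light is the usual cartographic convention, but this is a sketch, not the PerryGeo code itself):

```python
import math

def hillshade(slope_deg, aspect_deg, azimuth_deg=315.0, altitude_deg=45.0):
    """Shade value 0-255 for one cell; aspect and azimuth clockwise from north."""
    slope = math.radians(slope_deg)
    alt = math.radians(altitude_deg)
    shade = (math.sin(alt) * math.cos(slope)
             + math.cos(alt) * math.sin(slope)
             * math.cos(math.radians(azimuth_deg - aspect_deg)))
    return max(0, int(round(255 * shade)))

print(hillshade(0, 0))     # flat ground: 180, a uniform mid-grey
print(hillshade(45, 315))  # slope facing the light: 255, fully lit
print(hillshade(45, 135))  # slope facing away: 0, in shadow
```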

Unfortunately (for the simplicity of my rendering process) I'd like to render areas that span more than one SRTM tile. I say "unfortunately" because I'm still having problems with that aspect of things: I can't use gdal_merge.py to combine the images before processing them with the GIMP, because it makes BigTIFFs, which the GIMP won't read (my libtiff doesn't support them). That means I have to generate the shaded and colored images separately for each SRTM tile, combine them, and then use gdal_merge.py. I got most of the process down (I learned a little GIMP Script-FU to automate the merging process), but in my final merged image, I have thick black lines at the borders of the original SRTM tiles. It looks like the hillshading program is putting a black border around the edge of its generated image. Weirdly, the lines show up when I use the generate_tiles.py script, but not the generate_image.py script (the first image below spans the boundary between the N39W77 and N39W78 tiles). I need more research here.

See full entry