OpenStreetMap logo OpenStreetMap

pnorman's Diary

Recent diary entries

Aggregating Fastly logs

Posted by pnorman on 1 September 2023 in English.

The Standard Tile Layer has a lot of traffic. On August 1st, a typical day, it had 2.8 billion requests served by Fastly, about 32 thousand a second. The challenges of scaling to this size are documented elsewhere, and we handle the traffic reliably, but something we don’t often talk about is the logging. In some cases, you could log a random sample of requests but that comes with downsides like obscuring low frequency events, and preventing some kinds of log analysis. Critically, we publish data that depends on logging all requests.

We query our logs with Athena, a hosted version of Presto, a SQL engine that, among features, can query files on an object store like S3. Automated queries are run with tilelog, which publishes files daily to generate published files on usage of the standard tile layer.

As you might imagine, 2.8 billion requests is a lot of log data. Fastly offers a number of logging options, and we publish compressed CSV logs to Amazon S3. These logs are large, and suffer a few problems for long-term use because they:

  1. contain personal information like request details and IPs, that, although essential for running the service, cannot be retained forever;
  2. contain invalid requests, making analysis more difficult;
  3. are large, being 136 GB/day; and
  4. become slow to query, being compressed gzip files with the only indexing being the date and hour of the request, which is part of the file path.

To solve these problems we reformat, filter, and aggregate logs which lets us delete old logs. We’ve done the first two for some time, and are now doing the third.

See full entry

Maxar usage over the last year

Posted by pnorman on 8 July 2023 in English. Last updated on 9 July 2023.

I was curious about the usage of Maxar over the last year, so did some quick work to see where it was used. To start, I used a Python 3 version of ChangesetMD to load the latest changesets into PostgreSQL, using the -g option to create geometries.

I then, with a bit of manual work, identified the changesets of the last year are those between 122852000 and 137769483. Using this, and knowledge of tags normally used with maxar, I created a materialized view with just the Maxar changesets

CREATE MATERALIZED VIEW maxar AS
SELECT id,
    num_changes,
    st_centroid(geom)
FROM osm_changeset
WHERE id BETWEEN 122852000 and 137769483
    AND (tags->'source' ILIKE '%maxar%' AND tags->'imagery_used' ILIKE '%maxar%');

This created a table of 2713316 changesets, which is too many to directly view, so I needed to get it by country.

I did this with the border data from country-coder

curl -OL 'https://raw.githubusercontent.com/rapideditor/country-coder/main/src/data/borders.json'
ogr2ogr -f PostgreSQL PG:dbname=changesetmd borders.json

This loaded a quick and dirty method of determining the point a country is in into the DB, allowing me to join the tables together

SELECT COALESCE(iso1a2, country), COUNT(*)
FROM maxar JOIN borders ON ST_Within(maxar.st_centroid, borders.wkb_geometry)
GROUP BY COALESCE(iso1a2, country)
ORDER BY COUNT(*) DESC;

See full entry

Tilelog country data

Posted by pnorman on 22 May 2023 in English.

I added functionality to tilelog to generate per-country usage information for the OSMF Standard Map Layer. The output of this is a CSV file, generated every day, which contains country code, number of unique IPs that day, tiles per second, and tiles per second that were a cache miss, all for each country code.

With a bit of work, I manipulated the files to give me the usage from the 10 countries with the most usage, for the first four months of 2023.

Tile usage per country by date

Perhaps more interesting is looking at the usage for each country by the day of week.

See full entry

The Operations Working Group is looking at what it take to deprecate HTTP Basic Auth and OAuth 1.0a in favour of OAuth 2.0 on the main API in order to improve security and reduce code maintenance.

Some of the libraries that the software powering the API relies on for OAuth 1.0a are unmaintained, there is currently a need to maintain two parallel OAuth interfaces, and HTTP Basic Auth requires bad password management practices. OAuth 2.0 libraries should be available for every major language.

We do not yet have a timeline for this, but do not expect to shut off either this year. Before action is taken, we will send out more notifications. Deprecation may be incremental, e.g., we may shut off creation of new applications as an earlier step.

What can you do to help?

If you are developing new software that interacts with the OSM API, use OAuth 2.0 from the start. Non-editing software can require authentication support, e.g. software that checks if you have an OSM login.

If you maintain existing software, then look into OAuth 2.0 libraries that can replace your OAuth 1.0a ones. We do not recommend implementing support for either protocol version “by hand”, as libraries are readily available and history has shown that implementing your own support is prone to errors.

If you do not develop software that interacts with the OSM API, this change will not directly impact you. You may need to update software you use at some point.

I have been developing Street Spirit, a new style using OpenStreetMap data. It uses Maplibre GL for client side rendering of MVTs generated by Tilekiln, which supports minutely updates using the standard osm2pgsql toolchain.

To focus style development, I have set its aims as being suitable for

  • use as a locator map,
  • to show off what can be done with OpenStreetMap data,
  • to be up-to-date with the latest OpenStreetMap data, and
  • using to orient a viewer to a location they are at.

Although not complete - if a style can ever said to be complete - it is at the point where there’s enough features to give the overall feel of the map, at least for zooms 12 and higher. Lower zooms are missing many features still, particularly roads and rail and some landcover and other fills.

Because the style has a more clearly defined purpose, I’ve been able to use more of the colour pallet than many other styles, particularly compared to styles designed for overlaying other data on top of.

I’ve set up a dev instance on one of my servers, using OSM data from 2023-02-27. Have an explore around.

Some of the bigger areas that need work are

  • Missing mid- and low-zoom features
  • Missing fills
  • A consistent set of POI icons
  • More POIs

If you’re interested in contributing to the work, let me know. Contributing will require some technical knowledge in the following areas

  • MapLibre GL style specification, focusing on layers and expressions, including data-driven expressions;
  • YAML, in particular appropriate indentation for arrays. MapLibre GL styles tend to feature deeply nested arrays; and
  • SQL for writing read-only PostGIS queries if modifying vector tiles.

See full entry

OpenStreetMap Carto release v5.7.0

Posted by pnorman on 11 January 2023 in English.

Dear all,

Today, v5.7.0 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on openstreetmap.org it will take couple of days before all tiles show the new rendering.

Changes include - Unpaved roads are now indicated on the map (#3399)

  • Country label placement improved, particularly for countries in the north (#4616)

  • Added elevation to wilderness huts (#4648)

  • New index for low-zoom performance (#4617)

  • Added a script to switch between script variations for CJK languages (#4707)

  • Ordering fixes for piers (#4703)

  • Numerous CI improvements

Thanks to all the contributors for this release, including wyskoj, tjur0, depth221, SlowMo24, altilunium, and cklein05, all new contributors.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.6.2…v5.7.0

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

OSM usage by country

Posted by pnorman on 22 November 2022 in English.

I gathered some statistics about usage of the website and tiles in 2022Q3.

I looked at total tile.osm.org usage, tile.osm.org usage from osm.org itself, osm.org visits, and osm.org unique visitors.

Here’s the data for the top 20 countries.

country osm.org tile requests total tile requests Website visits Website unique Visitors
DE 17.79% 7.78% 8.27% 7.98%
RU 12.23% 8.49% 2.47% 2.43%
US 8.72% 9.22% 13.11% 13.56%
PL 7.69% 4.99% 3.09% 2.80%
GB 4.85% 3.68% 4.42% 4.42%
FR 4.79% 7.00% 3.91% 3.94%
NL 3.62% 3.31% 2.17% 2.09%
IT 3.49% 3.46% 4.74% 4.86%
IN 2.64% 2.66% 3.67% 3.16%
CN 2.62% 0.79% 2.65% 2.72%
AT 2.03% 0.89% 0.98% 0.91%
UA 1.78% 1.98% 1.20% 1.21%
CH 1.41% 0.71% 0.83% 0.82%
CA 1.29% 1.59% 1.36% 1.39%
BE 1.29% 1.06% 1.10% 1.03%
ES 1.27% 2.41% 2.32% 2.39%
JP 1.10% 1.54% 1.74% 1.71%
AU 1.09% 0.92% 0.88% 0.82%
SE 0.91% 0.95% 0.87% 0.88%
FI 0.89% 0.74% 0.74% 0.71%

I’ve put the full data into a gist on github

OpenStreetMap Carto release v5.6.1

Posted by pnorman on 12 August 2022 in English.

Dear all,

Today, v5.6.1 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on the openstreetmap.org it will take couple of days before all tiles show the new rendering.

Changes include

  • Fixing rendering of water bodies on zooms 0 to 4

Thanks to all the contributors for this release.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.6.0…v5.6.1

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

OpenStreetMap Carto release v5.6.0

Posted by pnorman on 3 August 2022 in English.

Dear all,

Today, v5.6.0 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on the openstreetmap.org it will take couple of days before all tiles show the new rendering.

Changes include

  • using locally installed fonts instead of system fonts, for more up to date fonts;
  • changing tree and tree row colours to the same colour as areas with trees;
  • rendering parcel lockers; and
  • rendering name labels of bays and straights from z14 only, and lakes from z5

Thanks to all the contributors for this release including GoutamVerma, yvecai, ttomasz, and Indieberrie, new contributors.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.5.1…v5.6.0

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

OpenStreetMap Carto could use more help reviewing pull requests, so if you’re able to, please head over to Github and review some of the open PRs.

This is a bit less OpenStreetMap related then normal, but has to do with the Standard Tile Layer and an outage we had this month.

On July 18th, the Standard Tile Layer experienced degraded service, with 4% of traffic resulting in errors for 2.5 hours. A significant factor in the time to resolve the incident was a lack of visibility of the health status of the rendering servers. The architecture consists of a content delivery network (CDN) hosted by Fastly, backed by 7 rendering servers. Fastly, like most CDNs, offers automatic failover of backends by fetching a URL on the backend server and checking its response. If the response fails, it will shift traffic to a different backend.

A bug in Apache resulted in the servers being able to handle only a reduced number of connections, causing a server to fail the health check, diverting all load to another server. This repeated with multiple servers, sending the load between them until the first server responded to the health check again because it had zero load. Because the servers were responding to most of the manually issued health checks and we had no visibility into how each Fastly node was directing its traffic, it took longer to find the cause than it should have.

Our normal monitoring is provided by Statuscake, but this wasn’t enough here. Instead of increasing the monitoring, we wanted to make use of the existing Fastly healthchecks, which probe the servers from 90 different CDN points. Besides being a vastly higher volume of checks, this more directly monitors the health checks that matter for the service

During the incident, Fastly support provided some details on how to monitor health check status. Based on this guide, the OWG has set up an API on the tile CDN to indicate backend health, and monitoring to track this across all POPs.

See full entry

OpenStreetMap Carto Release v5.5.1

Posted by pnorman on 13 July 2022 in English.

Dear all,

Today, v5.5.1 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on the openstreetmap.org it will take couple of days before all tiles show the new rendering.

The one change is a bugfix to the colour of gates (#4600)

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.5.0…v5.5.1

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

OpenStreetMap Carto release v5.5.0

Posted by pnorman on 10 July 2022 in English.

Dear all,

Today, v5.5.0 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on the openstreetmap.org it will take couple of days before all tiles show the new rendering.

Changes include

  • Fixed colour mismatch of car repair shop icon and text (#4535)

  • Cleaned up SVG files to better align with Mapnik requirements (#4457)

  • Allow Docker builds on ARM machines (e.g. new Apple laptops) (#4539)

  • Allow file:// URLs in external data config and caching of downloaded files (#4468, #4153, #4584)

  • Render mountain passes (#4121)

  • Don’t use a cross symbol for more Christian denominations that don’t use a cross (#4587)

Thanks to all the contributors for this release, including stephan2012, endim8, danieldegroot2, and jacekkow, new contributors.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.4.0…v5.5.0

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

I’m working on publishing a summary of sites using tile.osm.org and want to know what format would be most useful for people.

The information I’ll be publishing is requests/second, requests/second that were cache misses, and domain. The first two are guaranteed to be numbers, while the last one is a string that will typically be a domain name like www.openstreetmap.org, but could theoretically contain a poisoned value like a space.

The existing logs which have tiles and number of requests are formatted as z/x/y N where z/x/y are tile coordinates and N is the number of accesses.

My first thought was TPS TPS_MISS DOMAIN, space-separated like the existing logs. This would work, with the downside that it’s not very future proof. Because the domain can theoretically have a space, it has to be last. This means that any future additions will require re-ordering the columns, breaking existing usage. Additionally, I’d really prefer to have the domain at the start of the line.

A couple of options are - CSV, with escaping - tab-delimited

Potential users, what would work well with the languages and libraries you prefer?

An example of the output right now is

1453.99 464.1 www.openstreetmap.org  
310.3 26.29 localhost
136.46 39.68 dro.routesmart.com
123.65 18.54 www.openrailwaymap.org
107.98 0.05 www.ad-production-stage.com
96.64 1.78 r.onliner.by
91.42 0.16 solagro.org
87.83 1.53 tvil.ru
84.88 12.98 eae.opekepe.gov.gr
74.0 2.32 www.mondialrelay.fr
63.44 1.93 www.lightningmaps.org
63.22 14.01 nakarte.me
55.1 0.74 qualp.com.br
52.77 11.25 apps.sentinel-hub.com
46.68 4.07 127.0.0.1
46.3 1.96 www.gites-de-france.com
43.47 1.15 www.anwb.nl
42.46 10.52 dacota.lyft.net
41.13 6.63 www.esri.com
40.84 0.69 busti.me

The OpenStreetMap Foundation runs several services subject to usage policies.

If you violate the policies, you might be automatically or manually blocked, so I decided to write a post to help community members answering questions from people who got blocked. If you’re a blocked user, the best place to ask is in the IRC channel #osm-dev on irc.oftc.net. Stick around awhile to get an answer.

The most important question is which API is being used. For this, look at the URL you’re calling.

If the URL contains nominatim.openstreetmap.org, review the usage policy. The most common cause of being blocked is bulk geocoding exceeding 1 request per second. Going over this will trigger automatic IP blocks. These are automatically lifted after several hours, so stop your process, fix it, wait, and then you won’t be blocked.

If you’re using nominatim but not exceeding 1 request per second, to get help you should provide the URL you’re calling, the HTTP User-Agent or Referer you’re sending, the IP you’re requesting from, and the HTTP response code.

If you’re calling tile.openstreetmap.org or displaying a map, review the tile usage policy. The most common causes of being blocked is tile scraping or apps that don’t follow the usage policy.

To get help you should provide where the map is being viewed (e.g. an app, website, or something else), the HTTP User-Agent or Referer you’re sending, the IP you’re requesting from, and the HTTP response code. For a website, you can generally get this information through the browser’s developer tools. The tile.openstreetmap.org debug page will also show you you this information.

If you’re having problems with an app that you’re not the developer of, you’ll often need to contact them, as they are responsible for correctly calling the services.

OpenStreetMap Carto release v5.4.0

Posted by pnorman on 23 September 2021 in English.

Dear all,

Today, v5.4.0 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on the openstreetmap.org it will take couple of days before all tiles show the new rendering.

Changes include

  • Added a new planet_osm_line_label index (#4381)
  • Updated Docker development setup to use offical PostGIS images (#4294)
  • Fixed endline conversion issues with python setup scripts on Windows (#4330)
  • Added detailed rendering of golf courses (#4381, #4467)
  • De-emphasized street-side parking (#4301)
  • Changed subway stations to start text rendering at z15 (#4392)
  • Updated road shield generation scripts to Python 3 (#4453)
  • Updated external data loading script to support pyscopg2 2.9.1 (#4451)
  • Stopped displaying tourism=information with unknown information values
  • Switched the Natural Earth URL to point at its new location (#4466)
  • Added more logging to the external data loading script (#4472)

Thanks to all the contributors for this release including ZeLonewolf, kolgza, and map-per, new contributors

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.3.1…v5.4.0

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

OpenStreetMap Standard Layer: Requests

Posted by pnorman on 29 July 2021 in English. Last updated on 30 July 2021.

This blog post is a version of my recent SOTM 2021 presentation on the OpenStreetMap Standard Layer and who’s using it.

With the switch to a commercial CDN, we’ve improved our logging significantly and now have the tools to log and analyze logs. We log information on both the incoming request and our response to it.

We log

  • user-agent, the program requesting the map tile;
  • referrer, the website containing a map;
  • some additional headers;
  • country and region;
  • network information;
  • HTTP protocol and TLS version;
  • response type;
  • duration;
  • size;
  • cache hit status;
  • datacenter;
  • and backend rendering server

We log enough information to see what sites and programs are using the map, and additional debugging information. Our logs can easily be analyzed with a hosted Presto system, which allows querying large amounts of data in logfiles.

I couldn’t do this talk without the ability to easily query this data and dive into the logs. So, let’s take a look at what the logs tell us for two weeks in May.

See full entry

This blog post is a version of my recent SOTM 2021 presentation on the OpenStreetMap Standard Layer and who’s using it.

The OpenStreetMap Standard Layer is the default layer on openstreetmap.org, using most of the front page. It’s run by the OpenStreetMap Foundation, and the Operations Working Group is responsible for the planning, organisation and budgeting of OSMF-run services like this one and servers running it. There are other map layers on the front page like Cycle Map and Transport Map, and I encourage you to try them, but they’re not hosted or planned by us.

Technology

At the high level, this is the overview of the technology the OWG is responsible for. The standard layer is divided into million of parts, each of which is called a tile, and we serve tiles.

Flowchart of rendering

See full entry

As part of figuring out how to best process standard tile layer logs I had a chance to generate some charts for usage of the OpenStreetMap Standard tile layer on the day of 2021-03-14, UTC time. This was over a weekend, so there are probably differences on a weekday. I’m also only looking at tiles delivered and not including blocked tiles from scrapers and similar usage. All traffic is in tiles per second, averaged over the day.

Countries

I first looked at usage of the layer from users on openstreetmap.org and all users, by country.

See full entry

I’ve been doing some log analysis on requests to the OSMF-hosted standard tile layer on tile.openstreetmap.org. To do this I downloaded two hours worth of logs and loaded them into PostgreSQL to run some queries. The logs start at ###, and total 11GB compressed starting at 1600 UTC on 2021-02-25.

My main concern has been backend server load, so to analyze that I looked at cache misses - requests where the cache has to request a tile from the OSMF-operated backend servers. Typically the number of tiles requested is going to be five to ten times higher than the number of misses, but it will vary by zoom.

Total cache misses were 3437.4 per second.

The top five referers, as well as some interesting ones are

Referer domain Cache misses per second
None 1254.3
www.openstreetmap.org 418.7
www.openrailwaymap.org 16.0
apps.sentinel-hub.com 15.0
m.turkiye.gov.tr 13.1
localhost, on various ports 30.7
10.* IPs 14.2
Other 1675.4

The top sites vary by time and what parts of the world are awake, but most of the traffic is from the long tail of small sites, OpenStreetMap itself, or an app which should be sending a custom user-agent instead of a website with a referer.

For user-agents, I grouped different versions of some apps together. Like before, I’ve got some of the top ones, then a few interesting ones.

See full entry

Location: 0.000, 0.000

OpenStreetMap Survey by visits

Posted by pnorman on 21 February 2021 in English.

In my last post I looked at survey responses by country and their correlation with mappers eligible for a fee waver as an active contributor.

I wanted to look at the correlation with OSM.org views. I already had a full day’s worth of logs on tile.openstreetmap.org accesses, so I filtered them for requests from www.openstreetmap.org and got a per-country count. This is from December 29th, 2020. Ideally it would be from a complete week, and not a holiday, but this is the data I had downloaded.

Preview image

The big outlier is Italy. It has more visits than I would expect, so I wonder if the holiday had an influence. Like before, the US is overrepresented in the results, Russia and Poland are underrepresented, and Germany is about average.

Like before, I made a graph of the smaller countries.

See full entry