OpenStreetMap

It’s important to see exactly what happened to the features in a changeset: the state of each feature and its history, including which geometry and tags changed. The OSM changeset page doesn’t give a clear picture of this. It shows a list of the features that changed and the bounding box of the changeset.


The changeset XML from OpenStreetMap only contains the current version of each feature that changed in the changeset, not the version it replaced.
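To make the limitation concrete, the OSM API v0.6 changeset download (`/api/0.6/changeset/<id>/download`) returns an osmChange document. The minimal sketch below uses only Python's standard library to list what that document contains. Each element appears once, in the version the changeset wrote, with nothing to compare it against. `fetch_osmchange` and `summarise_osmchange` are hypothetical helper names, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# OSM API v0.6 changeset download endpoint; it returns osmChange XML.
DOWNLOAD_URL = "https://api.openstreetmap.org/api/0.6/changeset/{id}/download"


def fetch_osmchange(changeset_id, timeout=30):
    """Download the raw osmChange XML for one changeset (hypothetical helper)."""
    with urlopen(DOWNLOAD_URL.format(id=int(changeset_id)), timeout=timeout) as resp:
        return resp.read()


def summarise_osmchange(xml_text):
    """Return (action, type, id, version, tags) for every element in an osmChange document.

    Each element appears exactly once, in the version written by the changeset;
    the previous version is not included anywhere in the document.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for action in root:  # <create>, <modify> or <delete>
        for el in action:  # <node>, <way> or <relation>
            tags = {t.get("k"): t.get("v") for t in el.findall("tag")}
            rows.append((action.tag, el.tag, el.get("id"), el.get("version"), tags))
    return rows
```

For example, `summarise_osmchange(fetch_osmchange(47656996))` would list the modified node with its new tags only. The old name is not available without a second request for the previous version.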

Overpass offers augmented diffs between two timestamps, which contain both the current and the previous version of each feature that changed in that period. We put together infrastructure that queries Overpass every minute, prepares a JSON representation of each changeset, and stashes it on S3. The augmented diffs themselves are also cached on S3. This can drastically reduce the load on the Overpass instance when many of us are looking at the same changeset.
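The core transformation, grouping one augmented diff by the changeset that made each change, can be sketched as below. This is a minimal Python illustration of the idea, not the osm-adiff-parser implementation. It assumes the usual Overpass augmented diff layout: a modify or delete `<action>` wraps `<old>` and `<new>` copies of the element, and a create `<action>` contains the new element directly. The function names are hypothetical. Way and relation geometry is ignored, and the `metadata` block of the real changeset JSON is not built here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def element_to_dict(el):
    """Flatten one <node>/<way>/<relation> into its attributes plus a tags dict.

    Geometry children (<nd>, <member>, <bounds>) are ignored in this sketch.
    """
    feature = dict(el.attrib)  # id, version, changeset, uid, user, timestamp, lat/lon...
    feature["type"] = el.tag
    feature["tags"] = {t.get("k"): t.get("v") for t in el.findall("tag")}
    return feature


def adiff_to_changesets(xml_text):
    """Group the actions of one augmented diff by the changeset that made them.

    Assumed layout: modify/delete actions wrap <old> and <new> copies of the
    element; create actions contain the new element directly.
    """
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for action in root.findall("action"):
        old, new = action.find("old"), action.find("new")
        # Compare with None: an Element with no children is falsy.
        current = new[0] if new is not None else action[0]
        feature = element_to_dict(current)
        feature["action"] = action.get("type")
        if old is not None:
            feature["old"] = element_to_dict(old[0])
        # Group by the changeset of the *new* version, i.e. the one that made the change.
        grouped[feature["changeset"]].append(feature)
    return {cid: {"elements": els} for cid, els in grouped.items()}
```

Grouping on the new version's `changeset` attribute matters. The `old` copy usually belongs to an earlier changeset, as in the example JSON below, where the previous version came from changeset 47649032.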


The cached changeset JSON is used directly by changeset-map, a utility to visualise OSM changesets.

JSON

The cached changeset JSONs are available at https://s3.amazonaws.com/mapbox/real-changesets/production/<changeset-id>.json. This is inspired by the work Development Seed did with Planet Stream. We use osm-adiff-parser to convert the augmented diff into changeset JSON.

For example, this is the JSON for changeset 47656996 by user Rezhin Ali:

// https://s3.amazonaws.com/mapbox/real-changesets/production/47656996.json

{
  "elements": [
    {
      "id": "4787752634",
      "lat": "36.1823442",
      "lon": "44.0158941",
      "version": "2",
      "timestamp": "2017-04-11T13:12:35Z",
      "changeset": "47656996",
      "uid": "5323129",
      "user": "Rezhin Ali",
      "old": {
        "id": "4787752634",
        "lat": "36.1823442",
        "lon": "44.0158941",
        "version": "1",
        "timestamp": "2017-04-11T08:02:21Z",
        "changeset": "47649032",
        "uid": "5323129",
        "user": "Rezhin Ali",
        "action": "modify",
        "type": "node",
        "tags": {
          "name": "ێەبد مەنان",
          "name:ar": "ێەبد مەنان",
          "shop": "car"
        }
      },
      "action": "modify",
      "type": "node",
      "tags": {
        "name": "Abd Manan",
        "name:ar": "Abd Manan",
        "shop": "car"
      }
    }
  ],
  "metadata": {
    "id": "47656996",
    "created_at": "2017-04-11T13:12:34Z",
    "open": "true",
    "user": "Rezhin Ali",
    "uid": "5323129",
    "min_lat": "36.1823442",
    "min_lon": "44.0158941",
    "max_lat": "36.1823442",
    "max_lon": "44.0158941",
    "comments_count": "0",
    "tag": [
      {
        "k": "created_by",
        "v": "MAPS.ME ios 7.2.3"
      },
      {
        "k": "comment",
        "v": "Updated a car shop"
      },
      {
        "k": "bundle_id",
        "v": "com.mapswithme.full"
      }
    ]
  }
}
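A consumer can work with this structure directly. The minimal sketch below fetches a changeset and reports, for each element, which tags changed and whether a node moved. It assumes the URL template above is still publicly readable. The helper names are hypothetical and not part of changeset-map.

```python
import json
from urllib.request import urlopen

# URL template from this post; that the bucket is still served publicly is an assumption.
CACHE_URL = "https://s3.amazonaws.com/mapbox/real-changesets/production/{id}.json"


def fetch_changeset(changeset_id, timeout=30):
    """Fetch one cached changeset JSON (hypothetical helper)."""
    with urlopen(CACHE_URL.format(id=int(changeset_id)), timeout=timeout) as resp:
        return json.load(resp)


def tag_changes(element):
    """Compare an element's previous ('old') tags with its current tags.

    Returns {key: (old_value, new_value)}; None marks a tag that was added or removed.
    """
    old_tags = element.get("old", {}).get("tags", {})
    new_tags = element.get("tags", {})
    return {
        k: (old_tags.get(k), new_tags.get(k))
        for k in sorted(old_tags.keys() | new_tags.keys())
        if old_tags.get(k) != new_tags.get(k)
    }


def moved(element):
    """True if a node's coordinates differ from its previous version."""
    old = element.get("old")
    if old is None or element.get("type") != "node":
        return False
    return (old.get("lat"), old.get("lon")) != (element.get("lat"), element.get("lon"))


if __name__ == "__main__":
    for el in fetch_changeset(47656996)["elements"]:
        print(el["type"], el["id"], el["action"], tag_changes(el), moved(el))
```

For the changeset above, `tag_changes` reports `name` and `name:ar` changing to "Abd Manan" while `shop` is unchanged. `moved` returns False because the old and new coordinates are identical.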

Empty changesets

Some changesets are empty. They may have been opened but failed to upload any changes, for example due to an unreliable network, and were then closed automatically after 60 minutes. Empty changesets are not cached.

Long changesets

Changesets can also remain open for a long time. For example, this one from user Manuchehr was open for 36 minutes. Experienced users often survey outdoors and upload data in bulk, and some editors don’t close changesets automatically. Idle changesets are eventually closed after 60 minutes.

When features of a changeset come through in a later minutely diff, we update the cached JSON on S3. This ensures the cached changeset stays complete.
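The update step can be sketched as a merge keyed by element type and ID. This minimal Python illustration rests on two assumptions: the cache is a plain dict shaped like the JSON above, and the highest version of an element wins. It is not the production code, and `merge_elements` is a hypothetical name. It also applies the rule that empty changesets are not cached. Reading from and writing back to S3 is left out.

```python
def merge_elements(cached, late_elements):
    """Merge features from a later minutely diff into a cached changeset JSON.

    Elements are keyed by (type, id); when both copies exist, the higher version
    wins (an assumption of this sketch). Returns None when the merged changeset
    has no elements, mirroring the rule that empty changesets are not cached.
    The input dict is not modified.
    """
    merged = {}
    for el in cached.get("elements", []) + list(late_elements):
        key = (el["type"], el["id"])
        prev = merged.get(key)
        if prev is None or int(el["version"]) >= int(prev["version"]):
            merged[key] = el  # replacing an existing key keeps its original position
    if not merged:
        return None
    result = dict(cached)  # shallow copy keeps "metadata" and other top-level keys
    result["elements"] = list(merged.values())
    return result
```

Keying on `(type, id)` rather than appending blindly means a diff that is delivered twice, or an element that reappears with a newer version, does not produce duplicate entries in the cached changeset.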

Database transactions and augmented diffs

A changeset being closed doesn’t mean that all of its changed features have been committed to the OSM database and will appear in the very next minutely diff. Some features take longer to be committed. We handle these by updating the augmented diff cached on S3 and then recreating the changeset JSON. You can read more about this case here.

Missing changesets

Changesets created after March 1, 2017 are cached. We are considering a slow backfill, but this depends entirely on Overpass. If you see something missing or unclear, please open a ticket and let us know!

Location: Indiranagar 1st Stage, Bengaluru, Bangalore North, Bengaluru Urban, Karnataka, 560038, India

Discussion

Comment from tyr_asd on 12 April 2017 at 07:47

Hey. Great work and thanks for providing this as a service!

PS: are the cached (raw) augmented diffs also publicly available?

Comment from mmd on 12 April 2017 at 08:14

Does this imply that an augmented diff may be updated after it has been published? Is there some mechanism for a data consumer to find out that there has been an update? I think this is fine for changeset analysis, but I see some issues with keeping a local database up to date with this approach. Would you agree?

Comment from geohacker on 12 April 2017 at 08:27

mmd, yes. An augmented diff may be updated after it has been published. There’s currently no way for consumers to know when a file has been updated. Internally we track this using S3 notifications through AWS SNS, but I’m not sure how best to expose them externally.

Comment from Stereo on 12 April 2017 at 10:13

How cool. SNS can’t be exposed externally directly, but it can be used to trigger actions, e.g. push notifications, email, calls to another trigger script…

Comment from umphrey1012 on 12 April 2017 at 14:58

Awesome, geohacker. I just wanted to point out that SNS topics can be exposed externally. This is how we provide notifications for a number of the datasets on https://aws.amazon.com/earth/ (like Landsat 8, Sentinel-2, etc.). Below is a sample topic policy that allows S3 to publish events and allows anyone to subscribe using the SQS and Lambda protocols. It can be used as a base to open up more access.

{
  "Version": "2008-10-17",
  "Id": "PublicSQSandLambdaSNS",
  "Statement": [
    {
      "Sid": "AllowLandsatPDSPublication",
      "Effect": "Allow",
      "Principal": { "Service": "s3.amazonaws.com" },
      "Action": "SNS:Publish",
      "Resource": "arn:aws:sns:us-west-2:xxxx:NewSceneHTML",
      "Condition": {
        "ArnLike": { "aws:SourceArn": "arn:aws:s3:::landsat-pds" }
      }
    },
    {
      "Sid": "allowOnlySQSandLambdaSubscription",
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": [ "SNS:Subscribe", "SNS:Receive" ],
      "Resource": "arn:aws:sns:us-west-2:xxxxxx:NewSceneHTML",
      "Condition": {
        "StringEquals": { "SNS:Protocol": [ "lambda", "sqs" ] }
      }
    }
  ]
}

Comment from PierZen on 21 April 2017 at 21:18

Great geohacker.

To monitor an area, we need to query the changesets for a given bbox and time period. We cannot rely on the OSM API to provide the list of changeset IDs, since the API limits the number of changesets a query returns, and Overpass does not provide this facility.

Do you plan to provide such a service? It would let us develop tools without first installing a server and loading all the OSM data.

Comment from geohacker on 25 April 2017 at 03:33

Hey PierZen, have you tried https://osmcha.mapbox.com/? OSMCha lets you query between time periods, filter by a bbox, and then visualise each changeset.

Comment from PierZen on 25 April 2017 at 03:50

I’m talking about developing scripts to analyse the data. It would be great if we could obtain this data as GeoJSON output.

Comment from Zverik on 27 April 2017 at 08:21

PierZen, you can use WhoDidIt for that. That won’t be too precise (it stores tiles with 0.01 degree granularity), but it is quite fast and has data since 2012.

Comment from mmd on 21 August 2017 at 16:31

A new Endless Achavi demo is up; see this post for details.

It gives a different perspective on the question: is caching really worth it, or should we spend more time improving Overpass performance instead?

Comment from mmd on 3 June 2020 at 05:17

I got some feedback that https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/ hasn’t been updated for quite some time. Is this a known issue? Is the URL still correct?
