
A minute of facts about the duration of changesets

Posted by TrickyFoxy on 31 January 2024 in English

Disclaimer: I used changesets through August 2023 to calculate the statistics

  • 84% of changesets closed within a minute

  • 99.6% closed within two hours
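Shares like the two above can be computed from each changeset's creation and closing timestamps. The following is a minimal sketch of that arithmetic, not the script used for this post; the function name and the four sample changesets are made up for illustration.

```python
from datetime import datetime, timedelta


def share_closed_within(changesets, limit):
    """Return the fraction of changesets whose duration is at most `limit`."""
    if not changesets:
        return 0.0
    within = sum(
        1 for created_at, closed_at in changesets
        if closed_at - created_at <= limit
    )
    return within / len(changesets)


# Made-up sample of (created_at, closed_at) pairs, NOT real OSM data.
t0 = datetime(2023, 8, 1, 12, 0, 0)
sample = [
    (t0, t0 + timedelta(seconds=5)),
    (t0, t0 + timedelta(seconds=40)),
    (t0, t0 + timedelta(minutes=30)),
    (t0, t0 + timedelta(hours=13)),
]

print(share_closed_within(sample, timedelta(minutes=1)))  # 0.5
print(share_closed_within(sample, timedelta(hours=2)))    # 0.75
```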

Diagram of the distribution of changeset durations (count vs. duration in seconds):

The upper part is in the form of a table:

  • only 1201 were open for more than 12 hours

  • from 2020 to August 2023, only 53 were open for more than 12 hours (of which 39 were made by wheelmap_visitor, 2 by StreetComplete, and the rest by JOSM).


Warning, question:

  • Are we sure we want to spend the whole day monitoring what the user does in their changeset?

Discussion

Comment from Xvtn on 31 January 2024 at 16:51

I had no idea until now that changeset creation time was even recorded. Interesting!

Comment from Pieter Vander Vennet on 1 February 2024 at 01:07

A changeset is opened by sending some metadata to the server (e.g. contributor ID, editor, comment, …).

Then, one or more ‘changeset XML’-files can be uploaded.

Then, the changeset is closed if:

  • the editing program sends a ‘close changeset’ signal, OR
  • one hour passes after the last upload of a changeset XML, OR
  • 24 hours pass after the changeset is opened (but I wouldn’t be surprised if a changeset pops up that seems to be longer than 24 hours, either by a bug or by some timezone shenanigans)

https://mapcomplete.org (which I develop) exploits this behaviour by not closing a changeset and by trying to reuse a changeset as much as possible.

Comment from kmpoppe on 1 February 2024 at 05:38

Hi, and thank you for taking the time to analyze this!

I have seen that table before, haven’t I? On GitHub or on Discourse?

Anyway:

  • Are we sure we want to spend the whole day monitoring what the user does in their changeset?

Granted, I don’t know where to find the code exactly, but I guess there’s not much “monitoring” involved. You’ll probably see a process that checks every N seconds whether there are changesets that match Pieter’s description of points 2 and 3 (either 1 hour since the last upload or 24 hours since creation) and then shuts those changesets down. What happens before those times isn’t really something I guess is monitored in any shape, form or fashion.

https://mapcomplete.org (which I develop) exploits this behaviour by not closing a changeset and by trying to reuse a changeset as much as possible.

StreetComplete does the same: it creates its own little database of “OpenChangesets”, grouped by the “ElementEditType” (i.e. the “Task” or “Question” that the user was asked), and updates the changesets with changes that fit the same Edit Type, as long as the Changeset isn’t older than 20 minutes. After that time it closes its own Changesets automatically.
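The reuse strategy described above could look roughly like the sketch below. It is not StreetComplete's code: the class, its method and the edit type labels are hypothetical, changeset ids are faked locally instead of coming from the server, and measuring the 20 minutes from the moment the changeset was opened is an assumption.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(minutes=20)  # the 20-minute limit mentioned in the comment


class OpenChangesetCache:
    """Hypothetical in-memory stand-in for a local table of open changesets."""

    def __init__(self):
        self._open = {}    # edit type -> (changeset id, time it was opened)
        self._next_id = 1  # fake id counter; a real client gets ids from the server
        self.closed = []   # ids this client decided to close

    def changeset_for(self, edit_type, now):
        """Return a changeset id to upload an edit of `edit_type` into."""
        entry = self._open.get(edit_type)
        if entry is not None:
            changeset_id, opened_at = entry
            if now - opened_at <= MAX_AGE:
                return changeset_id           # still fresh: reuse it
            self.closed.append(changeset_id)  # too old: close it and start over
        changeset_id = self._next_id
        self._next_id += 1
        self._open[edit_type] = (changeset_id, now)
        return changeset_id


cache = OpenChangesetCache()
t0 = datetime(2024, 2, 1, 12, 0)
first = cache.changeset_for("opening_hours", t0)
again = cache.changeset_for("opening_hours", t0 + timedelta(minutes=5))
other = cache.changeset_for("surface", t0 + timedelta(minutes=6))
later = cache.changeset_for("opening_hours", t0 + timedelta(minutes=30))
print(first, again, other, later)  # 1 1 2 3
```

Edits of the same type made within the limit land in one changeset, while a different edit type or a stale changeset triggers a new one.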

Comment from TrickyFoxy on 1 February 2024 at 08:56

I forgot to write about one unpleasant feature of open changesets: while they are open, you can’t comment on them. I wouldn’t risk rolling them back either. You can find a full story about the problems of changesets here https://youtu.be/aRcHLKbXlcM

I have seen that table before, haven’t I? On GitHub or on Discourse?

I published it only in the Telegram chats: https://t.me/OpenStreetMapDev/7668 https://t.me/ruosm/792186

but I guess there’s not much “monitoring” involved.

The problem is that you already suggest that open changesets should be handled in a special way. This already sounds weird.


As you can see, the spike at 3600 seconds consists of changesets that were closed automatically. They come either from StreetComplete users who made no further edits as part of the quest, or from connection breaks.

@NorthCrab, in its implementation of API 0.7, offers one interesting thing: sending changes with a single HTTP request. Take a look at the current API and realize how overcomplicated it is: (


Open changesets in their current form are a strange and inconvenient thing. But I think they can be made better.

Comment from Pieter Vander Vennet on 1 February 2024 at 11:01

Tbf, for some use cases (e.g. the *Complete editors) it makes it possible to avoid creating many changesets.

Comment from mmd on 1 February 2024 at 11:15

Five years ago we already discussed adding an optional “close_changeset=true” attribute to the osmChange header. This would, as the name says, close the changeset as part of the upload, without the need to send an additional changeset close message. Unlike the proposed API 0.7 changes, it wouldn’t introduce an incompatible change, since it’s an optional attribute only.

Link: https://github.com/openstreetmap/openstreetmap-website/issues/2201
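To illustrate the idea, here is a sketch of a client building such an upload document. The close_changeset attribute is only the proposal from the linked issue and is not part of API 0.6; placing it on the root osmChange element is an assumption, the function name and generator value are invented, and the payload is simplified to a single new node.

```python
import xml.etree.ElementTree as ET


def build_osmchange(changeset_id, lat, lon, close_changeset=False):
    """Build a simplified osmChange document that creates one node."""
    root = ET.Element("osmChange", version="0.6", generator="example")
    if close_changeset:
        # Proposed optional attribute; current servers do not implement it.
        root.set("close_changeset", "true")
    create = ET.SubElement(root, "create")
    ET.SubElement(
        create,
        "node",
        id="-1",  # negative placeholder id for a newly created element
        changeset=str(changeset_id),
        lat=str(lat),
        lon=str(lon),
    )
    return ET.tostring(root, encoding="unicode")


print(build_osmchange(42, 51.5, -0.1, close_changeset=True))
```

Because the attribute is optional, a document built with close_changeset=False is an ordinary osmChange upload, which is the compatibility point made above.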

Comment from Andy Allan on 1 February 2024 at 14:17

Granted, I don’t know where to find the code exactly, but I guess there’s not much “monitoring” involved. You’ll probably see a process that checks every N seconds whether there are changesets that match Pieter’s description of points 2 and 3 (either 1 hour since the last upload or 24 hours since creation) and then shuts those changesets down.

It’s much simpler than that - there’s no extra monitoring process involved. Whenever something happens to the changeset (e.g. open, diff upload, individual element update, etc), its closed_at attribute is updated.

https://github.com/openstreetmap/openstreetmap-website/blob/e83f0bd13121ab520c68d3a49a3f0f59a1266cd2/app/models/changeset.rb#L186-L198

Then the next time you try to do something (e.g. another diff upload) the code just checks if the changeset closed_at has already passed - if so, the changeset is closed, if not, the closed_at is updated again, etc. The “close changeset” method just checks if the changeset is still open, and if so, sets the closed_at to right now.

https://github.com/openstreetmap/openstreetmap-website/blob/e83f0bd13121ab520c68d3a49a3f0f59a1266cd2/app/models/changeset.rb#L69-L76

So there’s no moving parts within the codebase, no ‘watch’ process and not even an extra update to the db to close each changeset. It’s a clever design (and not something I was involved with!).

I think the more important issue is the side effects on other systems, for example changeset comments, or 3rd-party analysis tools that might be waiting for a changeset to close before triggering an alert etc. There’s a case to be explored if 24 hours is too high an upper bound for changesets to be kept open (of course, a changeset also needs activity every 60 minutes for every one of those 24 hours, since the changeset closed_at is only extended 60 minutes at a time - so the default is to keep it open for 1 hour (reasonable?) with an upper limit of 24 hours (debatable?)).
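The mechanism described in this comment can be modelled in a few lines. This is a Python sketch of the idea only, not a port of the linked Ruby code: the class, method and constant names are chosen for illustration, and capping the deadline with min() is a simplification of however the real model enforces the 24-hour limit.

```python
from datetime import datetime, timedelta, timezone

IDLE_TIMEOUT = timedelta(hours=1)    # each activity extends the deadline by this much
MAX_TIME_OPEN = timedelta(hours=24)  # upper bound, counted from creation


class Changeset:
    def __init__(self, now):
        self.created_at = now
        # The closing time is stored from the start, one hour in the future.
        self.closed_at = now + IDLE_TIMEOUT

    def is_open(self, now):
        # No state flag: "open" only means the stored deadline has not passed.
        return self.closed_at > now

    def touch(self, now):
        """Record activity (e.g. a diff upload) and push the deadline forward."""
        if not self.is_open(now):
            raise RuntimeError("changeset is already closed")
        self.closed_at = min(now + IDLE_TIMEOUT, self.created_at + MAX_TIME_OPEN)

    def close(self, now):
        """Explicit close: move the deadline to right now, if still open."""
        if self.is_open(now):
            self.closed_at = now


t0 = datetime(2024, 2, 1, 12, 0, tzinfo=timezone.utc)
cs = Changeset(t0)
cs.touch(t0 + timedelta(minutes=50))            # deadline moves to 13:50
print(cs.is_open(t0 + timedelta(minutes=100)))  # True
print(cs.is_open(t0 + timedelta(minutes=111)))  # False, and nothing was written
```

In this model the automatic close is purely a matter of the clock passing the stored timestamp, so no background job and no extra database write are needed.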

Comment from kmpoppe on 1 February 2024 at 14:51

So there’s no moving parts within the codebase, no ‘watch’ process and not even an extra update to the db to close each changeset.

I think the more important issue is the side effects on other systems, for example changeset comments, or 3rd-party analysis tools that might be waiting for a changeset to close before triggering an alert etc.

So, if the opening client hasn’t closed the changeset and it would be closeable (1h, 24h), even loading the CS on the website or via the API wouldn’t trigger the closing but only trying to upload data into the CS again?

Comment from Andy Allan on 1 February 2024 at 15:32

loading the CS on the website or via the API wouldn’t trigger the closing but only trying to upload data into the CS again?

Hmm, not quite.

Remember that all changesets - open or closed - have a closed_at date, it’s just that initially it’s one hour in the future (you can think of it more like “will_be_closed_at”) and often that time has passed already (so more like “was_closed_at”). The only difference is whether that timestamp is before or after Time.now.utc. There are no updates to the database when a changeset automatically closes; the “will_be_closed_at” timestamp was already saved in the database, either during changeset open or during the last successful update.

The only ways to close a changeset are to a) wait for the closed_at timestamp to pass or b) update the closed_at timestamp to be Time.now.utc by calling the changeset/close API method - which is just an express version of a) for the impatient!

It’s one of these parts of the API where the mental model of a changeset (two states, open vs closed, and various actions like ‘close’ and ‘automatically close’) and the actual code implementation (a predetermined closed_at time, which can be in the future, and can be updated in certain limited circumstances) are quite different. The mental model is useful for mappers and there’s nothing wrong with it, but when you look at the code / database it’s quite different.

Comment from kmpoppe on 1 February 2024 at 15:45

all changesets - open or closed - have a closed_at date, it’s just that initially it’s one hour in the future

NOW I got you! Thanks for clearing that up.
