🌂 The Past, The Present, The Future

It also depends on the storage region

The wal bucket is in eu-west-2 due to latency (accessed primarily from Dublin & Amsterdam). The cloudtrail bucket is in eu-west-2 (unsure why). The replicated backup buckets for rails user assets (gpx-trace, gpx-images, avatars) are in eu-north-1 (greenest and cheapest in the EU). All other buckets are in eu-west-1.

🌂 The Past, The Present, The Future

However, it’d be worthwhile to have a comprehensive comparison between various cloud services, especially considering other potential costs.

Do you want to produce it? I can give you all the raw export data. It would need to account for the risks and the effort required to switch solutions.

For example, expenses like “openstreetmap-wal - 28.7 TB - $400/month” still seem quite substantial.

Snippet change from our private terraform AWS repo (summary: halved the days stored and moved to a cheaper storage tier sooner). It should reduce the bill to less than $200/month.

PS: We’ll likely move from terraform to opentf once it is better established.

🌂 The Past, The Present, The Future

On why we use AWS: As mentioned previously we use Rails ActiveStorage for the osm.org website. AWS S3 has the best compatibility with Active Storage. We started using AWS S3 to store avatar images back in July 2019. Prior to that we used NFS, which was tied to a single export host (single point of failure) and was not reliable when running across data centres (IO + net latency issues causing timeouts). I built & tested self-hosted Ceph storage clusters, but designing & running a multi-site replicated cluster for OSM would be an extreme burden for the Ops team; we have a lot more to get on with than worrying just about storage. Much of the discussion is here. There is likely also discussion in the Ops meeting minutes where we thrashed out options a few years back.

Backblaze B2 only started offering an S3 compatible object storage API in May 2020. Hetzner is expected to offer S3 compatible object storage API in 2024. I see someone has written a dedicated Backblaze B2 gem for Active Storage support, but it doesn’t look particularly well supported.

Professionally I have 10 years of real experience building and consulting on creating secure AWS, Azure & GCP solutions. I’ve also built and consulted on large on-premise hosting solutions for US/UK financial and security services where using cloud hosting was not permitted. At my previous employer I was AWS certified. I understand the real costs involved in building different solutions.

On backups…

As of today (excluding logs) we have 230.8 TB backed up. 200.8 TB is in the AWS S3 Glacier Deep Archive storage tier ($0.00099 per GB). 27.4 TB is in the AWS S3 Standard-Infrequent Access storage tier ($0.0125 per GB).

Total AWS cost: $554/month ($0/month after free credits we’re using)*

By comparison Backblaze B2 costs $0.005 per GB.

Total Backblaze cost: $1168/month.
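
The totals above can be reproduced with a short calculation. This is only a rough sketch: it assumes the per-GB prices are per month, that 1 TB is counted as 1024 GB (the quoted totals only line up under that assumption), and that the Backblaze comparison covers the same 228.2 TB held in the two AWS tiers. The function and variable names are illustrative.

```python
# Rough reproduction of the monthly storage totals quoted above.
# Assumptions (not stated in the post): prices are per GB per month,
# and 1 TB is counted as 1024 GB.
GB_PER_TB = 1024


def monthly_cost(terabytes: float, usd_per_gb: float) -> float:
    """Return the monthly storage cost in USD for a given volume and price."""
    return terabytes * GB_PER_TB * usd_per_gb


deep_archive_tb = 200.8  # Glacier Deep Archive
standard_ia_tb = 27.4    # Standard-Infrequent Access

# AWS: each tier is billed at its own per-GB price.
aws_total = (monthly_cost(deep_archive_tb, 0.00099)
             + monthly_cost(standard_ia_tb, 0.0125))

# Backblaze B2: a single flat per-GB price for the same volume.
b2_total = monthly_cost(deep_archive_tb + standard_ia_tb, 0.005)

print(f"AWS: ${aws_total:.0f}/month")
print(f"Backblaze B2: ${b2_total:.0f}/month")
```

Under those assumptions the script lands on roughly $554/month for AWS and $1168/month for Backblaze B2, in line with the figures above.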

Yes, we could likely save money by deduplicating and moving from our TGZ backups to borgbackup / borgmatic. I had an informal discussion about this a few months back during our Ops fortnightly call. I’ve said I’d come up with a proper proposal, but it is extremely low priority and needs to be proven on a small scale before we can start to trust it. Multiple TGZ files are simple and reliable.

In summary: AWS is cheap, reliable, we have previous experience with it and it is well matched to our limited needs allowing us to focus our time/effort on the many other parts of OSM infrastructure which require attention. The ops team run things we can realistically support. Our usage of cloud services is limited, pragmatic and we are not tied/locked-in to any cloud provider.

Why didn’t we publish all the decision detail before? We did, or at least tried to; it maybe isn’t ideally collated / minuted, wanna help? Why did I discuss AWS sponsorship on Slack? Because that is where the OSM US community live and the sponsorship is primarily used for the render server for the USA. Why didn’t we publish all the metrics / cost breakdowns / other details for AWS usage? Because nobody prior to you ever asked about it, but we have now added backlog tickets to automate publishing some of it. We are an open project; we do not intentionally hide anything.

This has been extremely draining for me. I am human.

I am on IRC (OFTC network) in #osmf-operations or https://en.osm.town/@osm_tech or https://twitter.com/osm_tech if you have any specific follow-up questions.

🌂 The Past, The Present, The Future

@NorthCrab You and I are both Linux admins, we both map for fun, we both are in agreement that wasting money on cloud services, especially when there is physical hardware to hand, would be a dishonourable way to spend donated project money.

The OSM ops team is extremely small. We don’t have the resources (human capital, time, electrical power or even large enterprise disks) to run a large on-premise ceph / gluster / NFS / whatever cluster to store the shared data (across DCs) and backup data we store. I would not feel comfortable locating important backups in the same racks that we are backing up. Building a cluster that would meet our requirement of surviving a data centre outage would be difficult. In the past, online and offline, we have spent a long time discussing options, for example: https://github.com/openstreetmap/operations/issues/169

Our use of AWS (and S3) is limited and is the pragmatic choice. The amount of money we spend on AWS is small and I believe justified. Our AWS costs currently being 100% covered by free credits helps with the value proposition and allows us, the ops team, to focus on running the many other aspects of the OSM infrastructure that require our attention.

It would be great if we could meet up and find common ground. A few times a year I visit family in Lithuania, I see you map in Poland, maybe we could meet up for drinks in Warszawa? Or video call or whatever.

Hardweg 17

Wow, very nice work!

By the way, this whole account is a joke but...

It isn’t just a G with colour, it is a logo trademarked by Google. Please change it.

Thanks

I’ll bite… What happened that you wasted your time? OSM can be great fun, gets me out doors mapping and discovering my neighbourhood.

Community.osm.org - how's it going?

I keep a close eye on the translation feature of community.osm.org. It is an important feature for our community and I am happy to adjust any rate limits if they are causing any problems. I would prefer if the Tips remained where they are.

Community.osm.org - how's it going?

The 4 second page load time is a difficult one. I cannot replicate it. Google report Good “Core web vitals” for 99%+ of the site for both Mobile and Desktop. There might be some edge case, but I am not seeing it.

Community.osm.org - how's it going?

I completed a big tidy of some of the old forum imported categories today. I tagged the topics as relevant and then merged them into existing categories, eg: Category: General, Tag: Garmin.

Peering into Yesteryear

I quite like https://every-door.app/ for updating POIs. You can also “green tick” places to verify that they still exist.

I hope the app becomes more popular with the community.

By the way, this whole account is a joke but...

Also change your image. You are using a copyrighted/trademarked logo without permission.

--

❤️ 100% agree. It is such an awesome app for adding detail and missing features/POI.

There are many “pro” features like being able to switch a disused shop to another type by clicking on the title bar when the feature is open…. Or the “hidden” in plain sight button for adding social links to a feature.

How to use Every Door

Every Door v2.0 isn’t yet available in the Google Play Store. Is Google worried about this awesome OpenStreetMap mapping app?!? ;-)

Server system requirements

It depends on which feature you want to run. Nominatim? Tile rendering? API? Something else? They all use a different setup.

What the robots.txt file does

Thank you for the write-up.

The /diary disallow is a recent temporary measure to mitigate against some of the spam we’ve recently had and will be removed in a few days.
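
As a rough illustration of what such a rule does for well-behaved crawlers, the sketch below feeds a hypothetical two-line robots.txt fragment to Python's standard `urllib.robotparser`. The fragment is an assumption made for the example; the actual osm.org file is not reproduced here and may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt fragment; the real osm.org file may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /diary
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Return True if a generic crawler may fetch the given path."""
    return parser.can_fetch("*", "https://www.openstreetmap.org" + path)


# Paths starting with /diary are blocked; other paths are unaffected.
diary_allowed = allowed("/diary/12345")
about_allowed = allowed("/about")
print(diary_allowed, about_allowed)
```

The rule is a simple path-prefix match, so removing the `Disallow: /diary` line is all it takes to lift the restriction again.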

A Stranger at your Table

Google Webmaster Tools currently reports we have 3,107,519 items indexed by Google Bot in Google’s index with a further 1,969,842 items pending indexing.

Here are some examples of what we exclude from the indexing (top 10 from Webmaster Tools list):

osm.org/login?referer=/search?query%3DREST
osm.org/login?referer=/search?query%3Dcog
osm.org/user/David Dean/traces/3021308
osm.org/browse/way/282520804
osm.org/login?referer=/search?query%3DVLI
osm.org/?lat=47.2315515&lon=7.8399569&zoom=13
osm.org/login?referer=/search?query%3Dsave
osm.org/?mlat=49.5864&mlon=34.4553
osm.org/?lat=43.8341464&lon=4.3776398&zoom=13
osm.org/user/FVbike/traces/3021322

Alex, StopForumSpam was very unstable for around 2 hours today because of the Verizon BGP issue. You also share all your customer data with Cloudflare. You are clearly passionate, why not focus time on improving things there?

A Stranger at your Table

To correct a few items:

  • I cannot see any diary entries with spam on the first few pages. The sysadmin team are successfully mitigating spam via methods that do not invade user privacy. We are not sending user details (IP etc) to https://www.stopforumspam.com/ or similar sites. It is my understanding that Alex is involved with the running of stopforumspam.com and he is very aggressively promoting the service with us. Have the sysadmin team solved the spam? Nope, it will require on-going remediation. If the spam problem grows out of hand, we may have to consider using 3rd party services. (Pending privacy policy update)

  • The SEO claim is wrong. Even the latest diary entry pages are in Google Search results. We receive 1,000s of requests an hour from Googlebot and other search bots. We do not wildly deny content from being indexed.

  • Alex appears to have misconstrued what we said about the piwiki. Alex contacting our hosting company to complain about piwiki is not a cool move on his part. We are reliant on the generosity of the institutions which provide us with hosting.

Sigh. Now it is _主管Q (“_SupervisorQ”) Spammers

@alexkemp I have emailed you and cc’ed the DWG. Please read the email.

To be clear, “Unilaterally actions” by TomH… he has been the lead developer and primary sysadmin for over 10 years of the project. It is well within his domain to make these sorts of changes (they are all public and immediately reported via IRC to the rest of the team). Like him, the operations team all have the best intentions in keeping the project up & running and running reliably. If there was a simple way to deal with the low level spam issue, while ensuring the extremely high bar of user privacy we maintain, we’d have done it already.

Sigh. Now it is _主管Q (“_SupervisorQ”) Spammers

What spam?

Please refrain from perpetually insulting our volunteer admin team, of which I am a part.