Beep boop. I’m working on a project to update website tags (mostly in the U.S.) that use the http protocol instead of the https protocol when the website is already forcing you to use the https protocol. You can find more information at osm.wiki/Automated_Edits/b-jazz
Discussion
Comment from Nmxosm on 18 February 2019 at 20:07
Thanks for these edits. I’ve tried to do that manually on objects I touched anyway.
Great work.
Nmxosm
Comment from amapanda ᚛ᚐᚋᚐᚅᚇᚐ᚜ 🏳️🌈 on 19 February 2019 at 15:15
Thanks for some of the details. Can you go into more detail? Can you include the script you’re using? What tags are you changing? How are you comparing the URLs? What are you doing to protocol-less domains (e.g. website=www.example.com)? Why not try it on the whole world, not just the USA?
Comment from Wynndale on 19 February 2019 at 17:36
The Slack page linked from the wiki isn’t publicly readable. Do you distinguish between permanent and temporary moves or whether HSTS is set?
Comment from b-jazz on 19 February 2019 at 18:59
@rorym: You can find the python code at https://gitlab.com/b-jazz/https_all_the_things/. It’s not meant for others to run just yet, but is there for review and comments.
I’m currently just touching the “website” tag, but will likely add “url” and “contact:website” for the next go-around. I’m not sure I’ll do more than those as they make up the vast majority of http urls that are tagged. I’m happy to hear arguments on others that should really be included.
When comparing the urls: I’m currently doing four checks. For http://example.com, I’m looking for https://example.com, https://example.com/, https://www.example.com, and https://www.example.com/. Those are the most common variations when specifying redirect urls.
At this point, I’m not tackling protocol-less urls, but I certainly could. I should do some research and find out how common it is to leave off the http://.
As for the U.S. vs. the entire planet, I’m open to running it on the rest of the world, but I just started with the U.S. as I know that community better than the rest of the world and only posted there looking for feedback. I could build up the script a little more and document how to run it and let others do their own countries. What I worry about most is getting buy-off from the larger community across the globe. If someone gives me the go ahead, I’ll happily run it world wide.
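A minimal sketch of the four checks described above, for readers who want to see the comparison spelled out. This is not taken from the linked repository; the function names `https_candidates` and `is_simple_https_upgrade` are hypothetical, and the sketch assumes the redirect target has already been fetched by other code.

```python
def https_candidates(url):
    """Return the four https variants treated as 'the same site'.

    Assumption: the tagged value starts with http:// (protocol-less
    values are not handled here, matching the comment above).
    """
    prefix = "http://"
    if not url.lower().startswith(prefix):
        return set()
    # Drop the scheme and any trailing slash so both slash variants
    # can be generated explicitly below.
    rest = url[len(prefix):].rstrip("/")
    if not rest:
        return set()
    return {
        f"https://{rest}",
        f"https://{rest}/",
        f"https://www.{rest}",
        f"https://www.{rest}/",
    }


def is_simple_https_upgrade(tagged_url, redirect_target):
    """True if the redirect target is one of the four accepted variants."""
    return redirect_target in https_candidates(tagged_url)
```

With this sketch, a redirect from http://example.com to https://www.example.com/ counts as a simple upgrade, while a redirect to an unrelated domain does not.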
Comment from b-jazz on 19 February 2019 at 19:08
@Wynndale: Thanks for pointing that out. I’ll redact the names of the slack thread and post the rest of the content in the wiki so that people not on the US Slack server can see comments. I am currently only rewriting 301 (Moved Permanently) and 302 (Found). As you probably know, 302 has been known at times as Moved Temporarily. So it’s arguable that I shouldn’t be rewriting any of the 302 redirects, but IMO most website operators are using 302 when they really should be doing 301. It is the reason though that I’m avoiding touching anything that is much different from the original url. I’ve seen a bunch of domains redirecting to a facebook page or a google site temporarily. Those remain untouched. As for HSTS, I wasn’t familiar with that, but did a little reading. I’m not sure how you think that could be incorporated into what I’m doing. Can you explain?
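The redirect rule described above (rewrite only on 301 or 302, and only when the target is nearly identical to the tagged URL) could look roughly like the following. This is a sketch under assumptions, not the author's script: `should_rewrite` and `_bare_host` are hypothetical names, and the host comparison here ignores a leading `www.` in either direction, which is looser than the checks described in the thread at this point.

```python
from urllib.parse import urlsplit

# Only these redirect statuses are considered, per the comment above.
REWRITABLE_STATUSES = {301, 302}


def _bare_host(url):
    """Hostname without a leading 'www.' (hypothetical helper)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def should_rewrite(tagged_url, status, location):
    """Decide whether an http URL may be rewritten to its redirect target."""
    if status not in REWRITABLE_STATUSES or not location:
        return False
    old, new = urlsplit(tagged_url), urlsplit(location)
    if old.scheme != "http" or new.scheme != "https":
        return False
    # Reject redirects that land somewhere substantially different,
    # such as a Facebook page or a Google site.
    same_host = _bare_host(tagged_url) == _bare_host(location)
    same_path = old.path.rstrip("/") == new.path.rstrip("/")
    return same_host and same_path and old.query == new.query
```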
Comment from amapanda ᚛ᚐᚋᚐᚅᚇᚐ᚜ 🏳️🌈 on 19 February 2019 at 19:56
HSTS is where a website says “Always contact this website over HTTPS”. If an OSM object’s website tag URL returns that, then you can be much more confident that you should change the OSM object; the RFC says that you should always use HTTPS from now on.
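As an illustration of what such a check might involve: HSTS is announced through the `Strict-Transport-Security` response header (RFC 6797), which is only meaningful on an HTTPS response and whose `max-age` directive of 0 means the policy is being withdrawn. The sketch below assumes the response headers are already available as a plain dict; `hsts_max_age` and `declares_hsts` are hypothetical names, not part of the script discussed here.

```python
def hsts_max_age(headers):
    """Return the max-age from a Strict-Transport-Security header, or None."""
    value = None
    # Header names are case-insensitive, so compare in lower case.
    for name, candidate in headers.items():
        if name.lower() == "strict-transport-security":
            value = candidate
            break
    if value is None:
        return None
    for directive in value.split(";"):
        key, _, raw = directive.strip().partition("=")
        if key.strip().lower() == "max-age":
            raw = raw.strip().strip('"')  # the value may be quoted
            return int(raw) if raw.isdigit() else None
    return None


def declares_hsts(headers):
    """True if the headers announce an active HSTS policy (max-age > 0)."""
    age = hsts_max_age(headers)
    return age is not None and age > 0
```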
Comment from b-jazz on 19 February 2019 at 20:00
I agree that it would be pretty clear at that point that you can use HTTPS, but I think a simple HTTPS redirect is pretty convincing. Especially in this day and age when more and more websites are getting clued in about the importance of secure transmissions.
Comment from Nakaner on 20 February 2019 at 20:44
@b-jazz I recommend that you discuss this edit on a public mailing list with a proper archive (Talk-us in your case). Otherwise people not participating in proprietary communication channels can complain that the edit was not discussed, i.e. that it violates the Automated Edits Code of Conduct.
Comment from b-jazz on 21 February 2019 at 07:01
Thanks for the feedback @Nakaner. I’ll make sure I mention it in both the talk-us mailing list and the Slack channel in the future. Do you want to edit the AECoC page to point out that discussions shouldn’t take place solely on “proprietary communication channels”? Maybe we can prevent someone else from interpreting the page as I did in the future.
Comment from amapanda ᚛ᚐᚋᚐᚅᚇᚐ᚜ 🏳️🌈 on 21 February 2019 at 13:00
I think the AECoC is relatively clear that you should at least always post to a mailing list?:
Comment from b-jazz on 21 February 2019 at 17:36
As clear as mud. ;-)
My argument is that osmus.slack.com is a national-language forum for the U.S. with excellent representation. If that isn’t good enough for one reason or another, the wiki should call that out.
Comment from amapanda ᚛ᚐᚋᚐᚅᚇᚐ᚜ 🏳️🌈 on 21 February 2019 at 17:45
Upon closer reading, you are technically correct, the best kind of correct! I was influenced by the Organised Editing Guidelines which have more explicit rules. But that’s a different document. Perhaps the community should think on this.
Regardless, posting to mailing lists would be helpful.
Comment from amapanda ᚛ᚐᚋᚐᚅᚇᚐ᚜ 🏳️🌈 on 21 February 2019 at 17:53
Comments on your script.
http://, what about website=www.example.com (i.e. no protocol defined). You could check if it answers on https, and add that protocol, so more people will default to the secure version.
Comment from b-jazz on 21 February 2019 at 18:07
Three excellent questions/suggestions. Thanks!
Comment from b-jazz on 21 February 2019 at 19:13
I’ve found about 3000 instances of http://www.example.com redirecting to https://example.com in the lower 48. This makes me happy (because I abhor ‘www’). I’ll put in a fix and run batches again as soon as I also implement converting www.example.com to http://www.example.com. Great find @rorym.
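The planned handling of protocol-less values could be sketched as a small normalization step that runs before any redirect check. This is an illustration only: `with_scheme` is a hypothetical name, and defaulting to `http` is an assumption taken from the www.example.com to http://www.example.com example above.

```python
import re

# Matches a leading URI scheme such as "http://" or "https://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def with_scheme(value, default="http"):
    """Prefix a protocol-less tag value with a default scheme.

    Values that already carry a scheme, and empty values, are
    returned unchanged apart from surrounding whitespace.
    """
    value = value.strip()
    if not value or _SCHEME_RE.match(value):
        return value
    return f"{default}://{value}"
```

A value normalized this way can then go through the same redirect checks as any other http URL.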
Comment from escada on 22 February 2019 at 07:20
The tags “heritage:website” and “image” can also contain URLs. Might be worth looking at them too in a future version of your script.
Comment from b-jazz on 23 February 2019 at 02:08
Thanks @escada. I only thought about “website”, “:website”, “url”, and “:url”. I wasn’t aware of “image”. Looks like there are over 100,000 image tags. I’ll look into it and see if they are predominantly URLs.
Comment from Nmxosm on 30 May 2019 at 11:59
Hi,
have there been any updates with regard to broadening the tags this is applied to?
From your editing history and frequency I assume that the whole planet is done and your new edits just focus on new additions. Is that correct?
Greetings Nmxosm
Comment from b-jazz on 30 May 2019 at 16:19
I’m refactoring the code as we speak to apply to a broader set of tags. I expect I’ll start a run of those in the next week or two. And yes, you’re right that the whole planet has been completely scanned for the website key. I’m just rolling across the entire planet about once a week looking for new additions.