
Wikipedia extract

Posted by h4ck3rm1k3 on 23 July 2009 in English

So,
I found out yesterday that I don't need to parse the whole of Wikipedia to get the points:
http://www.webkuehn.de/hobbys/wikipedia/geokoordinaten/Wikipedia_en_2008-03-12.zip

So now I have split that file by the area of interest.
First I parsed my Kosovo boundaries file and extracted the min/max of lat and lon:

http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/openstreetmap-wikipedia/revision/2/extract.pl#extract.pl

using openstreetmapkosova/kosovaadmin.osm from my OSM branch, also on Launchpad.
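The bounding-box step above boils down to scanning the .osm file for node lat/lon attributes and reporting min/max/avg/size. A minimal Python sketch of that idea (the original extract.pl is Perl; the sample data here is made up):

```python
# Sketch of the bounding-box step: scan OSM XML for node lat/lon
# attributes and compute the stats extract.pl reports.
import xml.etree.ElementTree as ET

def bbox_stats(osm_xml):
    """Return (lat_stats, lon_stats) dicts with min/max/avg/size
    over all <node> elements in the given OSM XML string."""
    lats, lons = [], []
    for node in ET.fromstring(osm_xml).iter("node"):
        lats.append(float(node.get("lat")))
        lons.append(float(node.get("lon")))

    def stats(vals):
        return {"min": min(vals), "max": max(vals),
                "avg": sum(vals) / len(vals),
                "size": max(vals) - min(vals)}

    return stats(lats), stats(lons)

# Illustrative three-node sample, not the real kosovaadmin.osm:
sample = """<osm>
  <node id="1" lat="41.85" lon="20.07"/>
  <node id="2" lat="43.27" lon="21.80"/>
  <node id="3" lat="42.10" lon="20.90"/>
</osm>"""
lat, lon = bbox_stats(sample)
print(lat["min"], lat["max"], lon["size"])
```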

This produced:
LAT Avg 42.3764065194805
cnt 539
Min 41.8534278
Max 43.2723636
size 1.4189358

LON Avg 20.9177833755102
cnt 539
Min 20.0722732
Max 21.8005791
size 1.7283059

I modified those coords and used my stripkml to extract all the points in that box:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/openstreetmap-wikipedia/revision/2/stripkml.pl
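The filtering step is simple in principle: keep only Placemarks whose coordinates fall inside the bounding box. Here is a hypothetical Python equivalent of what stripkml.pl does (the function name, box values, and sample KML are illustrative; the real script is Perl):

```python
# Keep only KML Placemarks whose point falls inside a bounding box.
import xml.etree.ElementTree as ET

# Roughly the Kosovo box from the stats above: min_lon, min_lat, max_lon, max_lat
BOX = (20.07, 41.85, 21.80, 43.27)

def placemarks_in_box(kml_xml, box):
    """Yield (name, lon, lat) for each Placemark inside `box`.
    Tags are matched by local name so any KML namespace works."""
    min_lon, min_lat, max_lon, max_lat = box
    for pm in ET.fromstring(kml_xml).iter():
        if not pm.tag.endswith("Placemark"):
            continue
        name = coords = None
        for el in pm.iter():
            local = el.tag.rsplit("}", 1)[-1]
            if local == "name":
                name = el.text
            elif local == "coordinates":
                coords = el.text
        if coords is None:
            continue
        # KML coordinates are "lon,lat[,alt]"
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            yield name, lon, lat

sample = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark><name>Prishtina</name>
    <Point><coordinates>21.1655,42.6629,0</coordinates></Point></Placemark>
  <Placemark><name>Berlin</name>
    <Point><coordinates>13.4050,52.5200,0</coordinates></Point></Placemark>
</Document></kml>"""
print(list(placemarks_in_box(sample, BOX)))
```

Note that KML puts longitude first in coordinate tuples, which is easy to trip over when building a lat/lon bounding box.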

Then I used kml2osm from osmlib:
http://osmlib.rubyforge.org/
http://rubyforge.org/projects/osmlib/

The modified version handles the Wikipedia points only.
You need to run it with the existing osmlib; I run it in the examples dir.
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/openstreetmap-wikipedia/revision/4/kml2osm
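For point data, the KML-to-OSM conversion essentially turns each Placemark into an OSM `<node>` with a name tag and a negative (not-yet-uploaded) id. This is a minimal Python sketch of that mapping, not osmlib's actual Ruby implementation:

```python
# Sketch: convert (name, lon, lat) points into a minimal .osm file.
# Negative ids mark nodes that do not exist on the server yet.
from xml.sax.saxutils import quoteattr

def points_to_osm(points):
    """points: iterable of (name, lon, lat); returns an .osm XML string."""
    lines = ["<?xml version='1.0' encoding='UTF-8'?>",
             "<osm version='0.6' generator='sketch'>"]
    for i, (name, lon, lat) in enumerate(points, start=1):
        lines.append("  <node id='-%d' lat='%f' lon='%f'>" % (i, lat, lon))
        lines.append("    <tag k='name' v=%s/>" % quoteattr(name))
        lines.append("  </node>")
    lines.append("</osm>")
    return "\n".join(lines)

out = points_to_osm([("Prishtina", 21.17, 42.67)])
print(out)
```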

Here is the result:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/openstreetmap-wikipedia/revision/4/KosovoWP.osm

Here is my changeset, uploaded with JOSM:
osm.org/browse/changeset/1911214

Now I would like a way to process the Wikipedia data in chunks like this. There should be a way to extract just parts of the zip file or bz2 file.

Thanks,
mike


Discussion

Comment from lyx on 23 July 2009 at 07:23

Nicely done. I noticed notes labeled "list of tripoints" and "extreme points of montenegro" in your changeset that indicate that these were more than one point originally. Might be worth checking again.

Comment from TomH on 23 July 2009 at 08:13

Importing POIs from Wikipedia is generally considered to be a bad idea due to the way much of Wikipedia's geodata has been sourced - some of it at least is believed to have been derived from sources like Google Maps. See this recent mailing list thread for a discussion of the issue:

http://www.nabble.com/Wikipedia-POI-import--to23392791.html#a23394016

Comment from Pieren on 23 July 2009 at 09:33

Yes, this is really a bad idea. Much of the Wikipedia geodata is imported from sources that do not allow commercial reuse, or from Google Maps.

Comment from drlizau on 23 July 2009 at 10:52

I've had private conversations with Mike.
Just to make it clear, this is not for import into the main OSM map; this is for informing mappers in Kosovo what may exist on the ground. If you checked the Google map of Prishtina you would see that it is hopeless, so don't imagine anything will be copied from there.

Comment from h4ck3rm1k3 on 23 July 2009 at 10:57

Well,
you can see that I am done.
If you think any of these 10 points have been copied from somewhere illegally, I will remove them.
Also, the wikipedia people said to me that facts are not copyrightable.
Yes, I used a bounding box and not the polygon. I could use osmosis to filter this dataset again with the boundary.
The result of this exercise is that there is hardly any data for Kosovo in Wikipedia.
thanks,
mike

Comment from h4ck3rm1k3 on 23 July 2009 at 10:58

And you don't have to approve this changeset, btw.
It looks like there are only two or three interesting points.

Comment from HannesHH on 24 July 2009 at 14:51

Facts not, data yes. OSM is just factual data too. ;)

Comment from Richard on 25 July 2009 at 16:34

"the wikipedia people said to me that facts are not copyrightable"

The wikipedia people are not renowned for their understanding of the complexities of geodata law. ;)
