So, I found out yesterday that I don't need to parse the whole of Wikipedia to get the points. There is an export of the Wikipedia geocoordinates here:
http://www.webkuehn.de/hobbys/wikipedia/geokoordinaten/Wikipedia_en_2008-03-12.zip
Now I have split that file by area of interest.
First I parsed my Kosovo boundaries file (openstreetmapkosova/kosovaadmin.osm from my OSM branch, also on Launchpad) and extracted the min/max of latitude and longitude.
This produced (coordinates in decimal degrees):
LAT: avg 42.3764065194805, count 539, min 41.8534278, max 43.2723636, size (max - min) 1.4189358
LON: avg 20.9177833755102, count 539, min 20.0722732, max 21.8005791, size (max - min) 1.7283059
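The min/max/size figures above can be reproduced with something like the following minimal Python sketch. It is not the script used here (that one isn't shown); it assumes the boundary file is OSM XML whose node elements carry lat/lon attributes, and it takes "size" to mean max minus min, which matches the numbers above (43.2723636 - 41.8534278 = 1.4189358).

```python
import xml.etree.ElementTree as ET
from statistics import mean
from typing import Dict, Iterable, Iterator, Tuple


def read_nodes(source) -> Iterator[Tuple[float, float]]:
    """Yield (lat, lon) for every <node> in an OSM XML file (path or file object)."""
    for node in ET.parse(source).getroot().iter("node"):
        yield float(node.get("lat")), float(node.get("lon"))


def coord_stats(points: Iterable[Tuple[float, float]]) -> Dict[str, Dict[str, float]]:
    """Average, count, min, max and extent (max - min) of latitude and longitude."""
    pts = list(points)
    if not pts:
        raise ValueError("no nodes found")
    stats = {}
    for label, values in (("LAT", [p[0] for p in pts]), ("LON", [p[1] for p in pts])):
        lo, hi = min(values), max(values)
        stats[label] = {"avg": mean(values), "cnt": len(values),
                        "min": lo, "max": hi, "size": hi - lo}
    return stats
```

Usage would be coord_stats(read_nodes("openstreetmapkosova/kosovaadmin.osm")). Note that it counts every node in the file, so the count only equals the number of boundary vertices if the file contains nothing else.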
I modified those coordinates and used my stripkml script to extract all the points in that box:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/openstreetmap-wikipedia/revision/2/stripkml.pl
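The box filter that stripkml performs can be sketched in Python as follows. This is not the Perl script linked above; it assumes the export is KML with one Placemark per article, each holding a name and a Point whose coordinates follow the KML "lon,lat[,alt]" order. Box, Place, placemarks_in_box and stream_elements are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, NamedTuple


class Box(NamedTuple):
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


class Place(NamedTuple):
    name: str
    lat: float
    lon: float


def placemarks_in_box(elements: Iterable[ET.Element], box: Box) -> Iterator[Place]:
    """Yield the Placemarks among `elements` whose first coordinate lies in `box`."""
    for elem in elements:
        if elem.tag.rsplit("}", 1)[-1] != "Placemark":
            continue  # skip non-Placemark elements, whatever their namespace
        name = elem.findtext("{*}name", default="").strip()
        coords = elem.findtext(".//{*}coordinates")
        if not coords or not coords.split():
            continue
        # KML writes "lon,lat[,alt]"; take the first tuple only.
        lon_s, lat_s = coords.split()[0].split(",")[:2]
        lat, lon = float(lat_s), float(lon_s)
        if box.min_lat <= lat <= box.max_lat and box.min_lon <= lon <= box.max_lon:
            yield Place(name, lat, lon)


def stream_elements(source) -> Iterator[ET.Element]:
    """Stream elements from a large KML file, clearing each Placemark after use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        yield elem
        if elem.tag.endswith("Placemark"):
            elem.clear()  # keep memory flat on a multi-gigabyte dump
```

For example, placemarks_in_box(stream_elements("wikipedia_en.kml"), Box(41.8534278, 43.2723636, 20.0722732, 21.8005791)) would use the unmodified min/max from above (the file name is hypothetical). Streaming with iterparse avoids loading the whole export into memory.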
Then I used kml2osm from osmlib:
http://osmlib.rubyforge.org/ http://rubyforge.org/projects/osmlib/
My modified version handles the Wikipedia points only.
You need to run it alongside the existing osmlib; I run it in the examples directory:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/openstreetmap-wikipedia/revision/4/kml2osm
Here is the result:
http://bazaar.launchpad.net/%7Ejamesmikedupont/%2Bjunk/openstreetmap-wikipedia/revision/4/KosovoWP.osm
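The kml2osm step amounts to writing each extracted point as a new OSM node. Below is a minimal Python sketch of that idea, not the modified osmlib script: it emits OSM API 0.6 XML with negative ids, which JOSM treats as new, not yet uploaded objects. The generator string and the choice of tags are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Mapping, Tuple


def to_osm_xml(points: Iterable[Tuple[float, float, Mapping[str, str]]]) -> str:
    """Serialize (lat, lon, tags) triples as new OSM nodes with ids -1, -2, ..."""
    # "wp-kml-sketch" is a hypothetical generator name.
    root = ET.Element("osm", version="0.6", generator="wp-kml-sketch")
    for i, (lat, lon, tags) in enumerate(points, start=1):
        node = ET.SubElement(root, "node", id=str(-i),
                             lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        for key, value in tags.items():
            ET.SubElement(node, "tag", k=key, v=value)  # ElementTree escapes values
    return ET.tostring(root, encoding="unicode")
```

The resulting string can be saved with an .osm extension and opened in JOSM for review before anything is uploaded.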
Here is my changeset, uploaded with JOSM:
osm.org/browse/changeset/1911214
Now I would like a way to process the Wikipedia data in chunks like this. There should be a way to extract just parts of the zip or bz2 file without unpacking all of it.
Thanks,
mike
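For the chunked processing wished for above, Python's standard zipfile and bz2 modules already decompress lazily, so the data can be read piece by piece without unpacking it to disk. This is a minimal sketch assuming UTF-8 text and, for zip archives, that the wanted member is either named or is the first one; iter_lines and chunks are hypothetical helper names.

```python
import bz2
import io
import zipfile
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def iter_lines(path, member: Optional[str] = None) -> Iterator[str]:
    """Yield decoded lines one at a time from a .zip member or a .bz2 file."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            name = member or zf.namelist()[0]
            # ZipFile.open decompresses on the fly as the stream is read.
            with zf.open(name) as raw, io.TextIOWrapper(raw, encoding="utf-8") as text:
                yield from text
    else:
        with bz2.open(path, "rt", encoding="utf-8") as text:
            yield from text


def chunks(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group an iterable of lines into lists of at most `size` lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Two caveats: a bz2 stream still has to be decompressed from the start (only the writing to disk is avoided), and line-based chunks can split one KML Placemark across two chunks, so for XML it is safer to hand the decompressed stream to an incremental parser such as ElementTree's iterparse.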
Discussion
Comment from lyx on 23 July 2009 at 07:23
Nicely done. I noticed notes labeled "list of tripoints" and "extreme points of montenegro" in your changeset, which indicate that these were more than one point originally. Might be worth checking again.
Comment from TomH on 23 July 2009 at 08:13
Importing POIs from Wikipedia is generally considered a bad idea because of the way much of Wikipedia's geodata has been sourced; some of it, at least, is believed to have been derived from sources like Google Maps. See this recent mailing list thread for a discussion of the issue:
http://www.nabble.com/Wikipedia-POI-import--to23392791.html#a23394016
Comment from Pieren on 23 July 2009 at 09:33
Yes, this is really a bad idea. Much of the Wikipedia geodata is imported from sources that do not allow commercial reuse, or from Google Maps.
Comment from drlizau on 23 July 2009 at 10:52
I've had private conversations with Mike.
Just to make it clear, this is not for import into the main OSM map; it is for informing mappers in Kosovo about what may exist on the ground. If you check the Google map of Prishtina, you will see that it is hopeless, so don't imagine anything will be copied from there.
Comment from h4ck3rm1k3 on 23 July 2009 at 10:57
Well, you can see that I am done.
If you think any of these 10 points have been copied illegally from somewhere, I will remove them.
Also, the Wikipedia people told me that facts are not copyrightable.
Yes, I used a bounding box and not the polygon. I could use osmosis to filter this dataset again with the boundary.
The result of this exercise is that there is hardly any data for Kosovo in Wikipedia.
thanks,
mike
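Filtering by the boundary polygon instead of the bounding box, as suggested above with osmosis, comes down to a point-in-polygon test. This is a plain even-odd ray-casting sketch in Python, not what osmosis does internally; it assumes a single closed outer ring given as (lon, lat) pairs, treats coordinates as planar (reasonable for an area the size of Kosovo), and ignores holes and multipolygons.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd rule: count how many ring edges a ray going east from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through y can be crossed;
        # this also guarantees y2 != y1, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

The ring would come from the boundary way in kosovaadmin.osm; each point that passed the bounding-box filter is then kept only if point_in_polygon((lon, lat), ring) is true.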
Comment from h4ck3rm1k3 on 23 July 2009 at 10:58
And you don't have to approve this changeset, by the way.
It looks like there are only two or three interesting points.
Comment from HannesHH on 24 July 2009 at 14:51
Facts, no; data, yes. OSM is just factual data too. ;)
Comment from Richard on 25 July 2009 at 16:34
"the wikipedia people said to me that facts are not copyrightable"
The wikipedia people are not renowned for their understanding of the complexities of geodata law. ;)