OpenStreetMap

Using Global Building Data

Posted by coolmule0 on 6 August 2022 in English. Last updated on 8 August 2022.

Microsoft have recently released their “Worldwide building footprints derived from satellite imagery”. This dataset contains the shapes and positions of buildings. It covers many parts of the world and was derived from Bing aerial imagery using deep learning.

Data within JOSM

I find these footprints interesting for numerous reasons. The focus here will be to use the data to aid in mapping. There is further information on the OSM wiki.

In this diary entry I want to explain how I got the data into JOSM. In a following entry I will explain how I use this data to speed up building mapping.

Obtaining the data

The data is freely available on GitHub. It is licensed under the ODbL and is compatible with OpenStreetMap. Each available country has its own zip file. Some of the files are several gigabytes in size, far larger than typical desktop software can open. Pick a country and download its zip file.

Issues with the data

As with any external dataset, care should be taken to examine and understand the data before using it.

  • I found that numerous buildings were repeated in the data. This meant that JOSM had duplicated buildings and duplicated nodes after importing.

  • There are false positives and false negatives in the data. Some buildings are missing from the dataset, and some objects the dataset identifies as buildings are not buildings.

  • The rotation of buildings can be off.

  • The data may contain overlapping buildings, such as a garage overlapping with the house next to it.
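Before importing anything, it can help to flag the first and last of these issues automatically. The following is a minimal sketch, not part of the author's workflow. It assumes each input line is a GeoJSON Feature with a Polygon geometry and uses only the Python standard library. Exact repeats are detected by rounding every coordinate and grouping identical geometries. Possible overlaps are flagged when two buildings' bounding boxes intersect, which is a coarse proxy: rotated neighbours can have overlapping boxes without their outlines touching, so each flagged pair still needs a manual look.

```python
import json
from collections import defaultdict
from itertools import combinations


def geometry_key(geometry, ndigits=7):
    """Hashable key for a geometry: its type plus all coordinates rounded
    to `ndigits` decimal places, so exact repeats compare equal."""
    def rounded(value):
        if isinstance(value, (int, float)):
            return round(value, ndigits)
        return tuple(rounded(v) for v in value)
    return geometry["type"], rounded(geometry["coordinates"])


def outer_bbox(geometry):
    """(min_lon, min_lat, max_lon, max_lat) of a Polygon's outer ring."""
    ring = geometry["coordinates"][0]
    xs = [pt[0] for pt in ring]
    ys = [pt[1] for pt in ring]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    # Strict comparisons: boxes that only touch along an edge do not count.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def review_candidates(lines):
    """Return (groups of duplicate feature indices, index pairs whose
    bounding boxes overlap). Indices refer to the non-blank lines, in order."""
    features = [json.loads(line) for line in lines if line.strip()]
    groups = defaultdict(list)
    for i, feature in enumerate(features):
        groups[geometry_key(feature["geometry"])].append(i)
    duplicates = [idx for idx in groups.values() if len(idx) > 1]
    # Only the first copy of each geometry takes part in the overlap check.
    unique = [idx[0] for idx in groups.values()]
    boxes = {i: outer_bbox(features[i]["geometry"]) for i in unique}
    overlaps = [(i, j) for i, j in combinations(unique, 2)
                if boxes_overlap(boxes[i], boxes[j])]
    return duplicates, overlaps
```

The pairwise comparison is quadratic. It is only practical on a small regional extract. A whole-country file would need a spatial index instead.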

Unzip and geoJSONL

The zip file contains a .geojsonl file (note the ‘l’ at the end of the extension). This is line-delimited GeoJSON, with one feature per line. This file type cannot be imported directly into JOSM. In addition, files several gigabytes in size slow JOSM down too much to be usable. (Edit: see vorpalblade-kaart’s comment below for more accurate information.)

I found suggestions about splitting the large file into smaller files using the split command. However, the data is not spatially ordered within the file, so each split chunk has data from all over the country.
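One way to confirm this on a downloaded file is to compute the area covered by each block of lines that split would produce. The sketch below is an illustration, not something from the original workflow. It assumes each line is a GeoJSON Feature with a Polygon geometry and uses only the standard library. If every chunk's bounding box spans roughly the whole country, splitting by line count will not yield spatially compact files.

```python
import json
from itertools import islice


def chunk_extents(lines, chunk_size):
    """Yield the bounding box (min_lon, min_lat, max_lon, max_lat) covered by
    each consecutive block of `chunk_size` lines, i.e. by each piece that
    `split -l chunk_size` would produce."""
    it = iter(lines)
    while True:
        raw = list(islice(it, chunk_size))
        if not raw:
            return
        xs, ys = [], []
        for line in raw:
            if not line.strip():
                continue
            # Assumes Polygon features; index 0 is the outer ring.
            for pt in json.loads(line)["geometry"]["coordinates"][0]:
                xs.append(pt[0])
                ys.append(pt[1])
        if xs:
            yield min(xs), min(ys), max(xs), max(ys)
```

For example, open 'United Kingdom.geojsonl', pass the file object to chunk_extents with a chunk size such as 100000, and print each result. The file is read lazily, one chunk at a time.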

To create a small file containing only the data within an area of interest, I turned to Python and Geopandas.

Python and Geopandas

Geopandas is a Python library that can read and convert GeoJSON(L) files and can restrict features to a bounding box. I specify the coordinates of a box, and only the buildings within it are kept. This produces a file of manageable size focused on an area of interest. I’m sure the same can be done with software like QGIS, but I’ve never used that type of software before.

A script

In the end I wrote a Python script that reads an unzipped .geojsonl file downloaded from the repository, cuts out a selected region, removes duplicate geometries, and saves the result to a .geojson file that can be safely imported into JOSM.
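The four steps can be sketched without Geopandas. The code below is a hypothetical, dependency-free illustration, not the author's script (which uses geopandas and numpy). It assumes each input line is a GeoJSON Feature with a Polygon geometry and that the box is given in GeoJSON's longitude/latitude order. It streams the file line by line, so only the features inside the box are held in memory.

```python
import json


def within_bbox(geometry, bbox):
    """True if every vertex of the Polygon's outer ring lies inside
    bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_x, min_y, max_x, max_y = bbox
    return all(min_x <= pt[0] <= max_x and min_y <= pt[1] <= max_y
               for pt in geometry["coordinates"][0])


def extract_region(src, dst, bbox):
    """Stream line-delimited GeoJSON from `src`, keep features inside `bbox`,
    skip exact duplicate geometries and write a FeatureCollection to `dst`.
    Returns the number of features written."""
    seen = set()
    kept = 0
    dst.write('{"type": "FeatureCollection", "features": [\n')
    for line in src:
        if not line.strip():
            continue
        feature = json.loads(line)
        geometry = feature["geometry"]
        if not within_bbox(geometry, bbox):
            continue
        # Serialised coordinates as the duplicate key: catches identical
        # repeats only, not the same outline starting at a different vertex.
        key = json.dumps(geometry["coordinates"])
        if key in seen:
            continue
        seen.add(key)
        dst.write((",\n" if kept else "") + json.dumps(feature))
        kept += 1
    dst.write("\n]}\n")
    return kept
```

Typical use: open the .geojsonl file for reading and an output .geojson file for writing, then call extract_region(src, dst, (min_lon, min_lat, max_lon, max_lat)) with your own box.

This sketch keeps only buildings lying entirely inside the box, so outlines that cross its edge are dropped. Geopandas' .cx indexer instead selects geometries that intersect the box.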

The script is available on GitHub. It requires Python plus the geopandas and numpy libraries, which can be installed with the command pip install geopandas numpy. The script can be called with python extract_region.

Conclusion

The image below shows the data imported into JOSM.

Import of dataset into JOSM

I’ve been able to extract geospatially grouped building footprints into JOSM. In testing how large the files can be, I found that around 300 MB is the limit JOSM can handle on my computer, which has 24 GB of RAM.

My next entry will talk about how I use this data to speed up the mapping of buildings in OpenStreetMap and JOSM.


Discussion

Comment from Matthias on 7 August 2022 at 06:45

Very nice!

Comment from DoubleA on 7 August 2022 at 15:13

Is there any discussion going on concerning automated imports and tracking the covered area?

Comment from CjMalone on 7 August 2022 at 15:16

Yeah very nice, I had missed the release of this dataset.

Just set it up to work with MapWithAI in JOSM locally. It’s got some great footprints, and some weird ones.

Comment from scruss on 7 August 2022 at 23:41

You must follow the import guidelines or your edits risk being removed.

You must use a dedicated import account for this, not your regular one.

Problem changesets include:

Comment from SimonPoole on 8 August 2022 at 05:49

Besides everything that has been said, there are regions, for example the UK, where if the outlines were generated from Bing imagery, the imagery is obviously no longer available making it very difficult to determine if the outlines are even just half correct.

The other issue is naturally, as has been pointed out many times, that the licensing isn’t ideal and will long term cause issues.

Comment from coolmule0 on 8 August 2022 at 11:14

@DoubleA. I’m not aware of any ongoing discussion on this. The Wiki page had no mention of it until I made a small edit about it. I would very much like to know of any locations of discussion about this.

@scruss. A very good point. I wanted to expand in more detail about how I made the changesets in my next post, but would appreciate some feedback on it here beforehand if possible. I thought the way it was used was not an import, as I considered each footprint individually. In a similar way to having a map layer like cadastral parcels as an overlay, it was used as a tool to aid in mapping houses, rather than simply copy-pasting large data across. Would this be considered an import due to use of external data?

@SimonPoole. The database license is ODbL, which is the same as OSM, and hence should not have issues between them. Is there some further license issue?

Comment from SimonPoole on 8 August 2022 at 13:38

@coolmule0 while the ODbL is nominally compatible with itself, that is not the point. Incorporating ODbL-licensed data makes a future licence change of any kind (even if it is just fixing the couple of minor issues the current version of the ODbL has) completely dependent on the good will of the original licensors (if they even still exist at that point in time), or on removing the data in question.

There’s a further issue with all licenses that don’t allow sub-licensing that makes all such sources problematic, but that isn’t an ODbL specific issue.

Comment from vorpalblade-kaart on 8 August 2022 at 14:12

“The zip file contains a .geojsonl file (notice the ‘l’ at the end of the filename). This file type cannot be directly imported into JOSM. In addition, files on the order of gigabytes will clog up JOSM too much to be usable.”

That is roughly correct. JOSM reads delimited geojson files that follow the RFC 8142 proposed standard, which depends upon the RS (0x1e) record separator character. Assuming you have jq installed, you can run the following one-liner to convert the geojsonl file to something JOSM will open:

$ cat 'United Kingdom.geojsonl' | sed -e 's/^{/'$(printf "\x1e")'{/' | jq -c --seq . > 'United Kingdom.geojson'
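For readers without jq, a rough Python equivalent of the one-liner above is sketched below. It is an illustration, not a tool from this thread. It writes each feature as an RFC 8142 record: the RS (0x1e) character, one compact JSON text, then a line feed. It assumes the input has one JSON feature per line. Whether JOSM opens the output under a .geojson name is taken from the comment above, not verified here.

```python
import json

RS = "\x1e"  # ASCII record separator used by RFC 8142 text sequences


def to_json_seq(src, dst):
    """Rewrite line-delimited GeoJSON from `src` as an RFC 8142 GeoJSON text
    sequence on `dst`. Blank lines are skipped; returns the record count."""
    count = 0
    for line in src:
        if not line.strip():
            continue
        feature = json.loads(line)  # parsing also rejects malformed lines
        dst.write(RS + json.dumps(feature, separators=(",", ":")) + "\n")
        count += 1
    return count
```

Usage mirrors the shell pipeline. Open 'United Kingdom.geojsonl' for reading and 'United Kingdom.geojson' for writing (both with encoding="utf-8"), then call to_json_seq(src, dst). Like the jq version, it streams one line at a time, so the conversion itself needs little memory. JOSM still has to hold the whole result once it is opened.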

You’ll still have to deal with the data slowing down JOSM (it tries to draw everything), but that can be fixed by zooming in once everything loads, assuming you have allocated enough memory to JOSM.

With a 6.0 GB file, you are going to want to allocate more memory to JOSM (see JOSM OutOfMemory), since the default memory given for JOSM is typically 4 GB or less.

Yes, we don’t have to keep the whole file in memory, but we do have to keep the data from the whole file in memory, since we aren’t just processing it and forgetting about it (which is where the line-delimited geojson format makes a difference).

Comment from scruss on 9 August 2022 at 00:38

Not just an import, a destructive one at that. You overwrote existing buildings that had metadata with bare outlines, e.g.: osm.org/way/559909735/history

“I would very much like to know of any locations of discussion about this.”

If you plan an import, it’s you who starts the discussion, before you do it.

Comment from coolmule0 on 9 August 2022 at 09:32

@scruss Thank you for pointing this out. This was a manual accident I made. I am still learning how to use JOSM efficiently, and sometimes my attempts at pressing shortcuts don’t do what I expect. Rather than delete the buildings I made, I accidentally deleted everyone else’s! I have endeavoured to fix the mistake I made.

Comment from skquinn on 10 August 2022 at 10:59

How does this differ from the existing data from Microsoft available via MapWithAI?

Comment from Cascafico on 14 August 2022 at 13:28

Please, find my considerations and tests at osm.org/user/Cascafico/diary/399590

Comment from Michi on 24 August 2022 at 21:48

Since geopandas, which the Python script uses, needs huge amounts of memory, which I don’t have, I wrote a small program that does not load the file into RAM. Instead it processes the file line by line, feature by feature. This way the program is mostly I/O and CPU intensive. It also outputs GeoJSON directly.

I uploaded the program to GitHub, in case anybody else doesn’t have enough memory and wants to give it a try…
