Earlier this month, during the Wikidata Data Modeling Days, Hannah Bast from the University of Freiburg presented QLever, a SPARQL query engine with some really cool features (slides, recording). I've now been using it for a month; in this post I'll discuss why it's relevant for the OSM community and share my experience so far.
What’s SPARQL and why should we care?
SPARQL is a query language for RDF data. This data can come from RDF-native knowledge graphs (there are thousands of them, public and private; the best known in the OSM community is Wikidata) or from other sources (for example OSM) converted to RDF with some tool or middleware.
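To make "querying RDF" a bit more concrete: RDF data is a set of (subject, predicate, object) triples, and the core of a SPARQL query is a basic graph pattern, i.e. a set of triple patterns containing `?variables` that must all match with consistent bindings. The toy Python sketch below illustrates only that matching step with a naive nested loop; it is not how QLever or any real engine evaluates queries, and the `ex:` identifiers are made-up examples.

```python
from __future__ import annotations

# A triple is (subject, predicate, object); terms starting with "?" are variables.
Triple = tuple[str, str, str]


def match_pattern(pattern: Triple, triple: Triple,
                  binding: dict[str, str]) -> dict[str, str] | None:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if p in result and result[p] != t:
                return None
            result[p] = t
        elif p != t:
            # A constant term must match exactly.
            return None
    return result


def evaluate_bgp(patterns: list[Triple], data: list[Triple]) -> list[dict[str, str]]:
    """Naive nested-loop evaluation of a basic graph pattern."""
    solutions: list[dict[str, str]] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in data
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# Hypothetical data; real knowledge graphs such as Wikidata use full IRIs.
DATA: list[Triple] = [
    ("ex:Freiburg", "ex:instanceOf", "ex:City"),
    ("ex:Freiburg", "ex:country", "ex:Germany"),
    ("ex:Basel", "ex:instanceOf", "ex:City"),
    ("ex:Basel", "ex:country", "ex:Switzerland"),
]

# Same as: SELECT ?place WHERE { ?place ex:instanceOf ex:City . ?place ex:country ex:Germany . }
QUERY: list[Triple] = [
    ("?place", "ex:instanceOf", "ex:City"),
    ("?place", "ex:country", "ex:Germany"),
]

print(evaluate_bgp(QUERY, DATA))  # [{'?place': 'ex:Freiburg'}]
```

Real engines replace the nested loop with indexes and join algorithms, but the result of a basic graph pattern is defined the same way: every combination of variable bindings under which all triple patterns occur in the data.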
SPARQL has an optional extension for geospatial data, GeoSPARQL. Query services that implement it let you run all kinds of spatial queries, using function names similar to those of other query languages based on OGC standards (like SQL in PostGIS).
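As an illustration of the PostGIS-like naming, here is a hedged Python sketch that builds a GeoSPARQL query using the standard `geo:hasGeometry`, `geo:asWKT`, `geo:wktLiteral` and `geof:sfWithin` terms (the counterpart of `ST_Within`), and encodes it as a GET request following the SPARQL 1.1 Protocol. The endpoint URL is a placeholder, the rectangle is an arbitrary area roughly around Freiburg, and which GeoSPARQL functions a given service supports in a `FILTER` varies between engines.

```python
from urllib.parse import urlencode

# Placeholder endpoint: replace with the SPARQL service you actually use.
ENDPOINT = "https://example.org/sparql"

PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
)


def within_query(area_wkt: str, limit: int = 100) -> str:
    """GeoSPARQL query for features whose WKT geometry lies within area_wkt."""
    return (
        PREFIXES
        + "SELECT ?feature WHERE {\n"
        + "  ?feature geo:hasGeometry ?geom .\n"
        + "  ?geom geo:asWKT ?wkt .\n"
        # geof:sfWithin plays the role of ST_Within in PostGIS.
        + f'  FILTER(geof:sfWithin(?wkt, "{area_wkt}"^^geo:wktLiteral))\n'
        + "}\n"
        + f"LIMIT {limit}\n"
    )


def get_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """SPARQL 1.1 Protocol: send the query as the URL-encoded `query` GET parameter."""
    return endpoint + "?" + urlencode({"query": query})


# WKT in GeoSPARQL uses (longitude latitude) order; the ring must be closed.
area = "POLYGON((7.80 47.98, 7.88 47.98, 7.88 48.02, 7.80 48.02, 7.80 47.98))"
q = within_query(area)
print(get_request_url(q))
```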
One notable feature of RDF and SPARQL is that they were designed from the ground up for interoperability between different data sources ("linked data"). SPARQL natively supports querying multiple RDF data sources in a single query through "federated queries". The query sent to the first service includes the URL of the second service together with the sub-query to run there, and specifies how those results should be merged or joined with the data from the first service. Unless the second service is blocked, the first service handles the communication on its own, merges the data and directly returns the final output.
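In standard SPARQL 1.1 the "URL of the second service and its query" is written as a `SERVICE <url> { ... }` block, and the results are joined on the variables the two parts have in common. The Python sketch below only assembles such a query string; `https://query.wikidata.org/sparql` is the Wikidata Query Service endpoint and `wdt:P1082` is Wikidata's "population" property, while the `ex:` vocabulary of the first service is hypothetical. Whether a given service is allowed to call a given remote endpoint depends on that service's configuration.

```python
import re

# Wikidata Query Service endpoint; wdt:P1082 is Wikidata's "population" property.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

PREFIXES = [
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
    "PREFIX ex: <http://example.org/>",  # hypothetical vocabulary of the first service
]


def shared_variables(pattern_a: str, pattern_b: str) -> set[str]:
    """Variables occurring in both patterns: the two result sets are joined on these."""
    return set(re.findall(r"\?\w+", pattern_a)) & set(re.findall(r"\?\w+", pattern_b))


def federated_query(select_vars: list[str], local_pattern: str,
                    remote_endpoint: str, remote_pattern: str) -> str:
    """Query whose SERVICE block the first service forwards to remote_endpoint."""
    lines = PREFIXES + [
        "SELECT " + " ".join(select_vars) + " WHERE {",
        "  " + local_pattern,                   # evaluated by the first service
        f"  SERVICE <{remote_endpoint}> {{",   # delegated to the second service
        "    " + remote_pattern,
        "  }",
        "}",
    ]
    return "\n".join(lines)


# Hypothetical local pattern: features linked to a Wikidata item.
LOCAL = "?feature ex:wikidataItem ?item ."
# Remote pattern, evaluated by the Wikidata Query Service.
REMOTE = "?item wdt:P1082 ?population ."

print(shared_variables(LOCAL, REMOTE))  # {'?item'}
print(federated_query(["?feature", "?population"], LOCAL, WIKIDATA_ENDPOINT, REMOTE))
```

Here `?item` is the join key: each local feature is paired with the population that the remote service returns for the same item.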
QLever
QLever and osm2rdf are two projects by the University of Freiburg; they were introduced in 2017 and 2021 respectively, but only recently started getting attention in the OSM world.
osm2rdf is a tool for converting OSM data into RDF. It transforms geometries from OSM's node-way-relation format to Well-Known Text (WKT) and can (indirectly) materialize containment and intersection relations between elements to speed up spatial queries. It's FOSS, and extracts of the data it generates are available online.
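The two ideas in the paragraph above can be sketched in Python: turning node coordinates into WKT, and precomputing "contains" relations once so that a later query only has to look up triples instead of doing geometry. This is a simplified illustration, not osm2rdf's actual algorithm: it treats every closed way as an area (in OSM, tags decide that), ignores multipolygon relations, uses a naive loop where a real converter would need a spatial index, and the `ex:contains` predicate and `osmway:`/`osmnode:` names are hypothetical.

```python
from __future__ import annotations

Coord = tuple[float, float]  # (longitude, latitude): the axis order of WKT in GeoSPARQL


def node_to_wkt(coord: Coord) -> str:
    lon, lat = coord
    return f"POINT({lon} {lat})"


def way_to_wkt(node_ids: list[int], nodes: dict[int, Coord]) -> str:
    """Closed ways become POLYGON, open ways LINESTRING (simplification: ignores tags)."""
    text = ", ".join(f"{lon} {lat}" for lon, lat in (nodes[n] for n in node_ids))
    if len(node_ids) >= 4 and node_ids[0] == node_ids[-1]:
        return f"POLYGON(({text}))"
    return f"LINESTRING({text})"


def point_in_ring(point: Coord, ring: list[Coord]) -> bool:
    """Ray casting: count edges crossed by a horizontal ray going right from point."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def materialize_contains(ways: dict[int, list[int]],
                         nodes: dict[int, Coord]) -> list[tuple[str, str, str]]:
    """Precompute 'way contains node' triples once, so queries need no geometry."""
    triples = []
    for way_id, node_ids in ways.items():
        if len(node_ids) < 4 or node_ids[0] != node_ids[-1]:
            continue  # only closed ways are treated as areas in this sketch
        ring = [nodes[n] for n in node_ids]
        for node_id, coord in nodes.items():
            if node_id not in node_ids and point_in_ring(coord, ring):
                triples.append((f"osmway:{way_id}", "ex:contains", f"osmnode:{node_id}"))
    return triples


NODES = {1: (0.0, 0.0), 2: (2.0, 0.0), 3: (2.0, 2.0), 4: (0.0, 2.0),
         5: (1.0, 1.0), 6: (3.0, 1.0)}
WAYS = {10: [1, 2, 3, 4, 1]}

print(way_to_wkt(WAYS[10], NODES))  # POLYGON((0.0 0.0, 2.0 0.0, 2.0 2.0, 0.0 2.0, 0.0 0.0))
print(materialize_contains(WAYS, NODES))  # [('osmway:10', 'ex:contains', 'osmnode:5')]
```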