krahulreddy's Diary

Recent diary entries

GSoC Final Documentation

Posted by krahulreddy on 31 August 2020 in English.

All important links

Code

Diary entries

Hosted Server

Server with Nominatim-UI (this will go offline a few days after GSoC ends)
API with suggestions (this will go offline a few days after GSoC ends)

About the project

This GSoC project was developed over the summer of 2020 by K Rahul Reddy (krahulreddy). It was mentored by Sarah Hoffmann (lonvia) and Marc Tobias (mtmail).

What are we doing: The problem statement

OSM’s main search engine Nominatim does not support search suggestions. A separate database, derived from the Nominatim database, should be set up for search suggestions. It should support regular updates from the Nominatim DB, handle various languages, and be small enough to run alongside Nominatim.

This project aims to set up such a search suggester by comparing alternative implementations such as Elasticsearch and Solr. These suggesters build an index to serve quick suggestions to users. The finalized stack will be integrated with the Nominatim search API. Complete installation and setup documentation, along with a test suite, will be created as part of this project.

Why is it required?

Suggestions during search help a lot in terms of finding the right place. Adding suggestions to the Nominatim search will help the users of Nominatim and OpenStreetMap to easily find the right place without performing a Nominatim DB search.

What are the approaches taken?

The following steps were taken to provide suggestions:

…

Nominatim suggestions update

Posted by krahulreddy on 30 July 2020 in English.

As part of phase two of the GSoC project, the following work has been done.

Server is Online

At the end of phase one, we had a server up and running with nominatim-ui and an Elasticsearch server. It runs on the machine provided to me as part of this project and currently has the entire planet DB set up. Suggestions were set up for a few smaller DBs, but not yet for the planet DB. Debugging and changes are ongoing, so the suggestions might not be available at all times. I will post another diary update once the suggestions are completely available.

The server is hosted at http://95.217.117.45/nominatim/ui. This will be available only during the course of this GSoC project (till 31 August 2020). The suggestions are provided from a hug API call. This can be accessed at http://95.217.117.45:8000/pref?q=. This internally queries elasticsearch on the server and returns the results.
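As a rough illustration of how a client could call this endpoint, the sketch below builds the request URL and parses the response. The host and the /pref?q= path come from this entry. The function names, the timeout and the assumption that the response body is JSON are mine, and the server is only reachable while the project is running.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host taken from this entry; it is only online during the GSoC project.
BASE_URL = "http://95.217.117.45:8000"


def suggestion_url(query, base_url=BASE_URL):
    """Build the URL of the suggestion endpoint for a partial query string."""
    return f"{base_url}/pref?{urlencode({'q': query})}"


def fetch_suggestions(query, base_url=BASE_URL, timeout=5):
    """Call the endpoint and decode the body, assuming it is JSON."""
    with urlopen(suggestion_url(query, base_url), timeout=timeout) as response:
        return json.load(response)
```

Only fetch_suggestions touches the network; suggestion_url can be used on its own to check how a query is encoded.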

Elasticsearch configurations

Elasticsearch provides a lot of configuration options, which can be tweaked to obtain optimal performance. For our setup, the requirements are:

Low disk usage.
Fast indexing.

The following are a few of the options in Elasticsearch that were explored during this phase.

These will be tested out with the planet DB indexing, which will be done soon!

Address formation

The address is formed in every language entered in the Nominatim DB. You can see all the languages by looking at the tags at http://95.217.117.45:8000/pref. For example, http://gsoc2020.nominatim.org:8000/pref/?q=bangalore%20north returns:

"addr": "Bangalore North, Bangalore Urban, Karnataka",
"addr:kn": "ಬೆಂಗಳೂರು ಉತ್ತರ, ಬೆಂಗಳೂರು ನಗರ, ಕರ್ನಾಟಕ",
"country_code": "in"
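The sketch below shows one way such per-language addresses could be assembled. It assumes each address part carries an hstore-like dict with a default name key and optional name:<lang> keys, and that a part without a translation falls back to its default name. The function name form_addresses and the fallback rule are illustrative assumptions, not the project's actual code.

```python
def form_addresses(parts):
    """Join address parts into one string per language.

    parts: list of name dicts, ordered from most to least specific.
    """
    # Every language code that appears as a "name:<lang>" key in any part.
    langs = sorted(
        {key.split(":", 1)[1] for names in parts for key in names if key.startswith("name:")}
    )
    result = {"addr": ", ".join(names["name"] for names in parts if "name" in names)}
    for lang in langs:
        # Assumed rule: fall back to the default name when a translation is missing.
        localized = [names.get(f"name:{lang}", names.get("name")) for names in parts]
        result[f"addr:{lang}"] = ", ".join(part for part in localized if part)
    return result


example = form_addresses([
    {"name": "Bangalore North", "name:kn": "ಬೆಂಗಳೂರು ಉತ್ತರ"},
    {"name": "Bangalore Urban", "name:kn": "ಬೆಂಗಳೂರು ನಗರ"},
    {"name": "Karnataka", "name:kn": "ಕರ್ನಾಟಕ"},
])
```

Here example contains an addr key and an addr:kn key, mirroring the shape of the response quoted above.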

Indexing rate:

…

Proof of Concepts

Posted by krahulreddy on 30 June 2020 in English.

The code for these POCs can be found here. This code will still be modified as we test and tweak various options available. This phase is important to establish that our project is going to work as expected, and there are no missing/misbehaving components.

As a part of this, the following components are designed and tested:

Getting input:

The psycopg2 Python library is used to connect to PostgreSQL and fetch data.
DictCursors are used to fetch data in a dictionary format. (This is necessary to ensure that the hstore data structure used to store the name and address fields is fetched correctly.)

Important note: These cursors are not thread-safe. So, going ahead, if multithreading is used, this must be kept in mind.
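A minimal sketch of this input step is given below. It assumes a Nominatim database whose placex table has place_id, name and address columns, with name and address stored as hstore. The function names, the query and the DSN are placeholders for illustration, not the project's actual code. Each call opens its own connection and cursor, which is one simple way to respect the thread-safety note above.

```python
def fetch_places(dsn, limit=10):
    """Yield rows from placex as plain dicts, with hstore columns as dicts."""
    # Imported here so that localized_name() below works without the driver installed.
    import psycopg2
    import psycopg2.extras

    conn = psycopg2.connect(dsn)
    try:
        # Make psycopg2 convert hstore values (name, address) to Python dicts.
        psycopg2.extras.register_hstore(conn)
        # A fresh cursor per call: cursors must not be shared between threads.
        with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
            cur.execute("SELECT place_id, name, address FROM placex LIMIT %s", (limit,))
            for row in cur:
                yield dict(row)
    finally:
        conn.close()


def localized_name(names, lang=None):
    """Pick name:<lang> from an hstore-style dict, else the default name."""
    if lang and f"name:{lang}" in names:
        return names[f"name:{lang}"]
    return names.get("name")
```

A call such as fetch_places("dbname=nominatim") would need a running database; localized_name only needs the dict produced for the name column.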

Formatting:

Created a Doc class with necessary fields. (Fields discussed in the last article)
Forming addresses using the place_addressline table. These will be finally indexed in elasticsearch along with other necessary fields.

Indexing:

Setting up an elasticsearch server and running it.
Creating and deleting an index.
Inserting documents into the index, both one at a time in a loop and with bulk indexing.
Indexing with varying numbers of records.
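To make the difference between looped and bulk indexing concrete: a loop sends one request per document, while the Elasticsearch bulk API takes many documents in a single newline-delimited JSON body, with an action line before each document. The sketch below only builds that body. The index name, the use of place_id as the document id and the sample fields are assumptions for illustration.

```python
import json


def bulk_body(index, docs):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["place_id"])}}))
        # Source line: the document itself, keeping non-ASCII names readable.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("places", [
    {"place_id": 1, "addr": "Bangalore North, Bangalore Urban, Karnataka"},
    {"place_id": 2, "addr": "Karnataka"},
])
```

The resulting string would be sent with a POST request to the _bulk endpoint using the application/x-ndjson content type.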

Hug API:

Created a hug API layer in front of Elasticsearch. It adds an extra layer of security by not exposing the Elasticsearch endpoints directly.

Searching:

Setting up the front end on Nominatim. Available here
Fetching results from the hug API endpoint to Nominatim.
Displaying the results as an option list.
Selection of results by the user.

Outcomes/Observations:

…

Elasticsearch vs Solr

Posted by krahulreddy on 18 June 2020 in English.

Comparison:

For indexing the Nominatim data, we have two major contenders: Solr and Elasticsearch. Both are based on the Apache Lucene library and provide a wide range of search options. As a part of my GSoC project, I compared the two. A small project was set up to compare the functionality offered.

We have listed a few requirements for this project:

Required

Indexing: Name, Postcode fields.
Handle multiple names (e.g. OSMNames)
Handle postcodes
Scoring based on importance (0-1).
Handle data update
Store but do not index: Type, Class, id
Avoid copy fields

Desirable

Normalization
Tokenization for suggestion improvement
Consider browser defaults for language
Typo tolerance
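Several of the "Required" items above can be expressed directly in an Elasticsearch index definition. The sketch below is one possible reading of them, assuming Elasticsearch 7 or later (typeless mappings). The field names, the choice of field types and the match_phrase_prefix query are my assumptions, not the configuration used in the project.

```python
import json


def suggestion_mapping():
    """Index definition: search on name/postcode, keep the rest unindexed."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                "postcode": {"type": "keyword"},
                "importance": {"type": "float"},
                # Kept in the document but not searchable; no copy_to fields are used.
                "place_id": {"type": "keyword", "index": False},
                "class": {"type": "keyword", "index": False},
                "type": {"type": "keyword", "index": False},
            }
        }
    }


def scored_query(text):
    """Prefix-style name query whose score is scaled by importance (0-1)."""
    return {
        "query": {
            "function_score": {
                "query": {"match_phrase_prefix": {"name": text}},
                "field_value_factor": {"field": "importance", "missing": 0},
                "boost_mode": "multiply",
            }
        }
    }


mapping_json = json.dumps(suggestion_mapping())
```

The mapping would be sent when the index is created, and scored_query("bang") would be the body of a search request against it.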

The following table is a brief description of the results of the comparison of Solr and Elasticsearch for our requirements. This table also contains brief information about how different parts will be implemented.

…

Community Bonding

Posted by krahulreddy on 14 June 2020 in English.

May 4 to May 31 was the community bonding period for my GSoC. I used this time to learn more about the codebase and to dig deeper into the specifics of the project requirements. I was provided a server (sponsored by OpenCage) to set up the Nominatim planet database. It was my first time working with such massive data. My mentors, Sarah Hoffmann and Marc Tobias, helped me set up the server.

Installation and server setup:

The following is a brief summary of the steps I followed.

Created a nominatim user without login permission:

sudo useradd -d /srv/nominatim -s /bin/false -m nominatim

Postgres configuration changes:

 shared_buffers = 2GB
 maintenance_work_mem = 10GB
 autovacuum_work_mem = 2GB
 work_mem = 150MB
 effective_cache_size = 24GB
 synchronous_commit = off
 max_wal_size = 1GB
 checkpoint_timeout = 10min
 checkpoint_completion_target = 0.9
 effective_io_concurrency = 500
 random_page_cost = 1.5

 # turned off for the initial import only; both were re-enabled afterwards
 fsync = off
 full_page_writes = off

sudo systemctl restart postgresql

The data import stopped during indexing (I created the issue https://github.com/osm-search/Nominatim/issues/1785 for this).
The indexing was resumed using:

./utils/setup.php --index --create-search-indices --create-country-names
I then ran into the issue https://github.com/osm-search/Nominatim/issues/1476.

I had to use SELECT pg_terminate_backend( ) FROM pg_stat_activity; 7-8 times.
After a successful setup of the planet DB, the next step was to import Wikipedia and Wikidata. This was done by following the steps described at https://nominatim.org/release-docs/latest/admin/Import-and-Update/#wikipediawikidata-rankings
The photon project was also set up on the server for testing and playing around with the code.

With the completion of server setup, I am all set to go ahead with the next phases of the project.

Open issues:

…

Location: Somasundarapalya, Haralur, Bangalore South, Bengaluru Urban, Karnataka, 560102, India

OSM journey begins!

Posted by krahulreddy on 5 May 2020 in English.

Who am I?

I am Rahul, a Computer Science undergraduate student from the National Institute of Technology, Karnataka, India. I am interested in writing code and solving problems.

My preferred type of vacation is a trek. My college is on the western coast of India. We have a splendid view of the beaches, which is why I go cycling once a month 😅 (sometimes more than that).

Interesting fact: Most of my projects in 2019 included music in some form or other. In 2020, my ventures have been towards maps.

How did I come across OSM?

In February, I was at a 36-hour hackathon with a team of 3. Our idea involved maps. It was our first time working with Maps, and we decided to use Google Maps API and started with the app.

We implemented most of the functionality without facing significant hurdles. When we decided to improve the app and add a few more features, billing struck us hard. We did not have enough time to set up APIs that required billing. I was searching for alternatives. That’s when I first came across OSM. It was too late at that point to switch, so we ended up not using it.

GSoC’20 Project: Add search suggestions on openstreetmap.org.

The complete project proposal can be found here.

OSM’s main search engine Nominatim cannot directly support search suggestions for partially written words. Thus, we aim to set up a database to handle suggestions. Moreover, this database needs to meet the following functional requirements:

It should be derivable from the Nominatim DB.
It should handle various input languages.
It should be regularly updatable.

The following are a few non-functional requirements:

Accurate suggestions.
The database should be as small as possible.
The service should not be overwhelmed by the requests.

…