OpenStreetMap

Proofs of Concept

Posted by krahulreddy in English on 30 June 2020.

The code for these POCs can be found here. It will keep changing as we test and tweak the various options available. This phase is important to establish that the project will work as expected and that no component is missing or misbehaving.

As part of this phase, the following components were designed and tested:

Getting input:

  • The psycopg2 Python library is used to connect to PostgreSQL and fetch data.
  • DictCursors are used to fetch rows in dictionary format. (This is necessary to ensure that the hstore columns used to store the name and address fields are fetched correctly.)

    Important note: these cursors are not thread-safe, so this must be kept in mind if multithreading is used later on.
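
A minimal sketch of the input step, assuming Nominatim's placex table and an illustrative column list (not the project's actual query). In psycopg2, register_hstore makes hstore columns come back as Python dicts, and DictCursor lets rows be read by column name. A named (server-side) cursor streams rows in batches instead of loading a whole extract into memory. psycopg2 allows a connection to be shared between threads but not a cursor, so a multithreaded version would give each thread its own cursor or connection. The names fetch_places, iter_batches and "placex_export" are hypothetical.

```python
from typing import Any, Iterator, List


def iter_batches(cursor: Any, batch_size: int = 1000) -> Iterator[List[Any]]:
    """Yield rows from any DB-API cursor in lists of at most batch_size rows."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def fetch_places(dsn: str, batch_size: int = 1000) -> Iterator[List[Any]]:
    """Stream placex rows as DictRow batches (column list is an assumption)."""
    # Imported here so iter_batches can be used and tested without a database.
    import psycopg2
    import psycopg2.extras

    conn = psycopg2.connect(dsn)
    try:
        # Return hstore columns (name, address) as Python dicts on this connection.
        psycopg2.extras.register_hstore(conn)
        # A named cursor is server-side: rows are streamed, not loaded all at once.
        # Cursors are not thread-safe, so each worker thread needs its own.
        with conn.cursor(name="placex_export",
                         cursor_factory=psycopg2.extras.DictCursor) as cur:
            cur.execute(
                "SELECT place_id, osm_type, osm_id, name, address, rank_address "
                "FROM placex"
            )
            yield from iter_batches(cur, batch_size)
    finally:
        conn.close()
```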

Formatting:

  • Created a Doc class with the necessary fields (discussed in the last article).
  • Formed addresses using the place_addressline table. These will eventually be indexed in Elasticsearch along with the other necessary fields.
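
The address step can be sketched as follows. This assumes each place_addressline row has been joined with its parent place's name and carries an address rank (higher means more specific) and an isaddress flag; the AddressPart and Doc shapes, the function build_address, and the rank values in the usage example are illustrative assumptions, not the project's actual classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AddressPart:
    """One place_addressline row joined with the parent's name (assumed shape)."""
    name: str
    rank_address: int  # higher rank = more specific (street > city > country)
    isaddress: bool    # whether this parent belongs in the displayed address


@dataclass
class Doc:
    """Hypothetical document; the project's real field list may differ."""
    place_id: int
    name: str
    address: str = ""


def build_address(place_name: str, parts: List[AddressPart]) -> str:
    """Join the place name with its address parents, most specific first."""
    chosen = sorted(
        (p for p in parts if p.isaddress and p.name),
        key=lambda p: p.rank_address,
        reverse=True,
    )
    names = [place_name]
    for p in chosen:
        # Skip repeated names, e.g. Berlin appearing as both city and state.
        if p.name not in names:
            names.append(p.name)
    return ", ".join(names)


# Illustrative values only.
parts = [
    AddressPart("Germany", 4, True),
    AddressPart("Berlin", 8, True),
    AddressPart("Berlin", 16, True),
    AddressPart("Tiergarten", 20, False),  # not part of the address, filtered out
    AddressPart("Mitte", 20, True),
    AddressPart("Pariser Platz", 26, True),
]
doc = Doc(place_id=1, name="Brandenburger Tor",
          address=build_address("Brandenburger Tor", parts))
print(doc.address)  # Brandenburger Tor, Pariser Platz, Mitte, Berlin, Germany
```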

Indexing:

  • Setting up and running an Elasticsearch server.
  • Creating and deleting an index.
  • Inserting documents into the index, both one at a time in a loop and with bulk indexing.
  • Testing indexing with varying numbers of records.
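
A sketch of bulk indexing against Elasticsearch's REST API using only the standard library, assuming a local server at a hypothetical ES_URL and documents keyed by place_id. (An index is created with PUT /<index> and deleted with DELETE /<index>.) The _bulk endpoint takes newline-delimited JSON: an action line followed by the document for each item, with a trailing newline. The official Python client wraps the same idea in elasticsearch.helpers.bulk with a chunk_size parameter. Sending fixed-size chunks keeps each request bounded, which is one common thing to try when throughput drops on larger inputs; it is not established here that it explains the slowdown observed below.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, Iterator, List

ES_URL = "http://localhost:9200"  # hypothetical local Elasticsearch


def chunked(docs: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Group documents into lists of at most `size` items."""
    batch: List[Dict[str, Any]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(index: str, docs: List[Dict[str, Any]], id_field: str = "place_id") -> bytes:
    """Encode documents as the newline-delimited JSON the _bulk API expects."""
    lines = []
    for doc in docs:
        # Action line: index (create or replace) the document under its id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc[id_field])}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def index_all(index: str, docs: Iterable[Dict[str, Any]], chunk_size: int = 500) -> int:
    """Send documents in fixed-size bulk requests; return the number of failed items."""
    failed = 0
    for batch in chunked(docs, chunk_size):
        req = urllib.request.Request(
            f"{ES_URL}/_bulk",
            data=bulk_body(index, batch),
            headers={"Content-Type": "application/x-ndjson"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            result = json.load(resp)
        if result.get("errors"):
            failed += sum(1 for item in result["items"] if "error" in item["index"])
    return failed
```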

Hug API:

  • Creating a hug API layer to add an extra layer of security by not exposing the Elasticsearch endpoints directly.
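
The security idea behind the API layer can be sketched like this: Elasticsearch listens only where the API can reach it, and the API accepts just a query string and a result limit, validates them, and builds the one kind of request it is willing to send. In hug, a function like search would be exposed with the @hug.get() decorator; hug is left out so the sketch runs on its own. ES_URL, INDEX, the "address" field and the limits are assumptions.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Hypothetical settings: Elasticsearch is bound to localhost, so clients
# can only reach it through this API layer.
ES_URL = "http://localhost:9200"
INDEX = "nominatim"
MAX_LIMIT = 20
MAX_QUERY_LEN = 255


def build_search_request(q: str, limit: int = 10) -> urllib.request.Request:
    """Validate user input and build the only request this API sends to Elasticsearch."""
    q = q.strip()
    if not q or len(q) > MAX_QUERY_LEN:
        raise ValueError(f"query must be 1 to {MAX_QUERY_LEN} characters")
    # Clamp the page size so clients cannot request huge result sets.
    limit = max(1, min(int(limit), MAX_LIMIT))
    body = {
        "size": limit,
        "query": {"match_phrase_prefix": {"address": {"query": q}}},
    }
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(q: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Run the search and return only stored documents, not raw Elasticsearch metadata."""
    with urllib.request.urlopen(build_search_request(q, limit), timeout=5) as resp:
        hits = json.load(resp)["hits"]["hits"]
    return [hit["_source"] for hit in hits]
```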

Searching:

  • Setting up the front end on Nominatim (available here).
  • Fetching results from the hug API endpoint into Nominatim.
  • Displaying the results as an option list.
  • Letting the user select a result.
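
The post does not show the front-end code, so the following is only a Python illustration of the two client-side steps: building the request to the API and turning its results into (label, value) pairs for an option list. The /search route, the q and limit parameters, and the result fields (place_id, address, name) are assumptions carried over from the sketches above.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode


def build_suggest_url(api_base: str, text: str, limit: int = 10) -> str:
    """URL for the (hypothetical) /search route of the hug API."""
    return f"{api_base.rstrip('/')}/search?{urlencode({'q': text, 'limit': limit})}"


def to_options(results: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Turn API results into (label, value) pairs for an option list.

    The label is the text shown to the user; the value is the place_id
    that is returned when the user selects that option.
    """
    options: List[Tuple[str, int]] = []
    for r in results:
        # Prefer the formatted address; fall back to the name.
        label = r.get("address") or r.get("name", "")
        if label:
            options.append((label, int(r["place_id"])))
    return options
```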

Outcomes/Observations:

  1. All the components work as expected.
  2. Indexing speed varied from system to system. On the server, we can index more than 1,500 documents per second. This is workable for now, but further changes are planned, and the goal is to reach 2,000 documents per second.
  3. Bulk indexing is exceptionally fast for smaller extracts, but the rate drops as more data is indexed. This needs more work.
  4. In our indexing and frontend trials, the match_phrase_prefix query alone gave good results when querying Elasticsearch.
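
To make the behaviour of match_phrase_prefix concrete: Elasticsearch analyzes the query text, requires all terms but the last to appear as a phrase in order, and treats the last term as a prefix (expanded to at most max_expansions terms, 50 by default). That is what makes it a good fit for search-as-you-type. The toy model below only illustrates these semantics in plain Python; it uses a crude lowercase word tokenizer instead of a real analyzer, assumes slop 0, and ignores max_expansions and scoring.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Crude stand-in for an analyzer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def phrase_prefix_match(query: str, text: str) -> bool:
    """Toy model of match_phrase_prefix (slop 0, no max_expansions limit)."""
    q = tokenize(query)
    t = tokenize(text)
    if not q:
        return False
    *phrase, prefix = q
    n = len(phrase)
    for i in range(len(t) - n):
        # All but the last query term must match consecutive tokens exactly,
        # and the last query term only needs to be a prefix of the next token.
        if t[i:i + n] == phrase and t[i + n].startswith(prefix):
            return True
    return False
```

For example, "brandenburger t" and "pariser pl" both match "Brandenburger Tor, Pariser Platz", while "tor brand" does not, because the terms are out of order.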

Next steps:

With the POCs completed, we now move on to the actual planning and design of the project. Most of the parts are already available; planning will help structure the project and connect all the dots.


Discussion

Comment from spiregrain on 1 July 2020 at 11:18

What does it do?
