Sarah Hoffmann
d75a235c1f
use address tokens in SQL
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9e92759ac7
extract address tokens in tokenizer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
ffc2d82b0e
move postcode normalization into tokenizer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d8ed1bfc60
move houseunumber handling to tokenizer
...
Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache to the hundred most used house numbers
to keep the numbers of calls to the database low.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d711f5a81e
move name token creation into tokenizer
...
Name tokens are now handed in via token_info and used from there.
Also moves the generic search name insertion function back to
placex_triggers.sql.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fa2bc60468
introduce name analyzer
...
The name analyzer is the actual work horse of the tokenizer. It
is instantiated on a thread-base and provides all functions for
analysing names and queries.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
e1c5673ac3
require tokeinzer for indexer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
1b1ed820c3
introduce index for finding surrounding buildings
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
a73711f3cd
add extra column for tokenizer
...
Add a jsonb column to the placex and location_property_osmline tables
which can be used by the installed tokenizer as required. No other
part of the software will use or otherwise rely on this column.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9397bf54b8
introduce external processing in indexer
...
Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.
The second processing step doesn't do anything useful yet.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fbbdd31399
move word table and normalisation SQL into tokenizer
...
Creating and populating the word table is now the responsibility
of the tokenizer.
The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
b5540dc35c
add migration for configurable tokenizer
...
Adds a migration that initialises a legacy tokenizer for
an existing database. The migration is not active yet as
it will need completion when more functionality is added
to the legacy tokenizer.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
296a66558f
move module installation to legacy tokenizer
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
af968d4903
introduce tokenizer modules
...
This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.
The legacy tokenizer implements Nominatim's traditional algorithms.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
5c7b9ef909
Merge pull request #2303 from lonvia/remove-aux-support
...
Remove support for AUX housenumber tables
2021-04-30 11:19:35 +02:00
Sarah Hoffmann
185d369404
remove support for AUX housenumber tables
...
These tables have never been actively maintained and the code is
completely untested. With the upcomming changes, it is unlikely
that the code remains usable.
This removes the aux tables and all code that references them.
2021-04-30 10:08:29 +02:00
Sarah Hoffmann
51d20b19b6
Merge pull request #2299 from lonvia/update-actions
...
Fix database check for reverse-only
2021-04-27 12:18:45 +02:00
Sarah Hoffmann
46e8c6b112
Merge pull request #2291 from AntoJvlt/special-phrases-statistics
...
Special phrases statistics
2021-04-27 11:57:05 +02:00
Sarah Hoffmann
c8fb25201a
do not check for extra housenumber index for reverse-only
...
Also adds a database check for reverse only import to the CI.
2021-04-27 10:14:26 +02:00
Sarah Hoffmann
1fd483643b
add tests for different scripts
2021-04-26 23:01:06 +02:00
Sarah Hoffmann
a21a0864f1
Merge pull request #2298 from lonvia/add-warming-to-ci
...
Add warming to CI import tests and fix more Python 3.5 compatibility issues
2021-04-26 11:21:44 +02:00
Sarah Hoffmann
4457bf7528
avoid Path in subprocess parameters
...
Not supported by Python 3.5.
2021-04-26 10:55:23 +02:00
Sarah Hoffmann
5ed6f18d83
add warming to CI import test
2021-04-26 09:54:09 +02:00
AntoJvlt
abb3d56b20
Switching to log info and only send warning for invalid phrases
2021-04-25 17:57:43 +02:00
AntoJvlt
c5ecb9bae0
Implemented statistics for the import of special phrases through the SpecialPhrasesImporterStatistics class
2021-04-25 17:57:43 +02:00
AntoJvlt
1b68152fb2
reorganization of folder/file for the special phrases importer
2021-04-25 17:57:42 +02:00
Sarah Hoffmann
6812f397af
Merge pull request #2297 from lonvia/update-deployment-docs
...
docs: update deployment to use project directory
2021-04-24 15:35:00 +02:00
Sarah Hoffmann
68bd9c6091
Merge pull request #2296 from lonvia/disable-too-few-public-methods-check
...
pylint: disable too-few-public-methods check
2021-04-24 15:03:28 +02:00
Sarah Hoffmann
754f9e3a20
docs: update deployment to use project directory
...
Fixes #2295 .
2021-04-24 15:00:46 +02:00
Sarah Hoffmann
b951b11336
fix pylint complaints
2021-04-24 11:59:32 +02:00
Sarah Hoffmann
89c90bedb9
pylint: disable check too-few-public-methods
2021-04-24 11:39:44 +02:00
Sarah Hoffmann
b4fe7d7c7d
Merge pull request #2293 from darkshredder/update-manpage
...
Updated manual page
2021-04-24 09:20:28 +02:00
Sarah Hoffmann
5071710db7
Merge pull request #2294 from lonvia/update-actions
...
CI: add import test against Python 3.5 and fix discovered issues
2021-04-23 23:33:15 +02:00
Sarah Hoffmann
9faaf3fc88
actions: add import on ubuntu 18.04
...
This uses oldest possible dependencies where possible.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
9c51c133f7
indexes with includes are not available for postgresql < 11
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
91d2fb6b1c
use group() for regex matches
...
Needed for compatibility with Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
280406c0d7
use pathlib version of open
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
d5fc3b5e99
subprocess needs string argument
...
Compatibility change for Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
f8f8c7e534
check for existance of custom .env before opening
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
3a642d50a4
use more generic ImportError to check for module
...
ModuleNotFoundError was only introduced in Python 3.6.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
9685c68e30
replace usages of fromisoformat() with strptime()
...
fromisoformat was only introduced with Python 3.7 while we
still support Python 3.5.
Fixes #2292 .
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
95e6ec091b
remove argparse dependency for vagrant scripts
...
Users don't need to recreate the manpage.
2021-04-23 22:50:08 +02:00
Darkshredder
34f5e4a199
Updated manual page
2021-04-24 01:42:38 +05:30
Sarah Hoffmann
788baafa26
bdd tests: fix place dependen ranking tests
...
The ranks of places may differ for some countries. Force the
place nodes in the test on null island which always uses the
default ranking.
2021-04-22 17:31:00 +02:00
Sarah Hoffmann
4c31813398
Merge pull request #2288 from RhinoDevel/patch-1
...
Replace "nominatim-update" with "nominatim".
2021-04-22 17:12:25 +02:00
RhinoDevel
b7bae80616
Replace "nominatim-update" with "nominatim".
...
If I am not mistaken, the correct command to index imported data via commandline is "nominatim index".
2021-04-22 15:40:22 +02:00
Sarah Hoffmann
f7e4aa51d3
indexer: reset query counter
...
Reset the counter for queries after the asynchronous connections
have been reopened.
2021-04-21 10:33:45 +02:00
Sarah Hoffmann
696c50459f
Merge pull request #2285 from lonvia/split-indexer-code
...
Rework indexer code
2021-04-20 15:34:14 +02:00
Sarah Hoffmann
50b6d7298c
factor out async connection handling into separate class
...
Also adds a test for reconnecting regularly while indexing.
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
26a81654a8
indexer: make self.conn function-local
...
Also switches to our internal connect function which gives us
a cursor with a sclar() function.
2021-04-20 14:08:37 +02:00