AntoJvlt
8b8dfc46eb
Added --no-replace option for special phrases import and added corresponding tests
2021-05-17 13:25:06 +02:00
AntoJvlt
06aab389ed
Code cleaning and SPLoader deleted
2021-05-16 16:59:12 +02:00
Sarah Hoffmann
925726222f
Merge pull request #2323 from darkshredder/disable-search-reverse-only
...
Feat: Disabled search API for --reverse-only imports
2021-05-14 10:40:22 +02:00
Sarah Hoffmann
7d621389ee
adapt tests to new TIGER CSV format
2021-05-14 00:02:50 +02:00
Sarah Hoffmann
35efe3b41c
use tokenizer during Tiger data import
...
This also changes the required import format to CSV.
2021-05-14 00:02:50 +02:00
Darkshredder
e5ffc59cd5
feat: Added reverse-only-search validation
2021-05-14 02:36:21 +05:30
Sarah Hoffmann
5feece64c1
use WorkerPool for Tiger data import
...
Requires adding an option to ignore SQL errors.
2021-05-13 20:36:50 +02:00
Sarah Hoffmann
b9a09129fa
move WorkerPool into db module
...
The pool is independent of the indexer and may also be used
by other parts of the software.
2021-05-13 17:11:17 +02:00
Sarah Hoffmann
fc860787dd
do not preload postcodes
...
This is too expensive for updates.
2021-05-13 16:14:12 +02:00
Sarah Hoffmann
63e35574d4
Merge pull request #2324 from lonvia/generic-external-postcodes
...
Rework postcode handling and generalised external postcode support
2021-05-13 14:52:19 +02:00
Sarah Hoffmann
db2dbf15f7
fix token_info migration
...
A bad indent meant that only one table received the new column.
2021-05-13 14:31:41 +02:00
Sarah Hoffmann
f5977dac75
ignore invalid coordinates in external postcodes
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
8f2746fe24
ignore entries without country code
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
1ccd4360b4
correctly handle removing all postcodes for country
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
bf864b2c54
index postcodes after refreshing
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
4abaf71234
add and extend tests for new postcode handling
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
a4aba23a83
move filling of postcode table to python
...
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.
External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
2021-05-13 14:15:42 +02:00
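A minimal sketch of reading such an external postcode file, plain or gzipped. It assumes hypothetical column names `postcode`, `lat` and `lon`. Rows with missing or out-of-range coordinates are skipped, in the spirit of the related commit that ignores invalid coordinates. The function name is illustrative, not Nominatim's actual code.

```python
import csv
import gzip
import math
from pathlib import Path


def read_external_postcodes(path):
    """Yield (postcode, lat, lon) tuples from a CSV file, gzipped or plain.

    Assumes hypothetical columns named 'postcode', 'lat' and 'lon'.
    """
    path = Path(path)
    # Open gzipped files through gzip in text mode, everything else directly.
    if path.suffix == '.gz':
        fd = gzip.open(str(path), 'rt', encoding='utf-8', newline='')
    else:
        fd = path.open('r', encoding='utf-8', newline='')
    with fd:
        for row in csv.DictReader(fd):
            try:
                lat = float(row['lat'])
                lon = float(row['lon'])
            except (KeyError, TypeError, ValueError):
                continue  # missing or non-numeric coordinates
            # Skip NaN/inf and coordinates outside the WGS84 range.
            if not (math.isfinite(lat) and math.isfinite(lon)
                    and -90 <= lat <= 90 and -180 <= lon <= 180):
                continue
            yield row['postcode'], lat, lon
```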
AntoJvlt
9d83da830f
Introduction of SPCsvLoader to load special phrases from a CSV file
2021-05-10 23:26:39 +02:00
AntoJvlt
00959fac57
Refactor loading of external special phrases and the import process by introducing SPLoader and SPWikiLoader
2021-05-10 21:49:31 +02:00
Sarah Hoffmann
872ab91421
fix name of transliterator
...
Should be different from the normalisation rules.
2021-05-05 17:09:38 +02:00
Sarah Hoffmann
a263e54b94
enable BDD tests for different tokenizers
...
The tokenizer to be used can be chosen with -DTOKENIZER.
Adapt all tests, so that they work with legacy_icu tokenizer.
Move lookup in word table to a function in the tokenizer.
Special phrases are temporarily imported from the wiki until
we have an implementation that can import from file. TIGER
tests do not work yet.
2021-05-05 10:31:51 +02:00
Sarah Hoffmann
18c99a5c5f
add unit tests for legacy ICU tokenizer
2021-05-05 10:15:27 +02:00
Sarah Hoffmann
d55fc39275
cache transliteration results
2021-05-05 10:15:27 +02:00
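One common way to cache transliteration results in Python is `functools.lru_cache`. This is a sketch of the idea only. `transliterate` is a hypothetical `str -> str` function (in practice it might be backed by an ICU transliterator), and the cache size is an arbitrary assumption.

```python
from functools import lru_cache


def make_cached_transliterator(transliterate, maxsize=10000):
    """Wrap a (slow) transliteration function with an LRU cache.

    transliterate is a hypothetical str -> str function; repeated inputs
    are answered from the cache without calling it again.
    """
    return lru_cache(maxsize=maxsize)(transliterate)
```

Since names repeat heavily in OSM data (street names, common place names), even a bounded cache avoids most repeated calls.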
Sarah Hoffmann
ba8ed7967d
add PHP part for new ICU-based tokenizer
2021-05-05 10:15:27 +02:00
Sarah Hoffmann
f44af49df9
add Python part for new ICU-based tokenizer
2021-05-05 10:15:27 +02:00
Sarah Hoffmann
36c624ec71
commit between migrations
...
Later migrations may require tables set up by older ones.
2021-05-01 10:47:35 +02:00
Sarah Hoffmann
7fd871a74d
increase database version for tokenizer migration
2021-05-01 10:47:35 +02:00
Sarah Hoffmann
ced8f0f4a2
fix linting issues
2021-04-30 17:59:50 +02:00
Sarah Hoffmann
388ebcbae2
move index creation for word table to tokenizer
...
This introduces a finalization routine for the tokenizer
where it can post-process the import if necessary.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
20891abe1c
indexer: fetch extra place data asynchronously
...
The indexer now fetches any extra data besides the place_id
asynchronously while processing the places from the last batch.
This also means that more places are now fetched at once.
2021-04-30 17:41:08 +02:00
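The overlap of fetching and processing described in this commit can be sketched with a single background worker. While one batch is processed, the extra data for the next batch is already being loaded. `fetch_batch` and `process_batch` are hypothetical callbacks. The sketch uses a thread, whereas the indexer itself works with asynchronous database connections.

```python
from concurrent.futures import ThreadPoolExecutor


def index_in_batches(place_ids, fetch_batch, process_batch, batch_size=100):
    """Process place_ids batch by batch, prefetching the next batch's data.

    fetch_batch(ids) -> data and process_batch(data) are hypothetical callbacks.
    """
    batches = [place_ids[i:i + batch_size]
               for i in range(0, len(place_ids), batch_size)]
    if not batches:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_batch, batches[0])
        for next_ids in batches[1:]:
            data = pending.result()
            # Start loading the next batch before working on the current one.
            pending = pool.submit(fetch_batch, next_ids)
            process_batch(data)
        process_batch(pending.result())
```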
Sarah Hoffmann
6ce6f62b8e
fetch place info asynchronously
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
602728895e
indexer: fetch ids in batches
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
fc995ea6b9
move database check for module to tokenizer
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
3eb4d88057
boilerplate for PHP code of tokenizer
...
This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.
2021-04-30 11:31:52 +02:00
Sarah Hoffmann
23fd1d032a
tests for legacy tokenizer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
7cb7cf848d
move amenity creation to tokenizer
...
The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
bef300305e
move default country name creation to tokenizer
...
The new function is also used when a country is updated. All SQL
functions related to country names have been removed.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
dc700c25b6
cache all postcodes
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
0ba93e5ba9
reorganise address iteration in tokenizer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9e92759ac7
extract address tokens in tokenizer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
ffc2d82b0e
move postcode normalization into tokenizer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d8ed1bfc60
move housenumber handling to tokenizer
...
Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache of the hundred most used house numbers
to keep the number of calls to the database low.
2021-04-30 11:30:51 +02:00
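A minimal sketch of such a cache, assuming the most frequent house numbers are loaded once into a dict and everything else is resolved through a hypothetical `lookup` callable standing in for the database call. Only the preloaded numbers are cached, which keeps memory bounded while still sparing the database the bulk of the calls.

```python
class HousenumberCache:
    """Map normalized house numbers to token ids (hypothetical sketch).

    The most frequent numbers are preloaded; anything else falls back to
    the lookup function, which stands in for a database query.
    """

    def __init__(self, preloaded, lookup):
        self._cache = dict(preloaded)
        self._lookup = lookup

    def get_token(self, housenumber):
        token = self._cache.get(housenumber)
        if token is None:
            # Not one of the frequent numbers: ask the database.
            token = self._lookup(housenumber)
        return token
```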
Sarah Hoffmann
d711f5a81e
move name token creation into tokenizer
...
Name tokens are now handed in via token_info and used from there.
Also moves the generic search name insertion function back to
placex_triggers.sql.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fa2bc60468
introduce name analyzer
...
The name analyzer is the actual workhorse of the tokenizer. It
is instantiated per thread and provides all functions for
analysing names and queries.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
e1c5673ac3
require tokenizer for indexer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
a73711f3cd
add extra column for tokenizer
...
Add a jsonb column to the placex and location_property_osmline tables
which can be used by the installed tokenizer as required. No other
part of the software will use or otherwise rely on this column.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9397bf54b8
introduce external processing in indexer
...
Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.
The second processing step doesn't do anything useful yet.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fbbdd31399
move word table and normalisation SQL into tokenizer
...
Creating and populating the word table is now the responsibility
of the tokenizer.
The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
b5540dc35c
add migration for configurable tokenizer
...
Adds a migration that initialises a legacy tokenizer for
an existing database. The migration is not active yet as
it will need completion when more functionality is added
to the legacy tokenizer.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
296a66558f
move module installation to legacy tokenizer
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
af968d4903
introduce tokenizer modules
...
This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.
The legacy tokenizer implements Nominatim's traditional algorithms.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
185d369404
remove support for AUX housenumber tables
...
These tables have never been actively maintained and the code is
completely untested. With the upcoming changes, it is unlikely
that the code remains usable.
This removes the aux tables and all code that references them.
2021-04-30 10:08:29 +02:00
Sarah Hoffmann
51d20b19b6
Merge pull request #2299 from lonvia/update-actions
...
Fix database check for reverse-only
2021-04-27 12:18:45 +02:00
Sarah Hoffmann
46e8c6b112
Merge pull request #2291 from AntoJvlt/special-phrases-statistics
...
Special phrases statistics
2021-04-27 11:57:05 +02:00
Sarah Hoffmann
c8fb25201a
do not check for extra housenumber index for reverse-only
...
Also adds a database check for reverse only import to the CI.
2021-04-27 10:14:26 +02:00
Sarah Hoffmann
4457bf7528
avoid Path in subprocess parameters
...
Not supported by Python 3.5.
2021-04-26 10:55:23 +02:00
AntoJvlt
abb3d56b20
Switching to log info and only send warning for invalid phrases
2021-04-25 17:57:43 +02:00
AntoJvlt
c5ecb9bae0
Implemented statistics for the import of special phrases through the SpecialPhrasesImporterStatistics class
2021-04-25 17:57:43 +02:00
AntoJvlt
1b68152fb2
reorganization of folder/file for the special phrases importer
2021-04-25 17:57:42 +02:00
Sarah Hoffmann
b951b11336
fix pylint complaints
2021-04-24 11:59:32 +02:00
Sarah Hoffmann
89c90bedb9
pylint: disable check too-few-public-methods
2021-04-24 11:39:44 +02:00
Sarah Hoffmann
9c51c133f7
indexes with includes are not available for postgresql < 11
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
91d2fb6b1c
use group() for regex matches
...
Needed for compatibility with Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
280406c0d7
use pathlib version of open
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
d5fc3b5e99
subprocess needs string argument
...
Compatibility change for Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
f8f8c7e534
check for existence of custom .env before opening
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
3a642d50a4
use more generic ImportError to check for module
...
ModuleNotFoundError was only introduced in Python 3.6.
2021-04-23 22:50:08 +02:00
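The pattern looks roughly like this. `optional_import` is a hypothetical helper. Catching `ImportError` also covers `ModuleNotFoundError` on newer Pythons, because the latter is a subclass of the former.

```python
import importlib


def optional_import(name):
    """Return the named module, or None if it cannot be imported.

    Catches ImportError rather than ModuleNotFoundError, which only exists
    from Python 3.6 on and is a subclass of ImportError.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

A caller can then, for example, check `optional_import('osmium') is None` and disable only the replication functions.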
Sarah Hoffmann
9685c68e30
replace usages of fromisoformat() with strptime()
...
fromisoformat was only introduced with Python 3.7 while we
still support Python 3.5.
Fixes #2292.
2021-04-23 22:50:08 +02:00
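A sketch of the replacement, assuming timestamps in the fixed form `YYYY-MM-DDTHH:MM:SSZ` (UTC). Unlike `fromisoformat()`, `strptime()` needs an explicit format string, so other ISO 8601 variants would need their own formats. The helper name is hypothetical.

```python
import datetime as dt


def parse_iso_utc(text):
    """Parse a timestamp such as '2021-04-23T20:50:08Z' without fromisoformat().

    Assumes the fixed format YYYY-MM-DDTHH:MM:SSZ and attaches UTC explicitly.
    """
    return dt.datetime.strptime(text, '%Y-%m-%dT%H:%M:%SZ').replace(
        tzinfo=dt.timezone.utc)
```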
RhinoDevel
b7bae80616
Replace "nominatim-update" with "nominatim".
...
If I am not mistaken, the correct command to index imported data via the command line is "nominatim index".
2021-04-22 15:40:22 +02:00
Sarah Hoffmann
f7e4aa51d3
indexer: reset query counter
...
Reset the counter for queries after the asynchronous connections
have been reopened.
2021-04-21 10:33:45 +02:00
Sarah Hoffmann
50b6d7298c
factor out async connection handling into separate class
...
Also adds a test for reconnecting regularly while indexing.
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
26a81654a8
indexer: make self.conn function-local
...
Also switches to our internal connect function which gives us
a cursor with a scalar() function.
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
6430371d7d
make index() function private
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
18705b3f18
move analyse function into indexing function
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
c6bd2bb7fb
indexer: move runner into separate file
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
79d55357e8
simplify sql and website creation functions
2021-04-19 10:53:30 +02:00
Sarah Hoffmann
4fa6c0ad53
simplify constructor for SQL preprocessor
...
Use sql path from config.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
8f63f9516b
simplify interface for adding tiger data
...
Also simplifies tests using existing fixtures.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
995ba2c7c2
add library directories to config
...
Allows reducing the number of parameters in functions that take
the config anyway.
2021-04-19 10:26:25 +02:00
AntoJvlt
b2ae715699
Only log a warning if a wrong input is detected on the wiki while importing special phrases
2021-04-17 20:19:39 +02:00
AntoJvlt
a95c748363
Fix occurrence regex
2021-04-17 19:24:13 +02:00
Sarah Hoffmann
d74ae669e3
add support index when continuing import at index phase
...
Indexing scans the placex table sequentially during indexing
on the initial import. That is okay because we know that all
rows need to be processed anyway. When continuing the import,
however, a large part might already be indexed, so that the
process spends a lot of time going through rows that are no
longer of interest. Create a supporting index for all unindexed
rows to speed up the scan. This is the same index as used later
for updates.
2021-04-17 11:07:04 +02:00
Sarah Hoffmann
da98a2102a
remove transition functions from Python
2021-04-16 18:41:14 +02:00
Sarah Hoffmann
886a01c796
port function to compute initial postcodes to Python
2021-04-16 16:11:20 +02:00
Sarah Hoffmann
76b1885595
use absolute imports in Python code
...
Relative imports are no longer officially recommended.
2021-04-16 14:20:09 +02:00
Sarah Hoffmann
c64193f839
Merge pull request #2263 from AntoJvlt/special-phrases-autoupdate
...
Implemented auto update of special phrases while importing them
2021-04-15 10:13:25 +02:00
Sarah Hoffmann
e90adfc7c3
adapt database check to new index layout
2021-04-14 17:52:59 +02:00
Sarah Hoffmann
16267dc021
add migration for new placenode geometry index
2021-04-14 17:52:59 +02:00
Darkshredder
49ee7505ed
Fix: Removed error if endstatement is wrong and improved tests
2021-04-13 15:44:12 +05:30
AntoJvlt
ae2b2cb9a5
Tests added for the auto update of special phrases during import
2021-04-12 14:35:29 +02:00
AntoJvlt
8c2f287ce4
Implemented auto update of special phrases while importing them
2021-04-12 14:30:48 +02:00
AntoJvlt
5ecae10713
Fix default languages loading
2021-04-11 22:26:31 +02:00
Sarah Hoffmann
71564fa1de
split LANGUAGES parameter before use
...
The user supplies the languages as a comma-separated list.
2021-04-09 17:48:28 +02:00
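Splitting such a setting could look like the following sketch. The helper name is hypothetical. Surrounding whitespace is stripped and empty entries are dropped, so an unset value yields an empty list.

```python
def split_languages(value):
    """Turn a comma-separated setting like 'de, en,fr' into ['de', 'en', 'fr'].

    Empty entries (e.g. from a trailing comma or an unset value) are dropped.
    """
    return [lang.strip() for lang in value.split(',') if lang.strip()]
```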
Sarah Hoffmann
492186716f
prepare 3.7.0 release
2021-04-06 21:23:29 +02:00
Sarah Hoffmann
96b0699621
add migration for transliterated housenumbers
2021-04-04 15:26:47 +02:00
Sarah Hoffmann
8d8b1d4307
use non-key index to speed up housenumber search
...
On Postgresql versions 11+ add an index to speed up the lookup
of housenumbers for terms found in search_name. This is really
just a band-aid around the query planner's interpretation of the
query.
2021-04-01 17:10:44 +02:00
Darkshredder
b7d6ae93e3
Nominatim/cli.py rebase fixes
2021-03-29 14:16:41 +05:30
Darkshredder
21b1b75b08
Rebase with master
2021-03-29 14:00:45 +05:30
Darkshredder
51e2654cd2
Added Manual page and fixed documentation
2021-03-29 13:57:13 +05:30
Sarah Hoffmann
09b2510219
Merge pull request #2228 from AntoJvlt/import-special-phrases-porting-python
...
Import special phrases porting python
2021-03-29 09:49:35 +02:00
AntoJvlt
57ce75eb67
Change command 'import-special-phrases --from-wiki' to 'special-phrases --import-from-wiki'.
2021-03-26 02:22:38 +01:00
AntoJvlt
cde9389e75
Error fixes, Cleaning code, Improvement and addition of tests
2021-03-26 01:53:33 +01:00
AntoJvlt
2c19bd5ea3
Encapsulation of tools/special_phrases.py into SpecialPhrasesImporter class and add new tests.
2021-03-25 21:13:57 +01:00
AntoJvlt
ff34198569
Code cleaning, tests simplification and use of python3-icu package
2021-03-23 23:56:39 +01:00
AntoJvlt
1ce8b530cd
Introduction of PyICU for transliteration in python. Reversed changes in normalization.sql.
2021-03-23 23:34:16 +01:00
AntoJvlt
6d56cbb3e8
Changed phrase_settings.py to phrase-settings.json and added migration function for old php settings file.
2021-03-23 23:30:39 +01:00
Sarah Hoffmann
4f1bdde32e
Merge pull request #2231 from mtmail/correct-cli-help-page
...
nominatim -h was printing wrong text for lookup and details
2021-03-21 16:52:20 +01:00
Sarah Hoffmann
a08ca5b1b5
avoid division by zero in progress meter
...
On Windows systems the timer may not be accurate enough to measure
the time between init() and done(). Avoid computing statistics with
a diff time of 0 in such cases.
Fixes #2230.
2021-03-21 16:47:22 +01:00
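The guard amounts to refusing to compute a rate when no measurable time has passed. `ProgressMeter` here is a hypothetical class, not the indexer's actual progress logger. The injectable `clock` parameter is only there to make the zero-interval case easy to reproduce.

```python
import time


class ProgressMeter:
    """Hypothetical progress meter reporting places per second."""

    def __init__(self, total, clock=time.monotonic):
        self.total = total
        self.done_places = 0
        self._clock = clock
        self._start = clock()

    def add(self, num=1):
        self.done_places += num

    def rate(self):
        """Return places/second, or None if no measurable time has passed."""
        diff = self._clock() - self._start
        if diff <= 0:
            # Coarse timers (e.g. on Windows) can report zero elapsed time.
            return None
        return self.done_places / diff
```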
marc tobias
87d5883ddb
nominatim -h was printing wrong text for lookup and details
2021-03-21 16:06:41 +01:00
AntoJvlt
17cb59efbd
Ported functions for the import of special phrases from php to python.
...
- the command is now --import-special-phrases
- the output is not an sql file anymore, data are directly imported to the database.
- the small part of the documentation concerning this (section data import) has been updated.
2021-03-20 19:11:50 +01:00
Sarah Hoffmann
81a6b746b8
Merge pull request #2212 from darkshredder/country-name
...
Ported createCountryNames() to python and Added tests
2021-03-15 09:36:06 +01:00
Sarah Hoffmann
7212fa8630
fix template variable name
2021-03-13 12:05:53 +01:00
Darkshredder
b108bd1c1e
Linting fix
2021-03-12 18:28:47 +05:30
Darkshredder
077a8c1f95
refactored tests and made changes to code for easier readability
2021-03-12 18:23:20 +05:30
Darkshredder
7a874d5b97
Ported createCountryNames() to python and added tests
2021-03-12 10:28:41 +05:30
Sarah Hoffmann
9086a794a1
Merge pull request #2204 from darkshredder/tiger-data
...
Ported tiger-data-import to Python and Added Tarball Support
2021-03-11 22:48:38 +01:00
Darkshredder
e5719de657
Added fixture for sql_preprocessor and fixed some issues
2021-03-11 15:39:17 +05:30
Darkshredder
ccfad57fca
Added test and removed runlegacyscript
2021-03-10 17:18:12 +05:30
Darkshredder
64128b699a
fixed linting, refactored threaded sql handling and removed importTigerData() function
2021-03-10 13:28:29 +05:30
Darkshredder
4080fbb95c
Test fixes
2021-03-09 01:00:56 +05:30
Darkshredder
14ec83c886
Linting fixes
2021-03-08 23:10:49 +05:30
Darkshredder
122c4618b9
Linting fixes
2021-03-08 22:59:51 +05:30
Darkshredder
2af82975cd
Ported tiger-data-import to python and Added Tarball Support
2021-03-08 21:57:56 +05:30
Sarah Hoffmann
764a41b973
automatic migration from 3.6 release
...
Adds a 'admin --migrate' command that checks for the current
database version and runs any necessary migrations. Also
has migrations going back to 3.6.
2021-03-06 16:36:57 +01:00
Sarah Hoffmann
9d103503f7
Merge pull request #2197 from lonvia/use-jinja-for-sql-preprocessing
...
Use jinja2 for SQL preprocessing
2021-03-04 16:36:18 +01:00
Sarah Hoffmann
09f4d767e4
port index creation to python
...
Also switches to jinja-based preprocessing, which allows simplifying
the SQL files. Use 'if not exists' where possible
so that the step can be rerun to fix missing indexes.
2021-03-04 11:11:47 +01:00
Sarah Hoffmann
dd301cf5ac
indexer: ANALYSE must be run outside transactions
2021-03-04 11:06:33 +01:00
Sarah Hoffmann
eacabb0e96
move table creation to jinja-based preprocessing
2021-03-03 22:07:51 +01:00
Sarah Hoffmann
d2bd6aa78d
introduce jinja2 for preprocessing SQL
...
Replaces various hand-crafted replacements of varying format with
a single Jinja2 templating mechanism. Allows full access to
configuration if necessary.
2021-03-03 17:51:08 +01:00
Sarah Hoffmann
3a0a4b9175
save software version in the database
...
The version represents the software version that was used to
import the data.
2021-03-01 20:35:15 +01:00
Sarah Hoffmann
4faefe156c
report software version of status call
2021-03-01 16:47:19 +01:00
Sarah Hoffmann
86273f5e2a
introduce database patch level for version
...
This will be needed later for automatic migrations.
2021-03-01 16:46:19 +01:00
Sarah Hoffmann
b4f64aa770
make sure that calls to PHP legacy scripts are fatal on error
2021-03-01 16:10:45 +01:00
Sarah Hoffmann
b46adbad22
make sure psql always finishes
...
If an exception is raised by other means, we still have to close
the stdin pipe to psql to make sure that it exits and releases its
connection to the database.
2021-02-27 10:24:40 +01:00
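The essential point is a `finally` clause that closes the pipe. In this sketch psql is replaced by a generic command so it can run anywhere, and `feed_process` is a hypothetical name. Closing stdin delivers EOF, so the child exits even when producing the input fails halfway through.

```python
import subprocess


def feed_process(cmd, chunks):
    """Write text chunks to a process's stdin and wait for it to exit.

    The finally clause closes stdin even when producing a chunk raises,
    so a reader such as psql sees EOF, exits and releases its connection.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        for chunk in chunks:
            proc.stdin.write(chunk.encode('utf-8'))
    finally:
        proc.stdin.close()
        returncode = proc.wait()
    return returncode
```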
Sarah Hoffmann
d14a3df10f
do not truncate search_name in reverse-only mode
2021-02-27 09:46:42 +01:00
Sarah Hoffmann
c7f40e3cee
fix verbose flag for PHP wrapper scripts
...
The flag must come after the command.
2021-02-26 16:49:32 +01:00
Sarah Hoffmann
dd03aeb966
bdd: use python library where possible
...
Replace calls to PHP scripts with direct calls into the
nominatim Python library where possible. This speeds up
tests quite a bit.
2021-02-26 16:14:29 +01:00
Sarah Hoffmann
15b5906790
move setup function to python
...
There are still back-calls to PHP for some of the sub-steps.
These need some larger refactoring to be moved to Python.
2021-02-26 15:02:39 +01:00
Sarah Hoffmann
3ee8d9fa75
properly close connections of indexer after use
2021-02-26 12:10:54 +01:00
Sarah Hoffmann
57db5819ef
port load-data function to python
2021-02-25 21:32:40 +01:00
Sarah Hoffmann
3c186f8030
add a function for the initial indexing run
...
Also moves postcodes to fully parallel indexing.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
c7fd0a7af4
port wikipedia importance functions to python
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
32683f73c7
move import-data option to native python
...
This adds a new dependency on the Python psutil package.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
7222235579
introduce custom object for cmdline arguments
...
Allows defining special functions over the arguments.
Also splits CLI tests in two files as they have become too many.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
f6e894a53a
port database setup function to python
...
Hide the former PHP functions in a transition command until
they are removed.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
b93ec2522e
use psql for executing sql files
...
This allows to run larger files without needing to keep
them in memory.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
af7226393a
add function to set up libpq environment
...
Instead of parsing the DSN for each external libpq program we
are going to execute, provide a function that feeds them all
necessary parameters through the environment.
osm2pgsql is the first user.
2021-02-25 18:42:54 +01:00
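A minimal sketch of that idea. It maps a subset of libpq connection keywords to their documented environment variables and assumes a simple space-separated `key=value` DSN without quoting. The function name is hypothetical. Following the later commit in this log, the sketch avoids `os.environ` as a default argument value.

```python
import os

# A subset of libpq connection keywords and their environment variables.
_PG_ENV = {
    'host': 'PGHOST',
    'hostaddr': 'PGHOSTADDR',
    'port': 'PGPORT',
    'dbname': 'PGDATABASE',
    'user': 'PGUSER',
    'password': 'PGPASSWORD',
    'sslmode': 'PGSSLMODE',
    'connect_timeout': 'PGCONNECT_TIMEOUT',
}


def libpq_environ(dsn, base_env=None):
    """Return a copy of the environment with PG* variables set from dsn.

    Assumes a simple space-separated 'key=value' DSN without quoting.
    """
    env = dict(os.environ if base_env is None else base_env)
    for part in dsn.split():
        key, _, value = part.partition('=')
        if key in _PG_ENV:
            env[_PG_ENV[key]] = value
    return env
```

The result can be passed as `env=` to `subprocess.run()` when starting an external libpq program such as osm2pgsql.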
Sarah Hoffmann
e520613362
convert connect() into a context manager
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
a1f0fc1a10
improve deadlock detection for various versions of psycopg2
...
Psycopg2 has changed the kind of exception that is emitted on
deadlocks between versions 2.7 and 2.8. The code was already
trying to catch both kinds of errors but because the
psycopg2.errors package is unknown in 2.7 and below, the
code would throw an exception on anything but a deadlock error.
This commit wraps the deadlock handling into a context manager
to avoid code duplication and uses module imports to detect if
the new error codes are available.
Also sets the required psycopg2 version to 2.7 or later, as
older versions are difficult to test.
2021-02-25 18:11:16 +01:00
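The context-manager shape of the deadlock handling can be sketched as follows. Note that this sketch swaps in a different detection technique from the commit's import-based one. It checks the PostgreSQL SQLSTATE `40P01` (deadlock_detected) carried in the `pgcode` attribute of psycopg2 errors, which avoids depending on which exception classes a given psycopg2 version defines. `deadlock_handler` and `on_deadlock` are hypothetical names.

```python
from contextlib import contextmanager

DEADLOCK_SQLSTATE = '40P01'  # PostgreSQL error code deadlock_detected


@contextmanager
def deadlock_handler(on_deadlock):
    """Call on_deadlock() instead of raising when the body hits a deadlock.

    Checks the SQLSTATE that psycopg2 errors expose as 'pgcode' instead of
    importing version-specific exception classes; other errors propagate.
    """
    try:
        yield
    except Exception as exc:  # pylint: disable=broad-except
        if getattr(exc, 'pgcode', None) == DEADLOCK_SQLSTATE:
            on_deadlock()
        else:
            raise
```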
Sarah Hoffmann
971df231b0
avoid os.environ as default value
2021-02-19 19:29:57 +01:00
Sarah Hoffmann
4b32cbe518
fix return code for check database run with 'not applicable'
2021-02-19 18:32:00 +01:00
Sarah Hoffmann
f08078ccca
bdd tests: directly call python code for setup-website
2021-02-19 18:20:55 +01:00
Sarah Hoffmann
389138abfe
port setup-website to python
2021-02-19 17:51:06 +01:00
Sarah Hoffmann
a0ae4945cd
add unit tests for new check_database code
2021-02-18 20:36:11 +01:00
Sarah Hoffmann
b169e4c88c
port check-database function to python
...
This change also adapts the hints to use the nominatim tool.
Slightly changed checks, so that they are just as effective on
a frozen database.
2021-02-18 17:32:30 +01:00
Sarah Hoffmann
101a1f895d
port freeze function to python
2021-02-17 21:43:15 +01:00
Sarah Hoffmann
c9838a02ce
disable JIT and parallel execution for osm2pgsql updates again
...
The gazetteer output doesn't disable these functions when
writing to the place table but the triggers may contain
operations that cause misplanning for the query planner.
2021-02-16 18:23:47 +01:00
Sarah Hoffmann
fbe7be760b
ignore failure to get replication date
2021-02-14 12:17:30 +01:00
Sarah Hoffmann
7cc4c53adb
always return 0 for updates unless there is an error
...
This is more in line with previous behaviour than returning
a status code when no updates are available.
2021-02-11 10:33:49 +01:00
Sarah Hoffmann
de37dc9300
forgot to replace one occurrence of sql_dir
2021-02-09 19:32:05 +01:00
Sarah Hoffmann
8ffd7d9243
remove unused BINDIR constant
2021-02-09 19:30:31 +01:00
Sarah Hoffmann
298ed11261
introduce constant for configuration directory
...
This replaces {data_dir}/settings throughout the code, so that
the configuration may be placed somewhere else in the directory
structure (e.g. in /etc).
2021-02-09 18:45:45 +01:00
Sarah Hoffmann
b9517c99ae
rename sql directory to lib-sql
...
Also introduces a separate constant for the sql directory, so that
it can be put separately from the rest of the data if required.
2021-02-09 15:26:56 +01:00
Sarah Hoffmann
d81e152804
integrate analyse of indexing into nominatim tool
2021-02-08 22:22:49 +01:00
Sarah Hoffmann
0cbf98c020
consolidate warm and db-check into single admin command
2021-02-08 21:05:06 +01:00
Sarah Hoffmann
195f9f5ef3
split cli.py by subcommands
...
Reduces file size below 1000 lines.
2021-02-08 17:23:05 +01:00
Sarah Hoffmann
861e67dfe8
fix off-by-one error in replication download
2021-02-04 17:04:04 +01:00
Sarah Hoffmann
948217d5e9
reintroduce timeout for replication file download
...
This ports the --socket-timeout parameter from
pyosmium-get-changes which ensures that the update
process eventually times out on hanging network connections.
2021-02-04 11:47:11 +01:00
Sarah Hoffmann
0b2abfb115
replace make serve with nominatim serve command
...
With the website directory now tied to the project directory instead
of the build directory, it is no longer possible to use make for
running the web server.
2021-02-03 16:34:31 +01:00
Sarah Hoffmann
cb06d1f4ca
do not overwrite custom set module paths
...
Given that the module is now copied to the project directory
when no module path is set, we need the information that the
module path is empty. Therefore hand in the default module path
in a separate variable.
2021-02-02 18:31:25 +01:00
Sarah Hoffmann
36447c488a
print project directory before running any command
2021-02-02 11:19:31 +01:00
Sarah Hoffmann
5f63d4ca1f
print nice summary after updates
2021-02-01 10:34:31 +01:00
Sarah Hoffmann
90aaab77fc
fix linting issues
2021-01-30 16:42:25 +01:00
Sarah Hoffmann
7158433cd3
disable warning about non-toplevel import
...
They are needed here so nominatim can be run when osmium
is not installed. Everything except replication will work fine.
2021-01-30 16:29:28 +01:00
Sarah Hoffmann
e629a175ed
introduce custom UsageError
...
This is an exception to be thrown when the error occurs because
of bad user data. We don't want to print a full stack trace in
these cases but just tell the user what went wrong.
2021-01-30 16:20:10 +01:00
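A sketch of how such an exception can be handled at the top level. `UsageError` comes from the commit. `run_command` is a hypothetical wrapper. Usage errors are reported as a one-line message with a non-zero exit code, while any other exception still propagates with its full stack trace.

```python
import sys


class UsageError(Exception):
    """Raised when a command fails because of bad user input."""


def run_command(func, *args):
    """Run func(*args); report a UsageError briefly instead of with a traceback.

    Returns the function's exit code, or 1 after a usage error.
    """
    try:
        return func(*args)
    except UsageError as exc:
        print('ERROR: {}'.format(exc), file=sys.stderr)
        return 1
```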
Sarah Hoffmann
4cb6dc01f3
port replication update function to python
2021-01-30 15:50:34 +01:00
Sarah Hoffmann
8f0885f6cb
port check-for-update function to python
2021-01-28 14:50:14 +01:00
Sarah Hoffmann
d78f0ba804
port replication initialisation to Python
2021-01-26 22:50:54 +01:00
Sarah Hoffmann
5b46fcad8e
convert function creation to python
...
The new function always creates normal and partitioned functions.
Also adds specialised connection and cursor classes for adding
frequently used helper functions.
2021-01-26 22:50:54 +01:00
Sarah Hoffmann
94fa7162be
port address level computation to Python
...
Also adds simple tests for correct table creation.
2021-01-26 22:50:54 +01:00
Sarah Hoffmann
e6c2842b66
move update code for postcode and word count to Python
...
Adds also tests for the new function to execute a SQL script.
2021-01-26 22:50:54 +01:00
Sarah Hoffmann
e6d9485c4a
cli: import python modules for commands on demand
...
Given that only one command will be executed in the end, it is
not necessary to import what amounts to the whole library. This
becomes in particular important for update functions that have
a dependency on pyosmium. The dependency can remain optional for
people not using updates.
2021-01-26 22:50:54 +01:00
Sarah Hoffmann
42ec67f63c
add more tests for CLI parameter parser
2021-01-20 21:30:27 +01:00
Sarah Hoffmann
8c02786820
add tests for indexer
2021-01-20 21:30:27 +01:00
Sarah Hoffmann
c26f323bf5
add simple tests for CLI parsing
2021-01-20 21:30:27 +01:00
Sarah Hoffmann
041ae67fd9
optionally hand in command line arguments to CLI functions
...
Allows easier testing.
2021-01-20 21:30:27 +01:00
Sarah Hoffmann
52b76d1d01
add tests for Python exec_utils
2021-01-20 21:30:27 +01:00
marc tobias
f62c784102
correct parameter name in query CLI
2021-01-20 21:09:41 +01:00
Sarah Hoffmann
8cf54a1317
add API functions to nominatim tool
2021-01-19 19:38:46 +01:00
Sarah Hoffmann
77e287f669
rename nominatim.admin to nominatim.tools
2021-01-19 19:38:46 +01:00
Sarah Hoffmann
5d95a72758
probe for php_cgi in cmake to be used for querying
2021-01-19 19:38:46 +01:00
Sarah Hoffmann
504922ffbe
remove old nominatim.py in favour of 'nominatim index'
...
The PHP scripts need to know the position of the nominatim
tool in order to call it. This is handed in as environment
variable, so it can be set by the Python script.
2021-01-18 15:43:27 +01:00
Sarah Hoffmann
c77877a934
implementation of 'nominatim index'
2021-01-18 15:43:27 +01:00
Sarah Hoffmann
27977411e9
move indexing function into its own Python module
...
This makes it now a standard function of our new Python
library instead of a stand-alone program.
2021-01-18 15:43:27 +01:00
Sarah Hoffmann
b79c79fa73
add function to get a DSN for psycopg
...
Converts the PHP DSN syntax into psycopg syntax when necessary.
2021-01-18 15:43:27 +01:00
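The conversion can be sketched as follows, assuming the PHP (PDO) form `pgsql:key=value;key=value`. The function name is hypothetical, and values containing spaces or quotes are not handled.

```python
def php_dsn_to_libpq(dsn):
    """Convert a PHP PDO-style DSN ('pgsql:dbname=x;host=y') to libpq syntax.

    DSNs without the 'pgsql:' prefix are returned unchanged. Values with
    spaces or quotes are not handled in this sketch.
    """
    prefix = 'pgsql:'
    if dsn.startswith(prefix):
        # PDO separates parameters with ';', libpq with whitespace.
        return ' '.join(part for part in dsn[len(prefix):].split(';') if part)
    return dsn
```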
Sarah Hoffmann
7cf9d459d6
use check parameter of subprocess.run
...
...instead of checking on our own.
Also increase required version of Python to 3.5 because of
subprocess.run().
2021-01-15 10:43:04 +01:00
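With `check=True`, `subprocess.run()` raises `CalledProcessError` on a non-zero exit code, so no manual return-code check is needed. `run_checked` is a hypothetical helper illustrating this.

```python
import subprocess
import sys


def run_checked(args):
    """Run a command; raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(args, check=True).returncode
```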
Sarah Hoffmann
8e53f63036
fix errors reported by pylint
2021-01-15 08:57:00 +01:00
Sarah Hoffmann
eda0900c8e
fix typo
2021-01-14 20:30:27 +01:00
Sarah Hoffmann
2f73bb3643
bdd: directly call utility scripts in lib
...
This removes the dependency on php-symfony-dotenv for the tests.
2021-01-14 18:19:22 +01:00
Sarah Hoffmann
88c57b4dc8
smaller command execution fixes
2021-01-14 12:03:49 +01:00
Sarah Hoffmann
ba13cfd9ff
make sure that environment variables have highest priority
2021-01-14 11:12:45 +01:00
Sarah Hoffmann
1ff8751caa
linting of new python code
2021-01-14 10:19:21 +01:00
Sarah Hoffmann
98dbc84836
add wrapper calls for all nominatim tool functions
2021-01-14 09:37:47 +01:00
Sarah Hoffmann
04690ad8c4
implement warming in new cli tool
...
Adds infrastructure for calling the legacy PHP scripts. As the
CONST_* values cannot be set from the python script, hand the values
in via secret environment variables instead. These are all
temporary hacks for the transition phase to python code.
2021-01-13 18:25:15 +01:00
Sarah Hoffmann
d6bcb7c8b7
consolidate cli interface to single tool
2021-01-13 10:11:58 +01:00
Sarah Hoffmann
57f5e6d898
create skeleton for new CLI tools
2021-01-12 22:21:20 +01:00
Sarah Hoffmann
612fd50612
add skeleton for new Nominatim executables
2021-01-12 10:17:28 +01:00
Sarah Hoffmann
5016eace34
improve progress logging during indexing
...
Wait for 2 seconds before logging the first progress, so that we
have numbers that are a bit more reliable statistically speaking.
Also provides an actual implementation for the log_interval
parameter and fixes some small style issues.
2020-11-30 10:59:29 +01:00
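The logging behaviour described in 5016eace34 (a 2-second warm-up before the first report, then at most one report per `log_interval`) might look roughly like this. It is a sketch under assumptions: the class name, the injectable `clock` and the `reports` list are illustrative and not taken from the Nominatim indexer.

```python
import time


class ProgressLogger:
    """Sketch: stay silent for an initial warm-up period, then report
    progress at most once every log_interval seconds."""

    INITIAL_DELAY = 2.0  # seconds before the first report

    def __init__(self, total, log_interval=1.0, clock=time.monotonic):
        self.total = total
        self.log_interval = log_interval
        self.clock = clock  # injectable so the timing can be tested
        self.start = clock()
        self.done = 0
        self.next_report = self.start + self.INITIAL_DELAY
        self.reports = []  # (objects done, objects per second) per report

    def add(self, num=1):
        self.done += num
        now = self.clock()
        if now < self.next_report:
            return
        # now - start >= INITIAL_DELAY here, so the division is safe
        rate = self.done / (now - self.start)
        eta = (self.total - self.done) / rate if rate > 0 else float("inf")
        print(f"Done {self.done}/{self.total} @ {rate:.1f} per second, ETA {eta:.0f} s")
        self.reports.append((self.done, rate))
        self.next_report = now + self.log_interval
```

Delaying the first report means the rate is averaged over at least two seconds of work, so early estimates fluctuate less.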
Sarah Hoffmann
4ac29fb525
only index larger batches for rank 30
...
Fixes #2045.
2020-11-05 22:14:49 +01:00
Sarah Hoffmann
13dba94307
do not run rank 0 objects in parallel
...
Waterways are at address rank 0 and do linking. This might lead to
deadlocks.
2020-08-22 19:51:19 +02:00
Sarah Hoffmann
73c449b97b
switch indexing to address rank
...
A place needs all objects of lower address rank to be indexed in order
to make up its address. The search rank no longer ensures that, as it
can have a different ordering than the address rank.
This switches the indexing order to address ranks. Non-address
objects (with address rank 0) are indexed together with POIs.
2020-08-18 16:58:58 +02:00
Sarah Hoffmann
3816b86a9e
nominatim: also index boundaries by rank
...
We need to make sure that the entry in search_name from a lower rank
is indeed available.
2020-08-18 15:17:09 +02:00
Sarah Hoffmann
a4b30fc649
index admin boundaries before everything else
...
Avoids irregularities that might happen because the address
rank of a boundary is changed through linking.
2020-08-18 15:17:09 +02:00
Sarah Hoffmann
fc50eb8688
nominatim: move DBConnection class into its own file
2020-08-18 15:17:09 +02:00
Sarah Hoffmann
5be084e0f5
indexer: allow batch processing of places
...
Request and process multiple place_ids at once so that
Postgres can make better use of caching and fewer
transactions are running.
2020-08-03 10:32:39 +02:00
Sarah Hoffmann
2323923bec
indexer: move progress tracker into separate class
2020-08-03 10:32:39 +02:00
Sarah Hoffmann
0f54d42863
indexer: get rid of special handling of few places
...
Given that we no longer distribute geometry sectors to threads,
there is no point in this kind of special handling.
2020-08-03 10:32:39 +02:00
Sarah Hoffmann
cca366196d
Disable JIT and parallel workers when indexing
...
Locally disable JIT and parallel workers in the connections that
do indexing. The query planner tends to be overenthusiastic about
using JIT. But with our comparatively simple queries, the
overhead tends to be larger than the performance gain.
Fixes #1677.
2020-05-30 11:20:16 +02:00
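A sketch of what "locally disable JIT and parallel workers" can mean for an indexing connection. Assumptions: the function name is hypothetical, `cursor` is any DB-API cursor, and `server_version` uses the integer encoding of psycopg2's `connection.server_version` (e.g. 110005 for PostgreSQL 11.5). A plain `SET` only affects the current session, so other connections keep their configuration.

```python
def disable_jit_and_parallel_workers(cursor, server_version):
    """Switch off JIT compilation and parallel query for this session only.

    Hypothetical helper; a plain SET (without LOCAL) lasts until the
    session ends and does not touch other connections.
    """
    if server_version >= 110000:
        # The 'jit' setting only exists from PostgreSQL 11 on.
        cursor.execute("SET jit = off")
    # Setting this to 0 disables parallel query execution.
    cursor.execute("SET max_parallel_workers_per_gather = 0")
```

Guarding on the server version avoids an "unrecognized configuration parameter" error on PostgreSQL 10 and older.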
Sarah Hoffmann
431948d768
nominatim: always use deadlock-protected wait
...
Fixes #1785.
2020-05-15 18:49:27 +02:00
Sarah Hoffmann
5469d02d03
nominatim.py: fix wrong use of assert
...
Fixes #1762.
2020-04-19 17:59:49 +02:00
Sarah Hoffmann
d1eeaa59a6
nominatim.py: use async in connect() function
...
The _async parameter name is only supported since psycopg 2.7.
However, async is a keyword in Python >= 3.7, so using this
gives us a syntax error. Working around this by defining the
parameters in a dict and handing that into the connect function.
2020-02-11 22:16:17 +01:00
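The dict workaround from d1eeaa59a6 can be illustrated without a database. Since `async` is a reserved word from Python 3.7 on, `connect(dsn, async=True)` no longer parses, but a dictionary key is just a string and survives `**` unpacking. The helper names below are hypothetical; `fake_connect` merely stands in for `psycopg2.connect`, which accepts the parameter as a keyword argument.

```python
def connection_params(dsn):
    """Build keyword arguments for an asynchronous connection.

    Writing async=True literally is a SyntaxError on Python >= 3.7;
    as a dictionary key it is an ordinary string.
    """
    return {"dsn": dsn, "async": True}


def fake_connect(**kwargs):
    # Hypothetical stand-in for psycopg2.connect so the sketch runs without a DB.
    return kwargs


# Real usage would be: conn = psycopg2.connect(**connection_params(dsn))
```

The same call works unchanged on Python versions before and after 3.7, which is the point of the workaround.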
Sarah Hoffmann
882f496e0a
nominatim.py: also catch deadlocks on final wait
2020-02-11 22:16:17 +01:00
Sarah Hoffmann
8b8aa1b4e6
regularly close connection while indexing
...
Postgres sooner or later runs out of memory when the connection
is used for too long.
2020-02-11 22:16:17 +01:00
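One way to "regularly close the connection", as described in 8b8aa1b4e6, is a wrapper that reopens the connection after a fixed number of uses. This is only a sketch: the class name and the `max_uses` threshold are hypothetical, and `connect` is assumed to be any zero-argument factory, for example a lambda around `psycopg2.connect`.

```python
class RecyclingConnection:
    """Sketch: hand out a connection, but close and reopen it after
    max_uses requests so that memory held by a long-lived backend
    session is released."""

    def __init__(self, connect, max_uses=1000):
        self._connect = connect  # zero-argument connection factory
        self.max_uses = max_uses  # hypothetical tuning knob
        self.conn = connect()
        self.uses = 0

    def get(self):
        if self.uses >= self.max_uses:
            # Closing ends the server-side session and frees its memory.
            self.conn.close()
            self.conn = self._connect()
            self.uses = 0
        self.uses += 1
        return self.conn

    def close(self):
        self.conn.close()
```

Recycling only between units of work keeps open transactions from being cut off mid-way.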
Sarah Hoffmann
1801db523b
fix typo
2020-01-29 11:50:30 +01:00
Sarah Hoffmann
8f6fdfeb0b
forgot to index last rank
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
b4e6d72fde
replace nominatim C program
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
a338ebfce0
fix log levels
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
4144364a15
add time display for nominatim.py
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
11c0dd235b
clean up and document script
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
4a9502bf88
fix SQL and some other stuff
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
6c0d6d3178
Revert "switch to threading"
...
This reverts commit 8b1c2181be5aa5335c68d36a49cab9c4e2cd8bef.
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
0a26ca7104
switch to threading
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
2a15b2522f
use generator for thread choice
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
c11d1d78e9
add prepared statement
2020-01-24 22:06:30 +01:00
Sarah Hoffmann
7e51aa4cef
simple implementation
2020-01-24 22:06:30 +01:00
Eric Stadtherr
62747c934d
Work on setup/update scripts, unit tests, and documentation to enable Postgres server to be optionally configured on a remote host
2018-07-21 12:09:47 -06:00
Sarah Hoffmann
4ac1bf2d47
clean up byte order detection
...
Check for existence of the expected functions and macros
and error out if nothing appropriate can be found.
2018-03-16 23:09:40 +01:00
Sarah Hoffmann
8f23ba076b
replace non-standard uint type with unsigned
...
See #879.
2018-01-10 23:27:49 +01:00
Jonathan Montane
c54fc44b33
feat(export): added linked_place_id as an attribute to feature element
2017-12-18 10:34:05 +01:00
Sarah Hoffmann
9a47e1834f
reduce message frequency during indexing
2017-09-17 20:13:05 +02:00
Edward Betts
7e3af2debc
correct spelling mistakes
2017-03-08 15:06:50 +00:00
Melvyn Sopacua
13ab03a03a
Fix warnings:
...
- be consistent with (const char *) casts when assigning
- use xmlStrlen instead of strlen when dealing with xmlChar *
2017-02-15 10:17:43 +01:00
Melvyn Sopacua
6eb6f35f24
BSD compat: use sys/endian.h if available
...
<byteswap.h> is a linuxism. On BSD-like systems this is <sys/endian.h>
2017-02-13 14:30:48 +01:00
Sarah Hoffmann
ea5fe54c6b
force language of pgsql to C
...
Fixes #558.
2017-01-15 21:31:14 +01:00
markigail
f07d620ee8
Change load-data in setup.php.
2016-05-11 10:22:03 +02:00
markigail
190a72cab5
Fix bug in index.c and remove column admin_level from location_property_osmline.
2016-05-08 16:46:42 +02:00
markigail
1a4f369e2b
fix small bug in index.c
2016-05-03 12:52:08 +02:00
Markus Gail
db719d489f
index on geometry of interpolation lines, and more improvements.
2016-04-27 17:42:59 +02:00
Markus Gail
7879ad44cd
Remove interpolation lines from placex and save them in an extra table.
2016-04-25 09:44:01 +02:00
Sarah Hoffmann
932abeb0e2
add actual cmake file
2016-02-29 22:26:55 +01:00