Commit Graph

267 Commits

Author SHA1 Message Date
Tareq Al-Ahdal
b6ac4ad837 fix linting error 2022-03-18 21:05:47 +08:00
Tareq Al-Ahdal
af739d2f57 modify logic of _include_key function 2022-03-18 06:52:16 +08:00
Tareq Al-Ahdal
fa2aca1cbc adding prefix to keys is now more configurable 2022-03-18 06:20:00 +08:00
Tareq Al-Ahdal
d09670d208 modify logic to prepend 'name:' to keys' 2022-03-18 06:01:25 +08:00
Tareq Al-Ahdal
d32a7c1888 initialize an empty dictionary for nested name key 2022-03-18 02:50:33 +08:00
Tareq Al-Ahdal
6be2077d92
Merge branch 'master' into country-names-yaml-configuration 2022-03-18 02:36:12 +08:00
Tareq Al-Ahdal
456d439e97 Reformatting of country keys 2022-03-18 02:23:11 +08:00
Tareq Al-Ahdal
b4bd4ff67d fix linting error 2022-03-15 19:14:04 +08:00
Sandor Nagy
7e3701b64a Fix typo in log message on replication initialisation 2022-03-15 07:50:47 +01:00
Tareq Al-Ahdal
165d17f7f7 reintroduce 'name:' prefix to country name keys 2022-03-13 18:58:27 +08:00
Tareq Al-Ahdal
377cf36be3 modify data import logic to load country names from yaml 2022-03-12 15:20:57 +08:00
Sarah Hoffmann
5425394654 add migration to add new derived_names column 2022-02-24 20:50:33 +01:00
Sarah Hoffmann
98432395c3 add migration for upcoming change to tiger tables 2022-01-27 11:48:27 +01:00
Sarah Hoffmann
83d2c440d5 add migration for new interpolation table layout 2022-01-27 11:14:55 +01:00
Sarah Hoffmann
e6d855b954 add migration for new lookup index 2022-01-27 11:14:55 +01:00
Sarah Hoffmann
c3788d765e add consistent SPDX copyright headers 2022-01-03 16:23:58 +01:00
Sarah Hoffmann
a52ed366e4 add tests for migration 2021-12-01 20:27:40 +01:00
Sarah Hoffmann
810056349f add migration for inclusive housenumber Tiger index 2021-11-24 12:03:20 +01:00
Sarah Hoffmann
c4f5c11a4e be case-insensitve about special phrase operator 2021-10-25 19:51:20 +02:00
Sarah Hoffmann
5a1c3dbea3 fix parsing of operator in special phrases
Because of unstripped input, the operators wouldn't match.
2021-10-25 19:46:30 +02:00
Sarah Hoffmann
13e7398566 allow relative paths for log files 2021-10-25 10:26:05 +02:00
Sarah Hoffmann
0ae8d7ac08 have ADDRESS_LEVEL_CONFIG use load_sub_configuration
This means that relative paths now are looked up in the
project directory.
2021-10-22 16:36:52 +02:00
Sarah Hoffmann
c77df2d1eb replace NOMINATIM_PHRASE_CONFIG with command line option 2021-10-22 14:41:14 +02:00
Sarah Hoffmann
e8e2502e2f make word recount a tokenizer-specific function 2021-10-19 11:21:16 +02:00
Sarah Hoffmann
97a10ec218 apply variants by languages
Adds a tagger for names by language so that the analyzer of that
language is used. Thus variants are now only applied to names
in the specific language and only tag name tags, no longer to
reference-like tags.
2021-10-06 11:09:54 +02:00
Sarah Hoffmann
16daa57e47 unify ICUNameProcessorRules and ICURuleLoader
There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.
2021-10-01 12:27:24 +02:00
Sarah Hoffmann
231250f2eb add wrapper class for place data passed to tokenizer
This is mostly for convenience and documentation purposes.
2021-09-29 11:54:07 +02:00
Sarah Hoffmann
8e1d4818ac use yaml config loader for country info 2021-09-04 00:22:55 +02:00
Sarah Hoffmann
7e7dd769fd remove language and partition from name import 2021-09-02 14:41:11 +02:00
Sarah Hoffmann
79da96b369 read partition and languages from config file 2021-09-02 14:41:11 +02:00
Sarah Hoffmann
78fcabade8 move country name generation to country_info module 2021-09-02 14:41:11 +02:00
Sarah Hoffmann
284645f505 move generation of country tables in own module 2021-09-02 14:41:11 +02:00
Sarah Hoffmann
87dedde5d6 allow multiple files for the import command
The files are forwarded to osm2pgsql which is now able to merge
them correctly.
2021-08-14 21:42:21 +02:00
Sarah Hoffmann
e42349c963 replace add-data function with native Python code 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
14f777da18 use psycopg's SQL quoting where possible
Use the SQL formatting supplied with psycopg whenever the
query needs to be put together from snippets.
2021-07-12 22:05:22 +02:00
Sarah Hoffmann
6f6681ce67 add helper function for execute_values
Make psycopg2's convenience function accessible through
the cursor.
2021-07-12 21:08:20 +02:00
Sarah Hoffmann
06602b4ec0 provide wrapper function for DROP TABLE
Use psycopg2 formatting to ensure correct quoting.
2021-07-12 20:32:46 +02:00
Sarah Hoffmann
cf98cff2a1 more formatting fixes
Found by flake8.
2021-07-12 17:45:42 +02:00
Sarah Hoffmann
fff0012249 simplify website setup code
Use formaat strings and move variable quoting code into extra
function.
2021-07-12 11:41:05 +02:00
Sarah Hoffmann
d5a1883b62 avoid repeated patterns for table name 2021-07-12 11:33:09 +02:00
Sarah Hoffmann
a08ef43e40 simplify if statements 2021-07-12 11:28:47 +02:00
Sarah Hoffmann
3661f7a321 avoid multiple returns of same value
Found by Sonarqube.
2021-07-11 18:23:42 +02:00
Sarah Hoffmann
a2edbbf78a cannot use capture_output in subprocess.run
Only available since Python 3.7.
2021-07-06 22:57:42 +02:00
Sarah Hoffmann
8413075249 move abbreviation computation into import phase
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.

New dependency for python library datrie.
2021-07-04 10:28:20 +02:00
AntoJvlt
3676310efe Improved performance of the postcodes query and some code cleaning 2021-06-12 15:46:08 +02:00
AntoJvlt
1c175e3a67 Clean and update tests for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
47fb7cd3a8 Use place_exists() into can_compute() for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
a4733eed90 Use place instead of placex to compute postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
799a4c9ab6 Documentation update and small code fixes 2021-05-18 22:35:21 +02:00
AntoJvlt
3206bf59df Resolve conflicts 2021-05-17 13:52:35 +02:00
AntoJvlt
8b8dfc46eb Added --no-replace command for special phrases importation and added corresponding tests 2021-05-17 13:25:06 +02:00
AntoJvlt
06aab389ed Code cleaning and SPLoader deleted 2021-05-16 16:59:12 +02:00
Sarah Hoffmann
925726222f
Merge pull request #2323 from darkshredder/disable-search-reverse-only
Feat: Disabled search API for --reverse-only imports
2021-05-14 10:40:22 +02:00
Sarah Hoffmann
7d621389ee adapt tests to new TIGER CSV format 2021-05-14 00:02:50 +02:00
Sarah Hoffmann
35efe3b41c use tokenizer during Tiger data import
This also changes the required import format to CSV.
2021-05-14 00:02:50 +02:00
Darkshredder
e5ffc59cd5 feat: Added reverse-only-search validation 2021-05-14 02:36:21 +05:30
Sarah Hoffmann
5feece64c1 use WorkerPool for Tiger data import
Requires adding an option that SQL errors are ignored.
2021-05-13 20:36:50 +02:00
Sarah Hoffmann
63e35574d4
Merge pull request #2324 from lonvia/generic-external-postcodes
Rework postcode handling and generalised external postcode support
2021-05-13 14:52:19 +02:00
Sarah Hoffmann
db2dbf15f7 fix token_info migration
A bad indent meant that only one table received the new column.
2021-05-13 14:31:41 +02:00
Sarah Hoffmann
f5977dac75 ignore invalid coordinates in external postcodes 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
8f2746fe24 ignore entries without country code 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
1ccd4360b4 correctly handle removing all postcodes for country 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
bf864b2c54 index postcodes after refreshing 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
4abaf71234 add and extend tests for new postcode handling 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
a4aba23a83 move filling of postcode table to python
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.

External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
2021-05-13 14:15:42 +02:00
AntoJvlt
9d83da830f Introduction of SPCsvLoader to load special phrases from a csv file 2021-05-10 23:26:39 +02:00
AntoJvlt
00959fac57 Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader 2021-05-10 21:49:31 +02:00
Sarah Hoffmann
36c624ec71 commit between migrations
Later migrations may require tables set up by older ones.
2021-05-01 10:47:35 +02:00
Sarah Hoffmann
fc995ea6b9 move database check for module to tokenizer 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
3eb4d88057 boilerplate for PHP code of tokenizer
This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.
2021-04-30 11:31:52 +02:00
Sarah Hoffmann
7cb7cf848d move amenity creation to tokenizer
The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
bef300305e move default country name creation to tokenizer
The new function is also used, when a country us updated. All SQL
function related to country names have been removed.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
ffc2d82b0e move postcode normalization into tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d8ed1bfc60 move houseunumber handling to tokenizer
Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache to the hundred most used house numbers
to keep the numbers of calls to the database low.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
a73711f3cd add extra column for tokenizer
Add a jsonb column to the placex and location_property_osmline tables
which can be used by the installed tokenizer as required. No other
part of the software will use or otherwise rely on this column.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fbbdd31399 move word table and normalisation SQL into tokenizer
Creating and populating the word table is now the responsibility
of the tokenizer.

The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
b5540dc35c add migration for configurable tokenizer
Adds a migration that initialises a legacy tokenizer for
an existing database. The migration is not active yet as
it will need completion when more functionality is added
to the legacy tokenizer.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
296a66558f move module installation to legacy tokenizer 2021-04-30 11:29:57 +02:00
Sarah Hoffmann
185d369404 remove support for AUX housenumber tables
These tables have never been actively maintained and the code is
completely untested. With the upcomming changes, it is unlikely
that the code remains usable.

This removes the aux tables and all code that references them.
2021-04-30 10:08:29 +02:00
Sarah Hoffmann
51d20b19b6
Merge pull request #2299 from lonvia/update-actions
Fix database check for reverse-only
2021-04-27 12:18:45 +02:00
Sarah Hoffmann
46e8c6b112
Merge pull request #2291 from AntoJvlt/special-phrases-statistics
Special phrases statistics
2021-04-27 11:57:05 +02:00
Sarah Hoffmann
c8fb25201a do not check for extra housenumber index for reverse-only
Also adds a database check for reverse only import to the CI.
2021-04-27 10:14:26 +02:00
Sarah Hoffmann
4457bf7528 avoid Path in subprocess parameters
Not supported by Python 3.5.
2021-04-26 10:55:23 +02:00
AntoJvlt
abb3d56b20 Switching to log info and only send warning for invalid phrases 2021-04-25 17:57:43 +02:00
AntoJvlt
c5ecb9bae0 Implemented statistics for the import of special phrases through the SpecialPhrasesImporterStatistics class 2021-04-25 17:57:43 +02:00
AntoJvlt
1b68152fb2 reorganization of folder/file for the special phrases importer 2021-04-25 17:57:42 +02:00
Sarah Hoffmann
b951b11336 fix pylint complaints 2021-04-24 11:59:32 +02:00
Sarah Hoffmann
89c90bedb9 pylint: disable check too-few-public-methods 2021-04-24 11:39:44 +02:00
Sarah Hoffmann
9c51c133f7 indexes with includes are not available for postgresql < 11 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
280406c0d7 use pathlib version of open 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
d5fc3b5e99 subprocess needs string argument
Compatibility change for Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
3a642d50a4 use more generic ImportError to check for module
ModuleNotFoundError was only introduced in Python 3.6.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
79d55357e8 simplify sql and website creation functions 2021-04-19 10:53:30 +02:00
Sarah Hoffmann
4fa6c0ad53 simplify constructor for SQL preprocessor
Use sql path from config.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
8f63f9516b simplify interface for adding tiger data
Also simplifies tests using existing fixtures.
2021-04-19 10:26:25 +02:00
AntoJvlt
b2ae715699 Only log a warning if a wrong input is detected on the wiki while importing special phrases 2021-04-17 20:19:39 +02:00
AntoJvlt
a95c748363 Fix occurence regex 2021-04-17 19:24:13 +02:00
Sarah Hoffmann
886a01c796 port function to compute initial postcodes to Python 2021-04-16 16:11:20 +02:00
Sarah Hoffmann
76b1885595 use absolute imports in Python code
Relative imports are no longer officially recommended.
2021-04-16 14:20:09 +02:00
Sarah Hoffmann
c64193f839
Merge pull request #2263 from AntoJvlt/special-phrases-autoupdate
Implemented auto update of special phrases while importing them
2021-04-15 10:13:25 +02:00
Sarah Hoffmann
e90adfc7c3 adapt database check to new index layout 2021-04-14 17:52:59 +02:00
Sarah Hoffmann
16267dc021 add migration for new placenode geometry index 2021-04-14 17:52:59 +02:00
Darkshredder
49ee7505ed Fix: Removed error if endstatement is wrong and improved tests 2021-04-13 15:44:12 +05:30
AntoJvlt
ae2b2cb9a5 Tests added for the auto update of special phrases during import 2021-04-12 14:35:29 +02:00
AntoJvlt
8c2f287ce4 Implemented auto update of special phrases while importing them 2021-04-12 14:30:48 +02:00
AntoJvlt
5ecae10713 Fix default languages loading 2021-04-11 22:26:31 +02:00
Sarah Hoffmann
71564fa1de split LANGUAGES parameter before use
The user supplies the languages as a comma-separated list.
2021-04-09 17:48:28 +02:00
Sarah Hoffmann
96b0699621 add migration for transliterated housenumbers 2021-04-04 15:26:47 +02:00
AntoJvlt
cde9389e75 Errors fixes, Cleaning code, Improvement and addition of tests 2021-03-26 01:53:33 +01:00
AntoJvlt
2c19bd5ea3 Encapsulation of tools/special_phrases.py into SpecialPhrasesImporter class and add new tests. 2021-03-25 21:13:57 +01:00
AntoJvlt
ff34198569 Code cleaning, tests simplification and use of python3-icu package 2021-03-23 23:56:39 +01:00
AntoJvlt
1ce8b530cd Introduction of PyICU for transliteration in python. Reversed changes in normalization.sql. 2021-03-23 23:34:16 +01:00
AntoJvlt
6d56cbb3e8 Changed phrase_settings.py to phrase-settings.json and added migration function for old php settings file. 2021-03-23 23:30:39 +01:00
AntoJvlt
17cb59efbd Ported functions for the import of special phrases from php to python.
- the command is now --import-special-phrases
- the output is not an sql file anymore, data are directly imported to the database.
- the little part on the documentation (section data import) has been modified.
2021-03-20 19:11:50 +01:00
Sarah Hoffmann
81a6b746b8
Merge pull request #2212 from darkshredder/country-name
Ported createCountryNames() to python and Added tests
2021-03-15 09:36:06 +01:00
Sarah Hoffmann
7212fa8630 fix template variable name 2021-03-13 12:05:53 +01:00
Darkshredder
b108bd1c1e Linting fix 2021-03-12 18:28:47 +05:30
Darkshredder
077a8c1f95 refactored tests and made changes to code for easy readibility 2021-03-12 18:23:20 +05:30
Darkshredder
7a874d5b97 Ported createCountryNames() to python and added tests 2021-03-12 10:28:41 +05:30
Sarah Hoffmann
9086a794a1
Merge pull request #2204 from darkshredder/tiger-data
Ported tiger-data-import to Python and Added Tarball Support
2021-03-11 22:48:38 +01:00
Darkshredder
e5719de657 Added fixture for sql_preprocessor and fixed some issues 2021-03-11 15:39:17 +05:30
Darkshredder
64128b699a fixed linting, refactored threaded sql handling and removed importTigerData() function 2021-03-10 13:28:29 +05:30
Darkshredder
14ec83c886 Linting fixes 2021-03-08 23:10:49 +05:30
Darkshredder
122c4618b9 Linting fixes 2021-03-08 22:59:51 +05:30
Darkshredder
2af82975cd Ported tiger-data-import to python and Added Tarball Support 2021-03-08 21:57:56 +05:30
Sarah Hoffmann
764a41b973 automatic migration from 3.6 release
Adds a 'admin --migrate' command that checks for the current
database version and runs any necessary migrations. Also
has migrations going back to 3.6.
2021-03-06 16:36:57 +01:00
Sarah Hoffmann
09f4d767e4 port index creation to python
Also switches to jinja-based preprocessing, which allows to
simplify the SQL files. Use 'if not exists' where possible
so that the step can be rerun to fix missing indexes.
2021-03-04 11:11:47 +01:00
Sarah Hoffmann
eacabb0e96 move table creation to jinja-based preprocessing 2021-03-03 22:07:51 +01:00
Sarah Hoffmann
d2bd6aa78d introduce jinja2 for preprocessing SQL
Replaces various hand-crafted replacements of varying format with
a single Jinja2 templating mechanism. Allows full access to
configuration if necessary.
2021-03-03 17:51:08 +01:00
Sarah Hoffmann
4faefe156c report software version of status call 2021-03-01 16:47:19 +01:00
Sarah Hoffmann
86273f5e2a introduce database patch level for version
This will be needed later for automatic migrations.
2021-03-01 16:46:19 +01:00
Sarah Hoffmann
d14a3df10f do not truncate search_name in reverse-only mode 2021-02-27 09:46:42 +01:00
Sarah Hoffmann
dd03aeb966 bdd: use python library where possible
Replace calls to PHP scripts with direct calls into the
nominatim Python library where possible. This speed up
tests quite a bit.
2021-02-26 16:14:29 +01:00
Sarah Hoffmann
15b5906790 move setup function to python
There are still back-calls to PHP for some of the sub-steps.
These needs some larger refactoring to be moved to Python.
2021-02-26 15:02:39 +01:00
Sarah Hoffmann
57db5819ef prot load-data function to python 2021-02-25 21:32:40 +01:00
Sarah Hoffmann
c7fd0a7af4 port wikipedia importance functions to python 2021-02-25 18:42:54 +01:00
Sarah Hoffmann
32683f73c7 move import-data option to native python
This adds a new dependecy to the Python psutil package.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
f6e894a53a port database setup function to python
Hide the former PHP functions in a transition command until
they are removed.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
b93ec2522e use psql for executing sql files
This allows to run larger files without needing to keep
them in memory.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
af7226393a add function to set up libpq environment
Instead of parsing the DSN for each external libpq program we
are going to execute, provide a function that feeds them all
necessary parameters through the environment.

osm2pgsql is the first user.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
e520613362 convert connect() into a context manager 2021-02-25 18:42:54 +01:00
Sarah Hoffmann
4b32cbe518 fix return code for check database run with 'not applicable' 2021-02-19 18:32:00 +01:00
Sarah Hoffmann
389138abfe port setup-website to python 2021-02-19 17:51:06 +01:00
Sarah Hoffmann
a0ae4945cd add unit tests for new check_database code 2021-02-18 20:36:11 +01:00
Sarah Hoffmann
b169e4c88c port check-database function to python
This change also adapts the hints to use the nominatim tool.
Slightly changed checks, so that they are just as effective on
a frozen database.
2021-02-18 17:32:30 +01:00
Sarah Hoffmann
101a1f895d port freeze function to python 2021-02-17 21:43:15 +01:00
Sarah Hoffmann
c9838a02ce disable JIT and parallel execution for osm2pgsql updates again
The gazetteer output doesn't disable these functions when
writing to the place table but the triggers may contain
operations that cause misplanning for the query planner.
2021-02-16 18:23:47 +01:00
Sarah Hoffmann
fbe7be760b ignore failure to get replication date 2021-02-14 12:17:30 +01:00
Sarah Hoffmann
8ffd7d9243 remove unused BINDIR constant 2021-02-09 19:30:31 +01:00
Sarah Hoffmann
298ed11261 introduce constant for configuration directory
This replaces {data_dir}/settings throughout the code, so that
the configuration may be placed somewhere else in the directory
structure (e.g. in /etc).
2021-02-09 18:45:45 +01:00