Sarah Hoffmann
6f51c1ba33
remove code that disables processing of forward dependencies
2022-12-11 19:35:58 +01:00
Sarah Hoffmann
2abe9e6fd9
use data paths from new nominatim.paths
2022-11-27 12:15:41 +01:00
Sarah Hoffmann
b6ff697ff0
add experimental option for enabling forward dependencies
2022-11-21 14:48:00 +01:00
Sarah Hoffmann
51ed55cc32
initial flex import scripts
...
Only implements the extratags style for the moment. Tests pass
for the same behaviour as the gazetteer output. Updates still need
to be done.
2022-11-10 09:37:38 +01:00
Sarah Hoffmann
6ddb39fda3
respect socket timeout also in other replication functions
2022-11-09 09:12:37 +01:00
Sarah Hoffmann
1fdcec985a
fix timeout use for replication timeout
...
The timeout parameter is no longer taken into account since
pyosmium switched to the requests library. This adds the parameter
back.
2022-11-09 09:12:37 +01:00
Sarah Hoffmann
abf349fb0d
simplify use of secondary importance
...
The values in the raster are already normalized between 0 and 2**16,
so a simple conversion to [0, 1] will do.
Check for existance of secondary_importance table statically when
creating the SQL function. For that to work importance tables need
to be created before the functions.
2022-10-01 11:01:49 +02:00
Sarah Hoffmann
3185fad918
load views as a SQL file and rename to 'secondary importance'
...
The only requirement for secondary importance is that a raster table
comes out of it. The generic name leaves open where the data comes
from.
2022-10-01 11:01:49 +02:00
Tareq Al-Ahdal
0ab0f0ea44
Integrated OSM views into importance computation
2022-10-01 11:01:49 +02:00
Tareq Al-Ahdal
ac467c7a2d
Enhanced the implementation of OSM views GeoTIFF import functionality
2022-10-01 11:01:49 +02:00
Tareq Al-Ahdal
c85b74497b
Initial implementation of GeoTIFF import functionality
2022-10-01 11:01:49 +02:00
Sarah Hoffmann
f4d3ae6f70
consolidate indexes over geometry_sectors
...
The index over geometry_sectors are mainly used for ordering
the places which need indexing. That means they function effectively
as a TODO list. Consolodate them so that they always only contain
the places which are still to do. Also add the appropriate index
for the boundary indexing phase.
2022-09-21 10:38:58 +02:00
Sarah Hoffmann
ed3dd81d04
run final index creation in parallel
2022-09-19 11:55:25 +02:00
Tareq Al-Ahdal
465d82a92f
Integrated 'collect_os_info.py' into Nominatim's CLI tool
2022-08-13 06:18:10 +08:00
Sarah Hoffmann
83054af46f
remove typing_extensions requirement
...
The typing_extensions package is only necessary now when running mypy.
It won't be used at runtime anymore.
2022-07-18 09:55:58 +02:00
Sarah Hoffmann
a849f3c9ec
add type annotations for command line functions
2022-07-18 09:55:54 +02:00
Sarah Hoffmann
aaf2b6032e
fix uses of config.get_path() to expect None
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
4b12d52ef5
convert admin --analyse-indexing to new indexing method
...
A proper run of indexing requires the place information from the
analyzer. Add the pre-processing of place data, so the right
information is handed into the update function.
2022-07-07 16:20:08 +02:00
Sarah Hoffmann
cbbcbb1fd7
move country_info into data submodule
2022-07-06 11:08:36 +02:00
Sarah Hoffmann
61d813bfef
add get_str_list() for config
...
Converts a config value written as a comma-sparated list into
a Python list of strings.
2022-05-29 13:53:50 +02:00
Sarah Hoffmann
dc6c4bf22e
add offline import mode
...
In offline mode no attempts are made to download data from the internet.
At the moment that only concerns the computation of the database date.
It contacts the main API to get the date.
2022-05-11 15:03:02 +02:00
Sarah Hoffmann
bb2bd76f91
pylint: avoid explicit use of format() function
...
Use psycopg2 SQL formatters for SQL and formatted string literals
everywhere else.
2022-05-11 09:48:56 +02:00
Sarah Hoffmann
4e1e166c6a
add a function to return a formatted version
...
Replaces the various repeated format strings throughout the code.
2022-05-11 09:01:24 +02:00
Sarah Hoffmann
4f59644cc2
add tests for new data invalidation functions
2022-04-14 14:52:13 +02:00
Sarah Hoffmann
c3f1d34b71
add new commands for forced invalidation before indexing
2022-04-14 11:05:43 +02:00
Sarah Hoffmann
4c66c35ed6
reinit the tokenizer directory on website refresh
...
This means the project directory is usable again, once refresh --website
was run.
2022-03-20 17:49:22 +01:00
Sarah Hoffmann
c170d323d9
add tests for cleaning housenumbers
2022-01-20 23:47:20 +01:00
Sarah Hoffmann
344a2bfc1a
add new command for cleaning word tokens
...
Just pulls outdated housenumbers for the moment.
2022-01-20 20:05:15 +01:00
Sarah Hoffmann
c3788d765e
add consistent SPDX copyright headers
2022-01-03 16:23:58 +01:00
Sarah Hoffmann
54d35ddfe9
split cli tests by subcommand and extend coverage
2021-12-02 23:45:48 +01:00
Sarah Hoffmann
10e979e841
only instantiate indexer once for replication
...
Also makes sure that indexer object exists everywhere were needed.
See #2518 .
2021-11-19 14:48:58 +01:00
Sarah Hoffmann
345c812e43
better error reporting when API script does not exist
...
Check if the API script exists on the expected location before
running php-cli. This way we can add a useful hint about the
project directory.
Fixes #2513 .
2021-11-10 11:58:20 +01:00
Sarah Hoffmann
2c4b798f9b
further refactor setup to keep function small
2021-10-26 12:00:13 +02:00
Sarah Hoffmann
9934421442
make word count computation part of the import
...
Accurate word counts are now essential when using
the ICU tokenizer and don't hurt for the legacy one.
Adds about an hour import time.
2021-10-26 12:00:13 +02:00
Sarah Hoffmann
1098ab732f
allow relative paths for flatnode file
2021-10-22 17:32:51 +02:00
Sarah Hoffmann
0ae8d7ac08
have ADDRESS_LEVEL_CONFIG use load_sub_configuration
...
This means that relative paths now are looked up in the
project directory.
2021-10-22 16:36:52 +02:00
Sarah Hoffmann
c77df2d1eb
replace NOMINATIM_PHRASE_CONFIG with command line option
2021-10-22 14:41:14 +02:00
Sarah Hoffmann
c1fa70639b
add new replication mode catch-up
...
This mode gets updates until the server reports no new diffs
anymore.
Also adds additional indexing, when the main indexing step left
a couple of objects to process. This happens only when the
next update is expected to be more than 40min away.
2021-10-20 22:05:15 +02:00
Sarah Hoffmann
12643c5986
run Tiger import with parallel threads per default
2021-10-19 15:00:26 +02:00
Sarah Hoffmann
e8e2502e2f
make word recount a tokenizer-specific function
2021-10-19 11:21:16 +02:00
Sarah Hoffmann
47417d1871
update and extend man page
...
Provide extended descriptions for most subcommands.
2021-10-18 09:03:07 +02:00
Sarah Hoffmann
8e1d4818ac
use yaml config loader for country info
2021-09-04 00:22:55 +02:00
Sarah Hoffmann
7e7dd769fd
remove language and partition from name import
2021-09-02 14:41:11 +02:00
Sarah Hoffmann
79da96b369
read partition and languages from config file
2021-09-02 14:41:11 +02:00
Sarah Hoffmann
78fcabade8
move country name generation to country_info module
2021-09-02 14:41:11 +02:00
Sarah Hoffmann
284645f505
move generation of country tables in own module
2021-09-02 14:41:11 +02:00
Sarah Hoffmann
75a5c7013f
split up large setup function
2021-08-15 12:24:13 +02:00
Sarah Hoffmann
87dedde5d6
allow multiple files for the import command
...
The files are forwarded to osm2pgsql which is now able to merge
them correctly.
2021-08-14 21:42:21 +02:00
Sarah Hoffmann
e42349c963
replace add-data function with native Python code
2021-07-26 10:41:37 +02:00
Sarah Hoffmann
878835e4bd
move add-data subcommand into a separate file
2021-07-25 18:14:12 +02:00
Sarah Hoffmann
cf98cff2a1
more formatting fixes
...
Found by flake8.
2021-07-12 17:45:42 +02:00
AntoJvlt
3676310efe
Improved performance of the postcodes query and some code cleaning
2021-06-12 15:46:08 +02:00
AntoJvlt
a4733eed90
Use place instead of placex to compute postcodes
2021-06-09 09:31:32 +02:00
Sarah Hoffmann
72625dc72a
call freeze after running and non-updateable import
...
Some of the tables will have already been removed but
the tables for indexing are still there and should be
dropped.
2021-06-02 11:08:48 +02:00
Sarah Hoffmann
cc2f152d70
commit changes to replication log table
...
Fixes #2350 .
2021-05-26 11:47:08 +02:00
Sarah Hoffmann
a0e85cc17c
only initialise tokenizer for refresh functions where needed
...
Fixes #2347 .
2021-05-25 19:16:22 +02:00
AntoJvlt
3206bf59df
Resolve conflicts
2021-05-17 13:52:35 +02:00
AntoJvlt
8b8dfc46eb
Added --no-replace command for special phrases importation and added corresponding tests
2021-05-17 13:25:06 +02:00
AntoJvlt
06aab389ed
Code cleaning and SPLoader deleted
2021-05-16 16:59:12 +02:00
Darkshredder
e5ffc59cd5
feat: Added reverse-only-search validation
2021-05-14 02:36:21 +05:30
Sarah Hoffmann
bf864b2c54
index postcodes after refreshing
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
a4aba23a83
move filling of postcode table to python
...
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.
External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
2021-05-13 14:15:42 +02:00
AntoJvlt
9d83da830f
Introduction of SPCsvLoader to load special phrases from a csv file
2021-05-10 23:26:39 +02:00
AntoJvlt
00959fac57
Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader
2021-05-10 21:49:31 +02:00
Sarah Hoffmann
ced8f0f4a2
fix liniting issues
2021-04-30 17:59:50 +02:00
Sarah Hoffmann
388ebcbae2
move index creation for word table to tokenizer
...
This introduces a finalization routing for the tokenizer
where it can post-process the import if necessary.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
7cb7cf848d
move amenity creation to tokenizer
...
The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
bef300305e
move default country name creation to tokenizer
...
The new function is also used, when a country us updated. All SQL
function related to country names have been removed.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
ffc2d82b0e
move postcode normalization into tokenizer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
e1c5673ac3
require tokeinzer for indexer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fbbdd31399
move word table and normalisation SQL into tokenizer
...
Creating and populating the word table is now the responsibility
of the tokenizer.
The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
296a66558f
move module installation to legacy tokenizer
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
af968d4903
introduce tokenizer modules
...
This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.
The legacy tokenizer implements Nominatim's traditional algorithms.
2021-04-30 11:29:57 +02:00
AntoJvlt
1b68152fb2
reorganization of folder/file for the special phrases importer
2021-04-25 17:57:42 +02:00
Sarah Hoffmann
89c90bedb9
pylint: disable check too-few-public-methods
2021-04-24 11:39:44 +02:00
Sarah Hoffmann
79d55357e8
simplify sql and website creation functions
2021-04-19 10:53:30 +02:00
Sarah Hoffmann
d74ae669e3
add support index when continuing import at index phase
...
Indexing scans the placex table sequentially during indexing
on the initial import. That is okay because we know that all
rows need to be processed anywhere. When continuing the import,
however, a large part might already be indexed, so that the
process spends a lot of time going through rows that are no
longer of interest. Create a supporting index for all unindexed
rows to speed up the scan. This is the same index as used later
for updates.
2021-04-17 11:07:04 +02:00
Sarah Hoffmann
da98a2102a
remove transition functions from Python
2021-04-16 18:41:14 +02:00
Sarah Hoffmann
886a01c796
port function to compute initial postcodes to Python
2021-04-16 16:11:20 +02:00
Sarah Hoffmann
76b1885595
use absolute imports in Python code
...
Relative imports are no longer officially recommended.
2021-04-16 14:20:09 +02:00
Darkshredder
21b1b75b08
Rebase with master
2021-03-29 14:00:45 +05:30
Sarah Hoffmann
09b2510219
Merge pull request #2228 from AntoJvlt/import-special-phrases-porting-python
...
Import special phrases porting python
2021-03-29 09:49:35 +02:00
AntoJvlt
57ce75eb67
Change command 'import-special-phrases --from-wiki' to 'special-phrases --import-from-wiki'.
2021-03-26 02:22:38 +01:00
AntoJvlt
2c19bd5ea3
Encapsulation of tools/special_phrases.py into SpecialPhrasesImporter class and add new tests.
2021-03-25 21:13:57 +01:00
AntoJvlt
6d56cbb3e8
Changed phrase_settings.py to phrase-settings.json and added migration function for old php settings file.
2021-03-23 23:30:39 +01:00
marc tobias
87d5883ddb
nominatim -h was priting wrong text for lookup and details
2021-03-21 16:06:41 +01:00
AntoJvlt
17cb59efbd
Ported functions for the import of special phrases from php to python.
...
- the command is now --import-special-phrases
- the output is not an sql file anymore, data are directly imported to the database.
- the little part on the documentation (section data import) has been modified.
2021-03-20 19:11:50 +01:00
Darkshredder
7a874d5b97
Ported createCountryNames() to python and added tests
2021-03-12 10:28:41 +05:30
Sarah Hoffmann
9086a794a1
Merge pull request #2204 from darkshredder/tiger-data
...
Ported tiger-data-import to Python and Added Tarball Support
2021-03-11 22:48:38 +01:00
Darkshredder
122c4618b9
Linting fixes
2021-03-08 22:59:51 +05:30
Darkshredder
2af82975cd
Ported tiger-data-import to python and Added Tarball Support
2021-03-08 21:57:56 +05:30
Sarah Hoffmann
764a41b973
automatic migration from 3.6 release
...
Adds a 'admin --migrate' command that checks for the current
database version and runs any necessary migrations. Also
has migrations going back to 3.6.
2021-03-06 16:36:57 +01:00
Sarah Hoffmann
09f4d767e4
port index creation to python
...
Also switches to jinja-based preprocessing, which allows to
simplify the SQL files. Use 'if not exists' where possible
so that the step can be rerun to fix missing indexes.
2021-03-04 11:11:47 +01:00
Sarah Hoffmann
eacabb0e96
move table creation to jinja-based preprocessing
2021-03-03 22:07:51 +01:00
Sarah Hoffmann
3a0a4b9175
save software version in the database
...
The version represents the software version that was used to
import the data.
2021-03-01 20:35:15 +01:00
Sarah Hoffmann
b4f64aa770
make sure that calls to PHP legacy scripts are fatal on error
2021-03-01 16:10:45 +01:00
Sarah Hoffmann
dd03aeb966
bdd: use python library where possible
...
Replace calls to PHP scripts with direct calls into the
nominatim Python library where possible. This speed up
tests quite a bit.
2021-02-26 16:14:29 +01:00
Sarah Hoffmann
15b5906790
move setup function to python
...
There are still back-calls to PHP for some of the sub-steps.
These needs some larger refactoring to be moved to Python.
2021-02-26 15:02:39 +01:00
Sarah Hoffmann
57db5819ef
prot load-data function to python
2021-02-25 21:32:40 +01:00
Sarah Hoffmann
3c186f8030
add a function for the intial indexing run
...
Also moves postcodes to fully parallel indexing.
2021-02-25 18:42:54 +01:00