Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-29 16:42:23 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	388ebcbae2	move index creation for word table to tokenizer This introduces a finalization routine for the tokenizer where it can post-process the import if necessary.	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	fc995ea6b9	move database check for module to tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	893490f94e	add more tests for legacy tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	3eb4d88057	boilerplate for PHP code of tokenizer This adds an installation step for PHP code for the tokenizer. The PHP code is split in two parts. The updateable code is found in lib-php. The tokenizer installs an additional script in the project directory which then includes the code from lib-php and defines all settings that are static to the database. The website code then always includes the PHP from the project directory.	2021-04-30 11:31:52 +02:00
Sarah Hoffmann	23fd1d032a	tests for legacy tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	7cb7cf848d	move amenity creation to tokenizer The BDD tests still use the old-style amenity creation scripts because we don't have simple means to import a hand-crafted test file of special phrases right now.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	bef300305e	move default country name creation to tokenizer The new function is also used when a country is updated. All SQL functions related to country names have been removed.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	ffc2d82b0e	move postcode normalization into tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	fa2bc60468	introduce name analyzer The name analyzer is the actual workhorse of the tokenizer. It is instantiated on a per-thread basis and provides all functions for analysing names and queries.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	e1c5673ac3	require tokenizer for indexer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	9397bf54b8	introduce external processing in indexer Indexing is now split into three parts: first a preparation step that collects the necessary information from the database and returns it to Python. In a second step the data is transformed within Python as necessary and then returned to the database through the usual UPDATE which now not only sets the indexed_status but also other fields. The third step comprises the address computation which is still done inside the update trigger in the database. The second processing step doesn't do anything useful yet.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	fbbdd31399	move word table and normalisation SQL into tokenizer Creating and populating the word table is now the responsibility of the tokenizer. The get_maxwordfreq() function has been replaced with a simple template parameter to the SQL during function installation. The number is taken from the parameter list in the database to ensure that it is not changed after installation.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	296a66558f	move module installation to legacy tokenizer	2021-04-30 11:29:57 +02:00
Sarah Hoffmann	af968d4903	introduce tokenizer modules This adds the boilerplate for selecting configurable tokenizers. A tokenizer can be chosen at import time and will then install itself such that it is fixed for the given database import even when the software itself is updated. The legacy tokenizer implements Nominatim's traditional algorithms.	2021-04-30 11:29:57 +02:00
Sarah Hoffmann	185d369404	remove support for AUX housenumber tables These tables have never been actively maintained and the code is completely untested. With the upcoming changes, it is unlikely that the code remains usable. This removes the aux tables and all code that references them.	2021-04-30 10:08:29 +02:00
AntoJvlt	1b68152fb2	reorganization of folder/file for the special phrases importer	2021-04-25 17:57:42 +02:00
Sarah Hoffmann	9685c68e30	replace usages of fromisoformat() with strptime() fromisoformat was only introduced with Python 3.7 while we still support Python 3.5. Fixes #2292.	2021-04-23 22:50:08 +02:00
Sarah Hoffmann	50b6d7298c	factor out async connection handling into separate class Also adds a test for reconnecting regularly while indexing.	2021-04-20 14:08:37 +02:00
Sarah Hoffmann	b88b952f56	simplify token precomputation Rename function to reflect that it is only used for precomputation. The token IDs are not really needed, so don't bother to compute the array of tokens.	2021-04-19 17:24:19 +02:00
Darkshredder	1f898405a6	Fix: tiger-data tarfile test	2021-04-19 16:02:52 +05:30
Sarah Hoffmann	79d55357e8	simplify sql and website creation functions	2021-04-19 10:53:30 +02:00
Sarah Hoffmann	4fa6c0ad53	simplify constructor for SQL preprocessor Use sql path from config.	2021-04-19 10:26:25 +02:00
Sarah Hoffmann	8f63f9516b	simplify interface for adding tiger data Also simplifies tests using existing fixtures.	2021-04-19 10:26:25 +02:00
AntoJvlt	b2ae715699	Only log a warning if a wrong input is detected on the wiki while importing special phrases	2021-04-17 20:19:39 +02:00
AntoJvlt	ec859e41c6	Cleaned tests and add database cleaning tests on test_import_from_wiki	2021-04-17 19:23:33 +02:00
Sarah Hoffmann	2ca11ccc6b	add tests for continuing import	2021-04-17 11:10:36 +02:00
Sarah Hoffmann	0f11e311c4	add test for new postcode import function	2021-04-16 16:11:20 +02:00
Sarah Hoffmann	c64193f839	Merge pull request #2263 from AntoJvlt/special-phrases-autoupdate Implemented auto update of special phrases while importing them	2021-04-15 10:13:25 +02:00
Darkshredder	49ee7505ed	Fix: Removed error if endstatement is wrong and improved tests	2021-04-13 15:44:12 +05:30
AntoJvlt	ae2b2cb9a5	Tests added for the auto update of special phrases during import	2021-04-12 14:35:29 +02:00
AntoJvlt	e82de99e5a	Cleaned tests of exceptions and fix phrase_settings.json test file name.	2021-03-29 22:07:29 +02:00
AntoJvlt	57ce75eb67	Change command 'import-special-phrases --from-wiki' to 'special-phrases --import-from-wiki'.	2021-03-26 02:22:38 +01:00
AntoJvlt	cde9389e75	Errors fixes, Cleaning code, Improvement and addition of tests	2021-03-26 01:53:33 +01:00
AntoJvlt	2c19bd5ea3	Encapsulation of tools/special_phrases.py into SpecialPhrasesImporter class and add new tests.	2021-03-25 21:13:57 +01:00
AntoJvlt	ff34198569	Code cleaning, tests simplification and use of python3-icu package	2021-03-23 23:56:39 +01:00
AntoJvlt	1ce8b530cd	Introduction of PyICU for transliteration in python. Reversed changes in normalization.sql.	2021-03-23 23:34:16 +01:00
AntoJvlt	17cb59efbd	Ported functions for the import of special phrases from php to python. - the command is now --import-special-phrases - the output is not an sql file anymore, data are directly imported to the database. - the little part on the documentation (section data import) has been modified.	2021-03-20 19:11:50 +01:00
Darkshredder	077a8c1f95	refactored tests and made changes to code for easy readability	2021-03-12 18:23:20 +05:30
Darkshredder	7a874d5b97	Ported createCountryNames() to python and added tests	2021-03-12 10:28:41 +05:30
Darkshredder	e5719de657	Added fixture for sql_preprocessor and fixed some issues	2021-03-11 15:39:17 +05:30
Darkshredder	8486a83cf5	Added test for tarfile	2021-03-10 18:14:17 +05:30
Darkshredder	ccfad57fca	Added test and removed runlegacyscript	2021-03-10 17:18:12 +05:30
Sarah Hoffmann	09f4d767e4	port index creation to python Also switches to jinja-based preprocessing, which allows to simplify the SQL files. Use 'if not exists' where possible so that the step can be rerun to fix missing indexes.	2021-03-04 11:11:47 +01:00
Sarah Hoffmann	eacabb0e96	move table creation to jinja-based preprocessing	2021-03-03 22:07:51 +01:00
Sarah Hoffmann	d2bd6aa78d	introduce jinja2 for preprocessing SQL Replaces various hand-crafted replacements of varying format with a single Jinja2 templating mechanism. Allows full access to configuration if necessary.	2021-03-03 17:51:08 +01:00
Sarah Hoffmann	7ae9c3a9f0	add database_version setting to tests	2021-03-01 21:49:33 +01:00
Sarah Hoffmann	3a0a4b9175	save software version in the database The version represents the software version that was used to import the data.	2021-03-01 20:35:15 +01:00
Sarah Hoffmann	db663dd92f	remove unused import	2021-03-01 09:26:08 +01:00
Sarah Hoffmann	90a5d23016	use tmp_path fixture in config tests	2021-03-01 09:24:04 +01:00
Sarah Hoffmann	afabbeb546	older versions of Postgresql need explicit return type	2021-02-27 09:46:42 +01:00
Sarah Hoffmann	15b5906790	move setup function to python There are still back-calls to PHP for some of the sub-steps. These need some larger refactoring to be moved to Python.	2021-02-26 15:02:39 +01:00
Sarah Hoffmann	3c186f8030	add a function for the initial indexing run Also moves postcodes to fully parallel indexing.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	c7fd0a7af4	port wikipedia importance functions to python	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	32683f73c7	move import-data option to native python This adds a new dependency to the Python psutil package.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	7222235579	introduce custom object for cmdline arguments Allows to define special functions over the arguments. Also splits CLI tests in two files as they have become too many.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	f6e894a53a	port database setup function to python Hide the former PHP functions in a transition command until they are removed.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	b93ec2522e	use psql for executing sql files This allows to run larger files without needing to keep them in memory.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	af7226393a	add function to set up libpq environment Instead of parsing the DSN for each external libpq program we are going to execute, provide a function that feeds them all necessary parameters through the environment. osm2pgsql is the first user.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	e520613362	convert connect() into a context manager	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	a1f0fc1a10	improve deadlock detection for various versions of psycopg2 Psycopg2 has changed the kind of exception that is emitted on deadlocks between versions 2.7 and 2.8. The code was already trying to catch both kinds of errors but because the psycopg2.errors package is unknown in 2.7 and below, the code would throw an exception on anything but a deadlock error. This commit wraps the deadlock handling into a context manager to avoid code duplication and uses module imports to detect if the new error codes are available. Also sets the required psycopg2 version to 2.7 or bigger as versions below are difficult to test.	2021-02-25 18:11:16 +01:00
Sarah Hoffmann	389138abfe	port setup-website to python	2021-02-19 17:51:06 +01:00
Sarah Hoffmann	a0ae4945cd	add unit tests for new check_database code	2021-02-18 20:36:11 +01:00
Sarah Hoffmann	b169e4c88c	port check-database function to python This change also adapts the hints to use the nominatim tool. Slightly changed checks, so that they are just as effective on a frozen database.	2021-02-18 17:32:30 +01:00
Sarah Hoffmann	101a1f895d	port freeze function to python	2021-02-17 21:43:15 +01:00
Sarah Hoffmann	fbe7be760b	ignore failure to get replication date	2021-02-14 12:17:30 +01:00
Sarah Hoffmann	7cc4c53adb	always return 0 for updates unless there is an error This is more in line with previous behaviour than returning a status code when no updates are available.	2021-02-11 10:33:49 +01:00
Sarah Hoffmann	0e0e9a6809	need test database for analysing cli test	2021-02-10 16:19:51 +01:00
Sarah Hoffmann	c60a0784ea	adapt unit tests to new directory structure	2021-02-09 20:13:00 +01:00
Sarah Hoffmann	b9517c99ae	rename sql directory to lib-sql Also introduces a separate constant for the sql directory, so that it can be put separately from the rest of the data if required.	2021-02-09 15:26:56 +01:00
Sarah Hoffmann	db3ced17bb	rename lib to lib-php	2021-02-09 11:52:07 +01:00
Sarah Hoffmann	d81e152804	integrate analyse of indexing into nominatim tool	2021-02-08 22:22:49 +01:00
Sarah Hoffmann	0cbf98c020	consolidate warm and db-check into single admin command	2021-02-08 21:05:06 +01:00
Sarah Hoffmann	195f9f5ef3	split cli.py by subcommands Reduces file size below 1000 lines.	2021-02-08 17:23:05 +01:00
Sarah Hoffmann	0b2abfb115	replace make serve with nominatim serve command With the website directory now tied to the project directory instead of the build directory, it is no longer possible to use make for running the web server.	2021-02-03 16:34:31 +01:00
Sarah Hoffmann	cb06d1f4ca	do not overwrite custom set module paths Given that the module is now copied to the project directory when no module path is set, we need the information that the module path is empty. Therefore hand in the default module path in a separate variable.	2021-02-02 18:31:25 +01:00
Sarah Hoffmann	5f63d4ca1f	print nice summary after updates	2021-02-01 10:34:31 +01:00
Sarah Hoffmann	e629a175ed	introduce custom UsageError This is an exception to be thrown when the error occurs because of bad user data. We don't want to print a full stack trace in these cases but just tell the user what went wrong.	2021-01-30 16:20:10 +01:00
Sarah Hoffmann	4cb6dc01f3	port replication update function to python	2021-01-30 15:50:34 +01:00
Sarah Hoffmann	8f0885f6cb	port check-for-update function to python	2021-01-28 14:50:14 +01:00
Sarah Hoffmann	d78f0ba804	port replication initialisation to Python	2021-01-26 22:50:54 +01:00
Sarah Hoffmann	5b46fcad8e	convert function creation to python The new function always creates normal and partitioned functions. Also adds specialised connection and cursor classes for adding frequently used helper functions.	2021-01-26 22:50:54 +01:00
Sarah Hoffmann	94fa7162be	port address level computation to Python Also adds simple tests for correct table creation.	2021-01-26 22:50:54 +01:00
Sarah Hoffmann	e6c2842b66	move update code for postcode and word count to Python Also adds tests for the new function to execute a SQL script.	2021-01-26 22:50:54 +01:00
Sarah Hoffmann	e6d9485c4a	cli: import python modules for commands on demand Given that only one command will be executed in the end, it is not necessary to import what amounts to the whole library. This becomes in particular important for update functions that have a dependency on pyosmium. The dependency can remain optional for people not using updates.	2021-01-26 22:50:54 +01:00
Sarah Hoffmann	063a4cb403	cli indexer tests need a fake database The Indexer constructor opens a connection to the given database.	2021-01-20 21:30:27 +01:00
Sarah Hoffmann	42ec67f63c	add more tests for CLI parameter parser	2021-01-20 21:30:27 +01:00
Sarah Hoffmann	8c02786820	add tests for indexer	2021-01-20 21:30:27 +01:00
Sarah Hoffmann	c26f323bf5	add simple tests for CLI parsing	2021-01-20 21:30:27 +01:00
Sarah Hoffmann	bfa6580ad5	use pytest mocking functions for manipulating os.environ	2021-01-20 21:30:27 +01:00
Sarah Hoffmann	52b76d1d01	add tests for Python exec_utils	2021-01-20 21:30:27 +01:00
Sarah Hoffmann	b79c79fa73	add function to get a DSN for psycopg Converts the PHP DSN syntax into psycopg syntax when necessary.	2021-01-18 15:43:27 +01:00
Sarah Hoffmann	eb3b789855	add initial pytest test for Configuration	2021-01-15 14:42:03 +01:00
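
Commit af7226393a above describes handing connection parameters to external libpq programs through the environment instead of parsing the DSN for each program. The following is a minimal sketch of that idea, not the code from the commit. It assumes a simple space-separated key=value DSN without quoting; the function name libpq_env and the mapping table are hypothetical, while the PG* names are the standard libpq environment variables.

```python
import os

# Standard libpq environment variables for common DSN keywords.
# The selection of keywords here is an assumption made for illustration.
_DSN_TO_ENV = {
    "dbname": "PGDATABASE",
    "host": "PGHOST",
    "port": "PGPORT",
    "user": "PGUSER",
    "password": "PGPASSWORD",
}


def libpq_env(dsn, base_env=None):
    """Return a copy of the environment with libpq variables set from dsn.

    Hypothetical helper: only handles unquoted 'key=value' pairs
    separated by whitespace and ignores keywords it does not know.
    """
    env = dict(os.environ if base_env is None else base_env)
    for part in dsn.split():
        key, _, value = part.partition("=")
        if key in _DSN_TO_ENV:
            env[_DSN_TO_ENV[key]] = value
    return env
```

The returned dictionary could then be passed as the env argument of subprocess.run() when starting a tool such as osm2pgsql or psql.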


392 Commits
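
Commit 9685c68e30 above replaces fromisoformat() with strptime() because fromisoformat() only exists from Python 3.7 onwards. A minimal sketch of such a replacement follows. The format string, the function name parse_timestamp and the choice to attach UTC are assumptions made for illustration; the log does not show the format actually used.

```python
from datetime import datetime, timezone

# Assumed timestamp layout; the real format string is not given in the log.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_timestamp(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS' without datetime.fromisoformat().

    Hypothetical helper: strptime() is available on Python 3.5,
    and the result is marked as UTC explicitly.
    """
    return datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)
```

Unlike fromisoformat(), this accepts exactly one layout, so input in any other form raises ValueError.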