Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-12-28 23:42:59 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	52847b61a3	extend ICU config to accommodate multiple analysers Adds parsing of multiple variant lists from the configuration. Every entry except one must have a unique 'id' parameter to distinguish the entries. The entry without id is considered the default. Currently only the list without an id is used for analysis.	2021-10-04 16:40:28 +02:00
Sarah Hoffmann	5a36559834	move flatten_config_list into config module For general usage by other modules.	2021-10-04 11:56:54 +02:00
Sarah Hoffmann	732cd27d2e	add unit tests for new sanitizer functions	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	8171fe4571	introduce sanitizer step before token analysis Sanitizer functions allow transforming name and address tags before they are handed to the tokenizer. These transformations are visible only to the tokenizer and thus only influence the search terms and address match terms for a place. Currently two sanitizers are implemented which are responsible for splitting names with multiple values and removing bracket additions. Both were previously hard-coded in the tokenizer.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	16daa57e47	unify ICUNameProcessorRules and ICURuleLoader There is no need for the additional layer of indirection that the ICUNameProcessorRules class adds. The ICURuleLoader can fill the database properties directly.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	5e5addcdbf	fix typo	2021-09-29 14:16:09 +02:00
Sarah Hoffmann	be65c8303f	export more data for the tokenizer name preparation Adds class, type, country and rank to the exported information and removes the rather odd hack for countries. Whether a place represents a country boundary can now be computed by the tokenizer.	2021-09-29 11:54:14 +02:00
Sarah Hoffmann	231250f2eb	add wrapper class for place data passed to tokenizer This is mostly for convenience and documentation purposes.	2021-09-29 11:54:07 +02:00
Sarah Hoffmann	bb18479d5b	remove unused parameter	2021-09-27 14:58:43 +02:00
Sarah Hoffmann	bd7c7ddad0	icu tokenizer: switch to matching against partial names When matching address parts from addr:* tags against place names, the address names were so far converted to full names and then compared to the place names. This can become problematic with the new ICU tokenizer once we introduce creation of different variants depending on the place name context. It wouldn't be clear which variant to produce to get a match, so we would have to create all of them. To work around this issue, switch to using the partial terms for matching. This introduces more fuzziness between matches but that shouldn't be a problem because matching is always geographically restricted. The search terms created for address parts have a different problem: they are already created before we even know if they are going to be used. This can lead to spurious entries in the word table, which slows down searching. This problem can also be circumvented by using only partial terms for the search terms. In terms of searching that means that the address terms would not get the full-word boost, but given that the case where an address part does not exist as an OSM object should be the exception, this is likely acceptable.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	b894d2c04a	fix indent	2021-09-04 10:30:35 +02:00
Sarah Hoffmann	8e1d4818ac	use yaml config loader for country info	2021-09-04 00:22:55 +02:00
Sarah Hoffmann	28c98584c1	add tests for generic YAML config reader	2021-09-03 22:31:30 +02:00
Sarah Hoffmann	1c42780bb5	introduce generic YAML config loader Adds a function to the Configuration class to load a YAML file. This means that searching for the file is generalised and works the same now for all configuration files. Changes the search logic, so that it is always possible to have a custom version of the configuration file in the project directory. Move ICU tokenizer to use new load function.	2021-09-03 18:20:07 +02:00
Sarah Hoffmann	7e7dd769fd	remove language and partition from name import	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	79da96b369	read partition and languages from config file	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	78fcabade8	move country name generation to country_info module	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	284645f505	move generation of country tables into its own module	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	28ee3d0949	move linking of places to the preparation stage Linked places may bring in extra names. These names need to be processed by the tokenizer. That means that the linking needs to be done before the data is handed to the tokenizer. Move finding the linked place into the preparation stage and update the name fields. Everything else is still done in the indexing stage.	2021-08-20 22:44:17 +02:00
Sarah Hoffmann	118858a55e	rename legacy_icu tokenizer to icu tokenizer The new icu tokenizer is now no longer compatible with the old legacy tokenizer in terms of data structures. Therefore there is also no longer a need to refer to the legacy tokenizer in the name.	2021-08-17 23:11:47 +02:00
Sarah Hoffmann	90b40fc3e6	define formal public Python interface for tokenizer This introduces an abstract class for the Tokenizer/Analyzer for documentation purposes.	2021-08-16 11:41:54 +02:00
Sarah Hoffmann	75a5c7013f	split up large setup function	2021-08-15 12:24:13 +02:00
Sarah Hoffmann	87dedde5d6	allow multiple files for the import command The files are forwarded to osm2pgsql which is now able to merge them correctly.	2021-08-14 21:42:21 +02:00
Sarah Hoffmann	d48793c22c	fix Python linting errors	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	1db098c05d	reinstate word column in icu word table Postgresql is very bad at creating statistics for jsonb columns. The result is that the query planner tends to use JIT for queries with a WHERE clause over 'info' even when there is an index.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	e42878eeda	adapt unit test for new word table Requires a second wrapper class for the word table with the new layout. This class is interface-compatible, so that later when the ICU tokenizer becomes the default, all tests that depend on behaviour of the default tokenizer can be switched to the other wrapper.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	eb6814d74e	convert word info column to json before copying	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	70f154be8b	switch word tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	4342b28882	switch special phrases to new word table format	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5394b1fa1b	switch postcode tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5ab0a63fd6	switch housenumber tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	1618aba5f2	switch country name tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	8377528952	new word table layout for icu tokenizer The table now directly reflects the different token types. Extra information is saved in a json structure that may be dynamically extended in the future without affecting the table layout.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	e42349c963	replace add-data function with native Python code	2021-07-26 10:41:37 +02:00
Sarah Hoffmann	878835e4bd	move add-data subcommand into a separate file	2021-07-25 18:14:12 +02:00
Sarah Hoffmann	2c8242c8df	remove special code for pre-9.5 PostgreSQL Version 9.5 is now the minimum requirement.	2021-07-19 10:24:57 +02:00
Sarah Hoffmann	e7d6f89aca	increase minimum version for PostgreSQL to 9.5 This is the minimum version we can test with the CI. With 9.5 there is also complete support for jsonb available.	2021-07-19 10:21:19 +02:00
Sarah Hoffmann	14f777da18	use psycopg's SQL quoting where possible Use the SQL formatting supplied with psycopg whenever the query needs to be put together from snippets.	2021-07-12 22:05:22 +02:00
Sarah Hoffmann	6f6681ce67	add helper function for execute_values Make psycopg2's convenience function accessible through the cursor.	2021-07-12 21:08:20 +02:00
Sarah Hoffmann	06602b4ec0	provide wrapper function for DROP TABLE Use psycopg2 formatting to ensure correct quoting.	2021-07-12 20:32:46 +02:00
Sarah Hoffmann	cf98cff2a1	more formatting fixes Found by flake8.	2021-07-12 17:45:42 +02:00
Sarah Hoffmann	f8b5a63de3	factor out connection reset code	2021-07-12 14:58:44 +02:00
Sarah Hoffmann	568316f07c	simplify analyse function	2021-07-12 14:47:50 +02:00
Sarah Hoffmann	daa597b300	split up variant computation for better readability	2021-07-12 14:43:50 +02:00
Sarah Hoffmann	47adb2a3fc	reorganise process_place function Move address processing into its own function as it is rather extensive.	2021-07-12 11:57:55 +02:00
Sarah Hoffmann	fff0012249	simplify website setup code Use format strings and move variable quoting code into an extra function.	2021-07-12 11:41:05 +02:00
Sarah Hoffmann	d5a1883b62	avoid repeated patterns for table name	2021-07-12 11:33:09 +02:00
Sarah Hoffmann	a08ef43e40	simplify if statements	2021-07-12 11:28:47 +02:00
Sarah Hoffmann	3661f7a321	avoid multiple returns of same value Found by Sonarqube.	2021-07-11 18:23:42 +02:00
Sarah Hoffmann	a2edbbf78a	cannot use capture_output in subprocess.run Only available since Python 3.7.	2021-07-06 22:57:42 +02:00
Sarah Hoffmann	1e86dc1d93	remove default parameter for namedtuple This is only available since Python 3.7.	2021-07-06 22:57:42 +02:00
Sarah Hoffmann	62d5984b1b	limit the number of variants that can be produced	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	c32551b4e0	restrict partial word counting to names of reasonable length The partial word count does not split names, to save a bit of time. As a result, it might encounter unreasonably long names which in truth consist of multiple words. No accurate statistics are needed, so simply restrict the count to words shorter than 75 characters.	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	e85f7e7aa9	fix subsequent replacements Two replacement words directly following each other did not work as expected because each expects a space at the beginning/end while there was only one space available. Also forbid composing a word after a previous replacement has added a space at the end.	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	7b0f6b7905	leave ICU variant properties empty for now Saving unused properties causes unnecessary duplicates.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	b9fbfeff67	only consider partials in multi-words for initial count This ensures that it is less likely that we exclude meaningful words like 'hauptstrasse' just because they are frequent.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	62828fc5c1	switch to a more flexible variant description format The new format combines compound splitting and abbreviation. It also allows restricting rules to additional conditions (like language or region). This latter ability is not used yet.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a6aa6360e0	use yaml tag syntax to mark include files	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	f70930b1a0	make compound decomposition pure import feature Compound decomposition now creates a full name variant on import just like abbreviations. This simplifies query time normalization and opens a path for changing abbreviation and compound decomposition lists for an existing database.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	9ff4f66f55	complete tests for icu tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	32ca631b74	fix full term token in special phrases	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e81084f35	complete tests for rule loader	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a0a7b05c9f	correctly quote strings when copying in data Encapsulate the copy string in a class that ensures that copy lines are written with correct quoting.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2f6e4edcdb	update unit tests for adapted abbreviation code	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e3c5d4c5b	adapt tests for ICU tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	8413075249	move abbreviation computation into import phase This adds precomputation of abbreviated terms for names and removes abbreviation of terms in the query. Basic import works but still needs some thorough testing as well as speed improvements during import. New dependency for python library datrie.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	6ba00e6aee	icu tokenizer: move transliteration rules into a separate file The tokenizer configuration has become difficult to handle due to the additional manual transliteration rules. Allow a separate rule file that is given to the ICU library as is.	2021-07-04 10:28:20 +02:00
AntoJvlt	3676310efe	Improved performance of the postcodes query and some code cleaning	2021-06-12 15:46:08 +02:00
AntoJvlt	1c175e3a67	Clean and update tests for postcodes	2021-06-09 09:31:32 +02:00
AntoJvlt	47fb7cd3a8	Use place_exists() into can_compute() for postcodes	2021-06-09 09:31:32 +02:00
AntoJvlt	a4733eed90	Use place instead of placex to compute postcodes	2021-06-09 09:31:32 +02:00
Sarah Hoffmann	bc981d0261	fix insertion of special terms and countries into word table Special terms need to be prefixed by a space because they are full terms. For countries avoid duplicate entries of word tokens. Adds tests for adding country terms.	2021-06-02 20:22:39 +02:00
Sarah Hoffmann	72625dc72a	call freeze after running a non-updateable import Some of the tables will have already been removed but the tables for indexing are still there and should be dropped.	2021-06-02 11:08:48 +02:00
Sarah Hoffmann	cc2f152d70	commit changes to replication log table Fixes #2350.	2021-05-26 11:47:08 +02:00
Sarah Hoffmann	a0e85cc17c	only initialise tokenizer for refresh functions where needed Fixes #2347.	2021-05-25 19:16:22 +02:00
Sarah Hoffmann	24c986c842	add tests for new full name computation with ICU	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	4f4d15c28a	reorganize keyword creation for legacy tokenizer - only save partial words without internal spaces - consider comma and semicolon a separator of full words - consider parts before an opening bracket a full word (but not the part after the bracket) Fixes #244.	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	fa3e48c59f	use make_keywords for place search terms also Ensures that place indeed uses the same search names as other names.	2021-05-23 23:08:11 +02:00
Sarah Hoffmann	16bb007135	Merge pull request #2336 from lonvia/do-not-mask-error-when-loading-tokenizer Do not hide errors when importing tokenizer	2021-05-18 23:00:10 +02:00
AntoJvlt	799a4c9ab6	Documentation update and small code fixes	2021-05-18 22:35:21 +02:00
Sarah Hoffmann	b2722650d4	do not hide errors when importing tokenizer Explicitly check for the tokenizer source file to check that the name is correct. We can't use the import error for that because it hides other import errors like a missing library. Fixes #2327.	2021-05-18 16:28:21 +02:00
AntoJvlt	3206bf59df	Resolve conflicts	2021-05-17 13:52:35 +02:00
AntoJvlt	8b8dfc46eb	Added --no-replace option for special phrases import and added corresponding tests	2021-05-17 13:25:06 +02:00
AntoJvlt	06aab389ed	Code cleaning and SPLoader deleted	2021-05-16 16:59:12 +02:00
Sarah Hoffmann	925726222f	Merge pull request #2323 from darkshredder/disable-search-reverse-only Feat: Disabled search API for --reverse-only imports	2021-05-14 10:40:22 +02:00
Sarah Hoffmann	7d621389ee	adapt tests to new TIGER CSV format	2021-05-14 00:02:50 +02:00
Sarah Hoffmann	35efe3b41c	use tokenizer during Tiger data import This also changes the required import format to CSV.	2021-05-14 00:02:50 +02:00
Darkshredder	e5ffc59cd5	feat: Added reverse-only-search validation	2021-05-14 02:36:21 +05:30
Sarah Hoffmann	5feece64c1	use WorkerPool for Tiger data import Requires adding an option that SQL errors are ignored.	2021-05-13 20:36:50 +02:00
Sarah Hoffmann	b9a09129fa	move WorkerPool into db module The pool is independent of the indexer and may also be used by other parts of the software.	2021-05-13 17:11:17 +02:00
Sarah Hoffmann	fc860787dd	do not preload postcodes This is too expensive for updates.	2021-05-13 16:14:12 +02:00
Sarah Hoffmann	63e35574d4	Merge pull request #2324 from lonvia/generic-external-postcodes Rework postcode handling and generalised external postcode support	2021-05-13 14:52:19 +02:00
Sarah Hoffmann	db2dbf15f7	fix token_info migration A bad indent meant that only one table received the new column.	2021-05-13 14:31:41 +02:00
Sarah Hoffmann	f5977dac75	ignore invalid coordinates in external postcodes	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	8f2746fe24	ignore entries without country code	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	1ccd4360b4	correctly handle removing all postcodes for country	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	bf864b2c54	index postcodes after refreshing	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	4abaf71234	add and extend tests for new postcode handling	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	a4aba23a83	move filling of postcode table to python The Python code now takes care of reading postcodes from placex, enhancing them with potentially existing external postcodes and updating location_postcodes accordingly. The initial setup and updates use exactly the same function. External postcode handling has been generalized. External postcodes for any country are now accepted. The format of the external postcode file has changed. We now expect CSV, potentially gzipped. The postcodes are no longer saved in the database.	2021-05-13 14:15:42 +02:00
AntoJvlt	9d83da830f	Introduction of SPCsvLoader to load special phrases from a csv file	2021-05-10 23:26:39 +02:00
AntoJvlt	00959fac57	Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader	2021-05-10 21:49:31 +02:00
Sarah Hoffmann	872ab91421	fix name of transliterator Should be different from the normalisation rules.	2021-05-05 17:09:38 +02:00
Sarah Hoffmann	a263e54b94	enable BDD tests for different tokenizers The tokenizer to be used can be chosen with -DTOKENIZER. Adapt all tests, so that they work with the legacy_icu tokenizer. Move lookup in word table to a function in the tokenizer. Special phrases are temporarily imported from the wiki until we have an implementation that can import from file. TIGER tests do not work yet.	2021-05-05 10:31:51 +02:00
Sarah Hoffmann	18c99a5c5f	add unit tests for legacy ICU tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	d55fc39275	cache transliteration results	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	ba8ed7967d	add PHP part for new ICU-based tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	f44af49df9	add Python part for new ICU-based tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	36c624ec71	commit between migrations Later migrations may require tables set up by older ones.	2021-05-01 10:47:35 +02:00
Sarah Hoffmann	7fd871a74d	increase database version for tokenizer migration	2021-05-01 10:47:35 +02:00
Sarah Hoffmann	ced8f0f4a2	fix linting issues	2021-04-30 17:59:50 +02:00
Sarah Hoffmann	388ebcbae2	move index creation for word table to tokenizer This introduces a finalization routine for the tokenizer where it can post-process the import if necessary.	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	20891abe1c	indexer: fetch extra place data asynchronously The indexer now fetches any extra data besides the place_id asynchronously while processing the places from the last batch. This also means that more places are now fetched at once.	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	6ce6f62b8e	fetch place info asynchronously	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	602728895e	indexer: fetch ids in batches	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	fc995ea6b9	move database check for module to tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	3eb4d88057	boilerplate for PHP code of tokenizer This adds an installation step for PHP code for the tokenizer. The PHP code is split in two parts. The updateable code is found in lib-php. The tokenizer installs an additional script in the project directory which then includes the code from lib-php and defines all settings that are static to the database. The website code then always includes the PHP from the project directory.	2021-04-30 11:31:52 +02:00
Sarah Hoffmann	23fd1d032a	tests for legacy tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	7cb7cf848d	move amenity creation to tokenizer The BDD tests still use the old-style amenity creation scripts because we don't have simple means to import a hand-crafted test file of special phrases right now.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	bef300305e	move default country name creation to tokenizer The new function is also used when a country is updated. All SQL functions related to country names have been removed.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	dc700c25b6	cache all postcodes	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	0ba93e5ba9	reorganise address iteration in tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	9e92759ac7	extract address tokens in tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	ffc2d82b0e	move postcode normalization into tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	d8ed1bfc60	move housenumber handling to tokenizer Normalization and token computation are now done in the tokenizer. The tokenizer keeps a cache of the hundred most used house numbers to keep the number of calls to the database low.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	d711f5a81e	move name token creation into tokenizer Name tokens are now handed in via token_info and used from there. Also moves the generic search name insertion function back to placex_triggers.sql.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	fa2bc60468	introduce name analyzer The name analyzer is the actual workhorse of the tokenizer. It is instantiated per thread and provides all functions for analysing names and queries.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	e1c5673ac3	require tokenizer for indexer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	a73711f3cd	add extra column for tokenizer Add a jsonb column to the placex and location_property_osmline tables which can be used by the installed tokenizer as required. No other part of the software will use or otherwise rely on this column.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	9397bf54b8	introduce external processing in indexer Indexing is now split into three parts: first a preparation step that collects the necessary information from the database and returns it to Python. In a second step the data is transformed within Python as necessary and then returned to the database through the usual UPDATE which now not only sets the indexed_status but also other fields. The third step comprises the address computation which is still done inside the update trigger in the database. The second processing step doesn't do anything useful yet.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	fbbdd31399	move word table and normalisation SQL into tokenizer Creating and populating the word table is now the responsibility of the tokenizer. The get_maxwordfreq() function has been replaced with a simple template parameter to the SQL during function installation. The number is taken from the parameter list in the database to ensure that it is not changed after installation.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	b5540dc35c	add migration for configurable tokenizer Adds a migration that initialises a legacy tokenizer for an existing database. The migration is not active yet as it will need completion when more functionality is added to the legacy tokenizer.	2021-04-30 11:29:57 +02:00
Sarah Hoffmann	296a66558f	move module installation to legacy tokenizer	2021-04-30 11:29:57 +02:00
Sarah Hoffmann	af968d4903	introduce tokenizer modules This adds the boilerplate for selecting configurable tokenizers. A tokenizer can be chosen at import time and will then install itself such that it is fixed for the given database import even when the software itself is updated. The legacy tokenizer implements Nominatim's traditional algorithms.	2021-04-30 11:29:57 +02:00
Sarah Hoffmann	185d369404	remove support for AUX housenumber tables These tables have never been actively maintained and the code is completely untested. With the upcoming changes, it is unlikely that the code remains usable. This removes the aux tables and all code that references them.	2021-04-30 10:08:29 +02:00
Sarah Hoffmann	51d20b19b6	Merge pull request #2299 from lonvia/update-actions Fix database check for reverse-only	2021-04-27 12:18:45 +02:00
Sarah Hoffmann	46e8c6b112	Merge pull request #2291 from AntoJvlt/special-phrases-statistics Special phrases statistics	2021-04-27 11:57:05 +02:00
Sarah Hoffmann	c8fb25201a	do not check for extra housenumber index for reverse-only Also adds a database check for reverse only import to the CI.	2021-04-27 10:14:26 +02:00
Sarah Hoffmann	4457bf7528	avoid Path in subprocess parameters Not supported by Python 3.5.	2021-04-26 10:55:23 +02:00
AntoJvlt	abb3d56b20	Switching to log info and only send warning for invalid phrases	2021-04-25 17:57:43 +02:00
AntoJvlt	c5ecb9bae0	Implemented statistics for the import of special phrases through the SpecialPhrasesImporterStatistics class	2021-04-25 17:57:43 +02:00
AntoJvlt	1b68152fb2	reorganization of folder/file for the special phrases importer	2021-04-25 17:57:42 +02:00
Sarah Hoffmann	b951b11336	fix pylint complaints	2021-04-24 11:59:32 +02:00
Sarah Hoffmann	89c90bedb9	pylint: disable check too-few-public-methods	2021-04-24 11:39:44 +02:00
Sarah Hoffmann	9c51c133f7	indexes with includes are not available for postgresql < 11	2021-04-23 22:50:08 +02:00
Sarah Hoffmann	91d2fb6b1c	use group() for regex matches Needed for compatibility with Python 3.5.	2021-04-23 22:50:08 +02:00
Sarah Hoffmann	280406c0d7	use pathlib version of open	2021-04-23 22:50:08 +02:00
Sarah Hoffmann	d5fc3b5e99	subprocess needs string argument Compatibility change for Python 3.5.	2021-04-23 22:50:08 +02:00
Sarah Hoffmann	f8f8c7e534	check for existence of custom .env before opening	2021-04-23 22:50:08 +02:00
Sarah Hoffmann	3a642d50a4	use more generic ImportError to check for module ModuleNotFoundError was only introduced in Python 3.6.	2021-04-23 22:50:08 +02:00
Sarah Hoffmann	9685c68e30	replace usages of fromisoformat() with strptime() fromisoformat was only introduced with Python 3.7 while we still support Python 3.5. Fixes #2292.	2021-04-23 22:50:08 +02:00
RhinoDevel	b7bae80616	Replace "nominatim-update" with "nominatim". If I am not mistaken, the correct command to index imported data via commandline is "nominatim index".	2021-04-22 15:40:22 +02:00
Sarah Hoffmann	f7e4aa51d3	indexer: reset query counter Reset the counter for queries after the asynchronous connections have been reopened.	2021-04-21 10:33:45 +02:00
Sarah Hoffmann	50b6d7298c	factor out async connection handling into separate class Also adds a test for reconnecting regularly while indexing.	2021-04-20 14:08:37 +02:00
Sarah Hoffmann	26a81654a8	indexer: make self.conn function-local Also switches to our internal connect function which gives us a cursor with a scalar() function.	2021-04-20 14:08:37 +02:00
Sarah Hoffmann	6430371d7d	make index() function private	2021-04-20 14:08:37 +02:00
Sarah Hoffmann	18705b3f18	move analyse function into indexing function	2021-04-20 14:08:37 +02:00
Sarah Hoffmann	c6bd2bb7fb	indexer: move runner into separate file	2021-04-20 14:08:37 +02:00
Sarah Hoffmann	79d55357e8	simplify sql and website creation functions	2021-04-19 10:53:30 +02:00
Sarah Hoffmann	4fa6c0ad53	simplify constructor for SQL preprocessor Use sql path from config.	2021-04-19 10:26:25 +02:00
Sarah Hoffmann	8f63f9516b	simplify interface for adding tiger data Also simplifies tests using existing fixtures.	2021-04-19 10:26:25 +02:00
Sarah Hoffmann	995ba2c7c2	add library directories to config Allows reducing the number of parameters in functions that take the config anyway.	2021-04-19 10:26:25 +02:00
AntoJvlt	b2ae715699	Only log a warning if a wrong input is detected on the wiki while importing special phrases	2021-04-17 20:19:39 +02:00
AntoJvlt	a95c748363	Fix occurrence regex	2021-04-17 19:24:13 +02:00
Sarah Hoffmann	d74ae669e3	add support index when continuing import at index phase Indexing scans the placex table sequentially during the initial import. That is okay because we know that all rows need to be processed anyway. When continuing the import, however, a large part might already be indexed, so that the process spends a lot of time going through rows that are no longer of interest. Create a supporting index for all unindexed rows to speed up the scan. This is the same index as used later for updates.	2021-04-17 11:07:04 +02:00
Sarah Hoffmann	da98a2102a	remove transition functions from Python	2021-04-16 18:41:14 +02:00
Sarah Hoffmann	886a01c796	port function to compute initial postcodes to Python	2021-04-16 16:11:20 +02:00
Sarah Hoffmann	76b1885595	use absolute imports in Python code Relative imports are no longer officially recommended.	2021-04-16 14:20:09 +02:00
Sarah Hoffmann	c64193f839	Merge pull request #2263 from AntoJvlt/special-phrases-autoupdate Implemented auto update of special phrases while importing them	2021-04-15 10:13:25 +02:00
Sarah Hoffmann	e90adfc7c3	adapt database check to new index layout	2021-04-14 17:52:59 +02:00
Sarah Hoffmann	16267dc021	add migration for new placenode geometry index	2021-04-14 17:52:59 +02:00
Darkshredder	49ee7505ed	Fix: Removed error if endstatement is wrong and improved tests	2021-04-13 15:44:12 +05:30
AntoJvlt	ae2b2cb9a5	Tests added for the auto update of special phrases during import	2021-04-12 14:35:29 +02:00
AntoJvlt	8c2f287ce4	Implemented auto update of special phrases while importing them	2021-04-12 14:30:48 +02:00
AntoJvlt	5ecae10713	Fix default languages loading	2021-04-11 22:26:31 +02:00
Sarah Hoffmann	71564fa1de	split LANGUAGES parameter before use The user supplies the languages as a comma-separated list.	2021-04-09 17:48:28 +02:00
Sarah Hoffmann	492186716f	prepare 3.7.0 release	2021-04-06 21:23:29 +02:00
Sarah Hoffmann	96b0699621	add migration for transliterated housenumbers	2021-04-04 15:26:47 +02:00
Sarah Hoffmann	8d8b1d4307	use non-key index to speed up housenumber search On Postgresql versions 11+ add an index to speed up the lookup of housenumbers for terms found in search_name. This is really just a band-aid around the query planner's interpretation of the query.	2021-04-01 17:10:44 +02:00
Darkshredder	b7d6ae93e3	Nominatim/cli.py rebase fixes	2021-03-29 14:16:41 +05:30
Darkshredder	21b1b75b08	Rebase with master	2021-03-29 14:00:45 +05:30
Darkshredder	51e2654cd2	Added Manual page and fixed documentation	2021-03-29 13:57:13 +05:30
Sarah Hoffmann	09b2510219	Merge pull request #2228 from AntoJvlt/import-special-phrases-porting-python Import special phrases porting python	2021-03-29 09:49:35 +02:00
AntoJvlt	57ce75eb67	Change command 'import-special-phrases --from-wiki' to 'special-phrases --import-from-wiki'.	2021-03-26 02:22:38 +01:00
AntoJvlt	cde9389e75	Error fixes, code cleaning, improvement and addition of tests	2021-03-26 01:53:33 +01:00
AntoJvlt	2c19bd5ea3	Encapsulation of tools/special_phrases.py into SpecialPhrasesImporter class and add new tests.	2021-03-25 21:13:57 +01:00
AntoJvlt	ff34198569	Code cleaning, tests simplification and use of python3-icu package	2021-03-23 23:56:39 +01:00
AntoJvlt	1ce8b530cd	Introduction of PyICU for transliteration in python. Reversed changes in normalization.sql.	2021-03-23 23:34:16 +01:00
AntoJvlt	6d56cbb3e8	Changed phrase_settings.py to phrase-settings.json and added migration function for old php settings file.	2021-03-23 23:30:39 +01:00
Sarah Hoffmann	4f1bdde32e	Merge pull request #2231 from mtmail/correct-cli-help-page nominatim -h was printing wrong text for lookup and details	2021-03-21 16:52:20 +01:00
Sarah Hoffmann	a08ca5b1b5	avoid division by zero in progress meter On Windows systems the timer may not be accurate enough to measure the time between init() and done(). Avoid computing statistics with a diff time of 0 in such cases. Fixes #2230.	2021-03-21 16:47:22 +01:00
marc tobias	87d5883ddb	nominatim -h was printing wrong text for lookup and details	2021-03-21 16:06:41 +01:00
AntoJvlt	17cb59efbd	Ported functions for the import of special phrases from php to python. - the command is now --import-special-phrases - the output is not an sql file anymore, data are directly imported to the database. - the corresponding part of the documentation (section data import) has been updated.	2021-03-20 19:11:50 +01:00
Sarah Hoffmann	81a6b746b8	Merge pull request #2212 from darkshredder/country-name Ported createCountryNames() to python and Added tests	2021-03-15 09:36:06 +01:00
Sarah Hoffmann	7212fa8630	fix template variable name	2021-03-13 12:05:53 +01:00
Darkshredder	b108bd1c1e	Linting fix	2021-03-12 18:28:47 +05:30
Darkshredder	077a8c1f95	refactored tests and made changes to code for easy readability	2021-03-12 18:23:20 +05:30
Darkshredder	7a874d5b97	Ported createCountryNames() to python and added tests	2021-03-12 10:28:41 +05:30
Sarah Hoffmann	9086a794a1	Merge pull request #2204 from darkshredder/tiger-data Ported tiger-data-import to Python and Added Tarball Support	2021-03-11 22:48:38 +01:00
Darkshredder	e5719de657	Added fixture for sql_preprocessor and fixed some issues	2021-03-11 15:39:17 +05:30
Darkshredder	ccfad57fca	Added test and removed runlegacyscript	2021-03-10 17:18:12 +05:30
Darkshredder	64128b699a	fixed linting, refactored threaded sql handling and removed importTigerData() function	2021-03-10 13:28:29 +05:30
Darkshredder	4080fbb95c	Test fixes	2021-03-09 01:00:56 +05:30
Darkshredder	14ec83c886	Linting fixes	2021-03-08 23:10:49 +05:30
Darkshredder	122c4618b9	Linting fixes	2021-03-08 22:59:51 +05:30
Darkshredder	2af82975cd	Ported tiger-data-import to python and Added Tarball Support	2021-03-08 21:57:56 +05:30
Sarah Hoffmann	764a41b973	automatic migration from 3.6 release Adds a 'admin --migrate' command that checks for the current database version and runs any necessary migrations. Also has migrations going back to 3.6.	2021-03-06 16:36:57 +01:00
Sarah Hoffmann	9d103503f7	Merge pull request #2197 from lonvia/use-jinja-for-sql-preprocessing Use jinja2 for SQL preprocessing	2021-03-04 16:36:18 +01:00
Sarah Hoffmann	09f4d767e4	port index creation to python Also switches to jinja-based preprocessing, which allows to simplify the SQL files. Use 'if not exists' where possible so that the step can be rerun to fix missing indexes.	2021-03-04 11:11:47 +01:00
Sarah Hoffmann	dd301cf5ac	indexer: ANALYSE must be run outside transactions	2021-03-04 11:06:33 +01:00
Sarah Hoffmann	eacabb0e96	move table creation to jinja-based preprocessing	2021-03-03 22:07:51 +01:00
Sarah Hoffmann	d2bd6aa78d	introduce jinja2 for preprocessing SQL Replaces various hand-crafted replacements of varying format with a single Jinja2 templating mechanism. Allows full access to configuration if necessary.	2021-03-03 17:51:08 +01:00
Sarah Hoffmann	3a0a4b9175	save software version in the database The version represents the software version that was used to import the data.	2021-03-01 20:35:15 +01:00
Sarah Hoffmann	4faefe156c	report software version of status call	2021-03-01 16:47:19 +01:00
Sarah Hoffmann	86273f5e2a	introduce database patch level for version This will be needed later for automatic migrations.	2021-03-01 16:46:19 +01:00
Sarah Hoffmann	b4f64aa770	make sure that calls to PHP legacy scripts are fatal on error	2021-03-01 16:10:45 +01:00
Sarah Hoffmann	b46adbad22	make sure psql always finishes If an exception is raised by other means, we still have to close the stdin pipe to psql to make sure that it exits and releases its connection to the database.	2021-02-27 10:24:40 +01:00
Sarah Hoffmann	d14a3df10f	do not truncate search_name in reverse-only mode	2021-02-27 09:46:42 +01:00
Sarah Hoffmann	c7f40e3cee	fix verbose flag for PHP wrapper scripts The flag must come after the command.	2021-02-26 16:49:32 +01:00
Sarah Hoffmann	dd03aeb966	bdd: use python library where possible Replace calls to PHP scripts with direct calls into the nominatim Python library where possible. This speeds up tests quite a bit.	2021-02-26 16:14:29 +01:00
Sarah Hoffmann	15b5906790	move setup function to python There are still back-calls to PHP for some of the sub-steps. These need some larger refactoring to be moved to Python.	2021-02-26 15:02:39 +01:00
Sarah Hoffmann	3ee8d9fa75	properly close connections of indexer after use	2021-02-26 12:10:54 +01:00
Sarah Hoffmann	57db5819ef	port load-data function to python	2021-02-25 21:32:40 +01:00
Sarah Hoffmann	3c186f8030	add a function for the initial indexing run Also moves postcodes to fully parallel indexing.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	c7fd0a7af4	port wikipedia importance functions to python	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	32683f73c7	move import-data option to native python This adds a new dependency to the Python psutil package.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	7222235579	introduce custom object for cmdline arguments Allows to define special functions over the arguments. Also splits CLI tests in two files as they have become too many.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	f6e894a53a	port database setup function to python Hide the former PHP functions in a transition command until they are removed.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	b93ec2522e	use psql for executing sql files This allows to run larger files without needing to keep them in memory.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	af7226393a	add function to set up libpq environment Instead of parsing the DSN for each external libpq program we are going to execute, provide a function that feeds them all necessary parameters through the environment. osm2pgsql is the first user.	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	e520613362	convert connect() into a context manager	2021-02-25 18:42:54 +01:00
Sarah Hoffmann	a1f0fc1a10	improve deadlock detection for various versions of psycopg2 Psycopg2 has changed the kind of exception that is emitted on deadlocks between versions 2.7 and 2.8. The code was already trying to catch both kinds of errors but because the psycopg2.errors package is unknown in 2.7 and below, the code would throw an exception on anything but a deadlock error. This commit wraps the deadlock handling into a context manager to avoid code duplication and uses module imports to detect if the new error codes are available. Also sets the required psycopg2 version to 2.7 or bigger as versions below are difficult to test.	2021-02-25 18:11:16 +01:00
Sarah Hoffmann	971df231b0	avoid os.environ as default value	2021-02-19 19:29:57 +01:00
Sarah Hoffmann	4b32cbe518	fix return code for check database run with 'not applicable'	2021-02-19 18:32:00 +01:00
Sarah Hoffmann	f08078ccca	bdd tests: directly call python code for setup-website	2021-02-19 18:20:55 +01:00
Sarah Hoffmann	389138abfe	port setup-website to python	2021-02-19 17:51:06 +01:00
Sarah Hoffmann	a0ae4945cd	add unit tests for new check_database code	2021-02-18 20:36:11 +01:00
Sarah Hoffmann	b169e4c88c	port check-database function to python This change also adapts the hints to use the nominatim tool. Slightly changed checks, so that they are just as effective on a frozen database.	2021-02-18 17:32:30 +01:00
Sarah Hoffmann	101a1f895d	port freeze function to python	2021-02-17 21:43:15 +01:00
Sarah Hoffmann	c9838a02ce	disable JIT and parallel execution for osm2pgsql updates again The gazetteer output doesn't disable these functions when writing to the place table but the triggers may contain operations that cause misplanning for the query planner.	2021-02-16 18:23:47 +01:00
Sarah Hoffmann	fbe7be760b	ignore failure to get replication date	2021-02-14 12:17:30 +01:00
Sarah Hoffmann	7cc4c53adb	always return 0 for updates unless there is an error This is more in line with previous behaviour than returning a status code when no updates are available.	2021-02-11 10:33:49 +01:00
Sarah Hoffmann	de37dc9300	forgot to replace one occurrence of sql_dir	2021-02-09 19:32:05 +01:00
Sarah Hoffmann	8ffd7d9243	remove unused BINDIR constant	2021-02-09 19:30:31 +01:00
Sarah Hoffmann	298ed11261	introduce constant for configuration directory This replaces {data_dir}/settings throughout the code, so that the configuration may be placed somewhere else in the directory structure (e.g. in /etc).	2021-02-09 18:45:45 +01:00
Sarah Hoffmann	b9517c99ae	rename sql directory to lib-sql Also introduces a separate constant for the sql directory, so that it can be put separately from the rest of the data if required.	2021-02-09 15:26:56 +01:00
Sarah Hoffmann	d81e152804	integrate analyse of indexing into nominatim tool	2021-02-08 22:22:49 +01:00
Sarah Hoffmann	0cbf98c020	consolidate warm and db-check into single admin command	2021-02-08 21:05:06 +01:00
Sarah Hoffmann	195f9f5ef3	split cli.py by subcommands Reduces file size below 1000 lines.	2021-02-08 17:23:05 +01:00
Sarah Hoffmann	861e67dfe8	fix off-by-one error in replication download	2021-02-04 17:04:04 +01:00
Sarah Hoffmann	948217d5e9	reintroduce timeout for replication file download This ports the --socket-timeout parameter from pyosmium-get-changes which ensures that the update process eventually times out on hanging network connections.	2021-02-04 11:47:11 +01:00
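Commit a08ca5b1b5 avoids a division by zero in the progress meter when the timer is too coarse to measure the time between init() and done(). A minimal Python sketch of that kind of guard follows. The class `ProgressMeter`, its methods and the injectable `clock` parameter are hypothetical illustrations, not the code from the commit.

```python
import time
from typing import Callable, Optional


class ProgressMeter:
    """Hypothetical progress reporter that never divides by a zero time span."""

    def __init__(self, total: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.total = total
        self.done_places = 0
        self.clock = clock          # injectable so a coarse timer can be simulated
        self.start_time = clock()

    def add(self, num: int = 1) -> None:
        self.done_places += num

    def rate(self) -> Optional[float]:
        """Return places per second, or None if no measurable time has passed."""
        diff = self.clock() - self.start_time
        if diff <= 0:
            # A coarse timer can report no elapsed time between start and finish;
            # skip the statistics instead of raising ZeroDivisionError.
            return None
        return self.done_places / diff


# A frozen clock simulates a timer that did not advance during the run.
meter = ProgressMeter(100, clock=lambda: 5.0)
meter.add(100)
print(meter.rate())
```

Returning `None` is one possible choice here; another is to substitute a minimal non-zero duration, which keeps the output format unchanged at the cost of a made-up rate.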
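Commit a1f0fc1a10 wraps deadlock handling into a context manager. The sketch below shows the general shape in Python, using a swapped-in detection technique: instead of detecting the newer `psycopg2.errors` module via imports, as the commit describes, it compares the exception's `pgcode` attribute with PostgreSQL's SQLSTATE `40P01` (deadlock_detected). The class name `DeadlockHandler` and the `on_deadlock` callback are assumptions for illustration, and the sketch has no psycopg2 dependency so it runs standalone.

```python
from types import TracebackType
from typing import Callable, Optional, Type

# PostgreSQL SQLSTATE for "deadlock_detected".
DEADLOCK_SQLSTATE = '40P01'


class DeadlockHandler:
    """Hypothetical context manager: call `on_deadlock` and swallow deadlock errors.

    Any other exception propagates unchanged.
    """

    def __init__(self, on_deadlock: Callable[[], None]) -> None:
        self.on_deadlock = on_deadlock

    def __enter__(self) -> 'DeadlockHandler':
        return self

    def __exit__(self, exc_type: Optional[Type[BaseException]],
                 exc_value: Optional[BaseException],
                 traceback: Optional[TracebackType]) -> bool:
        # Match on the error code rather than the exception class, so the check
        # does not depend on which exception type the driver version raises.
        if exc_value is not None and getattr(exc_value, 'pgcode', None) == DEADLOCK_SQLSTATE:
            self.on_deadlock()
            return True   # suppress; the caller decides whether to retry
        return False      # let every other exception propagate
```

In a real indexer the `on_deadlock` callback would presumably roll back and requeue the affected work; that part is not shown here.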

566 Commits