Mutations are regular-expression-based replacements that are applied
after variants have been computed. They are meant to be used for
variations at the character level.
Add spelling variations for German umlauts.
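A minimal sketch of how such a mutation could be applied, assuming a
hypothetical pattern/alternatives rule format (the real configuration
format is defined by the ICU tokenizer):

    import re
    from itertools import product

    # Each mutation is a regex plus a set of alternatives; every match
    # in a computed variant may be replaced by any of the alternatives.
    UMLAUT_MUTATIONS = [(re.compile('ä'), ('ä', 'ae'))]

    def mutate(variant):
        results = [variant]
        for pattern, alternatives in UMLAUT_MUTATIONS:
            mutated = []
            for name in results:
                parts = pattern.split(name)
                for choice in product(alternatives, repeat=len(parts) - 1):
                    mutated.append(''.join(p + c for p, c in zip(parts, choice))
                                   + parts[-1])
            results = mutated
        return results

mutate('Bärenstraße') then yields both 'Bärenstraße' and 'Baerenstraße'.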
Currently only one addr: tag can be processed, so make
sure it is the one without suffixes to avoid odd data.
addr:street is the exception because it uses a different
matching mechanism.
Using partial names turned out not to work well because there are
often similarly named streets next to each other. It also
prevents us from being able to take into account all addr:street:*
tags.
This change gets all the full term tokens for the addr:street tags
from the DB. As they are used for matching only, we can assume that
the term must already be there or there will be no match. This
avoids creating unused full name tags.
The highway key is being used more and more for non-ways these
days. This clashes with Nominatim's assumption that essentially
everything that has a highway tag can be used as the street part
of the address.
Change the default rank of highway objects to 30 to avoid this.
Only the known values for streets keep the rank 26 and are now
listed explicitly.
Check if the API script exists at the expected location before
running php-cli. This way we can add a useful hint about the
project directory.
Fixes #2513.
This mode gets updates until the server reports no new diffs
anymore.
Also runs additional indexing when the main indexing step has left
a couple of objects to process. This happens only when the
next update is expected to be more than 40 minutes away.
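The control flow is roughly the following sketch; all helper names
here are made up for illustration:

    from datetime import timedelta

    def update_catch_up(conn, options):
        # Fetch and apply diffs until the server reports no new data.
        while apply_next_diff(conn, options):
            pass
        # Use the idle time for extra indexing when the next diff is
        # known to be more than 40 minutes away.
        if time_to_next_update(conn) > timedelta(minutes=40):
            index_remaining_objects(conn)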
Point-in-polygon queries are much faster with a SP-GIST geometry
index, so use that for the index used to check if a housenumber
is inside a building.
Only available with PostGIS 3. There is an automatic fallback to
GIST for PostGIS 2.
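A sketch of the fallback logic with psycopg2, assuming the PostGIS
major version has already been determined (index name and filter are
illustrative):

    def create_building_index(conn, postgis_major):
        # SP-GIST answers point-in-polygon queries faster, but geometry
        # operator classes for it only exist from PostGIS 3 onwards.
        method = 'SPGIST' if postgis_major >= 3 else 'GIST'
        with conn.cursor() as cur:
            cur.execute(f"""CREATE INDEX idx_placex_buildings ON placex
                            USING {method} (geometry)
                            WHERE class = 'building'""")
        conn.commit()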
Adds a tagger for names by language so that the analyzer of that
language is used. Variants are thus now applied only to names
in the specific language and only to actual name tags, no longer
to reference-like tags.
Implements per-name choice of analyzer. If a non-default
analyzer is chosen, then the 'word' identifier is extended
with the name of the analyzer, so that we still have unique
items.
Adds a mandatory section 'analyzer' to the token-analysis entries
which defines which analyzer to use. Currently there is exactly
one, 'generic', which implements the former ICUNameProcessor.
Adds parsing of multiple variant lists from the configuration.
Every entry except one must have a unique 'id' parameter to
distinguish the entries. The entry without id is considered
the default. Currently only the list without an id is used
for analysis.
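A sketch of the parsing rule; the configuration keys used here are
assumptions:

    def parse_variant_lists(rules):
        # Each entry needs a unique 'id' except for at most one,
        # which becomes the default list (stored under key None).
        lists = {}
        for entry in rules:
            key = entry.get('id')
            if key in lists:
                raise ValueError(f"Duplicate variant list id: {key!r}")
            lists[key] = entry.get('variants', [])
        return lists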
Sanitizer functions allow transforming name and address tags before
they are handed to the tokenizer. These transformations are visible
only to the tokenizer and thus only have an influence on the
search terms and address match terms for a place.
Currently two sanitizers are implemented which are responsible for
splitting names with multiple values and removing bracket additions.
Both were previously hard-coded in the tokenizer.
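A minimal sketch of the two sanitizers, assuming names arrive as
(tag, value) pairs; the real sanitizer interface differs:

    import re

    def split_name_list(names):
        # 'name=Foo;Bar' becomes two separate name entries.
        out = []
        for tag, value in names:
            out.extend((tag, part.strip())
                       for part in re.split('[;,]', value) if part.strip())
        return out

    def strip_brackets(names):
        # 'Foo (Bar)' loses its bracket addition and becomes 'Foo'.
        out = []
        for tag, value in names:
            stripped = re.sub(r'\s*\([^)]*\)', '', value).strip()
            out.append((tag, stripped or value))
        return out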
There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.
Adds class, type, country and rank to the exported information
and removes the rather odd hack for countries. Whether a place
represents a country boundary can now be computed by the tokenizer.
When matching address parts from addr:* tags against place names,
the address names were so far converted to full names and compared
to the place names. This can become problematic with the new
ICU tokenizer once we introduce creation of different variants
depending on the place name context. It wouldn't be clear which
variant to produce to get a match, so we would have to create all of
them. To work around this issue, switch to using the partial terms
for matching. This introduces a larger fuzziness between matches but
that shouldn't be a problem because matching is always geographically
restricted.
The search terms created for address parts have a different problem:
they are already created before we even know if they are going to be
used. This can lead to spurious entries in the word table, which slows
down searching. This problem can also be circumvented by using only
partial terms for the search terms. In terms of searching that means
that the address terms would not get the full-word boost, but given
that the case where an address part does not exist as an OSM object
should be the exception, this is likely acceptable.
A boolean check for dynamic changes of address parts is not
sufficient. The order of choice should be:
1. an addr:* part matches the name
2. the address part surrounds the object
3. the address part was declared as isaddress
The implementation uses a slightly different ordering
to avoid geometry checks unless strictly necessary (isaddress
is false and no matching address).
See #2446.
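A sketch of the resulting check with illustrative helper names; the
cheap tests run first, so the geometry test only happens when
isaddress is false and no address name matches:

    def keep_as_address(part, place):
        # 1. an addr:* tag of the place names this address part
        if part.name_matches(place.address_tags):
            return True
        # 3. the part was marked as isaddress (checked before the
        #    geometry because it is only a flag lookup)
        if part.isaddress:
            return True
        # 2. the address part geometrically surrounds the object
        return part.geometry_contains(place.centroid)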
Adds a function to the Configuration class to load a YAML
file. This means that searching for the file is generalised
and works the same now for all configuration files. Changes
the search logic, so that it is always possible to have a
custom version of the configuration file in the project
directory.
Move ICU tokenizer to use new load function.
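A sketch of the lookup order using PyYAML; function and parameter
names are illustrative:

    from pathlib import Path
    import yaml

    def load_sub_configuration(project_dir, config_dir, filename):
        # A custom copy in the project directory always takes
        # precedence over the file shipped with Nominatim.
        for candidate in (Path(project_dir) / filename,
                          Path(config_dir) / filename):
            if candidate.is_file():
                with candidate.open(encoding='utf-8') as fd:
                    return yaml.safe_load(fd)
        raise FileNotFoundError(f"Cannot find config file '{filename}'.")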
Linked places may bring in extra names. These names need to be
processed by the tokenizer. That means that the linking needs
to be done before the data is handed to the tokenizer. Move finding
the linked place into the preparation stage and update the name
fields. Everything else is still done in the indexing stage.
The new ICU tokenizer is now no longer compatible with the old
legacy tokenizer in terms of data structures. Therefore there
is also no longer a need to refer to the legacy tokenizer in the
name.
This separates the logic of creating word sets from the Phrase
class. A tokenizer may now derive the word sets any way it
likes. The SimpleWordList class provides a standard implementation
for splitting phrases on spaces.
PostgreSQL is very bad at creating statistics for jsonb
columns. The result is that the query planner tends to
use JIT for queries with a WHERE clause over 'info' even when
there is an index.
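One way to work around this, used here as an illustrative assumption
rather than the exact fix, is to disable JIT for the session running
the queries:

    with conn.cursor() as cur:
        # A negative threshold switches JIT off entirely, so the
        # planner cannot choose it based on bad jsonb statistics.
        cur.execute("SET jit_above_cost TO '-1'")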
The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.
Requires a second wrapper class for the word table with the new
layout. This class is interface-compatible, so that later when
the ICU tokenizer becomes the default, all tests that depend on
behaviour of the default tokenizer can be switched to the other
wrapper.
Two replacement words directly following each other did not
work as expected because each expects a space at the
beginning/end while there was only one space available.
Also forbid composing a word after a space was added at the
end by a previous replacement.
The new format combines compound splitting and abbreviation.
It also allows restricting rules to additional conditions
(like language or region). This latter ability is not used
yet.
Compound decomposition now creates a full name variant on
import just like abbreviations. This simplifies query time
normalization and opens a path for changing abbreviation
and compound decomposition lists for an existing database.
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.
New dependency on the Python library datrie.
We've previously added searching through rank 30 in a house
number search to enable searches for house number+name.
This had the unintended side effect that rank 30 objects
are also returned in a search that dropped the house number
from the query. This is wrong because POIs cannot function
as a parent to a house number.
This fix drops all rank 30 objects from the results for a
house number search if they do not match the requested house
number.
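Conceptually the fix is a filter like this sketch (attribute names
are illustrative):

    def filter_housenumber_results(results, requested_number):
        # Rank 30 objects cannot parent a house number, so drop them
        # unless they match the requested number themselves.
        return [r for r in results
                if r.rank_search < 30 or r.housenumber == requested_number]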
Special terms need to be prefixed by a space because they are
full terms.
For countries avoid duplicate entries of word tokens.
Adds tests for adding country terms.
- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
(but not the part after the bracket)
Fixes #244.
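A sketch of the resulting full-word extraction (the partial words are
then the space-less terms of each full word):

    import re

    def extract_full_words(name):
        # Commas and semicolons separate full words; the part before
        # an opening bracket counts as an additional full word, the
        # part after it does not.
        words = set()
        for part in re.split('[;,]', name):
            part = part.strip()
            if part:
                words.add(part)
                bracket = part.find('(')
                if bracket > 0:
                    words.add(part[:bracket].strip())
        return words

extract_full_words('Green Park (North); The Lawn') then yields
{'Green Park (North)', 'Green Park', 'The Lawn'}.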
Explicitly check for the tokenizer source file to check that
the name is correct. We can't use the import error for that
because it hides other import errors like a missing
library.
Fixes #2327.
The ICU library only offers transliterations for a limited set of
scripts. Add transliterations for missing scripts from the PostgreSQL
module. This means that the same selection of scripts is supported
as with the old module.
The tokenizer to be used can be chosen with -DTOKENIZER.
Adapt all tests, so that they work with legacy_icu tokenizer.
Move lookup in word table to a function in the tokenizer.
Special phrases are temporarily imported from the wiki until
we have an implementation that can import from file. TIGER
tests do not work yet.
This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.
The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.
The name analyzer is the actual work horse of the tokenizer. It
is instantiated on a per-thread basis and provides all functions for
analysing names and queries.
Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.
The second processing step doesn't do anything useful yet.
Creating and populating the word table is now the responsibility
of the tokenizer.
The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.
The legacy tokenizer implements Nominatim's traditional algorithms.
These tables have never been actively maintained and the code is
completely untested. With the upcoming changes, it is unlikely
that the code remains usable.
This removes the aux tables and all code that references them.
Rename function to reflect that it is only used for precomputation.
The token IDs are not really needed, so don't bother to compute
the array of tokens.
- the command is now --import-special-phrases
- the output is no longer an SQL file; the data is imported directly into the database.
- the section on data import in the documentation has been updated accordingly.
Drops all calls to PHP utility functions. nominatim cli functions
are used where possible, to stay as close to the final code as
possible with the tests.
By removing the PHP calls, the test code now only uses osm2pgsql and
the database module from the build directory.
Always look up the closest housenumber before looking up
interpolations. This ensures that closer housenumbers are
preferred over interpolations.
Fixes #2214.
Also switches to jinja-based preprocessing, which allows
simplifying the SQL files. Use 'if not exists' where possible
so that the step can be rerun to fix missing indexes.
Replaces various hand-crafted replacements of varying format with
a single Jinja2 templating mechanism. Allows full access to
configuration if necessary.
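A minimal sketch of such a preprocessing step with the Jinja2 API
(file layout and variable names are assumptions):

    import jinja2

    def render_sql(sql_dir, name, config):
        # Templates can read the full configuration, e.g.
        # {{config.DATABASE_MODULE_PATH}}, instead of relying on
        # hand-crafted string replacement.
        env = jinja2.Environment(
            loader=jinja2.FileSystemLoader(str(sql_dir)))
        return env.get_template(name).render(config=config)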
Instead of parsing the DSN for each external libpq program we
are going to execute, provide a function that feeds them all
necessary parameters through the environment.
osm2pgsql is the first user.
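A sketch of the idea; the helper name is made up, and only the most
common DSN parameters are mapped:

    import os
    import subprocess

    def run_with_db_env(cmd, dsn_params):
        # libpq programs read connection settings from the PG*
        # environment variables, so no DSN parsing is needed on
        # their side.
        mapping = {'host': 'PGHOST', 'port': 'PGPORT',
                   'dbname': 'PGDATABASE', 'user': 'PGUSER',
                   'password': 'PGPASSWORD'}
        env = dict(os.environ)
        env.update((mapping[k], str(v))
                   for k, v in dsn_params.items() if k in mapping)
        return subprocess.run(cmd, env=env, check=True)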
Psycopg2 has changed the kind of exception that is emitted on
deadlocks between versions 2.7 and 2.8. The code was already
trying to catch both kind of errors but because the
psycopg2.errors package is unknown in 2.7 and below, the
code would throw an exception on anything but a deadlock error.
This commit wraps the deadlock handling into a context manager
to avoid code duplication and uses module imports to detect if
the new error codes are available.
Also sets the required psycopg2 version to 2.7 or later as
versions below are difficult to test.
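A sketch of the approach (the context-manager name is illustrative):

    from contextlib import contextmanager

    import psycopg2
    import psycopg2.extensions

    try:
        import psycopg2.errors      # module only exists from 2.8 on
        HAVE_PSYCOPG2_ERRORS = True
    except ImportError:
        HAVE_PSYCOPG2_ERRORS = False

    @contextmanager
    def handle_deadlock(handler):
        # Call 'handler' instead of raising when the database reports
        # a deadlock, working with psycopg2 2.7 and 2.8 alike.
        try:
            yield
        except psycopg2.extensions.TransactionRollbackError as exc:
            if HAVE_PSYCOPG2_ERRORS:
                if not isinstance(exc, psycopg2.errors.DeadlockDetected):
                    raise
            elif exc.pgcode != '40P01':  # SQLSTATE deadlock_detected
                raise
            handler()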
So far the data directory constant has pointed to the source
directory to be usable with different subdirectories. Now only
the data subdirectory itself is being used with the constant,
so point to the directory directly.
This replaces {data_dir}/settings throughout the code, so that
the configuration may be placed somewhere else in the directory
structure (e.g. in /etc).
With the website directory now tied to the project directory instead
of the build directory, it is no longer possible to use make for
running the web server.
Given that the module is now copied to the project directory
when no module path is set, we need the information that the
module path is empty. Therefore hand in the default module path
in a separate variable.
This is an exception to be thrown when an error occurs because
of bad user data. We don't want to print a full stack trace in
these cases but just tell the user what went wrong.
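The pattern is roughly the following sketch; the entry-point wiring
is illustrative:

    import logging

    LOG = logging.getLogger()

    class UsageError(Exception):
        """Raised when the error is caused by bad user data, so that
           only the message is printed, not a full stack trace."""

    def run_command(func, *args):
        # Top-level wrapper: convert a UsageError into a short
        # message and a non-zero exit code.
        try:
            return func(*args)
        except UsageError as err:
            LOG.fatal('FATAL: %s', err)
            return 1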
The new function always creates normal and partitioned functions.
Also adds specialised connection and cursor classes for adding
frequently used helper functions.
Given that only one command will be executed in the end, it is
not necessary to import what amounts to the whole library. This
becomes particularly important for update functions that have
a dependency on pyosmium. The dependency can remain optional for
people not using updates.
The PHP scripts need to know the position of the nominatim
tool in order to call it. This is handed in as environment
variable, so it can be set by the Python script.
In the future, the BDD tests will simply set up the required
test database themselves. Like with the template database, it
is not reimported when it already exists unless that is explicitly
forced.
Makes most of the API tests currently fail because they still
point to old test data.
Introduces a new class DBRow that encapsulates the comparison
functions. This also is responsible for formatting more informative
assert messages. place and placex steps are unified.
Put the connection to the test database into auto-commit mode
and get rid of the explicit commits. Also use cursors always in
context managers and unify the two implementations that copy
data from the place table.
The project directory contains the website script as
configured through the test configuration. This means
that tests are now completely independent of any
configuration that may be contained in the build
directory.
Also removes the hack to inject additional settings via
an environment variable.
DB tests now can simply set the environment to change configuration
variables. API tests still rely on a configuration file.
Also, query.php needs to set up the CONST_* variables to work with
the query scripts. That is a tiny bit messy and duplicates code
but this part will need to be reworked later.
CONST_BasePath is split into separate configuration variables
for binaries, libraries and data. These variables as well as
the installation path are now set in the executable directly and
no longer configurable via project settings.
This is the first step towards an installable software. The
executables should know per installation where to find their
necessary data to execute. Project configuration needs to be
restricted to settings that really concern the specific Nominatim
installation.
Update names in the country_names table on the fly from incoming
OSM country data. Adds a small sanity check that the country
must be an OSM relation and within the area where we expect the
country to be.
If a place node is already linked against a boundary, it should not
be used for linking again. It is usually a sign of a mapping error,
when there are multiple boundary candidates. This change just avoids
inconsistent data in the database, it does not guarantee that the
linking is against the more correct boundary.
Rank 30 objects usually use the address parts of their parent.
When the parent has address parts that are areas but not marked
as isaddress, then the parent might go through multiple administrative
areas. In that case recheck if the right area has been chosen
for the object in question instead of relying on isaddress.
Note that we really only have to do the recomputation in the
case of 'isarea = True and isaddress = False' which hopefully
keeps the number of additional geometric operations we have to do
to a minimum.
There is one more special case to be taken into account here: a
street may go through two administrative areas and a house along
that street is placed in one of the areas while the addr:* tags
say it belongs to the other. In that case we must not switch
isaddress to the area the house is situated in. To avoid that, recheck
the address names against the name of the area. That is not perfect
but should cover most cases.
Fixes #328.
Multi-word partial terms had an undue advantage over separate partial
terms because they only need to pay the penalty once. This changes
the behaviour by setting the penalty according to the number of
words in the token. This should get rid of search interpretations
with low chance of matching.
This also fixes handling of exact term matching. We now match against
all exact terms of the query, not just a couple of them collected
while building the interpretations.
Also adds a penalty to very short postcodes.
House numbers need special handling because they may appear after
the street term. That means we cannot just use them as the main name
for searches where the address has its own search term entries.
Doing this right now, we are able to find '40, Main St, Town' but not
'Main St 40, Town'.
This switches to using the housenumber token as the name term instead.
House number tokens can get special handling when building the search
query that covers the case where they come after the street.
The main disadvantage is that this once more increases the number
of possible search interpretations, of which we already have too many.
No penalty for housenumber searches.
The previous behaviour was a left-over from a former version
where such POIs parented to the street. Now that they parent to
places, it should be included.
get_addressdata() now also checks if the place itself has entries
in the place_addressline table and merges them into the results.
Also restrict checking for address tag places to cases where the
name cannot be found in the parent's address search terms. Looking
up all address tags is just too slow.
While previously the content of addr:* tags was only added
to the list of address search keywords, we now really look up
the matching place. This has the advantage that we pull in all
potential translations from the place, just like all the other
address terms that are looked up by neighbourhood search.
If no place can be found for a given name, the content of the
addr:* tag is still added to the search keywords as before.
Go back to using centroid when determining if one admin level
is within another. There are cases where boundaries are slightly
misaligned due to mapping errors (not using the same ways in the
relations).
Only declare boundaries the same when they have the same wikidata
tag _and_ have exactly the same geometry. This works around tagging
errors with the wikidata tag, which happen because of automated
edits to the wikidata tag.
The Polish community maps admin boundaries that span multiple
levels by duplicating the boundary relations. Detect this situation
by looking out for matching wikidata tags. The higher ranked
duplicates are then thrown out from the address pool by setting
their address rank to 0.
When a POI has no addr:street but an addr:place that is not
contained in the name list of the parent place, then remember
this situation and merge the content of addr:place into the
address output.
We don't need to care about translations in this case because
it is obvious that no object with translations exists if the
parent isn't the object named in addr:place.
If an addr:place is given but no addr:street tag, then bind
the rank 30 object always to a <=25 object, even when there
is none found with the same name.
When a place of rank 30 has addr tags that are not covered by the
search terms of the parent, add a separate entry for the POI in
the search_name table that includes the addr tags. We can only
do that with named places. For POIs without a name the housenumber
is used as name. If that is not available either, searching still
won't work.
place=postcode places are artificial places that collect addr:postcode
points for aggregation. They should neither show up in the address nor
be searchable. That means that there is no need to index them at all.
Only let boundary=postal_code through, which defines correct areas for
postcodes.
Boundaries should derive the address part type from the
linked place if possible. This was already implemented
for the address objects but not for the address information
of the boundary itself.
Fixes #1949.
It may happen that two different postcodes normalize to exactly
the same token. In that case we still need two different entries
in the word table. Token lookup will then make sure that the correct
one is chosen.
Fixes #1953.
Add a section on setting up the development environment which now
also includes the former chapter on recreating the documentation.
Move the README from test/ into the manual as the new Testing
chapter.
Squares are now addressable (on address level 25) and thus can
be attached to a house number via addr:place. This required increasing
the rank range for matching addr:place up to 25.
Always add contents of addr:* tags into address part of the search
table, even when there is no corresponding other name. This keeps
search tolerant to the kind of tagging where parts show up in the
address that have no corresponding object in the database or where
it is only an unaddressable object.
Before updating an admin boundary we need to make sure that any
artificially generated 'linked_place' entry is removed from the
extratags column. This ensures that the place designation does
not linger when a linked place disappears and that it is updated
when the linking changes.