Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-30 02:07:52 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	6d7c067461	force update on rank30 children when place name changes Name changes may have an effect on parenting. Don't update surrounding rank30 objects with addr:place tags as this is potentially too expensive.	2021-09-27 11:04:17 +02:00
Sarah Hoffmann	316205e455	force update of surrounding houses when street name changes When the street changes its name then this may cause changes in the parenting of rank-30 objects with an addr:street tag. Fixes #2242.	2021-09-27 10:22:41 +02:00
Sarah Hoffmann	56124546a6	fix dynamic assignment of address parts A boolean check for dynamic changes of address parts is not sufficient. The order of choice should be: (1) an addr:* part matches the name, (2) the address part surrounds the object, (3) the address part was declared as isaddress. The implementation uses a slightly different ordering to avoid geometry checks unless strictly necessary (isaddress is false and no matching address). See #2446.	2021-09-19 12:34:39 +02:00
Sarah Hoffmann	8e1d4818ac	use yaml config loader for country info	2021-09-04 00:22:55 +02:00
Sarah Hoffmann	28c98584c1	add tests for generic YAML config reader	2021-09-03 22:31:30 +02:00
Sarah Hoffmann	1c42780bb5	introduce generic YAML config loader Adds a function to the Configuration class to load a YAML file. This means that searching for the file is generalised and works the same now for all configuration files. Changes the search logic, so that it is always possible to have a custom version of the configuration file in the project directory. Move ICU tokenizer to use new load function.	2021-09-03 18:20:07 +02:00
Sarah Hoffmann	79da96b369	read partition and languages from config file	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	78fcabade8	move country name generation to country_info module	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	284645f505	move generation of country tables in own module	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	28ee3d0949	move linking of places to the preparation stage Linked places may bring in extra names. These names need to be processed by the tokenizer. That means that the linking needs to be done before the data is handed to the tokenizer. Move finding the linked place into the preparation stage and update the name fields. Everything else is still done in the indexing stage.	2021-08-20 22:44:17 +02:00
Sarah Hoffmann	118858a55e	rename legacy_icu tokenizer to icu tokenizer The new icu tokenizer is now no longer compatible with the old legacy tokenizer in terms of data structures. Therefore there is also no longer a need to refer to the legacy tokenizer in the name.	2021-08-17 23:11:47 +02:00
Sarah Hoffmann	5f2b9e317a	add tests for US state hacks IL, AS and LA are replaced with the US state in Geocode because the old tokenizer would simply remove the abbreviations otherwise.	2021-08-17 10:49:07 +02:00
Sarah Hoffmann	1147b83b22	php: make word list a first-class object This separates the logic of creating word sets from the Phrase class. A tokenizer may now derive the word sets any way they like. The SimpleWordList class provides a standard implementation for splitting phrases on spaces.	2021-08-16 11:51:49 +02:00
Sarah Hoffmann	87dedde5d6	allow multiple files for the import command The files are forwarded to osm2pgsql which is now able to merge them correctly.	2021-08-14 21:42:21 +02:00
Sarah Hoffmann	1db098c05d	reinstate word column in icu word table PostgreSQL is very bad at creating statistics for jsonb columns. The result is that the query planner tends to use JIT for queries with a where over 'info' even when there is an index.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	324b1b5575	bdd tests: do not query word table directly The BDD tests cannot make assumptions about the structure of the word table anymore because it depends on the tokenizer. Use more abstract descriptions instead that ask for specific kinds of tokens.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	e42878eeda	adapt unit test for new word table Requires a second wrapper class for the word table with the new layout. This class is interface-compatible, so that later when the ICU tokenizer becomes the default, all tests that depend on behaviour of the default tokenizer can be switched to the other wrapper.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	eb6814d74e	convert word info column to json before copying	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	0c023fb4d2	adapt cli tests to Python port for add-data	2021-07-26 10:41:37 +02:00
Sarah Hoffmann	878835e4bd	move add-data subcommand into a separate file	2021-07-25 18:14:12 +02:00
Sarah Hoffmann	62d5984b1b	limit the number of variants that can be produced	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	e85f7e7aa9	fix subsequent replacements Two replacement words directly following each other did not work as expected because each expects a space at the beginning/end while there was only one space available. Also forbid composing a word after a space was added in the end by a previous replacement.	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	b9fbfeff67	only consider partials in multi-words for initial count This ensures that it is less likely that we exclude meaningful words like 'hauptstrasse' just because they are frequent.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	62828fc5c1	switch to a more flexible variant description format The new format combines compound splitting and abbreviation. It also allows restricting rules by additional conditions (like language or region). This latter ability is not used yet.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a6aa6360e0	use yaml tag syntax to mark include files	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	0d80a9b897	tests for composing decomposed suffixes	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	f70930b1a0	make compound decomposition pure import feature Compound decomposition now creates a full name variant on import just like abbreviations. This simplifies query time normalization and opens a path for changing abbreviation and compound decomposition lists for an existing database.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	9ff4f66f55	complete tests for icu tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e81084f35	complete tests for rule loader	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a0a7b05c9f	correctly quote strings when copying in data Encapsulate the copy string in a class that ensures that copy lines are written with correct quoting.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2f6e4edcdb	update unit tests for adapted abbreviation code	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e3c5d4c5b	adapt tests for ICU tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	8413075249	move abbreviation computation into import phase This adds precomputation of abbreviated terms for names and removes abbreviation of terms in the query. Basic import works but still needs some thorough testing as well as speed improvements during import. New dependency for python library datrie.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	e7b4fc70e7	make sure old data gets deleted on place type change When changing from some other place type to place=postcode make sure that the old place type entry in the place table is deleted.	2021-06-18 10:58:41 +02:00
Sarah Hoffmann	457982e1d2	update postcode in place if it already exists	2021-06-18 00:28:52 +02:00
Sarah Hoffmann	aa558e6080	Merge pull request #2369 from lonvia/exclude-poi-from-housenumber-search Do not return POIs when dropping house number in query	2021-06-17 15:30:05 +02:00
Sarah Hoffmann	fe11d3cbbd	do not return POIs when dropping house number in query We've previously added searching through rank 30 in a house number search to enable searches for house number+name. This had the unintended side effect that rank 30 objects are also returned in a search that dropped the house number from the query. This is wrong because POIs cannot function as a parent to a house number. This fix drops all rank 30 objects from the results for a house number search if they do not match the requested house number.	2021-06-17 14:21:20 +02:00
AntoJvlt	3676310efe	Improved performance of the postcodes query and some code cleaning	2021-06-12 15:46:08 +02:00
AntoJvlt	1c175e3a67	Clean and update tests for postcodes	2021-06-09 09:31:32 +02:00
AntoJvlt	e879814e43	Update tests for postcodes	2021-06-09 09:31:32 +02:00
Sarah Hoffmann	3aac51c81f	switch BDD tests to always use search API	2021-06-06 15:27:52 +02:00
Sarah Hoffmann	bc981d0261	fix insertion of special terms and countries into word table Special terms need to be prefixed by a space because they are full terms. For countries avoid duplicate entries of word tokens. Adds tests for adding country terms.	2021-06-02 20:22:39 +02:00
Sarah Hoffmann	24c986c842	add tests for new full name computation with ICU	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	4f4d15c28a	reorganize keyword creation for legacy tokenizer: only save partial words without internal spaces; consider comma and semicolon a separator of full words; consider parts before an opening bracket a full word (but not the part after the bracket). Fixes #244.	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	10143e0ac7	Merge pull request #2342 from lonvia/icu-tokenizer-ci Add BDD tests with icu tokenizer to CI runs	2021-05-22 10:36:35 +02:00
Sarah Hoffmann	00094c43d1	enable Tiger BDD API test for legacy_icu	2021-05-21 22:39:56 +02:00
Sarah Hoffmann	430c316e45	test: fix linting errors	2021-05-19 23:07:39 +02:00
Sarah Hoffmann	01f5a9ff84	test: more use of table_factory	2021-05-19 17:37:03 +02:00
Sarah Hoffmann	af52eed0dd	test: avoid use of tempfile module Use the tmp_path fixture instead which provides automatic cleanup.	2021-05-19 16:43:26 +02:00
Sarah Hoffmann	f93d0fa957	test: use src_dir fixture instead of self-computed paths	2021-05-19 16:03:54 +02:00

505 Commits