Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-09-20 15:37:49 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	7cfcbacfc7	make token analyzers configurable modules Adds a mandatory section 'analyzer' to the token-analysis entries, which defines which analyzer to use. Currently there is exactly one, generic, which implements the former ICUNameProcessor.	2021-10-04 17:37:34 +02:00
Sarah Hoffmann	52847b61a3	extend ICU config to accommodate multiple analysers Adds parsing of multiple variant lists from the configuration. Every entry except one must have a unique 'id' parameter to distinguish the entries. The entry without id is considered the default. Currently only the list without an id is used for analysis.	2021-10-04 16:40:28 +02:00
Sarah Hoffmann	5a36559834	move flatten_config_list into config module For general usage by other modules.	2021-10-04 11:56:54 +02:00
Sarah Hoffmann	19d4e047f6	Merge pull request #2458 from lonvia/add-tokenizer-preprocessing Add a "sanitation" step for name and address tags before token processing	2021-10-01 21:53:34 +02:00
Sarah Hoffmann	6b348d43c6	replace test variable for PG env tests 'tty' was removed in PG14 and causes an error.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	732cd27d2e	add unit tests for new sanitizer functions	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	8171fe4571	introduce sanitizer step before token analysis Sanitizer functions allow transforming name and address tags before they are handed to the tokenizer. These transformations are visible only to the tokenizer and thus only influence the search terms and address match terms for a place. Currently two sanitizers are implemented, which are responsible for splitting names with multiple values and removing bracket additions. Both were previously hard-coded in the tokenizer.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	16daa57e47	unify ICUNameProcessorRules and ICURuleLoader There is no need for the additional layer of indirection that the ICUNameProcessorRules class adds. The ICURuleLoader can fill the database properties directly.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	5e5addcdbf	fix typo	2021-09-29 14:16:09 +02:00
Sarah Hoffmann	be65c8303f	export more data for the tokenizer name preparation Adds class, type, country and rank to the exported information and removes the rather odd hack for countries. Whether a place represents a country boundary can now be computed by the tokenizer.	2021-09-29 11:54:14 +02:00
Sarah Hoffmann	231250f2eb	add wrapper class for place data passed to tokenizer This is mostly for convenience and documentation purposes.	2021-09-29 11:54:07 +02:00
Sarah Hoffmann	d44a428b74	Merge pull request #2455 from lonvia/adjust-address-levels-slovakia Adjust address levels for boundaries in Slovakia	2021-09-28 11:21:08 +02:00
Sarah Hoffmann	40f9d52ad8	Merge pull request #2454 from lonvia/sort-out-token-assignment-in-sql ICU tokenizer: switch match method to using partial terms	2021-09-28 09:45:15 +02:00
Sarah Hoffmann	7f3b05c179	adjust address levels for boundaries in Slovakia Levels chosen according to OSM wiki. Mainly moves admin_level 6 to county level and admin_level 8 to city/town level. Higher levels are adjusted accordingly. Fixes #2453.	2021-09-27 23:32:11 +02:00
Sarah Hoffmann	09c9fad6c3	adapt tests to new ICU address token handling	2021-09-27 17:36:23 +02:00
Sarah Hoffmann	bb18479d5b	remove unused parameter	2021-09-27 14:58:43 +02:00
Sarah Hoffmann	779ea8ac62	Merge pull request #2452 from lonvia/update-houses-on-street-name-change Force update of surrounding houses when street or place name changes	2021-09-27 14:55:50 +02:00
Sarah Hoffmann	bd7c7ddad0	icu tokenizer: switch to matching against partial names When matching address parts from addr:* tags against place names, the address names were so far converted to full names and then compared to the place names. This can become problematic with the new ICU tokenizer once we introduce creation of different variants depending on the place name context. It wouldn't be clear which variant to produce to get a match, so we would have to create all of them. To work around this issue, switch to using the partial terms for matching. This introduces a larger fuzziness between matches but that shouldn't be a problem because matching is always geographically restricted. The search terms created for address parts have a different problem: they are already created before we even know if they are going to be used. This can lead to spurious entries in the word table, which slows down searching. This problem can also be circumvented by using only partial terms for the search terms. In terms of searching that means that the address terms would not get the full-word boost, but given that the case where an address part does not exist as an OSM object should be the exception, this is likely acceptable.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	c6fdcf9b0d	adapt documentation for SQL tokenizer interface	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	59fe74ddf6	move name matching into tokenizer module Instead of requesting the match tokens from the tokenizer when looking for parent streets/places and address parts, hand in the saved tokens and ask if they match. This gives the tokenizer more freedom to decide how name matching should be done.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	6d7c067461	force update on rank30 children when place name changes Name changes may have an effect on parenting. Don't update surrounding rank30 objects with addr:place tags as this is potentially too expensive.	2021-09-27 11:04:17 +02:00
Sarah Hoffmann	316205e455	force update of surrounding houses when street name changes When the street changes its name then this may cause changes in the parenting of rank-30 objects with an addr:street tag. Fixes #2242.	2021-09-27 10:22:41 +02:00
Sarah Hoffmann	d562f11298	slightly increase radius to look for postcodes	2021-09-24 23:56:42 +02:00
Sarah Hoffmann	972628c751	Merge pull request #2449 from lonvia/address-ranking-spain Adjust address ranks for Spain	2021-09-24 22:48:21 +02:00
Sarah Hoffmann	09b1db63f4	adjust address ranks for Spain Adjusts levels for boundaries according to the list on https://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative * no admin_level 5, so drop that from addresses * admin_level 6 has the province * admin_level 7 has the county when it exists Also reranks place=province so that it matches up with admin_level 6 and introduces place=civil_parish which is used as a place node for some admin_level=9 boundaries in Galicia.	2021-09-24 18:39:44 +02:00
Sarah Hoffmann	e9d54f752c	Merge pull request #2447 from lonvia/fix-dynamic-address-assignment Fix dynamic assignment of address parts	2021-09-19 15:57:28 +02:00
Sarah Hoffmann	c335025167	CI: install locale for CentOS	2021-09-19 13:49:11 +02:00
Sarah Hoffmann	2b2109c89a	Remove the installation warning Installation has become a lot easier.	2021-09-19 13:01:32 +02:00
Sarah Hoffmann	56124546a6	fix dynamic assignment of address parts A boolean check for dynamic changes of address parts is not sufficient. The order of choice should be: 1. an addr:* part matches the name 2. the address part surrounds the object 3. the address part was declared as isaddress The implementation uses a slightly different ordering to avoid geometry checks unless strictly necessary (isaddress is false and no matching address). See #2446.	2021-09-19 12:34:39 +02:00
Sarah Hoffmann	336258ecf8	Merge pull request #2440 from lonvia/generic-config-loader Add generic loader for YAML configuration files	2021-09-04 17:41:15 +02:00
Sarah Hoffmann	b894d2c04a	fix indent	2021-09-04 10:30:35 +02:00
Sarah Hoffmann	8e1d4818ac	use yaml config loader for country info	2021-09-04 00:22:55 +02:00
Sarah Hoffmann	28c98584c1	add tests for generic YAML config reader	2021-09-03 22:31:30 +02:00
Sarah Hoffmann	1c42780bb5	introduce generic YAML config loader Adds a function to the Configuration class to load a YAML file. This means that searching for the file is generalised and works the same now for all configuration files. Changes the search logic, so that it is always possible to have a custom version of the configuration file in the project directory. Move ICU tokenizer to use new load function.	2021-09-03 18:20:07 +02:00
Sarah Hoffmann	18554dfed7	Merge pull request #2437 from lonvia/tweak-ranking-searches Some more tweaks for search interpretation	2021-09-03 14:16:23 +02:00
Sarah Hoffmann	2e493fec46	Merge pull request #2436 from lonvia/country-configuration Move configuration of default languages into a configuration file	2021-09-03 08:55:36 +02:00
Sarah Hoffmann	98c2e08add	reduce penalty for special searches by name Additional penalty for special terms with operator None should only go to near searches. To reduce the number of produced searches, restrict the none operator to appear only in conjunction with the name.	2021-09-03 08:50:38 +02:00
Sarah Hoffmann	94d3dee369	further increase penalty on housenumbers without numbers Make the penalty dependent on the length of the token: no penalty for one-letter house numbers and an increasing one for more letters.	2021-09-02 18:11:49 +02:00
Sarah Hoffmann	7e7dd769fd	remove language and partition from name import	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	79da96b369	read partition and languages from config file	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	78fcabade8	move country name generation to country_info module	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	284645f505	move generation of country tables into own module	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	0b349761a8	add country configuration The new configuration saves the default language(s) originally maintained in the OSM wiki as well as the partition information.	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	d18794931a	Merge pull request #2435 from lonvia/simplified-to-traditional-chinese icu: normalise simplified to traditional chinese	2021-08-31 15:29:26 +02:00
Sarah Hoffmann	b7d4ff3201	icu: normalise simplified to traditional chinese The conversion is unambiguous in most cases, so that the information loss is minimal.	2021-08-31 11:18:34 +02:00
Sarah Hoffmann	4c6d674e03	Merge pull request #2434 from lonvia/vagrant-scripts-in-actions Test installation instructions via CI	2021-08-29 10:11:59 +02:00
Sarah Hoffmann	2c97af8021	CI: use packaged source also for test runs	2021-08-24 10:10:01 +02:00
Sarah Hoffmann	832f75a55e	CI: unify jobs for different vagrant scripts	2021-08-24 10:10:01 +02:00
Sarah Hoffmann	4e77969545	add workflow for centos 8	2021-08-24 10:10:01 +02:00
Sarah Hoffmann	6ebbbfee61	CI: use vagrant scripts for import tests Use vanilla docker images of Ubuntu and leave the setup to the vagrant scripts. Then do the usual import tests. Also fixes a couple of issues found with the scripts	2021-08-24 10:10:01 +02:00
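
Commit 8171fe4571 in the list above describes two sanitizers: one splits names that carry multiple values, the other removes bracket additions. The following is only a rough sketch of that idea, not the project's actual code. The function names, the choice of ';' and ',' as delimiters, and the rule that only a trailing bracket term is removed are all assumptions made for illustration.

```python
import re


def split_name_list(value, delimiters=";,"):
    """Hypothetical helper: split a tag value holding several names.

    The delimiter set is an assumption for this sketch.
    """
    parts = re.split("[" + re.escape(delimiters) + "]", value)
    # Drop surrounding whitespace and empty fragments.
    return [part.strip() for part in parts if part.strip()]


def strip_bracket_addition(name):
    """Hypothetical helper: remove one trailing '(...)' addition from a name."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", name).strip()


def sanitize_names(value):
    """Apply both steps in order and drop duplicates, keeping first occurrences."""
    result = []
    for name in split_name_list(value):
        cleaned = strip_bracket_addition(name)
        if cleaned and cleaned not in result:
            result.append(cleaned)
    return result


if __name__ == "__main__":
    print(sanitize_names("Main Street (north); High Street"))
```

Running the steps as separate functions mirrors the commit's stated motivation: the transformations happen before the tokenizer sees the names, so they can be changed or reordered without touching the tokenizer itself.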


3360 Commits