Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-22 21:28:10 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	a3f8a097a1	docs: move import style description to customize section	2021-10-18 09:04:06 +02:00
Sarah Hoffmann	751563644f	docs: make customization chapter a separate section	2021-10-18 09:04:01 +02:00
Sarah Hoffmann	e52b801cd0	fix typo	2021-10-18 09:03:07 +02:00
Sarah Hoffmann	445a6428a6	docs: remove the development warning for ICU tokenizer	2021-10-18 09:03:07 +02:00
Sarah Hoffmann	d59b26dad7	docs: add a warning about using --no-updates with TIGER data	2021-10-18 09:03:07 +02:00
Sarah Hoffmann	47417d1871	update and extend man page Provide extended descriptions for most subcommands.	2021-10-18 09:03:07 +02:00
Sarah Hoffmann	381aecb952	rename manual directory to man Avoids confusion between 'docs' and 'manual'.	2021-10-18 09:03:07 +02:00
Sarah Hoffmann	45344575c6	add munin scripts and ICU subrules to installation	2021-10-18 09:03:07 +02:00
Sarah Hoffmann	83381625bd	Merge pull request #2469 from lonvia/fix-tablespace-assignment Fix template expressions for tablespaces	2021-10-15 18:20:43 +02:00
Sarah Hoffmann	552fb16cb2	fix template expressions for tablespaces	2021-10-15 15:11:09 +02:00
Sarah Hoffmann	75c631f080	Merge pull request #2450 from mtmail/tiger-data-2021 US TIGER data 2021 released	2021-10-11 19:22:15 +02:00
Sarah Hoffmann	e2464fdf62	Merge pull request #2465 from lonvia/use-spgist-index Use SP-GIST for building index	2021-10-11 10:48:44 +02:00
Sarah Hoffmann	9ff98073db	remove outdated country_languages.php	2021-10-10 21:58:43 +02:00
Sarah Hoffmann	98ee5def37	add recommendation for Postgis 3+	2021-10-10 21:55:38 +02:00
Sarah Hoffmann	3649487f5e	use SP-GIST index for building index where available Point-in-polygon queries are much faster with a SP-GIST geometry index, so use that for the index used to check if a housenumber is inside a building. Only available with Postgis 3. There is an automatic fallback to GIST for Postgis 2.	2021-10-10 21:55:38 +02:00
Sarah Hoffmann	4b007ae740	Merge pull request #2460 from lonvia/multiple-analyzers Add support for multiple token analyzers	2021-10-09 14:41:09 +02:00
Sarah Hoffmann	6c79a60e19	add documentation for new configuration of ICU tokenizer	2021-10-07 11:55:53 +02:00
Sarah Hoffmann	2a94bfc703	fix argument description for check_database	2021-10-07 09:49:13 +02:00
Sarah Hoffmann	299934fd2a	reorganize and complete tests around generic token analysis	2021-10-06 17:03:37 +02:00
Sarah Hoffmann	b18d042832	add tests for sanitizer tagging language	2021-10-06 12:29:25 +02:00
Sarah Hoffmann	97a10ec218	apply variants by languages Adds a tagger for names by language so that the analyzer of that language is used. Thus variants are now only applied to names in the specific language and only to name tags, no longer to reference-like tags.	2021-10-06 11:09:54 +02:00
Sarah Hoffmann	d35400a7d7	use analyser provided in the 'analyzer' property Implements per-name choice of analyzer. If a non-default analyzer is chosen, then the 'word' identifier is extended with the name of the analyzer, so that we still have unique items.	2021-10-05 14:10:32 +02:00
Sarah Hoffmann	92f6ec2328	remove support for properties on variants Those are not going to be used in the near future, so no need to carry that code around just now.	2021-10-05 10:29:36 +02:00
Sarah Hoffmann	9ba2019470	precompute replacements while loading configuration	2021-10-05 10:20:08 +02:00
Sarah Hoffmann	c171d88194	move parsing of token analysis config to analyzer Adds a second callback for the analyzer which is responsible for parsing the configuration rules and converting it to whatever format necessary. This way, each analyzer implementation can define its own configuration rules.	2021-10-04 18:31:58 +02:00
Sarah Hoffmann	7cfcbacfc7	make token analyzers configurable modules Adds a mandatory section 'analyzer' to the token-analysis entries which defines which analyser to use. Currently there is exactly one, generic, which implements the former ICUNameProcessor.	2021-10-04 17:37:34 +02:00
Sarah Hoffmann	52847b61a3	extend ICU config to accommodate multiple analysers Adds parsing of multiple variant lists from the configuration. Every entry except one must have a unique 'id' parameter to distinguish the entries. The entry without id is considered the default. Currently only the list without an id is used for analysis.	2021-10-04 16:40:28 +02:00
Sarah Hoffmann	5a36559834	move flatten_config_list into config module For general usage by other modules.	2021-10-04 11:56:54 +02:00
Sarah Hoffmann	19d4e047f6	Merge pull request #2458 from lonvia/add-tokenizer-preprocessing Add a "sanitation" step for name and address tags before token processing	2021-10-01 21:53:34 +02:00
Sarah Hoffmann	6b348d43c6	replace test variable for PG env tests 'tty' was removed in PG14 and causes an error.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	732cd27d2e	add unit tests for new sanitizer functions	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	8171fe4571	introduce sanitizer step before token analysis Sanitizer functions allow transforming name and address tags before they are handed to the tokenizer. These transformations are visible only to the tokenizer and thus only have an influence on the search terms and address match terms for a place. Currently two sanitizers are implemented which are responsible for splitting names with multiple values and removing bracket additions. Both were previously hard-coded in the tokenizer.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	16daa57e47	unify ICUNameProcessorRules and ICURuleLoader There is no need for the additional layer of indirection that the ICUNameProcessorRules class adds. The ICURuleLoader can fill the database properties directly.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	5e5addcdbf	fix typo	2021-09-29 14:16:09 +02:00
Sarah Hoffmann	be65c8303f	export more data for the tokenizer name preparation Adds class, type, country and rank to the exported information and removes the rather odd hack for countries. Whether a place represents a country boundary can now be computed by the tokenizer.	2021-09-29 11:54:14 +02:00
Sarah Hoffmann	231250f2eb	add wrapper class for place data passed to tokenizer This is mostly for convenience and documentation purposes.	2021-09-29 11:54:07 +02:00
Sarah Hoffmann	d44a428b74	Merge pull request #2455 from lonvia/adjust-address-levels-slovakia Adjust address levels for boundaries in Slovakia	2021-09-28 11:21:08 +02:00
Sarah Hoffmann	40f9d52ad8	Merge pull request #2454 from lonvia/sort-out-token-assignment-in-sql ICU tokenizer: switch match method to using partial terms	2021-09-28 09:45:15 +02:00
Sarah Hoffmann	7f3b05c179	adjust address levels for boundaries in Slovakia Levels chosen according to OSM wiki. Mainly moves admin_level 6 to county level and admin_level 8 to city/town level. Higher levels are adjusted accordingly. Fixes #2453.	2021-09-27 23:32:11 +02:00
Sarah Hoffmann	09c9fad6c3	adapt tests to new ICU address token handling	2021-09-27 17:36:23 +02:00
Sarah Hoffmann	bb18479d5b	remove unused parameter	2021-09-27 14:58:43 +02:00
Sarah Hoffmann	779ea8ac62	Merge pull request #2452 from lonvia/update-houses-on-street-name-change Force update of surrounding houses when street or place name changes	2021-09-27 14:55:50 +02:00
Sarah Hoffmann	bd7c7ddad0	icu tokenizer: switch to matching against partial names When matching address parts from addr:* tags against place names, the address names were so far converted to full names and then compared to the place names. This can become problematic with the new ICU tokenizer once we introduce creation of different variants depending on the place name context. It wouldn't be clear which variant to produce to get a match, so we would have to create all of them. To work around this issue, switch to using the partial terms for matching. This introduces a larger fuzziness between matches but that shouldn't be a problem because matching is always geographically restricted. The search terms created for address parts have a different problem: they are already created before we even know if they are going to be used. This can lead to spurious entries in the word table, which slows down searching. This problem can also be circumvented by using only partial terms for the search terms. In terms of searching that means that the address terms would not get the full-word boost, but given that the case where an address part does not exist as an OSM object should be the exception, this is likely acceptable.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	c6fdcf9b0d	adapt documentation for SQL tokenizer interface	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	59fe74ddf6	move name matching into tokenizer module Instead of requesting the match tokens from the tokenizer when looking for parent streets/places and address parts, hand in the saved tokens and ask if they match. This gives the tokenizer more freedom to decide how name matching should be done.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	6d7c067461	force update on rank30 children when place name changes Name changes may have an effect on parenting. Don't update surrounding rank30 objects with addr:place tags as this is potentially too expensive.	2021-09-27 11:04:17 +02:00
Sarah Hoffmann	316205e455	force update of surrounding houses when street name changes When the street changes its name then this may cause changes in the parenting of rank-30 objects with an addr:street tag. Fixes #2242.	2021-09-27 10:22:41 +02:00
marc tobias	834ae0a93f	US TIGER data 2021 released	2021-09-25 00:05:17 +02:00
Sarah Hoffmann	d562f11298	slightly increase radius to look for postcodes	2021-09-24 23:56:42 +02:00
Sarah Hoffmann	972628c751	Merge pull request #2449 from lonvia/address-ranking-spain Adjust address ranks for Spain	2021-09-24 22:48:21 +02:00

3386 Commits