Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-27 10:43:02 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	d3372e69ec	update to modern mkdocstrings python handler	2023-08-25 21:40:20 +02:00
Sarah Hoffmann	9864b191b1	fix various typos	2022-07-31 17:10:35 +02:00
Sarah Hoffmann	51b6d16dc6	overhaul the token analysis interface The functional split between the two functions is now that the first one creates the ID that is used in the word table and the second one creates the variants. There no longer is a requirement that the ID is the normalized version. We might later reintroduce the requirement that a normalized version be available but it doesn't necessarily need to be through the ID. The function that creates the ID now gets the full PlaceName. That way it might take into account attributes that were set by the sanitizers. Finally rename both functions to something more sane.	2022-07-29 15:14:11 +02:00
Sarah Hoffmann	094100bbf6	harmonize spelling Stick with the American spelling of Analyze.	2022-07-29 10:52:01 +02:00
Sarah Hoffmann	c8873d34af	harmonize interface of token analysis module The configure() function now receives a Transliterator object instead of the ICU rules. This harmonizes the parameters with the create function.	2022-07-29 10:43:07 +02:00
Sarah Hoffmann	f0d640961a	add documentation for custom token analysis	2022-07-29 09:41:28 +02:00
Kian-Meng Ang	f5e52e748f	docs: fix typos	2022-07-20 22:05:31 +08:00
Sarah Hoffmann	83054af46f	remove typing_extensions requirement The typing_extensions package is only necessary now when running mypy. It won't be used at runtime anymore.	2022-07-18 09:55:58 +02:00
Sarah Hoffmann	d35e3c25b6	add type annotations for token analysis No annotations for ICU types yet.	2022-07-18 09:47:57 +02:00
Sarah Hoffmann	612d34930b	handle postcodes properly on word table updates update_postcodes_from_db() needs to do the full postcode treatment in order to derive the correct word table entries.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	ca7b46511d	introduce and use analyzer for postcodes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	bb2bd76f91	pylint: avoid explicit use of format() function Use psycopg2 SQL formatters for SQL and formatted string literals everywhere else.	2022-05-11 09:48:56 +02:00
Sarah Hoffmann	13ed184efd	housenumber analyzer: avoid creating too many variants Housenumber fields with lots of text are likely bad data. So is data with many changes from letter to digit. Exclude them from adding optional spaces.	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	f03a05f6bb	add new analyser for housenumbers This analyser makes spaces optional.	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	837d44391c	move generation of normalized token form to analyzer This gives the analyzer more flexibility in choosing the normalized form. In particular, an analyzer creating different variants can choose the variant that will be used as the canonical form.	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	3df560ea38	fix linting error	2022-01-18 11:09:21 +01:00
Sarah Hoffmann	adbaf700cd	move parsing of mutation config to setup phase	2022-01-18 11:09:21 +01:00
Sarah Hoffmann	b453b0ea95	introduce mutation variants to generic token analyser Mutations are regular-expression-based replacements that are applied after variants have been computed. They are meant to be used for variations on character level. Add spelling variations for German umlauts.	2022-01-18 11:09:21 +01:00
Sarah Hoffmann	0192a7af96	move variant configuration reading in separate file	2022-01-18 11:09:21 +01:00
Sarah Hoffmann	630ad38a67	refactor variant production to use generators	2022-01-18 11:09:21 +01:00
Sarah Hoffmann	c3788d765e	add consistent SPDX copyright headers	2022-01-03 16:23:58 +01:00
Sarah Hoffmann	299934fd2a	reorganize and complete tests around generic token analysis	2021-10-06 17:03:37 +02:00
Sarah Hoffmann	97a10ec218	apply variants by languages Adds a tagger for names by language so that the analyzer of that language is used. Thus variants are now only applied to names in the specific language and only tag name tags, no longer to reference-like tags.	2021-10-06 11:09:54 +02:00
Sarah Hoffmann	d35400a7d7	use analyser provided in the 'analyzer' property Implements per-name choice of analyzer. If a non-default analyzer is chosen, then the 'word' identifier is extended with the name of the analyzer, so that we still have unique items.	2021-10-05 14:10:32 +02:00
Sarah Hoffmann	92f6ec2328	remove support for properties on variants Those are not going to be used in the near future, so no need to carry that code around just now.	2021-10-05 10:29:36 +02:00
Sarah Hoffmann	9ba2019470	precompute replacements while loading configuration	2021-10-05 10:20:08 +02:00
Sarah Hoffmann	c171d88194	move parsing of token analysis config to analyzer Adds a second callback for the analyzer which is responsible for parsing the configuration rules and converting it to whatever format necessary. This way, each analyzer implementation can define its own configuration rules.	2021-10-04 18:31:58 +02:00
Sarah Hoffmann	7cfcbacfc7	make token analyzers configurable modules Adds a mandatory section 'analyzer' to the token-analysis entries which defines which analyser to use. Currently there is exactly one, generic, which implements the former ICUNameProcessor.	2021-10-04 17:37:34 +02:00

28 Commits