Nominatim/nominatim/tokenizer
Sarah Hoffmann 8171fe4571 introduce sanitizer step before token analysis
Sanitizer functions allow transforming name and address tags before
they are handed to the tokenizer. These transformations are visible
only to the tokenizer and thus only influence the
search terms and address match terms for a place.

Currently two sanitizers are implemented, which are responsible for
splitting names with multiple values and removing bracket additions.
Both were previously hard-coded in the tokenizer.
2021-10-01 12:27:24 +02:00
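The two sanitizing steps named in the commit message can be illustrated with a minimal sketch. This is not the code from the `sanitizers` directory: the function names, the delimiter set `;,`, and the choice to drop only a trailing round-bracket part are all assumptions made for illustration.

```python
import re


def split_name_values(value, delimiters=";,"):
    """Split a name tag that holds several values into separate names.

    Hypothetical helper: the delimiter set is an assumption.
    """
    pattern = "[" + re.escape(delimiters) + "]"
    parts = (part.strip() for part in re.split(pattern, value))
    # Drop empty pieces produced by doubled or trailing delimiters.
    return [part for part in parts if part]


def strip_bracket_addition(name):
    """Remove a trailing addition in round brackets.

    Hypothetical helper: if stripping would leave nothing, the
    original name is returned unchanged.
    """
    stripped = re.sub(r"\s*\([^)]*\)\s*$", "", name)
    return stripped or name


def sanitize_names(names):
    """Run both steps over a list of raw name values before tokenizing."""
    result = []
    for raw in names:
        for part in split_name_values(raw):
            result.append(strip_bracket_addition(part))
    return result
```

Calling `sanitize_names` on a list of raw name values returns the cleaned list that a tokenizer would then receive. The original tags stay untouched, which matches the statement above that the transformations are visible only to the tokenizer.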
..
sanitizers introduce sanitizer step before token analysis 2021-10-01 12:27:24 +02:00
__init__.py introduce tokenizer modules 2021-04-30 11:29:57 +02:00
base.py unify ICUNameProcessorRules and ICURuleLoader 2021-10-01 12:27:24 +02:00
factory.py unify ICUNameProcessorRules and ICURuleLoader 2021-10-01 12:27:24 +02:00
icu_name_processor.py unify ICUNameProcessorRules and ICURuleLoader 2021-10-01 12:27:24 +02:00
icu_rule_loader.py introduce sanitizer step before token analysis 2021-10-01 12:27:24 +02:00
icu_tokenizer.py introduce sanitizer step before token analysis 2021-10-01 12:27:24 +02:00
icu_variants.py unify ICUNameProcessorRules and ICURuleLoader 2021-10-01 12:27:24 +02:00
legacy_tokenizer.py unify ICUNameProcessorRules and ICURuleLoader 2021-10-01 12:27:24 +02:00
place_sanitizer.py introduce sanitizer step before token analysis 2021-10-01 12:27:24 +02:00