Mirror of https://github.com/osm-search/Nominatim.git, synced 2024-12-26 06:22:13 +03:00

Commit 9864b191b1 (parent e7574f119e): fix various typos
@@ -57,9 +57,9 @@ the function.
         show_source: no
         heading_level: 6
 
-### The sanitation function
+### The main filter function of the sanitizer
 
-The sanitation function receives a single object of type `ProcessInfo`
+The filter function receives a single object of type `ProcessInfo`
 which has with three members:
 
 * `place`: read-only information about the place being processed.
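For context: a sanitizer module exposes a single `create()` factory that returns this filter function. Below is a minimal sketch, assuming the module paths that appear elsewhere in this diff (`nominatim.tokenizer.sanitizers.*`); it is an illustration, not code from the commit:

``` python
from typing import Callable

from nominatim.tokenizer.sanitizers.base import ProcessInfo
from nominatim.tokenizer.sanitizers.config import SanitizerConfig


def create(config: SanitizerConfig) -> Callable[[ProcessInfo], None]:
    """ Build and return the main filter function of this sanitizer. """
    def _filter(obj: ProcessInfo) -> None:
        # `obj.place` is read-only; `obj.names` and `obj.address`
        # may be modified in place or replaced wholesale.
        obj.names = [name for name in obj.names if name.name.strip()]

    return _filter
```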
@@ -74,6 +74,22 @@ While the `place` member is provided for information only, the `names` and
 remove entries, change information within a single entry (for example by
 adding extra attributes) or completely replace the list with a different one.
 
+#### PlaceInfo - information about the place
+
+::: nominatim.data.place_info.PlaceInfo
+    rendering:
+        show_source: no
+        heading_level: 6
+
+
+#### PlaceName - extended naming information
+
+::: nominatim.data.place_name.PlaceName
+    rendering:
+        show_source: no
+        heading_level: 6
+
+
 ### Example: Filter for US street prefixes
 
 The following sanitizer removes the directional prefixes from street names
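The example's actual source code sits between this hunk and the next and is elided from the diff. As a hedged reconstruction from the description in the following hunk (check `country_code`, check the address rank, strip the prefix with a regular expression), the filter could look roughly like this; the exact regular expression is an assumption:

``` python
import re
from typing import Callable

from nominatim.tokenizer.sanitizers.base import ProcessInfo
from nominatim.tokenizer.sanitizers.config import SanitizerConfig

PREFIX_RE = re.compile(r'^(north|south|east|west)\s+', re.IGNORECASE)


def create(config: SanitizerConfig) -> Callable[[ProcessInfo], None]:
    def _filter(obj: ProcessInfo) -> None:
        # Only US places with street-level address ranks are of interest.
        if obj.place.country_code == 'us' \
           and obj.place.rank_address in (26, 27):
            for name in obj.names:
                name.name = PREFIX_RE.sub('', name.name)

    return _filter
```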
@@ -102,49 +118,32 @@ the filter.
 The filter function first checks if the object is interesting for the
 sanitizer. Namely it checks if the place is in the US (through `country_code`)
 and it the place is a street (a `rank_address` of 26 or 27). If the
-conditions are met, then it goes through all available names and replaces
-any removes any leading direction prefix using a simple regular expression.
+conditions are met, then it goes through all available names and
+removes any leading directional prefix using a simple regular expression.
 
 Save the source code in a file in your project directory, for example as
 `us_streets.py`. Then you can use the sanitizer in your `icu_tokenizer.yaml`:
 
-```
+``` yaml
 ...
 sanitizers:
     - step: us_streets.py
 ...
 ```
 
-For more sanitizer examples, have a look at the sanitizers provided by Nominatim.
-They can be found in the directory `nominatim/tokenizer/sanitizers`.
-
 !!! warning
     This example is just a simplified show case on how to create a sanitizer.
     It is not really read for real-world use: while the sanitizer would
     correcly transform `West 5th Street` into `5th Street`. it would also
     shorten a simple `North Street` to `Street`.
 
-#### PlaceInfo - information about the place
+For more sanitizer examples, have a look at the sanitizers provided by Nominatim.
+They can be found in the directory
+[`nominatim/tokenizer/sanitizers`](https://github.com/osm-search/Nominatim/tree/master/nominatim/tokenizer/sanitizers).
 
-::: nominatim.data.place_info.PlaceInfo
-    rendering:
-        show_source: no
-        heading_level: 6
-
-
-#### PlaceName - extended naming information
-
-::: nominatim.data.place_name.PlaceName
-    rendering:
-        show_source: no
-        heading_level: 6
-
 ## Custom token analysis module
 
 Setup of a token analyser is split into two parts: configuration and
 analyser factory. A token analysis module must therefore implement two
 functions:
 
 ::: nominatim.tokenizer.token_analysis.base.AnalysisModule
     rendering:
         show_source: no
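For context: a token analysis module satisfies this protocol with two module-level functions. A bare skeleton, with placeholder bodies that are assumptions only:

``` python
from typing import Any, Mapping

from icu import Transliterator  # PyICU, as used by the ICU tokenizer


def configure(rules: Mapping[str, Any], normalizer: Transliterator,
              transliterator: Transliterator) -> Any:
    """ Part one: compile the module-specific rules into a config object. """
    return dict(rules)


def create(normalizer: Transliterator, transliterator: Transliterator,
           config: Any) -> Any:
    """ Part two: the analyser factory, called with configure()'s result. """
    raise NotImplementedError()
```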
@@ -20,7 +20,7 @@ class PlaceName:
         is the part of the key after the first colon.
 
         In addition to that, a name may have arbitrary additional attributes.
-        How attributes are used, depends on the sanatizers and token analysers.
+        How attributes are used, depends on the sanitizers and token analysers.
         The exception is is the 'analyzer' attribute. This attribute determines
         which token analysis module will be used to finalize the treatment of
         names.
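A short illustration of such an attribute; the constructor arguments, the `set_attr()` call and the value are assumptions based on the class described here:

``` python
from nominatim.data.place_name import PlaceName

name = PlaceName(name='Rue de la Paix', kind='name', suffix='fr')
# Route this name to a dedicated token analysis module.
name.set_attr('analyzer', 'fr')
```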
@@ -23,8 +23,8 @@ else:
 class SanitizerConfig(_BaseUserDict):
     """ The `SanitizerConfig` class is a read-only dictionary
         with configuration options for the sanitizer.
-        In addition to the usual dictionary function, the class provides
-        accessors to standard sanatizer options that are used by many of the
+        In addition to the usual dictionary functions, the class provides
+        accessors to standard sanitizer options that are used by many of the
         sanitizers.
     """
 
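For illustration, inside a sanitizer's `create()` factory the config then behaves like a plain read-only dictionary; the option name `country` below is invented:

``` python
from typing import Callable

from nominatim.tokenizer.sanitizers.base import ProcessInfo
from nominatim.tokenizer.sanitizers.config import SanitizerConfig


def create(config: SanitizerConfig) -> Callable[[ProcessInfo], None]:
    # Dictionary-style read access to the options given in the
    # tokenizer configuration for this sanitizer step.
    country = config.get('country', 'us')

    def _filter(obj: ProcessInfo) -> None:
        if obj.place.country_code == country:
            obj.address = [part for part in obj.address if part.name.strip()]

    return _filter
```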
@@ -81,15 +81,15 @@ class SanitizerConfig(_BaseUserDict):
 
     def get_delimiter(self, default: str = ',;') -> Pattern[str]:
         """ Return the 'delimiters' parameter in the configuration as a
-            compiled regular expression that can be used to split names on these
-            delimiters.
+            compiled regular expression that can be used to split strings on
+            these delimiters.
 
             Arguments:
-                default: Delimiters to be used, when 'delimiters' parameter
+                default: Delimiters to be used when 'delimiters' parameter
                     is not explicitly configured.
 
             Returns:
-                A regular expression pattern, which can be used to
+                A regular expression pattern which can be used to
                 split a string. The regular expression makes sure that the
                 resulting names are stripped and that repeated delimiters
                 are ignored. It may still create empty fields on occasion. The
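A usage sketch for `get_delimiter()`, relying only on the behaviour documented in this docstring; the sample input is invented:

``` python
from typing import List

from nominatim.tokenizer.sanitizers.config import SanitizerConfig


def split_names(config: SanitizerConfig, value: str) -> List[str]:
    split_re = config.get_delimiter()   # compiled from ',;' by default
    # The pattern already strips whitespace and collapses repeated
    # delimiters but may still yield the odd empty field; drop those.
    return [part for part in split_re.split(value) if part]


# split_names(config, 'Main St; Broadway,,Elm St')
# expected: ['Main St', 'Broadway', 'Elm St']
```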
@@ -44,15 +44,18 @@ class Analyzer(Protocol):
                 A list of possible spelling variants. All strings must have
                 been transformed with the global normalizer and
                 transliterator ICU rules. Otherwise they cannot be matched
-                against the query later.
+                against the input by the query frontend.
                 The list may be empty, when there are no useful
-                spelling variants. This may happen, when an analyzer only
-                produces extra variants to the canonical spelling.
+                spelling variants. This may happen when an analyzer only
+                usually outputs additional variants to the canonical spelling
+                and there are no such variants.
         """
 
 
 class AnalysisModule(Protocol):
-    """ Protocol for analysis modules.
+    """ The setup of the token analysis is split into two parts:
+        configuration and analyser factory. A token analysis module must
+        therefore implement the two functions here described.
     """
 
     def configure(self, rules: Mapping[str, Any],
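To make the variant contract concrete: every returned string must already have passed through the global transliteration rules, otherwise it can never match the query input. A sketch, where the method name `compute_variants` and the stored transliterator are assumptions modelled on the ICU analysers shipped with Nominatim:

``` python
from typing import List

from icu import Transliterator


class _VariantAnalyzer:
    def __init__(self, to_ascii: Transliterator) -> None:
        self.to_ascii = to_ascii   # global transliterator handed to create()

    def compute_variants(self, canonical_id: str) -> List[str]:
        variants = {canonical_id, canonical_id.replace('street', 'st')}
        # Transform every variant before returning it, so the query
        # frontend can match it against the transliterated input.
        return [self.to_ascii.transliterate(var) for var in variants]
```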
@@ -64,13 +67,14 @@ class AnalysisModule(Protocol):
             Arguments:
                 rules: A dictionary with the additional configuration options
                        as specified in the tokenizer configuration.
-                normalizer: an ICU Transliterator with the compiled normalization
-                    rules.
-                transliterator: an ICU transliterator with the compiled
-                    transliteration rules.
+                normalizer: an ICU Transliterator with the compiled
+                    global normalization rules.
+                transliterator: an ICU Transliterator with the compiled
+                    global transliteration rules.
 
             Returns:
-                A data object with the configuration that was set up. May be
+                A data object with configuration data. This will be handed
+                as is into the `create()` function and may be
                 used freely by the analysis module as needed.
         """
 
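As an assumed concrete instance of this contract, `configure()` can precompute whatever `create()` will later need and hand it over as an opaque object; the `replacements` option is hypothetical:

``` python
from typing import Any, List, Mapping, Tuple

from icu import Transliterator


def configure(rules: Mapping[str, Any], normalizer: Transliterator,
              transliterator: Transliterator) -> List[Tuple[str, str]]:
    # Normalize the configured replacement table once, at configuration
    # time, so that create() can use the result as is.
    pairs = rules.get('replacements', [])
    return [(normalizer.transliterate(src), dst) for src, dst in pairs]
```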
@@ -82,7 +86,7 @@ class AnalysisModule(Protocol):
             Arguments:
                 normalizer: an ICU Transliterator with the compiled normalization
                     rules.
-                transliterator: an ICU tranliterator with the compiled
+                transliterator: an ICU Transliterator with the compiled
                     transliteration rules.
                 config: The object that was returned by the call to configure().
 
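And the matching factory side, again only a sketch: `create()` receives the same two transliterators plus the object that `configure()` returned; the analyzer class is a stand-in:

``` python
from typing import Any

from icu import Transliterator


class _Analyzer:
    """ Stand-in for a real analyzer implementation. """
    def __init__(self, to_ascii: Transliterator, replacements: Any) -> None:
        self.to_ascii = to_ascii
        self.replacements = replacements


def create(normalizer: Transliterator, transliterator: Transliterator,
           config: Any) -> _Analyzer:
    # `config` is exactly the object that configure() returned.
    return _Analyzer(transliterator, config)
```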