mirror of
https://github.com/osm-search/Nominatim.git
synced 2024-12-22 20:41:49 +03:00
add developers documentation for query-side of tokenizer
parent
fbb6edfdaf
commit
5b40aa579b
@@ -91,14 +91,19 @@ for a custom tokenizer implementation.
 
 ### Directory Structure
 
-Nominatim expects a single file `src/nominatim_db/tokenizer/<NAME>_tokenizer.py`
-containing the Python part of the implementation.
+Nominatim expects two files containing the Python part of the implementation:
+
+* `src/nominatim_db/tokenizer/<NAME>_tokenizer.py` contains the tokenizer
+  code used during import and
+* `src/nominatim_api/search/<NAME>_tokenizer.py` has the code used during
+  query time.
+
 `<NAME>` is a unique name for the tokenizer consisting of only lower-case
 letters, digits and underscore. A tokenizer also needs to install some SQL
 functions. By convention, these should be placed in `lib-sql/tokenizer`.
 
 If the tokenizer has a default configuration file, this should be saved in
-the `settings/<NAME>_tokenizer.<SUFFIX>`.
+`settings/<NAME>_tokenizer.<SUFFIX>`.
 
 ### Configuration and Persistence
 
@@ -110,9 +115,11 @@ are tied to a database installation and must only be read during installation
 time. If they are needed for the runtime then they must be saved into the
 `nominatim_properties` table and later loaded from there.
 
-### The Python module
+### The Python modules
 
-The Python module is expect to export a single factory function:
+#### `src/nominatim_db/tokenizer/`
+
+The import Python module is expected to export a single factory function:
 
 ```python
 def create(dsn: str, data_dir: Path) -> AbstractTokenizer
@@ -123,6 +130,20 @@ is a directory in the project directory that the tokenizer may use to save
 database-specific data. The function must return the instance of the tokenizer
 class as defined below.
 
+#### `src/nominatim_api/search/`
+
+The query-time Python module must also export a factory function:
+
+```python
+def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer
+```
+
+The `conn` parameter contains the current search connection. See the
+[library documentation](../library/Low-Level-DB-Access.md#searchconnection-class)
+for details on the class. The function must return the instance of the query
+analyzer class as defined below.
+
+
 ### Python Tokenizer Class
 
 All tokenizers must inherit from `nominatim_db.tokenizer.base.AbstractTokenizer`
@@ -138,6 +159,13 @@ and implement the abstract functions defined there.
     options:
         heading_level: 6
 
+
+### Python Query Analyzer Class
+
+::: nominatim_api.search.query_analyzer_factory.AbstractQueryAnalyzer
+    options:
+        heading_level: 6
+
 ### PL/pgSQL Functions
 
 The tokenizer must provide access functions for the `token_info` column
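
Reviewer note: the directory layout documented in this commit can be illustrated with a small sketch. The helper `tokenizer_module_paths` below is hypothetical and not part of Nominatim; it only assumes the two file locations and the naming rule for `<NAME>` (lower-case letters, digits and underscore) stated in the changed documentation.

```python
import re
from pathlib import PurePosixPath
from typing import Dict

# Naming rule from the documentation: only lower-case letters,
# digits and underscore are allowed in <NAME>.
_NAME_RE = re.compile(r"[a-z0-9_]+")


def tokenizer_module_paths(name: str) -> Dict[str, PurePosixPath]:
    """Hypothetical helper: return where the two Python modules of a
    tokenizer called `name` are expected to live in the source tree."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid tokenizer name: {name!r}")

    filename = f"{name}_tokenizer.py"
    return {
        # Code used during import.
        "import": PurePosixPath("src/nominatim_db/tokenizer") / filename,
        # Code used at query time.
        "query": PurePosixPath("src/nominatim_api/search") / filename,
    }
```

Calling `tokenizer_module_paths("demo")` yields one path per side of the tokenizer, which makes it easy to check that a new implementation ships both modules.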