add developers documentation for query-side of tokenizer

Sarah Hoffmann 2024-12-13 17:09:42 +01:00
parent fbb6edfdaf
commit 5b40aa579b


@ -91,14 +91,19 @@ for a custom tokenizer implementation.
### Directory Structure
Nominatim expects a single file `src/nominatim_db/tokenizer/<NAME>_tokenizer.py`
containing the Python part of the implementation.
Nominatim expects two files containing the Python part of the implementation:
* `src/nominatim_db/tokenizer/<NAME>_tokenizer.py` contains the tokenizer
code used during import and
* `src/nominatim_api/search/<NAME>_tokenizer.py` has the code used during
query time.
`<NAME>` is a unique name for the tokenizer consisting of only lower-case
letters, digits and underscore. A tokenizer also needs to install some SQL
functions. By convention, these should be placed in `lib-sql/tokenizer`.
If the tokenizer has a default configuration file, this should be saved in
the `settings/<NAME>_tokenizer.<SUFFIX>`.
`settings/<NAME>_tokenizer.<SUFFIX>`.
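
The naming convention above can be sketched as a small helper. This is only an
illustration: `tokenizer_paths` and `NAME_PATTERN` are hypothetical names that
are not part of Nominatim, and the pattern assumes that "lower-case letters"
means ASCII letters.

```python
import re
from pathlib import Path

# Assumption: only ASCII lower-case letters, digits and underscore are allowed.
NAME_PATTERN = re.compile(r'[a-z0-9_]+')


def tokenizer_paths(name: str) -> dict:
    """Return the conventional Python file locations for tokenizer `name`.

    Hypothetical helper for illustration only.
    """
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid tokenizer name: {name!r}")
    return {
        # Code used during import.
        'import': Path('src/nominatim_db/tokenizer') / f'{name}_tokenizer.py',
        # Code used during query time.
        'query': Path('src/nominatim_api/search') / f'{name}_tokenizer.py',
    }
```

Calling `tokenizer_paths('icu')` yields the two paths a tokenizer named `icu`
would occupy, while a name such as `Bad-Name` is rejected.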
### Configuration and Persistence
@ -110,9 +115,11 @@ are tied to a database installation and must only be read during installation
time. If they are needed for the runtime then they must be saved into the
`nominatim_properties` table and later loaded from there.
### The Python module
### The Python modules
The Python module is expect to export a single factory function:
#### `src/nominatim_db/tokenizer/`
The import Python module is expected to export a single factory function:
```python
def create(dsn: str, data_dir: Path) -> AbstractTokenizer
@ -123,6 +130,20 @@ is a directory in the project directory that the tokenizer may use to save
database-specific data. The function must return the instance of the tokenizer
class as defined below.
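
A minimal sketch of such an import-side module might look as follows. To keep
the example self-contained, `AbstractTokenizer` is a local stand-in for
`nominatim_db.tokenizer.base.AbstractTokenizer` with its abstract functions
omitted, and `ExampleTokenizer` is a hypothetical class name.

```python
from abc import ABC
from pathlib import Path


class AbstractTokenizer(ABC):
    """Stand-in for nominatim_db.tokenizer.base.AbstractTokenizer.

    The real base class declares abstract functions that are omitted here.
    """


class ExampleTokenizer(AbstractTokenizer):
    """Hypothetical tokenizer that only remembers its construction parameters."""

    def __init__(self, dsn: str, data_dir: Path) -> None:
        self.dsn = dsn            # connection string for the database
        self.data_dir = data_dir  # directory for database-specific data


def create(dsn: str, data_dir: Path) -> AbstractTokenizer:
    """Factory function exported by the module."""
    return ExampleTokenizer(dsn, data_dir)
```

A real module would import the base class from `nominatim_db` instead of
defining it and would implement every abstract function.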
#### `src/nominatim_api/search/`
The query-time Python module must also export a factory function:
``` python
def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer
```
The `conn` parameter contains the current search connection. See the
[library documentation](../library/Low-Level-DB-Access.md#searchconnection-class)
for details on the class. The function must return an instance of the query
analyzer class as defined below.
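
The query-side factory can be sketched in the same way. Both `SearchConnection`
and `AbstractQueryAnalyzer` are local stand-ins for the real Nominatim classes
so that the example runs on its own, `ExampleQueryAnalyzer` is a hypothetical
name, and the function follows the signature exactly as shown above.

```python
from abc import ABC


class SearchConnection:
    """Stand-in for the library's SearchConnection class."""


class AbstractQueryAnalyzer(ABC):
    """Stand-in for
    nominatim_api.search.query_analyzer_factory.AbstractQueryAnalyzer.

    The real base class declares abstract functions that are omitted here.
    """


class ExampleQueryAnalyzer(AbstractQueryAnalyzer):
    """Hypothetical analyzer that keeps a reference to the connection."""

    def __init__(self, conn: SearchConnection) -> None:
        self.conn = conn


def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer:
    """Factory function exported by the query-time module."""
    return ExampleQueryAnalyzer(conn)
```

Keeping the connection on the analyzer is a design assumption of this sketch;
it lets the analyzer look up data through the connection when it needs to.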
### Python Tokenizer Class
All tokenizers must inherit from `nominatim_db.tokenizer.base.AbstractTokenizer`
@ -138,6 +159,13 @@ and implement the abstract functions defined there.
options:
heading_level: 6
### Python Query Analyzer Class
::: nominatim_api.search.query_analyzer_factory.AbstractQueryAnalyzer
options:
heading_level: 6
### PL/pgSQL Functions
The tokenizer must provide access functions for the `token_info` column