remove documentation around legacy tokenizer

This commit is contained in:
Sarah Hoffmann 2024-09-21 18:27:01 +02:00
parent b54ff7d766
commit 4825a0bda3
7 changed files with 11 additions and 212 deletions

View File

@ -61,8 +61,7 @@ pylint3 --extension-pkg-whitelist=osmium nominatim
Before submitting a pull request make sure that the tests pass:
```
cd build
make test
make tests
```
## Releases

View File

@ -131,76 +131,13 @@ script ([Geofabrik](https://download.geofabrik.de)) provides daily updates.
## Using an external PostgreSQL database
You can install Nominatim using a database that runs on a different server when
you have physical access to the file system on the other server. Nominatim
uses a custom normalization library that needs to be made accessible to the
PostgreSQL server. This section explains how to set up the normalization
library.
!!! note
The external module is only needed when using the legacy tokenizer.
If you have chosen the ICU tokenizer, then you can ignore this section
and follow the standard import documentation.
### Option 1: Compiling the library on the database server
The most sure way to get a working library is to compile it on the database
server. From the prerequisites you need at least cmake, gcc and the
PostgreSQL server package.
Clone or unpack the Nominatim source code, enter the source directory and
create and enter a build directory.
```sh
cd Nominatim
mkdir build
cd build
```
Now configure cmake to only build the PostgreSQL module and build it:
```
cmake -DBUILD_IMPORTER=off -DBUILD_API=off -DBUILD_TESTS=off -DBUILD_DOCS=off -DBUILD_OSM2PGSQL=off ..
make
```
When done, you find the normalization library in `build/module/nominatim.so`.
Copy it to a place where it is readable and executable by the PostgreSQL server
process.
### Option 2: Compiling the library on the import machine
You can also compile the normalization library on the machine from where you
run the import.
!!! important
You can only do this when the database server and the import machine have
the same architecture and run the same version of Linux. Otherwise there is
no guarantee that the compiled library is compatible with the PostgreSQL
server running on the database server.
Make sure that the PostgreSQL server package is installed on the machine
**with the same version as on the database server**. You do not need to install
the PostgreSQL server itself.
Download and compile Nominatim as per standard instructions. Once done, you find
the normalization library in `build/module/nominatim.so`. Copy the file to
the database server at a location where it is readable and executable by the
PostgreSQL server process.
### Running the import
On the client side you now need to configure the import to point to the
correct location of the library **on the database server**. Add the following
line to your your `.env` file:
```
NOMINATIM_DATABASE_MODULE_PATH="<directory on the database server where nominatim.so resides>"
```
Now change the `NOMINATIM_DATABASE_DSN` to point to your remote server and continue
to follow the [standard instructions for importing](Import.md).
You can install Nominatim using a database that runs on a different server.
Simply point the configuration variable `NOMINATIM_DATABASE_DSN` to the
server and follow the standard import documentation.
The import will be faster, if the import is run directly from the database
machine. You can easily switch to a different machine for the query frontend
after the import.
## Moving the database to another machine
@ -225,20 +162,9 @@ target machine.
data updates but the resulting database is only about a third of the size
of a full database.
Next install Nominatim on the target machine by following the standard installation
instructions. Again, make sure to use the same version as the source machine.
Next install nominatim-api on the target machine by following the standard
installation instructions. Again, make sure to use the same version as the
source machine.
Create a project directory on your destination machine and set up the `.env`
file to match the configuration on the source machine. Finally run
nominatim refresh --website
to make sure that the local installation of Nominatim will be used.
If you are using the legacy tokenizer you might also have to switch to the
PostgreSQL module that was compiled on your target machine. If you get errors
that PostgreSQL cannot find or access `nominatim.so` then rerun
nominatim refresh --functions
on the target machine to update the the location of the module.
file to match the configuration on the source machine. That's all.

View File

@ -178,18 +178,6 @@ make
sudo make install
```
!!! warning
The default installation no longer compiles the PostgreSQL module that
is needed for the legacy tokenizer from older Nominatim versions. If you
are upgrading an older database or want to run the
[legacy tokenizer](../customize/Tokenizers.md#legacy-tokenizer) for
some other reason, you need to enable the PostgreSQL module via
cmake: `cmake -DBUILD_MODULE=on ../Nominatim`. To compile the module
you need to have the server development headers for PostgreSQL installed.
On Ubuntu/Debian run: `sudo apt install postgresql-server-dev-<postgresql version>`
The legacy tokenizer is deprecated and will be removed in Nominatim 5.0
Nominatim installs itself into `/usr/local` per default. To choose a different
installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
cmake command. Make sure that the `bin` directory is available in your path

View File

@ -64,26 +64,6 @@ Nominatim grants minimal rights to this user to all tables that are needed
for running geocoding queries.
#### NOMINATIM_DATABASE_MODULE_PATH
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Directory where to find the PostgreSQL server module |
| **Format:** | path |
| **Default:** | _empty_ (use `<project_directory>/module`) |
| **After Changes:** | run `nominatim refresh --functions` |
| **Comment:** | Legacy tokenizer only |
Defines the directory in which the PostgreSQL server module `nominatim.so`
is stored. The directory and module must be accessible by the PostgreSQL
server.
For information on how to use this setting when working with external databases,
see [Advanced Installations](../admin/Advanced-Installations.md).
The option is only used by the Legacy tokenizer and ignored otherwise.
#### NOMINATIM_TOKENIZER
| Summary | |
@ -114,20 +94,6 @@ on the file format.
If a relative path is given, then the file is searched first relative to the
project directory and then in the global settings directory.
#### NOMINATIM_MAX_WORD_FREQUENCY
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Number of occurrences before a word is considered frequent |
| **Format:** | int |
| **Default:** | 50000 |
| **After Changes:** | cannot be changed after import |
| **Comment:** | Legacy tokenizer only |
The word frequency count is used by the Legacy tokenizer to automatically
identify _stop words_. Any partial term that occurs more often then what
is defined in this setting, is effectively ignored during search.
#### NOMINATIM_LIMIT_REINDEXING
@ -162,25 +128,6 @@ codes, to restrict import to a subset of languages.
Currently only affects the initial import of country names and special phrases.
#### NOMINATIM_TERM_NORMALIZATION
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Rules for normalizing terms for comparisons |
| **Format:** | string: semicolon-separated list of ICU rules |
| **Default:** | :: NFD (); [[:Nonspacing Mark:] [:Cf:]] >; :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC (); |
| **Comment:** | Legacy tokenizer only |
[Special phrases](Special-Phrases.md) have stricter matching requirements than
normal search terms. They must appear exactly in the query after this term
normalization has been applied.
Only has an effect on the Legacy tokenizer. For the ICU tokenizer the rules
defined in the
[normalization section](Tokenizers.md#normalization-and-transliteration)
will be used.
#### NOMINATIM_USE_US_TIGER_DATA
| Summary | |

View File

@ -15,53 +15,6 @@ they can be configured.
chosen tokenizer is very limited as well. See the comments in each tokenizer
section.
## Legacy tokenizer
!!! danger
The Legacy tokenizer is deprecated and will be removed in Nominatim 5.0.
If you still use a database with the legacy tokenizer, you must reimport
it using the ICU tokenizer below.
The legacy tokenizer implements the analysis algorithms of older Nominatim
versions. It uses a special Postgresql module to normalize names and queries.
This tokenizer is automatically installed and used when upgrading an older
database. It should not be used for new installations anymore.
### Compiling the PostgreSQL module
The tokeinzer needs a special C module for PostgreSQL which is not compiled
by default. If you need the legacy tokenizer, compile Nominatim as follows:
```
mkdir build
cd build
cmake -DBUILD_MODULE=on
make
```
### Enabling the tokenizer
To enable the tokenizer add the following line to your project configuration:
```
NOMINATIM_TOKENIZER=legacy
```
The Postgresql module for the tokenizer is available in the `module` directory
and also installed with the remainder of the software under
`lib/nominatim/module/nominatim.so`. You can specify a custom location for
the module with
```
NOMINATIM_DATABASE_MODULE_PATH=<path to directory where nominatim.so resides>
```
This is in particular useful when the database runs on a different server.
See [Advanced installations](../admin/Advanced-Installations.md#using-an-external-postgresql-database) for details.
There are no other configuration options for the legacy tokenizer. All
normalization functions are hard-coded.
## ICU tokenizer
The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to

View File

@ -72,8 +72,6 @@ The tests can be configured with a set of environment variables (`behave -D key=
* `DB_PORT` - (optional) port of database on host
* `DB_USER` - (optional) username of database login
* `DB_PASS` - (optional) password for database login
* `SERVER_MODULE_PATH` - (optional) path on the Postgres server to Nominatim
module shared library file (only needed for legacy tokenizer)
* `REMOVE_TEMPLATE` - if true, the template and API database will not be reused
during the next run. Reusing the base templates speeds
up tests considerably but might lead to outdated errors

View File

@ -18,12 +18,6 @@ NOMINATIM_DATABASE_WEBUSER="www-data"
# Currently available tokenizers: icu, legacy
NOMINATIM_TOKENIZER="icu"
# Number of occurrences of a word before it is considered frequent.
# Similar to the concept of stop words. Frequent partial words get ignored
# or handled differently during search.
# Changing this value requires a reimport.
NOMINATIM_MAX_WORD_FREQUENCY=50000
# If true, admin level changes on places with many contained children are blocked.
NOMINATIM_LIMIT_REINDEXING=yes
@ -34,12 +28,6 @@ NOMINATIM_LIMIT_REINDEXING=yes
# Currently only affects the initial import of country names and special phrases.
NOMINATIM_LANGUAGES=
# Rules for normalizing terms for comparisons.
# The default is to remove accents and punctuation and to lower-case the
# term. Spaces are kept but collapsed to one standard space.
# Changing this value requires a reimport.
NOMINATIM_TERM_NORMALIZATION=":: NFD (); [[:Nonspacing Mark:] [:Cf:]] >; :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC ();"
# Configuration file for the tokenizer.
# The content depends on the tokenizer used. If left empty the default settings
# for the chosen tokenizer will be used. The configuration can only be set