Commit Graph

567 Commits

Author SHA1 Message Date
Sarah Hoffmann
3741afa6dc generalize filter-kind parameter for sanitizers
Now behaves the same for tag_analyzer_by_language and
clean_housenumbers. Adds tests.
2022-01-20 15:42:42 +01:00
Sarah Hoffmann
560a006892 add pytest config
We are using custom marks now which need to be registered to avoid
warnings.
2022-01-20 15:38:02 +01:00
Sarah Hoffmann
4774e45218 clean_housenumbers: make kinds and delimiters configurable
Also adds unit tests for various options.
2022-01-20 12:07:12 +01:00
Sarah Hoffmann
206ee87188 factor out housenumber splitting into sanitizer 2022-01-19 17:27:50 +01:00
Sarah Hoffmann
b453b0ea95 introduce mutation variants to generic token analyser
Mutations are regular-expression-based replacements that are applied
after variants have been computed. They are meant to be used for
variations on character level.

Add spelling variations for German umlauts.
2022-01-18 11:09:21 +01:00
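
The idea of a mutation, a regular-expression-based replacement applied to already computed variants, can be sketched roughly as follows. The function `apply_mutation` and its signature are hypothetical and only illustrate the technique; the real token analyser may be organised differently. The sketch assumes the pattern contains no capture groups.

```python
import itertools
import re


def apply_mutation(variants, pattern, replacements):
    """Expand each variant by substituting every match of `pattern`
    with each of the given replacements (illustrative sketch only).
    """
    regex = re.compile(pattern)
    result = []
    for variant in variants:
        parts = regex.split(variant)
        if len(parts) == 1:
            # No match: the variant is kept unchanged.
            result.append(variant)
            continue
        # Pick one replacement per match position and build all combinations.
        for combo in itertools.product(replacements, repeat=len(parts) - 1):
            pieces = [parts[0]]
            for repl, part in zip(combo, parts[1:]):
                pieces.append(repl)
                pieces.append(part)
            candidate = "".join(pieces)
            if candidate not in result:
                result.append(candidate)
    return result
```

With a German umlaut as the pattern, `apply_mutation(["müller"], "ü", ["ü", "ue"])` keeps the original spelling and adds the transliterated one.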
Sarah Hoffmann
c3788d765e add consistent SPDX copyright headers 2022-01-03 16:23:58 +01:00
Sarah Hoffmann
ab6f35d83a Merge pull request #2553 from lonvia/revert-street-matching-to-full-names
Revert street matching to full names
2021-12-14 15:52:34 +01:00
Sarah Hoffmann
f9b56a8581 correctly match abbreviated addr:street
This only works when addr:street is abbreviated and the street
name isn't. It does not work the other way around.
2021-12-08 21:58:43 +01:00
Sarah Hoffmann
04857d32cd enable PHPUnit 9 for coverage
A couple of functions have been renamed.
2021-12-07 12:07:17 +01:00
Sarah Hoffmann
109cdce92c php unit: replace deprecated regex assert
The regEx assertion has been renamed in PHPUnit 9.5
and causes deprecation warnings.
2021-12-07 11:34:21 +01:00
Sarah Hoffmann
b7554d9ed8 php unit: don't enforce a name on the test database
Also gets rid of a PHPUnit deprecation warning.
2021-12-07 11:31:45 +01:00
Sarah Hoffmann
6106f1a32e php test: class must be called like the file 2021-12-07 11:20:38 +01:00
Sarah Hoffmann
7f7d2fd5b3 skip most addr: tags with suffixes
Only one addr: tag can be processed currently, so make
sure it is the one without suffixes to not get odd data.
addr:street is the exception because it uses a different
matching mechanism.
2021-12-06 14:55:10 +01:00
Sarah Hoffmann
5e435b41ba ICU: matching any street name will do again 2021-12-06 14:26:08 +01:00
Sarah Hoffmann
44cfce1ca4 revert to using full names for street name matching
Using partial names turned out to not work well because there are
often similarly named streets next to each other. It also
prevents us from being able to take into account all addr:street:*
tags.

This change gets all the full term tokens for the addr:street tags
from the DB. As they are used for matching only, we can assume that
the term must already be there or there will be no match. This
avoids creating unused full name tags.
2021-12-06 11:38:38 +01:00
Sarah Hoffmann
5a9fb6eaf7 specify text type in test SQL
Older versions of PostgreSQL fail otherwise.
2021-12-03 13:56:23 +01:00
Sarah Hoffmann
54d35ddfe9 split cli tests by subcommand and extend coverage 2021-12-02 23:45:48 +01:00
Sarah Hoffmann
14a78f55cd more unit tests for tokenizers 2021-12-02 15:46:36 +01:00
Sarah Hoffmann
7617a9316e extend API unit tests 2021-12-01 20:48:29 +01:00
Sarah Hoffmann
a52ed366e4 add tests for migration 2021-12-01 20:27:40 +01:00
Sarah Hoffmann
7be164e2a5 more testing for refresh functions 2021-12-01 14:58:54 +01:00
Sarah Hoffmann
a24f25c0d8 more tests for exec utilities 2021-12-01 14:23:51 +01:00
Sarah Hoffmann
993b238a41 add more tests for database import 2021-12-01 11:54:58 +01:00
Sarah Hoffmann
bbbfc8201c add tests for adding additional data
Also adds checks that parameters for osm2pgsql are set
as expected.
2021-12-01 11:22:46 +01:00
Sarah Hoffmann
6f03a4d6ce add tests for flatten_config_file and formats other than YAML 2021-12-01 10:24:11 +01:00
Sarah Hoffmann
c8958a22d2 tests: add fixture for making test project directory 2021-11-30 18:01:46 +01:00
Sarah Hoffmann
37afa2180b generalize fixtures for cli tests 2021-11-30 14:07:39 +01:00
Sarah Hoffmann
b2df8e478a python test: move single-use fixtures to subdirectories 2021-11-30 12:03:16 +01:00
Sarah Hoffmann
50fccb52be remove unused test files 2021-11-30 11:44:10 +01:00
Sarah Hoffmann
b90e719da5 organise python tests in subdirectories
The directories follow the same structure as the modules in
nominatim/.
2021-11-30 11:22:26 +01:00
Sarah Hoffmann
80e0a3cce4 change default rank for highway objects to 30
The highway key is being used more and more for non-ways these
days. This clashes with Nominatim's assumption that essentially
everything that has a highway tag can be used as the street part
of the address.

Change the default rank of highway objects to 30 to avoid this.
Only the known values for streets keep the rank 26 and are now
listed explicitly.
2021-11-24 22:10:40 +01:00
Sarah Hoffmann
10e979e841 only instantiate indexer once for replication
Also makes sure that the indexer object exists everywhere it is needed.

See #2518.
2021-11-19 14:48:58 +01:00
Sarah Hoffmann
345c812e43 better error reporting when API script does not exist
Check if the API script exists on the expected location before
running php-cli. This way we can add a useful hint about the
project directory.

Fixes #2513.
2021-11-10 11:58:20 +01:00
Sarah Hoffmann
37eeccbf4c ICU: use normalization from config in PHP
The TERM_NORMALIZATION config option is no longer applicable.
That was already documented but not yet implemented.
2021-10-27 11:32:44 +02:00
Sarah Hoffmann
1722fc537f bdd: add tests for non-latin scripts 2021-10-26 17:29:03 +02:00
Sarah Hoffmann
c0f347fc8c adapt BDD tests to stricter partial search 2021-10-26 15:52:57 +02:00
Sarah Hoffmann
c4f5c11a4e be case-insensitive about special phrase operator 2021-10-25 19:51:20 +02:00
Sarah Hoffmann
5a1c3dbea3 fix parsing of operator in special phrases
Because of unstripped input, the operators wouldn't match.
2021-10-25 19:46:30 +02:00
Sarah Hoffmann
1098ab732f allow relative paths for flatnode file 2021-10-22 17:32:51 +02:00
Sarah Hoffmann
507fdd4f40 switch IMPORT_STYLE to use generic file search
Allows relative paths wrt project directory.
2021-10-22 16:49:57 +02:00
Sarah Hoffmann
0ae8d7ac08 have ADDRESS_LEVEL_CONFIG use load_sub_configuration
This means that relative paths now are looked up in the
project directory.
2021-10-22 16:36:52 +02:00
Sarah Hoffmann
c77df2d1eb replace NOMINATIM_PHRASE_CONFIG with command line option 2021-10-22 14:41:14 +02:00
Sarah Hoffmann
c1fa70639b add new replication mode catch-up
This mode gets updates until the server reports no new diffs
anymore.

Also adds additional indexing, when the main indexing step left
a couple of objects to process. This happens only when the
next update is expected to be more than 40min away.
2021-10-20 22:05:15 +02:00
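
The catch-up behaviour described in this commit can be sketched as a simple loop plus a threshold check. Both function names and the `apply_next_diff` callable are hypothetical stand-ins for illustration; only the 40-minute threshold comes from the commit message.

```python
def catch_up(apply_next_diff):
    """Apply diffs until the server has no newer one; return the count.

    `apply_next_diff` is a hypothetical callable that applies one diff and
    returns True, or returns False when no new diff is available.
    """
    applied = 0
    while apply_next_diff():
        applied += 1
    return applied


def needs_extra_indexing(objects_left, seconds_until_next_update,
                         threshold=40 * 60):
    """Run another indexing pass only if objects remain and the next
    update is expected to be more than `threshold` seconds away."""
    return objects_left > 0 and seconds_until_next_update > threshold
```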
Sarah Hoffmann
824562357b adapt tests for new word count mechanism 2021-10-19 12:03:48 +02:00
Sarah Hoffmann
552fb16cb2 fix template expressions for tablespaces 2021-10-15 15:11:09 +02:00
Sarah Hoffmann
3649487f5e use SP-GIST index for building index where available
Point-in-polygon queries are much faster with a SP-GIST geometry
index, so use that for the index used to check if a housenumber
is inside a building.

Only available with Postgis 3. There is an automatic fallback to
GIST for Postgis 2.
2021-10-10 21:55:38 +02:00
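
A rough sketch of the fallback logic, assuming the PostGIS major version is already known. The function name, index name and table name are placeholders invented for this example and do not reflect the actual schema.

```python
def building_index_sql(postgis_major_version):
    """Return a CREATE INDEX statement using SP-GIST when PostGIS 3 or
    later is available and plain GIST otherwise (illustrative only)."""
    method = "SPGIST" if postgis_major_version >= 3 else "GIST"
    # idx_building_geometry and building_table are hypothetical names.
    return ("CREATE INDEX idx_building_geometry ON building_table "
            "USING {} (geometry)".format(method))
```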
Sarah Hoffmann
299934fd2a reorganize and complete tests around generic token analysis 2021-10-06 17:03:37 +02:00
Sarah Hoffmann
b18d042832 add tests for sanitizer tagging language 2021-10-06 12:29:25 +02:00
Sarah Hoffmann
97a10ec218 apply variants by languages
Adds a tagger for names by language so that the analyzer of that
language is used. Thus variants are now only applied to names
in the specific language and only to name tags, no longer to
reference-like tags.
2021-10-06 11:09:54 +02:00
Sarah Hoffmann
d35400a7d7 use analyser provided in the 'analyzer' property
Implements per-name choice of analyzer. If a non-default
analyzer is chosen, then the 'word' identifier is extended
with the name of the analyzer, so that we still have unique
items.
2021-10-05 14:10:32 +02:00