Commit Graph

3343 Commits

Author SHA1 Message Date
Sarah Hoffmann
c4f6c06f44 add dependency on datrie 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
0d80a9b897 tests for composing decomposed suffixes 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
f70930b1a0 make compound decomposition a pure import feature
Compound decomposition now creates a full name variant on
import just like abbreviations. This simplifies query time
normalization and opens a path for changing abbreviation
and compound decomposition lists for an existing database.
2021-07-04 10:28:20 +02:00
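The compound decomposition described in f70930b1a0 (and the composing tests in 0d80a9b897) can be pictured with the following minimal sketch. It is not the project's code: the function name `compound_variants` and the plain suffix list are hypothetical, and for simplicity only the end of the whole name is checked.

```python
def compound_variants(name, suffixes):
    """Hypothetical sketch: full-name variants created at import time.

    A name ending in a known suffix gets a decomposed variant
    ("hauptstrasse" -> "haupt strasse"); a name with the suffix as a
    separate word gets the composed variant ("haupt strasse" -> "hauptstrasse").
    """
    variants = {name}
    for suffix in suffixes:
        if name.endswith(' ' + suffix):
            # Suffix is a separate word: compose it with the preceding word.
            variants.add(name[:-len(suffix) - 1] + suffix)
        elif name.endswith(suffix) and len(name) > len(suffix):
            # Suffix is glued on: split it off into its own word.
            variants.add(name[:-len(suffix)] + ' ' + suffix)
    return variants
```

Because every variant is stored at import, a query for either spelling can match without any decomposition logic at query time, which is the simplification the commit message describes.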
Sarah Hoffmann
9ff4f66f55 complete tests for icu tokenizer 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
32ca631b74 fix full term token in special phrases 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2e81084f35 complete tests for rule loader 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
a0a7b05c9f correctly quote strings when copying in data
Encapsulate the copy string in a class that ensures that
copy lines are written with correct quoting.
2021-07-04 10:28:20 +02:00
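The quoting problem in a0a7b05c9f comes from PostgreSQL's COPY text format, where backslash, tab, newline and carriage return inside a value must be backslash-escaped and NULL is written as `\N`. As a sketch of the kind of encapsulating class the commit describes (the class name `CopyBuffer` and its methods are hypothetical, not taken from the commit):

```python
import io


class CopyBuffer:
    """Hypothetical helper that collects rows in PostgreSQL COPY text format."""

    # Characters that must be backslash-escaped inside a COPY text value.
    _ESCAPES = str.maketrans({
        '\\': '\\\\',
        '\t': '\\t',
        '\n': '\\n',
        '\r': '\\r',
    })

    def __init__(self):
        self.buffer = io.StringIO()

    def add(self, *values):
        """Append one row; None is written as the COPY NULL marker \\N."""
        fields = ('\\N' if value is None else str(value).translate(self._ESCAPES)
                  for value in values)
        self.buffer.write('\t'.join(fields))
        self.buffer.write('\n')

    def getvalue(self):
        """Return the accumulated COPY data as a single string."""
        return self.buffer.getvalue()
```

With psycopg2, the collected text could be passed to `cursor.copy_from(io.StringIO(buf.getvalue()), 'table_name')`, whose default separator (tab) and null marker (`\N`) match this sketch; `table_name` is a placeholder.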
Sarah Hoffmann
2f6e4edcdb update unit tests for adapted abbreviation code 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
1bd9f455fc add abbreviations from legacy tokenizer
These abbreviations are not a perfect fit anymore because
abbreviation replacement is now applied before transliteration.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2e3c5d4c5b adapt tests for ICU tokenizer 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
8413075249 move abbreviation computation into import phase
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.

Adds a new dependency on the Python library datrie.
2021-07-04 10:28:20 +02:00
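Precomputing abbreviated terms at import, as in 8413075249, means storing every spelling of a name so that the query no longer needs to be abbreviated. A minimal sketch of that idea follows. The commit adds datrie, presumably for efficient lookups of multi-word terms; this sketch instead assumes a plain dict of single-word abbreviations, and the function name and data are hypothetical.

```python
from itertools import product


def abbreviation_variants(name, abbreviations):
    """Hypothetical sketch: all spellings of `name` with optional abbreviations.

    `abbreviations` maps a full word to its abbreviated form.
    """
    options = []
    for word in name.split():
        # Each word can stay as is or be replaced by its abbreviation.
        choices = [word]
        if word in abbreviations:
            choices.append(abbreviations[word])
        options.append(choices)
    # Cartesian product of the per-word choices gives every variant.
    return {' '.join(combo) for combo in product(*options)}
```

For example, `abbreviation_variants('north main street', {'north': 'n', 'street': 'st'})` yields the four spellings `north main street`, `n main street`, `north main st` and `n main st`. The number of variants doubles with each abbreviable word, so generating them for every name adds work to the import.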
Sarah Hoffmann
6ba00e6aee icu tokenizer: move transliteration rules in separate file
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
a separate rule file that is passed to the ICU library
as is.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
de4fac33dc docs: nominatim-ui should be installed from the release
The development version does not provide the pre-packaged
dist directory anymore.
2021-07-03 21:16:52 +02:00
Sarah Hoffmann
c9984669a7
Merge pull request #2373 from lonvia/tweak-search-cost
Further tweaking of search cost
2021-06-26 16:21:08 +02:00
Sarah Hoffmann
63755c31ff remove penalty for full words in address
Now that multi-word partials no longer exist, multi-word full
words need to be used to search in addresses and therefore should
no longer have a penalty.

Also changes the condition for including a full word in
the address. It no longer matters whether an equivalent partial
exists, only whether the term consists of more than one word.
2021-06-26 11:37:15 +02:00
Sarah Hoffmann
161f5f5cee adjust penalty for housenumber-in-name searches
When searching for house numbers in the name (for place-only
terms), the same penalties need to apply as for the
regular house number search.

Change the code to first compute the penalties and then create
the new search variants.
2021-06-26 11:37:15 +02:00
Sarah Hoffmann
c7073a1fc0 increase minimum Python to 3.6
Python 3.6 introduces formatted string literals and
flag enums as well as a much faster dict implementation.
These changes make the code so much simpler as to warrant
dropping Python 3.5 support.

Affected distributions are Ubuntu 16.04 and Debian Stretch.
2021-06-21 18:37:37 +02:00
Sarah Hoffmann
e7b4fc70e7 make sure old data gets deleted on place type change
When changing from some other place type to place=postcode
make sure that the old place type entry in the place table
is deleted.
2021-06-18 10:58:41 +02:00
Sarah Hoffmann
457982e1d2 update postcode in place if it already exists 2021-06-18 00:28:52 +02:00
Sarah Hoffmann
aa558e6080
Merge pull request #2369 from lonvia/exclude-poi-from-housenumber-search
Do not return POIs when dropping house number in query
2021-06-17 15:30:05 +02:00
Sarah Hoffmann
fe11d3cbbd do not return POIs when dropping house number in query
We've previously added searching through rank 30 in a house
number search to enable searches for house number+name.
This had the unintended side effect that rank 30 objects
are also returned in a search that dropped the house number
from the query. This is wrong because POIs cannot function
as a parent to a house number.

This fix drops all rank 30 objects from the results for a
house number search if they do not match the requested house
number.
2021-06-17 14:21:20 +02:00
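The filtering rule in fe11d3cbbd can be expressed as a one-line predicate. This is an illustrative Python sketch only, not the project's search code; the `Result` type and its fields are hypothetical stand-ins, with rank 30 treated as the POI rank as in the commit message.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    """Hypothetical search result carrying only the fields the filter needs."""
    place_id: int
    rank_search: int
    housenumber: Optional[str]


def filter_housenumber_results(results: List[Result],
                               requested: str) -> List[Result]:
    """Keep all results below rank 30; keep rank 30 results only when
    their house number matches the requested one."""
    return [r for r in results
            if r.rank_search < 30 or r.housenumber == requested]
```

A street (rank below 30) found after the house number was dropped from the query is still returned, while a POI on that street without the requested number is not, since it cannot act as the parent of a house number.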
Sarah Hoffmann
1ce223a83b
Merge pull request #2360 from AntoJvlt/postcodes-place-table
Use place instead of placex to compute postcodes
2021-06-16 11:45:07 +02:00
AntoJvlt
3676310efe Improved performance of the postcodes query and some code cleanup 2021-06-12 15:46:08 +02:00
AntoJvlt
ddf866c4c7 Always delete old placex entry for type=postcode when inserting a new one into the place table 2021-06-12 15:35:51 +02:00
AntoJvlt
9e07a197e9 Handle postcode type change in place insert trigger 2021-06-09 09:31:32 +02:00
AntoJvlt
1c175e3a67 Clean and update tests for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
47fb7cd3a8 Use place_exists() in can_compute() for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
e879814e43 Update tests for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
a4733eed90 Use place instead of placex to compute postcodes 2021-06-09 09:31:32 +02:00
Sarah Hoffmann
38fbc4fcbb do not fail CI on codecov errors
The Codecov upload depends on unreliable external code.
2021-06-08 10:42:14 +02:00
Sarah Hoffmann
c6fe91bfa5
Merge pull request #2359 from lonvia/switch-bdd-tests-to-api-search
Remove deprecated commandline query function
2021-06-06 18:29:51 +02:00
Sarah Hoffmann
7383f05e45 remove deprecated query interface
Searches can now be done via the thin API wrapper.
2021-06-06 15:28:21 +02:00
Sarah Hoffmann
3aac51c81f switch BDD tests to always use search API 2021-06-06 15:27:52 +02:00
Sarah Hoffmann
f0a7850edf
Merge pull request #2358 from AntoJvlt/documentation-update
Update documentation
2021-06-04 23:54:37 +02:00
AntoJvlt
4336ca69c7 Update documentation 2021-06-03 18:39:40 +02:00
Sarah Hoffmann
4bca5e838b
Merge pull request #2357 from lonvia/legacy-tokenizer-fix-word-entries
Fix insertion of special terms and countries into word table
2021-06-02 20:58:14 +02:00
Sarah Hoffmann
bc981d0261 fix insertion of special terms and countries into word table
Special terms need to be prefixed by a space because they are
full terms.

For countries avoid duplicate entries of word tokens.

Adds tests for adding country terms.
2021-06-02 20:22:39 +02:00
Sarah Hoffmann
b1d33e6b49
Merge pull request #2356 from lonvia/freeze-after-import
Call freeze after running a non-updateable import
2021-06-02 16:25:26 +02:00
Sarah Hoffmann
38d442edf6 docs: reload SQL when migrating to 3.6
SQL functions must always be reloaded when updating the software.
All other updates included the instruction as part of some other
migration. From 3.7 on it will happen as part of the migration
command.

Fixes #2335.
2021-06-02 16:11:29 +02:00
Sarah Hoffmann
72625dc72a call freeze after running a non-updateable import
Some of the tables will have already been removed but
the tables for indexing are still there and should be
dropped.
2021-06-02 11:08:48 +02:00
Sarah Hoffmann
cc2f152d70 commit changes to replication log table
Fixes #2350.
2021-05-26 11:47:08 +02:00
Sarah Hoffmann
f74dc38766 always compute guessed postcode for POIs from centroid
When guessing postcodes from the area, only postcodes within
that area are accepted. For POIs, that is usually not what we
want, as the postcode would have to lie within a house, for
example.

Fixes #2301.
2021-05-26 11:15:13 +02:00
Sarah Hoffmann
7d9665d8d2
Merge pull request #2349 from lonvia/fix-website-refresh
Only initialise tokenizer for refresh functions where needed
2021-05-25 20:43:44 +02:00
Sarah Hoffmann
a0e85cc17c only initialise tokenizer for refresh functions where needed
Fixes #2347.
2021-05-25 19:16:22 +02:00
Sarah Hoffmann
29b02f9e56
Merge pull request #2346 from lonvia/words-vs-tokens
Cleanup use of partial words in legacy tokenizers
2021-05-24 17:41:38 +02:00
Sarah Hoffmann
24c986c842 add tests for new full name computation with ICU 2021-05-24 10:41:42 +02:00
Sarah Hoffmann
4f4d15c28a reorganize keyword creation for legacy tokenizer
- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
  (but not the part after the bracket)

Fixes #244.
2021-05-24 10:41:42 +02:00
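The three keyword rules in 4f4d15c28a can be illustrated with a small sketch. It is not the legacy tokenizer's implementation: the function name `make_keywords` here is hypothetical, normalisation is left out, and it assumes that words inside brackets still become partial words even though they are excluded from the full word.

```python
import re


def make_keywords(name):
    """Hypothetical sketch of the keyword rules listed in the commit above.

    Returns a pair (full_words, partial_words); normalisation and
    transliteration are deliberately omitted.
    """
    full_words = set()
    partial_words = set()
    # Comma and semicolon separate independent full words.
    for part in re.split(r'[,;]', name):
        part = part.strip()
        if not part:
            continue
        # Only the text before an opening bracket counts as a full word.
        head = part.split('(', 1)[0].strip()
        if head:
            full_words.add(head)
        # Partial words are single words: splitting on whitespace and
        # brackets guarantees that no partial contains an internal space.
        partial_words.update(w for w in re.split(r'[\s()]+', part) if w)
    return full_words, partial_words
```

Under these assumptions, `make_keywords('Hauptstraße (Nord); Main St')` gives the full words `Hauptstraße` and `Main St` and the partial words `Hauptstraße`, `Nord`, `Main` and `St`.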
Sarah Hoffmann
fa3e48c59f use make_keywords for place search terms also
Ensures that place indeed uses the same search names as other
names.
2021-05-23 23:08:11 +02:00
Sarah Hoffmann
02f6afa51b always ignore multi term partials in search
Partial terms should only ever consist of one word. Ignore
any others; they are leftovers from inefficient word index
builds.
2021-05-23 22:13:03 +02:00
Sarah Hoffmann
10143e0ac7
Merge pull request #2342 from lonvia/icu-tokenizer-ci
Add BDD tests with icu tokenizer to CI runs
2021-05-22 10:36:35 +02:00