Compound decomposition now creates a full name variant on
import, just like abbreviations. This simplifies query-time
normalization and opens a path for changing abbreviation
and compound decomposition lists for an existing database.
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.
Adds a new dependency on the Python library datrie.
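
As an illustration of what precomputing abbreviated variants at import
could look like, here is a minimal sketch. The replacement table and
the function name are hypothetical and not taken from the actual
implementation; the sketch only shows the idea of expanding a name
into every combination of known term forms so that the query side no
longer has to abbreviate anything.

```python
from itertools import product

# Hypothetical replacement table: full term -> all accepted forms.
ABBREVIATIONS = {
    "street": ["street", "st"],
    "north": ["north", "n"],
}


def name_variants(name):
    """Return every variant of a name that results from replacing
    each term with one of its known forms."""
    terms = name.lower().split()
    # Terms without an entry in the table only have themselves as a form.
    choices = [ABBREVIATIONS.get(term, [term]) for term in terms]
    return {" ".join(combo) for combo in product(*choices)}


variants = name_variants("North Main Street")
```

The number of variants grows multiplicatively with the number of
replaceable terms, which is one reason import speed needs attention.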
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
a separate rule file that is passed to the ICU library
as is.
Now that multi-word partials no longer exist, multi-word full
words need to be used to search in addresses and therefore no
longer should have a penalty.
Also changes the condition under which a full word is included
in the address. It is no longer relevant whether an equivalent
partial exists, only whether the term consists of more than one word.
When searching for house numbers in the name (for place-only
terms), the same penalties need to apply as for the
regular house number search.
Change the code to first compute the penalties and then create
the new search variants.
Python 3.6 introduces formatted string literals and
flag enums as well as a much faster dict implementation.
These changes make the code so much simpler as to warrant
dropping Python 3.5 support.
Affected distributions are Ubuntu 16.04 and Debian Stretch.
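
For reference, a small sketch of the two Python 3.6 language features
mentioned above, formatted string literals and flag enums. The class
and function names are made up for illustration and do not come from
the code base.

```python
from enum import Flag, auto


class TokenKind(Flag):
    """Hypothetical token classification that can be combined with |."""
    PARTIAL = auto()
    FULL = auto()
    HOUSENUMBER = auto()


def describe(word, kinds):
    """Format a word and its kinds with a formatted string literal."""
    names = [kind.name for kind in TokenKind if kind in kinds]
    return f"{word}: {'+'.join(names)}"


summary = describe("main", TokenKind.PARTIAL | TokenKind.FULL)
```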
We've previously added searching through rank 30 in a house
number search to enable searches for house number+name.
This had the unintended side effect that rank 30 objects
are also returned in a search that dropped the house number
from the query. This is wrong because POIs cannot function
as a parent to a house number.
This fix drops all rank 30 objects from the results for a
house number search if they do not match the requested house
number.
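
The filtering rule can be sketched as follows. This is only an
illustration under simplifying assumptions: the Result type and the
function name are hypothetical, and the real result handling is not
done on plain Python tuples like this.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    """Hypothetical, simplified search result."""
    name: str
    rank: int
    housenumber: Optional[str] = None


def drop_unmatched_rank30(results: List[Result],
                          housenumber: str) -> List[Result]:
    """Keep everything below rank 30; keep rank 30 objects only if
    they carry the requested house number."""
    return [r for r in results
            if r.rank < 30 or r.housenumber == housenumber]


candidates = [
    Result("Main Street", 26),
    Result("Cafe", 30, "5"),
    Result("Shop", 30, "7"),
    Result("Fountain", 30),
]
kept = drop_unmatched_rank30(candidates, "5")
```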
Special terms need to be prefixed by a space because they are
full terms.
For countries avoid duplicate entries of word tokens.
Adds tests for adding country terms.
SQL functions must always be reloaded when updating the software.
All other updates included the instruction as part of some other
migration. From 3.7 on it will happen as part of the migration
command.
Fixes #2335.
When guessing postcodes from the area, only postcodes within
that area are accepted. For POIs that is usually not what we
want as the postcode would have to be within a house for
example.
Fixes #2301.
- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
(but not the part after the bracket)
Fixes #244.
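
The three rules listed above can be sketched roughly like this. The
function name is hypothetical, and whether the complete part including
the bracket is also kept as a full word is an assumption made here for
the sake of the example.

```python
import re


def split_name(name):
    """Return (full_words, partial_words) for a name."""
    full = set()
    # Comma and semicolon separate full words.
    for part in re.split(r"[,;]", name):
        part = part.strip()
        if not part:
            continue
        # Assumption: the complete part is always kept as a full word.
        full.add(part)
        # The text before an opening bracket is a full word as well;
        # the text after the bracket is not.
        if "(" in part:
            before = part.split("(", 1)[0].strip()
            if before:
                full.add(before)
    # Partial words never contain internal spaces.
    partials = {word
                for entry in full
                for word in re.split(r"[\s()]+", entry)
                if word}
    return full, partials


full_words, partial_words = split_name("Old Mill (ruin); Mill")
```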