Multi-word partial terms had an undue advantage over separate partial
terms because they only needed to pay the penalty once. This changes
the behaviour by setting the penalty according to the number of
words in the token. This should get rid of search interpretations
with a low chance of matching.
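A minimal sketch of the idea (the constant and function below are
assumptions for illustration, not the actual implementation):

    PARTIAL_PENALTY = 0.2  # assumed base penalty for one partial word

    def partial_token_penalty(token):
        """Penalty of a partial token, scaled by its word count."""
        return PARTIAL_PENALTY * len(token.split())

    # 'rue de la paix' as one partial token is now as expensive as the
    # four separate partial tokens 'rue', 'de', 'la' and 'paix'.
    assert partial_token_penalty('rue de la paix') == 4 * PARTIAL_PENALTY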
This also fixes the handling of exact term matching. We now match
against all exact terms of the query, not just the few collected
while building the interpretations.
Also adds a penalty to very short postcodes.
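Roughly, and only as a sketch with assumed constants and names:

    EXACT_MISS_PENALTY = 0.1      # assumed penalty per unmatched exact term
    SHORT_POSTCODE_PENALTY = 0.3  # assumed penalty for a very short postcode

    def exact_match_penalty(query_terms, result_terms):
        """Penalise a result for every exact query term it lacks."""
        return EXACT_MISS_PENALTY * len(set(query_terms) - set(result_terms))

    def postcode_penalty(postcode):
        """Very short strings are unlikely to be meant as postcodes."""
        return SHORT_POSTCODE_PENALTY if len(postcode) <= 3 else 0.0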
For countries and remote places without an address it is
possible that 'addressimportance' comes back empty. In such
cases, adjust the 'foundorder' to the place's importance
instead.
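A sketch of the fallback, with the field names taken as assumptions:

    def found_order(result):
        """Ordering key: fall back to the place's own importance
        when no address importance could be computed."""
        if result.get('addressimportance'):
            return -result['addressimportance']
        return -result.get('importance', 0.0)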
Fixes #2023.
Add a separate function for each property which saves the
necessary information independently. Simplify the computation of
labels and simple labels so that the labels are no longer saved
explicitly.
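The pattern, sketched with made-up names (this is not the real class):

    class PlaceResult:
        def __init__(self):
            self.info = {}

        # one small function per property, saving only what it needs
        def save_name(self, row):
            self.info['name'] = row.get('name')

        def save_rank(self, row):
            self.info['rank_address'] = row.get('rank_address')

        # labels are derived on demand instead of being stored explicitly
        def label(self):
            return self.info.get('name') or 'Unnamed'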
Switch from a recursive algorithm for computing the word sets
to an iterative one that benefits from caching intermediate
results. This considerably reduces the amount of memory needed,
so that the depth restriction can be dropped. To ensure that
the number of word sets remains manageable, only sets up to
a certain length are accepted and only a limited total number
of word sets is kept. If word sets need to be dropped, the
ones with more words per word set are dropped first.
To further reduce the number of potential word sets, the valid
tokens are looked up first and then only word sets containing
valid tokens are computed.
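A sketch of the iterative computation; the limits and names below
are assumptions for illustration only:

    MAX_SET_LENGTH = 4   # assumed maximum number of words per word set
    MAX_WORD_SETS = 100  # assumed maximum total number of word sets

    def compute_word_sets(words, valid_tokens):
        """Iteratively build all splits of `words` into known phrases.

        cache[i] holds the word sets for the suffix words[i:], so each
        intermediate result is computed only once."""
        cache = [[] for _ in range(len(words) + 1)]
        cache[len(words)] = [[]]      # empty suffix -> one empty word set
        for i in range(len(words) - 1, -1, -1):
            sets_here = []
            for j in range(i + 1, len(words) + 1):
                phrase = ' '.join(words[i:j])
                if phrase not in valid_tokens:
                    continue          # only phrases with a known token
                for tail in cache[j]:
                    if len(tail) + 1 <= MAX_SET_LENGTH:
                        sets_here.append([phrase] + tail)
            # when trimming, drop the sets with more words first
            sets_here.sort(key=len)
            cache[i] = sets_here[:MAX_WORD_SETS]
        return cache[0]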
Fixes #1403, #1404 and #654.
The same result may be found with different result ranks
in the same search loop when a housenumber or postcode is
part of the name or address. In this case, we need to keep
the result with the lower result rank.
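For illustration (field names are assumptions), the deduplication
amounts to:

    def keep_best_ranked(results):
        """Keep only the copy with the lowest result rank per place."""
        best = {}
        for result in results:
            place_id = result['place_id']
            if (place_id not in best
                    or result['result_rank'] < best[place_id]['result_rank']):
                best[place_id] = result
        return list(best.values())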
Fixes #1264.
Introduces a new AddressDetails class which is responsible
for address lookups. It always saves the complete result
and then allows filtering through the different access
functions. Remove the special handling in Geocode() and use
the lookup through PlaceLookup() there as well.
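The shape of the class, sketched with assumed method and field names:

    class AddressDetails:
        """Fetches the full address once and filters via accessors."""

        def __init__(self, address_rows):
            self._rows = address_rows   # complete result of the lookup

        def all_lines(self):
            return list(self._rows)

        def address_lines(self):
            # filtered view, no second database lookup required
            return [row for row in self._rows if row.get('isaddress')]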
The word frequency hash was only used to determine if the
name of a SearchDescription is rare. Do this already when
building the SearchDescription (when the word frequency
is still available) and get rid of the extra hash.
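A sketch with assumed names and threshold:

    RARE_NAME_THRESHOLD = 500  # assumed cut-off for a 'rare' name token

    class SearchDescription:
        def __init__(self):
            self.name_tokens = []
            self.rare_name = False

        def add_name_token(self, token_id, search_count):
            self.name_tokens.append(token_id)
            if search_count < RARE_NAME_THRESHOLD:
                self.rare_name = True  # decided now, no extra hash kept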
Adding a penalty to a search description because there
is a term at the beginning which looks like a country
turned out to be a bad idea, as there are too many
abbreviations around that match against frequently
used words.
Drop country tokens that do not match the country code list
early. In turn, remove the special country code check for
structured phrases; it is sufficient to do this check during
word list building.
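A sketch of the early filtering (the token structure is assumed):

    def filter_country_tokens(tokens, allowed_country_codes):
        """Drop country tokens outside the requested country code list."""
        if not allowed_country_codes:
            return list(tokens)        # no country restriction requested
        return [t for t in tokens
                if t.get('type') != 'country'
                or t['country_code'] in allowed_country_codes]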