So far the SQL logic used the information from the address field
to determine if an address is attached to a street or place.
This changes the logic to use the information provided in the
token_info. This allows sanitizers to enforce a certain parenting
without changing the visible address information.
This lays the groundwork for adding variants for housenumbers.
When analysis is enabled, then the 'word' field in the word table
is used as usual, so that variants can be created. There will be
only one analyser allowed which must have the fixed name
'@housenumber'.
When matching address parts from addr:* tags against place names,
the address names where so far converted to full names and compared
those to the place names. This can become problematic with the new
ICU tokenizer once we introduce creation of different variants
depending on the place name context. It wouldn't be clear which
variant to produce to get a match, so we would have to create all of
them. To work around this issue, switch to using the partial terms
for matching. This introduces a larger fuzziness between matches but
that shouldn't be a problem because matching is always geographically
restricted.
The search terms created for address parts have a different problem:
they are already created before we even know if they are going to be
used. This can lead to spurious entries in the word table, which slows
down searching. This problem can also be circumvented by using only
partial terms for the search terms. In terms of searching that means
that the address terms would not get the full-word boost, but given
that the case where an address part does not exist as an OSM object
should be the exception, this is likely acceptable.
Instead of requesting the match tokens from the tokenizer
when looking for parent streets/places and address parts,
hand in the saved tokens and ask if they match. This gives
the tokenizer more freedom to decide how name matching
should be done.
The new icu tokenizer is now no longer compatible with the old
legacy tokenizer in terms of data structures. Therefore there
is also no longer a need to refer to the legacy tokenizer in the
name.