If a place node is already linked against a boundary, it should not
be used for linking again. It is usually a sign of a mapping error,
when there are multiple boundary candidates. This change just avoids
inconsistent data in the database, it does not guarantee that the
linking is against the more correct boundary.
The address rank is much more interesting than the search rank
these days because it tells something about the kind of object.
Reverse did have neither rank, so add both for consistency.
Rank 30 objects usually use the address parts of their parent.
When the parent has address parts that are areas but not marked
as isaddress, then the parent might go through multiple administrative
areas. In that case recheck if the right area has been choosen
for the object in question instead of relying on isaddress.
Note that we really only have to do the recomputation in the
case of 'isarea = True and isaddress = False' which hopefully
keeps the number of additional geometric operations we have to do
to a minimum.
There is one more special case to be taken into account here: a
street may go through two administrative areas and a house along
that street is placed in one of the area while the addr:* tags
says it belongs to the other. In that case we must not switch
the isaddress to the one it is situated. To avoid that recheck
the address names against the name of the ara. That is not perfect
but should cover most cases.
Fixes#328.
Wait for 2 seconds before logging the first progress, so that we
have numbers that are a bit more reliable statistically speaking.
Also provides an actual implementation for the log_interval
parameter and fixes some small style issues.
It would be nice to always compute addresses for rank 0 objects
over the complete geometry, so that they can be found via all
the admin boundaries that they intersect. However, there are a
couple of extramely large boundaries in OSM (like timezones)
where this results in thousands of possible address candidates
that need to be checked. Fall back to getting the address of the
centroid for them.
The post codes are the last part that does not fit the new
address ranking scheme. In particular, the search rank is still
relevant for choosing if a postcode should be included into
the address terms. Filter out irrelevant postcodes in
getNearFeatures() already, to avoid having to check for
geometry relation.
Multi-word partial terms had an undue advantage over separate partial
terms because they only need to pay the penalty once. This changes
the behaviour by setting the penalty according to the number of
words in the token. This should get rid of search interpretations
with low chance of matching.
This also fixes handling of exact term matching. We now match against
all exact terms of the query, not just a couple of them collected
while building the interpretations.
Also adds a penalty to very short postcodes.
House numbers need special handling because they may appear after
the street term. That means we canot just use them as the main name
for searches where the address has its own search term entries.
Doing this right now, we are able to find '40, Main St, Town' but not
'Main St 40, Town'.
This switches to using the housenumber token as the name term instead.
House number tokens can get special handling when building the search
query that covers the case where they come after the street.
The main disadvantage is that this once more increases the numbers
of possible search interpretation of which we have already too many.
no penalty for housenumber searches
The previous behaviour was a left-over from a former version
where such POIs parented to the street. Now that they parent to
places, it should be included.
get_addressdata() now also checks if the place itself has entries
in the place_addressline table and merges them into the results.
Also restrict checking for address tag places to cases where the
name cannot be found in the parent's address search terms. Looking
up all address tags is just too slow.