Commit Graph

85 Commits

Author SHA1 Message Date
Sarah Hoffmann
324b1b5575 bdd tests: do not query word table directly
The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
3aac51c81f switch BDD tests to always use search API 2021-06-06 15:27:52 +02:00
Sarah Hoffmann
4f4d15c28a reorganize keyword creation for legacy tokenizer
- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
  (but not the part after the bracket)

Fixes #244.
2021-05-24 10:41:42 +02:00
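The splitting rules listed in this commit message could be sketched in Python roughly as follows. This is a minimal illustration and not the legacy tokenizer's code: the function name `make_keywords` is hypothetical, lower-casing stands in for the real normalization, and keeping the complete bracketed name as a full word in addition to the part before the bracket is an assumption.

```python
import re


def make_keywords(name):
    """Return (full_words, partial_words) for a name (hypothetical helper)."""
    full_words = []
    # Comma and semicolon separate full words.
    for part in re.split(r"[,;]", name):
        part = part.strip().lower()
        if not part:
            continue
        full_words.append(part)
        # The text before an opening bracket is a full word of its own;
        # the text after the bracket is not.
        head, bracket, _ = part.partition("(")
        head = head.strip()
        if bracket and head:
            full_words.append(head)

    # Partial words never contain internal spaces.
    partial_words = set()
    for word in full_words:
        partial_words.update(w for w in re.split(r"[\s()]+", word) if w)
    return full_words, sorted(partial_words)
```

For example, `make_keywords("Halle (Saale); Hall")` yields the full words `halle (saale)`, `halle` and `hall`, plus single-word partials.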
Sarah Hoffmann
e1c5673ac3 require tokenizer for indexer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
1fd483643b add tests for different scripts 2021-04-26 23:01:06 +02:00
Sarah Hoffmann
788baafa26 bdd tests: fix place-dependent ranking tests
The ranks of places may differ for some countries. Force the
place nodes in the test onto Null Island, which always uses the
default ranking.
2021-04-22 17:31:00 +02:00
Sarah Hoffmann
65d8770b28 update country_names from OSM data
Update names in the country_names table on the fly from incoming
OSM country data. Add a small sanity check that the country
must be an OSM relation and within the area where we expect the
country to be.
2020-12-09 11:38:19 +01:00
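The sanity check described above might look like the following sketch. It is only an illustration under stated assumptions: the function name, the `'R'` code for relations and the use of a bounding box for "the area where we expect the country to be" are all hypothetical simplifications.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Hypothetical stand-in for the expected area of a country."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def may_update_country_names(osm_type, lon, lat, expected_area):
    """Accept names only from a relation located inside the expected area."""
    if osm_type != "R":  # assumed type code for an OSM relation
        return False
    return (expected_area.min_lon <= lon <= expected_area.max_lon
            and expected_area.min_lat <= lat <= expected_area.max_lat)
```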
Sarah Hoffmann
987d60ccda place nodes can only be linked once against boundaries
If a place node is already linked against a boundary, it should not
be used for linking again. It is usually a sign of a mapping error,
when there are multiple boundary candidates. This change only avoids
inconsistent data in the database; it does not guarantee that the
linking is against the more correct boundary.
2020-12-02 15:31:02 +01:00
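A minimal sketch of the "link only once" rule, assuming candidates arrive as hypothetical `(boundary_id, place_node_id)` pairs in processing order. As the message notes, the first boundary wins, which is not necessarily the more correct one.

```python
def link_place_nodes(candidates):
    """Map each boundary to a place node, using every node at most once."""
    linked_nodes = set()
    links = {}
    for boundary_id, node_id in candidates:
        if node_id in linked_nodes:
            # Already linked against another boundary: skip it.
            continue
        linked_nodes.add(node_id)
        links[boundary_id] = node_id
    return links
```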
Sarah Hoffmann
63544db8f9 null entries need to be typed 2020-12-01 14:54:42 +01:00
Sarah Hoffmann
7295cad715 compute address parts for rank 30 objects on the fly
Rank 30 objects usually use the address parts of their parent.
When the parent has address parts that are areas but not marked
as isaddress, then the parent might go through multiple administrative
areas. In that case recheck if the right area has been chosen
for the object in question instead of relying on isaddress.
Note that we really only have to do the recomputation in the
case of 'isarea = True and isaddress = False' which hopefully
keeps the number of additional geometric operations we have to do
to a minimum.

There is one more special case to be taken into account here: a
street may go through two administrative areas and a house along
that street is placed in one of the areas while the addr:* tags
say it belongs to the other. In that case we must not switch
the isaddress to the one it is situated in. To avoid that, recheck
the address names against the name of the area. That is not perfect
but should cover most cases.

Fixes #328.
2020-12-01 11:58:25 +01:00
Sarah Hoffmann
22800d7d59 Search housenumbers with unknown address parts by housenumber term
House numbers need special handling because they may appear after
the street term. That means we cannot just use them as the main name
for searches where the address has its own search term entries.
Doing this right now, we are able to find '40, Main St, Town' but not
'Main St 40, Town'.

This switches to using the housenumber token as the name term instead.
House number tokens can get special handling when building the search
query that covers the case where they come after the street.

The main disadvantage is that this once more increases the number
of possible search interpretations, of which we already have too many.

no penalty for housenumber searches
2020-11-25 11:36:10 +01:00
Sarah Hoffmann
b4b50eef15 search rank 30 must always go with address rank 30 2020-11-24 17:57:28 +01:00
Sarah Hoffmann
49083c2597
Merge pull request #2058 from lonvia/split-address-words
Split addr:* tags into words before adding to the search index
2020-11-18 08:58:17 +01:00
Sarah Hoffmann
ffb2c93ba3 POIs with unknown addr:place must add parent name to address
The previous behaviour was a left-over from a former version
where such POIs parented to the street. Now that they parent to
places, it should be included.
2020-11-17 19:44:43 +01:00
Sarah Hoffmann
30a6b6bdac split addr: tags into words before adding to the search index
Address parts are only matched by single partial words. If
the addr: names are not split, then multi-word names cannot
be found.
2020-11-17 18:03:33 +01:00
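The effect of splitting can be illustrated with a short sketch. The function name is hypothetical and lower-casing stands in for the real normalization; which `addr:*` keys actually take part is not stated in the message, so this version simply uses all of them.

```python
def address_search_terms(tags):
    """Collect single-word search terms from all addr:* tags."""
    terms = set()
    for key, value in tags.items():
        if key.startswith("addr:"):
            # Split multi-word names so each word can match on its own.
            terms.update(value.lower().split())
    return terms
```

With this, a tag such as `addr:city=New York` contributes the terms `new` and `york` instead of one unsplit name.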
Sarah Hoffmann
9ede048769 disallow linking for postcode areas 2020-11-17 10:53:26 +01:00
Sarah Hoffmann
885dc0a8e1 more tests for absence of additional addressline entries 2020-11-16 15:28:01 +01:00
Sarah Hoffmann
7324431b12 get additional addresses for rank 30 objects
get_addressdata() now also checks if the place itself has entries
in the place_addressline table and merges them into the results.

Also restrict checking for address tag places to cases where the
name cannot be found in the parent's address search terms. Looking
up all address tags is just too slow.
2020-11-16 15:28:01 +01:00
Sarah Hoffmann
021f2bef4c get address terms from address tags for rank 30
For rank 30 objects add extra elements into the place_addressline
table.
2020-11-16 15:28:01 +01:00
Sarah Hoffmann
6260fef2e8 add test for placex from addr tags 2020-11-16 15:28:01 +01:00
Sarah Hoffmann
c7472662a6 lookup places for address tags for rank < 30
While previously the content of addr:* tags was only added
to the list of address search keywords, we now really look up
the matching place. This has the advantage that we pull in all
potential translations from the place, just like all the other
address terms that are looked up by neighbourhood search.

If no place can be found for a given name, the content of the
addr:* tag is still added to the search keywords as before.
2020-11-16 15:28:01 +01:00
Sarah Hoffmann
b2ebf4b4b7 adapt tests to rank changes of natural 2020-11-02 11:42:10 +01:00
Sarah Hoffmann
95f83b90d2 minor fixes for geometry computation during boundary ranking
Go back to using centroid when determining if one admin level
is within another. There are cases where boundaries are slightly
misaligned due to mapping errors (not using the same ways in the
relations).

Only declare boundaries the same when they have the same wikidata
tag _and_ have exactly the same geometry. This works around tagging
errors with the wikidata tag, which happen because of automated
edits to the wikidata tag.
2020-10-28 10:49:26 +01:00
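The stricter sameness test from this commit can be sketched as a predicate. The `Boundary` class is hypothetical, and comparing coordinate tuples is a simplification of an exact geometry comparison in the database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Boundary:
    """Hypothetical, simplified boundary record."""
    osm_id: int
    wikidata: Optional[str]
    geometry: Tuple[Tuple[float, float], ...]  # ring of (lon, lat) pairs


def is_same_boundary(a, b):
    """Same wikidata tag AND exactly the same geometry."""
    return (a.wikidata is not None
            and a.wikidata == b.wikidata
            and a.geometry == b.geometry)
```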
Sarah Hoffmann
7a16909219 detect and remove admin boundary duplicates
The Polish community maps admin boundaries that span multiple
levels by duplicating the boundary relations. Detect this situation
by looking out for matching wikidata tags. The higher ranked
duplicates are then thrown out from the address pool by setting
their address rank to 0.
2020-10-28 10:49:26 +01:00
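A rough sketch of the duplicate handling, assuming boundaries are plain dicts with hypothetical `wikidata` and `rank_address` keys, and assuming that "higher ranked" means the larger numeric rank. Only the wikidata match from this commit is modelled.

```python
def mark_duplicates(boundaries):
    """Set rank_address to 0 on all but the lowest-ranked boundary per wikidata tag."""
    best = {}
    for boundary in boundaries:
        wikidata = boundary.get("wikidata")
        if wikidata is None:
            continue
        if (wikidata not in best
                or boundary["rank_address"] < best[wikidata]["rank_address"]):
            best[wikidata] = boundary

    result = []
    for boundary in boundaries:
        wikidata = boundary.get("wikidata")
        if wikidata is None or best[wikidata] is boundary:
            result.append(boundary)
        else:
            # Duplicate: remove it from the address pool.
            result.append({**boundary, "rank_address": 0})
    return result
```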
Sarah Hoffmann
b0ef84caae add tests for rank computation 2020-10-17 17:51:22 +02:00
Sarah Hoffmann
64899ef54b add tests for address computation 2020-10-16 11:07:17 +02:00
Sarah Hoffmann
c84e7e72f1 add unknown addr:place to address output
When a POI has no addr:street but an addr:place that is not
contained in the name list of the parent place, then remember
this situation and merge the content of addr:place into the
address output.

We don't need to care about translations in this case because
it is obvious that no object with translations exists if the
parent isn't the object named in addr:place.
2020-09-23 11:55:18 +02:00
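The merge rule might be sketched like this. All names are hypothetical, and placing the `addr:place` value at the front of the address is an assumption; the message does not say where it is inserted.

```python
def address_output(parent_names, parent_address, tags):
    """Build the address lines for a POI from its parent's address."""
    place = tags.get("addr:place")
    if "addr:street" not in tags and place and place not in parent_names:
        # The parent is not the place named in addr:place: keep the tag value.
        return [place] + list(parent_address)
    return list(parent_address)
```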
Sarah Hoffmann
248d6b413a remove test for is_in 2020-09-22 21:36:49 +02:00
Sarah Hoffmann
a8dfbcef44 always bind addr:place to place instead of street
If an addr:place is given but no addr:street tag, then bind
the rank 30 object always to a <=25 object, even when there
is none found with the same name.
2020-09-21 10:15:14 +02:00
Sarah Hoffmann
caea14d035 merge addr tags into search_name table
When a place of rank 30 has addr tags that are not covered by the
search terms of the parent, add a separate entry for the POI in
the search_name table that includes the addr tags. We can only
do that with named places. For POIs without a name the housenumber
is used as name. If that is not available either, searching still
won't work.
2020-09-21 10:15:14 +02:00
Sarah Hoffmann
4c9cfe2532 remove postcodes entirely from indexing
place=postcode places are artificial places that collect addr:postcode
points for aggregation. They should neither show up in the address nor
be searchable. That means that there is no need to index them at all.
Only let boundary=postal_code through which define correct areas for
postcodes.
2020-09-18 15:09:35 +02:00
Sarah Hoffmann
3600709116 make sure that all postcodes have an entry in word
It may happen that two different postcodes normalize to exactly
the same token. In that case we still need two different entries
in the word table. Token lookup will then make sure that the correct
one is chosen.

Fixes #1953.
2020-09-17 17:11:22 +02:00
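The idea can be shown with a small sketch in which a set of `(token, postcode)` pairs stands in for the word table. The normalization (drop whitespace, upper-case) and all function names are assumptions made for illustration.

```python
def normalize_postcode(postcode):
    # Assumed normalization: drop whitespace and upper-case.
    return "".join(postcode.split()).upper()


def add_postcode_words(word_table, postcodes):
    """Add one entry per distinct postcode, even when tokens collide."""
    for postcode in postcodes:
        word_table.add((normalize_postcode(postcode), postcode))
    return word_table


def lookup_postcode(word_table, postcode):
    """Return the entries whose token and original postcode both match."""
    token = normalize_postcode(postcode)
    return [pc for tok, pc in sorted(word_table)
            if tok == token and pc == postcode]
```

Here two differently written postcodes that share a token still get two entries, and the lookup picks the matching one.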
Sarah Hoffmann
b6078de6f8 adapt tests to ranking changes 2020-09-01 18:03:17 +02:00
Sarah Hoffmann
be6ecc388c add support for place=square
Squares are now addressable (on address level 25) and thus can
be attached to a house number via addr:place. Needed to increase
the rank range for matching up addr:place to 25.
2020-08-26 12:12:52 +02:00
Sarah Hoffmann
d730e179bf tests: use larger grid to avoid rounding errors 2020-08-22 16:04:24 +02:00
Sarah Hoffmann
d6ff7475f1 make sure that addr:* tags can always be searched for
Always add contents of addr:* tags into address part of the search
table, even when there is no corresponding other name. This keeps
search tolerant to the kind of tagging where parts show up in the
address that have no corresponding object in the database or where
it is only an unaddressable object.
2020-08-19 11:44:10 +02:00
Sarah Hoffmann
06aa0f0b76 use address rank for address forming when available 2020-08-12 22:22:24 +02:00
Sarah Hoffmann
fb8bb30144 boundary address ranks must not go above 25
Fixes #1914.
2020-08-12 22:22:24 +02:00
Sarah Hoffmann
7429a33818 add simple tests for address rank computation 2020-08-12 22:22:24 +02:00
Sarah Hoffmann
1347abb1e7 be more strict what areas make up an address
Exclude boundaries that touch a line in only one point and
that touch areas only along the boundary.

Fixes #1900.
2020-08-04 12:08:50 +02:00
Sarah Hoffmann
9a204f6284 test: make road really cross the boundary 2020-07-26 15:57:07 +02:00
Sarah Hoffmann
6e4ee160ee adapt tests to new search ranks 2020-06-17 10:53:11 +02:00
Sarah Hoffmann
8218da27b3 adapt tests to new ranks 2020-05-23 19:40:41 +02:00
Sarah Hoffmann
19948c378a adapt tests to new borough ranking 2020-03-30 23:04:20 +02:00
Sarah Hoffmann
78526a33b4 Remove linkees from search_name
Fixes #722
2020-03-04 11:36:39 +01:00
Sarah Hoffmann
6d431aebb7 linked centroids must always be within geometry
When using a linked place as centroid for a boundary, check
first that it is really within the area. If it is outside,
just keep the computed centroid because a centroid outside the
area just causes havoc.

Fixes #1352.
2020-03-04 09:59:57 +01:00
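A minimal sketch of the containment check, assuming the boundary is a single ring of `(x, y)` pairs given as a list and using a plain ray-casting test in place of a database geometry function. The function names are hypothetical.

```python
def point_in_ring(point, ring):
    """Ray-casting test: is the point inside the ring (a list of (x, y) pairs)?"""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def choose_centroid(linked_place, computed_centroid, ring):
    """Use the linked place as centroid only if it lies within the area."""
    if linked_place is not None and point_in_ring(linked_place, ring):
        return linked_place
    return computed_centroid
```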
Sarah Hoffmann
acd8ca2ebd add testing for rank adaption while linking 2020-02-28 15:22:48 +01:00
Sarah Hoffmann
06fdfad89e link against place nodes by place type
If a boundary relation has no label member preferably
link against a place node with the same place type.

Also inherit the rank_address from the place node (only
has an effect when linking via label member or place type).
2020-02-28 15:22:48 +01:00
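The linking preference could be sketched as below. Boundaries and place nodes are modelled as dicts with hypothetical keys (`label`, `place`, `osm_id`, `rank_address`, `linked_place_id`); other fallbacks that the word "preferably" implies are left out.

```python
def link_boundary(boundary, place_nodes):
    """Link via label member if present, otherwise via matching place type."""
    if boundary.get("label") is not None:
        node = next((n for n in place_nodes
                     if n["osm_id"] == boundary["label"]), None)
    else:
        node = next((n for n in place_nodes
                     if n["place"] == boundary.get("place")), None)
    if node is None:
        return boundary
    # Inherit the address rank from the linked place node.
    return {**boundary,
            "linked_place_id": node["osm_id"],
            "rank_address": node["rank_address"]}
```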
Sarah Hoffmann
00ca493f33 move linked place type into linked_place extratags
Using linked_place means that we don't overwrite any
place tags on the boundary. This is important when we
want to use the information for linking.
2020-02-28 15:22:48 +01:00
Sarah Hoffmann
6073d948e6 fix duplicate keys in tests
The tests suddenly failed because the unique key constraint
is more strict and no longer includes the type.
2020-02-12 11:29:33 +01:00