Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-12-24 13:31:37 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	f74228830d	bdd: run full import on tests This uncovered a couple of outdated/wrong tests which have been fixed, too.	2022-02-24 14:27:51 +01:00
Sarah Hoffmann	0e11ca9b76	add test that interpolations are found by odd/even	2022-02-10 11:23:51 +01:00
Sarah Hoffmann	a79a3210e6	implement is-a-name option for housenumbers	2022-02-07 09:27:11 +01:00
Sarah Hoffmann	6b89624f33	adapt frontend to new interpolation table layout	2022-01-27 11:14:55 +01:00
Sarah Hoffmann	4b28b4fed4	adapt BDD tests for new interpolation style	2022-01-27 11:14:55 +01:00
Sarah Hoffmann	206ee87188	factor out housenumber splitting into sanitizer	2022-01-19 17:27:50 +01:00
Sarah Hoffmann	b453b0ea95	introduce mutation variants to generic token analyser Mutations are regular-expression-based replacements that are applied after variants have been computed. They are meant to be used for variations on character level. Add spelling variations for German umlauts.	2022-01-18 11:09:21 +01:00
Sarah Hoffmann	f9b56a8581	correctly match abbreviated addr:street This only works when addr:street is abbreviated and the street name isn't. It does not work the other way around.	2021-12-08 21:58:43 +01:00
Sarah Hoffmann	5e435b41ba	ICU: matching any street name will do again	2021-12-06 14:26:08 +01:00
Sarah Hoffmann	80e0a3cce4	change default rank for highway objects to 30 The highway key is being used more and more for non-ways these days. This clashes with Nominatim's assumption that essentially everything that has a highway tag can be used as the street part of the address. Change the default rank of highway objects to 30 to avoid this. Only the known values for streets keep the rank 26 and are now listed explicitly.	2021-11-24 22:10:40 +01:00
Sarah Hoffmann	1722fc537f	bdd: add tests for non-latin scripts	2021-10-26 17:29:03 +02:00
Sarah Hoffmann	97a10ec218	apply variants by languages Adds a tagger for names by language so that the analyzer of that language is used. Thus variants are now only applied to names in the specific language and only tag name tags, no longer to reference-like tags.	2021-10-06 11:09:54 +02:00
Sarah Hoffmann	40f9d52ad8	Merge pull request #2454 from lonvia/sort-out-token-assignment-in-sql ICU tokenizer: switch match method to using partial terms	2021-09-28 09:45:15 +02:00
Sarah Hoffmann	bd7c7ddad0	icu tokenizer: switch to matching against partial names When matching address parts from addr:* tags against place names, the address names were so far converted to full names and those were compared to the place names. This can become problematic with the new ICU tokenizer once we introduce creation of different variants depending on the place name context. It wouldn't be clear which variant to produce to get a match, so we would have to create all of them. To work around this issue, switch to using the partial terms for matching. This introduces a larger fuzziness between matches but that shouldn't be a problem because matching is always geographically restricted. The search terms created for address parts have a different problem: they are already created before we even know if they are going to be used. This can lead to spurious entries in the word table, which slows down searching. This problem can also be circumvented by using only partial terms for the search terms. In terms of searching that means that the address terms would not get the full-word boost, but given that the case where an address part does not exist as an OSM object should be the exception, this is likely acceptable.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	6d7c067461	force update on rank30 children when place name changes Name changes may have an effect on parenting. Don't update surrounding rank30 objects with addr:place tags as this is potentially too expensive.	2021-09-27 11:04:17 +02:00
Sarah Hoffmann	316205e455	force update of surrounding houses when street name changes When the street changes its name then this may cause changes in the parenting of rank-30 objects with an addr:street tag. Fixes #2242.	2021-09-27 10:22:41 +02:00
Sarah Hoffmann	56124546a6	fix dynamic assignment of address parts A boolean check for dynamic changes of address parts is not sufficient. The order of choice should be: 1. an addr:* part matches the name 2. the address part surrounds the object 3. the address part was declared as isaddress The implementation uses a slightly different ordering to avoid geometry checks unless strictly necessary (isaddress is false and no matching address). See #2446.	2021-09-19 12:34:39 +02:00
Sarah Hoffmann	28ee3d0949	move linking of places to the preparation stage Linked places may bring in extra names. These names need to be processed by the tokenizer. That means that the linking needs to be done before the data is handed to the tokenizer. Move finding the linked place into the preparation stage and update the name fields. Everything else is still done in the indexing stage.	2021-08-20 22:44:17 +02:00
Sarah Hoffmann	5f2b9e317a	add tests for US state hacks IL, AS and LA are replaced with the US state in Geocode because the old tokenizer would simply remove the abbreviations otherwise.	2021-08-17 10:49:07 +02:00
Sarah Hoffmann	324b1b5575	bdd tests: do not query word table directly The BDD tests cannot make assumptions about the structure of the word table anymore because it depends on the tokenizer. Use more abstract descriptions instead that ask for specific kinds of tokens.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	f70930b1a0	make compound decomposition pure import feature Compound decomposition now creates a full name variant on import just like abbreviations. This simplifies query time normalization and opens a path for changing abbreviation and compound decomposition lists for an existing database.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	e7b4fc70e7	make sure old data gets deleted on place type change When changing from some other place type to place=postcode make sure that the old place type entry in the place table is deleted.	2021-06-18 10:58:41 +02:00
Sarah Hoffmann	457982e1d2	update postcode in place if it already exists	2021-06-18 00:28:52 +02:00
Sarah Hoffmann	fe11d3cbbd	do not return POIs when dropping house number in query We've previously added searching through rank 30 in a house number search to enable searches for house number+name. This had the unintended side effect that rank 30 objects are also returned in a search that dropped the house number from the query. This is wrong because POIs cannot function as a parent to a house number. This fix drops all rank 30 objects from the results for a house number search if they do not match the requested house number.	2021-06-17 14:21:20 +02:00
Sarah Hoffmann	3aac51c81f	switch BDD tests to always use search API	2021-06-06 15:27:52 +02:00
Sarah Hoffmann	4f4d15c28a	reorganize keyword creation for legacy tokenizer - only save partial words without internal spaces - consider comma and semicolon a separator of full words - consider parts before an opening bracket a full word (but not the part after the bracket) Fixes #244.	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	e1c5673ac3	require tokenizer for indexer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	1fd483643b	add tests for different scripts	2021-04-26 23:01:06 +02:00
Sarah Hoffmann	788baafa26	bdd tests: fix place dependent ranking tests The ranks of places may differ for some countries. Force the place nodes in the test on null island which always uses the default ranking.	2021-04-22 17:31:00 +02:00
Sarah Hoffmann	16a66b5326	move transliteration of housenumbers into indexing Housenumbers are now saved in transliterated form in the housenumber column. This saves the transliteration step during lookup.	2021-04-04 15:26:47 +02:00
Sarah Hoffmann	3590e76a1c	tests for finding non-ascii housenumbers	2021-04-04 15:26:47 +02:00
Sarah Hoffmann	5d656891ba	bdd: convert API tests to smaller test db Changes BDD API tests to restrict themselves to Liechtenstein. One test moved to DB as no appropriate data is available.	2021-01-09 16:59:46 +01:00
Sarah Hoffmann	65d8770b28	update country_names from OSM data Update names in the country_names table on the fly from incoming OSM country data. Adding a small sanity check that the country must be an OSM relation and within the area where we expect the country to be.	2020-12-09 11:38:19 +01:00
Sarah Hoffmann	987d60ccda	place nodes can only be linked once against boundaries If a place node is already linked against a boundary, it should not be used for linking again. It is usually a sign of a mapping error, when there are multiple boundary candidates. This change just avoids inconsistent data in the database, it does not guarantee that the linking is against the more correct boundary.	2020-12-02 15:31:02 +01:00
Sarah Hoffmann	63544db8f9	null entries need to be typed	2020-12-01 14:54:42 +01:00
Sarah Hoffmann	7295cad715	compute address parts for rank 30 objects on the fly Rank 30 objects usually use the address parts of their parent. When the parent has address parts that are areas but not marked as isaddress, then the parent might go through multiple administrative areas. In that case recheck if the right area has been chosen for the object in question instead of relying on isaddress. Note that we really only have to do the recomputation in the case of 'isarea = True and isaddress = False' which hopefully keeps the number of additional geometric operations we have to do to a minimum. There is one more special case to be taken into account here: a street may go through two administrative areas and a house along that street is placed in one of the areas while the addr:* tag says it belongs to the other. In that case we must not switch the isaddress to the one it is situated in. To avoid that, recheck the address names against the name of the area. That is not perfect but should cover most cases. Fixes #328.	2020-12-01 11:58:25 +01:00
Sarah Hoffmann	22800d7d59	Search housenumbers with unknown address parts by housenumber term House numbers need special handling because they may appear after the street term. That means we cannot just use them as the main name for searches where the address has its own search term entries. Doing this right now, we are able to find '40, Main St, Town' but not 'Main St 40, Town'. This switches to using the housenumber token as the name term instead. House number tokens can get special handling when building the search query that covers the case where they come after the street. The main disadvantage is that this once more increases the number of possible search interpretations, of which we already have too many. no penalty for housenumber searches	2020-11-25 11:36:10 +01:00
Sarah Hoffmann	b4b50eef15	search rank 30 must always go with address rank 30	2020-11-24 17:57:28 +01:00
Sarah Hoffmann	49083c2597	Merge pull request #2058 from lonvia/split-address-words Split addr:* tags into words before adding to the search index	2020-11-18 08:58:17 +01:00
Sarah Hoffmann	ffb2c93ba3	POIs with unknown addr:place must add parent name to address The previous behaviour was a left-over from a former version where such POIs parented to the street. Now that they parent to places, it should be included.	2020-11-17 19:44:43 +01:00
Sarah Hoffmann	30a6b6bdac	split addr: tags into words before adding to the search index Address parts are only matched by single partial words. If the addr: names are not split, then multi-word names cannot be found.	2020-11-17 18:03:33 +01:00
Sarah Hoffmann	9ede048769	disallow linking for postcode areas	2020-11-17 10:53:26 +01:00
Sarah Hoffmann	885dc0a8e1	more tests for absence of additional addressline entries	2020-11-16 15:28:01 +01:00
Sarah Hoffmann	7324431b12	get additional addresses for rank 30 objects get_addressdata() now also checks if the place itself has entries in the place_addressline table and merges them into the results. Also restrict checking for address tag places to cases where the name cannot be found in the parent's address search terms. Looking up all address tags is just too slow.	2020-11-16 15:28:01 +01:00
Sarah Hoffmann	021f2bef4c	get address terms from address tags for rank 30 For rank 30 objects add extra elements into the place_addressline table.	2020-11-16 15:28:01 +01:00
Sarah Hoffmann	6260fef2e8	add test for placex from addr tags	2020-11-16 15:28:01 +01:00
Sarah Hoffmann	c7472662a6	lookup places for address tags for rank < 30 While previously the content of addr:* tags was only added to the list of address search keywords, we now really look up the matching place. This has the advantage that we pull in all potential translations from the place, just like all the other address terms that are looked up by neighbourhood search. If no place can be found for a given name, the content of the addr:* tag is still added to the search keywords as before.	2020-11-16 15:28:01 +01:00
Sarah Hoffmann	b2ebf4b4b7	adapt tests to rank changes of natural	2020-11-02 11:42:10 +01:00
Sarah Hoffmann	95f83b90d2	minor fixes for geometry computation during boundary ranking Go back to using centroid when determining if one admin level is within another. There are cases where boundaries are slightly misaligned due to mapping errors (not using the same ways in the relations). Only declare boundaries the same when they have the same wikidata tag _and_ have exactly the same geometry. This works around tagging errors with the wikidata tag, which happen because of automated edits to the wikidata tag.	2020-10-28 10:49:26 +01:00
Sarah Hoffmann	7a16909219	detect and remove admin boundary duplicates The Polish community maps admin boundaries that span multiple levels by duplicating the boundary relations. Detect this situation by looking out for matching wikidata tags. The higher ranked duplicates are then thrown out from the address pool by setting their address rank to 0.	2020-10-28 10:49:26 +01:00
Sarah Hoffmann	b0ef84caae	add tests for rank computation	2020-10-17 17:51:22 +02:00
Sarah Hoffmann	64899ef54b	add tests for address computation	2020-10-16 11:07:17 +02:00
Sarah Hoffmann	c84e7e72f1	add unknown addr:place to address output When a POI has no addr:street but an addr:place that is not contained in the name list of the parent place, then remember this situation and merge the content of addr:place into the address output. We don't need to care about translations in this case because it is obvious that no object with translations exists if the parent isn't the object named in addr:place.	2020-09-23 11:55:18 +02:00
Sarah Hoffmann	248d6b413a	remove test for is_in	2020-09-22 21:36:49 +02:00
Sarah Hoffmann	a8dfbcef44	always bind addr:place to place instead of street If an addr:place is given but no addr:street tag, then bind the rank 30 object always to a <=25 object, even when there is none found with the same name.	2020-09-21 10:15:14 +02:00
Sarah Hoffmann	caea14d035	merge addr tags into search_name table When a place of rank 30 has addr tags that are not covered by the search terms of the parent, add a separate entry for the POI in the search_name table that includes the addr tags. We can only do that with named places. For POIs without a name the housenumber is used as name. If that is not available either, searching still won't work.	2020-09-21 10:15:14 +02:00
Sarah Hoffmann	4c9cfe2532	remove postcodes entirely from indexing place=postcode places are artificial places that collect addr:postcode points for aggregation. They should neither show up in the address nor be searchable. That means that there is no need to index them at all. Only let boundary=postal_code through which define correct areas for postcodes.	2020-09-18 15:09:35 +02:00
Sarah Hoffmann	3600709116	make sure that all postcodes have an entry in word It may happen that two different postcodes normalize to exactly the same token. In that case we still need two different entries in the word table. Token lookup will then make sure that the correct one is chosen. Fixes #1953.	2020-09-17 17:11:22 +02:00
Sarah Hoffmann	b6078de6f8	adapt tests to ranking changes	2020-09-01 18:03:17 +02:00
Sarah Hoffmann	6e4b7eb966	do not block deletion of large highway areas Deletion of areas should only be blocked for addressable features. Streets and POIs do not have a large impact on updates.	2020-08-28 09:49:21 +02:00
Sarah Hoffmann	be6ecc388c	add support for place=square Squares are now addressable (on address level 25) and thus can be attached to a house number via addr:place. Needed to increase the rank range for matching up addr:place to 25.	2020-08-26 12:12:52 +02:00
Sarah Hoffmann	d730e179bf	tests: use larger grid to avoid rounding errors	2020-08-22 16:04:24 +02:00
Sarah Hoffmann	d6ff7475f1	make sure that addr:* tags can always be searched for Always add contents of addr:* tags into address part of the search table, even when there is no corresponding other name. This keeps search tolerant to the kind of tagging where parts show up in the address that have no corresponding object in the database or where it is only an unaddressable object.	2020-08-19 11:44:10 +02:00
Sarah Hoffmann	e21a707166	remove linked_place from extratags when updating Before updating an admin boundary we need to make sure that any artificially generated 'linked_place' entry is removed from the extratags column. This ensures that the place designation does not linger when a linked place disappears and that it is updated when the linking changes.	2020-08-13 16:59:11 +02:00
Sarah Hoffmann	06aa0f0b76	use address rank for address forming when available	2020-08-12 22:22:24 +02:00
Sarah Hoffmann	fb8bb30144	boundary address ranks must not go above 25 Fixes #1914.	2020-08-12 22:22:24 +02:00
Sarah Hoffmann	7429a33818	add simple tests for address rank computation	2020-08-12 22:22:24 +02:00
Sarah Hoffmann	1347abb1e7	be more strict what areas make up an address Exclude boundaries that touch a line in only one point and that touch areas only along the boundary. Fixes #1900.	2020-08-04 12:08:50 +02:00
Sarah Hoffmann	9a204f6284	test: make road really cross the boundary	2020-07-26 15:57:07 +02:00
Sarah Hoffmann	6e4ee160ee	adapt tests to new search ranks	2020-06-17 10:53:11 +02:00
Sarah Hoffmann	8218da27b3	adapt tests to new ranks	2020-05-23 19:40:41 +02:00
Sarah Hoffmann	19948c378a	adapt tests to new borough ranking	2020-03-30 23:04:20 +02:00
Sarah Hoffmann	78526a33b4	Remove linkees from search_name Fixes #722	2020-03-04 11:36:39 +01:00
Sarah Hoffmann	6d431aebb7	linked centroids must always be within geometry When using a linked place as centroid for a boundary, check first that it is really within the area. If it is outside, just keep the computed centroid because a centroid outside the area just causes havoc. Fixes #1352.	2020-03-04 09:59:57 +01:00
Sarah Hoffmann	acd8ca2ebd	add testing for rank adaption while linking	2020-02-28 15:22:48 +01:00
Sarah Hoffmann	06fdfad89e	link against place nodes by place type If a boundary relation has no label member, preferably link against a place node with the same place type. Also inherit the rank_address from the place node (only has an effect when linking via label member or place type).	2020-02-28 15:22:48 +01:00
Sarah Hoffmann	00ca493f33	move linked place type into linked_place extratags Using linked_place means that we don't overwrite any place tags on the boundary. This is important when we want to use the information for linking.	2020-02-28 15:22:48 +01:00
Sarah Hoffmann	6073d948e6	fix duplicate keys in tests The tests suddenly failed because the unique key constraint is more strict and does no longer include the type.	2020-02-12 11:29:33 +01:00
Sarah Hoffmann	d732dc3bb2	update place address levels Adds province and allotments and downgrades hamlet.	2020-01-08 23:53:03 +01:00
Sarah Hoffmann	2bbe5017d4	use bbox of geometry when searching for attached streets As we are doing a distance search, this improves results for large places like airports. Fixes #1442.	2019-07-28 13:28:27 +02:00
Sarah Hoffmann	4c1793b4e3	recreate interpolations when one of their support nodes changes A simple update is not enough because the interpolation splits might change as well as the housenumbers. Fixes #1360.	2019-07-03 23:15:54 +02:00
Sarah Hoffmann	e164d53fcc	adapt tests to new place address ranks	2019-06-30 23:09:43 +02:00
marc tobias	7d9dbd62c7	Support housenumber=0 in interpolations	2019-04-02 15:13:45 +02:00
Sarah Hoffmann	c822012aad	ignore admin boundary ways for countries and states Countries and states are mapped world-wide as relations by now. Fixes #543 and #1291.	2019-01-26 13:37:10 +01:00
Sarah Hoffmann	cc17aa8d6b	Remove postcodes also from word table when they no longer exist Also adds tests for postcode updates. Fixes #1273.	2019-01-04 23:11:47 +01:00
Sarah Hoffmann	52178caa98	fix tests	2018-11-28 23:40:17 +01:00
Sarah Hoffmann	e10d11c6c7	Make rank assignments configurable The initial search and address rank is saved in a table that is set up from a json configuration file. Ranks may be assigned on a country level according to class and type of the object. Special handling that depends on the geometry or OSM type is still hard-coded in placex insert. The new default config file mimics the current assignment as closely as possible. A couple of exceptions have been removed, most notably the exception for Irish townlands.	2018-11-24 16:21:16 +01:00
Sarah Hoffmann	43c2eb383e	Remove country and state nodes from address computation OSM has by now almost complete coverage of admin boundaries up to state level. Place nodes will do more harm than good in this case.	2018-11-17 23:32:08 +01:00
Sarah Hoffmann	48d4ea5542	Do not have postcode node appear in addresses directly Many of the postcode nodes are actually derived from incomplete addresses and are as such not even centroids. Better let them only take part in the address computation via the postcode table.	2018-08-05 14:25:20 +02:00
Sarah Hoffmann	743ec43460	nearest place search should match any of given tokens not all When multiple isin tokens are given, then these are duplicates and it is enough that any one of them is found in the name_vector. Fixes #1056.	2018-06-14 00:11:19 +02:00
Sarah Hoffmann	5182da9f45	add tests for address tag parsing for search name	2018-04-15 22:52:42 +02:00
Sarah Hoffmann	ae83ceab5e	ignore Unicode format characters for normalization Also adds tests. Fixes #1007.	2018-04-10 22:48:17 +02:00
Sarah Hoffmann	36fa21d7ce	fix more behave table formatting errors	2018-02-26 23:41:57 +01:00
Edward Betts	7e00a6e2ff	Correct spelling mistakes.	2018-02-18 13:11:35 +00:00
Sarah Hoffmann	c3483747eb	reimport boundaries from scratch when type is changed Fixes #895.	2018-02-12 21:19:27 +01:00
Sarah Hoffmann	35c7269bac	when linking waterway ways and relations allow all river-like types Fixes #848	2017-12-16 17:00:55 +01:00
Sarah Hoffmann	218b70fd96	test: remove road-fallback test from db tests This should be tested in the api section.	2017-10-03 14:26:08 +02:00
Sarah Hoffmann	7ca5219297	fixup tests	2017-08-19 19:37:06 +02:00
Sarah Hoffmann	e55ac77c94	add simple tests for postcode import	2017-08-19 19:37:06 +02:00
Sarah Hoffmann	a44377c7b0	fix postcode-related tests	2017-08-19 19:37:06 +02:00

175 Commits