Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-25 19:35:02 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	fea4dbba50	inherit tags from interpolation not parent Nodes on an interpolation now only get the address tags of interpolations and then compute their own parent from that. They no longer inherit the parent directly.	2022-01-27 11:14:55 +01:00
Sarah Hoffmann	9f64c34f1a	optimize indexes for interpolation lines Do not index 'inactive' rows (with startnumber is null) where possible.	2022-01-27 11:14:55 +01:00
Sarah Hoffmann	638ed15ada	improve handling von updates on nodes in interpolations Use the same update mechanism as for updates on the interpolations themselves. Updates must solely happen in place_insert as this is the place where actual changes of the data happen.	2022-01-27 11:14:55 +01:00
Sarah Hoffmann	c0d8b95f67	update interpolations instead of deleting and recreating	2022-01-27 11:14:55 +01:00
Sarah Hoffmann	b44493e7f2	reorganise place_insert trigger Code cleanup and formatting as well as minor improvements, in particular removal of unnecessary code.	2022-01-24 09:12:50 +01:00
Sarah Hoffmann	c3788d765e	add consistent SPDX copyright headers	2022-01-03 16:23:58 +01:00
Sarah Hoffmann	5e435b41ba	ICU: matching any street name will do again	2021-12-06 14:26:08 +01:00
Sarah Hoffmann	b1d490ea53	add index for Tiger housenumber queries	2021-11-24 11:10:20 +01:00
Sarah Hoffmann	85797acf1e	ICU: add an index over word_ids Needed for keyword lookup in the details response.	2021-10-25 21:33:27 +02:00
Sarah Hoffmann	e8e2502e2f	make word recount a tokenizer-specific function	2021-10-19 11:21:16 +02:00
Sarah Hoffmann	3649487f5e	use SP-GIST index for building index where available Point-in-polygon queries are much faster with a SP-GIST geometry index, so use that for the index used to check if a housenumber is inside a building. Only available with Postgis 3. There is an automatic fallback to GIST for Postgis 2.	2021-10-10 21:55:38 +02:00
Sarah Hoffmann	be65c8303f	export more data for the tokenizer name preparation Adds class, type, country and rank to the exported information and removes the rather odd hack for countries. Whether a place represents a country boundary can now be computed by the tokenizer.	2021-09-29 11:54:14 +02:00
Sarah Hoffmann	40f9d52ad8	Merge pull request #2454 from lonvia/sort-out-token-assignment-in-sql ICU tokenizer: switch match method to using partial terms	2021-09-28 09:45:15 +02:00
Sarah Hoffmann	bd7c7ddad0	icu tokenizer: switch to matching against partial names When matching address parts from addr:* tags against place names, the address names where so far converted to full names and compared those to the place names. This can become problematic with the new ICU tokenizer once we introduce creation of different variants depending on the place name context. It wouldn't be clear which variant to produce to get a match, so we would have to create all of them. To work around this issue, switch to using the partial terms for matching. This introduces a larger fuzziness between matches but that shouldn't be a problem because matching is always geographically restricted. The search terms created for address parts have a different problem: they are already created before we even know if they are going to be used. This can lead to spurious entries in the word table, which slows down searching. This problem can also be circumvented by using only partial terms for the search terms. In terms of searching that means that the address terms would not get the full-word boost, but given that the case where an address part does not exist as an OSM object should be the exception, this is likely acceptable.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	59fe74ddf6	move name matching into tokenizer module Instead of requesting the match tokens from the tokenizer when looking for parent streets/places and address parts, hand in the saved tokens and ask if they match. This gives the tokenizer more freedom to decide how name matching should be done.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	6d7c067461	force update on rank30 children when place name changes Name changes may have an effect on parenting. Don't update surrounding rank30 objects with addr:place tags as this is potentially too expensive.	2021-09-27 11:04:17 +02:00
Sarah Hoffmann	316205e455	force update of surrounding houses when street name changes When the street changes its name then this may cause changes in the parenting of rank-30 objects with an addr:street tag. Fixes #2242.	2021-09-27 10:22:41 +02:00
Sarah Hoffmann	56124546a6	fix dynamic assignment of address parts A boolean check for dynamic changes of address parts is not sufficient. The order of choice should be: 1. an addr:* part matches the name 2. the address part surrounds the object 3. the address part was declared as isaddress The implementation uses a slightly different ordering to avoid geometry checks unless strictly necessary (isaddress is false and no matching address). See #2446.	2021-09-19 12:34:39 +02:00
Sarah Hoffmann	28ee3d0949	move linking of places to the preparation stage Linked places may bring in extra names. These names need to be processed by the tokenizer. That means that the linking needs to be done before the data is handed to the tokenizer. Move finding the linked place into the preparation stage and update the name fields. Everything else is still done in the indexing stage.	2021-08-20 22:44:17 +02:00
Sarah Hoffmann	118858a55e	rename legacy_icu tokenizer to icu tokenizer The new icu tokenizer is now no longer compatible with the old legacy tokenizer in terms of data structures. Therefore there is also no longer a need to refer to the legacy tokenizer in the name.	2021-08-17 23:11:47 +02:00
Sarah Hoffmann	1db098c05d	reinstate word column in icu word table Postgresql is very bad at creating statistics for jsonb columns. The result is that the query planer tends to use JIT for queries with a where over 'info' even when there is an index.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	e42878eeda	adapt unit test for new word table Requires a second wrapper class for the word table with the new layout. This class is interface-compatible, so that later when the ICU tokenizer becomes the default, all tests that depend on behaviour of the default tokenizer can be switched to the other wrapper.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	70f154be8b	switch word tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5394b1fa1b	switch postcode tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5ab0a63fd6	switch housenumber tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	1618aba5f2	switch country name tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	8377528952	new word table layout for icu tokenizer The table now directly reflects the different token types. Extra information is saved in a json structure that may be dynamically extended in the future without affecting the table layout.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	2c8242c8df	remove special code for pre9.5 postgresql 9.5 is now the minimum requirement.	2021-07-19 10:24:57 +02:00
Sarah Hoffmann	8413075249	move abbreviation computation into import phase This adds precomputation of abbreviated terms for names and removes abbreviation of terms in the query. Basic import works but still needs some thorough testing as well as speed improvements during import. New dependency for python library datrie.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	e7b4fc70e7	make sure old data gets deleted on place type change When changing from some other place type to place=postcode make sure that the old place type entry in the place table is deleted.	2021-06-18 10:58:41 +02:00
Sarah Hoffmann	457982e1d2	update postcode in place if it already exists	2021-06-18 00:28:52 +02:00
AntoJvlt	ddf866c4c7	Always delete old placex entry for type=postcode when inserting a new one into the place table	2021-06-12 15:35:51 +02:00
AntoJvlt	9e07a197e9	Handle postcode type change in place insert trigger	2021-06-09 09:31:32 +02:00
AntoJvlt	a4733eed90	Use place instead of placex to compute postcodes	2021-06-09 09:31:32 +02:00
Sarah Hoffmann	f74dc38766	always compute guessed postcode for POIs from centroid When guessing postcodes from the area, only postcodes within that area are accepted. For POIs that is usually not what we want as the postcode would have to be within a house for example. Fixes #2301.	2021-05-26 11:15:13 +02:00
Sarah Hoffmann	4f4d15c28a	reorganize keyword creation for legacy tokenizer - only save partial words without internal spaces - consider comma and semicolon a separator of full words - consider parts before an opening bracket a full word (but not the part after the bracket) Fixes #244.	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	35efe3b41c	use tokenizer during Tiger data import This also changes the required import format to CSV.	2021-05-14 00:02:50 +02:00
Sarah Hoffmann	a4aba23a83	move filling of postcode table to python The Python code now takes care of reading postcodes from placex, enhancing them with potentially existing external postcodes and updating location_postcodes accordingly. The initial setup and updates use exactly the same function. External postcode handling has been generalized. External postcodes for any country are now accepted. The format of the external postcode file has changed. We now expect CSV, potentially gzipped. The postcodes are no longer saved in the database.	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	f44af49df9	add Python part for new ICU-based tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	388ebcbae2	move index creation for word table to tokenizer This introduces a finalization routing for the tokenizer where it can post-process the import if necessary.	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	7cb7cf848d	move amenity creation to tokenizer The BDD tests still use the old-style amenity creation scripts because we don't have simple means to import a hand-crafted test file of special phrases right now.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	bef300305e	move default country name creation to tokenizer The new function is also used, when a country us updated. All SQL function related to country names have been removed.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	0ba93e5ba9	reorganise address iteration in tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	0da481f207	remove debug code	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	d75a235c1f	use address tokens in SQL	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	ffc2d82b0e	move postcode normalization into tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	d8ed1bfc60	move houseunumber handling to tokenizer Normalization and token computation are now done in the tokenizer. The tokenizer keeps a cache to the hundred most used house numbers to keep the numbers of calls to the database low.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	d711f5a81e	move name token creation into tokenizer Name tokens are now handed in via token_info and used from there. Also moves the generic search name insertion function back to placex_triggers.sql.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	1b1ed820c3	introduce index for finding surrounding buildings	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	a73711f3cd	add extra column for tokenizer Add a jsonb column to the placex and location_property_osmline tables which can be used by the installed tokenizer as required. No other part of the software will use or otherwise rely on this column.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	9397bf54b8	introduce external processing in indexer Indexing is now split into three parts: first a preparation step that collects the necessary information from the database and returns it to Python. In a second step the data is transformed within Python as necessary and then returned to the database through the usual UPDATE which now not only sets the indexed_status but also other fields. The third step comprises the address computation which is still done inside the update trigger in the database. The second processing step doesn't do anything useful yet.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	fbbdd31399	move word table and normalisation SQL into tokenizer Creating and populating the word table is now the responsibility of the tokenizer. The get_maxwordfreq() function has been replaced with a simple template parameter to the SQL during function installation. The number is taken from the parameter list in the database to ensure that it is not changed after installation.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	185d369404	remove support for AUX housenumber tables These tables have never been actively maintained and the code is completely untested. With the upcomming changes, it is unlikely that the code remains usable. This removes the aux tables and all code that references them.	2021-04-30 10:08:29 +02:00
Sarah Hoffmann	b88b952f56	simplify token precomputation Rename function to reflect that it is only used for precomputation. The token IDs are not really needed, so don't bother to compute the array of tokens.	2021-04-19 17:24:19 +02:00
Sarah Hoffmann	d68b02d36a	remove unused word recomputation script Has been replaced by a script recomputing counts from search_name.	2021-04-19 16:40:57 +02:00
Sarah Hoffmann	830e3be1e6	Merge pull request #2281 from changpingc/changping/fix-tiger-index fix index on location_property_tiger (parent_place_id)	2021-04-19 08:42:59 +02:00
Channgping Chen	29a314a092	fix index on location_property_tiger (parent_place_id) Looks like `2af82975cd` accidentally renamed an index. Because of the added "if not exists" clause, the index doesn't get created. This significantly slows down reverse queries because they now require full scans on location_property_tiger. Without this fix, reverse queries can take 8s on a full planet install on an r5.8xlarge instance in EC2.	2021-04-19 00:33:15 +00:00
Sarah Hoffmann	e7266b52ae	simplify name matching between boundary and place node Instead of normalising the names simply compare them in lower case. This removes the dependency on the tokenizer for linking boundaries and nodes. When looking up the linked places by place type also allow that one name is simply contained in the other. This catches the frequent case where one of the names has an addendum (e.g. Newport vs. City of Newport). Drops the special index for the name lookup and insted relies on a slightly extended version of the geometry index used for reverse lookup. Saves around 100MB on a planet.	2021-04-14 17:52:59 +02:00
Sarah Hoffmann	6cbef84cad	use new transliteration in initial housenumber word computation The new create_housenumber_id() function splits housenumber lists correctly. Otherwise there is no difference.	2021-04-04 15:26:47 +02:00
Sarah Hoffmann	55fcc44c8c	correctly handle housenumber lists Lists are now standardised to use a semicolon separator.	2021-04-04 15:26:47 +02:00
Sarah Hoffmann	16a66b5326	move transliteration of housenumbers into indexing Housenumbers are now saved in transliterated form in the housenumber column. This saves the transliteration step during lookup.	2021-04-04 15:26:47 +02:00
Sarah Hoffmann	0ec3fdd3ba	return housenumbers always from address field This means that we can use normalized versions of the housenumber in the housenumber field as it is no longer a user visible field.	2021-04-04 15:26:47 +02:00
Sarah Hoffmann	8d8b1d4307	use non-key index to speed up housenumber search On Postgresql versions 11+ add an index to speed up the lookup of housenumbers for terms found in search_name. This is really just a band-aid around the query planer's interpretation of the query.	2021-04-01 17:10:44 +02:00
Sarah Hoffmann	5dabc0aca8	create postcode id index earlier Now that the indexer takes care of indexing the postcode tables, the id index is needed to find the rows to index.	2021-03-22 22:24:56 +01:00
Darkshredder	2af82975cd	Ported tiger-data-import to python and Added Tarball Support	2021-03-08 21:57:56 +05:30
Sarah Hoffmann	09f4d767e4	port index creation to python Also switches to jinja-based preprocessing, which allows to simplify the SQL files. Use 'if not exists' where possible so that the step can be rerun to fix missing indexes.	2021-03-04 11:11:47 +01:00
Sarah Hoffmann	eacabb0e96	move table creation to jinja-based preprocessing	2021-03-03 22:07:51 +01:00
Sarah Hoffmann	d2bd6aa78d	introduce jinja2 for preprocessing SQL Replaces various hand-crafted replacements of varying format with a single Jinja2 templating mechanism. Allows full access to configuration if necessary.	2021-03-03 17:51:08 +01:00
Sarah Hoffmann	976c5e9121	introduce table for in-database properties Adds a simple table where settings for the database can be saved. This is useful for state that must not change after import.	2021-03-01 16:09:17 +01:00
Sarah Hoffmann	b9517c99ae	rename sql directory to lib-sql Also introduces a separate constant for the sql directory, so that it can be put separately from the rest of the data if required.	2021-02-09 15:26:56 +01:00

1 2 3 4

170 Commits