Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-09-20 23:48:18 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	ba8ed7967d	add PHP part for new ICU-based tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	f44af49df9	add Python part for new ICU-based tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	3c67bae868	Merge pull request #2310 from RhinoDevel/master 2nd try: Add hint about replication update & recheck intervals being in seconds.	2021-05-04 12:45:26 +02:00
Marc	3dade534fd	Add hint about replication update & recheck intervals being in seconds.	2021-05-04 11:47:15 +02:00
Sarah Hoffmann	8b1a509442	Merge pull request #2305 from lonvia/tokenizer Factor out normalization into a separate module	2021-05-03 09:15:34 +02:00
Sarah Hoffmann	8bdb9aa607	mock tokenizer factory for replication tests	2021-05-01 10:50:39 +02:00
Sarah Hoffmann	36c624ec71	commit between migrations Later migrations may require tables set up by older ones.	2021-05-01 10:47:35 +02:00
Sarah Hoffmann	7fd871a74d	increase database version for tokenizer migration	2021-05-01 10:47:35 +02:00
Sarah Hoffmann	ced8f0f4a2	fix linting issues	2021-04-30 17:59:50 +02:00
Sarah Hoffmann	388ebcbae2	move index creation for word table to tokenizer This introduces a finalization routine for the tokenizer where it can post-process the import if necessary.	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	20891abe1c	indexer: fetch extra place data asynchronously The indexer now fetches any extra data besides the place_id asynchronously while processing the places from the last batch. This also means that more places are now fetched at once.	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	6ce6f62b8e	fetch place info asynchronously	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	602728895e	indexer: fetch ids in batches	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	fc995ea6b9	move database check for module to tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	be6262c6ce	move status test to tokenizer The availability of the module is now tested by the tokenizer.	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	893490f94e	add more tests for legacy tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	044bb6afa5	move tokenization in query into tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	3eb4d88057	boilerplate for PHP code of tokenizer This adds an installation step for PHP code for the tokenizer. The PHP code is split in two parts. The updateable code is found in lib-php. The tokenizer installs an additional script in the project directory which then includes the code from lib-php and defines all settings that are static to the database. The website code then always includes the PHP from the project directory.	2021-04-30 11:31:52 +02:00
Sarah Hoffmann	23fd1d032a	tests for legacy tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	7cb7cf848d	move amenity creation to tokenizer The BDD tests still use the old-style amenity creation scripts because we don't have simple means to import a hand-crafted test file of special phrases right now.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	bef300305e	move default country name creation to tokenizer The new function is also used when a country is updated. All SQL functions related to country names have been removed.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	dc700c25b6	cache all postcodes	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	0ba93e5ba9	reorganise address iteration in tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	0da481f207	remove debug code	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	d75a235c1f	use address tokens in SQL	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	9e92759ac7	extract address tokens in tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	ffc2d82b0e	move postcode normalization into tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	d8ed1bfc60	move housenumber handling to tokenizer Normalization and token computation are now done in the tokenizer. The tokenizer keeps a cache of the hundred most used house numbers to keep the number of calls to the database low.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	d711f5a81e	move name token creation into tokenizer Name tokens are now handed in via token_info and used from there. Also moves the generic search name insertion function back to placex_triggers.sql.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	fa2bc60468	introduce name analyzer The name analyzer is the actual workhorse of the tokenizer. It is instantiated on a per-thread basis and provides all functions for analysing names and queries.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	e1c5673ac3	require tokenizer for indexer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	1b1ed820c3	introduce index for finding surrounding buildings	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	a73711f3cd	add extra column for tokenizer Add a jsonb column to the placex and location_property_osmline tables which can be used by the installed tokenizer as required. No other part of the software will use or otherwise rely on this column.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	9397bf54b8	introduce external processing in indexer Indexing is now split into three parts: first a preparation step that collects the necessary information from the database and returns it to Python. In a second step the data is transformed within Python as necessary and then returned to the database through the usual UPDATE which now not only sets the indexed_status but also other fields. The third step comprises the address computation which is still done inside the update trigger in the database. The second processing step doesn't do anything useful yet.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	fbbdd31399	move word table and normalisation SQL into tokenizer Creating and populating the word table is now the responsibility of the tokenizer. The get_maxwordfreq() function has been replaced with a simple template parameter to the SQL during function installation. The number is taken from the parameter list in the database to ensure that it is not changed after installation.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	b5540dc35c	add migration for configurable tokenizer Adds a migration that initialises a legacy tokenizer for an existing database. The migration is not active yet as it will need completion when more functionality is added to the legacy tokenizer.	2021-04-30 11:29:57 +02:00
Sarah Hoffmann	296a66558f	move module installation to legacy tokenizer	2021-04-30 11:29:57 +02:00
Sarah Hoffmann	af968d4903	introduce tokenizer modules This adds the boilerplate for selecting configurable tokenizers. A tokenizer can be chosen at import time and will then install itself such that it is fixed for the given database import even when the software itself is updated. The legacy tokenizer implements Nominatim's traditional algorithms.	2021-04-30 11:29:57 +02:00
Sarah Hoffmann	5c7b9ef909	Merge pull request #2303 from lonvia/remove-aux-support Remove support for AUX housenumber tables	2021-04-30 11:19:35 +02:00
Sarah Hoffmann	185d369404	remove support for AUX housenumber tables These tables have never been actively maintained and the code is completely untested. With the upcoming changes, it is unlikely that the code remains usable. This removes the aux tables and all code that references them.	2021-04-30 10:08:29 +02:00
Sarah Hoffmann	51d20b19b6	Merge pull request #2299 from lonvia/update-actions Fix database check for reverse-only	2021-04-27 12:18:45 +02:00
Sarah Hoffmann	46e8c6b112	Merge pull request #2291 from AntoJvlt/special-phrases-statistics Special phrases statistics	2021-04-27 11:57:05 +02:00
Sarah Hoffmann	c8fb25201a	do not check for extra housenumber index for reverse-only Also adds a database check for reverse only import to the CI.	2021-04-27 10:14:26 +02:00
Sarah Hoffmann	1fd483643b	add tests for different scripts	2021-04-26 23:01:06 +02:00
Sarah Hoffmann	a21a0864f1	Merge pull request #2298 from lonvia/add-warming-to-ci Add warming to CI import tests and fix more Python 3.5 compatibility issues	2021-04-26 11:21:44 +02:00
Sarah Hoffmann	4457bf7528	avoid Path in subprocess parameters Not supported by Python 3.5.	2021-04-26 10:55:23 +02:00
Sarah Hoffmann	5ed6f18d83	add warming to CI import test	2021-04-26 09:54:09 +02:00
AntoJvlt	abb3d56b20	Switching to log info and only send warning for invalid phrases	2021-04-25 17:57:43 +02:00
AntoJvlt	c5ecb9bae0	Implemented statistics for the import of special phrases through the SpecialPhrasesImporterStatistics class	2021-04-25 17:57:43 +02:00
AntoJvlt	1b68152fb2	reorganization of folder/file for the special phrases importer	2021-04-25 17:57:42 +02:00

3086 Commits