Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-12-27 23:15:12 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	32ca631b74	fix full term token in special phrases	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e81084f35	complete tests for rule loader	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a0a7b05c9f	correctly quote strings when copying in data Encapsulate the copy string in a class that ensures that copy lines are written with correct quoting.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2f6e4edcdb	update unit tests for adapted abbreviation code	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e3c5d4c5b	adapt tests for ICU tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	8413075249	move abbreviation computation into import phase This adds precomputation of abbreviated terms for names and removes abbreviation of terms in the query. Basic import works but still needs some thorough testing as well as speed improvements during import. New dependency for python library datrie.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	6ba00e6aee	icu tokenizer: move transliteration rules in separate file The tokenizer configuration has become difficult to handle due to the additional manual transliteration rules. Allow to have a separate rule file that is given to the ICU library as is.	2021-07-04 10:28:20 +02:00
AntoJvlt	3676310efe	Improved performance of the postcodes query and some code cleaning	2021-06-12 15:46:08 +02:00
AntoJvlt	1c175e3a67	Clean and update tests for postcodes	2021-06-09 09:31:32 +02:00
AntoJvlt	47fb7cd3a8	Use place_exists() into can_compute() for postcodes	2021-06-09 09:31:32 +02:00
AntoJvlt	a4733eed90	Use place instead of placex to compute postcodes	2021-06-09 09:31:32 +02:00
Sarah Hoffmann	bc981d0261	fix insertion of special terms and countries into word table Special terms need to be prefixed by a space because they are full terms. For countries avoid duplicate entries of word tokens. Adds tests for adding country terms.	2021-06-02 20:22:39 +02:00
Sarah Hoffmann	72625dc72a	call freeze after running and non-updateable import Some of the tables will have already been removed but the tables for indexing are still there and should be dropped.	2021-06-02 11:08:48 +02:00
Sarah Hoffmann	cc2f152d70	commit changes to replication log table Fixes #2350.	2021-05-26 11:47:08 +02:00
Sarah Hoffmann	a0e85cc17c	only initialise tokenizer for refresh functions where needed Fixes #2347.	2021-05-25 19:16:22 +02:00
Sarah Hoffmann	24c986c842	add tests for new full name computation with ICU	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	4f4d15c28a	reorganize keyword creation for legacy tokenizer - only save partial words without internal spaces - consider comma and semicolon a separator of full words - consider parts before an opening bracket a full word (but not the part after the bracket) Fixes #244.	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	fa3e48c59f	use make_keywords for place search terms also Ensures that place indeed uses the same search names as other names.	2021-05-23 23:08:11 +02:00
Sarah Hoffmann	16bb007135	Merge pull request #2336 from lonvia/do-not-mask-error-when-loading-tokenizer Do not hide errors when importing tokenizer	2021-05-18 23:00:10 +02:00
AntoJvlt	799a4c9ab6	Documentation update and small code fixes	2021-05-18 22:35:21 +02:00
Sarah Hoffmann	b2722650d4	do not hide errors when importing tokenizer Explicitly check for the tokenizer source file to check that the name is correct. We can't use the import error for that because it hides other import errors like a missing library. Fixes #2327.	2021-05-18 16:28:21 +02:00
AntoJvlt	3206bf59df	Resolve conflicts	2021-05-17 13:52:35 +02:00
AntoJvlt	8b8dfc46eb	Added --no-replace command for special phrases importation and added corresponding tests	2021-05-17 13:25:06 +02:00
AntoJvlt	06aab389ed	Code cleaning and SPLoader deleted	2021-05-16 16:59:12 +02:00
Sarah Hoffmann	925726222f	Merge pull request #2323 from darkshredder/disable-search-reverse-only Feat: Disabled search API for --reverse-only imports	2021-05-14 10:40:22 +02:00
Sarah Hoffmann	7d621389ee	adapt tests to new TIGER CSV format	2021-05-14 00:02:50 +02:00
Sarah Hoffmann	35efe3b41c	use tokenizer during Tiger data import This also changes the required import format to CSV.	2021-05-14 00:02:50 +02:00
Darkshredder	e5ffc59cd5	feat: Added reverse-only-search validation	2021-05-14 02:36:21 +05:30
Sarah Hoffmann	5feece64c1	use WorkerPool for Tiger data import Requires adding an option that SQL errors are ignored.	2021-05-13 20:36:50 +02:00
Sarah Hoffmann	b9a09129fa	move WorkerPool into db module The pool is independent of the indexer and may also be used by other parts of the software.	2021-05-13 17:11:17 +02:00
Sarah Hoffmann	fc860787dd	do not preload postcodes This is too expensive for updates.	2021-05-13 16:14:12 +02:00
Sarah Hoffmann	63e35574d4	Merge pull request #2324 from lonvia/generic-external-postcodes Rework postcode handling and generalised external postcode support	2021-05-13 14:52:19 +02:00
Sarah Hoffmann	db2dbf15f7	fix token_info migration A bad indent meant that only one table received the new column.	2021-05-13 14:31:41 +02:00
Sarah Hoffmann	f5977dac75	ignore invalid coordinates in external postcodes	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	8f2746fe24	ignore entries without country code	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	1ccd4360b4	correctly handle removing all postcodes for country	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	bf864b2c54	index postcodes after refreshing	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	4abaf71234	add and extend tests for new postcode handling	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	a4aba23a83	move filling of postcode table to python The Python code now takes care of reading postcodes from placex, enhancing them with potentially existing external postcodes and updating location_postcodes accordingly. The initial setup and updates use exactly the same function. External postcode handling has been generalized. External postcodes for any country are now accepted. The format of the external postcode file has changed. We now expect CSV, potentially gzipped. The postcodes are no longer saved in the database.	2021-05-13 14:15:42 +02:00
AntoJvlt	9d83da830f	Introduction of SPCsvLoader to load special phrases from a csv file	2021-05-10 23:26:39 +02:00
AntoJvlt	00959fac57	Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader	2021-05-10 21:49:31 +02:00
Sarah Hoffmann	872ab91421	fix name of transliterator Should be different from the normalisation rules.	2021-05-05 17:09:38 +02:00
Sarah Hoffmann	a263e54b94	enable BDD tests for different tokenizers The tokenizer to be used can be choosen with -DTOKENIZER. Adapt all tests, so that they work with legacy_icu tokenizer. Move lookup in word table to a function in the tokenizer. Special phrases are temporarily imported from the wiki until we have an implementation that can import from file. TIGER tests do not work yet.	2021-05-05 10:31:51 +02:00
Sarah Hoffmann	18c99a5c5f	add unit tests for legacy ICU tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	d55fc39275	cache translieration results	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	ba8ed7967d	add PHP part for new ICU-base tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	f44af49df9	add Python part for new ICU-based tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	36c624ec71	commit between migrations Later migrations may require tables set up by older ones.	2021-05-01 10:47:35 +02:00
Sarah Hoffmann	7fd871a74d	increase database version for tokenizer migration	2021-05-01 10:47:35 +02:00
Sarah Hoffmann	ced8f0f4a2	fix liniting issues	2021-04-30 17:59:50 +02:00

1 2 3 4 5 ...

306 Commits