Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-23 13:44:36 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	9a5f75dba7	Merge pull request #2993 from biswajit-k/delete-tags Adds sanitizer for preventing certain tags to enter search index based on parameters	2023-03-09 14:31:45 +01:00
biswajit-k	ca149fb796	Adds sanitizer for preventing certain tags to enter search index based on parameters; fix: pylint error; added docs for delete tags sanitizer; fixed typos in docs and code comments; fix: python typechecking error; fixed rank address type; Revert "fixed typos in docs and code comments" (this reverts commit 6839eea755a87f557895f30524fb5c03dd983d60); added default parameters and refactored code; added test for all parameters	2023-03-09 14:18:39 +05:30
biswajit-k	36388cafe9	fixed typos in docs and code comments	2023-03-06 17:09:38 +05:30
Sarah Hoffmann	2abe9e6fd9	use data paths from new nominatim.paths	2022-11-27 12:15:41 +01:00
Sarah Hoffmann	fd3dec8efe	add sanitizer for TIGER tags Currently only takes over cleaning the tiger:county data. This was done by the import until now.	2022-11-23 10:37:27 +01:00
Sarah Hoffmann	8d082c13e0	adapt to new type annotations from typeshed Some more functions from psycopg are now properly annotated. No ignoring necessary anymore.	2022-08-09 11:06:54 +02:00
Sarah Hoffmann	9864b191b1	fix various typos	2022-07-31 17:10:35 +02:00
Sarah Hoffmann	51b6d16dc6	overhaul the token analysis interface The functional split between the two functions is now that the first one creates the ID that is used in the word table and the second one creates the variants. There no longer is a requirement that the ID is the normalized version. We might later reintroduce the requirement that a normalized version be available but it doesn't necessarily need to be through the ID. The function that creates the ID now gets the full PlaceName. That way it might take into account attributes that were set by the sanitizers. Finally rename both functions to something more sane.	2022-07-29 15:14:11 +02:00
Sarah Hoffmann	34d27ed45c	move PlaceName into the generic data module	2022-07-29 11:42:20 +02:00
Sarah Hoffmann	094100bbf6	harmonize spelling Stick with the American spelling of Analyze.	2022-07-29 10:52:01 +02:00
Sarah Hoffmann	c8873d34af	harmonize interface of token analysis module The configure() function now receives a Transliterator object instead of the ICU rules. This harmonizes the parameters with the create function.	2022-07-29 10:43:07 +02:00
Sarah Hoffmann	f0d640961a	add documentation for custom token analysis	2022-07-29 09:41:28 +02:00
Sarah Hoffmann	3746befd88	add documentation for sanitizer interface Also switches mkdocstrings to 0.18 with the rather unfortunate consequence that now mkdocstrings-python-legacy is needed as well.	2022-07-28 22:00:29 +02:00
Sarah Hoffmann	d819036daa	add support for external token analysis modules	2022-07-25 16:27:22 +02:00
Sarah Hoffmann	6d41046b15	add support for external sanitizer modules	2022-07-25 16:10:19 +02:00
Kian-Meng Ang	f5e52e748f	docs: fix typos	2022-07-20 22:05:31 +08:00
Sarah Hoffmann	83054af46f	remove typing_extensions requirement The typing_extensions package is only necessary now when running mypy. It won't be used at runtime anymore.	2022-07-18 09:55:58 +02:00
Sarah Hoffmann	9963261d8d	add type annotations to special phrase importer	2022-07-18 09:54:29 +02:00
Sarah Hoffmann	6c6bbe5747	add type annotations for ICU tokenizer	2022-07-18 09:47:57 +02:00
Sarah Hoffmann	18b16e06ca	add type annotations for legacy tokenizer	2022-07-18 09:47:57 +02:00
Sarah Hoffmann	e37cfc64d2	add type annotations to ICU tokenizer helper modules	2022-07-18 09:47:57 +02:00
Sarah Hoffmann	d35e3c25b6	add type annotations for token analysis No annotations for ICU types yet.	2022-07-18 09:47:57 +02:00
Sarah Hoffmann	62eedbb8f6	add type hints for sanitizers	2022-07-18 09:47:57 +02:00
Sarah Hoffmann	d0c44431d0	add typing information for place_info and country_info	2022-07-18 09:47:57 +02:00
Sarah Hoffmann	cbbcbb1fd7	move country_info into data submodule	2022-07-06 11:08:36 +02:00
Sarah Hoffmann	bce93d60bd	move PlaceInfo into data submodule This data structure is shared between indexer and tokenizer.	2022-07-06 10:54:47 +02:00
Sarah Hoffmann	612d34930b	handle postcodes properly on word table updates update_postcodes_from_db() needs to do the full postcode treatment in order to derive the correct word table entries.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	5be320368c	add documentation for postcode customization	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	7f2ad4ac7e	fix linting issue	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	0f00f4968c	fix up BDD tests for postcode changes Includes smaller code fixes found by the tests.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	37b2c6a830	port legacy tokenizer to new postcode handling Also documents the changes to the SQL functions of the tokenizer.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	e86db3001f	fix postcode pattern for Mozambique Optional groups are not implemented yet.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	80ea13437d	move postcode matcher in a separate file	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	b7704833e4	icu: switch postcodes to using the pre-formatted one	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	ca7b46511d	introduce and use analyzer for postcodes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	18864afa8a	postcodes: introduce a default pattern for countries without postcodes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	5ba75df507	postcode: generate a generic form	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	9172696324	postcodes: add support for optional spaces	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	baee6f3de0	postcodes: strip leading country codes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	90d4d339db	initial postcode cleaner for simple patterns Moves postcodes that are either in countries without a postcode system or don't correspond to the local pattern for postcodes into a field for a normal address part. Makes them searchable but not as a special address. This has two consequences: they are no longer a skippable part of the address and the postcodes cannot be searched on their own.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	8080625747	remove postcodes from countries that don't have them The postcodes will only be removed as a 'computed postcode'; they are still searchable for the given object.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	1821f68ca0	exclude addr:inclusion from search	2022-05-31 14:19:19 +02:00
Sarah Hoffmann	d14a585cc9	pylint: disable no-self-use check This checker encourages bad behaviour (namely changing the static status of a function during inheritance) and will be made optional in upcoming versions of pylint.	2022-05-11 10:25:00 +02:00
Sarah Hoffmann	bb2bd76f91	pylint: avoid explicit use of format() function Use psycopg2 SQL formatters for SQL and formatted string literals everywhere else.	2022-05-11 09:48:56 +02:00
Sarah Hoffmann	7e70e5f503	always state encoding when opening files in text mode Also applies to Path.write_text().	2022-05-10 15:36:29 +02:00
Sarah Hoffmann	a0ed80d821	restore the tokenizer directory when missing Automatically repopulate the tokenizer/ directory with the PHP stub and the postgresql module when the directory is missing. This allows switching working directories and, in particular, running the service from a different machine than the one where it was installed. Users still need to make sure that .env files are set up correctly or they will shoot themselves in the foot. See #2515.	2022-03-20 11:31:42 +01:00
Sarah Hoffmann	15beeef6ce	do not expand records in select list An expression of the form 'SELECT (func()).*' will be expanded by Postgresql _before_ execution with the result that the function will be called as many times as there are fields in the record. This is not what we want. The function call needs to go into the FROM clause instead.	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	92bc3cd0a7	fix linting issue	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	4a3bbd0319	adapt housenumber cleanup to new word table structure	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	13ed184efd	housenumber analyzer: avoid creating too many variants Housenumber fields with lots of text are likely bad data. So is data with many changes from letter to digit. Exclude them from adding optional spaces.	2022-03-01 09:34:32 +01:00

175 Commits