Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-22 21:28:10 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	fd3dec8efe	add sanitizer for TIGER tags Currently only takes over cleaning the tiger:county data. This was done by the import until now.	2022-11-23 10:37:27 +01:00
Sarah Hoffmann	b6ff697ff0	add experimental option for enabling forward dependencies	2022-11-21 14:48:00 +01:00
Sarah Hoffmann	d63d7cb9a8	remove dependent territories from country list Removes territories of US, France, Australia and Netherlands from the country list. These territories have their own country code (which is why they are in the list in the first place) but are mapped as part of the admin_level 2 relations for the respective parent countries. Therefore they never had any places attached. In practical terms, the change only affects the number of tables created.	2022-11-15 11:37:30 +01:00
Sarah Hoffmann	63a9bc94f7	fix country handling in flex style If the country tag does not match a 2-letter code, it needs to be dropped.	2022-11-10 15:52:13 +01:00
Sarah Hoffmann	3683cf7ddc	optimise tag match function	2022-11-10 09:38:25 +01:00
Sarah Hoffmann	51ed55cc32	initial flex import scripts Only implements the extratags style for the moment. Tests pass for the same behaviour as the gazetteer output. Updates still need to be done.	2022-11-10 09:37:38 +01:00
Sarah Hoffmann	536f08f33a	ignore 5+ postcodes in the US for now Hierarchical postcodes need a different treatment.	2022-06-24 19:24:22 +02:00
Sarah Hoffmann	e86db3001f	fix postcode pattern for Mozambique Optional groups are not implemented yet.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	ca7b46511d	introduce and use analyzer for postcodes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	18864afa8a	postcodes: introduce a default pattern for countries without postcodes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	9cf700e85d	add postcodes for most of the remaining countries Now includes all postcodes that have optional parts.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	9172696324	postcodes: add support for optional spaces	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	49626ba709	add postcode formats with optional country code If the country code is not part of the mandatory output, the country code filter will do the correct handling.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	28ab2f6048	add postcodes patterns without optional spaces	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	6e0014e138	add postcode patterns for numeric postcodes Adds patterns for countries that have simple numeric-only postcodes.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	8080625747	remove postcodes from countries that don't have them The postcodes will only be removed as a 'computed postcode'; they are still searchable for the given object.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	21fb501699	add info about countries without a postcode	2022-06-23 23:42:31 +02:00
bgo-eiu	04644102f2	added additional languages for pakistan in country settings	2022-06-16 06:26:44 -04:00
Sarah Hoffmann	8a67ddcb2b	remove county nodes in Canada from addresses Canada has complete coverage for administrative boundaries on county level. Removing the county nodes from the addresses avoids errors due to a wide-spread doubling of place nodes for city counties.	2022-05-18 10:19:05 +02:00
Sarah Hoffmann	4002bee0c1	make ICU the default tokenizer	2022-05-10 12:02:50 +02:00
Sarah Hoffmann	9d468f6da0	support arbitrary prefixes in country name list This means we can now get rid of the last special cases for names.	2022-05-09 11:55:26 +02:00
Sarah Hoffmann	3a8ddf736e	move country names into separate include files	2022-05-09 11:55:26 +02:00
Sarah Hoffmann	63dc4b39bc	ICU: better letter identification in normalization The Letter class does not include non-spacing marks that can also have a consonant or vowel meaning, especially in Indian languages. Use the alnum property instead which includes them all. Also include the vowel-canceling Virama, which is not a letter by itself but changes the transliteration.	2022-04-28 18:23:17 +02:00
Sarah Hoffmann	fd4ab3f262	Merge pull request #2629 from tareqpi/country-names-yaml-configuration Move default country names into yaml configuration	2022-04-04 09:04:25 +02:00
Tareq Al-Ahdal	7bb7ed468a	fix storing of escape sequences in database	2022-03-24 13:18:44 +08:00
Sarah Hoffmann	42f0282f14	remove special case for operator names The OSM data has been sufficiently cleaned up by now that the operator no longer needs to be considered a name tag. Use 'brand' as the searchable alternative.	2022-03-18 10:48:53 +01:00
Tareq Al-Ahdal	456d439e97	Reformatting of country keys	2022-03-18 02:23:11 +08:00
Tareq Al-Ahdal	165d17f7f7	reintroduce 'name:' prefix to country name keys	2022-03-13 18:58:27 +08:00
Tareq Al-Ahdal	8b6652a40b	move default country names into yaml configuration	2022-03-12 15:17:01 +08:00
Marc Tobias	1fcc9717bb	documentation: clarify osm2pgsql isn't in project directory by default	2022-03-10 14:16:12 +01:00
Sarah Hoffmann	f03a05f6bb	add new analyser for housenumbers This analyser makes spaces optional.	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	855909b4e9	add 'healthcare' as main tag Given that the tag is most of the time duplicated by an amenity tag which is already imported, only import it as a fallback when there is no name. Fixes #2609.	2022-02-21 11:52:17 +01:00
Sarah Hoffmann	610f2cc254	sanitizer: move helpers into a configuration class	2022-02-07 10:48:00 +01:00
Sarah Hoffmann	a79a3210e6	implement is-a-name option for housenumbers	2022-02-07 09:27:11 +01:00
Sarah Hoffmann	f3c9578bca	complete documentation for new clean-housenumbers sanitizer	2022-01-20 15:49:32 +01:00
Sarah Hoffmann	206ee87188	factor out housenumber splitting into sanitizer	2022-01-19 17:27:50 +01:00
Sarah Hoffmann	b453b0ea95	introduce mutation variants to generic token analyser Mutations are regular-expression-based replacements that are applied after variants have been computed. They are meant to be used for variations on character level. Add spelling variations for German umlauts.	2022-01-18 11:09:21 +01:00
Sarah Hoffmann	2034ed387b	make ISO3166-2 references searchable	2022-01-13 09:44:42 +01:00
Sarah Hoffmann	fb54bd3fcf	consider "modifier letter apostrophe" to be punctuation While technically being a letter, the apostrophe is often replaced with a normal apostrophe in writing which is a punctuation mark. This makes sure that the modifier letter apostrophe yields the same normalization results and thus is really interchangeable. Only has an effect after the next reimport. Fixes #2569.	2022-01-10 17:40:03 +01:00
Sarah Hoffmann	5e792078b3	remove some odd variants of addr:street from the styles Some import has added names in partial tags which confuse the street name matching.	2021-12-06 15:17:00 +01:00
Sarah Hoffmann	80e0a3cce4	change default rank for highway objects to 30 The highway key is being used more and more for non-ways these days. This clashes with Nominatim's assumption that essentially everything that has a highway tag can be used as the street part of the address. Change the default rank of highway objects to 30 to avoid this. Only the known values for streets keep the rank 26 and are now listed explicitly.	2021-11-24 22:10:40 +01:00
Sarah Hoffmann	1886952666	avoid special characters in word tokens Transliteration should only consist of ASCII letters and numbers. Avoid any other characters.	2021-11-10 17:14:13 +01:00
Sarah Hoffmann	0ae8d7ac08	have ADDRESS_LEVEL_CONFIG use load_sub_configuration This means that relative paths now are looked up in the project directory.	2021-10-22 16:36:52 +02:00
Sarah Hoffmann	c77df2d1eb	replace NOMINATIM_PHRASE_CONFIG with command line option	2021-10-22 14:41:14 +02:00
Sarah Hoffmann	f4acfed48f	add extended documentation of settings	2021-10-18 16:30:52 +02:00
Sarah Hoffmann	97a10ec218	apply variants by languages Adds a tagger for names by language so that the analyzer of that language is used. Thus variants are now only applied to names in the specific language and only tag name tags, no longer to reference-like tags.	2021-10-06 11:09:54 +02:00
Sarah Hoffmann	7cfcbacfc7	make token analyzers configurable modules Adds a mandatory section 'analyzer' to the token-analysis entries which defines which analyser to use. Currently there is exactly one, generic, which implements the former ICUNameProcessor.	2021-10-04 17:37:34 +02:00
Sarah Hoffmann	52847b61a3	extend ICU config to accommodate multiple analysers Adds parsing of multiple variant lists from the configuration. Every entry except one must have a unique 'id' parameter to distinguish the entries. The entry without id is considered the default. Currently only the list without an id is used for analysis.	2021-10-04 16:40:28 +02:00
Sarah Hoffmann	8171fe4571	introduce sanitizer step before token analysis Sanitizer functions allow transforming name and address tags before they are handed to the tokenizer. These transformations are visible only for the tokenizer and thus only have an influence on the search terms and address match terms for a place. Currently two sanitizers are implemented which are responsible for splitting names with multiple values and removing bracket additions. Both were previously hard-coded in the tokenizer.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	7f3b05c179	adjust address levels for boundaries in Slovakia Levels chosen according to OSM wiki. Mainly moves admin_level 6 to county level and admin_level 8 to city/town level. Higher levels are adjusted accordingly. Fixes #2453.	2021-09-27 23:32:11 +02:00


210 Commits