Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-22 12:06:27 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	3683cf7ddc	optimise tag match function	2022-11-10 09:38:25 +01:00
Sarah Hoffmann	51ed55cc32	initial flex import scripts Only implements the extratags style for the moment. Tests pass for the same behaviour as the gazetteer output. Updates still need to be done.	2022-11-10 09:37:38 +01:00
Sarah Hoffmann	536f08f33a	ignore 5+ postcodes in the US for now Hierarchical postcodes need a different treatment.	2022-06-24 19:24:22 +02:00
Sarah Hoffmann	e86db3001f	fix postcode pattern for Mozambique Optional groups are not implemented yet.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	ca7b46511d	introduce and use analyzer for postcodes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	18864afa8a	postcodes: introduce a default pattern for countries without postcodes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	9cf700e85d	add postcodes for most of the remaining countries Now includes all postcodes that have optional parts.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	9172696324	postcodes: add support for optional spaces	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	49626ba709	add postcode formats with optional country code If the country code is not part of the mandatory output, the country code filter will do the correct handling.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	28ab2f6048	add postcodes patterns without optional spaces	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	6e0014e138	add postcode patterns for numeric postcodes Adds patterns for countries that have simple numeric-only postcodes.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	8080625747	remove postcodes from countries that don't have them The postcodes will only be removed as a 'computed postcode' they are still searchable for the given object.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	21fb501699	add info about countries without a postcode	2022-06-23 23:42:31 +02:00
bgo-eiu	04644102f2	added additional languages for pakistan in country settings	2022-06-16 06:26:44 -04:00
Sarah Hoffmann	8a67ddcb2b	remove county nodes in Canada from addresses Canada has complete coverage for administrative boundaries on county level. Removing the county nodes from the addresses avoids error due to a wide-spread doubling of place nodes for city counties.	2022-05-18 10:19:05 +02:00
Sarah Hoffmann	4002bee0c1	make ICU the default tokenizer	2022-05-10 12:02:50 +02:00
Sarah Hoffmann	9d468f6da0	support arbitrary prefixes in country name list This means we can now get rid of the last special cases for names.	2022-05-09 11:55:26 +02:00
Sarah Hoffmann	3a8ddf736e	move country names into separate include files	2022-05-09 11:55:26 +02:00
Sarah Hoffmann	63dc4b39bc	ICU: better letter identification in normalization The Letter class does not include non-spacing marks that can also have a consonant or vowel meaning, especially in Indian languages. Use the alnum property instead which includes them all. Also include the vowel-canceling Virama, which is not a letter by itself but changes the transliteration.	2022-04-28 18:23:17 +02:00
Sarah Hoffmann	fd4ab3f262	Merge pull request #2629 from tareqpi/country-names-yaml-configuration Move default country names into yaml configuration	2022-04-04 09:04:25 +02:00
Tareq Al-Ahdal	7bb7ed468a	fix storing of escape sequences in database	2022-03-24 13:18:44 +08:00
Sarah Hoffmann	42f0282f14	remove special case for operator names The OSM data has been sufficiently cleaned up by now that the operator no longer needs to be considered a name tag. Use 'brand' as the searchable alternative.	2022-03-18 10:48:53 +01:00
Tareq Al-Ahdal	456d439e97	Reformatting of country keys	2022-03-18 02:23:11 +08:00
Tareq Al-Ahdal	165d17f7f7	reintroduce 'name:' prefix to country name keys	2022-03-13 18:58:27 +08:00
Tareq Al-Ahdal	8b6652a40b	move default country names into yaml configuration	2022-03-12 15:17:01 +08:00
Marc Tobias	1fcc9717bb	documentation: clarify osm2pgsql isn't in project directory by default	2022-03-10 14:16:12 +01:00
Sarah Hoffmann	f03a05f6bb	add new analyser for housenumbers This analyser makes spaces optional.	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	855909b4e9	add 'healthcare' as main tag Given that the tag is most of the time duplicated by an amenity tag which is already imported, only import it as a fallback when there is no name. Fixes #2609.	2022-02-21 11:52:17 +01:00
Sarah Hoffmann	610f2cc254	sanitizer: move helpers into a configuration class	2022-02-07 10:48:00 +01:00
Sarah Hoffmann	a79a3210e6	implement is-a-name option for housenumbers	2022-02-07 09:27:11 +01:00
Sarah Hoffmann	f3c9578bca	complete documentation for new clean-housenumbers sanitizer	2022-01-20 15:49:32 +01:00
Sarah Hoffmann	206ee87188	factor out housenumber splitting into sanitizer	2022-01-19 17:27:50 +01:00
Sarah Hoffmann	b453b0ea95	introduce mutation variants to generic token analyser Mutations are regular-expression-based replacements that are applied after variants have been computed. They are meant to be used for variations on character level. Add spelling variations for German umlauts.	2022-01-18 11:09:21 +01:00
Sarah Hoffmann	2034ed387b	make ISO3166-2 references searchable	2022-01-13 09:44:42 +01:00
Sarah Hoffmann	fb54bd3fcf	consider "modifier letter apostrophe" to be punctuation While technically being a letter, the apostrophe is often replaced with a normal apostrophe in writing which is a punctuation mark. This makes sure that the modifier letter apostrophe yields the same normalization results and thus is really interchangeable. Only has an effect after the next reimport. Fixes #2569.	2022-01-10 17:40:03 +01:00
Sarah Hoffmann	5e792078b3	remove some odd variants of addr:street from the styles Some import has added names in partial tags which confuse the street name matching.	2021-12-06 15:17:00 +01:00
Sarah Hoffmann	80e0a3cce4	change default rank for highway objects to 30 The highway key is being used more and more for non-ways these days. This clashes with Nominatim's assumption that essentially everything that has a highway tag can be used as the street part of the address. Change the default rank of highway objects to 30 to avoid this. Only the known values for streets keep the rank 26 and are now listed explicitly.	2021-11-24 22:10:40 +01:00
Sarah Hoffmann	1886952666	avoid special characters in word tokens Transliteration should only consist of ASCII letters and numbers. Avoid any other characters.	2021-11-10 17:14:13 +01:00
Sarah Hoffmann	0ae8d7ac08	have ADDRESS_LEVEL_CONFIG use load_sub_configuration This means that relative paths now are looked up in the project directory.	2021-10-22 16:36:52 +02:00
Sarah Hoffmann	c77df2d1eb	replace NOMINATIM_PHRASE_CONFIG with command line option	2021-10-22 14:41:14 +02:00
Sarah Hoffmann	f4acfed48f	add extended documentation of settings	2021-10-18 16:30:52 +02:00
Sarah Hoffmann	97a10ec218	apply variants by languages Adds a tagger for names by language so that the analyzer of that language is used. Thus variants are now only applied to names in the specific language and only tag name tags, no longer to reference-like tags.	2021-10-06 11:09:54 +02:00
Sarah Hoffmann	7cfcbacfc7	make token analyzers configurable modules Adds a mandatory section 'analyzer' to the token-analysis entries which defines which analyser to use. Currently there is exactly one, generic, which implements the former ICUNameProcessor.	2021-10-04 17:37:34 +02:00
Sarah Hoffmann	52847b61a3	extend ICU config to accommodate multiple analysers Adds parsing of multiple variant lists from the configuration. Every entry except one must have a unique 'id' parameter to distinguish the entries. The entry without id is considered the default. Currently only the list without an id is used for analysis.	2021-10-04 16:40:28 +02:00
Sarah Hoffmann	8171fe4571	introduce sanitizer step before token analysis Sanitizer functions allow transforming name and address tags before they are handed to the tokenizer. These transformations are visible only to the tokenizer and thus only have an influence on the search terms and address match terms for a place. Currently two sanitizers are implemented which are responsible for splitting names with multiple values and removing bracket additions. Both were previously hard-coded in the tokenizer.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	7f3b05c179	adjust address levels for boundaries in Slovakia Levels chosen according to OSM wiki. Mainly moves admin_level 6 to county level and admin_level 8 to city/town level. Higher levels are adjusted accordingly. Fixes #2453.	2021-09-27 23:32:11 +02:00
Sarah Hoffmann	09b1db63f4	adjust address ranks for Spain Adjusts levels for boundaries according to the list on https://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative * no admin_level 5, so drop that from addresses * admin_level 6 has the province * admin_level 7 has the county when it exists Also reranks place=province so that it matches up with admin_level 6 and introduces place=civil_parish which is used as a place node for some admin_level=9 boundaries in Galicia.	2021-09-24 18:39:44 +02:00
Sarah Hoffmann	79da96b369	read partition and languages from config file	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	0b349761a8	add country configuration The new configuration saves the default language(s) originally maintained in the OSM wiki as well as the partition information.	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	b7d4ff3201	icu: normalise simplified to traditional chinese The conversion is unambiguous in most cases, so that the information loss is minimal.	2021-08-31 11:18:34 +02:00
Sarah Hoffmann	118858a55e	rename legacy_icu tokenizer to icu tokenizer The new icu tokenizer is now no longer compatible with the old legacy tokenizer in terms of data structures. Therefore there is also no longer a need to refer to the legacy tokenizer in the name.	2021-08-17 23:11:47 +02:00
Sarah Hoffmann	8bc3c0a07c	Merge pull request #2382 from lonvia/remove-json-config Remove outdated ICU tokenizer JSON config	2021-07-05 12:34:34 +02:00
Sarah Hoffmann	fd8751658f	exclude name:etymology and name:signed name:etymology contains a description of the name origin and is thus more informative than search-worthy. name:signed basically indicates that the feature does not have a name.	2021-07-05 11:04:16 +02:00
Sarah Hoffmann	4db5a1a0b8	remove outdated ICU tokenizer JSON config	2021-07-05 11:01:35 +02:00
Sarah Hoffmann	e85f7e7aa9	fix subsequent replacements Two replacement words directly following each other did not work as expected because each expects a space at the beginning/end while there was only one space available. Also forbid composing a word after a space was added in the end by a previous replacement.	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	0894ce9dc3	import abbreviations from OSM Wiki Replaces the variant rules with a slightly cleaned-up version of the abbreviation lists at https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	4fd2e961b6	improve normalization Make sure all special symbols are removed during normalization already. Those won't be interpreted in any way because they are unlikely to be searched for.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	62828fc5c1	switch to a more flexible variant description format The new format combines compound splitting and abbreviation. It also allows to restrict rules to additional conditions (like language or region). This latter ability is not used yet.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a6aa6360e0	use yaml tag syntax to mark include files	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	1bd9f455fc	add abbreviations from legacy tokenizer These abbreviations are not a perfect fit anymore because abbreviation replacement is now applied before transliteration.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	8413075249	move abbreviation computation into import phase This adds precomputation of abbreviated terms for names and removes abbreviation of terms in the query. Basic import works but still needs some thorough testing as well as speed improvements during import. New dependency for python library datrie.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	6ba00e6aee	icu tokenizer: move transliteration rules in separate file The tokenizer configuration has become difficult to handle due to the additional manual transliteration rules. Allow to have a separate rule file that is given to the ICU library as is.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	b2c6eca2c8	add missing transliterations The ICU library only offers transliterations for a limited set of scripts. Add transliterations for missing scripts from the PostgreSQL module. This means that the same selection of scripts is supported as with the old module.	2021-05-05 21:16:55 +02:00
Sarah Hoffmann	f44af49df9	add Python part for new ICU-based tokenizer	2021-05-05 10:15:27 +02:00
Marc	3dade534fd	Add hint about replication update & recheck intervals being in seconds.	2021-05-04 11:47:15 +02:00
Sarah Hoffmann	af968d4903	introduce tokenizer modules This adds the boilerplate for selecting configurable tokenizers. A tokenizer can be chosen at import time and will then install itself such that it is fixed for the given database import even when the software itself is updated. The legacy tokenizer implements Nominatim's traditional algorithms.	2021-04-30 11:29:57 +02:00
AntoJvlt	ff34198569	Code cleaning, tests simplification and use of python3-icu package	2021-03-23 23:56:39 +01:00
AntoJvlt	6d56cbb3e8	Changed phrase_settings.py to phrase-settings.json and added migration function for old php settings file.	2021-03-23 23:30:39 +01:00
AntoJvlt	d5acade4db	Deleted specialphrases.php and phrase_settings.php	2021-03-20 19:48:05 +01:00
AntoJvlt	17cb59efbd	Ported functions for the import of special phrases from php to python. - the command is now --import-special-phrases - the output is not an sql file anymore, data are directly imported to the database. - the little part on the documentation (section data import) has been modified.	2021-03-20 19:11:50 +01:00
Sarah Hoffmann	bddfc109f8	refer to new nominatim tool in configuration comments	2021-02-02 10:56:40 +01:00
Sarah Hoffmann	45ea73913f	remove setting for PYOSMIUM_BINARY pyosmium is now called as a library from the python code, so that pyosmium-get-changes is no longer needed.	2021-01-30 15:55:04 +01:00
Sarah Hoffmann	d78f0ba804	port replication initialisation to Python	2021-01-26 22:50:54 +01:00
Sarah Hoffmann	06d89e1d47	fix various typos	2020-12-19 14:33:04 +01:00
Sarah Hoffmann	8676e45d88	remove old default settings	2020-12-19 14:33:04 +01:00
Sarah Hoffmann	0947b61808	switch remaining settings to dotenv format CONST_Search_AreaPolygons and CONST_Search_ReversePlanForAll have been removed completely.	2020-12-19 14:33:04 +01:00
Sarah Hoffmann	15a1666f8a	replace database settings with dotenv variant As we can't refer to the project root dir in the module path, the module path may now also be a relative directory which is then taken as being relative to the project root path. Moves the checkModulePresence() function into the Setup class, so that it can work on the computed absolute module path.	2020-12-19 14:33:04 +01:00
Sarah Hoffmann	b5480f6e36	reorganise path settings in config CONST_BasePath is split into separate configuration variables for binaries, libraries and data. These variables as well as the installation path are now set in the executable directly and no longer configurable via project settings. This is the first step towards an installable software. The executables should know per installation where to find their necessary data to execute. Project configuration needs to be restricted to settings that really concern the specific Nominatim installation.	2020-12-19 14:33:04 +01:00
Sarah Hoffmann	f89e71a861	make sure that admin levels in NL are kept in order	2020-11-19 09:44:02 +01:00
Hendrik Morée	dcc075b34b	Admin levels 8 and 10 of the Netherlands are municipal / city	2020-11-18 11:30:24 +01:00
Sarah Hoffmann	cbbda1ddf0	adapt admin_levels for Belgium Fixes #272.	2020-11-03 10:46:52 +01:00
Sarah Hoffmann	f050f898bc	elevate most natural features to address rank 22 Puts them on par with landuse features.	2020-11-02 11:42:10 +01:00
Sarah Hoffmann	b81894d3d5	remove now unused settings related to website There are two places where the website URL is still used: for icons, replace the URL with a link to the icon repository of the UI repo. The more URL now builds the link from the server info.	2020-10-29 11:13:32 +01:00
Sarah Hoffmann	abd20d3ca6	disable admin level 5 in Russia They either interfere with cities or refer to historical boundaries.	2020-10-28 10:49:26 +01:00
Sarah Hoffmann	b661c66c00	reorganize ranks of high-level place types Rank 25 is now available for places that should appear in addresses but not when a street is present. Use this for some block-like place types. Also document the particularity of rank 25. Subdivisions and allotments are now at the same level as landuse which they are frequently used together with.	2020-10-20 20:20:49 +02:00
Sarah Hoffmann	73a0ec22a3	add country-specific address ranks for Russia Removes admin level 7, which should not exist and promotes admin level 8 to municipality level. place=municipality is only used for boroughs of St. Petersburg, so demote to level 18. Fixes #926.	2020-10-17 17:54:06 +02:00
Sarah Hoffmann	6a31691121	adapt address levels for admin boundaries in Indonesia	2020-10-09 22:28:06 +02:00
Sarah Hoffmann	f2ff351da4	Merge pull request #1971 from lonvia/drop-support-for-isin Drop support for is_in tag	2020-09-23 09:20:35 +02:00
Sarah Hoffmann	72193a1c23	exclude unnamed highway areas These are used to mark large paved areas. Sometimes they exist together with named regular streets. In such cases the unnamed area may overshadow the actual street when computing the address parent. As unnamed highways are not very useful anyway, we simply remove them from the database.	2020-09-22 21:42:13 +02:00
Sarah Hoffmann	d04e87fb80	drop support for is_in tag	2020-09-22 20:26:36 +02:00
Sarah Hoffmann	7fb62ea904	postal boundary may be imported without name Postal boundaries usually just have the postcode tag set and are therefore officially 'nameless'. We want to have them as boundary=postal_code anyways in order to distinguish them from postcode points inherited from addr: tags.	2020-09-18 11:33:45 +02:00
Sarah Hoffmann	8ff1f16b7f	remove host from default website URL Just assume that Nominatim runs under the root URL. This is a more versatile base that also makes 'make serve' work out of the box.	2020-09-16 11:13:51 +02:00
Sarah Hoffmann	a68cdc40be	improve fallback ranking Boundaries and places now always get a rank < 26 to make sure that they do not parent to a street. Skip boundary=place completely because they will be covered through the secondary place tag.	2020-09-01 17:55:40 +02:00
Sarah Hoffmann	be6ecc388c	add support for place=square Squares are now addressable (on address level 25) and thus can be attached to a house number via addr:place. Needed to increase the rank range for matching up addr:place to 25.	2020-08-26 12:12:52 +02:00
Sarah Hoffmann	071db1fae7	remove traffic signs from full styles Traffic signs rarely have a name and are therefore mostly not searchable. Remove them completely. Allow street lamps only when they have a name. Removes about 2M objects from a planet instance.	2020-08-15 22:37:45 +02:00
Sarah Hoffmann	05e0d3e2d4	add wiki tags to all styles wikipedia and wikidata tags are needed to compute the importance so we need to put them into extra tags for all styles. Fixes #1885.	2020-07-25 10:00:18 +02:00
Sarah Hoffmann	d364afdf3b	reenable debug parameter The parameter got lost when switching to website settings. Given that the use of a fixed parameter is limited, debugging output can now only be set via the URL parameter.	2020-07-08 08:32:46 +02:00
Sarah Hoffmann	8b8dcea3de	exclude language-specific name:prefix and name:suffix There are about 1k suffixes and 20k prefixes with a language-specific variant in use. These should not show up as names.	2020-07-01 18:00:53 +02:00
Sarah Hoffmann	6e4ee160ee	adapt tests to new search ranks	2020-06-17 10:53:11 +02:00
Sarah Hoffmann	a5697c5279	change place node expansion for large area table So far we've used a buffer around a place node to define its potential address reach. This had two problems: the buffer was so large that addresses often contain false positives and the buffer is really distorted when getting closer to the poles. Change the buffer here to draw a bounding box at a certain distance in meters. This means that we always use the same box everywhere on the planet and can make the extent much smaller. Using a box has the advantage that it is much faster to figure out if a point is within the box.	2020-06-17 10:53:11 +02:00

256 Commits