Commit Graph

226 Commits

Author SHA1 Message Date
Sarah Hoffmann
929a13d4cd remove comma as name separator
Commas are most of the time used as a part of a name, not to
separate multiple names.

See also #2950.
2023-01-22 22:29:36 +01:00
Sarah Hoffmann
56f0d678e3 exclude names ending in :wikipedia from indexing
The wikipedia prefix is used for referencing a wikipedia article
for the given tag, not the object, so not useful to search.
2023-01-21 11:16:08 +01:00
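
As a rough illustration of the filter this commit describes, the Python sketch below drops name tags whose key ends in `:wikipedia`. The function name `drop_wikipedia_names`, the dictionary layout and the sample tags are all hypothetical; the actual change is not shown in this log and may work differently.

```python
def drop_wikipedia_names(names):
    """Return a copy of `names` without keys ending in ':wikipedia'.

    Hypothetical helper for illustration only; not code from the project.
    """
    return {key: value for key, value in names.items()
            if not key.endswith(":wikipedia")}


# Sample tags (made up): the second entry points at an article about the
# brand, not at the object itself, so it is not useful as a search name.
example = {
    "brand": "Aldi",
    "brand:wikipedia": "de:Aldi",
}
filtered = drop_wikipedia_names(example)
```

Here `filtered` keeps only the `brand` entry.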
Sarah Hoffmann
610af95ed1 remove old import styles 2022-12-23 19:29:07 +01:00
Sarah Hoffmann
200eae3bc0 add tests for examples in lua style documentation
And fix all the errors the tests have found.
2022-12-23 17:35:28 +01:00
Sarah Hoffmann
9321e425a4 add documentation for flex style
Includes minor adaptions to bring the code in line with the
documentation.
2022-12-23 11:10:40 +01:00
Sarah Hoffmann
2ca83efc36 flex: add other default styles 2022-12-18 10:10:58 +01:00
Sarah Hoffmann
06796745ff flex: hide compiled matchers 2022-12-18 10:10:58 +01:00
Sarah Hoffmann
093d531509 flex: switch to functions for substyles
This gives us a bit more flexibility about the implementation
in the future.
2022-12-18 10:10:58 +01:00
Sarah Hoffmann
a915815e4d explicit export for functions in flex-base 2022-12-18 10:10:58 +01:00
Sarah Hoffmann
de3c28104c flex: add combining clean function 2022-12-18 10:10:58 +01:00
Sarah Hoffmann
d9d13a6204 flex: simplify name handling 2022-12-18 10:10:58 +01:00
Sarah Hoffmann
d1f5820711 flex: simplify address configuration 2022-12-18 10:10:58 +01:00
Sarah Hoffmann
7592f8f189 update osm2pgsql (flex not building index) 2022-12-18 10:10:58 +01:00
Sarah Hoffmann
6f51c1ba33 remove code that disables processing of forward dependencies 2022-12-11 19:35:58 +01:00
Sarah Hoffmann
0e186835b9 contract duplicate spaces in transliteration string
There are some pathological cases where an isolated letter may
be deleted because it is in itself meaningless. If this happens in
the middle of a sentence, then the transliteration contains two
consecutive spaces. Add a final rule to fix this.

See #2909.
2022-12-02 10:15:02 +01:00
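
The effect of such a final rule can be sketched in Python. This is only an approximation under stated assumptions: the commit adds a rule to the transliteration rule set, whereas the hypothetical helper `contract_spaces` below uses a regular expression to show the same outcome.

```python
import re


def contract_spaces(text):
    """Collapse every run of two or more spaces into a single space.

    Hypothetical helper; the real fix is a transliteration rule, not Python.
    """
    return re.sub(r" {2,}", " ", text)


# If an isolated letter is deleted from the middle of "main x street",
# the result "main  street" contains two consecutive spaces.
cleaned = contract_spaces("main  street")
```

After the call, `cleaned` contains a single space between the two words.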
Sarah Hoffmann
41e8bddaa9 remove BDD test for tiger:county
We no longer rely on the import to strip the tag.
2022-11-23 10:37:27 +01:00
Sarah Hoffmann
fd3dec8efe add sanitizer for TIGER tags
Currently only takes over cleaning the tiger:county data. This was
done by the import until now.
2022-11-23 10:37:27 +01:00
Sarah Hoffmann
b6ff697ff0 add experimental option for enabling forward dependencies 2022-11-21 14:48:00 +01:00
Sarah Hoffmann
d63d7cb9a8 remove dependent territories from country list
Removes territories of US, France, Australia and Netherlands from the
country list. These territories have their own country code (which is
why they are in the list in the first place) but are mapped as part of
the admin_level 2 relations for the respective parent countries.
Therefore they never had any places attached. In practical terms, the
change only affects the number of tables created.
2022-11-15 11:37:30 +01:00
Sarah Hoffmann
63a9bc94f7 fix country handling in flex style
If the country tag does not match a 2-letter code, it needs to
be dropped.
2022-11-10 15:52:13 +01:00
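
A minimal Python sketch of the check described here, assuming the country value lives under a `country` key in an address dictionary and that any two ASCII letters count as a valid code. Both are assumptions made for illustration; the flex style itself is written in Lua and is not reproduced in this log.

```python
import re

# Assumption: a valid code is exactly two ASCII letters, in either case.
COUNTRY_CODE = re.compile(r"[A-Za-z]{2}")


def clean_country(address):
    """Return a copy of `address` with a malformed 'country' entry removed.

    Hypothetical helper; the key name 'country' is an assumption.
    """
    cleaned = dict(address)
    value = cleaned.get("country")
    if value is not None and not COUNTRY_CODE.fullmatch(value):
        del cleaned["country"]
    return cleaned


kept = clean_country({"country": "de", "city": "Berlin"})
dropped = clean_country({"country": "Germany", "city": "Berlin"})
```

In this sketch `kept` retains its `country` entry, while `dropped` loses it because "Germany" is not a 2-letter code.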
Sarah Hoffmann
3683cf7ddc optimise tag match function 2022-11-10 09:38:25 +01:00
Sarah Hoffmann
51ed55cc32 initial flex import scripts
Only implements the extratags style for the moment. Tests pass
for the same behaviour as the gazetteer output. Updates still need
to be done.
2022-11-10 09:37:38 +01:00
Sarah Hoffmann
536f08f33a ignore 5+ postcodes in the US for now
Hierarchical postcodes need a different treatment.
2022-06-24 19:24:22 +02:00
Sarah Hoffmann
e86db3001f fix postcode pattern for Mozambique
Optional groups are not implemented yet.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
ca7b46511d introduce and use analyzer for postcodes 2022-06-23 23:42:31 +02:00
Sarah Hoffmann
18864afa8a postcodes: introduce a default pattern for countries without postcodes 2022-06-23 23:42:31 +02:00
Sarah Hoffmann
9cf700e85d add postcodes for most of the remaining countries
Now includes all postcodes that have optional parts.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
9172696324 postcodes: add support for optional spaces 2022-06-23 23:42:31 +02:00
Sarah Hoffmann
49626ba709 add postcode formats with optional country code
If the country code is not part of the mandatory output, the
country code filter will do the correct handling.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
28ab2f6048 add postcodes patterns without optional spaces 2022-06-23 23:42:31 +02:00
Sarah Hoffmann
6e0014e138 add postcode patterns for numeric postcodes
Adds patterns for countries that have simple numeric-only postcodes.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
8080625747 remove postcodes from countries that don't have them
The postcodes will only be removed as a 'computed postcode'; they
are still searchable for the given object.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
21fb501699 add info about countries without a postcode 2022-06-23 23:42:31 +02:00
bgo-eiu
04644102f2 added additional languages for pakistan in country settings 2022-06-16 06:26:44 -04:00
Sarah Hoffmann
8a67ddcb2b remove county nodes in Canada from addresses
Canada has complete coverage for administrative boundaries on
county level. Removing the county nodes from the addresses avoids errors
due to the widespread doubling of place nodes for city counties.
2022-05-18 10:19:05 +02:00
Sarah Hoffmann
4002bee0c1 make ICU the default tokenizer 2022-05-10 12:02:50 +02:00
Sarah Hoffmann
9d468f6da0 support arbitrary prefixes in country name list
This means we can now get rid of the last special cases for names.
2022-05-09 11:55:26 +02:00
Sarah Hoffmann
3a8ddf736e move country names into separate include files 2022-05-09 11:55:26 +02:00
Sarah Hoffmann
63dc4b39bc ICU: better letter identification in normalization
The Letter class does not include non-spacing marks that can also
have a consonant or vowel meaning, especially in Indian languages.
Use the alnum property instead which includes them all. Also
include the vowel-canceling Virama, which is not a letter by itself
but changes the transliteration.
2022-04-28 18:23:17 +02:00
Sarah Hoffmann
fd4ab3f262 Merge pull request #2629 from tareqpi/country-names-yaml-configuration
Move default country names into yaml configuration
2022-04-04 09:04:25 +02:00
Tareq Al-Ahdal
7bb7ed468a fix storing of escape sequences in database 2022-03-24 13:18:44 +08:00
Sarah Hoffmann
42f0282f14 remove special case for operator names
The OSM data has been sufficiently cleaned up by now that
the operator no longer needs to be considered a name tag.
Use 'brand' as the searchable alternative.
2022-03-18 10:48:53 +01:00
Tareq Al-Ahdal
456d439e97 Reformatting of country keys 2022-03-18 02:23:11 +08:00
Tareq Al-Ahdal
165d17f7f7 reintroduce 'name:' prefix to country name keys 2022-03-13 18:58:27 +08:00
Tareq Al-Ahdal
8b6652a40b move default country names into yaml configuration 2022-03-12 15:17:01 +08:00
Marc Tobias
1fcc9717bb documentation: clarify osm2pgsql isn't in project directory by default 2022-03-10 14:16:12 +01:00
Sarah Hoffmann
f03a05f6bb add new analyser for housenumbers
This analyser makes spaces optional.
2022-03-01 09:34:32 +01:00
Sarah Hoffmann
855909b4e9 add 'healthcare' as main tag
Given that the tag is most of the time duplicated by an amenity
tag which is already imported, only import it as a fallback when
there is no name.

Fixes #2609.
2022-02-21 11:52:17 +01:00
Sarah Hoffmann
610f2cc254 sanitizer: move helpers into a configuration class 2022-02-07 10:48:00 +01:00
Sarah Hoffmann
a79a3210e6 implement is-a-name option for housenumbers 2022-02-07 09:27:11 +01:00