Sarah Hoffmann
f22fa992f7
move complex typing annotations to extra file
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
992e6f72cf
type annotations for DB utils
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
e6ee3c772c
type annotations for DB connection
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
95ed95c616
add type annotations to config module
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
bf36f33e79
add type annotations for version.py
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
9b636fdc10
mypy: minimal annotations to enable a clean run
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
4b12d52ef5
convert admin --analyse-indexing to new indexing method
...
A proper run of indexing requires the place information from the
analyzer. Add the pre-processing of place data, so the right
information is handed into the update function.
2022-07-07 16:20:08 +02:00
Sarah Hoffmann
856925d19b
remove analyze() from PlaceInfo class
...
The function creates circular dependencies.
2022-07-07 12:06:58 +02:00
Sarah Hoffmann
cbbcbb1fd7
move country_info into data submodule
2022-07-06 11:08:36 +02:00
Sarah Hoffmann
bce93d60bd
move PlaceInfo into data submodule
...
This data structure is shared between indexer and tokenizer.
2022-07-06 10:54:47 +02:00
Sarah Hoffmann
612d34930b
handle postcodes properly on word table updates
...
update_postcodes_from_db() needs to do the full postcode treatment
in order to derive the correct word table entries.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
5be320368c
add documentation for postcode customization
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
7f2ad4ac7e
fix linting issue
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
0f00f4968c
fix up BDD tests for postcode changes
...
Includes smaller code fixes found by the tests.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
37b2c6a830
port legacy tokenizer to new postcode handling
...
Also documents the changes to the SQL functions of the tokenizer.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
e86db3001f
fix postcode pattern for Mozambique
...
Optional groups are not implemented yet.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
67dfa38e60
fix linting problems
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
2eca9fc8af
cache postcode normalization
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
b5e5efc131
only add well-formatted postcodes to location table
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
80ea13437d
move postcode matcher into a separate file
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
bf86b45178
move postcode centroid computation to Python
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
4885fdf0f9
add class for online centroid computation
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
b7704833e4
icu: switch postcodes to using the pre-formatted one
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
ca7b46511d
introduce and use analyzer for postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
18864afa8a
postcodes: introduce a default pattern for countries without postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
5ba75df507
postcode: generate a generic form
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
9172696324
postcodes: add support for optional spaces
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
baee6f3de0
postcodes: strip leading country codes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
90d4d339db
initial postcode cleaner for simple patterns
...
Moves postcodes that are either in countries without a postcode
system or don't correspond to the local pattern for postcodes into
a field for a normal address part. Makes them searchable but not as
a special address. This has two consequences: they are no longer a
skippable part of the address and the postcodes cannot be searched
on their own.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
8080625747
remove postcodes from countries that don't have them
...
The postcodes will only be removed as a 'computed postcode'; they
are still searchable for the given object.
2022-06-23 23:42:31 +02:00
Luflosi
3ea87169ac
Fix typo
2022-06-20 20:41:00 +02:00
Sarah Hoffmann
cbb4749996
change indexing order for interpolations
...
Interpolations are now indexed after rank 30 objects. The housenumber
nodes no longer need information from the interpolations while the
interpolations can make use of precomputed postcodes.
2022-06-02 15:16:46 +02:00
Sarah Hoffmann
218c56f9a6
use getattr() instead of __getattr__
...
Makes the linter happy.
2022-06-01 21:26:13 +02:00
Sarah Hoffmann
12a3d51bcc
Merge pull request #2731 from lonvia/cleanup-special-phrases
...
Minor code reorganisation around special phrase parsing
2022-05-31 17:13:56 +02:00
Sarah Hoffmann
1821f68ca0
exclude addr:inclusion from search
2022-05-31 14:19:19 +02:00
Sarah Hoffmann
46689df668
custom comparison for SpecialPhrase
...
Duplicate elimination only works when a custom hash/equal function
is implemented that is based on the members.
2022-05-30 16:30:41 +02:00
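
The point about duplicate elimination can be illustrated with a minimal
sketch. The class name `Phrase` and its fields are hypothetical and are not
taken from the commit; the sketch only shows why a set can drop duplicates
once equality and hashing are both derived from the members.

```python
class Phrase:
    """Hypothetical stand-in for a special phrase; field names are assumptions."""

    def __init__(self, label, p_class, p_type, operator):
        self.label = label
        self.p_class = p_class
        self.p_type = p_type
        self.operator = operator

    def _key(self):
        # All members that define the identity of a phrase.
        return (self.label, self.p_class, self.p_type, self.operator)

    def __eq__(self, other):
        if not isinstance(other, Phrase):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self._key())


phrases = [
    Phrase('bar', 'amenity', 'bar', '-'),
    Phrase('bar', 'amenity', 'bar', '-'),   # same members as the first entry
    Phrase('pub', 'amenity', 'pub', '-'),
]
unique = set(phrases)
```

Without `__eq__` and `__hash__`, Python compares objects by identity, so the
set would keep both copies of the first phrase.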
Sarah Hoffmann
e828d0d3f7
move quoting hack to wiki loader
...
The bad quotes around the type for special phrases
specifically occur in the Wiki pages, so they should be
removed by the loader and not in the generic SpecialPhrase
object.
2022-05-30 14:40:33 +02:00
Sarah Hoffmann
cce0e5ea38
convert special phrase loaders to generators
...
Generators simplify the code quite a bit compared to the previous
Iterator approach.
2022-05-30 14:12:46 +02:00
Sarah Hoffmann
042e314589
remove the language parameter in the SPWikiLoader
...
Languages must always be configured through config or environment.
Also use monkeypatched environment in tests.
2022-05-30 10:26:20 +02:00
Sarah Hoffmann
61d813bfef
add get_str_list() for config
...
Converts a config value written as a comma-separated list into
a Python list of strings.
2022-05-29 13:53:50 +02:00
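
A rough sketch of such a conversion follows. The helper name
`split_str_list` and its handling of empty values are assumptions made for
illustration; this is not the signature or behaviour of the actual
get_str_list() function.

```python
from typing import List, Optional


def split_str_list(raw: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated value into a list of stripped strings.

    Returns None when the value is unset or empty (an assumption of
    this sketch).
    """
    if not raw:
        return None
    # Drop surrounding whitespace and skip empty items such as 'a,,b'.
    return [part.strip() for part in raw.split(',') if part.strip()]


languages = split_str_list('en, de ,fr')
```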
Sarah Hoffmann
dc6c4bf22e
add offline import mode
...
In offline mode no attempts are made to download data from the internet.
At the moment that only concerns the computation of the database date,
which contacts the main API to get the date.
2022-05-11 15:03:02 +02:00
Sarah Hoffmann
3ba975466c
fix spacing
...
Some versions of pylint are oddly picky.
2022-05-11 10:36:09 +02:00
Sarah Hoffmann
d14a585cc9
pylint: disable no-self-use check
...
This checker encourages bad behaviour (namely changing the static
status of a function during inheritance) and will be made optional
in upcoming versions of pylint.
2022-05-11 10:25:00 +02:00
Sarah Hoffmann
7f7a7df3a2
solve assorted issues with newer pylint versions
...
Includes more use of 'with', adding encodings to open statements
and a couple of issues with parameter renaming.
2022-05-11 10:22:14 +02:00
Sarah Hoffmann
5d5f40a82f
use context management when processing Tiger data
2022-05-11 09:48:56 +02:00
Sarah Hoffmann
ae6b029543
remove redundant 'u' prefixes for unicode strings
2022-05-11 09:48:56 +02:00
Sarah Hoffmann
bb2bd76f91
pylint: avoid explicit use of format() function
...
Use psycopg2 SQL formatters for SQL and formatted string literals
everywhere else.
2022-05-11 09:48:56 +02:00
Sarah Hoffmann
4e1e166c6a
add a function to return a formatted version
...
Replaces the various repeated format strings throughout the code.
2022-05-11 09:01:24 +02:00
Sarah Hoffmann
7e70e5f503
always state encoding when opening files in text mode
...
Also applies to Path.write_text().
2022-05-10 15:36:29 +02:00
Sarah Hoffmann
ed6fda6968
Merge pull request #2702 from lonvia/move-country-names-into-includes
...
Clean up country name settings
2022-05-10 09:21:16 +02:00