Sarah Hoffmann
925726222f
Merge pull request #2323 from darkshredder/disable-search-reverse-only
Feat: Disabled search API for --reverse-only imports
2021-05-14 10:40:22 +02:00
Sarah Hoffmann
550e7edb64
Merge pull request #2328 from lonvia/convert-tiger-to-csv
Switch external Tiger data to CSV format
2021-05-14 09:58:50 +02:00
Sarah Hoffmann
2992dea5c8
install default settings for legacy_icu tokenizer
2021-05-14 09:44:10 +02:00
Sarah Hoffmann
e76e4bd964
adapt documentation to use Tiger CSV dump
2021-05-14 00:02:50 +02:00
Sarah Hoffmann
7d621389ee
adapt tests to new TIGER CSV format
2021-05-14 00:02:50 +02:00
Sarah Hoffmann
35efe3b41c
use tokenizer during Tiger data import
This also changes the required import format to CSV.
2021-05-14 00:02:50 +02:00
Darkshredder
e5ffc59cd5
feat: Added reverse-only-search validation
2021-05-14 02:36:21 +05:30
Sarah Hoffmann
d7f9d2bde9
Merge pull request #2326 from lonvia/wokerpool-for-tiger-data
Use WorkerPool when importing Tiger data
2021-05-13 22:09:56 +02:00
Sarah Hoffmann
5feece64c1
use WorkerPool for Tiger data import
Requires adding an option to ignore SQL errors.
2021-05-13 20:36:50 +02:00
Sarah Hoffmann
b9a09129fa
move WorkerPool into db module
The pool is independent of the indexer and may also be used
by other parts of the software.
2021-05-13 17:11:17 +02:00
Sarah Hoffmann
96e6bbe3a1
Merge pull request #2325 from lonvia/do-not-precompute-postcodes
Do not preload postcodes in the legacy tokenizer
2021-05-13 17:00:29 +02:00
Frederik Ramm
fe39185894
Add array_key_last function for PHP <7.3
This patch adds an array_key_last function if it doesn't yet exist, fixes #2316. It is tested on PHP 7.2.24 but not PHP 7.3.
2021-05-13 16:42:22 +02:00
Sarah Hoffmann
fc860787dd
do not preload postcodes
This is too expensive for updates.
2021-05-13 16:14:12 +02:00
Sarah Hoffmann
63e35574d4
Merge pull request #2324 from lonvia/generic-external-postcodes
Rework postcode handling and generalised external postcode support
2021-05-13 14:52:19 +02:00
Sarah Hoffmann
db2dbf15f7
fix token_info migration
A bad indent meant that only one table received the new column.
2021-05-13 14:31:41 +02:00
Sarah Hoffmann
f5977dac75
ignore invalid coordinates in external postcodes
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
8f2746fe24
ignore entries without country code
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
41b9bc9984
add documentation for external postcode feature
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
1ccd4360b4
correctly handle removing all postcodes for country
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
bf864b2c54
index postcodes after refreshing
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
4abaf71234
add and extend tests for new postcode handling
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
a4aba23a83
move filling of postcode table to python
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.
External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
2021-05-13 14:15:42 +02:00
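The reworked external postcode input described in a4aba23a83 (CSV, potentially gzipped), together with f5977dac75 (ignore invalid coordinates), can be illustrated with a minimal Python sketch. It assumes a CSV file with a header row and hypothetical `postcode`, `lat` and `lon` columns; the function name `read_external_postcodes` is also invented for illustration and is not Nominatim's code or its documented file layout.

```python
import csv
import gzip
from pathlib import Path


def read_external_postcodes(path):
    """Yield (postcode, lat, lon) tuples from a plain or gzipped CSV file.

    Assumption: the file has a header with 'postcode', 'lat' and 'lon'
    columns. These column names are hypothetical.
    """
    path = Path(path)
    # Files ending in .gz are decompressed transparently.
    opener = gzip.open if path.suffix == '.gz' else open
    # 'rt' opens the file in text mode; newline='' is what the csv module expects.
    with opener(str(path), 'rt', encoding='utf-8', newline='') as fd:
        for row in csv.DictReader(fd):
            postcode = (row.get('postcode') or '').strip()
            try:
                lat = float(row['lat'])
                lon = float(row['lon'])
            except (KeyError, TypeError, ValueError):
                continue  # missing or unparsable coordinates: ignore the entry
            # Skip empty postcodes and coordinates outside the valid range
            # (NaN also fails these comparisons and is skipped).
            if not postcode or not (-90 <= lat <= 90 and -180 <= lon <= 180):
                continue
            yield postcode, lat, lon
```

Choosing the opener by file suffix keeps a single code path for compressed and uncompressed input, and yielding rows lazily avoids holding a large postcode file in memory.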
Sarah Hoffmann
cae0cf3546
Merge pull request #2322 from mtmail/type-label-already-lowercased
typelabel value is already lowercased
2021-05-12 20:25:22 +02:00
marc tobias
38f9e18afb
typelabel value is already lowercased
2021-05-12 19:16:51 +02:00
Sarah Hoffmann
40cb17d299
Merge pull request #2314 from lonvia/fix-status-no-import-date
Correctly catch the exception when import date is missing
2021-05-06 17:41:53 +02:00
Sarah Hoffmann
2ae293aeb6
Merge pull request #2312 from lonvia/icu-tokenizer
Add new tokenizer based on libICU
2021-05-06 17:22:04 +02:00
Sarah Hoffmann
d8ead78e03
correctly catch the exception when import date is missing
2021-05-06 16:27:42 +02:00
Sarah Hoffmann
b2c6eca2c8
add missing transliterations
The ICU library only offers transliterations for a limited set of
scripts. Add transliterations for missing scripts from the PostgreSQL
module. This means that the same selection of scripts is supported
as with the old module.
2021-05-05 21:16:55 +02:00
Sarah Hoffmann
872ab91421
fix name of transliterator
Should be different from the normalisation rules.
2021-05-05 17:09:38 +02:00
Sarah Hoffmann
a263e54b94
enable BDD tests for different tokenizers
The tokenizer to be used can be chosen with -DTOKENIZER.
Adapt all tests, so that they work with legacy_icu tokenizer.
Move lookup in word table to a function in the tokenizer.
Special phrases are temporarily imported from the wiki until
we have an implementation that can import from file. TIGER
tests do not work yet.
2021-05-05 10:31:51 +02:00
Sarah Hoffmann
18c99a5c5f
add unit tests for legacy ICU tokenizer
2021-05-05 10:15:27 +02:00
Sarah Hoffmann
d55fc39275
cache transliteration results
2021-05-05 10:15:27 +02:00
Sarah Hoffmann
ba8ed7967d
add PHP part for new ICU-based tokenizer
2021-05-05 10:15:27 +02:00
Sarah Hoffmann
f44af49df9
add Python part for new ICU-based tokenizer
2021-05-05 10:15:27 +02:00
Sarah Hoffmann
3c67bae868
Merge pull request #2310 from RhinoDevel/master
2nd try: Add hint about replication update & recheck intervals being in seconds.
2021-05-04 12:45:26 +02:00
Marc
3dade534fd
Add hint about replication update & recheck intervals being in seconds.
2021-05-04 11:47:15 +02:00
Sarah Hoffmann
8b1a509442
Merge pull request #2305 from lonvia/tokenizer
Factor out normalization into a separate module
2021-05-03 09:15:34 +02:00
Sarah Hoffmann
8bdb9aa607
mock tokenizer factory for replication tests
2021-05-01 10:50:39 +02:00
Sarah Hoffmann
36c624ec71
commit between migrations
Later migrations may require tables set up by older ones.
2021-05-01 10:47:35 +02:00
Sarah Hoffmann
7fd871a74d
increase database version for tokenizer migration
2021-05-01 10:47:35 +02:00
Sarah Hoffmann
ced8f0f4a2
fix linting issues
2021-04-30 17:59:50 +02:00
Sarah Hoffmann
388ebcbae2
move index creation for word table to tokenizer
This introduces a finalization routine for the tokenizer
where it can post-process the import if necessary.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
20891abe1c
indexer: fetch extra place data asynchronously
The indexer now fetches any extra data besides the place_id
asynchronously while processing the places from the last batch.
This also means that more places are now fetched at once.
2021-04-30 17:41:08 +02:00
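The approach in 20891abe1c, loading the extra data for the next batch while the current batch is being processed, can be sketched with one background thread from Python's `concurrent.futures`. This is an illustration under assumptions, not the indexer's actual implementation: `fetch_place_details` and `index_in_batches` are hypothetical names, and the fetch function is a stand-in for the real database query.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_place_details(place_ids):
    # Hypothetical stand-in for the database query that loads the extra
    # place data (everything besides place_id) for one batch.
    return [{'place_id': pid, 'name': f'place {pid}'} for pid in place_ids]


def index_in_batches(place_ids, batch_size, process):
    """Process places batch by batch while prefetching the next batch."""
    batches = [place_ids[i:i + batch_size]
               for i in range(0, len(place_ids), batch_size)]
    if not batches:
        return
    with ThreadPoolExecutor(max_workers=1) as executor:
        # Start loading the first batch before entering the loop.
        pending = executor.submit(fetch_place_details, batches[0])
        for next_batch in batches[1:] + [None]:
            places = pending.result()  # wait for the current batch's data
            if next_batch is not None:
                # Fetch the following batch in the background ...
                pending = executor.submit(fetch_place_details, next_batch)
            # ... while the current batch is processed here.
            for place in places:
                process(place)
```

With a single worker, at most one fetch is in flight at a time, so memory use stays bounded to roughly two batches while fetch latency overlaps with processing.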
Sarah Hoffmann
6ce6f62b8e
fetch place info asynchronously
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
602728895e
indexer: fetch ids in batches
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
fc995ea6b9
move database check for module to tokenizer
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
be6262c6ce
move status test to tokenizer
The availability of the module is now tested by the tokenizer.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
893490f94e
add more tests for legacy tokenizer
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
044bb6afa5
move tokenization in query into tokenizer
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
3eb4d88057
boilerplate for PHP code of tokenizer
This adds an installation step for PHP code for the tokenizer. The
PHP code is split into two parts. The updatable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.
2021-04-30 11:31:52 +02:00