Sarah Hoffmann
143ff14466
remove special status of partial tokens
...
Full-word tokens are no longer marked by a space at the
beginning of the token. Use the new Partial token category
instead. This removes a couple of special cases that we don't
really need.
The word table still has the space for compatibility reasons,
so the tokenizer code needs to get rid of it when loading the
tokens.
2021-07-14 22:17:17 +02:00
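A minimal sketch of what dropping the leading space while loading tokens could look like. It assumes the legacy convention described above, where a leading space in the word table marks a full word. The names `TokenType`, `Token` and `load_tokens` are hypothetical and not taken from the actual tokenizer code.

```python
from enum import Enum
from typing import Iterable, List, NamedTuple


class TokenType(Enum):
    """Hypothetical token categories, for illustration only."""
    WORD = 'W'
    PARTIAL = 'w'


class Token(NamedTuple):
    text: str
    ttype: TokenType


def load_tokens(raw_tokens: Iterable[str]) -> List[Token]:
    """Convert word-table entries into typed tokens.

    Assumption: the table still marks full words with a leading
    space, so the marker is stripped here and replaced by the type.
    """
    tokens = []
    for raw in raw_tokens:
        if raw.startswith(' '):
            tokens.append(Token(raw[1:], TokenType.WORD))
        else:
            tokens.append(Token(raw, TokenType.PARTIAL))
    return tokens
```

With this shape, code further down the line only needs to look at the token type and never at the token text to tell full words from partials.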
Sarah Hoffmann
6070c3d1d5
introduce a separate token type for partials
...
This means that the leading space can be removed as a partial
word indicator.
2021-07-13 16:57:12 +02:00
Sarah Hoffmann
bc8b2d4ae0
Merge pull request #2393 from lonvia/fix-flake8-issues
...
Fix flake8 issues
2021-07-13 16:46:12 +02:00
Sarah Hoffmann
14f777da18
use psycopg's SQL quoting where possible
...
Use the SQL formatting supplied with psycopg whenever the
query needs to be put together from snippets.
2021-07-12 22:05:22 +02:00
Sarah Hoffmann
6f6681ce67
add helper function for execute_values
...
Make psycopg2's convenience function accessible through
the cursor.
2021-07-12 21:08:20 +02:00
Sarah Hoffmann
06602b4ec0
provide wrapper function for DROP TABLE
...
Use psycopg2 formatting to ensure correct quoting.
2021-07-12 20:32:46 +02:00
Sarah Hoffmann
cf98cff2a1
more formatting fixes
...
Found by flake8.
2021-07-12 17:45:42 +02:00
Sarah Hoffmann
b4fec57b6d
Merge pull request #2391 from lonvia/fix-sonar-issues
...
Fix bugs and code smells found by Sonarqube
2021-07-12 17:14:59 +02:00
Sarah Hoffmann
f8b5a63de3
factor out connection reset code
2021-07-12 14:58:44 +02:00
Sarah Hoffmann
568316f07c
simplify analyse function
2021-07-12 14:47:50 +02:00
Sarah Hoffmann
daa597b300
split up variant computation for better readability
2021-07-12 14:43:50 +02:00
Sarah Hoffmann
47adb2a3fc
reorganise process_place function
...
Move address processing into its own function as it is
rather extensive.
2021-07-12 11:57:55 +02:00
Sarah Hoffmann
fff0012249
simplify website setup code
...
Use format strings and move variable quoting code into an extra
function.
2021-07-12 11:41:05 +02:00
Sarah Hoffmann
d5a1883b62
avoid repeated patterns for table name
2021-07-12 11:33:09 +02:00
Sarah Hoffmann
a08ef43e40
simplify if statements
2021-07-12 11:28:47 +02:00
Sarah Hoffmann
bc5e15996a
convert single case switch to if statement
2021-07-12 11:28:47 +02:00
Sarah Hoffmann
128ca800cd
avoid local variable assignment
2021-07-11 23:22:53 +02:00
Sarah Hoffmann
000d133af6
fix more missing braces on one-liners
2021-07-11 23:22:53 +02:00
Sarah Hoffmann
1e40d65aa9
remove dead code
2021-07-11 23:22:53 +02:00
Sarah Hoffmann
bffbe68ec3
do not intermix params with and without default
2021-07-11 23:22:53 +02:00
Sarah Hoffmann
58b10074ad
directly return data in function
...
The temporary variable is not necessary.
2021-07-11 19:24:04 +02:00
Sarah Hoffmann
d933ead2b5
remove unnecessarily nested ifs
...
Found by Sonarqube.
2021-07-11 19:11:37 +02:00
Sarah Hoffmann
1cdc30c5e8
remove unused functions
...
The functions were necessary for the transitional code during
the move to Python and are no longer used.
2021-07-11 19:10:04 +02:00
Sarah Hoffmann
3661f7a321
avoid multiple returns of same value
...
Found by Sonarqube.
2021-07-11 18:23:42 +02:00
Sarah Hoffmann
27af9b102c
always use brackets on if statements
...
This adds brackets around all one-line if statements that did
not have them yet.
2021-07-10 17:04:46 +02:00
Sarah Hoffmann
500c61685b
remove unused variables
...
As reported by Sonarqube.
2021-07-09 16:36:42 +02:00
Sarah Hoffmann
106d960f84
fix bad use of echo in PHP output
2021-07-09 12:50:35 +02:00
Sarah Hoffmann
322fa19ceb
Merge pull request #2390 from lonvia/responsible-disclosure
...
Add security issue disclosure policy
2021-07-09 12:32:37 +02:00
Sarah Hoffmann
5bea0b6086
add security issue disclosure policy
2021-07-09 11:36:59 +02:00
Sarah Hoffmann
a5970d7548
Merge pull request #2384 from lonvia/actions-add-icu-tokenizer
...
CI: run tests on Ubuntu 18
2021-07-07 14:39:53 +02:00
Sarah Hoffmann
c216144dd1
add missing pyyaml requirement
2021-07-07 11:29:33 +02:00
Sarah Hoffmann
42e08da7ca
enable PHP 7.2 for Ubuntu 18 CI
2021-07-07 11:29:33 +02:00
Sarah Hoffmann
a2edbbf78a
cannot use capture_output in subprocess.run
...
Only available since Python 3.7.
2021-07-06 22:57:42 +02:00
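For illustration, a way to capture output that also works on Python 3.6 is to pass the pipes explicitly, which is what `capture_output=True` is shorthand for. This is a sketch only; the helper name `run_captured` is hypothetical and not taken from the commit.

```python
import subprocess
import sys


def run_captured(cmd):
    """Run a command and capture stdout and stderr.

    Uses explicit pipes instead of capture_output=True, which
    subprocess.run() only accepts from Python 3.7 on.
    """
    return subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, check=True)


# Usage: run the current interpreter so the example has no
# external dependencies.
result = run_captured([sys.executable, '-c', 'print("ok")'])
print(result.stdout.decode().strip())
```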
Sarah Hoffmann
1e86dc1d93
remove default parameter for namedtuple
...
This is only available since Python 3.7.
2021-07-06 22:57:42 +02:00
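The commit removes the default rather than emulating it. Where defaults are still wanted on Python 3.6, one commonly used workaround is to set `__defaults__` on the generated `__new__`. The sketch below uses a hypothetical `PlaceInfo` tuple that does not appear in the code base.

```python
from collections import namedtuple

# Hypothetical tuple type, for illustration only.
PlaceInfo = namedtuple('PlaceInfo', ['name', 'rank', 'country'])

# The defaults= keyword of namedtuple() was added in Python 3.7.
# Assigning __defaults__ has the same effect on older versions;
# the values apply to the rightmost fields.
PlaceInfo.__new__.__defaults__ = (None,)

place = PlaceInfo('Hauptstrasse', 26)
print(place.country)
```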
Sarah Hoffmann
54f295be52
CI: run tests on older Ubuntu version as well
2021-07-06 22:57:42 +02:00
Sarah Hoffmann
8bc3c0a07c
Merge pull request #2382 from lonvia/remove-json-config
...
Remove outdated ICU tokenizer JSON config
2021-07-05 12:34:34 +02:00
Sarah Hoffmann
d75bc20174
Merge pull request #2383 from lonvia/remove-more-names
...
Exclude name:etymology and name:signed
2021-07-05 12:34:16 +02:00
Sarah Hoffmann
fd8751658f
exclude name:etymology and name:signed
...
name:etymology contains a description of the name origin and is
thus more informative than search-worthy.
name:signed basically indicates that the feature does not have
a name.
2021-07-05 11:04:16 +02:00
Sarah Hoffmann
4db5a1a0b8
remove outdated ICU tokenizer JSON config
2021-07-05 11:01:35 +02:00
Sarah Hoffmann
4c52777ef0
Merge pull request #2371 from lonvia/increase-python-version
...
Increase minimum required Python version to 3.6
2021-07-05 10:32:38 +02:00
Sarah Hoffmann
d4c7bf20a2
Merge pull request #2381 from lonvia/reorganise-abbreviations
...
Reorganise abbreviation handling
2021-07-05 10:32:16 +02:00
Sarah Hoffmann
affe1300d9
add warning about experimental nature of ICU tokenizer
2021-07-04 10:44:58 +02:00
Sarah Hoffmann
62d5984b1b
limit the number of variants that can be produced
2021-07-04 10:28:28 +02:00
Sarah Hoffmann
c32551b4e0
restrict partial word counting to names of reasonable length
...
The partial word count does not split names to save a bit of time.
The result is that it might encounter unreasonably long names
which in truth consist of multiple words. No accurate statistics
are needed so simply restrict the count to words shorter than
75 characters.
2021-07-04 10:28:28 +02:00
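The length restriction described above can be sketched as a simple filter in front of the counter. The 75-character limit comes from the commit message; the names `MAX_WORD_LENGTH` and `count_partial_words` are hypothetical, and the real counting may well happen in SQL rather than in Python.

```python
from collections import Counter

# Limit taken from the commit message: only words shorter than
# 75 characters are counted.
MAX_WORD_LENGTH = 75


def count_partial_words(words):
    """Count word frequencies, skipping unreasonably long entries.

    Entries are counted as they come, without splitting, so an
    overlong entry is most likely an unsplit multi-word name.
    """
    return Counter(w for w in words if len(w) < MAX_WORD_LENGTH)
```

Since the counts only serve as rough frequency statistics, dropping the overlong entries should not change which words are considered frequent.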
Sarah Hoffmann
e85f7e7aa9
fix subsequent replacements
...
Two replacement words directly following each other did not
work as expected because each expects a space at the
beginning/end while there was only one space available.
Also forbid composing a word after a space was added at the
end by a previous replacement.
2021-07-04 10:28:28 +02:00
Sarah Hoffmann
7b0f6b7905
leave ICU variant properties empty for now
...
Saving unused properties causes unnecessary duplicates.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
0894ce9dc3
import abbreviations from OSM Wiki
...
Replaces the variant rules with a slightly cleaned-up
version of the abbreviation lists at
https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
4fd2e961b6
improve normalization
...
Make sure all special symbols are removed during normalization already.
Those won't be interpreted in any way because they are unlikely to be
searched for.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
b9fbfeff67
only consider partials in multi-words for initial count
...
This ensures that it is less likely that we exclude meaningful
words like 'hauptstrasse' just because they are frequent.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
5dd24b3ef0
add documentation for ICU tokenizer configuration
2021-07-04 10:28:20 +02:00