Commit Graph

3390 Commits

Author SHA1 Message Date
Sarah Hoffmann
68bff31cc9 docs: add developer doc page for Tokenizer 2021-08-16 08:58:56 +02:00
Sarah Hoffmann
31d9545702
Merge pull request #2424 from lonvia/multi-country-import
Update instructions for importing multiple regions
2021-08-16 08:48:28 +02:00
Sarah Hoffmann
e449071a35
Merge pull request #2423 from hummeltech/patch-1
Fix old paths for `phpcs` when using `make test`
2021-08-15 22:00:50 +02:00
Sarah Hoffmann
23e3724abb ignore words without id for status 2021-08-15 21:59:36 +02:00
Sarah Hoffmann
75a5c7013f split up large setup function 2021-08-15 12:24:13 +02:00
Sarah Hoffmann
56d24085f9 port multi-region update scripts to nominatim tool
Also updates the documentation. For the simple case of just
importing multiple regions, provide simplified instructions
that use the new multi-file import feature.

Fixes #2365.
2021-08-14 23:55:48 +02:00
Sarah Hoffmann
95b82af42a update osm2pgsql to 1.5.1 2021-08-14 22:46:35 +02:00
Sarah Hoffmann
87dedde5d6 allow multiple files for the import command
The files are forwarded to osm2pgsql which is now able to merge
them correctly.
2021-08-14 21:42:21 +02:00
David Hummel
8b6489c60e
Fix old paths for phpcs when using make test
These paths no longer exist since db3ced17bb, they are now all located under `lib-php`
2021-08-12 13:34:18 -07:00
Sarah Hoffmann
bf4f05fff3
Merge pull request #2413 from osm-search/helm-chart
Installation docs - link to Kubernetes install project
2021-08-08 11:09:36 +02:00
mtmail
b0aaa25f0d
Installation docs - link to Kubernetes install project
As reported by @robjuz in https://github.com/osm-search/Nominatim/discussions/2412
2021-08-03 12:02:35 +02:00
Sarah Hoffmann
c3ddc7579a
Merge pull request #2408 from lonvia/icu-change-word-table-layout
Change table layout of word table for ICU tokenizer
2021-07-28 14:28:49 +02:00
Sarah Hoffmann
fdff579188 php: force use of global Exception class 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
d48793c22c fix Python linting errors 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
001b2aa9f9 fix linting issues in PHP 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1db098c05d reinstate word column in icu word table
PostgreSQL is very bad at creating statistics for jsonb
columns. The result is that the query planner tends to
use JIT for queries with a WHERE clause over 'info' even when
there is an index.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
324b1b5575 bdd tests: do not query word table directly
The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
e42878eeda adapt unit test for new word table
Requires a second wrapper class for the word table with the new
layout. This class is interface-compatible, so that later when
the ICU tokenizer becomes the default, all tests that depend on
behaviour of the default tokenizer can be switched to the other
wrapper.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
eb6814d74e convert word info column to json before copying 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
6ad35aca4a adapt special terms lookup to new word table 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
70f154be8b switch word tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
4342b28882 switch special phrases to new word table format 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5394b1fa1b switch postcode tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5ab0a63fd6 switch housenumber tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1618aba5f2 switch country name tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
8377528952 new word table layout for icu tokenizer
The table now directly reflects the different token types.
Extra information is saved in a json structure that may be
dynamically extended in the future without affecting the
table layout.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
34dcf02dee fix typos in tokenizer docs 2021-07-28 11:28:49 +02:00
Sarah Hoffmann
5d7d7f15d9
Merge pull request #2401 from lonvia/port-add-data-to-python
Port add-data functions from PHP to Python
2021-07-26 12:38:56 +02:00
Sarah Hoffmann
0c023fb4d2 adapt cli tests to Python port for add-data 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
1bd068d42d remove unused update script 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
e42349c963 replace add-data function with native Python code 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
878835e4bd move add-data subcommand into a separate file 2021-07-25 18:14:12 +02:00
Sarah Hoffmann
8096a1d67f fix parameters for TokenWord creation 2021-07-20 10:21:40 +02:00
Sarah Hoffmann
e16c5d5f70
Merge pull request #2397 from lonvia/increase-minimum-required-versions
Increase minimum required PostgreSQL version to 9.5
2021-07-19 14:28:02 +02:00
Sarah Hoffmann
2c8242c8df remove special code for pre-9.5 PostgreSQL
9.5 is now the minimum requirement.
2021-07-19 10:24:57 +02:00
Sarah Hoffmann
e7d6f89aca increase minimum version for PostgreSQL to 9.5
This is the minimum version we can test with the CI.
With 9.5 there is also complete support for jsonb available.
2021-07-19 10:21:19 +02:00
Sarah Hoffmann
379f5db516 require Python 3.6 also in CMakeFile
This had been forgotten when increasing the minimum Python version.
2021-07-19 10:14:14 +02:00
Sarah Hoffmann
ee32315378
Merge pull request #2396 from lonvia/partial-word-token
Reorganise code that builds the SearchDescription
2021-07-19 09:42:37 +02:00
Sarah Hoffmann
cca912af4e make all Token members private 2021-07-18 22:54:55 +02:00
Sarah Hoffmann
86ea077092 merge marking rare name with adding name token
Only name tokens can be rare, so this should be the same
function.
2021-07-18 16:52:37 +02:00
Sarah Hoffmann
5d6aabc457 add documentation for public interface of SearchDescription 2021-07-18 16:10:42 +02:00
Sarah Hoffmann
b14ce959d9 factor out check if a token fits current search
Saves allocating an empty array.
2021-07-17 22:01:35 +02:00
Sarah Hoffmann
a48ebd9b47 move SearchDescription building into tokens
Moving the logic for extending the SearchDescription into the
token classes splits up the code and makes it more readable.
More importantly: it allows tokenizers to define custom token
classes in the future.
2021-07-17 20:24:33 +02:00
Sarah Hoffmann
3cd85eaaf1 remove Token from explicit input for SearchDescription extension
The token string is only required by the PartialToken type, so
it can simply save the token string internally. No need to pass
it to every type.

Also moves the check for multi-word partials to the token loader
code in the tokenizer. Multi-word partials can only happen with
the legacy tokenizer and when the database was loaded with an
older version of Nominatim. No need to keep the check for
everybody.
2021-07-17 18:18:31 +02:00
Sarah Hoffmann
ec3f6c9c42 factor out query position
Moves token and phrase position and phrase type into a separate
class that is handed in when assembling the search description.
This drastically reduces the number of parameters for the function
to extend the search descriptions and gives us more flexibility
in the future for more complex positional analysis.
2021-07-15 14:12:59 +02:00
Sarah Hoffmann
143ff14466 remove special status of partial tokens
Full-word tokens are no longer marked by a space at the
beginning of the token. Use the new Partial token category
instead. This removes a couple of special cases that we don't
really need.

The word table still has the space for compatibility reasons,
so the tokenizer code needs to get rid of it when loading the
tokens.
2021-07-14 22:17:17 +02:00
Sarah Hoffmann
6070c3d1d5 introduce a separate token type for partials
This means that the leading space can be removed as a partial
word indicator.
2021-07-13 16:57:12 +02:00
Sarah Hoffmann
bc8b2d4ae0
Merge pull request #2393 from lonvia/fix-flake8-issues
Fix flake8 issues
2021-07-13 16:46:12 +02:00
Sarah Hoffmann
14f777da18 use psycopg's SQL quoting where possible
Use the SQL formatting supplied with psycopg whenever the
query needs to be put together from snippets.
2021-07-12 22:05:22 +02:00
Sarah Hoffmann
6f6681ce67 add helper function for execute_values
Make psycopg2's convenience function accessible through
the cursor.
2021-07-12 21:08:20 +02:00