Sarah Hoffmann
2e82a6ce03
docs: extend explanation of query phrase
2021-08-16 11:51:49 +02:00
Sarah Hoffmann
c4b8a3b768
add documentation for PHP part of tokenizer
2021-08-16 11:51:49 +02:00
Sarah Hoffmann
1147b83b22
php: make word list a first-class object
This separates the logic of creating word sets from the Phrase
class. A tokenizer may now derive the word sets any way it
likes. The SimpleWordList class provides a standard implementation
for splitting phrases on spaces.
2021-08-16 11:51:49 +02:00
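The commit does not show the class itself. As a rough illustration only, here is a Python sketch of what deriving word sets by "splitting phrases on spaces" could mean, assuming a word set is one way of grouping the phrase's consecutive words into terms. The function name `word_sets` is hypothetical and is not the PHP `SimpleWordList` API.

```python
from typing import List


def word_sets(phrase: str) -> List[List[str]]:
    """Return every way to group the space-separated words of `phrase`
    into runs of consecutive words (hypothetical helper)."""
    words = phrase.split()
    if not words:
        return []

    results: List[List[str]] = []

    def split(start: int, current: List[str]) -> None:
        # Once all words are consumed, record the grouping found so far.
        if start == len(words):
            results.append(list(current))
            return
        # Try every possible end for the next group of consecutive words.
        for end in range(start + 1, len(words) + 1):
            current.append(' '.join(words[start:end]))
            split(end, current)
            current.pop()

    split(0, [])
    return results


print(word_sets('new york city'))
# [['new', 'york', 'city'], ['new', 'york city'], ['new york', 'city'], ['new york city']]
```

A phrase of n words yields 2^(n-1) groupings, so a practical implementation would likely cap the number of words or groupings it considers.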
Sarah Hoffmann
0fb8eade13
remove country restriction from tokenizer
Restricting tokens due to the search context is better done in
the generic search part instead of repeating the same test in
every tokenizer implementation.
2021-08-16 11:41:54 +02:00
Sarah Hoffmann
78d11fe628
document tokenizer SQL interface
2021-08-16 11:41:54 +02:00
Sarah Hoffmann
90b40fc3e6
define formal public Python interface for tokenizer
This introduces an abstract class for the Tokenizer/Analyzer
for documentation purposes.
2021-08-16 11:41:54 +02:00
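An abstract class used for documentation purposes spells out the methods every implementation must provide without supplying behaviour itself. A minimal Python sketch of that pattern with the standard `abc` module follows. The class and method names (`AbstractTokenizer`, `AbstractAnalyzer`, `name_analyzer`, `process_place`) and the toy whitespace implementation are illustrative assumptions, not a copy of the actual Nominatim interface.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class AbstractAnalyzer(ABC):
    """Per-connection helper that turns place names into tokens (illustrative)."""

    def __enter__(self) -> 'AbstractAnalyzer':
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Always release resources when leaving a `with` block.
        self.close()

    @abstractmethod
    def close(self) -> None:
        """Release any resources held by the analyzer."""

    @abstractmethod
    def process_place(self, place: Dict[str, Any]) -> Dict[str, Any]:
        """Return token information to be saved with the place."""


class AbstractTokenizer(ABC):
    """Tokenizer-wide setup and factory for analyzers (illustrative)."""

    @abstractmethod
    def name_analyzer(self) -> AbstractAnalyzer:
        """Create a new analyzer; callers should use it as a context manager."""


class WhitespaceAnalyzer(AbstractAnalyzer):
    """Toy implementation: lower-cases names and splits them on spaces."""

    def close(self) -> None:
        pass

    def process_place(self, place: Dict[str, Any]) -> Dict[str, Any]:
        tokens: List[str] = []
        for name in place.get('names', {}).values():
            tokens.extend(name.lower().split())
        return {'names': tokens}


class WhitespaceTokenizer(AbstractTokenizer):
    def name_analyzer(self) -> AbstractAnalyzer:
        return WhitespaceAnalyzer()


with WhitespaceTokenizer().name_analyzer() as analyzer:
    print(analyzer.process_place({'names': {'name': 'Main Street'}}))
# {'names': ['main', 'street']}
```

Because the abstract methods have no implementation, instantiating `AbstractTokenizer` directly raises `TypeError`, which keeps the documented interface and the concrete classes in sync.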
Sarah Hoffmann
e25e268e2e
docs: querying and tokenizers
2021-08-16 08:59:44 +02:00
Sarah Hoffmann
68bff31cc9
docs: add developer doc page for Tokenizer
2021-08-16 08:58:56 +02:00
Sarah Hoffmann
31d9545702
Merge pull request #2424 from lonvia/multi-country-import
Update instructions for importing multiple regions
2021-08-16 08:48:28 +02:00
Sarah Hoffmann
e449071a35
Merge pull request #2423 from hummeltech/patch-1
Fix old paths for `phpcs` when using `make test`
2021-08-15 22:00:50 +02:00
Sarah Hoffmann
23e3724abb
ignore words without id for status
2021-08-15 21:59:36 +02:00
Sarah Hoffmann
75a5c7013f
split up large setup function
2021-08-15 12:24:13 +02:00
Sarah Hoffmann
56d24085f9
port multi-region update scripts to nominatim tool
Also updates the documentation. For the simple case of just
importing multiple regions, provide simplified instructions
that use the new multi-file import feature.
Fixes #2365.
2021-08-14 23:55:48 +02:00
Sarah Hoffmann
95b82af42a
update osm2pgsql to 1.5.1
2021-08-14 22:46:35 +02:00
Sarah Hoffmann
87dedde5d6
allow multiple files for the import command
The files are forwarded to osm2pgsql which is now able to merge
them correctly.
2021-08-14 21:42:21 +02:00
David Hummel
8b6489c60e
Fix old paths for `phpcs` when using `make test`
These paths no longer exist since db3ced17bb; they are now all located under `lib-php`.
2021-08-12 13:34:18 -07:00
Sarah Hoffmann
bf4f05fff3
Merge pull request #2413 from osm-search/helm-chart
Installation docs - link to Kubernetes install project
2021-08-08 11:09:36 +02:00
mtmail
b0aaa25f0d
Installation docs - link to Kubernetes install project
As reported by @robjuz in https://github.com/osm-search/Nominatim/discussions/2412
2021-08-03 12:02:35 +02:00
Sarah Hoffmann
c3ddc7579a
Merge pull request #2408 from lonvia/icu-change-word-table-layout
Change table layout of word table for ICU tokenizer
2021-07-28 14:28:49 +02:00
Sarah Hoffmann
fdff579188
php: force use of global Exception class
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
d48793c22c
fix Python linting errors
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
001b2aa9f9
fix linting issues in PHP
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1db098c05d
reinstate word column in icu word table
PostgreSQL is very bad at creating statistics for jsonb
columns. As a result, the query planner tends to
use JIT for queries with a WHERE clause on 'info' even when
there is an index.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
324b1b5575
bdd tests: do not query word table directly
The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
e42878eeda
adapt unit test for new word table
Requires a second wrapper class for the word table with the new
layout. This class is interface-compatible, so that later when
the ICU tokenizer becomes the default, all tests that depend on
the behaviour of the default tokenizer can be switched to the other
wrapper.
2021-07-28 11:31:47 +02:00
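An interface-compatible wrapper lets a test be written once against the shared methods and switched to the other table layout simply by swapping the object. The following Python sketch shows the idea with two in-memory wrappers. The class names, the `add_postcode`/`get_postcodes` methods, and the row formats (including the `'place'`/`'postcode'` and `'P'` markers) are hypothetical stand-ins, not the real test fixtures.

```python
from typing import List, Set, Tuple


class LegacyWordTable:
    """Hypothetical wrapper for an old-style layout with class/type columns."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, str, str, str]] = []

    def add_postcode(self, token: str, postcode: str) -> None:
        # Assumed marker for postcodes in the old layout.
        self.rows.append((token, postcode, 'place', 'postcode'))

    def get_postcodes(self) -> Set[str]:
        return {word for _, word, cls, typ in self.rows
                if cls == 'place' and typ == 'postcode'}


class TypedWordTable:
    """Hypothetical wrapper for a layout with a type column and an info dict."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, str, str, dict]] = []

    def add_postcode(self, token: str, postcode: str) -> None:
        # 'P' is an assumed type code for postcode tokens.
        self.rows.append((token, 'P', postcode, {}))

    def get_postcodes(self) -> Set[str]:
        return {word for _, typ, word, _ in self.rows if typ == 'P'}


def check_postcodes(table) -> None:
    """A test body that only uses the shared methods (duck typing)."""
    table.add_postcode(' 12345', '12345')
    table.add_postcode(' AB1 2CD', 'AB1 2CD')
    assert table.get_postcodes() == {'12345', 'AB1 2CD'}


for table in (LegacyWordTable(), TypedWordTable()):
    check_postcodes(table)
```

With pytest, the same idea would typically be expressed as a fixture parametrized over both wrapper classes.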
Sarah Hoffmann
eb6814d74e
convert word info column to json before copying
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
6ad35aca4a
adapt special terms lookup to new word table
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
70f154be8b
switch word tokens to new word table layout
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
4342b28882
switch special phrases to new word table format
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5394b1fa1b
switch postcode tokens to new word table layout
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5ab0a63fd6
switch housenumber tokens to new word table layout
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1618aba5f2
switch country name tokens to new word table layout
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
8377528952
new word table layout for icu tokenizer
The table now directly reflects the different token types.
Extra information is saved in a json structure that may be
dynamically extended in the future without affecting the
table layout.
2021-07-28 11:31:47 +02:00
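To illustrate why a JSON column keeps the layout stable, here is a minimal Python sketch in which new per-token attributes become new keys in `info` rather than new columns. The field names (`word_token`, `type`, `word`, `info`) and the single-letter type codes are assumptions for illustration; `info` is serialised with the standard `json` module, as it might be before being written to a jsonb column.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class WordRow:
    """One row of a word table keyed by token type.

    Column names and type codes are illustrative assumptions.
    """
    word_token: str                # normalised token used for lookup
    type: str                      # assumed codes, e.g. 'W' word, 'H' housenumber, 'P' postcode
    word: Optional[str] = None     # original word, if the type needs one
    info: Dict[str, Any] = field(default_factory=dict)  # free-form extras

    def as_db_tuple(self) -> Tuple[str, str, Optional[str], str]:
        """Return the values in column order, with `info` serialised to JSON text."""
        return (self.word_token, self.type, self.word, json.dumps(self.info))


row = WordRow(' main', 'W', 'main')
print(row.as_db_tuple())   # (' main', 'W', 'main', '{}')
row.info['count'] = 17     # a new attribute: no new column needed
print(row.as_db_tuple())   # (' main', 'W', 'main', '{"count": 17}')
```

The tuple keeps the same four positions regardless of what is stored in `info`, which is the property the commit message describes.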
Sarah Hoffmann
34dcf02dee
fix typos in tokenizer docs
2021-07-28 11:28:49 +02:00
Sarah Hoffmann
5d7d7f15d9
Merge pull request #2401 from lonvia/port-add-data-to-python
Port add-data functions from PHP to Python
2021-07-26 12:38:56 +02:00
Sarah Hoffmann
0c023fb4d2
adapt cli tests to Python port for add-data
2021-07-26 10:41:37 +02:00
Sarah Hoffmann
1bd068d42d
remove unused update script
2021-07-26 10:41:37 +02:00
Sarah Hoffmann
e42349c963
replace add-data function with native Python code
2021-07-26 10:41:37 +02:00
Sarah Hoffmann
878835e4bd
move add-data subcommand into a separate file
2021-07-25 18:14:12 +02:00
Sarah Hoffmann
8096a1d67f
fix parameters for TokenWord creation
2021-07-20 10:21:40 +02:00
Sarah Hoffmann
e16c5d5f70
Merge pull request #2397 from lonvia/increase-minimum-required-versions
Increase minimum required PostgreSQL version to 9.5
2021-07-19 14:28:02 +02:00
Sarah Hoffmann
2c8242c8df
remove special code for pre9.5 postgresql
9.5 is now the minimum requirement.
2021-07-19 10:24:57 +02:00
Sarah Hoffmann
e7d6f89aca
increase minimum version for PostgreSQL to 9.5
This is the minimum version we can test with the CI.
Version 9.5 also provides complete support for jsonb.
2021-07-19 10:21:19 +02:00
Sarah Hoffmann
379f5db516
require Python 3.6 also in CMakeFile
This had been forgotten when increasing the minimum Python version.
2021-07-19 10:14:14 +02:00
Sarah Hoffmann
ee32315378
Merge pull request #2396 from lonvia/partial-word-token
Reorganise code that builds the SearchDescription
2021-07-19 09:42:37 +02:00
Sarah Hoffmann
cca912af4e
make all Token members private
2021-07-18 22:54:55 +02:00
Sarah Hoffmann
86ea077092
merge marking rare name with adding name token
Only name tokens can be rare, so this should be the same
function.
2021-07-18 16:52:37 +02:00
Sarah Hoffmann
5d6aabc457
add documentation for public interface of SearchDescription
2021-07-18 16:10:42 +02:00
Sarah Hoffmann
b14ce959d9
factor out check if a token fits current search
Saves allocating an empty array.
2021-07-17 22:01:35 +02:00
Sarah Hoffmann
a48ebd9b47
move SearchDescription building into tokens
Moving the logic for extending the SearchDescription into the
token classes splits up the code and makes it more readable.
More importantly, it allows tokenizers to define custom token
classes in the future.
2021-07-17 20:24:33 +02:00
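Letting each token class extend the search description itself amounts to plain polymorphism: the generic code asks every token whether it fits the current search and, if so, lets it produce the extended searches. A rough Python sketch follows; the boolean `is_applicable` check mirrors the idea of testing fit before building anything, rather than returning an empty list. Everything here (`SearchDescription`, `Token`, `NameToken`, `PostcodeToken`, `extend_search`, `expand`, the fields) is hypothetical and far simpler than the PHP code these commits touch.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class SearchDescription:
    """Hypothetical, heavily simplified search description."""
    name_terms: List[int] = field(default_factory=list)
    postcode: Optional[str] = None


class Token:
    """Base class: each token type decides how it extends a search."""

    def is_applicable(self, search: SearchDescription) -> bool:
        # By default a token fits any search.
        return True

    def extend_search(self, search: SearchDescription) -> List[SearchDescription]:
        raise NotImplementedError


@dataclass
class NameToken(Token):
    token_id: int

    def extend_search(self, search: SearchDescription) -> List[SearchDescription]:
        # Return a new search with this token's id appended; the input is not modified.
        return [SearchDescription(search.name_terms + [self.token_id], search.postcode)]


@dataclass
class PostcodeToken(Token):
    postcode: str

    def is_applicable(self, search: SearchDescription) -> bool:
        # A search may carry at most one postcode.
        return search.postcode is None

    def extend_search(self, search: SearchDescription) -> List[SearchDescription]:
        return [SearchDescription(list(search.name_terms), self.postcode)]


def expand(search: SearchDescription, tokens: Iterable[Token]) -> List[SearchDescription]:
    """Let every applicable token extend `search`. A new token class only
    needs to implement the two methods; this function stays unchanged."""
    result: List[SearchDescription] = []
    for token in tokens:
        if token.is_applicable(search):
            result.extend(token.extend_search(search))
    return result


for search in expand(SearchDescription(), [NameToken(1), PostcodeToken('12345')]):
    print(search)
# SearchDescription(name_terms=[1], postcode=None)
# SearchDescription(name_terms=[], postcode='12345')
```

Because `expand` never inspects the concrete token type, a tokenizer could add its own `Token` subclass without touching the generic search code, which is the benefit the commit message points to.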