Commit Graph

918 Commits

Author SHA1 Message Date
Sarah Hoffmann
996026e5ed provide full URL in more field
This is a regression against the PHP version.

Fixes #3138.
2023-08-06 17:50:02 +02:00
Sarah Hoffmann
afdbdb02a1 do not lookup by address vector when only few tokens are available
Names of countries and states are exceedingly rare in the word count
but are very frequent in the address. A short name has the danger
of producing too many results.
2023-08-02 09:25:47 +02:00
Sarah Hoffmann
252fe42612
Merge pull request #3122 from miku0/sanitizer-final
Adds sanitizer for Japanese addresses to correspond to block address
2023-08-01 10:38:58 +02:00
miku0
67e1c7dc72 Moved KANJI_MAP to icu-rules 2023-07-31 11:57:49 +00:00
miku0
4d61cc87cf Add the test of reconbine_place 2023-07-31 02:39:56 +00:00
Sarah Hoffmann
e523da9e12 reintroduce file logging for Python frontend 2023-07-30 19:58:00 +02:00
miku0
67706cec4e add @fail-legacy 2023-07-27 07:33:53 +00:00
Sarah Hoffmann
9448c5e16f add tests for new arm and export Python functions 2023-07-26 11:09:52 +02:00
miku0
0722495434 add japanese sanitizer 2023-07-26 07:54:58 +00:00
Sarah Hoffmann
d545c6d73c mostly remove php-cgi requirement
This is now only needed for BDD tests against the php API.
2023-07-26 00:10:11 +02:00
Sarah Hoffmann
f69fea4210 remove now unused run_api_script function 2023-07-25 22:45:29 +02:00
Sarah Hoffmann
4cd0a4ced4 remove now unused run_legacy_script() 2023-07-25 21:39:23 +02:00
Sarah Hoffmann
0804cc0cff port export function to Python
Some of the parameters have been removed as they don't make sense
anymore.
2023-07-25 21:39:23 +02:00
Sarah Hoffmann
faeee7528f move warm script to python code 2023-07-25 21:39:23 +02:00
Sarah Hoffmann
66ecb56cea add tests for new endpoints 2023-07-25 10:57:19 +02:00
Sarah Hoffmann
927d2cc824 do not split names from typed phrases
When phrases are typed, they should only contain exactly one term.
2023-07-17 20:09:08 +02:00
Sarah Hoffmann
cc45930ef9 avoid lookup via partials on frequent words
Drops expensive searches via partials on terms like 'rue de'.

See #2979.
2023-07-06 12:16:57 +02:00
Sarah Hoffmann
ce17b0eeca
Merge pull request #3101 from lonvia/custom-geometry-type
Improve use of SQLAlchemy statement cache with search queries
2023-07-03 11:03:26 +02:00
Sarah Hoffmann
82216ebf8b always run function update on migrations
This means that we can have migrations which require nothing but
an update of the functions.
2023-07-01 20:18:59 +02:00
Sarah Hoffmann
9f6f12cfeb move search to bind parameters 2023-07-01 18:03:07 +02:00
Sarah Hoffmann
a873f260cf fix merging of linked names into unnamed boundaries
The NULL value of the boundaries' name field was erasing all
content when used in SQL operations.
2023-06-30 22:14:11 +02:00
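The NULL behaviour described in this commit can be reproduced in a few lines. This is only an illustrative sketch: it uses SQLite's in-memory database as a stand-in for PostgreSQL and plain string concatenation instead of whatever column type the real name field uses. The helper `scalar` and the sample value are hypothetical.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (assumption); both propagate NULL
# through the || concatenation operator.
conn = sqlite3.connect(':memory:')

def scalar(sql, *args):
    """Run a single-value query and return that value (hypothetical helper)."""
    return conn.execute(sql, args).fetchone()[0]

# An unnamed boundary has NULL in its name field. Merging the linked name
# into it naively yields NULL, so the linked name is lost as well.
naive = scalar("SELECT ? || ?", None, 'name=Foo')

# Replacing NULL with an empty value first keeps the linked name.
guarded = scalar("SELECT COALESCE(?, '') || ?", None, 'name=Foo')

print(naive, guarded)
```

Whether the actual fix uses COALESCE or another guard is not stated in the commit message; the sketch only shows why an unguarded NULL wipes out the other operand.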
Sarah Hoffmann
d7a3039c2a also switch legacy tokenizer to new street/place choice behaviour 2023-06-30 17:03:17 +02:00
Sarah Hoffmann
645ea5a057 use information from tokenizer to determine street vs. place address
So far the SQL logic used the information from the address field
to determine if an address is attached to a street or place.
This changes the logic to use the information provided in the
token_info. This allows sanitizers to enforce a certain parenting
without changing the visible address information.
2023-06-30 11:08:25 +02:00
Sarah Hoffmann
2755ebe883
Merge pull request #3094 from lonvia/fix-failing-bdd-tests
Add BDD tests against Python frontend to CI
2023-06-22 22:28:31 +02:00
Sarah Hoffmann
2d05ff0190 slightly adapt postcode tests 2023-06-22 16:51:59 +02:00
Sarah Hoffmann
0d338fa4c0 bdd: fix faking HTTP headers for python web frameworks 2023-06-22 14:00:33 +02:00
mtmail
15a66e7b7d
Merge branch 'osm-search:master' into check-database-on-frozen-database 2023-06-22 12:14:55 +02:00
Marc Tobias
2337cc653b check-database on frozen db shouldn't recommend indexing 2023-06-21 17:47:57 +02:00
Sarah Hoffmann
9bc5be837b remove useless check
Found by new mypy version.
2023-06-21 11:56:39 +02:00
Sarah Hoffmann
b79d5494f9 remove support for sanic framework
There is no performance gain over falcon or starlette but the special
structure of sanic makes it hard to have exchangeable code.
2023-06-21 10:53:57 +02:00
Sarah Hoffmann
36df56b093 fix header name for browser languages 2023-06-20 11:56:43 +02:00
Sarah Hoffmann
d0a1e8e311 tweak postcode search
Give a preference to left-right reading, i.e. <postcode>,<address>
prefers a postcode search while <address>,<postcode> rather does
an address search.

Also exclude non-addressables, countries and states from results when a
postcode is contained in the query.
2023-06-20 11:56:43 +02:00
Sarah Hoffmann
1f83efa8f2
Merge pull request #3086 from lonvia/close-connection-on-replication
Close database connections while waiting for the next update cycle
2023-06-19 15:48:00 +02:00
Sarah Hoffmann
6f3339cc49 close DB connection when waiting for next update cycle 2023-06-19 12:02:51 +02:00
Sarah Hoffmann
771be0e056 do not fail php script generation when curly braces are present
Fixes #3084.
2023-06-19 11:23:30 +02:00
Sarah Hoffmann
41bf162306 remove tests for old PHP cli commands 2023-05-26 17:36:05 +02:00
Sarah Hoffmann
8f299838f7 fix various failing BDD tests 2023-05-26 15:08:48 +02:00
Sarah Hoffmann
146a0b29c0 add support for search by housenumber 2023-05-26 14:10:57 +02:00
Sarah Hoffmann
371a780ef4 add server fronting for search endpoint
This also implements some of the quirks of free-text search of the
V1 API, in particular, search for categories and coordinates.
2023-05-26 11:40:45 +02:00
Sarah Hoffmann
0608cf1476 switch CLI search command to python implementation 2023-05-24 22:54:54 +02:00
Sarah Hoffmann
f335e78d1e make localisation of results explicit
Localisation was previously done as part of the formatting but might
also be useful on its own when working with the results directly.
2023-05-24 18:12:34 +02:00
Sarah Hoffmann
dcfb228c9a add API functions for search functions
Search is now split into three functions: for free-text search,
for structured search and for search by category. Note that the
free-text search does not have as many hidden features like
coordinate search. Use the search parameters for that.
2023-05-24 18:05:43 +02:00
Sarah Hoffmann
dc99bbb0af implement actual database searches 2023-05-24 13:52:31 +02:00
Sarah Hoffmann
c42273a4db implement search builder 2023-05-23 11:23:44 +02:00
Sarah Hoffmann
3bf489cd7c implement token assignment 2023-05-22 15:49:03 +02:00
Sarah Hoffmann
d8240f9ee4 add query analyser for legacy tokenizer 2023-05-22 11:07:14 +02:00
Sarah Hoffmann
2448cf2a14 add factory for query analyzer 2023-05-22 09:23:19 +02:00
Sarah Hoffmann
004883bdb1 query analyzer for ICU tokenizer 2023-05-22 08:46:19 +02:00
Sarah Hoffmann
ff66595f7a add data structure for tokenized query 2023-05-21 09:30:57 +02:00
Sarah Hoffmann
d9d8b9c526 add tests for parameter converter 2023-05-18 18:09:07 +02:00
Sarah Hoffmann
bef5cea48e switch API parameters to keyword arguments
This switches the input parameters for API calls to a generic
keyword argument catch-all which is then loaded into a dataclass
where the parameters are checked and forwarded to internal
function.

The dataclass gives more flexibility with the parameters and makes
it easier to reuse common parameters for the different API calls.
2023-05-18 17:42:23 +02:00
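The pattern described in this commit, a keyword-argument catch-all that is validated by loading it into a dataclass, might look roughly like the sketch below. The names `LookupParams`, `from_kwargs` and `search` are hypothetical and not taken from the Nominatim code base.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping

@dataclass
class LookupParams:
    """Hypothetical parameter set shared by several API calls."""
    max_results: int = 10
    language: str = 'en'

    def __post_init__(self) -> None:
        # Parameter checks live in one place, next to the defaults.
        if self.max_results < 1:
            raise ValueError("max_results must be positive")

    @classmethod
    def from_kwargs(cls, kwargs: Mapping[str, Any]) -> 'LookupParams':
        # Reject anything the dataclass does not know about.
        known = {f.name for f in fields(cls)}
        unknown = set(kwargs) - known
        if unknown:
            raise TypeError(f"unknown parameters: {sorted(unknown)}")
        return cls(**kwargs)

def search(query: str, **params: Any) -> str:
    """API entry point with a generic keyword-argument catch-all."""
    details = LookupParams.from_kwargs(params)
    # The checked dataclass is what gets forwarded to internal functions.
    return f"{query} [{details.language}, max {details.max_results}]"

print(search('berlin', max_results=3))
```

Because the defaults and checks sit in the dataclass, a second API call could accept the same `**params` and reuse `LookupParams` without repeating its signature.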
Marc Tobias
e5f332bd71 when adding Tiger data, check first if database is in frozen state 2023-05-08 14:35:30 +02:00
Sarah Hoffmann
5751686fdc
Merge pull request #3006 from biswajit-k/generalize-filter
generalize filter function for sanitizers
2023-04-11 19:20:08 +02:00
Sarah Hoffmann
60c1301fca fix a number of corner cases with interpolation splitting
Snapping a line to a point before splitting was meant to ensure
that the split point is really on the line. However, ST_Snap() does
not always behave well for this case. It may shorten the interpolation
line in some cases with the result that two housenumber points
suddenly fall on the same point. It might also shorten the line down
to a single point which then makes ST_Split() crash.

Switch to a combination of ST_LineLocatePoint and ST_LineSubString
instead, which guarantees to keep the original geometry. Explicitly
handle the corner cases, where the split point falls on the beginning
or end of the line.
2023-04-06 16:54:00 +02:00
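The locate-then-substring approach from this commit can be illustrated outside the database. The sketch below is a plain-Python approximation of what ST_LineLocatePoint and ST_LineSubstring compute on a 2D polyline; it is not the SQL from the commit. All function names are hypothetical, the line is assumed to have non-zero length, and the way the real code treats the end-point corner cases may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def _seg_lengths(line: List[Point]) -> List[float]:
    return [math.dist(a, b) for a, b in zip(line, line[1:])]

def locate_point(line: List[Point], pt: Point) -> float:
    """Fraction (0..1) along the line of the position closest to pt."""
    lengths = _seg_lengths(line)
    best_dist, best_pos, walked = math.inf, 0.0, 0.0
    for (a, b), seg_len in zip(zip(line, line[1:]), lengths):
        t = 0.0
        if seg_len > 0:
            # Project pt onto the segment and clamp to its end points.
            t = ((pt[0] - a[0]) * (b[0] - a[0])
                 + (pt[1] - a[1]) * (b[1] - a[1])) / seg_len ** 2
            t = min(1.0, max(0.0, t))
        proj = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        dist = math.dist(pt, proj)
        if dist < best_dist:
            best_dist, best_pos = dist, walked + t * seg_len
        walked += seg_len
    return best_pos / sum(lengths)

def substring(line: List[Point], start: float, end: float) -> List[Point]:
    """Part of the line between the fractions start <= end."""
    lengths = _seg_lengths(line)
    lo, hi = start * sum(lengths), end * sum(lengths)
    out: List[Point] = []
    walked = 0.0
    for (a, b), seg_len in zip(zip(line, line[1:]), lengths):
        seg_start, seg_end = walked, walked + seg_len
        walked = seg_end
        if seg_len == 0 or seg_end < lo or seg_start > hi:
            continue
        for pos in (max(lo, seg_start), min(hi, seg_end)):
            if pos <= seg_start:
                p = a          # reuse original vertices unchanged
            elif pos >= seg_end:
                p = b
            else:
                t = (pos - seg_start) / seg_len
                p = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if not out or out[-1] != p:
                out.append(p)
    return out

def split_line(line: List[Point], pt: Point) -> List[List[Point]]:
    frac = locate_point(line, pt)
    if frac <= 0.0 or frac >= 1.0:
        # Corner case: split point at the beginning or end, nothing to cut.
        return [line]
    return [substring(line, 0.0, frac), substring(line, frac, 1.0)]

line = [(0, 0), (10, 0), (10, 10)]
parts = split_line(line, (4, 1))
```

Unlike snapping, this never moves the original vertices: both parts are built from the untouched input points plus the one computed split position.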
Sarah Hoffmann
1dce2b98b4 switch CLI lookup command to Python implementation 2023-04-03 14:40:41 +02:00
Sarah Hoffmann
86c4897c9b add lookup call to server glue 2023-04-03 14:40:41 +02:00
Sarah Hoffmann
2237603677 add tests for new lookup API 2023-04-03 14:40:41 +02:00
Sarah Hoffmann
6e81596609 rename lookup() API to details and add lookup call
The initial plan to serve /details and /lookup endpoints from
the same API call turned out to be impractical, so the API now
also has separate functions for both.
2023-04-03 14:40:41 +02:00
Sarah Hoffmann
ed9cd9f0e5 bdd: disable detail tests searching by place ID
Place IDs are not stable and cannot be used in tests.
2023-04-03 10:07:06 +02:00
Sarah Hoffmann
7d30dbebc5 flex style: reinstate postcode boundaries
Postcode boundaries don't have a name, so need to be imported
unconditionally.
2023-04-03 09:17:50 +02:00
biswajit-k
8f03c80ce8 generalize filter for sanitizers 2023-04-01 19:24:09 +05:30
Sarah Hoffmann
683a3cb3ec call osm2pgsql postprocessing flush_deleted_places() when adding data 2023-03-31 18:05:07 +02:00
Sarah Hoffmann
1feac2069b add BDD tests for new layers parameter 2023-03-30 09:54:55 +02:00
Sarah Hoffmann
26ee6b6dde python reverse: add support for point geometries in interpolations 2023-03-29 17:21:33 +02:00
Sarah Hoffmann
6c67a4b500 switch reverse CLI command to Python implementation 2023-03-26 18:09:33 +02:00
Sarah Hoffmann
86b43dc605 make sure PHP and Python reverse code does the same
The only allowable difference is precision of coordinates. Python uses
a precision of 7 digits where possible, which corresponds to the
precision of OSM data.

Also fixes some smaller bugs found by the BDD tests.
2023-03-26 16:21:43 +02:00
Sarah Hoffmann
35b52c4656 add output formatters for ReverseResults
These formatters are written in a way that they can be reused for
search results later.
2023-03-25 15:45:03 +01:00
Sarah Hoffmann
2f54732500 python: implement reverse lookup function
The implementation follows for the most part the PHP code but introduces an
additional layer parameter with which the kind of places to be returned
can be restricted. This replaces the hard-coded exclusion lists.
2023-03-23 22:38:37 +01:00
Sarah Hoffmann
1facfd019b api: generalize error handling
Return a consistent error response which takes into account the chosen
content type. Also adds tests for V1 server glue.
2023-03-23 10:16:50 +01:00
Sarah Hoffmann
00e3a752c9 split SearchResult type
Use adapted types for the different result types. This makes it
easier to have adapted output formatting and means there are only
result fields that are filled.
2023-03-23 10:16:50 +01:00
Sarah Hoffmann
81430bd3bd bdd: be more fuzzy with coordinate comparisons 2023-03-09 22:37:45 +01:00
Sarah Hoffmann
93203f355a avoid recent Python dialect 2023-03-09 20:57:43 +01:00
Sarah Hoffmann
3f2296e3ea bdd: extend reverse API tests for format checks
Reorganise the API reverse tests and extend the checks for the
output format, testing for all expected fields.
2023-03-09 20:20:50 +01:00
Sarah Hoffmann
2b7eb4906a bdd: add tests for valid debug output 2023-03-09 20:10:51 +01:00
Sarah Hoffmann
db1aa4d02e bdd: replace old formatting strings 2023-03-09 19:49:55 +01:00
Sarah Hoffmann
ad88d7a3e0 bdd: more format checks for reverse XML 2023-03-09 19:40:24 +01:00
Sarah Hoffmann
e42c1c9c7a bdd: new step variant 'result contains in field'
This replaces the + notation for recursing into result dictionaries.
2023-03-09 19:31:21 +01:00
Sarah Hoffmann
556bb2386d bdd: factor out computation of result to-check lists 2023-03-09 18:01:45 +01:00
Sarah Hoffmann
1e58cef174 bdd: replace property_list construct with standard check functions 2023-03-09 17:56:28 +01:00
Sarah Hoffmann
01010e443f bdd: remove special case for osm_type field
The fuzzy field check can hide formatting errors. Use 'osm' when
only caring about the content.
2023-03-09 17:44:34 +01:00
Sarah Hoffmann
da0a7a765e bdd: reorganise field comparisons
Move comparison of Field values from assert_field() into a
comparator class. Replace BadRowValueAssert with a simpler
check_row() function.
2023-03-09 17:05:05 +01:00
Sarah Hoffmann
9769a0dcdb bdd: use new check_for_attributes() function also in steps 2023-03-09 16:44:07 +01:00
Sarah Hoffmann
fbff4fa218 bdd: fully check correctness of geojson and geocodejson
Parse code now checks presence of all required fields and exports
all fields for inspection.
2023-03-09 16:36:46 +01:00
Sarah Hoffmann
d17ec56e54 bdd: remove OrderedDict
dicts are guaranteed to keep insertion order since Python 3.7, making
use of OrderedDict moot.
2023-03-09 16:08:39 +01:00
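The language guarantee this commit relies on is easy to check. The field names below are only sample data. One difference remains and is worth knowing when swapping the types: equality of two OrderedDicts is order-sensitive, equality of plain dicts is not.

```python
from collections import OrderedDict

# Plain dicts keep insertion order (guaranteed since Python 3.7).
row = {}
row['osm_type'] = 'N'
row['osm_id'] = 1
row['class'] = 'place'
keys_in_order = list(row)

# Remaining difference: OrderedDict comparison takes order into account.
plain_equal = {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)

print(keys_in_order, plain_equal, ordered_equal)
```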
biswajit-k
ca149fb796 Adds sanitizer for preventing certain tags to enter search index based on parameters
fix: pylint error

added docs for delete tags sanitizer

fixed typos in docs and code comments

fix: python typechecking error

fixed rank address type

Revert "fixed typos in docs and code comments"

This reverts commit 6839eea755a87f557895f30524fb5c03dd983d60.

added default parameters and refactored code

added test for all parameters
2023-03-09 14:18:39 +05:30
Sarah Hoffmann
412ead5f2d adapt PHP tests for debug output 2023-02-20 16:23:28 +01:00
Sarah Hoffmann
d574ceb598 restrict place rank inheritance to address items
Place tags must have no influence on street- or POI-level
objects.
2023-02-17 16:25:26 +01:00
Sarah Hoffmann
ee0c5e24bb add a WKB decoder for the Point class
This allows returning point geometries from the database and makes
the SQL a bit simpler.
2023-02-16 17:29:56 +01:00
Sarah Hoffmann
8557105c40 add debug output for unit tests
This uses the debug output facility meant for pretty HTML output
to give us debugging output for the unit tests.
2023-02-14 11:57:37 +01:00
Sarah Hoffmann
42c3754dcd add tests for details result formatting and trim results
Values that are None are no longer included in the output to save
a bit of bandwidth.
2023-02-04 21:22:22 +01:00
Sarah Hoffmann
b742200442 expand details BDD tests
There are now minor differences in the output between PHP and
Python versions, so introduce specific tests.
2023-02-04 21:22:22 +01:00
Sarah Hoffmann
3ac70f7cc2 implement details endpoint in Python servers 2023-02-04 21:22:22 +01:00
Sarah Hoffmann
104722a56a switch details cli command to new Python implementation 2023-02-04 21:22:22 +01:00
Sarah Hoffmann
1924beeb20 add lookup of postcode data 2023-02-04 21:22:22 +01:00
Sarah Hoffmann
70f6f9a711 add lookup of tiger data 2023-02-04 21:22:22 +01:00
Sarah Hoffmann
f1ceefe9a6 add lookup of address interpolations 2023-02-04 21:22:22 +01:00
Sarah Hoffmann
189f74a40d add unit tests for lookup function 2023-02-04 21:22:22 +01:00
Sarah Hoffmann
370c9b38c0 improve scaffolding for API unit tests
Use the static table definition to create the test database.
Add helper function to simplify filling the tables.
2023-02-04 21:22:22 +01:00
Sarah Hoffmann
16b6484c65 add property cache for API
This caches results from querying nominatim_properties.
2023-01-30 09:36:17 +01:00
Sarah Hoffmann
77bec1261e add streaming json writer for JSON output 2023-01-25 15:05:33 +01:00
Sarah Hoffmann
8f4426fbc8 reorganize code around result formatting
Code is now organized by api version. So formatting has moved to
the api.v1 module. Instead of holding a separate ResultFormatter
object per result format, simply move the functions to the
formatter collector and hand in the requested format as a parameter.
Thus reorganized, the api.v1 module can export three simple functions
for result formatting which in turn makes the code that uses
the formatters much simpler.
2023-01-24 17:20:51 +01:00
Sarah Hoffmann
32c1e59622 reorganize api submodule
Use a directory for the submodule where the __init__ file contains
the public API. This makes it easier to separate public interface
from the internal implementation.
2023-01-24 13:28:04 +01:00
Sarah Hoffmann
3cc357bffa
Merge pull request #2955 from lonvia/fix-importance-refresh
Fix importance recalculation
2023-01-23 09:07:43 +01:00
Sarah Hoffmann
ce9ed993c8 fix importance recalculation
The signature of the compute_importance() function has changed.
2023-01-22 22:32:16 +01:00
Sarah Hoffmann
929a13d4cd remove comma as name separator
Commas are most of the time used as a part of a name, not to
separate multiple names.

See also #2950.
2023-01-22 22:29:36 +01:00
Sarah Hoffmann
5f4e98e0d9 update Makefile in test directory 2023-01-09 20:49:33 +01:00
Sarah Hoffmann
a72e2ecb3f update dependencies for Actions 2023-01-03 10:03:00 +01:00
Sarah Hoffmann
0c47558729 convert version to named tuple
Also return the new NominatimVersion rather than a string in the
status result.
2023-01-03 10:03:00 +01:00
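A short sketch of why a named tuple is a convenient version type: tuples compare element by element, which strings do not. The class `Version` and the helper `parse_version` are hypothetical; the fields of the real NominatimVersion are not given in the commit message.

```python
from typing import NamedTuple

class Version(NamedTuple):
    """Hypothetical three-part version; not the real NominatimVersion."""
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"

def parse_version(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split('.'))
    return Version(major, minor, patch)

newer = parse_version('4.10.0')
older = parse_version('4.2.0')

# Tuple comparison is numeric per field; string comparison is not.
print(newer > older, '4.10.0' > '4.2.0', str(newer))
```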
Sarah Hoffmann
93b9288c30 fix error message for non-existing database 2023-01-03 10:03:00 +01:00
Sarah Hoffmann
9d31a67116 add unit tests for new Python API 2023-01-03 10:03:00 +01:00
Sarah Hoffmann
7219ee6532 extend BDD API tests to query via Python frameworks
A new config option ENGINE allows choosing between php and any of the
supported Python engines.
2023-01-03 10:03:00 +01:00
Sarah Hoffmann
018ef5bd53 bdd: recreate project directory for every run 2022-12-23 18:36:41 +01:00
Sarah Hoffmann
200eae3bc0 add tests for examples in lua style documentation
And fix all the errors the tests have found.
2022-12-23 17:35:28 +01:00
Sarah Hoffmann
89a34e7508 adapt tests for new lua styles 2022-12-19 17:32:28 +01:00
Sarah Hoffmann
a915815e4d explicit export for functions in flex-base 2022-12-18 10:10:58 +01:00
Sarah Hoffmann
2231401483 clean up uses of cli.nominatim()
They should not hand in data paths anymore.
2022-11-27 15:27:04 +01:00
Sarah Hoffmann
2abe9e6fd9 use data paths from new nominatim.paths 2022-11-27 12:15:41 +01:00
Sarah Hoffmann
0ed60d29cb remove NOMINATIM_NOMINATIM_TOOL variable
This was used by the old PHP scripts to call the Python tool.
With the scripts now gone, the variable can be removed.
2022-11-26 16:40:20 +01:00
Sarah Hoffmann
41e8bddaa9 remove BDD test for tiger:county
We no longer rely on the import to strip the tag.
2022-11-23 10:37:27 +01:00
Sarah Hoffmann
fd3dec8efe add sanitizer for TIGER tags
Currently only takes over cleaning the tiger:county data. This was
done by the import until now.
2022-11-23 10:37:27 +01:00
Sarah Hoffmann
c9ff7d2130 drop illegal values for addr:interpolation on update 2022-11-18 17:26:56 +01:00
Sarah Hoffmann
52456230cc
Merge pull request #2887 from lonvia/lookup-linked-places
Add support for lookup of linked places
2022-11-17 13:35:53 +01:00
Sarah Hoffmann
c4b13f2b7f add support for lookup of linked places 2022-11-16 21:34:45 +01:00
Sarah Hoffmann
4f05a03d13 handle associatedStreet relations with multiple streets
When an associatedStreet relation has multiple street members,
always take the closest one. Avoid geometry operations for
the frequent case that there is only one street.
2022-11-16 17:25:51 +01:00
Sarah Hoffmann
93ada250f7 bdd: add tests for osm2pgsql update of postcode nodes 2022-11-14 17:27:04 +01:00
Sarah Hoffmann
d8e3ba3b54 bdd: add osm2pgsql tests for updating interpolations 2022-11-14 16:57:31 +01:00
Sarah Hoffmann
a46348da38 bdd: test placex content when updating with osm2pgsql 2022-11-14 14:48:44 +01:00
Sarah Hoffmann
36cf0eb922 reorganize handling of place type changes
Always replace existing entries in place, never delete them because
a direct delete will cause conflicts.
2022-11-14 13:57:26 +01:00
Sarah Hoffmann
63a9bc94f7 fix country handling in flex style
If the country tag does not match a 2-letter code, it needs to
be dropped.
2022-11-10 15:52:13 +01:00
Sarah Hoffmann
2dafc4cf4f remove tests that differ between lua and gazetteer versions 2022-11-10 15:51:55 +01:00
Sarah Hoffmann
68d09f9cad node locations must be stable for osm2pgsql update tests 2022-11-10 11:11:45 +01:00
Sarah Hoffmann
b98d3d3f00 bdd: extend osm2pgsql update tests
Now also checks for correct indexing state of placex table.
2022-11-10 09:38:25 +01:00
Sarah Hoffmann
2fac507453 change updates to handle delete/insert workflow
This makes Nominatim compatible with osm2pgsql's default update
modus operandi of deleting and reinserting data. Deletes are diverted
into a TODO table instead of executing them. When data is reinserted,
the corresponding entry in the TODO table is deleted. After updates are
finished, the remaining entries in the TODO table are executed, doing
the same work as the delete trigger did before.

The new behaviour also works against the gazetteer output with its
insert-only mechanism.
2022-11-10 09:38:23 +01:00
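The deferred-delete workflow described in this commit can be sketched with an in-memory model. This is an assumption-laden illustration, not the SQL trigger logic: the class `PlaceStore` and its methods are hypothetical, and a set stands in for the TODO table.

```python
from typing import Dict, Set

class PlaceStore:
    """Hypothetical in-memory stand-in for the place table plus TODO table."""

    def __init__(self) -> None:
        self.places: Dict[int, dict] = {}
        self.pending_deletes: Set[int] = set()

    def delete(self, osm_id: int) -> None:
        # Deletes are diverted into the TODO list instead of being executed.
        self.pending_deletes.add(osm_id)

    def insert(self, osm_id: int, tags: dict) -> None:
        # A reinsert replaces the entry in place and cancels the pending delete.
        self.places[osm_id] = tags
        self.pending_deletes.discard(osm_id)

    def flush_deleted(self) -> int:
        # After the update run, execute whatever deletes are still pending.
        removed = 0
        for osm_id in self.pending_deletes:
            if self.places.pop(osm_id, None) is not None:
                removed += 1
        self.pending_deletes.clear()
        return removed

store = PlaceStore()
store.insert(1, {'name': 'A'})
store.insert(2, {'name': 'B'})

# An update that modifies object 1 (delete + reinsert) and removes object 2.
store.delete(1)
store.insert(1, {'name': 'A2'})
store.delete(2)
removed = store.flush_deleted()
```

An insert-only data source never calls `delete`, so `flush_deleted` is a no-op there, which matches the commit's remark that the new behaviour also works with insert-only output.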
Sarah Hoffmann
51ed55cc32 initial flex import scripts
Only implements the extratags style for the moment. Tests pass
for the same behaviour as the gazetteer output. Updates still need
to be done.
2022-11-10 09:37:38 +01:00
Sarah Hoffmann
de2a3bd5f8 bdd tests: make import style configurable
The switch is for development. Tests are not guaranteed to still
work when run with anything but the 'extratags' style.
2022-11-10 09:37:38 +01:00
Sarah Hoffmann
981e9700be add osm2pgsql gazetteer tests
This ports the gazetteer tests from osm2pgsql to BDD tests.
2022-11-10 09:37:38 +01:00
Sarah Hoffmann
5f6dcd36ed fix flaky API test
The search 'landstr' produces many duplicates so that with
some bad luck 4 or fewer results may appear. Disable deduplication
to make it more predictable.
2022-10-05 15:16:14 +02:00
Sarah Hoffmann
5877b69d51 do not run unit test when postgis_raster is not available 2022-10-01 11:01:49 +02:00
Sarah Hoffmann
5ec2c1b712 adapt unit tests to changed function names 2022-10-01 11:01:49 +02:00
Sarah Hoffmann
0a73ed7d64 add secondary importance to API BDD tests
Also fixes a path issue during API test DB creation that could
never possibly have worked.
2022-10-01 11:01:49 +02:00
Tareq Al-Ahdal
0ab0f0ea44 Integrated OSM views into importance computation 2022-10-01 11:01:49 +02:00
Tareq Al-Ahdal
ac467c7a2d Enhanced the implementation of OSM views GeoTIFF import functionality 2022-10-01 11:01:49 +02:00
Tareq Al-Ahdal
c85b74497b Initial implementation of GeoTIFF import functionality 2022-10-01 11:01:49 +02:00
Sarah Hoffmann
f4d3ae6f70 consolidate indexes over geometry_sectors
The indexes over geometry_sectors are mainly used for ordering
the places which need indexing. That means they function effectively
as a TODO list. Consolidate them so that they always only contain
the places which are still to do. Also add the appropriate index
for the boundary indexing phase.
2022-09-21 10:38:58 +02:00
Sarah Hoffmann
dddfa3a075 ignore irrelevant extra tags on address interpolations
When deciding if an address interpolation has address information, only
look for addr:street and addr:place. If they are not there, go looking
for the address on the address nodes. Ignores irrelevant tags like
addr:inclusion.

Fixes #2797.
2022-08-13 14:07:06 +02:00
Sarah Hoffmann
487e81fe3c more invalidations when boundary changes rank
When a boundary or place changes its address rank, all places where
it participates as address need to be potentially reindexed.
Also use the computed rank when testing place nodes against
boundaries. Boundaries are computed earlier.

Fixes #2794.
2022-08-12 09:48:46 +02:00
Sarah Hoffmann
51b6d16dc6 overhaul the token analysis interface
The functional split between the two functions is now that the
first one creates the ID that is used in the word table and
the second one creates the variants. There no longer is a
requirement that the ID is the normalized version. We might
later reintroduce the requirement that a normalized version be available
but it doesn't necessarily need to be through the ID.

The function that creates the ID now gets the full PlaceName. That way
it might take into account attributes that were set by the sanitizers.

Finally rename both functions to something more sane.
2022-07-29 15:14:11 +02:00
Sarah Hoffmann
c8873d34af harmonize interface of token analysis module
The configure() function now receives a Transliterator object instead
of the ICU rules. This harmonizes the parameters with the create
function.
2022-07-29 10:43:07 +02:00
Sarah Hoffmann
6d41046b15 add support for external sanitizer modules 2022-07-25 16:10:19 +02:00
Sarah Hoffmann
7b7203c149 add function for loading plugin modules
Loads modules for configurable code like tokenizers, sanitizers, etc.
Supports internal modules, external libraries and code from the
project directory.
2022-07-25 16:10:10 +02:00