Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-29 08:36:24 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	2f54732500	python: implement reverse lookup function The implementation follows for the most part the PHP code but introduces an additional layer parameter with which the kind of places to be returned can be restricted. This replaces the hard-coded exclusion lists.	2023-03-23 22:38:37 +01:00
Sarah Hoffmann	1facfd019b	api: generalize error handling Return a consistent error response which takes into account the chosen content type. Also adds tests for V1 server glue.	2023-03-23 10:16:50 +01:00
Sarah Hoffmann	00e3a752c9	split SearchResult type Use adapted types for the different result types. This makes it easier to have adapted output formatting and means there are only result fields that are filled.	2023-03-23 10:16:50 +01:00
biswajit-k	ca149fb796	Adds sanitizer for preventing certain tags to enter search index based on parameters fix: pylint error added docs for delete tags sanitizer fixed typos in docs and code comments fix: python typechecking error fixed rank address type Revert "fixed typos in docs and code comments" This reverts commit 6839eea755a87f557895f30524fb5c03dd983d60. added default parameters and refactored code added test for all parameters	2023-03-09 14:18:39 +05:30
Sarah Hoffmann	ee0c5e24bb	add a WKB decoder for the Point class This allows to return point geometries from the database and makes the SQL a bit simpler.	2023-02-16 17:29:56 +01:00
Sarah Hoffmann	8557105c40	add debug output for unit tests This uses the debug output facility meant for pretty HTML output to give us debugging output for the unit tests.	2023-02-14 11:57:37 +01:00
Sarah Hoffmann	42c3754dcd	add tests for details result formatting and trim results Values that are None are no longer included in the output to save a bit of bandwidth.	2023-02-04 21:22:22 +01:00
Sarah Hoffmann	104722a56a	switch details cli command to new Python implementation	2023-02-04 21:22:22 +01:00
Sarah Hoffmann	1924beeb20	add lookup of postcode data	2023-02-04 21:22:22 +01:00
Sarah Hoffmann	70f6f9a711	add lookup of tiger data	2023-02-04 21:22:22 +01:00
Sarah Hoffmann	f1ceefe9a6	add lookup of address interpolations	2023-02-04 21:22:22 +01:00
Sarah Hoffmann	189f74a40d	add unit tests for lookup function	2023-02-04 21:22:22 +01:00
Sarah Hoffmann	370c9b38c0	improve scaffolding for API unit tests Use the static table definition to create the test database. Add helper function to simplify filling the tables.	2023-02-04 21:22:22 +01:00
Sarah Hoffmann	16b6484c65	add property cache for API This caches results from querying nominatim_properties.	2023-01-30 09:36:17 +01:00
Sarah Hoffmann	77bec1261e	add streaming json writer for JSON output	2023-01-25 15:05:33 +01:00
Sarah Hoffmann	8f4426fbc8	reorganize code around result formatting Code is now organized by api version. So formatting has moved to the api.v1 module. Instead of holding a separate ResultFormatter object per result format, simply move the functions to the formatter collector and hand in the requested format as a parameter. Thus reorganized, the api.v1 module can export three simple functions for result formatting which in turn makes the code that uses the formatters much simpler.	2023-01-24 17:20:51 +01:00
Sarah Hoffmann	32c1e59622	reorganize api submodule Use a directory for the submodule where the __init__ file contains the public API. This makes it easier to separate public interface from the internal implementation.	2023-01-24 13:28:04 +01:00
Sarah Hoffmann	ce9ed993c8	fix importance recalculation The signature of the compute_importance() function has changed.	2023-01-22 22:32:16 +01:00
Sarah Hoffmann	0c47558729	convert version to named tuple Also return the new NominatimVersion rather than a string in the status result.	2023-01-03 10:03:00 +01:00
Sarah Hoffmann	93b9288c30	fix error message for non-existing database	2023-01-03 10:03:00 +01:00
Sarah Hoffmann	9d31a67116	add unit tests for new Python API	2023-01-03 10:03:00 +01:00
Sarah Hoffmann	89a34e7508	adapt tests for new lua styles	2022-12-19 17:32:28 +01:00
Sarah Hoffmann	2231401483	clean up uses of cli.nominatim() They should not hand in data paths anymore.	2022-11-27 15:27:04 +01:00
Sarah Hoffmann	2abe9e6fd9	use data paths from new nominatim.paths	2022-11-27 12:15:41 +01:00
Sarah Hoffmann	fd3dec8efe	add sanitizer for TIGER tags Currently only takes over cleaning the tiger:county data. This was done by the import until now.	2022-11-23 10:37:27 +01:00
Sarah Hoffmann	5877b69d51	do not run unit test when postgis_raster is not available	2022-10-01 11:01:49 +02:00
Sarah Hoffmann	5ec2c1b712	adapt unit tests to changed function names	2022-10-01 11:01:49 +02:00
Tareq Al-Ahdal	0ab0f0ea44	Integrated OSM views into importance computation	2022-10-01 11:01:49 +02:00
Tareq Al-Ahdal	ac467c7a2d	Enhanced the implementation of OSM views GeoTIFF import functionality	2022-10-01 11:01:49 +02:00
Tareq Al-Ahdal	c85b74497b	Initial implementation of GeoTIFF import functionality	2022-10-01 11:01:49 +02:00
Sarah Hoffmann	f4d3ae6f70	consolidate indexes over geometry_sectors The indexes over geometry_sectors are mainly used for ordering the places which need indexing. That means they function effectively as a TODO list. Consolidate them so that they always only contain the places which are still to do. Also add the appropriate index for the boundary indexing phase.	2022-09-21 10:38:58 +02:00
Sarah Hoffmann	51b6d16dc6	overhaul the token analysis interface The functional split between the two functions is now that the first one creates the ID that is used in the word table and the second one creates the variants. There no longer is a requirement that the ID is the normalized version. We might later reintroduce the requirement that a normalized version be available but it doesn't necessarily need to be through the ID. The function that creates the ID now gets the full PlaceName. That way it might take into account attributes that were set by the sanitizers. Finally rename both functions to something more sane.	2022-07-29 15:14:11 +02:00
Sarah Hoffmann	c8873d34af	harmonize interface of token analysis module The configure() function now receives a Transliterator object instead of the ICU rules. This harmonizes the parameters with the create function.	2022-07-29 10:43:07 +02:00
Sarah Hoffmann	6d41046b15	add support for external sanitizer modules	2022-07-25 16:10:19 +02:00
Sarah Hoffmann	7b7203c149	add function for loading plugin modules Loads modules for configurable code like tokenizers, sanitizers, etc. Supports internal modules, external libraries and code from the project directory.	2022-07-25 16:10:10 +02:00
Kian-Meng Ang	f5e52e748f	docs: fix typos	2022-07-20 22:05:31 +08:00
Sarah Hoffmann	9963261d8d	add type annotations to special phrase importer	2022-07-18 09:54:29 +02:00
Sarah Hoffmann	62eedbb8f6	add type hints for sanitizers	2022-07-18 09:47:57 +02:00
Sarah Hoffmann	aaf2b6032e	fix uses of config.get_path() to expect None	2022-07-18 09:47:57 +02:00
Sarah Hoffmann	b1903f0fbf	Merge pull request #2761 from lonvia/repair-index-analysis Repair `admin --analyse-indexing`	2022-07-18 09:38:08 +02:00
marc tobias	c70ca7f57b	In tests for PHP 8 disable Just-in-time, it conflicts with tools that determine coverage	2022-07-09 22:03:48 +02:00
Sarah Hoffmann	4b12d52ef5	convert admin --analyse-indexing to new indexing method A proper run of indexing requires the place information from the analyzer. Add the pre-processing of place data, so the right information is handed into the update function.	2022-07-07 16:20:08 +02:00
Sarah Hoffmann	cbbcbb1fd7	move country_info into data submodule	2022-07-06 11:08:36 +02:00
Sarah Hoffmann	bce93d60bd	move PlaceInfo into data submodule This data structure is shared between indexer and tokenizer.	2022-07-06 10:54:47 +02:00
Sarah Hoffmann	69e51aebab	test: avoid column names with upper-case letters This may cause problems when the column names get quoted.	2022-07-05 09:12:55 +02:00
Sarah Hoffmann	612d34930b	handle postcodes properly on word table updates update_postcodes_from_db() needs to do the full postcode treatment in order to derive the correct word table entries.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	7b6ec4fc6c	add tests for discarding bad postcodes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	80ea13437d	move postcode matcher in a separate file	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	4885fdf0f9	add class for online centroid computation	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	18864afa8a	postcodes: introduce a default pattern for countries without postcodes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	9172696324	postcodes: add support for optional spaces	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	baee6f3de0	postcodes: strip leading country codes	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	28ab2f6048	add postcodes patterns without optional spaces	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	90d4d339db	initial postcode cleaner for simple patterns Moves postcodes that are either in countries without a postcode system or don't correspond to the local pattern for postcodes into a field for a normal address part. Makes them searchable but not as a special address. This has two consequences: they are no longer a skippable part of the address and the postcodes cannot be searched on their own.	2022-06-23 23:42:31 +02:00
Sarah Hoffmann	cbb4749996	change indexing order for interpolations Interpolations are now indexed after rank 30 objects. The housenumber nodes no longer need information from the interpolations while the interpolations can make use of precomputed postcodes.	2022-06-02 15:16:46 +02:00
Sarah Hoffmann	46689df668	custom comparison for SpecialPhrase Duplicate elimination only works when a custom hash/equal function is implemented that is based on the members.	2022-05-30 16:30:41 +02:00
Sarah Hoffmann	e828d0d3f7	move quoting hack to wiki loader The bad quotes around the type for special phrases specifically occur in the Wiki pages, so they should be removed by the loader and not in the generic SpecialPhrase object.	2022-05-30 14:40:33 +02:00
Sarah Hoffmann	cce0e5ea38	convert special phrase loaders to generators Generators simplify the code quite a bit compared to the previous Iterator approach.	2022-05-30 14:12:46 +02:00
Sarah Hoffmann	042e314589	remove the language parameter in the SPWikiLoader Languages must always be configured through config or environment. Also use monkeypatched environment in tests.	2022-05-30 10:26:20 +02:00
Sarah Hoffmann	61d813bfef	add get_str_list() for config Converts a config value written as a comma-separated list into a Python list of strings.	2022-05-29 13:53:50 +02:00
Sarah Hoffmann	adeebec32a	switch tests to ICU tokenizer as default	2022-05-10 14:54:50 +02:00
Sarah Hoffmann	ed6fda6968	Merge pull request #2702 from lonvia/move-country-names-into-includes Clean up country name settings	2022-05-10 09:21:16 +02:00
Marc Tobias	821dabb138	add git commit hash to --version output	2022-05-09 23:56:13 +02:00
Sarah Hoffmann	9d468f6da0	support arbitrary prefixes in country name list This means we can now get rid of the last special cases for names.	2022-05-09 11:55:26 +02:00
Marc Tobias	0de83c4a51	fix typos of name Nominatim	2022-05-05 01:04:47 +02:00
Marc Tobias	a79ab41782	new nominatim --version CLI argument	2022-05-04 01:33:25 +02:00
Sarah Hoffmann	4f59644cc2	add tests for new data invalidation functions	2022-04-14 14:52:13 +02:00
Sarah Hoffmann	fd4ab3f262	Merge pull request #2629 from tareqpi/country-names-yaml-configuration Move default country names into yaml configuration	2022-04-04 09:04:25 +02:00
Tareq Al-Ahdal	e9f979b67b	'read_config' is no longer a fixture add 'read_config' to test cases that need it	2022-04-01 22:52:17 +08:00
Tareq Al-Ahdal	a323b8f63a	test for loading special characters from country_settings.yaml	2022-04-01 21:58:57 +08:00
Tareq Al-Ahdal	9411c14fd2	fix reset country info before loading custom data	2022-04-01 21:55:34 +08:00
Tareq Al-Ahdal	8525e7542f	custom country config loads correctly	2022-04-01 21:46:56 +08:00
Sarah Hoffmann	de18cd1523	add test for new table_has_column function	2022-03-31 15:55:20 +02:00
Tareq Al-Ahdal	b5f311d6bc	separate unit test function into three functions	2022-03-30 22:06:59 +08:00
Tareq Al-Ahdal	9db13aac72	Added unit tests for loading country info from yaml file	2022-03-25 22:22:44 +08:00
Sarah Hoffmann	a0ed80d821	restore the tokenizer directory when missing Automatically repopulate the tokenizer/ directory with the PHP stub and the postgresql module, when the directory is missing. This allows to switch working directories and in particular run the service from a different machine than where it was installed. Users still need to make sure that .env files are set up correctly or they will shoot themselves in the foot. See #2515.	2022-03-20 11:31:42 +01:00
Sarah Hoffmann	0a9f971e44	add tests for new analyzed housenumbers	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	837d44391c	move generation of normalized token form to analyzer This gives the analyzer more flexibility in choosing the normalized form. In particular, an analyzer creating different variants can choose the variant that will be used as the canonical form.	2022-03-01 09:34:32 +01:00
Sarah Hoffmann	a6b4e8ff67	add tests for housenumber-as-name feature	2022-02-07 11:45:12 +01:00
Sarah Hoffmann	38c3ef3da0	add tests for get_string_list() Renaming test file for sanitizer config because pytest requires unique names for test files.	2022-02-07 11:22:24 +01:00
Sarah Hoffmann	610f2cc254	sanitizer: move helpers into a configuration class	2022-02-07 10:48:00 +01:00
Sarah Hoffmann	c170d323d9	add tests for cleaning housenumbers	2022-01-20 23:47:20 +01:00
Sarah Hoffmann	d09db09849	adapt ICU tests to new housenumber sanitizer Restrict tests to making sure that handing in multiple housenumbers works.	2022-01-20 16:05:49 +01:00
Sarah Hoffmann	3741afa6dc	generalize filter-kind parameter for sanitizers Now behaves the same for tag_analyzer_by_language and clean_housenumbers. Adds tests.	2022-01-20 15:42:42 +01:00
Sarah Hoffmann	560a006892	add pytest config We are using custom marks now which need to be registered to avoid warnings.	2022-01-20 15:38:02 +01:00
Sarah Hoffmann	4774e45218	clean_housenumbers: make kinds and delimiters configurable Also adds unit tests for various options.	2022-01-20 12:07:12 +01:00
Sarah Hoffmann	b453b0ea95	introduce mutation variants to generic token analyser Mutations are regular-expression-based replacements that are applied after variants have been computed. They are meant to be used for variations on character level. Add spelling variations for German umlauts.	2022-01-18 11:09:21 +01:00
Sarah Hoffmann	c3788d765e	add consistent SPDX copyright headers	2022-01-03 16:23:58 +01:00
Sarah Hoffmann	7f7d2fd5b3	skip most addr: tags with suffixes Only one addr: tag can be processed currently, so make sure it is the one without suffixes to not get odd data. addr:street is the exception because it uses a different matching mechanism.	2021-12-06 14:55:10 +01:00
Sarah Hoffmann	44cfce1ca4	revert to using full names for street name matching Using partial names turned out to not work well because there are often similarly named streets next to each other. It also prevents us from being able to take into account all addr:street:* tags. This change gets all the full term tokens for the addr:street tags from the DB. As they are used for matching only, we can assume that the term must already be there or there will be no match. This avoids creating unused full name tags.	2021-12-06 11:38:38 +01:00
Sarah Hoffmann	5a9fb6eaf7	specify text type in test SQL Older versions of postgres fail otherwise.	2021-12-03 13:56:23 +01:00
Sarah Hoffmann	54d35ddfe9	split cli tests by subcommand and extend coverage	2021-12-02 23:45:48 +01:00
Sarah Hoffmann	14a78f55cd	more unit tests for tokenizers	2021-12-02 15:46:36 +01:00
Sarah Hoffmann	7617a9316e	extend API unit tests	2021-12-01 20:48:29 +01:00
Sarah Hoffmann	a52ed366e4	add tests for migration	2021-12-01 20:27:40 +01:00
Sarah Hoffmann	7be164e2a5	more testing for refresh functions	2021-12-01 14:58:54 +01:00
Sarah Hoffmann	a24f25c0d8	more tests for exec utilities	2021-12-01 14:23:51 +01:00
Sarah Hoffmann	993b238a41	add more tests for database import	2021-12-01 11:54:58 +01:00
Sarah Hoffmann	bbbfc8201c	add tests for adding additional data Also adds checks that parameters for osm2pgsql are set as expected.	2021-12-01 11:22:46 +01:00
Sarah Hoffmann	6f03a4d6ce	add tests for flatten_config_file and other than yaml formats	2021-12-01 10:24:11 +01:00
Sarah Hoffmann	c8958a22d2	tests: add fixture for making test project directory	2021-11-30 18:01:46 +01:00
Sarah Hoffmann	37afa2180b	generalize fixtures for cli tests	2021-11-30 14:07:39 +01:00
Sarah Hoffmann	b2df8e478a	python test: move single-use fixtures to subdirectories	2021-11-30 12:03:16 +01:00
Sarah Hoffmann	50fccb52be	remove unused test files	2021-11-30 11:44:10 +01:00
Sarah Hoffmann	b90e719da5	organise python tests in subdirectories The directories follow the same structure as the modules in nominatim/.	2021-11-30 11:22:26 +01:00
Sarah Hoffmann	10e979e841	only instantiate indexer once for replication Also makes sure that indexer object exists everywhere where needed. See #2518.	2021-11-19 14:48:58 +01:00
Sarah Hoffmann	345c812e43	better error reporting when API script does not exist Check if the API script exists on the expected location before running php-cli. This way we can add a useful hint about the project directory. Fixes #2513.	2021-11-10 11:58:20 +01:00
Sarah Hoffmann	37eeccbf4c	ICU: use normalization from config in PHP The TERM_NORMALIZATION config option is no longer applicable. That was already documented but not yet implemented.	2021-10-27 11:32:44 +02:00
Sarah Hoffmann	5a1c3dbea3	fix parsing of operator in special phrases Because of unstripped input, the operators wouldn't match.	2021-10-25 19:46:30 +02:00
Sarah Hoffmann	1098ab732f	allow relative paths for flatnode file	2021-10-22 17:32:51 +02:00
Sarah Hoffmann	507fdd4f40	switch IMPORT_STYLE to use generic file search Allows relative paths wrt project directory.	2021-10-22 16:49:57 +02:00
Sarah Hoffmann	0ae8d7ac08	have ADDRESS_LEVEL_CONFIG use load_sub_configuration This means that relative paths now are looked up in the project directory.	2021-10-22 16:36:52 +02:00
Sarah Hoffmann	c77df2d1eb	replace NOMINATIM_PHRASE_CONFIG with command line option	2021-10-22 14:41:14 +02:00
Sarah Hoffmann	c1fa70639b	add new replication mode catch-up This mode gets updates until the server reports no new diffs anymore. Also adds additional indexing, when the main indexing step left a couple of objects to process. This happens only when the next update is expected to be more than 40min away.	2021-10-20 22:05:15 +02:00
Sarah Hoffmann	824562357b	adapt tests for new word count mechanism	2021-10-19 12:03:48 +02:00
Sarah Hoffmann	552fb16cb2	fix template expressions for tablespaces	2021-10-15 15:11:09 +02:00
Sarah Hoffmann	3649487f5e	use SP-GIST index for building index where available Point-in-polygon queries are much faster with a SP-GIST geometry index, so use that for the index used to check if a housenumber is inside a building. Only available with Postgis 3. There is an automatic fallback to GIST for Postgis 2.	2021-10-10 21:55:38 +02:00
Sarah Hoffmann	299934fd2a	reorganize and complete tests around generic token analysis	2021-10-06 17:03:37 +02:00
Sarah Hoffmann	b18d042832	add tests for sanitizer tagging language	2021-10-06 12:29:25 +02:00
Sarah Hoffmann	97a10ec218	apply variants by languages Adds a tagger for names by language so that the analyzer of that language is used. Thus variants are now only applied to names in the specific language and only tag name tags, no longer to reference-like tags.	2021-10-06 11:09:54 +02:00
Sarah Hoffmann	d35400a7d7	use analyser provided in the 'analyzer' property Implements per-name choice of analyzer. If a non-default analyzer is chosen, then the 'word' identifier is extended with the name of the analyzer, so that we still have unique items.	2021-10-05 14:10:32 +02:00
Sarah Hoffmann	9ba2019470	precompute replacements while loading configuration	2021-10-05 10:20:08 +02:00
Sarah Hoffmann	7cfcbacfc7	make token analyzers configurable modules Adds a mandatory section 'analyzer' to the token-analysis entries which defines which analyser to use. Currently there is exactly one, generic, which implements the former ICUNameProcessor.	2021-10-04 17:37:34 +02:00
Sarah Hoffmann	52847b61a3	extend ICU config to accommodate multiple analysers Adds parsing of multiple variant lists from the configuration. Every entry except one must have a unique 'id' parameter to distinguish the entries. The entry without id is considered the default. Currently only the list without an id is used for analysis.	2021-10-04 16:40:28 +02:00
Sarah Hoffmann	6b348d43c6	replace test variable for PG env tests 'tty' was removed in PG14 and causes an error.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	732cd27d2e	add unit tests for new sanitizer functions	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	8171fe4571	introduce sanitizer step before token analysis Sanitizer functions allow to transform name and address tags before they are handed to the tokenizer. These transformations are visible only for the tokenizer and thus only have an influence on the search terms and address match terms for a place. Currently two sanitizers are implemented which are responsible for splitting names with multiple values and removing bracket additions. Both were previously hard-coded in the tokenizer.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	16daa57e47	unify ICUNameProcessorRules and ICURuleLoader There is no need for the additional layer of indirection that the ICUNameProcessorRules class adds. The ICURuleLoader can fill the database properties directly.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	be65c8303f	export more data for the tokenizer name preparation Adds class, type, country and rank to the exported information and removes the rather odd hack for countries. Whether a place represents a country boundary can now be computed by the tokenizer.	2021-09-29 11:54:14 +02:00
Sarah Hoffmann	231250f2eb	add wrapper class for place data passed to tokenizer This is mostly for convenience and documentation purposes.	2021-09-29 11:54:07 +02:00
Sarah Hoffmann	09c9fad6c3	adapt tests to new ICU address token handling	2021-09-27 17:36:23 +02:00
Sarah Hoffmann	8e1d4818ac	use yaml config loader for country info	2021-09-04 00:22:55 +02:00
Sarah Hoffmann	28c98584c1	add tests for generic YAML config reader	2021-09-03 22:31:30 +02:00
Sarah Hoffmann	1c42780bb5	introduce generic YAML config loader Adds a function to the Configuration class to load a YAML file. This means that searching for the file is generalised and works the same now for all configuration files. Changes the search logic, so that it is always possible to have a custom version of the configuration file in the project directory. Move ICU tokenizer to use new load function.	2021-09-03 18:20:07 +02:00
Sarah Hoffmann	79da96b369	read partition and languages from config file	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	78fcabade8	move country name generation to country_info module	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	284645f505	move generation of country tables in own module	2021-09-02 14:41:11 +02:00
Sarah Hoffmann	28ee3d0949	move linking of places to the preparation stage Linked places may bring in extra names. These names need to be processed by the tokenizer. That means that the linking needs to be done before the data is handed to the tokenizer. Move finding the linked place into the preparation stage and update the name fields. Everything else is still done in the indexing stage.	2021-08-20 22:44:17 +02:00
Sarah Hoffmann	118858a55e	rename legacy_icu tokenizer to icu tokenizer The new icu tokenizer is now no longer compatible with the old legacy tokenizer in terms of data structures. Therefore there is also no longer a need to refer to the legacy tokenizer in the name.	2021-08-17 23:11:47 +02:00
Sarah Hoffmann	87dedde5d6	allow multiple files for the import command The files are forwarded to osm2pgsql which is now able to merge them correctly.	2021-08-14 21:42:21 +02:00
Sarah Hoffmann	1db098c05d	reinstate word column in icu word table Postgresql is very bad at creating statistics for jsonb columns. The result is that the query planner tends to use JIT for queries with a where over 'info' even when there is an index.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	e42878eeda	adapt unit test for new word table Requires a second wrapper class for the word table with the new layout. This class is interface-compatible, so that later when the ICU tokenizer becomes the default, all tests that depend on behaviour of the default tokenizer can be switched to the other wrapper.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	eb6814d74e	convert word info column to json before copying	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	0c023fb4d2	adapt cli tests to Python port for add-data	2021-07-26 10:41:37 +02:00
Sarah Hoffmann	878835e4bd	move add-data subcommand into a separate file	2021-07-25 18:14:12 +02:00
Sarah Hoffmann	62d5984b1b	limit the number of variants that can be produced	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	e85f7e7aa9	fix subsequent replacements Two replacement words directly following each other did not work as expected because each expects a space at the beginning/end while there was only one space available. Also forbid composing a word after a space was added in the end by a previous replacement.	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	b9fbfeff67	only consider partials in multi-words for initial count This ensures that it is less likely that we exclude meaningful words like 'hauptstrasse' just because they are frequent.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	62828fc5c1	switch to a more flexible variant description format The new format combines compound splitting and abbreviation. It also allows to restrict rules to additional conditions (like language or region). This latter ability is not used yet.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a6aa6360e0	use yaml tag syntax to mark include files	2021-07-04 10:28:20 +02:00

382 Commits