Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-12-18 10:32:08 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	2c97af8021	CI: use packaged source also for test runs	2021-08-24 10:10:01 +02:00
Sarah Hoffmann	832f75a55e	CI: unify jobs for different vagrant scripts	2021-08-24 10:10:01 +02:00
Sarah Hoffmann	4e77969545	add workflow for centos 8	2021-08-24 10:10:01 +02:00
Sarah Hoffmann	6ebbbfee61	CI: use vagrant scripts for import tests. Use vanilla docker images of Ubuntu and leave the setup to the vagrant scripts. Then do the usual import tests. Also fixes a couple of issues found with the scripts.	2021-08-24 10:10:01 +02:00
Sarah Hoffmann	0fabeefc3e	Merge pull request #2432 from Mastercuber/patch-1. Added postcode	2021-08-22 09:32:31 +02:00
Mastercuber	c70d72f06b	Added postcode. Added postcode to the list of addressdetails.	2021-08-22 02:52:41 +02:00
Sarah Hoffmann	cc141bf1a5	Add link to fixthemap to issue template	2021-08-21 20:36:16 +02:00
Sarah Hoffmann	199532c802	Merge pull request #2429 from lonvia/place-name-to-admin-boundary. Indexing: move linking of places to the preparation stage	2021-08-21 10:21:39 +02:00
Sarah Hoffmann	28ee3d0949	move linking of places to the preparation stage. Linked places may bring in extra names. These names need to be processed by the tokenizer. That means that the linking needs to be done before the data is handed to the tokenizer. Move finding the linked place into the preparation stage and update the name fields. Everything else is still done in the indexing stage.	2021-08-20 22:44:17 +02:00
Sarah Hoffmann	925195725d	Merge pull request #2428 from lonvia/rename-icu-tokenizer. Rename legacy_icu tokenizer to icu tokenizer	2021-08-18 15:02:19 +02:00
Sarah Hoffmann	f6d22df76e	adapt CI workflow to new tokenizer name	2021-08-18 09:08:20 +02:00
Sarah Hoffmann	118858a55e	rename legacy_icu tokenizer to icu tokenizer. The new icu tokenizer is no longer compatible with the old legacy tokenizer in terms of data structures. Therefore there is also no longer a need to refer to the legacy tokenizer in the name.	2021-08-17 23:11:47 +02:00
Sarah Hoffmann	656c1291b1	Merge pull request #2427 from lonvia/remove-us-states-special-casing. Move US state hack into legacy tokenizer	2021-08-17 21:55:32 +02:00
Sarah Hoffmann	f00b8dd1c3	move special hack for US states to legacy tokenizer. The hack for IL, AL and LA is only needed because the legacy tokenizer removes these abbreviations as stop words. There is no need to keep the hack for future tokenizers, so move it to the token extraction function.	2021-08-17 14:28:55 +02:00
Sarah Hoffmann	5f2b9e317a	add tests for US state hacks. IL, AL and LA are replaced with the US state name in Geocode because the old tokenizer would otherwise simply remove the abbreviations.	2021-08-17 10:49:07 +02:00
Sarah Hoffmann	4ae5ba7fc4	Merge pull request #2425 from lonvia/tokenizer-documentation. Introduce official Tokenizer API	2021-08-17 09:38:03 +02:00
Sarah Hoffmann	3656eed9ad	add mkdocstrings requirement for building docs. mkdocstrings also needs access to the Python sources, so set PYTHONPATH accordingly. This makes running mkdocs directly a bit awkward, so add a `make serve-doc` target.	2021-08-16 11:51:49 +02:00
Sarah Hoffmann	2e82a6ce03	docs: extend explanation of query phrase	2021-08-16 11:51:49 +02:00
Sarah Hoffmann	c4b8a3b768	add documentation for PHP part of tokenizer	2021-08-16 11:51:49 +02:00
Sarah Hoffmann	1147b83b22	php: make word list a first-class object. This separates the logic of creating word sets from the Phrase class. A tokenizer may now derive the word sets any way it likes. The SimpleWordList class provides a standard implementation for splitting phrases on spaces.	2021-08-16 11:51:49 +02:00
Sarah Hoffmann	0fb8eade13	remove country restriction from tokenizer. Restricting tokens due to the search context is better done in the generic search part instead of repeating the same test in every tokenizer implementation.	2021-08-16 11:41:54 +02:00
Sarah Hoffmann	78d11fe628	document tokenizer SQL interface	2021-08-16 11:41:54 +02:00
Sarah Hoffmann	90b40fc3e6	define formal public Python interface for tokenizer. This introduces an abstract class for the Tokenizer/Analyzer for documentation purposes.	2021-08-16 11:41:54 +02:00
Sarah Hoffmann	e25e268e2e	docs: querying and tokenizers	2021-08-16 08:59:44 +02:00
Sarah Hoffmann	68bff31cc9	docs: add developer doc page for Tokenizer	2021-08-16 08:58:56 +02:00
Sarah Hoffmann	31d9545702	Merge pull request #2424 from lonvia/multi-country-import. Update instructions for importing multiple regions	2021-08-16 08:48:28 +02:00
Sarah Hoffmann	e449071a35	Merge pull request #2423 from hummeltech/patch-1. Fix old paths for `phpcs` when using `make test`	2021-08-15 22:00:50 +02:00
Sarah Hoffmann	23e3724abb	ignore words without id for status	2021-08-15 21:59:36 +02:00
Sarah Hoffmann	75a5c7013f	split up large setup function	2021-08-15 12:24:13 +02:00
Sarah Hoffmann	56d24085f9	port multi-region update scripts to nominatim tool. Also updates the documentation. For the simple case of just importing multiple regions, provide simplified instructions that use the new multi-file import feature. Fixes #2365.	2021-08-14 23:55:48 +02:00
Sarah Hoffmann	95b82af42a	update osm2pgsql to 1.5.1	2021-08-14 22:46:35 +02:00
Sarah Hoffmann	87dedde5d6	allow multiple files for the import command. The files are forwarded to osm2pgsql, which is now able to merge them correctly.	2021-08-14 21:42:21 +02:00
David Hummel	8b6489c60e	Fix old paths for `phpcs` when using `make test`. These paths have not existed since `db3ced17bb`; they are now all located under `lib-php`.	2021-08-12 13:34:18 -07:00
Sarah Hoffmann	bf4f05fff3	Merge pull request #2413 from osm-search/helm-chart Installation docs - link to Kubernetes install project	2021-08-08 11:09:36 +02:00
mtmail	b0aaa25f0d	Installation docs - link to Kubernetes install project. As reported by @robjuz in https://github.com/osm-search/Nominatim/discussions/2412	2021-08-03 12:02:35 +02:00
Sarah Hoffmann	c3ddc7579a	Merge pull request #2408 from lonvia/icu-change-word-table-layout. Change table layout of word table for ICU tokenizer	2021-07-28 14:28:49 +02:00
Sarah Hoffmann	fdff579188	php: force use of global Exception class	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	d48793c22c	fix Python linting errors	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	001b2aa9f9	fix linting issues in PHP	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	1db098c05d	reinstate word column in icu word table. PostgreSQL is very bad at creating statistics for jsonb columns. As a result, the query planner tends to use JIT for queries with a WHERE clause over 'info' even when there is an index.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	324b1b5575	bdd tests: do not query word table directly. The BDD tests cannot make assumptions about the structure of the word table anymore because it depends on the tokenizer. Use more abstract descriptions instead that ask for specific kinds of tokens.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	e42878eeda	adapt unit test for new word table. Requires a second wrapper class for the word table with the new layout. This class is interface-compatible, so that later, when the ICU tokenizer becomes the default, all tests that depend on the behaviour of the default tokenizer can be switched to the other wrapper.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	eb6814d74e	convert word info column to json before copying	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	6ad35aca4a	adapt special terms lookup to new word table	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	70f154be8b	switch word tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	4342b28882	switch special phrases to new word table format	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5394b1fa1b	switch postcode tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5ab0a63fd6	switch housenumber tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	1618aba5f2	switch country name tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	8377528952	new word table layout for icu tokenizer. The table now directly reflects the different token types. Extra information is saved in a json structure that may be dynamically extended in the future without affecting the table layout.	2021-07-28 11:31:47 +02:00

3614 Commits