Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2024-11-27 00:49:55 +03:00

Author	SHA1	Message	Date
Sarah Hoffmann	0fb8eade13	remove country restriction from tokenizer Restricting tokens due to the search context is better done in the generic search part instead of repeating the same test in every tokenizer implementation.	2021-08-16 11:41:54 +02:00
Sarah Hoffmann	78d11fe628	document tokenizer SQL interface	2021-08-16 11:41:54 +02:00
Sarah Hoffmann	90b40fc3e6	define formal public Python interface for tokenizer This introduces an abstract class for the Tokenizer/Analyzer for documentation purposes.	2021-08-16 11:41:54 +02:00
Sarah Hoffmann	e25e268e2e	docs: querying and tokenizers	2021-08-16 08:59:44 +02:00
Sarah Hoffmann	68bff31cc9	docs: add developer doc page for Tokenizer	2021-08-16 08:58:56 +02:00
Sarah Hoffmann	31d9545702	Merge pull request #2424 from lonvia/multi-country-import Update instructions for importing multiple regions	2021-08-16 08:48:28 +02:00
Sarah Hoffmann	e449071a35	Merge pull request #2423 from hummeltech/patch-1 Fix old paths for `phpcs` when using `make test`	2021-08-15 22:00:50 +02:00
Sarah Hoffmann	23e3724abb	ignore words without id for status	2021-08-15 21:59:36 +02:00
Sarah Hoffmann	75a5c7013f	split up large setup function	2021-08-15 12:24:13 +02:00
Sarah Hoffmann	56d24085f9	port multi-region update scripts to nominatim tool Also updates the documentation. For the simple case of just importing multiple regions, provide simplified instructions that use the new multi-file import feature. Fixes #2365.	2021-08-14 23:55:48 +02:00
Sarah Hoffmann	95b82af42a	update osm2pgsql to 1.5.1	2021-08-14 22:46:35 +02:00
Sarah Hoffmann	87dedde5d6	allow multiple files for the import command The files are forwarded to osm2pgsql which is now able to merge them correctly.	2021-08-14 21:42:21 +02:00
David Hummel	8b6489c60e	Fix old paths for `phpcs` when using `make test` These paths no longer exist since `db3ced17bb`, they are now all located under `lib-php`	2021-08-12 13:34:18 -07:00
Sarah Hoffmann	bf4f05fff3	Merge pull request #2413 from osm-search/helm-chart Installation docs - link to Kubernetes install project	2021-08-08 11:09:36 +02:00
mtmail	b0aaa25f0d	Installation docs - link to Kubernetes install project As reported by @robjuz in https://github.com/osm-search/Nominatim/discussions/2412	2021-08-03 12:02:35 +02:00
Sarah Hoffmann	c3ddc7579a	Merge pull request #2408 from lonvia/icu-change-word-table-layout Change table layout of word table for ICU tokenizer	2021-07-28 14:28:49 +02:00
Sarah Hoffmann	fdff579188	php: force use of global Exception class	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	d48793c22c	fix Python linting errors	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	001b2aa9f9	fix linting issues in PHP	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	1db098c05d	reinstate word column in icu word table PostgreSQL is very bad at creating statistics for jsonb columns. The result is that the query planner tends to use JIT for queries with a where over 'info' even when there is an index.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	324b1b5575	bdd tests: do not query word table directly The BDD tests cannot make assumptions about the structure of the word table anymore because it depends on the tokenizer. Use more abstract descriptions instead that ask for specific kinds of tokens.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	e42878eeda	adapt unit test for new word table Requires a second wrapper class for the word table with the new layout. This class is interface-compatible, so that later when the ICU tokenizer becomes the default, all tests that depend on behaviour of the default tokenizer can be switched to the other wrapper.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	eb6814d74e	convert word info column to json before copying	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	6ad35aca4a	adapt special terms lookup to new word table	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	70f154be8b	switch word tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	4342b28882	switch special phrases to new word table format	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5394b1fa1b	switch postcode tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5ab0a63fd6	switch housenumber tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	1618aba5f2	switch country name tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	8377528952	new word table layout for icu tokenizer The table now directly reflects the different token types. Extra information is saved in a json structure that may be dynamically extended in the future without affecting the table layout.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	34dcf02dee	fix typos in tokenizer docs	2021-07-28 11:28:49 +02:00
Sarah Hoffmann	5d7d7f15d9	Merge pull request #2401 from lonvia/port-add-data-to-python Port add-data functions from PHP to Python	2021-07-26 12:38:56 +02:00
Sarah Hoffmann	0c023fb4d2	adapt cli tests to Python port for add-data	2021-07-26 10:41:37 +02:00
Sarah Hoffmann	1bd068d42d	remove unused update script	2021-07-26 10:41:37 +02:00
Sarah Hoffmann	e42349c963	replace add-data function with native Python code	2021-07-26 10:41:37 +02:00
Sarah Hoffmann	878835e4bd	move add-data subcommand into a separate file	2021-07-25 18:14:12 +02:00
Sarah Hoffmann	8096a1d67f	fix parameters for TokenWord creation	2021-07-20 10:21:40 +02:00
Sarah Hoffmann	e16c5d5f70	Merge pull request #2397 from lonvia/increase-minimum-required-versions Increase minimum required PostgreSQL version to 9.5	2021-07-19 14:28:02 +02:00
Sarah Hoffmann	2c8242c8df	remove special code for pre9.5 postgresql 9.5 is now the minimum requirement.	2021-07-19 10:24:57 +02:00
Sarah Hoffmann	e7d6f89aca	increase minimum version for PostgreSQL to 9.5 This is the minimum version we can test with the CI. With 9.5 there is also complete support for jsonb available.	2021-07-19 10:21:19 +02:00
Sarah Hoffmann	379f5db516	require Python 3.6 also in CMakeFile This had been forgotten when increasing the minimum Python version.	2021-07-19 10:14:14 +02:00
Sarah Hoffmann	ee32315378	Merge pull request #2396 from lonvia/partial-word-token Reorganise code that builds the SearchDescription	2021-07-19 09:42:37 +02:00
Sarah Hoffmann	cca912af4e	make all Token members private	2021-07-18 22:54:55 +02:00
Sarah Hoffmann	86ea077092	merge marking rare name with adding name token Only name tokens can be rare, so this should be the same function.	2021-07-18 16:52:37 +02:00
Sarah Hoffmann	5d6aabc457	add documentation for public interface of SearchDescription	2021-07-18 16:10:42 +02:00
Sarah Hoffmann	b14ce959d9	factor out check if a token fits current search Saves allocating an empty array.	2021-07-17 22:01:35 +02:00
Sarah Hoffmann	a48ebd9b47	move SearchDescription building into tokens Moving the logic for extending the SearchDescription into the token classes splits up the code and makes it more readable. More importantly: it allows tokenizer to define custom token classes in the future.	2021-07-17 20:24:33 +02:00
Sarah Hoffmann	3cd85eaaf1	remove Token from explicit input for SearchDescription extension The token string is only required by the PartialToken type, so it can simply save the token string internally. No need to pass it to every type. Also moves the check for multi-word partials to the token loader code in the tokenizer. Multi-word partials can only happen with the legacy tokenizer and when the database was loaded with an older version of Nominatim. No need to keep the check for everybody.	2021-07-17 18:18:31 +02:00
Sarah Hoffmann	ec3f6c9c42	factor out query position Moves token and phrase position and phrase type into a separate class that is handed in when assembling the search description. This drastically reduces the number of parameters for the function to extend the search descriptions and gives us more flexibility in the future for more complex positional analysis.	2021-07-15 14:12:59 +02:00
Sarah Hoffmann	143ff14466	remove special status of partial tokens Full-word tokens are no longer marked by a space at the beginning of the token. Use the new Partial token category instead. This removes a couple of special cases we don't really need. The word table still has the space for compatibility reasons, so the tokenizer code needs to get rid of it when loading the tokens.	2021-07-14 22:17:17 +02:00

3294 Commits