Adjusts levels for boundaries according to the list on
https://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative
* no admin_level 5, so drop that from addresses
* admin_level 6 has the province
* admin_level 7 has the county when it exists
Also reranks place=province so that it matches up with
admin_level 6 and introduces place=civil_parish which
is used as a place node for some admin_level=9 boundaries
in Galicia.
The new icu tokenizer is now no longer compatible with the old
legacy tokenizer in terms of data structures. Therefore there
is also no longer a need to refer to the legacy tokenizer in the
name.
name:etymology contains a description of the name origin and is
thus more informative than search-worthy.
name:signed basically indicates that the feature does not have
a name.
Two replacement words directly following each other did not
work as expected because each expects a space at the
beginning/end while there was only one space available.
Also forbit composing a word after a space was added in the
end by a previous replacement.
Make sure all special symbols are removed during normalization already.
Those won't be interpreted in any way because they are unlikely to be
searched for.
The new format combines compound splitting and abbreviation.
It also allows to restrict rules to additional conditions
(like language or region). This latter ability is not used
yet.
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.
New dependency for python library datrie.
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
to have a separate rule file that is given to the ICU library
as is.
The ICU library only offers transliterations for a limited set of
script. Add transliterations for missing scripts from the PostgreSQL
module. These means that the same selection of scripts is supported
as with the old module.
This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.
The legacy tokenizer implements Nominatim's traditional algorithms.
- the command is now --import-special-phrases
- the output is not an sql file anymore, data are directly imported to the database.
- the little part on the documentation (section data import) has been modified.
As we can't refer to the project root dir in the module path, the
module path may now also be a relative directory which is then
taken as being relative to the project root path.
Moves the checkModulePresence() function into the Setup class, so
that it can work on the computed absolute module path.
CONST_BasePath is split into separate configuration variables
for binaries, libraries and data. These variables as well as
the installation path are now set in the executable directly and
no longer configurable via project settings.
This is the first step towards an installable software. The
executables should know per installation where to find their
necessary data to execute. Project configuration needs to be
restricted to settings that really concern the specific Nominatim
installation.
There are two places where the website URL is still used:
for icons, replace the URL with a link to the icon repository
of the UI repo. The more URL now builds the link from the
server info.
Rank 25 is now available for places that should appear in addresses
but not when a street is present. Use this for som block-like
place types. Also document the particularity of rank 25.
subdevisions and allotments are now at the same level as landuse
which they are frequently used together with.
Removes admin level 7, which should not exist and promotes
admin level 8 to municipality level.
place=municipality is only used for boroughs of St. Petersburg,
so demote to level 18.
Fixes#926.
These are used to mark large paved areas. Sometimes they exists
together with named regular streets. In such cases the unnamed
area may overshadow the actual street when computing the address
parent. As unnamed highways are not very useful anyway, we
simply remove them from the database.
Postal boundaries usually just have the postcode tag set and are
therefore officially 'nameless'. We want to have them as
boundary=postal_code anyways in order to distiguish them from postcode
points inherited from addr: tags.
Boundaries and places now always get a rank < 26 to make sure that
they do not parent to a street. Skip boundary=place completely
because they will be covered throught the secondary place tag.
Squares are now addressable (on address level 25) and thus can
be attached to a house number via addr:place. Needed to increase
the rank range for matching up addr:place to 25.
Traffic signs rarely have name and are therefore mostly not
searchable. Remove them completely. Allow street lamps only when
they have a name. Removes about 2M object from a planet instance.
The parameter got lost when switching to website settings.
Given that the use of a fixed parameter is limited,
debugging output can now only be set via the URL parameter.
So far we've used a buffer around a place node to define its
potential address reach. This had two problems: the buffer was
so large that addresses often contain false positives and the
buffer is really distorted when getting closer to the poles.
Change the buffer here to draw a bounndig box at a certain
distance in meter. This means that we always use the same
box everywhere on the planet and can make the extent much
smaller. Using a box has the advantage that it is much faster
to figure out if a point is within the box.
A lot of streams in OSM are of minor importance, they certainly
should show up lower in the list of results than villages. Those
rivers/streams that are well known have a wikipedia page and get
a higher importance from that.
The disadvantage with downgrading is that the address gets even
more useless but that's something that needs to be solved outside
the rank search.
Gets rid of the hard-coded expection for place nodes and sets
the address rank generally via the address level config instead.
That means only administrative boundaries are now used at that
level in addresses.
Remove unnamed landuses and postcode points from
importing. The latter will cause all objects with
address tags to be imported after all. Not expected
in the admin import style.
Instead add it as a configurable path with the one from
the source directory as the default.
Also reinstates that settings/defaults.php is installed as
settings/settings.php.
The initial search and address rank is saved in a table
that is set up from a json configuration file. Ranks may
be assigned on a country level according to class and
type of the object. Special handling that depends on the
geometry or OSM type is still hard-coded in placex insert.
The new default config file mimicks the current assignment
as close as possible. A couple of exceptions have been
removed, most notably the exception for Irish townlands.
Don't do anything if a downloaded diff is empty after all
(may be happening when an empty diff was published upstream).
Correctly compute the waiting interval before checking for new
data. As the interval is now computed based on the date of the
newest object in the database, the configured intervals need
to be adjusted slightly to take into account the time it takes
for the server to publish a diff.
Compare the normalized terms imported with the special
terms script with the normalized version of the query string.
Disregard them if they cannot be found. This avoids a significant
number of mismatches due to transliteration issues.
The match will only be done when a normalized word has been set
making this change backwards compatible with older databases.
Pyosmium comes with convenient functions for finding the
right state and does not require external files for
rembering the state. Updates can now conveniently
set up by simply running ./utils/update.php --init-updates
and state is kept directly in the import_status table.
This change requires an update in the database schema.
Run the following to update:
ALTER TABLE import_status ADD COLUMN sequence_id integer;
ALTER TABLE import_status ADD COLUMN indexed boolean;
ALTER TABLE import_osmosis_log ADD COLUMN batchseq integer;
Search results can become odd without the country search
terms, so make their inclusion a mandatory part of the
setup.
Also adds a new configuration variable to restrict the
languages taken into account by Nominatim.
Utils scripts must be run from the build directory to make sure
we get the right paths. Rename the settings file in source and
replace the original one with an error, so that scripts
fail with an understandable error message when run from the
source directory.