This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.
Adds a new dependency on the Python library datrie.
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
a separate rule file that is passed to the ICU library
as is.
The ICU library only offers transliterations for a limited set of
scripts. Add transliterations for missing scripts from the PostgreSQL
module. This means that the same selection of scripts is supported
as with the old module.
This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.
The legacy tokenizer implements Nominatim's traditional algorithms.
- the command is now --import-special-phrases
- the output is no longer an SQL file; data is imported directly into the database.
- the corresponding part of the documentation (section data import) has been updated.
As we can't refer to the project root dir in the module path, the
module path may now also be a relative directory which is then
taken as being relative to the project root path.
Moves the checkModulePresence() function into the Setup class, so
that it can work on the computed absolute module path.
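The path handling described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `resolve_module_path` and the example directories are hypothetical.

```python
from pathlib import Path


def resolve_module_path(project_dir: str, module_path: str) -> Path:
    """Return the module path as an absolute path.

    Absolute paths are returned unchanged. Relative paths are taken
    as being relative to the project root directory.
    (Hypothetical helper, for illustration only.)
    """
    path = Path(module_path)
    if path.is_absolute():
        return path
    return Path(project_dir) / path


# A relative path is anchored at the project root.
print(resolve_module_path("/srv/project", "module"))
# An absolute path is used as given.
print(resolve_module_path("/srv/project", "/usr/lib/nominatim/module"))
```

A presence check such as checkModulePresence() can then operate on the returned absolute path regardless of how the setting was written.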
CONST_BasePath is split into separate configuration variables
for binaries, libraries and data. These variables as well as
the installation path are now set in the executable directly and
no longer configurable via project settings.
This is the first step towards an installable software. The
executables should know per installation where to find their
necessary data to execute. Project configuration needs to be
restricted to settings that really concern the specific Nominatim
installation.
There are two places where the website URL is still used:
for icons, replace the URL with a link to the icon repository
of the UI repo. The more URL now builds the link from the
server info.
Rank 25 is now available for places that should appear in addresses
but not when a street is present. Use this for some block-like
place types. Also document the particularity of rank 25.
Subdivisions and allotments are now at the same level as landuse
which they are frequently used together with.
Removes admin level 7, which should not exist and promotes
admin level 8 to municipality level.
place=municipality is only used for boroughs of St. Petersburg,
so demote to level 18.
Fixes #926.
These are used to mark large paved areas. Sometimes they exist
together with named regular streets. In such cases the unnamed
area may overshadow the actual street when computing the address
parent. As unnamed highways are not very useful anyway, we
simply remove them from the database.
Postal boundaries usually just have the postcode tag set and are
therefore officially 'nameless'. We want to have them as
boundary=postal_code anyway in order to distinguish them from postcode
points inherited from addr: tags.
Boundaries and places now always get a rank < 26 to make sure that
they do not parent to a street. Skip boundary=place completely
because they will be covered through the secondary place tag.
Squares are now addressable (on address level 25) and thus can
be attached to a house number via addr:place. Needed to increase
the rank range for matching up addr:place to 25.
Traffic signs rarely have a name and are therefore mostly not
searchable. Remove them completely. Allow street lamps only when
they have a name. Removes about 2M objects from a planet instance.
The parameter got lost when switching to website settings.
Given that the use of a fixed parameter is limited,
debugging output can now only be set via the URL parameter.
So far we've used a buffer around a place node to define its
potential address reach. This had two problems: the buffer was
so large that addresses often contain false positives and the
buffer is really distorted when getting closer to the poles.
Change the buffer here to draw a bounding box at a certain
distance in meters. This means that we always use the same
box everywhere on the planet and can make the extent much
smaller. Using a box has the advantage that it is much faster
to figure out if a point is within the box.
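A rough sketch of the idea, assuming a simple spherical approximation for converting meters to degrees. The constant, the function names and the lack of antimeridian handling are assumptions of this example and not taken from the actual implementation.

```python
import math

# Approximate length of one degree of latitude in meters (assumption).
METERS_PER_DEGREE_LAT = 111_320.0


def bounding_box(lat, lon, distance_m):
    """Return (min_lon, min_lat, max_lon, max_lat) around a point.

    The longitude extent is widened by 1/cos(lat) so that the box
    covers roughly the same ground distance at every latitude.
    """
    dlat = distance_m / METERS_PER_DEGREE_LAT
    # Guard against division by zero directly at the poles.
    cos_lat = max(math.cos(math.radians(lat)), 1e-9)
    dlon = distance_m / (METERS_PER_DEGREE_LAT * cos_lat)
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def contains(box, lat, lon):
    """Check whether a point lies in the box: four comparisons only."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


box = bounding_box(60.0, 10.0, 1113.2)
print(contains(box, 60.005, 10.01))
```

The containment test needs only comparisons, which is where the speed advantage over a distance computation against a circular buffer comes from.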
A lot of streams in OSM are of minor importance; they certainly
should show up lower in the list of results than villages. Those
rivers/streams that are well known have a wikipedia page and get
a higher importance from that.
The disadvantage with downgrading is that the address gets even
more useless but that's something that needs to be solved outside
the rank search.
Gets rid of the hard-coded exception for place nodes and sets
the address rank generally via the address level config instead.
That means only administrative boundaries are now used at that
level in addresses.
Remove unnamed landuses and postcode points from
importing. The latter will cause all objects with
address tags to be imported after all. Not expected
in the admin import style.
Instead add it as a configurable path with the one from
the source directory as the default.
Also reinstates that settings/defaults.php is installed as
settings/settings.php.
The initial search and address rank is saved in a table
that is set up from a json configuration file. Ranks may
be assigned on a country level according to class and
type of the object. Special handling that depends on the
geometry or OSM type is still hard-coded in placex insert.
The new default config file mimics the current assignment
as closely as possible. A couple of exceptions have been
removed, most notably the exception for Irish townlands.
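To illustrate how a rank table keyed on country, class and type could be built from a JSON file, here is a small sketch. The JSON layout, the rank values, the country code "xx" and the function names are all invented for this example; the real configuration file may look different.

```python
import json

# Hypothetical configuration: one global block plus a country override.
CONFIG = json.loads("""
[
  {"tags": {"place": {"city": 16, "village": [19, 16]}}},
  {"countries": ["xx"], "tags": {"place": {"village": 20}}}
]
""")


def build_table(config):
    """Flatten the config into {(country, class, type): (search, address)}."""
    table = {}
    for entry in config:
        # Entries without a country list apply everywhere (key None).
        countries = entry.get("countries", [None])
        for cls, types in entry["tags"].items():
            for typ, ranks in types.items():
                # A single number sets both ranks, a pair sets them separately.
                if isinstance(ranks, list):
                    search, address = ranks
                else:
                    search = address = ranks
                for country in countries:
                    table[(country, cls, typ)] = (search, address)
    return table


def lookup(table, country, cls, typ, default=(30, 30)):
    """Prefer a country-specific entry, then the global one, then a default."""
    if (country, cls, typ) in table:
        return table[(country, cls, typ)]
    return table.get((None, cls, typ), default)


TABLE = build_table(CONFIG)
print(lookup(TABLE, "xx", "place", "village"))
```

Special handling that depends on geometry or OSM type is deliberately left out of the sketch, since the text states that it remains hard-coded.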
Don't do anything if a downloaded diff is empty after all
(this may happen when an empty diff was published upstream).
Correctly compute the waiting interval before checking for new
data. As the interval is now computed based on the date of the
newest object in the database, the configured intervals need
to be adjusted slightly to take into account the time it takes
for the server to publish a diff.
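The interval computation might look roughly like the sketch below. The function and parameter names are hypothetical; the only part taken from the text is that the wait is derived from the date of the newest object in the database rather than from the time of the last check.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_next_check(newest_object_date, update_interval, now):
    """Return how long to wait before looking for new data.

    newest_object_date: timestamp of the newest object in the database
    update_interval:    configured interval in seconds; it should include
                        the time the server needs to publish a diff
    now:                current time (passed in to keep the function testable)
    """
    next_check = newest_object_date + timedelta(seconds=update_interval)
    return max(0.0, (next_check - now).total_seconds())


newest = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
now = datetime(2020, 1, 1, 12, 0, 40, tzinfo=timezone.utc)
print(seconds_until_next_check(newest, 60, now))
```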
Compare the normalized terms imported with the special
terms script with the normalized version of the query string.
Disregard them if they cannot be found. This avoids a significant
number of mismatches due to transliteration issues.
The match will only be done when a normalized word has been set
making this change backwards compatible with older databases.
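A sketch of the matching rule, under the assumption that each special term carries an optional normalized word. The `normalize` function here is a simple stand-in (lowercasing and whitespace folding), not the real normalization, and the dictionary keys are invented.

```python
def normalize(text):
    """Stand-in normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def usable_special_terms(query, terms):
    """Keep only special terms whose normalized word occurs in the query.

    Terms without a normalized word (as in older databases) are kept
    unconditionally, which preserves the previous behaviour.
    """
    padded_query = f" {normalize(query)} "
    kept = []
    for term in terms:
        word = term.get("word")
        # Padding with spaces restricts matches to whole words.
        if word is None or f" {word} " in padded_query:
            kept.append(term)
    return kept


terms = [
    {"token": "hotel", "word": "hotel"},
    {"token": "otel", "word": "ötel"},   # only matches via transliteration
    {"token": "pub", "word": None},      # older database, no normalized word
]
print([t["token"] for t in usable_special_terms("Hotel  Berlin", terms)])
```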
Pyosmium comes with convenient functions for finding the
right state and does not require external files for
remembering the state. Updates can now conveniently be
set up by simply running ./utils/update.php --init-updates
and state is kept directly in the import_status table.
This change requires an update in the database schema.
Run the following to update:
ALTER TABLE import_status ADD COLUMN sequence_id integer;
ALTER TABLE import_status ADD COLUMN indexed boolean;
ALTER TABLE import_osmosis_log ADD COLUMN batchseq integer;
Search results can become odd without the country search
terms, so make their inclusion a mandatory part of the
setup.
Also adds a new configuration variable to restrict the
languages taken into account by Nominatim.
Utils scripts must be run from the build directory to make sure
we get the right paths. Rename the settings file in source and
replace the original one with an error, so that scripts
fail with an understandable error message when run from the
source directory.
Introduces two new settings CONST_Use_US_Tiger_Data and
CONST_Use_Aux_Location_data, which are disabled by default.
When false the corresponding tables are ignored in queries
and updates.
Aux and tiger tables are no longer created by default. This
has to be done by the corresponding import scripts. The former
aux table creation can be found in sql/aux_tables.sql for
reference.
- remove query_log table, keeping only new_query_log
- drop unused import_npi_log table
- disable DB logging per default
- use file logging structure from osm.org
Get the version from the database where necessary or simply
probe for existence of features. Fake hstore_to_json when
necessary.
Bumps the minimum required versions for postgres to 9.1 and
for postgis to 2.0.
Executables (including websites) need to be installed in the
build directory, so that they can find the right settings.php.
settings now defines build and source dir.
Also added a sanity check to ensure that accidental removal of admin_level
tags on large areas doesn't cause huge reindexing load. That can be disabled
by setting CONST_Limit_Reindexing to false.
use osmosis --read-replication-lag to determine if there are changes before trying to process updates, useful when we are tracking hourly or daily replication updates
set CONST_Replication_Recheck_Interval to 60
skip lag check if CONST_Replication_Update_Interval > 60, for minutelies there's always new diffs to process
use tabs for indent
change sleep for non-minutely updates so that we don't drift too much or poll excessively
unset $aReplicationLag before each exec attempt
unset $aReplicationLag inside while loop
Adds word counts from a full planet to the word table. There is a
new configuration option CONST_Max_Word_Frequency which allows
taking the word count into account: the value that was set on import
is used to determine if a word is added to the search_name table.
The value during runtime determines if a single term should be
used for partial search or simply be ignored.
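The two uses of the threshold can be pictured as below. Both function names are hypothetical, and whether the comparison is strict is an assumption of this sketch.

```python
def add_to_search_name(word_count, max_frequency_at_import):
    """Import time: only sufficiently rare words go into search_name."""
    return word_count < max_frequency_at_import


def use_for_partial_search(word_count, max_frequency_at_runtime):
    """Query time: a single term that is too frequent is ignored."""
    return word_count < max_frequency_at_runtime


# The two thresholds are independent: one is fixed at import,
# the other is read from the configuration while serving queries.
print(add_to_search_name(120, 50000))
print(use_for_partial_search(80000, 50000))
```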
function changes:
-----------------
Move to ST_PointOnSurface from ST_Centroid in various places to avoid looking up a point outside the polygon
Move to ST_Covers from ST_Contains to include points on admin boundaries
Re-order preference for get_country_code now our data is better. country_osm_grid is now the preferred source.
Fix code to calculate country code in placex_insert, rank_search test was too early
Add extra field to placex 'calculated_country_code' to improve structure of code
Move split_geometery function out of add_location into its own function
Rewrite split_geometery to be more efficient.
Change place_insert to do more updates and less delete/inserts (delete is slow)
Include wikipedia links in details.php output
Cleanup no longer used geometry validation (adding overhead)
Include debug statements in function.sql (--DEBUG: ) and add flag to setup.php to turn them on
setup.php:
----------
add flag --disable-token-precalc to speed up debugging
add flag --index-noanalyse to disable analysing DB at rank 4 and 26 (previously removed, but on my local DB it seems to be required)
add flag --enable-diff-updates (modifier to --create-functions) to turn on the code required for diff updates without having to modify functions.sql
add flag --enable-debug-statements (modifier to --create-functions) to turn on debug warning statements
update.php:
-----------
added flag --no-index to import osmosis changes without indexing them
extend the hack to allow import of JOSM generated osm files
country_grid.sql - reference copy of the sql used to generate the country_osm_grid table, needs cleanup
label relation member
admin_center, admin_centre relation member (with same name)
exact name, search_rank and location match
Adding this requires a new column and index:
SELECT AddGeometryColumn('placex', 'centroid', 4326, 'GEOMETRY', 2);
CREATE INDEX idx_placex_linked_place_id ON placex USING BTREE (linked_place_id);
Allows restricting the special phrases imported from the wiki.
The blacklist excludes certain class/type combinations.
The whitelist defines an allowed subset of types for a class.
Adjust to your liking in settings/phrase_settings.php before running
the specialphrases script.
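The filtering semantics can be sketched as follows. The class/type values and the function name are made up for illustration, and the sketch uses Python dictionaries rather than the format of settings/phrase_settings.php.

```python
# Hypothetical example lists, not the shipped defaults.
BLACKLIST = {"boundary": {"administrative"}}
WHITELIST = {"highway": {"bus_stop", "rest_area"}}


def phrase_allowed(cls, typ, blacklist, whitelist):
    """Decide whether a special phrase for class/type is imported.

    - A blacklisted class/type combination is always rejected.
    - If a class has a whitelist, only the listed types are accepted.
    - Everything else is accepted.
    """
    if typ in blacklist.get(cls, ()):
        return False
    if cls in whitelist:
        return typ in whitelist[cls]
    return True


print(phrase_allowed("highway", "bus_stop", BLACKLIST, WHITELIST))
print(phrase_allowed("highway", "residential", BLACKLIST, WHITELIST))
```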
- new config options for postgresql version and
location of osm2pgsql and osmosis binaries
- new option --no-npi for update.php that suppresses writing of NPIs in
osmosis update mode
- add more GRANTs for www-data