OSM Data Import
OSM data is initially imported using osm2pgsql. Nominatim uses its own data output style 'gazetteer', which differs from the output style created for map rendering.
Database Layout
The gazetteer style produces a single table place with the following columns:
- osm_type: kind of OSM object (N - node, W - way, R - relation)
- osm_id: original OSM ID
- class: key of principal tag defining the object type
- type: value of principal tag defining the object type
- name: collection of tags that contain a name or reference
- admin_level: numerical value of the tagged administrative level
- address: collection of tags defining the address of an object
- extratags: collection of additional interesting tags that are not directly relevant for searching
- geometry: geometry of the object (in WGS84)
A single OSM object may appear multiple times in this table when it is tagged
with multiple tags that may constitute a principal tag. Take for example a
motorway bridge. In OSM, this would be a way which is tagged with
highway=motorway and bridge=yes. This way would appear in the place table
once with a class of highway and once with a class of bridge. Thus the
unique key for place is (osm_type, osm_id, class).
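To make the one-row-per-principal-tag behaviour concrete, here is a minimal Python sketch. It is not Nominatim's implementation: the PlaceRow class, the PRINCIPAL_KEYS set and the rows_for_object helper are hypothetical names introduced for illustration, and the sketch simply assumes that highway and bridge are both configured as principal tags.

```python
from dataclasses import dataclass, field


@dataclass
class PlaceRow:
    """Illustrative subset of the columns of the place table."""
    osm_type: str   # 'N', 'W' or 'R'
    osm_id: int
    cls: str        # the 'class' column ('class' is a reserved word in Python)
    type: str
    name: dict = field(default_factory=dict)

    def key(self):
        # Unique key of the table: (osm_type, osm_id, class)
        return (self.osm_type, self.osm_id, self.cls)


# Assumption for this sketch: these keys are configured as principal tags.
PRINCIPAL_KEYS = {"highway", "bridge"}


def rows_for_object(osm_type, osm_id, tags):
    """Create one row per principal tag found on the object."""
    names = {k: v for k, v in tags.items() if k == "name"}
    return [PlaceRow(osm_type, osm_id, k, v, dict(names))
            for k, v in tags.items() if k in PRINCIPAL_KEYS]


# The motorway bridge from the text: one way, two principal tags.
rows = rows_for_object("W", 1234, {"highway": "motorway",
                                   "bridge": "yes",
                                   "name": "Example Bridge"})
for row in rows:
    print(row.key(), row.type)
```

The two resulting rows share osm_type and osm_id and differ only in class, which is why class has to be part of the unique key.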
Configuring the Import
How tags are interpreted and assigned to the different place columns can be
configured via the import style configuration file (CONST_Import_style). This
is a JSON file containing a list of rules. The rules are matched against every
tag of every object and assign each tag its specific role.
Configuration Rules
A single rule looks like this:
{
    "keys" : ["key1", "key2", ...],
    "values" : {
        "value1" : "prop",
        "value2" : "prop1,prop2"
    }
}
A rule first defines a list of keys to apply the rule to. This is always a list
of strings. Each string takes one of four forms. An empty string matches
any key. A string that ends in an asterisk * is a prefix match: it matches
any key that starts with the given string (minus the *). A suffix match is
defined analogously by a string that starts with a *. Any other string
constitutes an exact match.
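The four key forms can be expressed as a small predicate. The function name key_matches is hypothetical, and the sketch makes one assumption the text does not settle: a pattern that both starts and ends with * is treated as a prefix match because the trailing asterisk is checked first.

```python
def key_matches(pattern: str, key: str) -> bool:
    """Return True if a tag key matches a key pattern from a rule."""
    if pattern == "":
        # Empty string: matches any key.
        return True
    if pattern.endswith("*"):
        # Prefix match: compare against the pattern minus the trailing '*'.
        return key.startswith(pattern[:-1])
    if pattern.startswith("*"):
        # Suffix match: compare against the pattern minus the leading '*'.
        return key.endswith(pattern[1:])
    # Anything else is an exact match.
    return pattern == key


for pattern, key in [("", "surface"), ("addr:*", "addr:street"),
                     ("*:wikipedia", "brand:wikipedia"),
                     ("highway", "highway:lanes")]:
    print(repr(pattern), repr(key), key_matches(pattern, key))
```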
The second part of a rule defines a list of values and the properties that apply to a successful match. Value strings may be either empty, which means that they match any value, or describe an exact match. Prefix or suffix matching of values is not possible.
For a rule to match, it has to find a valid combination of key and value. The resulting properties are those assigned to the matched value.
The rules in a configuration file are processed sequentially and the first match for each tag wins.
A rule where key and value are the empty string is special. It defines the fallback that is used as a last resort when none of the other rules match, no matter where the rule appears in the file. Defining multiple fallback rules is not allowed; the behaviour in that case is undefined.
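The processing order described here (sequential rules, first match wins, fallback last regardless of position) can be sketched as below. This is an illustration, not the actual importer: find_properties and the sample rules are hypothetical. It assumes a fallback rule is one whose only key and only value are the empty string, and that within a rule an exact value takes precedence over the empty value.

```python
def key_matches(pattern, key):
    """Empty, prefix ('abc*'), suffix ('*abc') or exact key match."""
    if pattern == "":
        return True
    if pattern.endswith("*"):
        return key.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return key.endswith(pattern[1:])
    return pattern == key


def find_properties(rules, key, value):
    """Return the property string for a tag, or None if nothing applies."""
    fallback = None
    for rule in rules:
        values = rule["values"]
        if rule["keys"] == [""] and list(values) == [""]:
            # Fallback rule: remember it, but only use it as a last resort.
            fallback = values[""]
            continue
        if not any(key_matches(p, key) for p in rule["keys"]):
            continue
        if value in values:      # exact value match
            return values[value]
        if "" in values:         # empty value string matches any value
            return values[""]
        # Key matched but no value did: this rule does not match, go on.
    return fallback


# Hypothetical rules; the fallback is deliberately placed first.
rules = [
    {"keys": [""], "values": {"": "extra"}},
    {"keys": ["highway"], "values": {"no": "skip", "": "main"}},
    {"keys": ["name", "name:*"], "values": {"": "name"}},
]

print(find_properties(rules, "highway", "motorway"))
print(find_properties(rules, "highway", "no"))
print(find_properties(rules, "name:de", "Beispiel"))
print(find_properties(rules, "surface", "asphalt"))
```

Although the fallback appears first in the list, it is only returned for the surface tag, which no other rule matches.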
Tag Properties
One or more of the following properties may be given for each tag:
- main: A principal tag. A new row will be added for the object with key and value as class and type.
- with_name: When the tag is a principal tag (main property set): only add a new row if a name tag is found (a reference tag is not sufficient, see below).
- with_name_key: When the tag is a principal tag (main property set): only add a new row if there is also a name tag that matches the key of the principal tag. For example, if the main tag is bridge=yes, then it will only be added as an extra row if there is a tag bridge:name[:XXX] for the same object. If this property is set, all other names that are not domain-specific are ignored.
- fallback: When the tag is a principal tag (main property set): only add a new row when no other principal tags for this object have been found. Only one fallback tag can win for an object.
- operator: When the tag is a principal tag (main property set): also include the operator tag in the list of names. This is a special construct for an outdated tagging practice in OSM. Fuel stations and chain restaurants in particular used to have the name of the chain tagged as operator. These days the chain is more commonly found in the brand tag, but there is still enough old data around to warrant this special case.
- name: Add tag to the list of names.
- ref: Add tag to the list of names as a reference. At the moment this only means that the object is not considered to be named for with_name.
- address: Add tag to the list of address tags. If the tag starts with addr: or is_in:, then this prefix is cut off before adding it to the list.
- postcode: Add the value as a postcode to the address tags. If multiple tags are candidates for postcodes, one wins out and the others are dropped.
- country: Add the value as a country code to the address tags. The value must be a two-letter country code, otherwise it is ignored. If there are multiple tags that match, then one wins out and the others are dropped.
- house: If no principal tags can be found for the object, still add the object with class=place and type=house. Use this for address nodes that have no other function.
- interpolation: Add this object as an address interpolation (appears as class=place and type=houses in the database).
- extra: Add tag to the list of extra tags.
- skip: Skip the tag completely. Useful when a custom default fallback is defined or to define exceptions to rules.
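The interplay of main, with_name, fallback and house is easiest to see in code. The following Python sketch is a simplified reading of the descriptions above, not the importer's actual logic: principal_rows is a hypothetical helper, props is assumed to map each tag key to the set of properties the style assigned to it, with_name_key and operator are left out, and picking the first fallback candidate is an arbitrary choice because the text only says that one of them wins.

```python
def principal_rows(tags, props):
    """Return the (class, type) pairs that would become rows for an object.

    tags:  dict of tag key -> tag value
    props: dict of tag key -> set of properties assigned by the style
    """
    # Only tags with the 'name' property count as names; 'ref' does not.
    has_name = any("name" in props.get(k, set()) for k in tags)

    rows, fallbacks = [], []
    for key, value in tags.items():
        p = props.get(key, set())
        if "main" not in p:
            continue
        if "with_name" in p and not has_name:
            continue                      # unnamed object: no row
        if "fallback" in p:
            fallbacks.append((key, value))
        else:
            rows.append((key, value))

    if not rows and fallbacks:
        rows.append(fallbacks[0])         # assumption: first candidate wins
    if not rows and any("house" in props.get(k, set()) for k in tags):
        rows.append(("place", "house"))   # plain address node
    return rows


props = {
    "amenity": {"main", "with_name"},
    "shop": {"main"},
    "building": {"main", "fallback"},
    "name": {"name"},
    "ref": {"ref"},
    "addr:housenumber": {"address", "house"},
}

print(principal_rows({"amenity": "fuel", "ref": "A1"}, props))
print(principal_rows({"amenity": "fuel", "name": "Station"}, props))
print(principal_rows({"building": "yes", "shop": "bakery"}, props))
print(principal_rows({"addr:housenumber": "5"}, props))
```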
A rule can define any number of these properties for one match. For
example, if the property is "main,extra", then the tag will open a new row
and also appear in the list of extra tags.
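A combined property string such as "main,extra" is just a comma-separated list. The sketch below splits one into a set and checks it against the property names listed in this section. The function name parse_properties is hypothetical; tolerating whitespace around commas and raising an error for unknown names are assumptions of this sketch, not documented behaviour.

```python
# Property names as listed in this section.
KNOWN_PROPERTIES = {
    "main", "with_name", "with_name_key", "fallback", "operator",
    "name", "ref", "address", "postcode", "country",
    "house", "interpolation", "extra", "skip",
}


def parse_properties(spec: str) -> set:
    """Split a property string like 'main,extra' into a set of names."""
    props = {part.strip() for part in spec.split(",") if part.strip()}
    unknown = props - KNOWN_PROPERTIES
    if unknown:
        # Assumption: reject typos early instead of silently ignoring them.
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    return props


props = parse_properties("main,extra")
print("opens a new row:", "main" in props)
print("kept in extra tags:", "extra" in props)
```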
There are a number of pre-defined styles in the settings/ directory. It is
advisable to start from one of these styles when defining your own.
Changing the Style of Existing Databases
There is normally no issue changing the style of a database that is already imported and now kept up-to-date with change files. Just be aware that any change in the style applies to updates only. If you want to change the data that is already in the database, then a reimport is necessary.