Commit Graph

170 Commits

Author SHA1 Message Date
Sarah Hoffmann
9397bf54b8 introduce external processing in indexer
Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.

The second processing step doesn't do anything useful yet.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fbbdd31399 move word table and normalisation SQL into tokenizer
Creating and populating the word table is now the responsibility
of the tokenizer.

The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
185d369404 remove support for AUX housenumber tables
These tables have never been actively maintained and the code is
completely untested. With the upcomming changes, it is unlikely
that the code remains usable.

This removes the aux tables and all code that references them.
2021-04-30 10:08:29 +02:00
Sarah Hoffmann
b88b952f56 simplify token precomputation
Rename function to reflect that it is only used for precomputation.
The token IDs are not really needed, so don't bother to compute
the array of tokens.
2021-04-19 17:24:19 +02:00
Sarah Hoffmann
d68b02d36a remove unused word recomputation script
Has been replaced by a script recomputing counts from search_name.
2021-04-19 16:40:57 +02:00
Sarah Hoffmann
830e3be1e6
Merge pull request #2281 from changpingc/changping/fix-tiger-index
fix index on location_property_tiger (parent_place_id)
2021-04-19 08:42:59 +02:00
Channgping Chen
29a314a092 fix index on location_property_tiger (parent_place_id)
Looks like 2af82975cd
accidentally renamed an index. Because of the added "if not
exists" clause, the index doesn't get created. This
significantly slows down reverse queries because they now
require full scans on location_property_tiger.

Without this fix, reverse queries can take 8s on a full
planet install on an r5.8xlarge instance in EC2.
2021-04-19 00:33:15 +00:00
Sarah Hoffmann
e7266b52ae simplify name matching between boundary and place node
Instead of normalising the names simply compare them in lower
case. This removes the dependency on the tokenizer for
linking boundaries and nodes. When looking up the linked places
by place type also allow that one name is simply contained in the
other. This catches the frequent case where one of the names has
an addendum (e.g. Newport vs. City of Newport).

Drops the special index for the name lookup and insted relies
on a slightly extended version of the geometry index used for
reverse lookup. Saves around 100MB on a planet.
2021-04-14 17:52:59 +02:00
Sarah Hoffmann
6cbef84cad use new transliteration in initial housenumber word computation
The new create_housenumber_id() function splits housenumber
lists correctly. Otherwise there is no difference.
2021-04-04 15:26:47 +02:00
Sarah Hoffmann
55fcc44c8c correctly handle housenumber lists
Lists are now standardised to use a semicolon separator.
2021-04-04 15:26:47 +02:00
Sarah Hoffmann
16a66b5326 move transliteration of housenumbers into indexing
Housenumbers are now saved in transliterated form in the housenumber
column. This saves the transliteration step during lookup.
2021-04-04 15:26:47 +02:00
Sarah Hoffmann
0ec3fdd3ba return housenumbers always from address field
This means that we can use normalized versions of the
housenumber in the housenumber field as it is no longer
a user visible field.
2021-04-04 15:26:47 +02:00
Sarah Hoffmann
8d8b1d4307 use non-key index to speed up housenumber search
On Postgresql versions 11+ add an index to speed up the lookup
of housenumbers for terms found in search_name. This is really
just a band-aid around the query planer's interpretation of the
query.
2021-04-01 17:10:44 +02:00
Sarah Hoffmann
5dabc0aca8 create postcode id index earlier
Now that the indexer takes care of indexing the postcode tables,
the id index is needed to find the rows to index.
2021-03-22 22:24:56 +01:00
Darkshredder
2af82975cd Ported tiger-data-import to python and Added Tarball Support 2021-03-08 21:57:56 +05:30
Sarah Hoffmann
09f4d767e4 port index creation to python
Also switches to jinja-based preprocessing, which allows to
simplify the SQL files. Use 'if not exists' where possible
so that the step can be rerun to fix missing indexes.
2021-03-04 11:11:47 +01:00
Sarah Hoffmann
eacabb0e96 move table creation to jinja-based preprocessing 2021-03-03 22:07:51 +01:00
Sarah Hoffmann
d2bd6aa78d introduce jinja2 for preprocessing SQL
Replaces various hand-crafted replacements of varying format with
a single Jinja2 templating mechanism. Allows full access to
configuration if necessary.
2021-03-03 17:51:08 +01:00
Sarah Hoffmann
976c5e9121 introduce table for in-database properties
Adds a simple table where settings for the database can be
saved. This is useful for state that must not change after
import.
2021-03-01 16:09:17 +01:00
Sarah Hoffmann
b9517c99ae rename sql directory to lib-sql
Also introduces a separate constant for the sql directory, so that
it can be put separately from the rest of the data if required.
2021-02-09 15:26:56 +01:00