Commit Graph

14 Commits

Author SHA1 Message Date
Sarah Hoffmann
9963261d8d add type annotations to special phrase importer 2022-07-18 09:54:29 +02:00
Sarah Hoffmann
18b16e06ca add type annotations for legacy tokenizer 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
e37cfc64d2 add type annotations to ICU tokenizer helper modules 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
d0c44431d0 add typing information for place_info and country_info 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
bce93d60bd move PlaceInfo into data submodule
This data structure is shared between indexer and tokenizer.
2022-07-06 10:54:47 +02:00
Sarah Hoffmann
344a2bfc1a add new command for cleaning word tokens
Just pulls outdated housenumbers for the moment.
2022-01-20 20:05:15 +01:00
Sarah Hoffmann
c3788d765e add consistent SPDX copyright headers 2022-01-03 16:23:58 +01:00
Sarah Hoffmann
7beccb7997 remove unnecessary pass statements 2021-12-02 15:54:24 +01:00
Sarah Hoffmann
e8e2502e2f make word recount a tokenizer-specific function 2021-10-19 11:21:16 +02:00
Sarah Hoffmann
2a94bfc703 fix argument description for check_database 2021-10-07 09:49:13 +02:00
Sarah Hoffmann
16daa57e47 unify ICUNameProcessorRules and ICURuleLoader
There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.
2021-10-01 12:27:24 +02:00
Sarah Hoffmann
5e5addcdbf fix typo 2021-09-29 14:16:09 +02:00
Sarah Hoffmann
231250f2eb add wrapper class for place data passed to tokenizer
This is mostly for convenience and documentation purposes.
2021-09-29 11:54:07 +02:00
Sarah Hoffmann
90b40fc3e6 define formal public Python interface for tokenizer
This introduces an abstract class for the Tokenizer/Analyzer
for documentation purposes.
2021-08-16 11:41:54 +02:00