Commit Graph

3986 Commits

Author SHA1 Message Date
Sarah Hoffmann
a0cd96e05e
Merge pull request #2786 from lonvia/export-centroid-for-tokenizer
Export centroid to tokenizer
2022-08-01 11:38:24 +02:00
Sarah Hoffmann
b19c90b9a6 export centroid to tokenizer
May come in handy when developping sanitizers for an area smaller
than country size.
2022-07-31 22:10:58 +02:00
Sarah Hoffmann
e427712cb0
Merge pull request #2784 from lonvia/doscs-customizing-icu-tokenizer
Document the public API of sanitizers and token analysis modules
2022-07-31 19:15:50 +02:00
Sarah Hoffmann
9864b191b1 fix various typos 2022-07-31 17:10:35 +02:00
Sarah Hoffmann
e7574f119e add simple examples of sanitizers and token analysis 2022-07-29 17:15:25 +02:00
Sarah Hoffmann
51b6d16dc6 overhaul the token analysis interface
The functional split betweenthe two functions is now that the
first one creates the ID that is used in the word table and
the second one creates the variants. There no longer is a
requirement that the ID is the normalized version. We might
later reintroduce the requirement that a normalized version be available
but it doesn't necessarily need to be through the ID.

The function that creates the ID now gets the full PlaceName. That way
it might take into account attributes that were set by the sanitizers.

Finally rename both functions to something more sane.
2022-07-29 15:14:11 +02:00
Sarah Hoffmann
34d27ed45c move PlaceName into the generic data module 2022-07-29 11:42:20 +02:00
Sarah Hoffmann
094100bbf6 harmonize spelling
Stick with the American spelling of Analyze.
2022-07-29 10:52:01 +02:00
Sarah Hoffmann
c8873d34af harmonize interface of token analysis module
The configure() function now receives a Transliterator object instead
of the ICU rules. This harmonizes the parameters with the create
function.
2022-07-29 10:43:07 +02:00
Sarah Hoffmann
f0d640961a add documentation for custom token analysis 2022-07-29 09:41:28 +02:00
Sarah Hoffmann
3746befd88 add documentation for sanitizer interface
Also switches mkdocstrings to 0.18 with the rather unfortunate
consequence that now mkdocstrings-python-legacy is needed as well.
2022-07-28 22:00:29 +02:00
Sarah Hoffmann
a8b037669a
Merge pull request #2780 from lonvia/python-modules-in-project-directory
Support for external sanitizer and token analysis modules
2022-07-28 21:58:04 +02:00
Sarah Hoffmann
d819036daa add support for external token analysis modules 2022-07-25 16:27:22 +02:00
Sarah Hoffmann
6d41046b15 add support for external sanitizer modules 2022-07-25 16:10:19 +02:00
Sarah Hoffmann
7b7203c149 add function for loading plugin modules
Loads modules for configurable code like tokenizers, sanitizers, etc.
Supports internal modules, external libraries and code from the
project directory.
2022-07-25 16:10:10 +02:00
Sarah Hoffmann
95d4061b2a
Merge pull request #2775 from lonvia/remove-centos-instructions
Remove vagrant scripts for CentOS
2022-07-25 10:29:32 +02:00
Sarah Hoffmann
375b57a96a vagrant: remove proj dependency and only require php-cli 2022-07-24 10:24:18 +02:00
Sarah Hoffmann
12ace4329d remove CentOS installation instructions
Fixes #2601.
2022-07-24 10:22:22 +02:00
Sarah Hoffmann
09e0be0e39
Merge pull request #2774 from lonvia/parameter-arrays
Ignore URL parameters in array notation
2022-07-23 23:56:32 +02:00
Sarah Hoffmann
cd4bcea894 ignore API parameters in array notation
PHP automatically parses parameters in an array notation(foo[]) into
array types. Ignore these parameters as 'unknown'.

Fixes #2763.
2022-07-23 10:51:44 +02:00
Sarah Hoffmann
1bee151fe3
Merge pull request #2772 from kianmeng/fix-typos
docs: fix typos
2022-07-20 17:13:30 +02:00
Kian-Meng Ang
f5e52e748f docs: fix typos 2022-07-20 22:05:31 +08:00
Sarah Hoffmann
b7f6c7c76a docs: slightly increase recommended hardware requirements 2022-07-20 10:16:23 +02:00
Sarah Hoffmann
bc7f6209d8
Merge pull request #2770 from lonvia/typed-python
Type annotations for Python code
2022-07-19 09:03:30 +02:00
Sarah Hoffmann
372a548c28 CI: remove installation of pip on Ubuntu 20 2022-07-18 12:19:04 +02:00
Sarah Hoffmann
5aad105c73 add explicit cast for fetchone 2022-07-18 10:18:51 +02:00
Sarah Hoffmann
f40c83d025 CIL use psutil type stubs 2022-07-18 09:55:58 +02:00
Sarah Hoffmann
83054af46f remove typing_extensions requirement
The typing_extensions package is only necessary now when running mypy.
It won't be used at runtime anymore.
2022-07-18 09:55:58 +02:00
Sarah Hoffmann
cb81f11422 CI: make type checking strict 2022-07-18 09:55:58 +02:00
Sarah Hoffmann
a849f3c9ec add type annotations for command line functions 2022-07-18 09:55:54 +02:00
Sarah Hoffmann
25d854dc5c add type annotations for Tiger import function 2022-07-18 09:54:29 +02:00
Sarah Hoffmann
9963261d8d add type annotations to special phrase importer 2022-07-18 09:54:29 +02:00
Sarah Hoffmann
459ab3bbdc add type annotations to database check functions 2022-07-18 09:54:29 +02:00
Sarah Hoffmann
a21d4d3ac4 add type annotations for database import functions 2022-07-18 09:54:29 +02:00
Sarah Hoffmann
4da1f0da6f add type annotations for migrations 2022-07-18 09:54:29 +02:00
Sarah Hoffmann
17bbe2637a add type annotations to tool functions 2022-07-18 09:54:27 +02:00
Sarah Hoffmann
6c6bbe5747 add type annotations for ICU tokenizer 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
18b16e06ca add type annotations for legacy tokenizer 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
e37cfc64d2 add type annotations to ICU tokenizer helper modules 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
77510f4a3b add typing extensions for Ubuntu22.04 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
d35e3c25b6 add type annotations for token analysis
No annotations for ICU types yet.
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
62eedbb8f6 add type hints for sanitizers 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
5617bffe2f add type annotations for indexer 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
8adab2c6ca add typing information for postcode formatter 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
d0c44431d0 add typing information for place_info and country_info 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
282a61ce51 add typing information for utils submodule 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
7a1d22ff15 type annotations for non-blocking DB connection 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
0dff71a410 add type annotations for SQL preprocessor 2022-07-18 09:47:57 +02:00
Sarah Hoffmann
26f30bff28 add type annotation to DB utils
As a cursor is needed as type, make this a public type.
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
e6775e713c add typing information to DB properties 2022-07-18 09:47:57 +02:00