Commit Graph

6 Commits

Author SHA1 Message Date
Kian-Meng Ang
f5e52e748f docs: fix typos 2022-07-20 22:05:31 +08:00
Sarah Hoffmann
c3788d765e add consistent SPDX copyright headers 2022-01-03 16:23:58 +01:00
Sarah Hoffmann
f00b8dd1c3 move special hack for US states to legacy tokenizer
The hack for IL, AL and LA is only needed because these abbreviations
are removed by the legacy tokenizer as a stop word. There is no need
to keep the hack for future tokenizers. Move it therefore to the
token extraction function.
2021-08-17 14:28:55 +02:00
Sarah Hoffmann
1147b83b22 php: make word list a first-class object
This separates the logic of creating word sets from the Phrase
class. A tokenizer may now derived the word sets any way they
like. The SimpleWordList class provides a standard implementation
for splitting phrases on spaces.
2021-08-16 11:51:49 +02:00
Sarah Hoffmann
044bb6afa5 move tokenization in query into tokenizer 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
db3ced17bb rename lib to lib-php 2021-02-09 11:52:07 +01:00