mosestokenizer | rename directory to work with python import | 2018-11-09 13:01:17 +00:00 |
basic-protected-patterns | makemteval and small change to tokenizer. /Tom Hoar and Tomas Fulajtar | 2014-11-21 13:55:13 +00:00 |
deescape-special-chars-PTB.perl | Add option "-b" (unbuffer output) to tokenizer scripts | 2018-11-09 22:53:33 +01:00 |
deescape-special-chars.perl | Add option "-b" (unbuffer output) to tokenizer scripts | 2018-11-09 22:53:33 +01:00 |
delete-long-words.perl | Add option "-b" (unbuffer output) to tokenizer scripts | 2018-11-09 22:53:33 +01:00 |
detokenizer.perl | Korean words has spaces =) | 2018-01-19 13:29:53 +08:00 |
escape-special-chars.perl | Add option "-b" (unbuffer output) to tokenizer scripts | 2018-11-09 22:53:33 +01:00 |
lowercase.perl | Add option "-b" (unbuffer output) to tokenizer scripts | 2018-11-09 22:53:33 +01:00 |
normalize-punctuation.perl | Add license notices to scripts. | 2015-05-29 18:30:26 +07:00 |
pre_tokenize_cleaning.py | Add license notices to scripts. | 2015-05-29 18:30:26 +07:00 |
pre-tok-clean.perl | Add license notices to scripts. | 2015-05-29 18:30:26 +07:00 |
pre-tokenizer.perl | Add license notices to scripts. | 2015-05-29 18:30:26 +07:00 |
remove-non-printing-char.perl | Add option "-b" (unbuffer output) to tokenizer scripts | 2018-11-09 22:53:33 +01:00 |
replace-unicode-punctuation.perl | Add option "-b" (unbuffer output) to tokenizer scripts | 2018-11-09 22:53:33 +01:00 |
tokenizer_PTB.perl | ga (mostly) behaves more like fr/it | 2015-09-23 14:33:18 +01:00 |
tokenizer.perl | tokenizer.perl: split final dots unconditionally | 2018-11-07 10:59:54 +01:00 |