mosesdecoder/scripts/tokenizer
Cristina España i Bonet 8d78dae634
adding rules for Catalan
special characters within words and contractions closer to French than to English
2020-07-31 15:22:47 +02:00
Name                              Last commit                                             Date
mosestokenizer                    rename directory to work with python import             2018-11-09 13:01:17 +00:00
basic-protected-patterns          Modernized                                              2019-12-17 20:40:51 +01:00
deescape-special-chars-PTB.perl   Add option "-b" (unbuffer output) to tokenizer scripts  2018-11-09 22:53:33 +01:00
deescape-special-chars.perl       Add option "-b" (unbuffer output) to tokenizer scripts  2018-11-09 22:53:33 +01:00
delete-long-words.perl            Add option "-b" (unbuffer output) to tokenizer scripts  2018-11-09 22:53:33 +01:00
detokenizer.perl                  Revert "line buffering for tokeniser and truecaser"     2020-02-20 09:52:08 +00:00
escape-special-chars.perl         Add option "-b" (unbuffer output) to tokenizer scripts  2018-11-09 22:53:33 +01:00
lowercase.perl                    Add option "-b" (unbuffer output) to tokenizer scripts  2018-11-09 22:53:33 +01:00
normalize-punctuation.perl        Single quotes should be escaped as single quotes.       2019-11-25 10:10:40 +08:00
pre_tokenize_cleaning.py          Add license notices to scripts.                         2015-05-29 18:30:26 +07:00
pre-tok-clean.perl                Add license notices to scripts.                         2015-05-29 18:30:26 +07:00
pre-tokenizer.perl                Add license notices to scripts.                         2015-05-29 18:30:26 +07:00
remove-non-printing-char.perl     Add option "-b" (unbuffer output) to tokenizer scripts  2018-11-09 22:53:33 +01:00
replace-unicode-punctuation.perl  Update replace-unicode-punctuation.perl                 2019-10-14 16:33:58 +08:00
tokenizer_PTB.perl                ga (mostly) behaves more like fr/it                     2015-09-23 14:33:18 +01:00
tokenizer.perl                    adding rules for Catalan                                2020-07-31 15:22:47 +02:00