mosesdecoder

mirror of https://github.com/moses-smt/mosesdecoder.git synced 2024-09-11 11:25:40 +03:00

HjalmarrSv fa747062dc Modernized: I wanted to properly parse the links on https://dumps.wikimedia.org/mirrors.html when the page is copied as text, and my proposed changes do the job. Basically, I replaced the + at the end of line 5 with *(\/)? and, because the pipe symbol could lead to crashes, broke line 5 up into three lines. After reading various posts, I suggest not using the pipe (\|).		2019-12-17 20:40:51 +01:00
mosestokenizer	rename directory to work with python import	2018-11-09 13:01:17 +00:00
basic-protected-patterns	Modernized	2019-12-17 20:40:51 +01:00
deescape-special-chars-PTB.perl	Add option "-b" (unbuffer output) to tokenizer scripts	2018-11-09 22:53:33 +01:00
deescape-special-chars.perl	Add option "-b" (unbuffer output) to tokenizer scripts	2018-11-09 22:53:33 +01:00
delete-long-words.perl	Add option "-b" (unbuffer output) to tokenizer scripts	2018-11-09 22:53:33 +01:00
detokenizer.perl	Korean words has spaces =)	2018-01-19 13:29:53 +08:00
escape-special-chars.perl	Add option "-b" (unbuffer output) to tokenizer scripts	2018-11-09 22:53:33 +01:00
lowercase.perl	Add option "-b" (unbuffer output) to tokenizer scripts	2018-11-09 22:53:33 +01:00
normalize-punctuation.perl	Single quotes should be escaped as single quotes.	2019-11-25 10:10:40 +08:00
pre_tokenize_cleaning.py	Add license notices to scripts.	2015-05-29 18:30:26 +07:00
pre-tok-clean.perl	Add license notices to scripts.	2015-05-29 18:30:26 +07:00
pre-tokenizer.perl	Add license notices to scripts.	2015-05-29 18:30:26 +07:00
remove-non-printing-char.perl	Add option "-b" (unbuffer output) to tokenizer scripts	2018-11-09 22:53:33 +01:00
replace-unicode-punctuation.perl	Update replace-unicode-punctuation.perl	2019-10-14 16:33:58 +08:00
tokenizer_PTB.perl	ga (mostly) behaves more like fr/it	2015-09-23 14:33:18 +01:00
tokenizer.perl	tokenizer.perl: split final dots unconditionally	2018-11-07 10:59:54 +01:00