mosesdecoder/scripts/tokenizer
Jeremy Gwinnup a5fb4d1550 Fixed bug in tokenizer.perl where comma separated lists of single
characters aren't handled correctly

input> A,B,C,D,E,F

yielded> A, B,C , D,E , F

now yields> A, B, C, D, E, F

Updated Russian nonbreaking prefixes list with capital letters
2013-08-16 14:39:50 -04:00
..
deescape-special-chars.perl escape bar character with proper html escape sequence 2012-06-25 23:37:59 +01:00
detokenizer.perl Add option to do Penn Treebank style tokenization 2013-07-24 13:41:21 +01:00
escape-special-chars.perl bug fix 2012-06-26 22:49:59 +01:00
lowercase.perl add lowercaser 2010-08-02 14:05:23 +00:00
normalize-punctuation.perl add normalize-punctuation.perl, from WMT 2013-05-16 17:03:37 +01:00
tokenizer.perl Fixed bug in tokenizer.perl where comma separated lists of single 2013-08-16 14:39:50 -04:00