Commit Graph

17472 Commits

Author SHA1 Message Date
Hieu Hoang
5a600dfe62
Merge pull request #206 from alvations/patch-truecaser
Patching truecaser
2018-12-29 20:21:37 +00:00
Hieu Hoang
4b2872fad8 rename file so it appears on github website. Clarify mailing list 2018-12-28 15:15:09 +00:00
alvations
dfbb17e549 use ucfirst instead of defined uppercase function 2018-12-20 11:57:48 +08:00
alvations
40748e528d split_xml should be consistent for training and using 2018-12-20 11:53:02 +08:00
Hieu Hoang
413ba6b583 increase cores to 16. For bitextor azure pipeline 2018-12-10 16:17:16 +00:00
Hieu Hoang
dd9ff66479 put fix into UnorderedComparer again. Maybe weird template bug 2018-12-10 13:27:57 +00:00
Hieu Hoang
baefaa1b12 fix weird unordered set error on ubuntu 18.04, gcc 7.3.0, boost 1.65. May be over-optimizing or bug in gcc or boost 2018-12-10 13:15:03 +00:00
Hieu Hoang
20edd331bc debug 2018-12-10 12:29:58 +00:00
Hieu Hoang
c753350641 ems config for moses2 2018-12-08 19:47:10 +00:00
Hieu Hoang
3d4bf99367 sacre bleu 2018-12-04 15:40:00 +00:00
Hieu Hoang
dbbc47292f sacre bleu 2018-12-04 15:27:09 +00:00
Hieu Hoang
345dabcde6 use --discount_fallback 2018-12-04 14:34:47 +00:00
Hieu Hoang
1591cf3676 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2018-11-12 14:03:54 +00:00
Hieu Hoang
13e48bc8b4 removing python port. Sacremoses is newer 2018-11-12 14:03:38 +00:00
Hieu Hoang
19a31ca3f1
Merge pull request #205 from coylz/master
Add option "-b" (unbuffer output) to tokenizer scripts
2018-11-10 22:41:52 +00:00
Loïc Vial
4133726ef9 Add option "-b" (unbuffer output) to tokenizer scripts 2018-11-09 22:53:33 +01:00
Hieu Hoang
a2315ffd3a rename directory to work with python import 2018-11-09 13:01:17 +00:00
Hieu Hoang
a70086c1e6 python wrapper works 2018-11-09 12:58:22 +00:00
Hieu Hoang
2451c46960 start borging Luis Gomes code 2018-11-07 17:12:05 +00:00
Hieu Hoang
2217bc136e
Merge pull request #204 from ozancaglayan/nb-fix
tokenizer.perl: split final dots unconditionally
2018-11-07 14:36:41 +00:00
Ozan Caglayan
9fc964da7f tokenizer.perl: split final dots unconditionally
Allow tokenization of non-breaking prefixes at end of sentences. This should
be a fair compromise in many cases to construct a cleaner vocabulary.

EN-old: So am I.
EN-new: So am I .

DE-old: ... schwer wie ein iPhone 5.
DE-new: ... schwer wie ein iPhone 5 .

FR-old: Des gens admirent une œuvre d' art.
FR-new: Des gens admirent une œuvre d' art .

CS-old: Dvě děti, které běží bez bot.
CS-new: Dvě děti, které běží bez bot .
2018-11-07 10:59:54 +01:00
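The before/after pairs in the commit message above can be reproduced with a minimal sketch. This is not the code in tokenizer.perl: the function name `split_final_dot` and the regular expression are assumptions made for illustration, and the real script's handling of non-breaking prefixes is considerably more involved. The sketch only shows the stated behaviour, that a sentence-final period is always separated from the preceding token.

```python
import re


def split_final_dot(sentence: str) -> str:
    """Hypothetical helper: detach one sentence-final period from its token.

    The lookbehind requires the character before the period to be neither
    whitespace nor another period, so already-split sentences and trailing
    ellipses are left untouched. Trailing whitespace after the period is
    dropped.
    """
    return re.sub(r"(?<=[^\s.])\.\s*$", " .", sentence)


examples = [
    "So am I.",
    "... schwer wie ein iPhone 5.",
    "Dvě děti, které běží bez bot.",
]
for text in examples:
    print(split_final_dot(text))
```

Applying the function a second time leaves the output unchanged, because the period is then preceded by a space.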
Barry Haddow
d2b558728f basic support for Gujarati and Hindi, backported from one of the many upstreams 2018-10-30 14:16:16 +00:00
Hieu Hoang
979dd5a403 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2018-10-26 18:57:07 +02:00
Hieu Hoang
cbee7096bc bump again 2018-10-26 18:52:27 +02:00
Hieu Hoang
ab161b6f7f
Merge pull request #203 from maxthomas/contrib-modular-boost
contrib: make boost variable modular; update version to 1.68.0
2018-10-26 18:49:27 +02:00
Hieu Hoang
4180b932b1 bump 2018-10-26 18:46:26 +02:00
max thomas
c43a84516c
contrib: make boost variable modular; update version to 1.68.0 2018-10-24 21:51:48 -05:00
Hieu Hoang
4dd747e5db
Merge pull request #202 from thuvh/python3_compatible
fix print to compatible with python2 and python3
2018-09-27 11:30:54 +01:00
Hoai-Thu Vuong
90c8464c53 fix print to compatible with python2 and python3 2018-09-26 23:17:19 +07:00
Rico Sennrich
411f45f249 multi-bleu-detok should take raw reference 2018-09-26 12:24:07 +01:00
Hieu Hoang
48fa6e92a9 grammar 2018-09-16 14:58:39 +01:00
Hieu Hoang
fd1758ba74 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2018-09-10 18:30:46 +01:00
Hieu Hoang
e760db2d17 unused script 2018-09-10 18:30:36 +01:00
Barry Haddow
06f519d4e2 Handle glottal stops in Somalian 2018-09-06 16:09:36 +01:00
Hieu Hoang
3545225c0b
Merge pull request #201 from louismartin/bleu-fix-newline
[BLEU] Fix multi-bleu.perl bug (no newline at end of file)
2018-07-05 14:02:10 +01:00
Louis MARTIN
53da5f4dbe Fix multi-bleu.perl bug when file does not end with newline
When reading hypothesis and reference files, multi-bleu.perl uses the
chop function to remove the trailing newline character.
If one of these files happens to not end with a newline, then chop will
remove the last character of the last line (instead of the newline).
This causes the BLEU score to be slightly off from its theoretical
value.
Using the safest chomp function solves this problem, i.e. it only
removes newlines when present.
2018-07-03 04:06:09 -06:00
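The difference between the two Perl functions described in the commit message above can be illustrated with a short sketch. The Python functions `chop` and `chomp` below are stand-ins written for this example (they emulate Perl's behaviour with the default record separator) and are not part of multi-bleu.perl; the sample lines are made up.

```python
def chop(line: str) -> str:
    """Emulate Perl's chop: always remove the last character."""
    return line[:-1]


def chomp(line: str) -> str:
    """Emulate Perl's chomp: remove one trailing newline only if present."""
    return line[:-1] if line.endswith("\n") else line


# A file whose final line does not end with a newline.
lines = ["the cat sat\n", "on the mat"]

chopped = [chop(line) for line in lines]
chomped = [chomp(line) for line in lines]

print(chopped)  # the last word loses its final character
print(chomped)  # both lines are intact
```

With `chop`, the final token of the last line is silently altered, which is enough to change n-gram matches against the reference; with `chomp`, the line is returned as-is when no newline is present.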
Hieu Hoang
8c5eaa1a12 Merge branch 'RELEASE-4.0' of github.com:jowagner/mosesdecoder 2018-06-25 09:21:50 +01:00
Joachim Wagner
5bbd5ca160
fix syntax error; credit https://www.mail-archive.com/moses-support@mit.edu/msg15226.html 2018-06-23 08:19:36 +01:00
Joachim Wagner
2aa5cd2152
fix syntax error in regular expression 2018-06-22 18:16:11 +01:00
Joachim Wagner
1d675ba956
fix syntax error; credit: https://patchwork.ozlabs.org/patch/735705/ 2018-06-22 16:28:06 +01:00
Hieu Hoang
03578921cc
Merge pull request #198 from tofula/master
EMS: Added missing step for the "TRAINING:build-generation-custom".
2018-05-22 15:08:59 +01:00
Prashant Mathur
b56d97dee0
Merge pull request #199 from mtresearcher/dev-chrf-tuning
Tuning with character FScore
2018-05-21 21:06:06 +02:00
Prashant Mathur
c817980025 Update email 2018-05-18 16:20:47 +02:00
Prashant Mathur
e315438bea Make CHRFscorer compile 2018-05-18 16:18:47 +02:00
Prashant Mathur
fb478bf1db Include chrf as a metric 2018-05-18 16:18:19 +02:00
Prashant Mathur
8b59644945 Adding chrf scorers 2018-05-18 16:16:22 +02:00
Tomas Fulajtar
3a2a63b9dc * Added missing step for the "TRAINING:build-generation-custom".
* Fixed the $cmd parameter - should be "-corpus" instead of "-generation-corpus".
2018-05-18 14:18:11 +02:00
Hieu Hoang
999e83d128
Merge pull request #196 from astronautguo/master
fix bug when copying to cache
2018-05-04 14:42:35 +01:00
Kenneth Heafield
ae47469919 Don't drop last character if file does not end with newline 2018-05-03 10:28:11 +01:00
astro
f47e670f20 fix bug when copying to cache 2018-04-27 19:52:20 -04:00