Matt Post
63c450b401
escape angle brackets
The script doesn't escape angle brackets, which can result in malformed SGML/XML output. This fixes that, although ideally this should be implemented with a proper parser and dumper.
2019-04-26 14:24:07 -04:00
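The commit's Perl change is not shown here. As a minimal sketch of the same idea, assuming a Python pipeline, the standard-library `xml.sax.saxutils.escape` rewrites `<` and `>` as entities before text is wrapped in markup. Note that it also escapes `&`, which goes beyond what the commit message describes; the `<seg>` wrapper and the `escape_segment` helper are hypothetical, for illustration only.

```python
from xml.sax.saxutils import escape


def escape_segment(text: str) -> str:
    """Escape characters that would otherwise break SGML/XML markup.

    Hypothetical helper: escape() replaces &, < and > with entities
    (quotes are left alone unless an extra mapping is passed).
    """
    return escape(text)


line = "if a < b && b > c"
# Wrapping in a <seg> element is an assumption for illustration.
print(f'<seg id="1">{escape_segment(line)}</seg>')
```

Doing the escaping in one place at output time, rather than scattering ad-hoc substitutions, is the lightweight alternative to the "proper parser and dumper" the commit message mentions.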
Hieu Hoang
187a75cb55
Merge pull request #209 from joelb-git/multi-bleu-detok-non-ascii-fix
Fix non-ASCII lowercasing
2019-03-01 23:26:12 +00:00
Joel Barry
fdb7384d3d
Fix non-ASCII lowercasing
2019-02-27 10:17:29 -05:00
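The fix itself is not shown in this log. One common cause of broken non-ASCII lowercasing is lowercasing text that was never decoded from UTF-8, so that only ASCII letters change. The Python sketch below illustrates that failure mode under that assumption; it is an analogy, not the multi-bleu-detok code.

```python
raw = "ÉCOLE Straße".encode("utf-8")

# Lowercasing undecoded bytes only touches ASCII letters A-Z,
# so the UTF-8 encoded 'É' survives unchanged.
bytes_lower = raw.lower().decode("utf-8")

# Decoding first lets Unicode-aware lowercasing handle every letter.
text_lower = raw.decode("utf-8").lower()

print(bytes_lower)
print(text_lower)
```

The first result still starts with an uppercase `É`, while the second is fully lowercased.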
Hieu Hoang
49b388ac79
check state objects are not null before using them. For the alternate weights setting, where some feature functions are not used for a particular sentence
2019-01-17 14:34:55 +00:00
Hieu Hoang
26940e714a
Revert "use ucfirst instead of defined uppercase function"
This reverts commit dfbb17e549.
2019-01-04 14:55:55 +00:00
Hieu Hoang
7bc56b66f2
Merge pull request #207 from alvations/patch-truecaser
Reverting split_xml()
2019-01-03 16:56:56 +00:00
alvations
8fdbc74bbf
Reverting split_xml()
2019-01-03 20:51:27 +08:00
Hieu Hoang
db1894ad24
consistent output
2018-12-30 12:05:57 +00:00
Hieu Hoang
5a600dfe62
Merge pull request #206 from alvations/patch-truecaser
Patching truecaser
2018-12-29 20:21:37 +00:00
Hieu Hoang
4b2872fad8
rename file so it appears on github website. Clarify mailing list
2018-12-28 15:15:09 +00:00
alvations
dfbb17e549
use ucfirst instead of defined uppercase function
2018-12-20 11:57:48 +08:00
alvations
40748e528d
split_xml should be consistent for training and using
2018-12-20 11:53:02 +08:00
Hieu Hoang
413ba6b583
increase cores to 16. For bitextor azure pipeline
2018-12-10 16:17:16 +00:00
Hieu Hoang
dd9ff66479
put fix into UnorderedComparer again. Maybe weird template bug
2018-12-10 13:27:57 +00:00
Hieu Hoang
baefaa1b12
fix weird unordered set error on ubuntu 18.04, gcc 7.3.0, boost 1.65. May be over-optimizing or bug in gcc or boost
2018-12-10 13:15:03 +00:00
Hieu Hoang
20edd331bc
debug
2018-12-10 12:29:58 +00:00
Hieu Hoang
c753350641
ems config for moses2
2018-12-08 19:47:10 +00:00
Hieu Hoang
3d4bf99367
sacre bleu
2018-12-04 15:40:00 +00:00
Hieu Hoang
dbbc47292f
sacre bleu
2018-12-04 15:27:09 +00:00
Hieu Hoang
345dabcde6
use --discount_fallback
2018-12-04 14:34:47 +00:00
Hieu Hoang
1591cf3676
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2018-11-12 14:03:54 +00:00
Hieu Hoang
13e48bc8b4
removing python port. Sacremoses is newer
2018-11-12 14:03:38 +00:00
Hieu Hoang
19a31ca3f1
Merge pull request #205 from coylz/master
Add option "-b" (unbuffer output) to tokenizer scripts
2018-11-10 22:41:52 +00:00
Loïc Vial
4133726ef9
Add option "-b" (unbuffer output) to tokenizer scripts
2018-11-09 22:53:33 +01:00
Hieu Hoang
a2315ffd3a
rename directory to work with python import
2018-11-09 13:01:17 +00:00
Hieu Hoang
a70086c1e6
python wrapper works
2018-11-09 12:58:22 +00:00
Hieu Hoang
2451c46960
start borging Luis Gomes code
2018-11-07 17:12:05 +00:00
Hieu Hoang
2217bc136e
Merge pull request #204 from ozancaglayan/nb-fix
tokenizer.perl: split final dots unconditionally
2018-11-07 14:36:41 +00:00
Ozan Caglayan
9fc964da7f
tokenizer.perl: split final dots unconditionally
Allow tokenization of non-breaking prefixes at the end of sentences. This should
be a fair compromise in many cases for constructing a cleaner vocabulary.
EN-old: So am I.
EN-new: So am I .
DE-old: ... schwer wie ein iPhone 5.
DE-new: ... schwer wie ein iPhone 5 .
FR-old: Des gens admirent une œuvre d' art.
FR-new: Des gens admirent une œuvre d' art .
CS-old: Dvě děti, které běží bez bot.
CS-new: Dvě děti, které běží bez bot .
2018-11-07 10:59:54 +01:00
Barry Haddow
d2b558728f
basic support for Gujarati and Hindi, backported from one of the many upstreams
2018-10-30 14:16:16 +00:00
Hieu Hoang
979dd5a403
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2018-10-26 18:57:07 +02:00
Hieu Hoang
cbee7096bc
bump again
2018-10-26 18:52:27 +02:00
Hieu Hoang
ab161b6f7f
Merge pull request #203 from maxthomas/contrib-modular-boost
contrib: make boost variable modular; update version to 1.68.0
2018-10-26 18:49:27 +02:00
Hieu Hoang
4180b932b1
bump
2018-10-26 18:46:26 +02:00
max thomas
c43a84516c
contrib: make boost variable modular; update version to 1.68.0
2018-10-24 21:51:48 -05:00
Hieu Hoang
4dd747e5db
Merge pull request #202 from thuvh/python3_compatible
fix print to be compatible with python2 and python3
2018-09-27 11:30:54 +01:00
Hoai-Thu Vuong
90c8464c53
fix print to be compatible with python2 and python3
2018-09-26 23:17:19 +07:00
Rico Sennrich
411f45f249
multi-bleu-detok should take raw reference
2018-09-26 12:24:07 +01:00
Hieu Hoang
48fa6e92a9
grammar
2018-09-16 14:58:39 +01:00
Hieu Hoang
fd1758ba74
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2018-09-10 18:30:46 +01:00
Hieu Hoang
e760db2d17
unused script
2018-09-10 18:30:36 +01:00
Barry Haddow
06f519d4e2
Handle glottal stops in Somali
2018-09-06 16:09:36 +01:00
Hieu Hoang
3545225c0b
Merge pull request #201 from louismartin/bleu-fix-newline
[BLEU] Fix multi-bleu.perl bug (no newline at end of file)
2018-07-05 14:02:10 +01:00
Louis MARTIN
53da5f4dbe
Fix multi-bleu.perl bug when file does not end with newline
When reading hypothesis and reference files, multi-bleu.perl uses the
chop function to remove the trailing newline character.
If one of these files happens to not end with a newline, then chop will
remove the last character of the last line (instead of the newline).
This causes the BLEU score to be slightly off from its theoretical
value.
Using the safer chomp function solves this problem, since it only
removes a trailing newline when one is present.
2018-07-03 04:06:09 -06:00
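The difference between Perl's `chop` and `chomp` described above can be mimicked in Python to show the off-by-one-character effect on a file whose last line lacks a newline. The helper names are hypothetical; `perl_chomp` models `chomp` with the default input record separator `"\n"`.

```python
def perl_chop(line: str) -> str:
    """Like Perl's chop: drop the last character, whatever it is."""
    return line[:-1]


def perl_chomp(line: str) -> str:
    """Like Perl's chomp (default $/): drop one trailing newline if present."""
    return line[:-1] if line.endswith("\n") else line


# The last line has no trailing newline, as in the bug report.
lines = ["the cat sat\n", "on the mat"]
chopped = [perl_chop(line) for line in lines]
chomped = [perl_chomp(line) for line in lines]
print(chopped)
print(chomped)
```

With `chop`, the final word loses its last letter ("mat" becomes "ma"), which is exactly the kind of silent damage that shifts the BLEU score.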
Hieu Hoang
8c5eaa1a12
Merge branch 'RELEASE-4.0' of github.com:jowagner/mosesdecoder
2018-06-25 09:21:50 +01:00
Joachim Wagner
5bbd5ca160
fix syntax error; credit https://www.mail-archive.com/moses-support@mit.edu/msg15226.html
2018-06-23 08:19:36 +01:00
Joachim Wagner
2aa5cd2152
fix syntax error in regular expression
2018-06-22 18:16:11 +01:00
Joachim Wagner
1d675ba956
fix syntax error; credit: https://patchwork.ozlabs.org/patch/735705/
2018-06-22 16:28:06 +01:00
Hieu Hoang
03578921cc
Merge pull request #198 from tofula/master
EMS: Added missing step for the "TRAINING:build-generation-custom".
2018-05-22 15:08:59 +01:00
Prashant Mathur
b56d97dee0
Merge pull request #199 from mtresearcher/dev-chrf-tuning
Tuning with character FScore
2018-05-21 21:06:06 +02:00
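The PR's tuning code is not shown in this log. For orientation, a generic character n-gram F-score (chrF-style) can be sketched as below. The defaults (n-gram orders 1 to 6, beta = 2, spaces removed, precision and recall averaged over orders before combining) are assumptions about a common configuration, not necessarily what the Moses implementation uses.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a common chrF convention)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level character n-gram F-score."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue  # one string is too short for this n-gram order
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # beta > 1 weights recall more heavily than precision.
    return (1 + b2) * p * r / (b2 * p + r)


print(chrf("the cat sat", "the hat sat"))
```

Because it matches sub-word units, a score like this gives partial credit for near-miss word forms, which is one motivation for tuning on it rather than on word-level BLEU.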