Hieu Hoang
7bc56b66f2
Merge pull request #207 from alvations/patch-truecaser
Reverting split_xml()
2019-01-03 16:56:56 +00:00
alvations
8fdbc74bbf
Reverting split_xml()
2019-01-03 20:51:27 +08:00
Hieu Hoang
db1894ad24
consistent output
2018-12-30 12:05:57 +00:00
alvations
dfbb17e549
use ucfirst instead of defined uppercase function
2018-12-20 11:57:48 +08:00
alvations
40748e528d
split_xml should be consistent for training and using
2018-12-20 11:53:02 +08:00
Hieu Hoang
413ba6b583
increase cores to 16. For bitextor azure pipeline
2018-12-10 16:17:16 +00:00
Hieu Hoang
c753350641
ems config for moses2
2018-12-08 19:47:10 +00:00
Hieu Hoang
3d4bf99367
sacre bleu
2018-12-04 15:40:00 +00:00
Hieu Hoang
dbbc47292f
sacre bleu
2018-12-04 15:27:09 +00:00
Hieu Hoang
345dabcde6
use --discount_fallback
2018-12-04 14:34:47 +00:00
Hieu Hoang
1591cf3676
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2018-11-12 14:03:54 +00:00
Hieu Hoang
13e48bc8b4
removing python port. Sacremoses is newer
2018-11-12 14:03:38 +00:00
Loïc Vial
4133726ef9
Add option "-b" (unbuffer output) to tokenizer scripts
2018-11-09 22:53:33 +01:00
Hieu Hoang
a2315ffd3a
rename directory to work with python import
2018-11-09 13:01:17 +00:00
Hieu Hoang
a70086c1e6
python wrapper works
2018-11-09 12:58:22 +00:00
Hieu Hoang
2451c46960
start borging Luis Gomes code
2018-11-07 17:12:05 +00:00
Ozan Caglayan
9fc964da7f
tokenizer.perl: split final dots unconditionally
Allow tokenization of non-breaking prefixes at end of sentences. This should
be a fair compromise in many cases to construct a cleaner vocabulary.
EN-old: So am I.
EN-new: So am I .
DE-old: ... schwer wie ein iPhone 5.
DE-new: ... schwer wie ein iPhone 5 .
FR-old: Des gens admirent une œuvre d' art.
FR-new: Des gens admirent une œuvre d' art .
CS-old: Dvě děti, které běží bez bot.
CS-new: Dvě děti, které běží bez bot .
2018-11-07 10:59:54 +01:00
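The final-dot rule described above can be illustrated with a deliberately simplified Python sketch. It is not the regex logic of tokenizer.perl. It assumes whitespace-separated input and a hypothetical `nonbreaking_prefixes` set, and it ignores the script's other conditions (for example, a following lowercase word or numeric-only prefixes). The point is only that the last word's trailing dot is now split off even when the prefix would otherwise be non-breaking.

```python
def tokenize_dots(sentence: str, nonbreaking_prefixes: set[str]) -> str:
    """Hypothetical, simplified model of trailing-dot splitting.

    A dot is split from a word unless the prefix is non-breaking,
    except on the final word, where it is always split.
    """
    words = sentence.split()
    out = []
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        if word.endswith(".") and len(word) > 1:
            prefix = word[:-1]
            # Final word: split unconditionally (the behaviour this commit adds).
            if is_last or prefix not in nonbreaking_prefixes:
                out.append(prefix + " .")
                continue
        out.append(word)
    return " ".join(out)


print(tokenize_dots("So am I.", {"I"}))             # So am I .
print(tokenize_dots("Mr. Smith arrived.", {"Mr"}))  # Mr. Smith arrived .
```

Note that a non-breaking prefix such as "Mr" still keeps its dot in mid-sentence; only the sentence-final position changes.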
Barry Haddow
d2b558728f
basic support for Gujarati and Hindi, backported from one of the many upstreams
2018-10-30 14:16:16 +00:00
Rico Sennrich
411f45f249
multi-bleu-detok should take raw reference
2018-09-26 12:24:07 +01:00
Hieu Hoang
48fa6e92a9
grammar
2018-09-16 14:58:39 +01:00
Hieu Hoang
fd1758ba74
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2018-09-10 18:30:46 +01:00
Hieu Hoang
e760db2d17
unused script
2018-09-10 18:30:36 +01:00
Barry Haddow
06f519d4e2
Handle glottal stops in Somalian
2018-09-06 16:09:36 +01:00
Louis MARTIN
53da5f4dbe
Fix multi-bleu.perl bug when file does not end with newline
When reading hypothesis and reference files, multi-bleu.perl uses the
chop function to remove the trailing newline character.
If one of these files happens to not end with a newline, then chop will
remove the last character of the last line (instead of the newline).
This causes the BLEU score to be slightly off from its theoretical
value.
Using the safer chomp function solves this problem, since it only
removes a newline when one is present.
2018-07-03 04:06:09 -06:00
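The chop/chomp difference described above can be reproduced in a short Python sketch. The helpers `perl_chop` and `perl_chomp` are hypothetical stand-ins that mimic Perl's built-ins (assuming the default input record separator `\n`); they are not code from multi-bleu.perl.

```python
def perl_chop(line: str) -> str:
    """Mimic Perl's chop: unconditionally drop the last character."""
    return line[:-1]


def perl_chomp(line: str) -> str:
    """Mimic Perl's chomp with $/ = "\\n": drop one trailing newline if present."""
    return line[:-1] if line.endswith("\n") else line


# A hypothesis file whose last line has no trailing newline.
text = "the cat sat\non the mat"
lines = text.splitlines(keepends=True)

print([perl_chop(l) for l in lines])   # ['the cat sat', 'on the ma']
print([perl_chomp(l) for l in lines])  # ['the cat sat', 'on the mat']
```

With chop, the final token "mat" becomes "ma", so every n-gram containing it stops matching the reference and the BLEU score drifts slightly.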
Joachim Wagner
5bbd5ca160
fix syntax error; credit https://www.mail-archive.com/moses-support@mit.edu/msg15226.html
2018-06-23 08:19:36 +01:00
Joachim Wagner
2aa5cd2152
fix syntax error in regular expression
2018-06-22 18:16:11 +01:00
Tomas Fulajtar
3a2a63b9dc
* Added missing step for the "TRAINING:build-generation-custom".
* Fixed the $cmd parameter - should be "-corpus" instead of "-generation-corpus".
2018-05-18 14:18:11 +02:00
Hieu Hoang
999e83d128
Merge pull request #196 from astronautguo/master
fix bug when copying to cache
2018-05-04 14:42:35 +01:00
Kenneth Heafield
ae47469919
Don't drop last character if file does not end with newline
2018-05-03 10:28:11 +01:00
astro
f47e670f20
fix bug when copying to cache
2018-04-27 19:52:20 -04:00
alvations
686034488a
Contributing MosesTokenizer from NLTK to Moses
2018-04-11 00:27:37 +08:00
Scherrer Yves
4a7f16b366
add fi/sv-specific colon handling in tokenizer.perl
2018-02-14 10:27:46 +02:00
alvations
194964c017
Korean words have spaces =)
2018-01-19 13:29:53 +08:00
Hieu Hoang
3a0631a05b
better default
2017-12-12 15:30:56 +00:00
Tomas Fulajtar
5b9a6da9a4
The .gz extension should be also added for 'On Disk' and 'Probing' Phrase tables.
2017-11-28 10:29:58 +01:00
Rico Sennrich
7e9108dd29
multi-bleu-detok.perl - a plain text alternative to mteval-v13a.perl
2017-10-20 10:08:22 +01:00
Hieu Hoang
05a37d218e
wording change
2017-10-19 23:31:56 +01:00
Kenneth Heafield
545eee7e75
Attempt to stop people from publishing non-comparable BLEU scores, as discussed in statmt meeting
2017-10-19 22:57:36 +01:00
Jörg Tiedemann
23cf6c4d1f
new option for mert-moses: transform-decoded-file
2017-08-15 17:11:46 +03:00
Hieu Hoang
8aa8988320
executable
2017-07-28 22:58:54 +01:00
Hieu Hoang
b8de7c3528
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2017-05-02 10:59:02 +01:00
Hieu Hoang
101e52da60
check for executables before running
2017-05-02 10:57:00 +01:00
Hieu Hoang
b199e654df
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2017-04-27 13:48:28 +01:00
Hieu Hoang
2ea75d91dc
add new mteval script
2017-04-27 13:48:18 +01:00
Rico Sennrich
ae476ae531
fix rdlm training - extra-settings was missing
2017-04-24 15:30:17 +01:00
Rico Sennrich
61f5b49dee
fix rdlm training - train_host option was missing
2017-04-24 13:29:58 +01:00
Rico Sennrich
b99af32113
fix split-input if it is passed, but if output-splitter is defined
2017-04-24 12:16:36 +01:00
alvations
793e64b7d5
removed redundant subdirectory in path
2017-04-12 10:15:18 +08:00
alvations
66cbf46e27
use static path to compile.sh
2017-04-12 10:03:25 +08:00
alvations
9f246cef89
added Dockerfiles for Moses
2017-04-12 09:52:31 +08:00