titsuki
490dc3996a
Enable use strict pragma
2019-09-23 15:40:13 +09:00
Hieu Hoang
fd06cdf026
Merge pull request #212 from moses-smt/alvations-patch-regexes
The dot before an acronym should be optional.
2019-09-04 08:07:06 +01:00
alvations
0578892581
The dot before an acronym should be optional.
2019-09-04 14:16:41 +08:00
Hieu Hoang
9f08d77b0d
Merge pull request #211 from achimr/master
Support for Urdu in sentence splitter
2019-08-21 22:05:45 +01:00
Achim Ruopp
7ad5ffa0c0
Support for Urdu in sentence splitter
2019-07-10 10:48:32 -04:00
Hieu Hoang
158d252389
tweak readme
2019-06-08 18:22:39 +01:00
Hieu Hoang
c0545019eb
Merge pull request #210 from mjpost/patch-1
escape angle brackets
2019-04-27 21:23:50 +01:00
Matt Post
63c450b401
escape angle brackets
The script doesn't escape angle brackets which can result in bad SGML / XML output. This fixes that, although ideally, this should be implemented with a proper parser and dumper.
2019-04-26 14:24:07 -04:00
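
The commit above describes escaping angle brackets so that the emitted SGML / XML stays well-formed. A minimal Python sketch of that idea follows. It is not the patched script's code (the log does not name the script): the function name `escape_segment` is hypothetical, and the sketch relies on the standard-library helper `xml.sax.saxutils.escape`, which replaces `&`, `<` and `>`.

```python
from xml.sax.saxutils import escape


def escape_segment(text: str) -> str:
    """Escape '&', '<' and '>' so the text can sit inside an XML/SGML element.

    Hypothetical helper for illustration only; the script patched in the
    commit is not reproduced here.
    """
    # escape() handles '&' first, so already-produced entities are not
    # double-escaped within a single call.
    return escape(text)


if __name__ == "__main__":
    print(escape_segment("a < b & c > d"))  # a &lt; b &amp; c &gt; d
```

As the commit message notes, a proper parser and serializer would be the more robust choice; string-level escaping like this only covers text content, not attribute values or existing markup.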
Hieu Hoang
187a75cb55
Merge pull request #209 from joelb-git/multi-bleu-detok-non-ascii-fix
Fix non-ASCII lowercasing
2019-03-01 23:26:12 +00:00
Joel Barry
fdb7384d3d
Fix non-ASCII lowercasing
2019-02-27 10:17:29 -05:00
Hieu Hoang
49b388ac79
check state object are not null before using it. For alternate weights setting where some feature functions are not used for a particular sentence
2019-01-17 14:34:55 +00:00
Hieu Hoang
26940e714a
Revert "use ucfirst instead of defined uppercase function"
This reverts commit dfbb17e549.
2019-01-04 14:55:55 +00:00
Hieu Hoang
7bc56b66f2
Merge pull request #207 from alvations/patch-truecaser
Reverting split_xml()
2019-01-03 16:56:56 +00:00
alvations
8fdbc74bbf
Reverting split_xml()
2019-01-03 20:51:27 +08:00
Hieu Hoang
db1894ad24
consistent output
2018-12-30 12:05:57 +00:00
Hieu Hoang
5a600dfe62
Merge pull request #206 from alvations/patch-truecaser
Patching truecaser
2018-12-29 20:21:37 +00:00
Hieu Hoang
4b2872fad8
rename file so it appears on github website. Clarify mailing list
2018-12-28 15:15:09 +00:00
alvations
dfbb17e549
use ucfirst instead of defined uppercase function
2018-12-20 11:57:48 +08:00
alvations
40748e528d
split_xml should be consistent for training and using
2018-12-20 11:53:02 +08:00
Hieu Hoang
413ba6b583
increase cores to 16. For bitextor azure pipeline
2018-12-10 16:17:16 +00:00
Hieu Hoang
dd9ff66479
put fix into UnorderedComparer again. Maybe weird template bug
2018-12-10 13:27:57 +00:00
Hieu Hoang
baefaa1b12
fix weird unordered set error on ubuntu 18.04, gcc 7.3.0, boost 1.65. May be over-optimizing or bug in gcc or boost
2018-12-10 13:15:03 +00:00
Hieu Hoang
20edd331bc
debug
2018-12-10 12:29:58 +00:00
Hieu Hoang
c753350641
ems config for moses2
2018-12-08 19:47:10 +00:00
Hieu Hoang
3d4bf99367
sacre bleu
2018-12-04 15:40:00 +00:00
Hieu Hoang
dbbc47292f
sacre bleu
2018-12-04 15:27:09 +00:00
Hieu Hoang
345dabcde6
use --discount_fallback
2018-12-04 14:34:47 +00:00
Hieu Hoang
1591cf3676
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2018-11-12 14:03:54 +00:00
Hieu Hoang
13e48bc8b4
removing python port. Sacremoses is newer
2018-11-12 14:03:38 +00:00
Hieu Hoang
19a31ca3f1
Merge pull request #205 from coylz/master
Add option "-b" (unbuffer output) to tokenizer scripts
2018-11-10 22:41:52 +00:00
Loïc Vial
4133726ef9
Add option "-b" (unbuffer output) to tokenizer scripts
2018-11-09 22:53:33 +01:00
Hieu Hoang
a2315ffd3a
rename directory to work with python import
2018-11-09 13:01:17 +00:00
Hieu Hoang
a70086c1e6
python wrapper works
2018-11-09 12:58:22 +00:00
Hieu Hoang
2451c46960
start borging Luis Gomes code
2018-11-07 17:12:05 +00:00
Hieu Hoang
2217bc136e
Merge pull request #204 from ozancaglayan/nb-fix
tokenizer.perl: split final dots unconditionally
2018-11-07 14:36:41 +00:00
Ozan Caglayan
9fc964da7f
tokenizer.perl: split final dots unconditionally
Allow tokenization of non-breaking prefixes at end of sentences. This should
be a fair compromise in many cases to construct a cleaner vocabulary.
EN-old: So am I.
EN-new: So am I .
DE-old: ... schwer wie ein iPhone 5.
DE-new: ... schwer wie ein iPhone 5 .
FR-old: Des gens admirent une œuvre d' art.
FR-new: Des gens admirent une œuvre d' art .
CS-old: Dvě děti, které běží bez bot.
CS-new: Dvě děti, které běží bez bot .
2018-11-07 10:59:54 +01:00
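
The old/new pairs in the commit above can be reproduced with a short Python sketch. This is an illustration of the behaviour shown in the examples, not the Perl code from tokenizer.perl: the function name `split_final_dot` is hypothetical, and the rule is an assumption inferred from the examples (detach a single sentence-final period from the last token, and leave a trailing run of periods such as an ellipsis untouched).

```python
import re

# A single final '.' that directly follows a character which is neither
# whitespace nor another '.'; this leaves "..." and an already detached
# " ." alone.
_FINAL_DOT = re.compile(r"(?<=[^\s.])\.$")


def split_final_dot(line: str) -> str:
    """Detach a sentence-final period from the last token of a line.

    Hypothetical helper inferred from the commit's examples; it ignores
    the non-breaking-prefix lists that the real tokenizer consults.
    """
    return _FINAL_DOT.sub(" .", line.rstrip())


if __name__ == "__main__":
    for sentence in (
        "So am I.",
        "... schwer wie ein iPhone 5.",
        "Des gens admirent une œuvre d' art.",
        "Dvě děti, které běží bez bot.",
    ):
        print(split_final_dot(sentence))
```

Because the pattern is anchored at the end of the line, periods inside the sentence (abbreviations, numbers, the leading "..." in the German example) are not affected.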
Barry Haddow
d2b558728f
basic support for Gujarati and Hindi, backported from one of the many upstreams
2018-10-30 14:16:16 +00:00
Hieu Hoang
979dd5a403
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2018-10-26 18:57:07 +02:00
Hieu Hoang
cbee7096bc
bump again
2018-10-26 18:52:27 +02:00
Hieu Hoang
ab161b6f7f
Merge pull request #203 from maxthomas/contrib-modular-boost
contrib: make boost variable modular; update version to 1.68.0
2018-10-26 18:49:27 +02:00
Hieu Hoang
4180b932b1
bump
2018-10-26 18:46:26 +02:00
max thomas
c43a84516c
contrib: make boost variable modular; update version to 1.68.0
2018-10-24 21:51:48 -05:00
Hieu Hoang
4dd747e5db
Merge pull request #202 from thuvh/python3_compatible
fix print to compatible with python2 and python3
2018-09-27 11:30:54 +01:00
Hoai-Thu Vuong
90c8464c53
fix print to compatible with python2 and python3
2018-09-26 23:17:19 +07:00
Rico Sennrich
411f45f249
multi-bleu-detok should take raw reference
2018-09-26 12:24:07 +01:00
Hieu Hoang
48fa6e92a9
grammar
2018-09-16 14:58:39 +01:00
Hieu Hoang
fd1758ba74
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2018-09-10 18:30:46 +01:00
Hieu Hoang
e760db2d17
unused script
2018-09-10 18:30:36 +01:00
Barry Haddow
06f519d4e2
Handle glottal stops in Somalian
2018-09-06 16:09:36 +01:00
Hieu Hoang
3545225c0b
Merge pull request #201 from louismartin/bleu-fix-newline
[BLEU] Fix multi-bleu.perl bug (no newline at end of file)
2018-07-05 14:02:10 +01:00