Commit Graph

17288 Commits

Author SHA1 Message Date
Hieu Hoang
b21b071a66
Merge pull request #213 from titsuki/enable-strict
Enable use strict pragma
2019-09-25 01:04:13 +01:00
titsuki
490dc3996a Enable use strict pragma 2019-09-23 15:40:13 +09:00
Hieu Hoang
fd06cdf026
Merge pull request #212 from moses-smt/alvations-patch-regexes
The dot before an acronym should be optional.
2019-09-04 08:07:06 +01:00
alvations
0578892581 The dot before an acronym should be optional. 2019-09-04 14:16:41 +08:00
Hieu Hoang
9f08d77b0d
Merge pull request #211 from achimr/master
Support for Urdu in sentence splitter
2019-08-21 22:05:45 +01:00
Achim Ruopp
7ad5ffa0c0 Support for Urdu in sentence splitter 2019-07-10 10:48:32 -04:00
Hieu Hoang
158d252389 tweak readme 2019-06-08 18:22:39 +01:00
Hieu Hoang
c0545019eb
Merge pull request #210 from mjpost/patch-1
escape angle brackets
2019-04-27 21:23:50 +01:00
Matt Post
63c450b401
escape angle brackets
The script doesn't escape angle brackets which can result in bad SGML / XML output. This fixes that, although ideally, this should be implemented with a proper parser and dumper.
2019-04-26 14:24:07 -04:00
Hieu Hoang
187a75cb55
Merge pull request #209 from joelb-git/multi-bleu-detok-non-ascii-fix
Fix non-ASCII lowercasing
2019-03-01 23:26:12 +00:00
Joel Barry
fdb7384d3d Fix non-ASCII lowercasing 2019-02-27 10:17:29 -05:00
Hieu Hoang
49b388ac79 check state object are not null before using it. For alternate weights setting where some feature functions are not used for a particular sentence 2019-01-17 14:34:55 +00:00
Hieu Hoang
26940e714a Revert "use ucfirst instead of defined uppercase function"
This reverts commit dfbb17e549.
2019-01-04 14:55:55 +00:00
Hieu Hoang
7bc56b66f2
Merge pull request #207 from alvations/patch-truecaser
Reverting split_xml()
2019-01-03 16:56:56 +00:00
alvations
8fdbc74bbf Reverting split_xml() 2019-01-03 20:51:27 +08:00
Hieu Hoang
db1894ad24 consistent output 2018-12-30 12:05:57 +00:00
Hieu Hoang
5a600dfe62
Merge pull request #206 from alvations/patch-truecaser
Patching truecaser
2018-12-29 20:21:37 +00:00
Hieu Hoang
4b2872fad8 rename file so it appears on github website. Clarify mailing list 2018-12-28 15:15:09 +00:00
alvations
dfbb17e549 use ucfirst instead of defined uppercase function 2018-12-20 11:57:48 +08:00
alvations
40748e528d split_xml should be consistent for training and using 2018-12-20 11:53:02 +08:00
Hieu Hoang
413ba6b583 increase cores to 16. For bitextor azure pipeline 2018-12-10 16:17:16 +00:00
Hieu Hoang
dd9ff66479 put fix into UnorderedComparer again. Maybe weird template bug 2018-12-10 13:27:57 +00:00
Hieu Hoang
baefaa1b12 fix weird unordered set error on ubuntu 18.04, gcc 7.3.0, boost 1.65. May be over-optimizing or bug in gcc or boost 2018-12-10 13:15:03 +00:00
Hieu Hoang
20edd331bc debug 2018-12-10 12:29:58 +00:00
Hieu Hoang
c753350641 ems config for moses2 2018-12-08 19:47:10 +00:00
Hieu Hoang
3d4bf99367 sacre bleu 2018-12-04 15:40:00 +00:00
Hieu Hoang
dbbc47292f sacre bleu 2018-12-04 15:27:09 +00:00
Hieu Hoang
345dabcde6 use --discount_fallback 2018-12-04 14:34:47 +00:00
Hieu Hoang
1591cf3676 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2018-11-12 14:03:54 +00:00
Hieu Hoang
13e48bc8b4 removing python port. Sacremoses is newer 2018-11-12 14:03:38 +00:00
Hieu Hoang
19a31ca3f1
Merge pull request #205 from coylz/master
Add option "-b" (unbuffer output) to tokenizer scripts
2018-11-10 22:41:52 +00:00
Loïc Vial
4133726ef9 Add option "-b" (unbuffer output) to tokenizer scripts 2018-11-09 22:53:33 +01:00
Hieu Hoang
a2315ffd3a rename directory to work with python import 2018-11-09 13:01:17 +00:00
Hieu Hoang
a70086c1e6 python wrapper works 2018-11-09 12:58:22 +00:00
Hieu Hoang
2451c46960 start borging Luis Gomes code 2018-11-07 17:12:05 +00:00
Hieu Hoang
2217bc136e
Merge pull request #204 from ozancaglayan/nb-fix
tokenizer.perl: split final dots unconditionally
2018-11-07 14:36:41 +00:00
Ozan Caglayan
9fc964da7f tokenizer.perl: split final dots unconditionally
Allow tokenization of non-breaking prefixes at end of sentences. This should
be a fair compromise in many cases to construct a cleaner vocabulary.

EN-old: So am I.
EN-new: So am I .

DE-old: ... schwer wie ein iPhone 5.
DE-new: ... schwer wie ein iPhone 5 .

FR-old: Des gens admirent une œuvre d' art.
FR-new: Des gens admirent une œuvre d' art .

CS-old: Dvě děti, které běží bez bot.
CS-new: Dvě děti, které běží bez bot .
2018-11-07 10:59:54 +01:00
Barry Haddow
d2b558728f basic support for Gujarati and Hindi, backported from one of the many upstreams 2018-10-30 14:16:16 +00:00
Hieu Hoang
979dd5a403 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2018-10-26 18:57:07 +02:00
Hieu Hoang
cbee7096bc bump again 2018-10-26 18:52:27 +02:00
Hieu Hoang
ab161b6f7f
Merge pull request #203 from maxthomas/contrib-modular-boost
contrib: make boost variable modular; update version to 1.68.0
2018-10-26 18:49:27 +02:00
Hieu Hoang
4180b932b1 bump 2018-10-26 18:46:26 +02:00
max thomas
c43a84516c contrib: make boost variable modular; update version to 1.68.0 2018-10-24 21:51:48 -05:00
Hieu Hoang
4dd747e5db
Merge pull request #202 from thuvh/python3_compatible
fix print to compatible with python2 and python3
2018-09-27 11:30:54 +01:00
Hoai-Thu Vuong
90c8464c53 fix print to compatible with python2 and python3 2018-09-26 23:17:19 +07:00
Rico Sennrich
411f45f249 multi-bleu-detok should take raw reference 2018-09-26 12:24:07 +01:00
Hieu Hoang
48fa6e92a9 grammar 2018-09-16 14:58:39 +01:00
Hieu Hoang
fd1758ba74 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2018-09-10 18:30:46 +01:00
Hieu Hoang
e760db2d17 unused script 2018-09-10 18:30:36 +01:00
Barry Haddow
06f519d4e2 Handle glottal stops in Somalian 2018-09-06 16:09:36 +01:00