Commit Graph

1032 Commits

Author SHA1 Message Date
Hieu Hoang
e27f6b0120 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-11-15 14:32:49 +00:00
Hieu Hoang
67ad197d5a take out PYTHONIOENCODING=utf-8. Rely on Rico's python changes 2014-11-15 14:32:31 +00:00
Rico Sennrich
b0b5eef0c6 fix metric interpolation with mert 2014-11-14 14:35:32 +00:00
Hieu Hoang
acd3ac964a set PYTHONIOENCODING=utf-8 before running merge_alignment.py 2014-11-14 14:34:31 +00:00
Phil Williams
5240c430ce Merge s2t branch
This adds a new string-to-tree decoder, which can be enabled with the -s2t
option.  It's intended to be faster and simpler than the generic chart
decoder, and is designed to support lattice input (still WIP).  For an en-de
system trained on WMT14 data, it's approximately 40% faster in practice.

For background information, see the decoding section of the EMNLP tutorial
on syntax-based MT:

  http://www.emnlp2014.org/tutorials/5_notes.pdf

Some features are not implemented yet, including support for internal tree
structure and soft source-syntactic constraints.
2014-11-04 13:13:56 +00:00
Hieu Hoang
834a89d96b utf8 encoding /Tomas Fulajtar 2014-10-24 07:33:48 -07:00
Hieu Hoang
6c9c3e1741 portable call to bash /Paul Guyot 2014-10-14 16:01:15 +01:00
Philipp Koehn
34cc9461fb More Penn Tree Bank compliance (code by Maria Nadejde and Philip Williams) 2014-10-10 16:51:32 +01:00
Rico Sennrich
f63807f957 more robust regex 2014-09-30 15:43:38 +01:00
Rico Sennrich
84ad576750 explicitly set BLEU as default scorer (for return-best-dev)
(evaluator doesn't accept --scconfig without --sctype)
2014-09-24 14:47:58 +01:00
Rico Sennrich
59cd4be2c9 don't use optimizer-specific options in extractor/evaluator 2014-09-22 10:49:20 +01:00
Rico Sennrich
d39cbca0b9 (optionally) use n-best file for evaluator/return-best-dev
this adds support for metrics that rely on alignment / trees
2014-09-22 10:49:20 +01:00
Rico Sennrich
3d00e5dc8c basic support for more metrics with kbmira
metrics need getReferenceLength (for background smoothing) to work with kbmira
2014-09-22 10:49:20 +01:00
Philipp Koehn
ab90efe4af allow specification of default weights 2014-09-22 05:28:57 +01:00
Philipp Koehn
e9db2fe4aa Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2014-09-21 06:04:22 +01:00
Philipp Koehn
3740c9f248 bug fix mmsapt training 2014-09-21 06:02:35 +01:00
Rico Sennrich
0861b464c5 use square brackets with output format '--brackets' (for cleaner escaping and consistency with decoder tree output) 2014-09-15 14:37:52 +01:00
Rico Sennrich
4da0ffc926 use configured scorer (and not always BLEU) for --return-best-dev 2014-09-12 19:17:23 +02:00
Michael Denkowski
057066ea0e Minor fixes for simulated post-editing with mert-moses.pl 2014-08-13 15:58:51 -04:00
Hieu Hoang
94c44c03d5 merge 2014-08-13 18:03:05 +01:00
Matthias Huck
c27cbf55ea source labels: integration into EMS 2014-08-07 21:02:51 +01:00
Michael Denkowski
e7c36ee804 Simulated post-editing merge: XML update, parallel SPE script, MERT 2014-08-05 14:20:00 -04:00
Matthias Huck
3a5dee12e8 implementation of phrase orientation in GHKM extraction
(...but a corresponding feature function for the chart-based decoder has not been written yet)
2014-07-28 18:27:12 +01:00
Barry Haddow
bfb5cca518 Merge branch 'master' of github.com:moses-smt/mosesdecoder
Conflicts:
	util/read_compressed.cc
2014-07-23 09:40:55 +01:00
Philipp Koehn
55ae15a6f8 integration of Uli Germann's memory mapped suffix array phrase table into EMS 2014-07-22 10:12:14 -04:00
Barry Haddow
2a611194a2 reinstate new kbmira args 2014-07-21 11:43:37 +01:00
Hieu Hoang
a3bd695cd4 factor for oov is 0, not <unk> - interferes with source input. Add extra argument to lowercase input words or not 2014-07-13 02:54:58 +01:00
Hieu Hoang
da84cce8c4 Merge github.com:moses-smt/mosesdecoder into hieu 2014-06-09 16:20:29 +01:00
Rico Sennrich
169c3fce38 convert CoNLL-X to Moses XML format 2014-06-09 15:24:41 +01:00
Hieu Hoang
091ce3f016 Merge ../mosesdecoder into hieu 2014-06-06 17:25:26 +01:00
Philipp Koehn
fc8e588f25 kbmira bug fix & factor handling 2014-06-06 14:20:57 +01:00
phikoehn
ac7670c5e7 minor bugs with factors 2014-06-06 14:14:35 +01:00
Hieu Hoang
b589c3d5c2 Merge ../mosesdecoder into hieu 2014-06-06 11:11:47 +01:00
phikoehn
7fc3ccd968 Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder 2014-06-05 21:37:20 +01:00
Philipp Koehn
243004bda6 utf8 compatible 2014-06-05 21:36:18 +01:00
phikoehn
ceadacd3af Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder 2014-06-05 21:33:35 +01:00
Philipp Koehn
15288213be allow < > in factors 2014-06-05 21:31:09 +01:00
Hieu Hoang
a17a45fa7f span length 2014-06-05 17:20:38 +01:00
Hieu Hoang
ce2a69ba25 Merge ../mosesdecoder into hieu 2014-06-05 17:18:26 +01:00
Kenneth Heafield
d82bd475a2 Nadir Durrani asked me to add this script 2014-06-04 11:27:36 -07:00
Hieu Hoang
8065bf7467 add span length as a score option to train-model.perl 2014-06-04 18:06:06 +01:00
Hieu Hoang
a270811d84 Merge ../mosesdecoder into hieu 2014-05-30 09:31:14 +01:00
Barry Haddow
00b1d83841 Remove debug flag 2014-05-27 08:55:05 +01:00
Barry Haddow
66b5d2f3fd parse hgmira output correctly 2014-05-26 11:03:28 +01:00
Barry Haddow
6c31fbb2a4 Support for hypergraph mira 2014-05-22 21:20:14 +01:00
Hieu Hoang
def22cef44 Merge branch 'hieu' of github.com:hieuhoang/mosesdecoder into hieu 2014-04-18 18:11:56 +01:00
Hieu Hoang
8d69327eb1 merge 2014-04-17 20:22:40 +01:00
Nadir Durrani
5e3e50d4ec In-Decoding Transliteration Module 2014-04-16 17:28:49 +01:00
Hieu Hoang
f2d3052627 exit on error 2014-04-04 15:30:48 +01:00
Hieu Hoang
9d09b4a6e6 bug in converting chunker output to xml. Didn't handle chunks that crossed sentence boundaries properly 2014-04-04 13:45:18 +01:00