Commit Graph

1818 Commits

Author SHA1 Message Date
Rico Sennrich
4ca730a67c improve bilingualLM alignment heuristics consistency 2014-11-26 10:32:41 +00:00
Rico Sennrich
ee759bfede move bilingual-lm training scripts 2014-11-26 10:32:37 +00:00
Tomáš Musil
4cb81e3093 lmtype now preferred as symbolic name 2014-11-24 12:20:36 +01:00
Hieu Hoang
c0be182bfa makemteval and small change to tokenizer. /Tom Hoar and Tomas Fulajtar 2014-11-21 13:55:13 +00:00
XapaJIaMnu
52c520c042 Resolve merge conflicts 2014-11-20 15:50:32 +00:00
Hieu Hoang
e27f6b0120 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-11-15 14:32:49 +00:00
Hieu Hoang
67ad197d5a take out PYTHONIOENCODING=utf-8. Rely on Rico's python changes 2014-11-15 14:32:31 +00:00
XapaJIaMnu
a343837095 Add option to choose activation function during nplm training 2014-11-15 11:54:47 +00:00
Rico Sennrich
b0b5eef0c6 fix metric interpolation with mert 2014-11-14 14:35:32 +00:00
Hieu Hoang
acd3ac964a set PYTHONIOENCODING=utf-8 before running merge_alignment.py 2014-11-14 14:34:31 +00:00
Hieu Hoang
1c27e05a06 softlink for moses_chart 2014-11-14 13:56:56 +00:00
XapaJIaMnu
d5567b6cfb Training: Do the preparation step ourselves. No validation support yet. No decoder support yet. 2014-11-13 16:14:17 +00:00
Rico Sennrich
8fd3be9e4e add EOS token </s> to each sentence 2014-11-13 16:14:16 +00:00
Rico Sennrich
f26fc251d5 sort vocab by frequency 2014-11-13 16:14:16 +00:00
XapaJIaMnu
bb70f60f67 grrr 2014-11-13 16:14:16 +00:00
XapaJIaMnu
e330ab35d5 Short option must be only one letter 2014-11-13 16:14:16 +00:00
XapaJIaMnu
a74105ea7d Fix a wrong condition 2014-11-13 16:14:16 +00:00
XapaJIaMnu
e54c171850 Make it optional to prepare the validation set 2014-11-13 16:14:16 +00:00
XapaJIaMnu
a300824bd1 Add optional validation during training 2014-11-13 16:14:16 +00:00
XapaJIaMnu
0451142ece Add null token normalization for models to be used with the chart decoder. 2014-11-13 16:13:38 +00:00
XapaJIaMnu
aae894fe6b Add null token in vocabulary during construction 2014-11-13 16:13:38 +00:00
XapaJIaMnu
b4f51c05d1 Add option to reduce the ngrams from already prepared .ngrams file to train a model with smaller number of ngrams 2014-11-13 16:13:38 +00:00
XapaJIaMnu
fbac0ae418 Make sure we always have unk in the vocabulary, otherwise we get off-by-one indexes during decoding 2014-11-13 15:51:48 +00:00
XapaJIaMnu
961578286f Forgot to close a file... 2014-11-13 15:51:48 +00:00
XapaJIaMnu
1bac666e5f Fix small oversights 2014-11-13 15:51:48 +00:00
XapaJIaMnu
617ef015df Extend train_nplm with various options 2014-11-13 15:51:48 +00:00
Nikolay Bogoychev
2b2766cce8 For GPU training one thread is optimal 2014-11-13 15:51:48 +00:00
Abmayne
4af68a0d1a Barry's training scripts with some minor changes by me 2014-11-13 15:51:48 +00:00
Phil Williams
59a1ce7380 substitute-filtered-tables.perl: check for RuleTable feature 2014-11-06 11:14:51 +00:00
Phil Williams
5240c430ce Merge s2t branch
This adds a new string-to-tree decoder, which can be enabled with the -s2t
option.  It's intended to be faster and simpler than the generic chart
decoder, and is designed to support lattice input (still WIP).  For an en-de
system trained on WMT14 data, it's approximately 40% faster in practice.

For background information, see the decoding section of the EMNLP tutorial
on syntax-based MT:

  http://www.emnlp2014.org/tutorials/5_notes.pdf

Some features are not implemented yet, including support for internal tree
structure and soft source-syntactic constraints.
2014-11-04 13:13:56 +00:00
mjdenkowski
40e8f2eca0 Hypergraph output 2014-11-03 09:16:12 -05:00
Hieu Hoang
7ca5e4fbc8 blame stats! 2014-10-31 01:07:33 +00:00
Hieu Hoang
834a89d96b utf8 encoding /Tomas Fulajtar 2014-10-24 07:33:48 -07:00
Rico Sennrich
df74aa3e89 use short names for sparse features to save disk space and I/O when tuning 2014-10-17 10:36:51 +01:00
Hieu Hoang
44ce4b361a reduce lmplz memory consumption in recaser 2014-10-14 17:52:47 +01:00
Hieu Hoang
fe266260fb Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-10-14 16:01:26 +01:00
Hieu Hoang
6c9c3e1741 portable call to bash /Paul Guyot 2014-10-14 16:01:15 +01:00
Philipp Koehn
2638ff0480 added thot to EMS 2014-10-14 10:13:16 -04:00
Phil Williams
07dbd191ed analysis.perl: update regexp for current trace format 2014-10-13 10:55:07 +01:00
mjdenkowski
a1f561ac31 Only update dynamic models 2014-10-10 15:09:53 -04:00
Philipp Koehn
34cc9461fb More Penn Tree Bank compliance (code by Maria Nadejde and Philip Williams) 2014-10-10 16:51:32 +01:00
Philipp Koehn
1741bba750 Penn Tree Bank compliant versions of preprocessing 2014-10-10 16:49:06 +01:00
Rico Sennrich
f63807f957 more robust regex 2014-09-30 15:43:38 +01:00
Rico Sennrich
84ad576750 explicitly set BLEU as default scorer (for return-best-dev)
(evaluator doesn't accept --scconfig without --sctype)
2014-09-24 14:47:58 +01:00
Hieu Hoang
610090c2ed don't run truecase trainer unless it's asked for 2014-09-23 21:50:53 +01:00
Rico Sennrich
59cd4be2c9 don't use optimizer-specific options in extractor/evaluator 2014-09-22 10:49:20 +01:00
Rico Sennrich
d39cbca0b9 (optionally) use n-best file for evaluator/return-best-dev
this adds support for metrics that rely on alignment / trees
2014-09-22 10:49:20 +01:00
Rico Sennrich
3d00e5dc8c basic support for more metrics with kbmira
metrics need getReferenceLength (for background smoothing) to work with kbmira
2014-09-22 10:49:20 +01:00
Philipp Koehn
ab90efe4af allow specification of default weights 2014-09-22 05:28:57 +01:00
Philipp Koehn
e9db2fe4aa Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2014-09-21 06:04:22 +01:00