Rico Sennrich
4ca730a67c
improve bilingualLM alignment heuristics consistency
2014-11-26 10:32:41 +00:00
Rico Sennrich
ee759bfede
move bilingual-lm training scripts
2014-11-26 10:32:37 +00:00
Tomáš Musil
4cb81e3093
lmtype now preferred as symbolic name
2014-11-24 12:20:36 +01:00
Hieu Hoang
c0be182bfa
makemteval and small change to tokenizer. /Tom Hoar and Tomas Fulajtar
2014-11-21 13:55:13 +00:00
XapaJIaMnu
52c520c042
Resolve merge conflicts
2014-11-20 15:50:32 +00:00
Hieu Hoang
e27f6b0120
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2014-11-15 14:32:49 +00:00
Hieu Hoang
67ad197d5a
take out PYTHONIOENCODING=utf-8. Rely on Rico's python changes
2014-11-15 14:32:31 +00:00
XapaJIaMnu
a343837095
Add option to choose activation function during nplm training
2014-11-15 11:54:47 +00:00
Rico Sennrich
b0b5eef0c6
fix metric interpolation with mert
2014-11-14 14:35:32 +00:00
Hieu Hoang
acd3ac964a
set PYTHONIOENCODING=utf-8 before running merge_alignment.py
2014-11-14 14:34:31 +00:00
Hieu Hoang
1c27e05a06
softlink for moses_chart
2014-11-14 13:56:56 +00:00
XapaJIaMnu
d5567b6cfb
Training: Do the preparation step ourselves. No validation support yet. No decoder support yet.
2014-11-13 16:14:17 +00:00
Rico Sennrich
8fd3be9e4e
add EOS token </s> to each sentence
2014-11-13 16:14:16 +00:00
Rico Sennrich
f26fc251d5
sort vocab by frequency
2014-11-13 16:14:16 +00:00
XapaJIaMnu
bb70f60f67
grrr
2014-11-13 16:14:16 +00:00
XapaJIaMnu
e330ab35d5
Short option must be only one letter
2014-11-13 16:14:16 +00:00
XapaJIaMnu
a74105ea7d
Fix a wrong condition
2014-11-13 16:14:16 +00:00
XapaJIaMnu
e54c171850
Make it optional to prepare the validation set
2014-11-13 16:14:16 +00:00
XapaJIaMnu
a300824bd1
Add optional validation during training
2014-11-13 16:14:16 +00:00
XapaJIaMnu
0451142ece
Add null token normalization for models to be used with the chart decoder.
2014-11-13 16:13:38 +00:00
XapaJIaMnu
aae894fe6b
Add null token in vocabulary during construction
2014-11-13 16:13:38 +00:00
XapaJIaMnu
b4f51c05d1
Add option to reduce the ngrams from already prepared .ngrams file to train a model with smaller number of ngrams
2014-11-13 16:13:38 +00:00
XapaJIaMnu
fbac0ae418
Make sure we always have unk in the vocabulary, otherwise we get off-by-one indexes during decoding
2014-11-13 15:51:48 +00:00
XapaJIaMnu
961578286f
Forgot to close a file...
2014-11-13 15:51:48 +00:00
XapaJIaMnu
1bac666e5f
Fix small oversights
2014-11-13 15:51:48 +00:00
XapaJIaMnu
617ef015df
Extend train_nplm with various options
2014-11-13 15:51:48 +00:00
Nikolay Bogoychev
2b2766cce8
For GPU training one thread is optimal
2014-11-13 15:51:48 +00:00
Abmayne
4af68a0d1a
Barry's training scripts with some minor changes by me
2014-11-13 15:51:48 +00:00
Phil Williams
59a1ce7380
substitute-filtered-tables.perl: check for RuleTable feature
2014-11-06 11:14:51 +00:00
Phil Williams
5240c430ce
Merge s2t branch

This adds a new string-to-tree decoder, which can be enabled with the -s2t
option. It's intended to be faster and simpler than the generic chart
decoder, and is designed to support lattice input (still WIP). For an en-de
system trained on WMT14 data, it's approximately 40% faster in practice.
For background information, see the decoding section of the EMNLP tutorial
on syntax-based MT:
http://www.emnlp2014.org/tutorials/5_notes.pdf
Some features are not implemented yet, including support for internal tree
structure and soft source-syntactic constraints.
2014-11-04 13:13:56 +00:00
mjdenkowski
40e8f2eca0
Hypergraph output
2014-11-03 09:16:12 -05:00
Hieu Hoang
7ca5e4fbc8
blame stats!
2014-10-31 01:07:33 +00:00
Hieu Hoang
834a89d96b
utf8 encoding /Tomas Fulajtar
2014-10-24 07:33:48 -07:00
Rico Sennrich
df74aa3e89
use short names for sparse features to save disk space and I/O when tuning
2014-10-17 10:36:51 +01:00
Hieu Hoang
44ce4b361a
reduce lmplz memory consumption in recaser
2014-10-14 17:52:47 +01:00
Hieu Hoang
fe266260fb
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2014-10-14 16:01:26 +01:00
Hieu Hoang
6c9c3e1741
portable call to bash /Paul Guyot
2014-10-14 16:01:15 +01:00
Philipp Koehn
2638ff0480
added thot to EMS
2014-10-14 10:13:16 -04:00
Phil Williams
07dbd191ed
analysis.perl: update regexp for current trace format
2014-10-13 10:55:07 +01:00
mjdenkowski
a1f561ac31
Only update dynamic models
2014-10-10 15:09:53 -04:00
Philipp Koehn
34cc9461fb
More Penn Tree Bank compliance (code by Maria Nadejde and Philip Williams)
2014-10-10 16:51:32 +01:00
Philipp Koehn
1741bba750
Penn Tree Bank compliant versions of preprocessing
2014-10-10 16:49:06 +01:00
Rico Sennrich
f63807f957
more robust regex
2014-09-30 15:43:38 +01:00
Rico Sennrich
84ad576750
explicitly set BLEU as default scorer (for return-best-dev)

(evaluator doesn't accept --scconfig without --sctype)
2014-09-24 14:47:58 +01:00
Hieu Hoang
610090c2ed
don't run truecase trainer unless it's asked for
2014-09-23 21:50:53 +01:00
Rico Sennrich
59cd4be2c9
don't use optimizer-specific options in extractor/evaluator
2014-09-22 10:49:20 +01:00
Rico Sennrich
d39cbca0b9
(optionally) use n-best file for evaluator/return-best-dev

this adds support for metrics that rely on alignment / trees
2014-09-22 10:49:20 +01:00
Rico Sennrich
3d00e5dc8c
basic support for more metrics with kbmira

metrics need getReferenceLength (for background smoothing) to work with kbmira
2014-09-22 10:49:20 +01:00
Philipp Koehn
ab90efe4af
allow specification of default weights
2014-09-22 05:28:57 +01:00
Philipp Koehn
e9db2fe4aa
Merge branch 'master' of git://github.com/moses-smt/mosesdecoder
2014-09-21 06:04:22 +01:00