gives better results in (small) test, and the code already had a placeholder for it.
(without truecasing, the recaser is more likely to uppercase words like "the" if they are often sentence-initial in the training corpus)
If people don't want the default behavior changed, I can disable truecasing by default and add a command line parameter to enable it.
It is kind of hard to identify the cause of a problem (or even to see there is a problem) if a script continues when a
main step failed. Better to exit when the error occurs with relevant logs.
By default, it will still use SRILM so that any previous use of this script from others won't be broken.
To switch to IRSTLM training, simply add "-lm irslm" command line option.
Also if build-lm.sh is not accessible from $PATH, the option "-build-lm /path/to/build-lm.sh" is also available.
* training script now has proper default behaviour for single-factor models,
* mert script has better handling of default lambda parameters that now
works with lexicalized reordering models, and also with multiple
models files (e.g. multiple language models)
* parallel mert script is more robust when single jobs fail: detects it
and resubmits the crashed (or killed) jobs
* recaser added that builds on moses
* filtering script added that also binarizes filtered model files
(this will be eventually replaced when the lexicalized reordering
model also uses the binary format)
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1210 1f5c12ca-751b-0410-a591-d2e778427230