24 Commits

Author SHA1 Message Date
Jeroen Vermeulen
ef028446f3 Add license notices to scripts.
This is not pleasant to read (and much, much less pleasant to write!) but
sort of necessary in an open project.  Right now it's quite hard to figure
out what is licensed how, which doesn't matter much to most people but can
suddenly become very important when people want to know what they're being
allowed to do.

I kept the notices as short as I could.  As far as I could see, everything
without a clear license notice is LGPL v2.1 or later.
2015-05-29 18:30:26 +07:00
Jeroen Vermeulen
a25193cc5d Fix a lot of lint, mostly trailing whitespace.
This is lint reported by the new lint-checking functionality in beautify.py.
(We can change to a different lint checker if we have a better one, but it
would probably still flag these same problems.)

Lint checking can help a lot, but only if we get the lint under control.
2015-05-17 20:04:04 +07:00
Hieu Hoang
6162223690 add use warnings to all perl scripts 2015-04-13 20:42:33 +04:00
Hieu Hoang
2d1da3219d consistently use 'env perl' command for environments where the 1st perl in PATH isn't the default perl. Which is kinda stupid 2015-04-02 17:38:56 +04:00
Kenneth Heafield
ee39fdbaa5 Relative path 2015-02-10 10:43:10 -05:00
Charley C
e40606d08f default path update in train-recaser 2015-02-09 18:36:31 -05:00
Philipp Koehn
f69c1dab02 more efficient default recaser training 2015-02-04 09:18:09 +00:00
Hieu Hoang
44ce4b361a reduce lmplz memory consumption in recaser 2014-10-14 17:52:47 +01:00
phikoehn
1a82535cf8 default kenlm training and inference in recaser 2014-06-06 21:54:42 +01:00
Hieu Hoang
99d5e738aa use kenlm if sri specified 2012-10-20 14:01:11 +01:00
Hieu Hoang
b761bd3237 exit 0 on success. /Henry Hu 2012-09-25 10:57:01 +01:00
Rico Sennrich
be1f959a1a truecase corpus before training recaser
gives better results in (small) test, and the code already had a placeholder for it.
(without truecasing, the recaser is more likely to uppercase words like "the" if they are often sentence-initial in the training corpus)

If people don't want the default behavior changed, I can disable truecasing by default and add a command line parameter to enable it.
2012-07-11 16:27:00 +02:00
Hieu Hoang
01b84656bf default pt implementation if no phrase table specified 2012-06-08 00:19:56 +01:00
Jehan
f3cb3ad789 - Bug fix: when --help set, errors on absence of --corpus or --dir must not be displayed.
- Unset variables must not be set as 0.
2011-11-27 10:14:39 +00:00
Jehan
d875b0774b - Exit with failure when a step of train-recaser.sh fails.
It is kind of hard to identify the cause of a problem (or even to see there is a problem) if a script continues when a
main step failed. Better to exit when the error occurs with relevant logs.
2011-11-27 09:55:30 +00:00
Jehan
30febce3e8 - Help output for train-recaser script. 2011-11-25 17:21:55 +00:00
Jehan
78ccb137fb - Coding style fix: use the upstream coding style. 2011-11-25 02:31:18 +00:00
Jehan
5841aea6aa - Recaser train script updated to support IRSTLM as well.
By default, it will still use SRILM so that any previous use of this script from others won't be broken.
To switch to IRSTLM training, simply add the "-lm irstlm" command line option.
Also if build-lm.sh is not accessible from $PATH, the option "-build-lm /path/to/build-lm.sh" is also available.
2011-11-25 02:16:16 +00:00
hieuhoang1972
eedef63277 keep perl scripts with Unix line endings
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3612 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-11 11:32:27 +00:00
bojar
3d288d81e4 Proper unicode-based lower and uppercasing.
Added language option to recase.perl, English remains the default.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1326 1f5c12ca-751b-0410-a591-d2e778427230
2007-03-26 05:44:27 +00:00
hieuhoang1972
4b0ea463c8 add svn id comments to start of file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1308 1f5c12ca-751b-0410-a591-d2e778427230
2007-03-14 22:30:25 +00:00
hieuhoang1972
3c07c5df4d add svn id comments to start of file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1307 1f5c12ca-751b-0410-a591-d2e778427230
2007-03-14 22:22:36 +00:00
phkoehn
a89acb34ae minor bug fix to recaser training
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1242 1f5c12ca-751b-0410-a591-d2e778427230
2007-02-26 12:19:06 +00:00
phkoehn
14839768c8 a large number of changes. besides little tweaks:
* training script now has proper default behaviour for single-factor models
* mert script has better handling of default lambda parameters that now
  works with lexicalized reordering models, and also with multiple
  models files (e.g. multiple language models)
* parallel mert script is more robust when single jobs fail: detects it
  and resubmits the crashed (or killed) jobs
* recaser added that builds on moses
* filtering script added that also binarizes filtered model files
  (this will be eventually replaced when the lexicalized reordering
  model also uses the binary format)


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1210 1f5c12ca-751b-0410-a591-d2e778427230
2007-02-13 19:22:35 +00:00