Commit Graph

32 Commits

Author SHA1 Message Date
Christian Buck
26bf04df5d added unbuffered mode for casers (using -b) 2013-03-04 15:29:13 +00:00
phikoehn
124c36a837 bug fix with MML settings 2013-01-14 19:39:26 +00:00
phikoehn
a7f7379fa4 fixed bug in detruecaser / interaction with escaping 2013-01-14 19:25:43 +00:00
phikoehn
344b150372 bug fixes with escaping / truecasing interactions 2013-01-14 19:22:29 +00:00
Hieu Hoang
99d5e738aa use kenlm if sri specified 2012-10-20 14:01:11 +01:00
Hieu Hoang
b761bd3237 exit 0 on success. /Henry Hu 2012-09-25 10:57:01 +01:00
Rico Sennrich
bed4bc08ad distortion limit for recaser should be 0 2012-07-11 16:57:05 +02:00
Rico Sennrich
be1f959a1a truecase corpus before training recaser
gives better results in (small) test, and the code already had a placeholder for it.
(without truecasing, the recaser is more likely to uppercase words like "the" if they are often sentence-initial in the training corpus)

If people don't want the default behavior changed, I can disable truecasing by default and add a command line parameter to enable it.
2012-07-11 16:27:00 +02:00
Hieu Hoang
01b84656bf default pt implementation if no phrase table specified 2012-06-08 00:19:56 +01:00
Jehan
f3cb3ad789 - Bug fix: when --help set, errors on absence of --corpus or --dir must not be displayed.
- Unset variables must not be set as 0.
2011-11-27 10:14:39 +00:00
Jehan
d875b0774b - Exit with failure when a step of train-recaser.sh fails.
It is kind of hard to identify the cause of a problem (or even to see there is a problem) if a script continues when a
main step failed. Better to exit when the error occurs with relevant logs.
2011-11-27 09:55:30 +00:00
Jehan
30febce3e8 - Help output for train-recaser script. 2011-11-25 17:21:55 +00:00
Jehan
78ccb137fb - Coding style fix: use the upstream coding style. 2011-11-25 02:31:18 +00:00
Jehan
5841aea6aa - Recaser train script updated to support IRSTLM as well.
By default, it will still use SRILM so that any previous use of this script from others won't be broken.
To switch to IRSTLM training, simply add "-lm irslm" command line option.
Also if build-lm.sh is not accessible from $PATH, the option "-build-lm /path/to/build-lm.sh" is also available.
2011-11-25 02:16:16 +00:00
bgottesman
518035ed05 add --possiblyUseFirstToken option, which, when selected, allows certain sentence-initial tokens to be taken into account. See comment in header or support mailing list discussion.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3690 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-09 11:05:23 +00:00
hieuhoang1972
e5edb4b971 delete duplicate detokenizer
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3622 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-13 16:39:46 +00:00
hieuhoang1972
eedef63277 keep perl scripts with Unix line endings
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3612 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-11 11:32:27 +00:00
pjwilliams
2edfc16912 Merge remaining script support for tree-based models from mt3_chart.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3137 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-16 09:45:51 +00:00
bgottesman
5a3a6bd3b0 set utf8 mode on the input and output files, instead of on stdin and stdout, which are not used. This allows case variants of non-ASCII characters to be recognized correctly
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@2987 1f5c12ca-751b-0410-a591-d2e778427230
2010-03-18 19:13:05 +00:00
bojar
dbfe610546 uppercasing first letter even if after punct
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@2846 1f5c12ca-751b-0410-a591-d2e778427230
2010-02-03 14:23:20 +00:00
phkoehn
8d5aef137b bug fix
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@2113 1f5c12ca-751b-0410-a591-d2e778427230
2009-02-09 16:00:35 +00:00
phkoehn
a62f8ee316 added truecaser
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@2112 1f5c12ca-751b-0410-a591-d2e778427230
2009-02-09 15:32:34 +00:00
bojar
7f3e34207a added some heuristics for Czech quotation marks
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1567 1f5c12ca-751b-0410-a591-d2e778427230
2008-02-22 15:07:46 +00:00
bojar
6af3140978 added optional sentence uppercasing (use -u)
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1566 1f5c12ca-751b-0410-a591-d2e778427230
2008-02-22 14:50:43 +00:00
jdschroeder
04ae9361d2 added "-v 0" moses flag to decoder call to minimize log output.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1335 1f5c12ca-751b-0410-a591-d2e778427230
2007-04-04 17:04:50 +00:00
bojar
55ea5d6f94 Adding simple Czech rules to detokenizer. Making detokenizer 'released'.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1328 1f5c12ca-751b-0410-a591-d2e778427230
2007-03-26 06:08:13 +00:00
bojar
58bf2089af Adding detokenizer from WMT07 shared scripts.tgz, hoping there are no copyright
problems. Please withdraw if necessary.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1327 1f5c12ca-751b-0410-a591-d2e778427230
2007-03-26 05:46:50 +00:00
bojar
3d288d81e4 Proper Unicode-based lower and uppercasing.
Added language option to recase.perl, English remains the default.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1326 1f5c12ca-751b-0410-a591-d2e778427230
2007-03-26 05:44:27 +00:00
hieuhoang1972
4b0ea463c8 add svn id comments to start of file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1308 1f5c12ca-751b-0410-a591-d2e778427230
2007-03-14 22:30:25 +00:00
hieuhoang1972
3c07c5df4d add svn id comments to start of file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1307 1f5c12ca-751b-0410-a591-d2e778427230
2007-03-14 22:22:36 +00:00
phkoehn
a89acb34ae minor bug fix to recaser training
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1242 1f5c12ca-751b-0410-a591-d2e778427230
2007-02-26 12:19:06 +00:00
phkoehn
14839768c8 a large number of changes. Besides little tweaks:
* training script now has proper default behaviour for single-factor models, 
* mert script has better handling of default lambda parameters that now
  works with lexicalized reordering models, and also with multiple 
  models files (e.g. multiple language models)
* parallel mert script is more robust when single jobs fail: detects it
  and resubmits the crashed (or killed) jobs
* recaser added that builds on moses
* filtering script added that also binarizes filtered model files
  (this will be eventually replaced when the lexicalized reordering
  model also uses the binary format)


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@1210 1f5c12ca-751b-0410-a591-d2e778427230
2007-02-13 19:22:35 +00:00