Commit Graph

1085 Commits

Author SHA1 Message Date
Matthias Huck
506427368f filter-model-given-input.pl: drop "-encoding None" from phrase table binarization with processPhraseTableMin. Recommended by Marcin. 2015-03-23 14:38:24 +00:00
Rico Sennrich
3a673fc8dc EMS: support for syntactic metrics for MERT/MIRA
- add "-n-best-trees" to TUNING:decoder-settings
- add "mock-output-parser-references = $output-parser" to GENERAL (and define output-parser)
 - TUNING:tuning-settings should include the metric you want to optimize (e.g. "-batch-mira-args='--sctype BLEU,HWCM'")
2015-03-20 17:15:33 +00:00
Rico Sennrich
ca08b1d205 reduce-factors: port xml support from train-model.perl 2015-03-20 14:44:48 +00:00
Rico Sennrich
b8ca33c34e RDLM training without editing bash scripts 2015-03-20 14:12:41 +00:00
Rico Sennrich
2271f295e6 nplm_train: more options 2015-03-20 14:12:41 +00:00
Rico Sennrich
eab513b635 relational dependency language model 2015-03-18 17:39:45 +00:00
Phil Williams
ac51e9f0a8 Always use "SyntaxInputWeight0" as name of SyntaxInputWeight feature 2015-03-18 09:56:46 +00:00
Phil Williams
05872cf32f Add tree-converter-mosesxml.sh wrapper script 2015-03-12 22:27:43 +00:00
Phil Williams
4685474e9b parse-en-egret.perl: wrap tree in parentheses prior to conversion to XML 2015-03-12 09:49:28 +00:00
Philipp Koehn
1632c5f39d proper handling of specified configuration file 2015-03-11 16:49:20 +00:00
Matthias Huck
01bed83cf9 GHKM extraction: option to strip non-terminal labels from BitPar syntactic parses right during extraction (i.e., remove any suffix starting with a hyphen from the label) 2015-03-10 21:25:32 +00:00
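The label stripping described in commit 01bed83cf9 can be illustrated with a minimal Python sketch. This is not the GHKM extractor's code. The function name `strip_label_suffix` is hypothetical, and leaving hyphen-initial labels unchanged is an assumption made here, not something the commit message states.

```python
def strip_label_suffix(label: str) -> str:
    """Remove any suffix starting with a hyphen, e.g. 'NP-SB' -> 'NP'."""
    head, _, _ = label.partition("-")
    # Assumption: a label that begins with a hyphen (such as '-LRB-') is
    # kept as-is, because stripping it would leave an empty label.
    return head if head else label
```

For example, `strip_label_suffix("NP-SB")` yields `"NP"`, while a label without a hyphen is returned unchanged.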
Phil Williams
c7cf33ee05 parse-en-egret.perl: use "ROOT" instead of "TOP" as label of root tree node
This is to match the label Egret assigns to the root vertices of forests.
2015-03-10 15:43:14 +00:00
Phil Williams
f7b4d403e3 Add parse-en-egret.perl wrapper script. 2015-03-10 14:32:59 +00:00
Phil Williams
91abb69cdf train-model.perl: add -use-syntax-input-weight-feature option
Currently only used for forest input.
2015-03-10 11:39:14 +00:00
Phil Williams
e79644540c train-model.perl: add -dont-tune-glue-grammar option 2015-03-10 09:53:12 +00:00
Phil Williams
fd3dcb7bb0 filter-model-given-input.pl: add -[no]StripXml and -SyntaxFilterCmd options
-noStripXml is required for tree and forest input in STSG-based models.

-SyntaxFilterCmd can be used to set the command for filtering rule tables in
syntax-based models.  The default is to use

    $SCRIPTS_ROOTDIR/../bin/filter-rule-table

The option -MinNonInitialRuleCount is deprecated.
2015-03-10 08:57:56 +00:00
Phil Williams
70bef90b36 train-model.perl: add -score-command option
This matches the existing -extract-command option.  Given the argument value
<name>, train-model.perl will use the score program in

  $SCRIPTS_ROOTDIR/../bin/<name>

The default value is "score".
2015-03-10 08:48:54 +00:00
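The path lookup described in commit 70bef90b36 might be sketched as follows. This is an illustrative Python sketch, not the Perl code in train-model.perl. The function name `resolve_score_command` and the example directory are hypothetical.

```python
import posixpath

def resolve_score_command(scripts_rootdir: str, name: str = "score") -> str:
    """Build the path $SCRIPTS_ROOTDIR/../bin/<name> for the score program."""
    # The default name "score" mirrors the default stated in the commit message.
    return posixpath.join(scripts_rootdir, "..", "bin", name)
```

With a hypothetical root of `/opt/moses/scripts`, the default resolves to `/opt/moses/scripts/../bin/score`. Passing another name selects a different binary in the same directory.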
Matthias Huck
25f5470216 GHKM: write target parts-of-speech as a factor 2015-03-09 21:54:03 +00:00
Hieu Hoang
cb2e1b8a40 separate variables into lines. Easier to merge with other branches 2015-03-05 21:37:30 +00:00
Hieu Hoang
0f5556f6d9 separate variables into lines. Easier to merge with other branches 2015-03-05 21:28:51 +00:00
Matthias Huck
638e9c3f60 POS property: map tags to indices in consolidate 2015-03-04 22:48:34 +00:00
Matthias Huck
06e87d851e GHKM: extract POS phrase property (from preterminals in the syntactic parse tree) 2015-03-04 21:40:56 +00:00
Rico Sennrich
ff5502d323 off-by-one error in previous commit 2015-03-04 17:25:19 +00:00
Rico Sennrich
71ab598435 extract_test.py should also create numberized corpus 2015-03-04 17:10:06 +00:00
Rico Sennrich
dca8ddc746 EMS convenience:
- merge clean-corpus-n-ratio.perl and clean-corpus-n.perl (use variable 'cleaner' in EMS to call cleaning script with extra arguments)
- use low default weight for glue rules in syntax systems (especially useful with 'tuneable=false')
2015-03-04 14:43:05 +00:00
Rico Sennrich
f9ec387a5b typo 2015-03-04 10:06:03 +00:00
Rico Sennrich
e2b1ac1e9d fix option --return-best-dev with hypergraph MIRA (which I broke in commit d39cbca0b9) 2015-02-27 14:47:37 +00:00
Marcin Junczys-Dowmunt
a3d2adca50 Update filter-model-given-input.pl
Added -encoding None to force single pass for compact phrase table so it works with pipes.
2015-02-26 14:04:06 +01:00
Ondrej Bojar
441a2bb190 safer binarizer execution, bash, sort tempdir 2015-02-24 00:36:29 +01:00
Matthias Huck
8025cbf350 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-02-16 15:10:15 +00:00
Phil Williams
92a21f9d3a train-model.perl: fix "argument isn't numeric" warning 2015-02-13 11:55:39 +00:00
Matthias Huck
53ce063214 tuneable-components config parameter for feature functions 2015-02-09 13:52:05 +00:00
Hieu Hoang
78f79632b9 script to convert moses.ini v2 to v1 /Tom Hoar 2015-02-03 10:59:38 +00:00
XapaJIaMnu
6ca1a4718c Expose learning rate as a parameter 2015-01-25 02:13:47 +00:00
Matthias Huck
9987beb453 SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals.
Also added score components based on relative frequency.
(TODO: logprobs right now; are plain probabilities better?)
2015-01-23 18:41:18 +00:00
Hieu Hoang
59c4baec3f use utf8 german model 2015-01-22 16:10:12 +00:00
Hieu Hoang
90d4b2d713 use pigz rather than gzip if it exists 2015-01-13 15:16:22 +00:00
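The fallback named in commit 90d4b2d713 can be sketched in Python as below. This is only an illustration of the idea, not the committed change. The function name `choose_compressor` and its injectable `which` parameter are assumptions made for the example.

```python
import shutil

def choose_compressor(which=shutil.which) -> str:
    """Return 'pigz' if it is found on PATH, otherwise fall back to 'gzip'."""
    # shutil.which returns the executable's path, or None when it is absent.
    return "pigz" if which("pigz") else "gzip"
```

The `which` parameter exists only so the lookup can be replaced in a test. Calling `choose_compressor()` with no arguments checks the real PATH.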
Hieu Hoang
a8d4b81e71 Revert "Update train-model.perl"
This reverts commit e1e14a91ee.
2015-01-08 16:07:40 +00:00
Philipp Koehn
0441fd6ab9 added informative error message when trying to build a lexicalized reordering model with hierarchical model 2015-01-06 18:46:02 +00:00
Philipp Koehn
59fdb3d99c same spec for dedicated script as for train-model.perl and filter-model-given-input.pl 2014-12-21 01:37:05 +00:00
Philipp Koehn
831f947874 long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works 2014-12-21 01:14:42 +00:00
Rico Sennrich
67e101b07a Revert "Update train-model.perl"
This reverts commit 41f06a01c0.
2014-12-17 17:51:02 +00:00
Rico Sennrich
685f18ca1b documentation/readability 2014-12-16 17:42:17 +00:00
Nicola Bertoldi
d0cddf0f2d Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2014-12-16 17:35:47 +01:00
Nicola Bertoldi
4e77665d30 better handling of cache-based models with inconsistent parameters 2014-12-15 17:42:41 +01:00
Xiang Li
41f06a01c0 Update train-model.perl
If the final alignment model is model 3-5, the hmm model will be trained.
2014-12-16 00:37:15 +08:00
Nicola Bertoldi
e4eb201c52 merged master into dynamic-models and solved conflicts 2014-12-13 12:52:47 +01:00
Kenneth Heafield
8bbccd441a Fix #85 by changing the default LM. Hieu said it's ok in the issue. 2014-12-11 23:51:48 -05:00
Xiang Li
e1e14a91ee Update train-model.perl
GIZA++ runs 5 HMM iterations by default, so HMM alignment is activated by the training script even when the "hmm-align" option is not set.
2014-12-01 11:26:53 +08:00
Rico Sennrich
4ca730a67c improve bilingualLM alignment heuristics consistency 2014-11-26 10:32:41 +00:00