Matthias Huck
506427368f
filter-model-given-input.pl: drop "-encoding None" from phrase table binarization with processPhraseTableMin. Recommended by Marcin.
2015-03-23 14:38:24 +00:00
Rico Sennrich
3a673fc8dc
EMS: support for syntactic metrics for MERT/MIRA
...
- add "-n-best-trees" to TUNING:decoder-settings
- add "mock-output-parser-references = $output-parser" to GENERAL (and define output-parser)
- TUNING:tuning-settings should include the metric you want to optimize (e.g. "-batch-mira-args='--sctype BLEU,HWCM'")
2015-03-20 17:15:33 +00:00
Rico Sennrich
ca08b1d205
reduce-factors: port xml support from train-model.perl
2015-03-20 14:44:48 +00:00
Rico Sennrich
b8ca33c34e
RDLM training without editing bash scripts
2015-03-20 14:12:41 +00:00
Rico Sennrich
2271f295e6
nplm_train: more options
2015-03-20 14:12:41 +00:00
Rico Sennrich
eab513b635
relational dependency language model
2015-03-18 17:39:45 +00:00
Phil Williams
ac51e9f0a8
Always use "SyntaxInputWeight0" as name of SyntaxInputWeight feature
2015-03-18 09:56:46 +00:00
Phil Williams
05872cf32f
Add tree-converter-mosesxml.sh wrapper script
2015-03-12 22:27:43 +00:00
Phil Williams
4685474e9b
parse-en-egret.perl: wrap tree in parentheses prior to conversion to XML
2015-03-12 09:49:28 +00:00
Philipp Koehn
1632c5f39d
proper handling of specified configuration file
2015-03-11 16:49:20 +00:00
Matthias Huck
01bed83cf9
GHKM extraction: option to strip non-terminal labels from BitPar syntactic parses right during extraction (i.e., remove any suffix starting with a hyphen from the label)
2015-03-10 21:25:32 +00:00
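The label stripping described in commit 01bed83cf9 (removing any suffix that starts with a hyphen) can be sketched as follows. This is only an illustration in Python, not the extractor's actual code; the function name `strip_label_suffix` is hypothetical, and leaving labels that begin with a hyphen unchanged is an assumption made here, not something the commit message states.

```python
def strip_label_suffix(label: str) -> str:
    """Drop everything from the first hyphen onward, e.g. 'NP-SB' -> 'NP'."""
    head, _, _ = label.partition("-")
    # Assumption (not from the commit): a label that starts with a hyphen
    # is returned unchanged so that the result is never an empty string.
    return head if head else label


if __name__ == "__main__":
    for raw in ["NP-SB", "VP", "PP-MO-X"]:
        print(raw, "->", strip_label_suffix(raw))
```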
Phil Williams
c7cf33ee05
parse-en-egret.perl: use "ROOT" instead of "TOP" as label of root tree node
...
This is to match the label Egret assigns to the root vertices of forests.
2015-03-10 15:43:14 +00:00
Phil Williams
f7b4d403e3
Add parse-en-egret.perl wrapper script.
2015-03-10 14:32:59 +00:00
Phil Williams
91abb69cdf
train-model.perl: add -use-syntax-input-weight-feature option
...
Currently only used for forest input.
2015-03-10 11:39:14 +00:00
Phil Williams
e79644540c
train-model.perl: add -dont-tune-glue-grammar option
2015-03-10 09:53:12 +00:00
Phil Williams
fd3dcb7bb0
filter-model-given-input.pl: add -[no]StripXml and -SyntaxFilterCmd options
...
-noStripXml is required for tree and forest input in STSG-based models.
-SyntaxFilterCmd can be used to set the command for filtering rule tables in
syntax-based models. The default is to use
$SCRIPTS_ROOTDIR/../bin/filter-rule-table
The option -MinNonInitialRuleCount is deprecated.
2015-03-10 08:57:56 +00:00
Phil Williams
70bef90b36
train-model.perl: add -score-command option
...
This matches the existing -extract-command option. Given the argument value
<name>, train-model.perl will use the score program in
$SCRIPTS_ROOTDIR/../bin/<name>
The default value is "score".
2015-03-10 08:48:54 +00:00
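The lookup rule stated in commit 70bef90b36 (program name resolved under $SCRIPTS_ROOTDIR/../bin, defaulting to "score") might look roughly like this. It is a Python sketch, not the Perl in train-model.perl; `resolve_score_program` is a hypothetical name, and collapsing ".." textually with `normpath` is a simplification that ignores symlinks.

```python
import posixpath


def resolve_score_program(scripts_rootdir: str, name: str = "score") -> str:
    """Build the path $SCRIPTS_ROOTDIR/../bin/<name> for a given program name."""
    # normpath collapses the ".." component purely textually (assumption:
    # no symlinks are involved in the scripts directory).
    return posixpath.normpath(posixpath.join(scripts_rootdir, "..", "bin", name))


if __name__ == "__main__":
    # "/opt/moses/scripts" is an example location, not a required layout.
    print(resolve_score_program("/opt/moses/scripts"))
```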
Matthias Huck
25f5470216
GHKM: write target parts-of-speech as a factor
2015-03-09 21:54:03 +00:00
Hieu Hoang
cb2e1b8a40
separate variables into lines. Easier to merge with other branches
2015-03-05 21:37:30 +00:00
Hieu Hoang
0f5556f6d9
separate variables into lines. Easier to merge with other branches
2015-03-05 21:28:51 +00:00
Matthias Huck
638e9c3f60
POS property: map tags to indices in consolidate
2015-03-04 22:48:34 +00:00
Matthias Huck
06e87d851e
GHKM: extract POS phrase property (from preterminals in the syntactic parse tree)
2015-03-04 21:40:56 +00:00
Rico Sennrich
ff5502d323
off-by-one error in previous commit
2015-03-04 17:25:19 +00:00
Rico Sennrich
71ab598435
extract_test.py should also create numberized corpus
2015-03-04 17:10:06 +00:00
Rico Sennrich
dca8ddc746
EMS convenience:
...
- merge clean-corpus-n-ratio.perl and clean-corpus-n.perl (use variable 'cleaner' in EMS to call cleaning script with extra arguments)
- use low default weight for glue rules in syntax systems (especially useful with 'tuneable=false')
2015-03-04 14:43:05 +00:00
Rico Sennrich
f9ec387a5b
typo
2015-03-04 10:06:03 +00:00
Rico Sennrich
e2b1ac1e9d
fix option --return-best-dev with hypergraph MIRA (which I broke in commit d39cbca0b9)
2015-02-27 14:47:37 +00:00
Marcin Junczys-Dowmunt
a3d2adca50
Update filter-model-given-input.pl
...
Added -encoding None to force single pass for compact phrase table so it works with pipes.
2015-02-26 14:04:06 +01:00
Ondrej Bojar
441a2bb190
safer binarizer execution, bash, sort tempdir
2015-02-24 00:36:29 +01:00
Matthias Huck
8025cbf350
Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
2015-02-16 15:10:15 +00:00
Phil Williams
92a21f9d3a
train-model.perl: fix "argument isn't numeric" warning
2015-02-13 11:55:39 +00:00
Matthias Huck
53ce063214
tuneable-components config parameter for feature functions
2015-02-09 13:52:05 +00:00
Hieu Hoang
78f79632b9
script to convert moses.ini v2 to v1 /Tom Hoar
2015-02-03 10:59:38 +00:00
XapaJIaMnu
6ca1a4718c
Expose learning rate as a parameter
2015-01-25 02:13:47 +00:00
Matthias Huck
9987beb453
SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals.
...
Also added score components based on relative frequency.
(TODO: logprobs right now; are plain probabilities better?)
2015-01-23 18:41:18 +00:00
Hieu Hoang
59c4baec3f
use utf8 german model
2015-01-22 16:10:12 +00:00
Hieu Hoang
90d4b2d713
use pigz rather than gzip if it exists
2015-01-13 15:16:22 +00:00
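The fallback described in commit 90d4b2d713 (prefer pigz, otherwise gzip) can be illustrated with a small Python sketch. It does not reproduce the script's own detection logic; `choose_compressor` and its injectable `which` parameter are hypothetical names introduced for this example.

```python
import shutil


def choose_compressor(which=shutil.which) -> str:
    """Return 'pigz' if it is found on PATH, otherwise fall back to 'gzip'."""
    # `which` is injectable so the choice can be exercised without
    # depending on what is installed on the current machine.
    return "pigz" if which("pigz") else "gzip"


if __name__ == "__main__":
    print("compressor:", choose_compressor())
```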
Hieu Hoang
a8d4b81e71
Revert "Update train-model.perl"
...
This reverts commit e1e14a91ee.
2015-01-08 16:07:40 +00:00
Philipp Koehn
0441fd6ab9
added informative error message when trying to build a lexicalized reordering model with hierarchical model
2015-01-06 18:46:02 +00:00
Philipp Koehn
59fdb3d99c
same spec for dedicated script as for train-model.perl and filter-model-given-input.pl
2014-12-21 01:37:05 +00:00
Philipp Koehn
831f947874
long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works
2014-12-21 01:14:42 +00:00
Rico Sennrich
67e101b07a
Revert "Update train-model.perl"
...
This reverts commit 41f06a01c0.
2014-12-17 17:51:02 +00:00
Rico Sennrich
685f18ca1b
documentation/readability
2014-12-16 17:42:17 +00:00
Nicola Bertoldi
d0cddf0f2d
Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
2014-12-16 17:35:47 +01:00
Nicola Bertoldi
4e77665d30
better handling of cache-based models with inconsistent parameters
2014-12-15 17:42:41 +01:00
Xiang Li
41f06a01c0
Update train-model.perl
...
If the final alignment model is model 3-5, the hmm model will be trained.
2014-12-16 00:37:15 +08:00
Nicola Bertoldi
e4eb201c52
merged master into dynamic-models and solved conflicts
2014-12-13 12:52:47 +01:00
Kenneth Heafield
8bbccd441a
Fix #85 by changing the default LM. Hieu said it's ok in the issue.
2014-12-11 23:51:48 -05:00
Xiang Li
e1e14a91ee
Update train-model.perl
...
The default number of HMM iterations in GIZA++ is 5, so HMM alignment is activated by the training script even when the "hmm-align" option is not set.
2014-12-01 11:26:53 +08:00
Rico Sennrich
4ca730a67c
improve bilingualLM alignment heuristics consistency
2014-11-26 10:32:41 +00:00