Commit Graph

409 Commits

Author SHA1 Message Date
Dingyuan Wang
aea07b0a19 Fix some problems in EMS:
* remove absolute links
* fix coverage bar highlighting
* change Base64 library to support UTF-8
2015-04-03 23:47:25 +08:00
Hieu Hoang
b2f9ba2b64 revert last commit to add MASTER_PATH. Not needed 2015-04-02 19:29:42 +04:00
Hieu Hoang
27b36e0c96 pass in PATH variable from master node. When you're running on a grid but really just qsubbing everything to 1 slave node 2015-04-02 19:15:21 +04:00
Hieu Hoang
2d1da3219d consistently use 'env perl' command for environments where the 1st perl in PATH isn't the default perl. Which is kinda stupid 2015-04-02 17:38:56 +04:00
Hieu Hoang
e22d275c32 don't ignore lowercasing of factored LM. Must be consistent with pt 2015-04-01 23:25:57 +04:00
Phil Williams
6ce3060dd8 lmplz-wrapper.perl: use Getopt::Long's "pass_through" option
This avoids the need to duplicate all of lmplz's options in the wrapper and
it prevents --prune 0 0 1 from being truncated to --prune 0 if the user forgets
to quote the arguments.
2015-03-30 10:18:51 +01:00
Rico Sennrich
3a673fc8dc EMS: support for syntactic metrics for MERT/MIRA
- add "-n-best-trees" to TUNING:decoder-settings
- add "mock-output-parser-references = $output-parser" to GENERAL (and define output-parser)
- TUNING:tuning-settings should include the metric you want to optimize (e.g. "-batch-mira-args='--sctype BLEU,HWCM'")
2015-03-20 17:15:33 +00:00
Phil Williams
fc15e03ebe Replace truecase-egret.sh with more general tree-converter-wrapper.perl 2015-03-18 09:57:42 +00:00
Phil Williams
0a8e5fb3bf EMS: fix TRAINING:use-syntax-input-weight-feature option 2015-03-13 17:18:56 +00:00
Hieu Hoang
ce8b0e0876 fix example for reusing tuned moses.ini file 2015-03-13 15:07:23 +00:00
Philipp Koehn
530d0f5a11 some more better defaults for recaser 2015-03-11 17:56:02 +00:00
Philipp Koehn
2ce45229f8 better default configuration for recaser 2015-03-11 17:52:30 +00:00
Philipp Koehn
1632c5f39d proper handling of specified configuration file 2015-03-11 16:49:20 +00:00
Matthias Huck
01bed83cf9 GHKM extraction: option to strip non-terminal labels from BitPar syntactic parses right during extraction (i.e., remove any suffix starting with a hyphen from the label) 2015-03-10 21:25:32 +00:00
Phil Williams
9e2eb702dc EMS: add TRAINING:use-syntax-input-weight-feature option 2015-03-10 11:40:49 +00:00
Phil Williams
7eba58b942 EMS: add TRAINING:dont-tune-glue-grammar option
Adds -dont-tune-glue-grammar to train-model.perl command during config file
generation step.  This is preferable to manually adding -dont-tune-glue-grammar
to TRAINING:training-options because changing its value won't trigger a re-run
of dependent steps that don't really need re-running (like word alignment).
2015-03-10 10:20:19 +00:00
Matthias Huck
25f5470216 GHKM: write target parts-of-speech as a factor 2015-03-09 21:54:03 +00:00
Rico Sennrich
2431f514dd fix EMS bug from dca8dd: cleaning step was skipped 2015-03-05 10:55:35 +00:00
Rico Sennrich
47c460fe1d remove unused variable 2015-03-05 08:31:50 +00:00
Matthias Huck
06e87d851e GHKM: extract POS phrase property (from preterminals in the syntactic parse tree) 2015-03-04 21:40:56 +00:00
Rico Sennrich
dca8ddc746 EMS convenience:
- merge clean-corpus-n-ratio.perl and clean-corpus-n.perl (use variable 'cleaner' in EMS to call cleaning script with extra arguments)
- use low default weight for glue rules in syntax systems (especially useful with 'tuneable=false')
2015-03-04 14:43:05 +00:00
Phil Williams
90e8d4940c EMS: add TRAINING:no-glue-grammar option 2015-03-03 12:36:09 +00:00
Philipp Koehn
39c0068e4f discount_fallback for lmplz 2015-02-26 22:21:50 +00:00
Hieu Hoang
6186262a3b don't use processPhraseTable in EMS 2015-01-12 12:43:51 +00:00
Hieu Hoang
0a707597d8 Revert "Added error message on experiment.meta for the filter step 'No phrases in'"
This reverts commit 2105423626.
2015-01-03 21:58:15 +05:30
Eleftherios Avramidis
2105423626 Added error message on experiment.meta for the filter step 'No phrases in' 2014-12-28 18:09:33 +01:00
Philipp Koehn
831f947874 long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works 2014-12-21 01:14:42 +00:00
Phil Williams
1353aa57dc experiment.meta: fixes for $input-parse-relaxer 2014-12-08 16:26:08 +00:00
Philipp Koehn
9d55ce13c0 change for thot integration 2014-12-02 14:05:56 -05:00
Phil Williams
59a1ce7380 substitute-filtered-tables.perl: check for RuleTable feature 2014-11-06 11:14:51 +00:00
Phil Williams
5240c430ce Merge s2t branch
This adds a new string-to-tree decoder, which can be enabled with the -s2t
option.  It's intended to be faster and simpler than the generic chart
decoder, and is designed to support lattice input (still WIP).  For an en-de
system trained on WMT14 data, it's approximately 40% faster in practice.

For background information, see the decoding section of the EMNLP tutorial
on syntax-based MT:

  http://www.emnlp2014.org/tutorials/5_notes.pdf

Some features are not implemented yet, including support for internal tree
structure and soft source-syntactic constraints.
2014-11-04 13:13:56 +00:00
Rico Sennrich
df74aa3e89 use short names for sparse features to save disk space and I/O when tuning 2014-10-17 10:36:51 +01:00
Philipp Koehn
2638ff0480 added thot to EMS 2014-10-14 10:13:16 -04:00
Phil Williams
07dbd191ed analysis.perl: update regexp for current trace format 2014-10-13 10:55:07 +01:00
Hieu Hoang
610090c2ed don't run truecase trainer unless it's asked for 2014-09-23 21:50:53 +01:00
Philipp Koehn
a8659d1399 support for specified weights 2014-09-21 06:01:16 +01:00
Philipp Koehn
acefdb0262 bug fix for final-step 2014-09-21 05:59:21 +01:00
Philipp Koehn
a574454635 bug fix for deleting crashed step output files 2014-08-14 14:14:42 -04:00
Philipp Koehn
7a087f24df also delete interrupted steps 2014-08-14 10:15:58 -04:00
Matthias Huck
c27cbf55ea source labels: integration into EMS 2014-08-07 21:02:51 +01:00
Matthias Huck
3a5dee12e8 implementation of phrase orientation in GHKM extraction
(...but a corresponding feature function for the chart-based decoder has not been written yet)
2014-07-28 18:27:12 +01:00
phikoehn
573076976f added transliteration into ems example config, minor fixes 2014-07-23 15:44:55 +01:00
phikoehn
2d11fe3916 Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder 2014-07-23 15:40:04 +01:00
phikoehn
2239501b21 allow specification of weights for lm interpolation 2014-07-23 15:39:42 +01:00
Philipp Koehn
55ae15a6f8 integration of Uli Germann's memory mapped suffix array phrase table into EMS 2014-07-22 10:12:14 -04:00
Philipp Koehn
36919b53a7 example files for memory mapped suffix array phrase table by Uli Germann 2014-07-22 10:10:34 -04:00
Matthias Huck
c2644c9a08 typo in log output 2014-06-16 15:10:53 +01:00
XapaJIaMnu
5c6c44291d Python2 scripts should require python2 specifically 2014-06-13 15:44:30 +01:00
Matthias Huck
02848112d8 experiment.meta: skip-parse and mock-parse 2014-06-11 19:06:04 +01:00
phikoehn
45648d03b9 support for lmplz training of osm in ems 2014-06-11 13:44:02 +01:00