Commit Graph

533 Commits

Author SHA1 Message Date
Jeroen Vermeulen
61162dd242 Fix more Python lint.
Most of the complaints fixed here were from Pocketlint, but many were also
from Syntastic, the vim plugin.
2015-05-16 17:26:56 +07:00
Hieu Hoang
abfc0671a3 osm tweaks and morfessor wrapper 2015-05-12 20:19:39 +04:00
Hieu Hoang
8bb18b9ff0 add no-splitter-training argument. Splitter to be used by mada 2015-05-11 15:26:50 +04:00
Barry Haddow
85c1af4d72 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-08 09:16:55 +01:00
Barry Haddow
f403f5e478 mmsapt doesn't require feature weights on first tuning iteration 2015-05-08 09:16:51 +01:00
Hieu Hoang
2acb590394 output bleu for multi-bleu hack 2015-05-05 17:54:35 +04:00
Hieu Hoang
d006c6ef8c don't output remaining args twice 2015-05-05 12:15:08 +04:00
Hieu Hoang
8f272e04a9 output debugging messages to stderr, not stdout 2015-05-05 12:01:21 +04:00
Hieu Hoang
d456d9229e add multi-bleu-detok. Like multi-bleu scoring but will detokenize/post-process before scoring 2015-05-03 14:07:12 +04:00
Philipp Koehn
a4a7c14593 allow breaking up training data for fast align (to avoid memory blowups for very large corpora) 2015-05-01 17:47:08 -04:00
Philipp Koehn
b369699661 various small changes, mostly related to better compliance with grid engine 2015-05-01 17:44:18 -04:00
Rico Sennrich
e98a2fc980 fix interpolation for LM with parser in pre-processing 2015-04-30 15:46:33 +01:00
Hieu Hoang
4b47e1148c use ignore-unless /Philipp Koehn 2015-04-22 23:02:57 +04:00
Hieu Hoang
40933b4a78 hack to allow target side of tokenized parallel corpus to be used for LM 2015-04-22 19:01:12 +04:00
Hieu Hoang
ab01d30687 make sure GetOptions doesn't consume -T by confusing it with --text 2015-04-21 17:53:46 +04:00
Rico Sennrich
15d3c3f259 be more tolerant about xml input 2015-04-21 14:04:25 +01:00
Rico Sennrich
5a3d5b6bdd EMS: LM:mock-parse can be actual parser 2015-04-21 10:21:24 +01:00
Hieu Hoang
1b9dc6cfae more butinah tweaks 2015-04-19 11:50:50 +04:00
Hieu Hoang
637e8a17e8 add pre-tokenization cleaning script. In case training data has bad, overly long lines, which blow up some taggers/segmenters, e.g. mada 2015-04-19 11:21:07 +04:00
Hieu Hoang
6162223690 add use warnings to all perl scripts 2015-04-13 20:42:33 +04:00
Dingyuan Wang
4aba64ed53 Merge pull request #106 from gumblex/master
Fix some problems in EMS
2015-04-11 09:26:25 +08:00
Hieu Hoang
02185a85fb store temp run files in current directory, not /tmp 2015-04-05 17:02:48 +04:00
Hieu Hoang
93ad52d2f9 leave in runPath for debugging 2015-04-05 16:49:12 +04:00
Hieu Hoang
7ffdddef13 script to submit ems job to grid engine as 1 job. Hardcoded for NYUAD at the moment 2015-04-05 16:44:24 +04:00
Dingyuan Wang
aea07b0a19 Fix some problems in EMS:
* remove absolute links
* fix coverage bar highlighting
* change Base64 library to support UTF-8
2015-04-03 23:47:25 +08:00
Hieu Hoang
b2f9ba2b64 revert last commit to add MASTER_PATH. Not needed 2015-04-02 19:29:42 +04:00
Hieu Hoang
27b36e0c96 pass in PATH variable from master node. For when you're running on a grid but really just qsubbing everything to 1 slave node 2015-04-02 19:15:21 +04:00
Hieu Hoang
2d1da3219d consistently use 'env perl' command for environments where the 1st perl in PATH isn't the default perl. Which is kinda stupid 2015-04-02 17:38:56 +04:00
Hieu Hoang
e22d275c32 don't ignore lowercasing of factored LM. Must be consistent with pt 2015-04-01 23:25:57 +04:00
Phil Williams
6ce3060dd8 lmplz-wrapper.perl: use Getopt::Long's "pass_through" option
This avoids the need to duplicate all of lmplz's options in the wrapper and
it prevents --prune 0 0 1 from being truncated to --prune 0 if the user forgets
to quote the arguments.
2015-03-30 10:18:51 +01:00
Rico Sennrich
3a673fc8dc EMS: support for syntactic metrics for MERT/MIRA
- add "-n-best-trees" to TUNING:decoder-settings
- add "mock-output-parser-references = $output-parser" to GENERAL (and define output-parser)
- TUNING:tuning-settings should include the metric you want to optimize (e.g. "-batch-mira-args='--sctype BLEU,HWCM'")
2015-03-20 17:15:33 +00:00
Phil Williams
fc15e03ebe Replace truecase-egret.sh with more general tree-converter-wrapper.perl 2015-03-18 09:57:42 +00:00
Phil Williams
0a8e5fb3bf EMS: fix TRAINING:use-syntax-input-weight-feature option 2015-03-13 17:18:56 +00:00
Hieu Hoang
ce8b0e0876 fix example for reusing tuned moses.ini file 2015-03-13 15:07:23 +00:00
Philipp Koehn
530d0f5a11 some more better defaults for recaser 2015-03-11 17:56:02 +00:00
Philipp Koehn
2ce45229f8 better default configuration for recaser 2015-03-11 17:52:30 +00:00
Philipp Koehn
1632c5f39d proper handling of specified configuration file 2015-03-11 16:49:20 +00:00
Matthias Huck
01bed83cf9 GHKM extraction: option to strip non-terminal labels from BitPar syntactic parses right during extraction (i.e., remove any suffix starting with a hyphen from the label) 2015-03-10 21:25:32 +00:00
Phil Williams
9e2eb702dc EMS: add TRAINING:use-syntax-input-weight-feature option 2015-03-10 11:40:49 +00:00
Phil Williams
7eba58b942 EMS: add TRAINING:dont-tune-glue-grammar option
Adds -dont-tune-glue-grammar to train-model.perl command during config file
generation step.  This is preferable to manually adding -dont-tune-glue-grammar
to TRAINING:training-options because changing its value won't trigger a re-run
of dependent steps that don't really need re-running (like word alignment).
2015-03-10 10:20:19 +00:00
Matthias Huck
25f5470216 GHKM: write target parts-of-speech as a factor 2015-03-09 21:54:03 +00:00
Rico Sennrich
2431f514dd fix EMS bug from dca8dd: cleaning step was skipped 2015-03-05 10:55:35 +00:00
Rico Sennrich
47c460fe1d remove unused variable 2015-03-05 08:31:50 +00:00
Matthias Huck
06e87d851e GHKM: extract POS phrase property (from preterminals in the syntactic parse tree) 2015-03-04 21:40:56 +00:00
Rico Sennrich
dca8ddc746 EMS convenience:
- merge clean-corpus-n-ratio.perl and clean-corpus-n.perl (use variable 'cleaner' in EMS to call cleaning script with extra arguments)
- use low default weight for glue rules in syntax systems (especially useful with 'tuneable=false')
2015-03-04 14:43:05 +00:00
Phil Williams
90e8d4940c EMS: add TRAINING:no-glue-grammar option 2015-03-03 12:36:09 +00:00
Philipp Koehn
39c0068e4f discount_fallback for lmplz 2015-02-26 22:21:50 +00:00
Hieu Hoang
6186262a3b don't use processPhraseTable in EMS 2015-01-12 12:43:51 +00:00
Hieu Hoang
0a707597d8 Revert "Added error message on experiment.meta for the filter step 'No phrases in'"
This reverts commit 2105423626.
2015-01-03 21:58:15 +05:30
Eleftherios Avramidis
2105423626 Added error message on experiment.meta for the filter step 'No phrases in' 2014-12-28 18:09:33 +01:00
Philipp Koehn
831f947874 long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works 2014-12-21 01:14:42 +00:00
Phil Williams
1353aa57dc experiment.meta: fixes for $input-parse-relaxer 2014-12-08 16:26:08 +00:00
Philipp Koehn
9d55ce13c0 change for thot integration 2014-12-02 14:05:56 -05:00
Phil Williams
59a1ce7380 substitute-filtered-tables.perl: check for RuleTable feature 2014-11-06 11:14:51 +00:00
Phil Williams
5240c430ce Merge s2t branch
This adds a new string-to-tree decoder, which can be enabled with the -s2t
option.  It's intended to be faster and simpler than the generic chart
decoder, and is designed to support lattice input (still WIP).  For an en-de
system trained on WMT14 data, it's approximately 40% faster in practice.

For background information, see the decoding section of the EMNLP tutorial
on syntax-based MT:

  http://www.emnlp2014.org/tutorials/5_notes.pdf

Some features are not implemented yet, including support for internal tree
structure and soft source-syntactic constraints.
2014-11-04 13:13:56 +00:00
Rico Sennrich
df74aa3e89 use short names for sparse features to save disk space and I/O when tuning 2014-10-17 10:36:51 +01:00
Philipp Koehn
2638ff0480 added thot to EMS 2014-10-14 10:13:16 -04:00
Phil Williams
07dbd191ed analysis.perl: update regexp for current trace format 2014-10-13 10:55:07 +01:00
Hieu Hoang
610090c2ed don't run truecase trainer unless it's asked for 2014-09-23 21:50:53 +01:00
Philipp Koehn
a8659d1399 support for specified weights 2014-09-21 06:01:16 +01:00
Philipp Koehn
acefdb0262 bug fix for final-step 2014-09-21 05:59:21 +01:00
Philipp Koehn
a574454635 bug fix with delete crashed step output files 2014-08-14 14:14:42 -04:00
Philipp Koehn
7a087f24df also delete interrupted steps 2014-08-14 10:15:58 -04:00
Matthias Huck
c27cbf55ea source labels: integration into EMS 2014-08-07 21:02:51 +01:00
Matthias Huck
3a5dee12e8 implementation of phrase orientation in GHKM extraction
(...but a corresponding feature function for the chart-based decoder has not been written yet)
2014-07-28 18:27:12 +01:00
phikoehn
573076976f added transliteration into ems example config, minor fixes 2014-07-23 15:44:55 +01:00
phikoehn
2d11fe3916 Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder 2014-07-23 15:40:04 +01:00
phikoehn
2239501b21 allow specification of weights for lm interpolation 2014-07-23 15:39:42 +01:00
Philipp Koehn
55ae15a6f8 integration of Uli Germann's memory mapped suffix array phrase table into EMS 2014-07-22 10:12:14 -04:00
Philipp Koehn
36919b53a7 example files for memory mapped suffix array phrase table by Uli Germann 2014-07-22 10:10:34 -04:00
Matthias Huck
c2644c9a08 typo in log output 2014-06-16 15:10:53 +01:00
XapaJIaMnu
5c6c44291d Python2 scripts should require python2 specifically 2014-06-13 15:44:30 +01:00
Matthias Huck
02848112d8 experiment.meta: skip-parse and mock-parse 2014-06-11 19:06:04 +01:00
phikoehn
45648d03b9 support for lmplz training of osm in ems 2014-06-11 13:44:02 +01:00
phikoehn
76859cf37b default OSM training to lmplz (kenlm) 2014-06-08 06:13:46 +01:00
phikoehn
ceadacd3af Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder 2014-06-05 21:33:35 +01:00
Philipp Koehn
15288213be allow < > in factors 2014-06-05 21:31:09 +01:00
Barry Haddow
ea37abbf45 lm query output has been modified 2014-06-03 22:13:25 +01:00
Philipp Koehn
b1d09352ca change default language model training to kenlm 2014-06-02 22:20:16 -04:00
phikoehn
9a91f423e4 fixed error 2014-05-30 08:30:06 +01:00
Hieu Hoang
9615f4636c change error to warning. Seems to work ok with recaser 2014-05-30 05:40:22 +01:00
Philipp Koehn
7adf705a9e Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
merge with upstream
2014-05-28 03:59:16 -04:00
Philipp Koehn
1b26f37f9a allow specification of final step and final outcome in experiment.perl 2014-05-28 03:58:14 -04:00
phikoehn
f7ec76654e minor in experiment.meta: proper multi-bleu, beautify 2014-05-28 01:39:53 +01:00
phikoehn
f4651751e6 Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder 2014-05-28 01:39:00 +01:00
phikoehn
d93da293ad transliteration with multiple language models, other minor 2014-05-28 01:38:26 +01:00
Hieu Hoang
cfe22c29bc Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-05-25 10:06:51 +01:00
Hieu Hoang
b724a90d59 more edinburgh servers 2014-05-25 10:06:40 +01:00
Philipp Koehn
2876613973 fixes to delete-crashed and delete-run 2014-05-23 15:02:22 -04:00
Philipp Koehn
85ea9d552a fixes to delete-crashed and delete-run 2014-05-23 15:01:53 -04:00
Philipp Koehn
dd9a59499f progress on deleting steps and runs 2014-05-21 11:16:40 -04:00
Philipp Koehn
aac51cec89 ems: delete a run. may work. 2014-05-16 16:57:34 -04:00
Your Name
93d2d19c3e delete crashed steps 2014-05-08 16:42:11 -04:00
Ulrich Germann
3d4ab5a0d9 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder into dynamic-phrase-tables 2014-04-28 10:22:12 +01:00
Ulrich Germann
7c145d045b Merge branch 'master' into dynamic-phrase-tables
Conflicts:
	contrib/server/Jamfile
	contrib/server/mosesserver.cpp
2014-04-28 10:00:07 +01:00
Philipp Koehn
fa85d15e31 relative links to javascript sub directory in ems web interface 2014-04-23 16:56:10 -04:00
phikoehn
4ee4e07c1b minor ems fixes 2014-04-23 13:50:08 +01:00
Nadir Durrani
5e3e50d4ec In-Decoding Transliteration Module 2014-04-16 17:28:49 +01:00
Ulrich Germann
fbb4b59084 Added option to disable output buffering to split-sentences.perl. 2014-04-16 02:40:23 +01:00
Rico Sennrich
c8682e9420 target-syntax: use SoftMatchingFeature to assign non-terminal to unknown words 2014-03-24 14:57:24 +00:00
Phil Williams
cea86d6750 Transliteration support for syntax models. 2014-03-21 22:13:38 +00:00
Hieu Hoang
c501e5fab6 accidental error in perl script 2014-03-14 09:04:49 +00:00
Nadir Durrani
054a648713 Transliteration Script - Modifications 2014-03-13 13:10:38 +00:00
phikoehn
049be8b71c Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder 2014-02-12 21:01:09 +00:00
phikoehn
d6b62db5b1 fix bug if interpolated lm on different factors 2014-02-12 21:00:55 +00:00
Matthias Huck
65811a0325 tree fragments: tiny issues with the extraction pipeline 2014-02-03 18:13:10 +00:00
Hieu Hoang
dc3d5b8d38 source labelling for test set. 2014-01-24 16:33:30 +00:00
Hieu Hoang
878e7ab899 source labelling for tuning set. More debugging message in filtering script 2014-01-24 16:21:47 +00:00
Hieu Hoang
6a10f8ce71 corrected phrase-table name / type mixup when creating filtering script 2014-01-23 17:09:56 +00:00
Hieu Hoang
05de672bd8 need to 'label' target side too 2014-01-21 19:21:24 +00:00
Hieu Hoang
27152ccce4 add source labeller to EMS 2014-01-20 23:26:06 +00:00
phikoehn
4e75911331 changed biconcor location in EMS example config files 2014-01-16 13:58:45 +00:00
Hieu Hoang
ebc724b3de Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-01-12 13:51:04 +00:00
Hieu Hoang
a975e3d32d Add Exception as a keyword for detecting error in EMS step 2014-01-12 13:50:01 +00:00
phikoehn
25553079d9 bug fix with sparse feature handling depending on word alignment in compact phrase table 2014-01-10 18:34:47 +00:00
phikoehn
9ea0f5dd0e reporting on init (pretty slow for binary phrase table!) and bug fix in experiment.perl with setting filter settings 2014-01-05 22:39:47 +00:00
phikoehn
c8b5cc4f0e avoid warning; 2013-12-31 19:21:28 +00:00
Nadir Durrani
7f75018349 Post-decoding Transliteration Script 2013-12-18 16:10:57 +00:00
Nadir Durrani
c291f859a0 Transliteration Mining 2013-12-16 18:19:44 +00:00
phikoehn
dab6a301fa make reference-from-sgm.perl more robust 2013-11-30 02:00:04 +00:00
Hieu Hoang
f85d26ec60 delete reuse-weights.perl 2013-11-12 12:19:44 +00:00
Hieu Hoang
df3f3d130f reuse-weights.perl --> substitute-weights.perl 2013-11-12 12:07:06 +00:00
JIDFmaster@JIDF.org
f68a92e9c1 correcting the reuse-weights.perl for a new format 2013-11-12 00:38:03 +01:00
Barry Haddow
97695164dd Basic support for WADE analysis
Partial support for running WADE (http://www.umiacs.umd.edu/~hal/damt/)
analysis from ems. You still need to create the input-reference alignments
somehow - for example by running training with the test set concatenated
to the training set.

To use WADE, (i) add 'wade = /path/to/wade.py' to the EVALUATION section and
(ii) add 'alignment = /path/to/alignments' to the appropriate stanza
for each test set.
2013-11-01 16:56:55 +00:00
phikoehn
29d2c015a3 removed spurious $input-extension 2013-10-11 02:02:01 +01:00
Hieu Hoang
e8951c9243 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-10-10 21:30:17 +00:00
Hieu Hoang
9cbfa50102 add actual config file and run command used to train and decode this example 2013-10-10 21:30:07 +00:00
phikoehn
75e007d0f3 minor fixes 2013-10-10 10:12:56 +01:00
phikoehn
1e702c46b2 updated web interface for experiment.perl 2013-09-25 23:16:53 +01:00
Barry Haddow
ef43d6e038 Need phrase penalty weight 2013-09-11 10:59:48 +01:00
Barry Haddow
03997dfc3a Change number of weights in example 2013-09-11 10:41:17 +01:00
phikoehn
9f40416ee4 changed default for hierarchical phrase table binarization 2013-09-08 18:30:48 +01:00
Barry Haddow
867c6efe6c Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-09-06 13:29:29 +01:00
Barry Haddow
7425036c3a scoring correction 2013-09-06 13:29:20 +01:00
Nadir Durrani
4156c7acb6 Config files 2013-08-27 13:47:09 +01:00
Nadir Durrani
696c0eff61 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2013-08-26 13:22:57 +01:00
nadir
fb35e1f3c9 Training Scripts for Factored OSM 2013-08-26 13:21:04 +01:00
phikoehn
79a2c98ff7 better ems support for different binarizers and reordering models 2013-08-25 20:30:37 +01:00
phikoehn
b368085609 xml constraint 2013-08-15 11:46:45 +01:00
Hieu Hoang
02c7af3fb8 Mira changes. Manually applied Eva's patch 2013-08-12 13:03:26 +01:00
Hieu Hoang
b05a443f36 correct arguments to substitute-filtered-tables-and-weights.perl 2013-07-30 11:14:17 +01:00
Barry Haddow
46ee1ca42d More lattice fixes squashed by merge 2013-07-24 16:09:32 +01:00
Barry Haddow
d5e40a5b08 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 11:38:23 +01:00
Nadir Durrani
30544ae17e Sample Config File 2013-07-23 12:29:23 +01:00
Nadir Durrani
61e56ecdcd Sample Config File 2013-07-23 12:18:57 +01:00
Barry Haddow
ecc6c7177c Reinstate lattice fixes squashed by merge 2013-07-22 17:25:01 +01:00
Nadir Durrani
deae3ac7b9 OSM entries in train-model.perl, experiment.* 2013-07-07 13:05:09 +01:00
Nadir Durrani
d2bc6a2584 In EMS 2013-07-04 19:58:19 +01:00
Nadir Durrani
fbdb07a94c EMS 2013-07-03 10:54:38 +01:00
Wilker Aziz
2c19238c24 Patching up the suffix array wrappers 2013-06-24 15:38:10 +01:00