Barry Haddow
85c1af4d72
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2015-05-08 09:16:55 +01:00
Barry Haddow
f403f5e478
mmsapt doesn't require feature weights on first tuning iteration
2015-05-08 09:16:51 +01:00
Hieu Hoang
2acb590394
output bleu for multi-bleu hack
2015-05-05 17:54:35 +04:00
Hieu Hoang
d006c6ef8c
don't output remaining args twice
2015-05-05 12:15:08 +04:00
Hieu Hoang
8f272e04a9
output debugging messages to stderr, not stdout
2015-05-05 12:01:21 +04:00
Hieu Hoang
d456d9229e
add multi-bleu-detok. Like multi-bleu scoring but will detokenize/post-process before scoring
2015-05-03 14:07:12 +04:00
Philipp Koehn
a4a7c14593
allow breaking up training data for fast align (to avoid memory blowups for very large corpora)
2015-05-01 17:47:08 -04:00
Philipp Koehn
b369699661
various small changes, mostly related to better compliance with grid engine
2015-05-01 17:44:18 -04:00
Rico Sennrich
e98a2fc980
fix interpolation for LM with parser in pre-processing
2015-04-30 15:46:33 +01:00
Hieu Hoang
4b47e1148c
use ignore-unless /Philipp Koehn
2015-04-22 23:02:57 +04:00
Hieu Hoang
40933b4a78
hack to allow target side of tokenized parallel corpus to be used for LM
2015-04-22 19:01:12 +04:00
Hieu Hoang
ab01d30687
make sure GetOptions doesn't consume -T by confusing it with --text
2015-04-21 17:53:46 +04:00
Rico Sennrich
15d3c3f259
be more tolerant about xml input
2015-04-21 14:04:25 +01:00
Rico Sennrich
5a3d5b6bdd
EMS: LM:mock-parse can be actual parser
2015-04-21 10:21:24 +01:00
Hieu Hoang
1b9dc6cfae
more butinah tweaks
2015-04-19 11:50:50 +04:00
Hieu Hoang
637e8a17e8
add pre-tokenization cleaning script, in case training data has bad, overly long lines which blow up some taggers/segmenters, e.g. mada
2015-04-19 11:21:07 +04:00
Hieu Hoang
6162223690
add use warnings to all perl scripts
2015-04-13 20:42:33 +04:00
Dingyuan Wang
4aba64ed53
Merge pull request #106 from gumblex/master
Fix some problems in EMS
2015-04-11 09:26:25 +08:00
Hieu Hoang
02185a85fb
store temp run files in current directory, not /tmp
2015-04-05 17:02:48 +04:00
Hieu Hoang
93ad52d2f9
leave in runPath for debugging
2015-04-05 16:49:12 +04:00
Hieu Hoang
7ffdddef13
script to submit ems job to grid engine as 1 job. Hardcoded for NYUAD at the mo
2015-04-05 16:44:24 +04:00
Dingyuan Wang
aea07b0a19
Fix some problems in EMS:
* remove absolute links
* fix coverage bar highlighting
* change Base64 library to support UTF-8
2015-04-03 23:47:25 +08:00
Hieu Hoang
b2f9ba2b64
revert last commit to add MASTER_PATH. Not needed
2015-04-02 19:29:42 +04:00
Hieu Hoang
27b36e0c96
pass in PATH variable from master node. For when you're running on a grid but really just qsubbing everything to 1 slave node
2015-04-02 19:15:21 +04:00
Hieu Hoang
2d1da3219d
consistently use 'env perl' command for environments where the 1st perl in PATH isn't the default perl. Which is kinda stupid
2015-04-02 17:38:56 +04:00
Hieu Hoang
e22d275c32
don't ignore lowercasing of factored LM. Must be consistent with pt
2015-04-01 23:25:57 +04:00
Phil Williams
6ce3060dd8
lmplz-wrapper.perl: use Getopt::Long's "pass_through" option
This avoids the need to duplicate all of lmplz's options in the wrapper and
it prevents --prune 0 0 1 from being truncated to --prune 0 if the user forgets
to quote the arguments.
2015-03-30 10:18:51 +01:00
Rico Sennrich
3a673fc8dc
EMS: support for syntactic metrics for MERT/MIRA
- add "-n-best-trees" to TUNING:decoder-settings
- add "mock-output-parser-references = $output-parser" to GENERAL (and define output-parser)
- TUNING:tuning-settings should include the metric you want to optimize (e.g. "-batch-mira-args='--sctype BLEU,HWCM'")
2015-03-20 17:15:33 +00:00
Phil Williams
fc15e03ebe
Replace truecase-egret.sh with more general tree-converter-wrapper.perl
2015-03-18 09:57:42 +00:00
Phil Williams
0a8e5fb3bf
EMS: fix TRAINING:use-syntax-input-weight-feature option
2015-03-13 17:18:56 +00:00
Hieu Hoang
ce8b0e0876
fix example for reusing tuned moses.ini file
2015-03-13 15:07:23 +00:00
Philipp Koehn
530d0f5a11
some more better defaults for recaser
2015-03-11 17:56:02 +00:00
Philipp Koehn
2ce45229f8
better default configuration for recaser
2015-03-11 17:52:30 +00:00
Philipp Koehn
1632c5f39d
proper handling of specified configuration file
2015-03-11 16:49:20 +00:00
Matthias Huck
01bed83cf9
GHKM extraction: option to strip non-terminal labels from BitPar syntactic parses right during extraction (i.e., remove any suffix starting with a hyphen from the label)
2015-03-10 21:25:32 +00:00
Phil Williams
9e2eb702dc
EMS: add TRAINING:use-syntax-input-weight-feature option
2015-03-10 11:40:49 +00:00
Phil Williams
7eba58b942
EMS: add TRAINING:dont-tune-glue-grammar option
Adds -dont-tune-glue-grammar to train-model.perl command during config file
generation step. This is preferable to manually adding -dont-tune-glue-grammar
to TRAINING:training-options because changing its value won't trigger a re-run
of dependent steps that don't really need re-running (like word alignment).
2015-03-10 10:20:19 +00:00
Matthias Huck
25f5470216
GHKM: write target parts-of-speech as a factor
2015-03-09 21:54:03 +00:00
Rico Sennrich
2431f514dd
fix EMS bug from dca8dd: cleaning step was skipped
2015-03-05 10:55:35 +00:00
Rico Sennrich
47c460fe1d
remove unused variable
2015-03-05 08:31:50 +00:00
Matthias Huck
06e87d851e
GHKM: extract POS phrase property (from preterminals in the syntactic parse tree)
2015-03-04 21:40:56 +00:00
Rico Sennrich
dca8ddc746
EMS convenience:
- merge clean-corpus-n-ratio.perl and clean-corpus-n.perl (use variable 'cleaner' in EMS to call cleaning script with extra arguments)
- use low default weight for glue rules in syntax systems (especially useful with 'tuneable=false')
2015-03-04 14:43:05 +00:00
Phil Williams
90e8d4940c
EMS: add TRAINING:no-glue-grammar option
2015-03-03 12:36:09 +00:00
Philipp Koehn
39c0068e4f
discount_fallback for lmplz
2015-02-26 22:21:50 +00:00
Hieu Hoang
6186262a3b
don't use processPhraseTable in EMS
2015-01-12 12:43:51 +00:00
Hieu Hoang
0a707597d8
Revert "Added error message on experiment.meta for the filter step 'No phrases in'"
This reverts commit 2105423626.
2015-01-03 21:58:15 +05:30
Eleftherios Avramidis
2105423626
Added error message on experiment.meta for the filter step 'No phrases in'
2014-12-28 18:09:33 +01:00
Philipp Koehn
831f947874
long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works
2014-12-21 01:14:42 +00:00
Phil Williams
1353aa57dc
experiment.meta: fixes for $input-parse-relaxer
2014-12-08 16:26:08 +00:00
Philipp Koehn
9d55ce13c0
change for thot integration
2014-12-02 14:05:56 -05:00
Phil Williams
59a1ce7380
substitute-filtered-tables.perl: check for RuleTable feature
2014-11-06 11:14:51 +00:00
Phil Williams
5240c430ce
Merge s2t branch
This adds a new string-to-tree decoder, which can be enabled with the -s2t
option. It's intended to be faster and simpler than the generic chart
decoder, and is designed to support lattice input (still WIP). For an en-de
system trained on WMT14 data, it's approximately 40% faster in practice.
For background information, see the decoding section of the EMNLP tutorial
on syntax-based MT:
http://www.emnlp2014.org/tutorials/5_notes.pdf
Some features are not implemented yet, including support for internal tree
structure and soft source-syntactic constraints.
2014-11-04 13:13:56 +00:00
Rico Sennrich
df74aa3e89
use short names for sparse features to save disk space and I/O when tuning
2014-10-17 10:36:51 +01:00
Philipp Koehn
2638ff0480
added thot to EMS
2014-10-14 10:13:16 -04:00
Phil Williams
07dbd191ed
analysis.perl: update regexp for current trace format
2014-10-13 10:55:07 +01:00
Hieu Hoang
610090c2ed
don't run truecase trainer unless it's asked for
2014-09-23 21:50:53 +01:00
Philipp Koehn
a8659d1399
support for specified weights
2014-09-21 06:01:16 +01:00
Philipp Koehn
acefdb0262
bug fix for final-step
2014-09-21 05:59:21 +01:00
Philipp Koehn
a574454635
bug fix with delete crashed step output files
2014-08-14 14:14:42 -04:00
Philipp Koehn
7a087f24df
also delete interrupted steps
2014-08-14 10:15:58 -04:00
Matthias Huck
c27cbf55ea
source labels: integration into EMS
2014-08-07 21:02:51 +01:00
Matthias Huck
3a5dee12e8
implementation of phrase orientation in GHKM extraction
(...but a corresponding feature function for the chart-based decoder has not been written yet)
2014-07-28 18:27:12 +01:00
phikoehn
573076976f
added transliteration into ems example config, minor fixes
2014-07-23 15:44:55 +01:00
phikoehn
2d11fe3916
Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder
2014-07-23 15:40:04 +01:00
phikoehn
2239501b21
allow specification of weights for lm interpolation
2014-07-23 15:39:42 +01:00
Philipp Koehn
55ae15a6f8
integration of Uli Germann's memory mapped suffix array phrase table into EMS
2014-07-22 10:12:14 -04:00
Philipp Koehn
36919b53a7
example files for memory mapped suffix array phrase table by Uli Germann
2014-07-22 10:10:34 -04:00
Matthias Huck
c2644c9a08
typo in log output
2014-06-16 15:10:53 +01:00
XapaJIaMnu
5c6c44291d
Python2 scripts should require python2 specifically
2014-06-13 15:44:30 +01:00
Matthias Huck
02848112d8
experiment.meta: skip-parse and mock-parse
2014-06-11 19:06:04 +01:00
phikoehn
45648d03b9
support for lmplz training of osm in ems
2014-06-11 13:44:02 +01:00
phikoehn
76859cf37b
default OSM training to lmplz (kenlm)
2014-06-08 06:13:46 +01:00
phikoehn
ceadacd3af
Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder
2014-06-05 21:33:35 +01:00
Philipp Koehn
15288213be
allow < > in factors
2014-06-05 21:31:09 +01:00
Barry Haddow
ea37abbf45
lm query output has been modified
2014-06-03 22:13:25 +01:00
Philipp Koehn
b1d09352ca
change default language model training to kenlm
2014-06-02 22:20:16 -04:00
phikoehn
9a91f423e4
fixed error
2014-05-30 08:30:06 +01:00
Hieu Hoang
9615f4636c
change error to warning. Seems to work ok with recaser
2014-05-30 05:40:22 +01:00
Philipp Koehn
7adf705a9e
Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
merge with upstream
2014-05-28 03:59:16 -04:00
Philipp Koehn
1b26f37f9a
allow specification of final step and final outcome in experiment.perl
2014-05-28 03:58:14 -04:00
phikoehn
f7ec76654e
minor in experiment.meta: proper multi-bleu, beautify
2014-05-28 01:39:53 +01:00
phikoehn
f4651751e6
Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder
2014-05-28 01:39:00 +01:00
phikoehn
d93da293ad
transliteration with multiple language models, other minor
2014-05-28 01:38:26 +01:00
Hieu Hoang
cfe22c29bc
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2014-05-25 10:06:51 +01:00
Hieu Hoang
b724a90d59
more edinburgh servers
2014-05-25 10:06:40 +01:00
Philipp Koehn
2876613973
fixes to delete-crashed and delete-run
2014-05-23 15:02:22 -04:00
Philipp Koehn
85ea9d552a
fixes to delete-crashed and delete-run
2014-05-23 15:01:53 -04:00
Philipp Koehn
dd9a59499f
progress on deleting steps and runs
2014-05-21 11:16:40 -04:00
Philipp Koehn
aac51cec89
ems: delete a run. may work.
2014-05-16 16:57:34 -04:00
Your Name
93d2d19c3e
delete crashed steps
2014-05-08 16:42:11 -04:00
Ulrich Germann
3d4ab5a0d9
Merge branch 'master' of https://github.com/moses-smt/mosesdecoder into dynamic-phrase-tables
2014-04-28 10:22:12 +01:00
Ulrich Germann
7c145d045b
Merge branch 'master' into dynamic-phrase-tables
Conflicts:
contrib/server/Jamfile
contrib/server/mosesserver.cpp
2014-04-28 10:00:07 +01:00
Philipp Koehn
fa85d15e31
relative links to javascript sub directory in ems web interface
2014-04-23 16:56:10 -04:00
phikoehn
4ee4e07c1b
minor ems fixes
2014-04-23 13:50:08 +01:00
Nadir Durrani
5e3e50d4ec
In-Decoding Transliteration Module
2014-04-16 17:28:49 +01:00
Ulrich Germann
fbb4b59084
Added option to disable output buffering to split-sentences.perl.
2014-04-16 02:40:23 +01:00
Rico Sennrich
c8682e9420
target-syntax: use SoftMatchingFeature to assign non-terminal to unknown words
2014-03-24 14:57:24 +00:00
Phil Williams
cea86d6750
Transliteration support for syntax models.
2014-03-21 22:13:38 +00:00
Hieu Hoang
c501e5fab6
accidental error in perl script
2014-03-14 09:04:49 +00:00
Nadir Durrani
054a648713
Transliteration Script - Modifications
2014-03-13 13:10:38 +00:00