Barry Haddow
ad8114ddb0
capitalisation
2015-06-15 16:23:12 +01:00
XapaJIaMnu
166bf7365f
Forgot to update the weight config path
2015-06-12 16:56:36 +01:00
XapaJIaMnu
ffd3f2bb6e
Added basic BilingualNPLM support to EMS and an example config.
2015-06-12 16:21:24 +01:00
Jeroen Vermeulen
85c23ed7dc
Fix some JS lint.
2015-06-02 18:05:12 +07:00
Jeroen Vermeulen
0981d23705
Lint-fixing binge.
2015-06-02 16:02:39 +07:00
Jeroen Vermeulen
ef028446f3
Add license notices to scripts.
This is not pleasant to read (and much, much less pleasant to write!) but
sort of necessary in an open project. Right now it's quite hard to figure
out what is licensed how, which doesn't matter much to most people but can
suddenly become very important when people want to know what they're being
allowed to do.
I kept the notices as short as I could. As far as I could see, everything
without a clear license notice is LGPL v2.1 or later.
2015-05-29 18:30:26 +07:00
Rico Sennrich
f6f56d11af
ems: parse-relax comes last in train; do same for dev/test
2015-05-25 15:52:07 +01:00
Rico Sennrich
98ff2382d0
duplication of existing functionality
2015-05-20 17:35:38 +01:00
Rico Sennrich
6aac7ded9a
EMS: more flexible way to concatenate LM training data.
the implementation allows the user to specify which corpora to combine,
and to have multiple LMs on the same data.
2015-05-20 17:20:02 +01:00
Rico Sennrich
8ca6764c7d
ems: allow LMs with user-specified training commands and moses.ini config entries
Intended for neural LMs, syntactic LMs, and the like. Currently doesn't play nice with INTERPOLATED-LM.
2015-05-18 19:07:37 +01:00
Rico Sennrich
fb06a2325e
fix broken ems with interpolated lm disabled
2015-05-18 17:26:09 +01:00
Rico Sennrich
f85dd85f6b
ignore-unless magic
2015-05-18 16:17:33 +01:00
Rico Sennrich
59376f500b
still confused about pass-unless vs. ignore-unless
2015-05-18 14:40:56 +01:00
Rico Sennrich
45a97f9016
EMS: disable concatenated LM by default
2015-05-18 14:10:29 +01:00
Rico Sennrich
27fd45d088
ems: training LM on concatenation of all LM training corpora
2015-05-18 12:18:49 +01:00
Jeroen Vermeulen
e2a632a2b8
JavaScript lint.
2015-05-17 21:36:07 +07:00
Jeroen Vermeulen
5d0bbb6a45
Fix some JavaScript lint. Still a lot left.
2015-05-17 21:24:04 +07:00
Jeroen Vermeulen
a25193cc5d
Fix a lot of lint, mostly trailing whitespace.
This is lint reported by the new lint-checking functionality in beautify.py.
(We can change to a different lint checker if we have a better one, but it
would probably still flag these same problems.)
Lint checking can help a lot, but only if we get the lint under control.
2015-05-17 20:04:04 +07:00
Jeroen Vermeulen
61162dd242
Fix more Python lint.
Most of the complaints fixed here were from Pocketlint, but many were also
from Syntastic, the vim plugin.
2015-05-16 17:26:56 +07:00
Hieu Hoang
abfc0671a3
osm tweaks and morfessor wrapper
2015-05-12 20:19:39 +04:00
Hieu Hoang
8bb18b9ff0
add no-splitter-training argument. Splitter to be used by mada
2015-05-11 15:26:50 +04:00
Barry Haddow
85c1af4d72
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2015-05-08 09:16:55 +01:00
Barry Haddow
f403f5e478
mmsapt doesn't require feature weights on first tuning iteration
2015-05-08 09:16:51 +01:00
Hieu Hoang
2acb590394
output bleu for multi-bleu hack
2015-05-05 17:54:35 +04:00
Hieu Hoang
d006c6ef8c
don't output remaining args twice
2015-05-05 12:15:08 +04:00
Hieu Hoang
8f272e04a9
output debugging messages to stderr, not stdout
2015-05-05 12:01:21 +04:00
Hieu Hoang
d456d9229e
add multi-bleu-detok. Like multi-bleu scoring but will detokenize/post-process before scoring
2015-05-03 14:07:12 +04:00
Philipp Koehn
a4a7c14593
allow breaking up training data for fast align (to avoid memory blowups for very large corpora)
2015-05-01 17:47:08 -04:00
Philipp Koehn
b369699661
various small changes, mostly related to better compliance with grid engine
2015-05-01 17:44:18 -04:00
Rico Sennrich
e98a2fc980
fix interpolation for LM with parser in pre-processing
2015-04-30 15:46:33 +01:00
Hieu Hoang
4b47e1148c
use ignore-unless /Philipp Koehn
2015-04-22 23:02:57 +04:00
Hieu Hoang
40933b4a78
hack to allow target side of tokenized parallel corpus to be used for LM
2015-04-22 19:01:12 +04:00
Hieu Hoang
ab01d30687
make sure GetOptions doesn't consume -T by confusing it with --text
2015-04-21 17:53:46 +04:00
Rico Sennrich
15d3c3f259
be more tolerant about xml input
2015-04-21 14:04:25 +01:00
Rico Sennrich
5a3d5b6bdd
EMS: LM:mock-parse can be actual parser
2015-04-21 10:21:24 +01:00
Hieu Hoang
1b9dc6cfae
more butinah tweaks
2015-04-19 11:50:50 +04:00
Hieu Hoang
637e8a17e8
add pre-tokenization cleaning script, in case the training data has bad, overly long lines that blow up some taggers/segmenters, e.g. mada
2015-04-19 11:21:07 +04:00
Hieu Hoang
6162223690
add use warnings to all perl scripts
2015-04-13 20:42:33 +04:00
Dingyuan Wang
4aba64ed53
Merge pull request #106 from gumblex/master
Fix some problems in EMS
2015-04-11 09:26:25 +08:00
Hieu Hoang
02185a85fb
store temp run files in current directory, not /tmp
2015-04-05 17:02:48 +04:00
Hieu Hoang
93ad52d2f9
leave in runPath for debugging
2015-04-05 16:49:12 +04:00
Hieu Hoang
7ffdddef13
script to submit ems job to grid engine as 1 job. Hardcoded for NYUAD at the moment
2015-04-05 16:44:24 +04:00
Dingyuan Wang
aea07b0a19
Fix some problems in EMS:
* remove absolute links
* fix coverage bar highlighting
* change Base64 library to support UTF-8
2015-04-03 23:47:25 +08:00
Hieu Hoang
b2f9ba2b64
revert last commit to add MASTER_PATH. Not needed
2015-04-02 19:29:42 +04:00
Hieu Hoang
27b36e0c96
pass in PATH variable from master node, for when you're running on a grid but really just qsubbing everything to 1 slave node
2015-04-02 19:15:21 +04:00
Hieu Hoang
2d1da3219d
consistently use 'env perl' command for environments where the 1st perl in PATH isn't the default perl. Which is kinda stupid
2015-04-02 17:38:56 +04:00
Hieu Hoang
e22d275c32
don't ignore lowercasing of factored LM. Must be consistent with pt
2015-04-01 23:25:57 +04:00
Phil Williams
6ce3060dd8
lmplz-wrapper.perl: use Getopt::Long's "pass_through" option
This avoids the need to duplicate all of lmplz's options in the wrapper and
it prevents --prune 0 0 1 from being truncated to --prune 0 if the user forgets
to quote the arguments.
2015-03-30 10:18:51 +01:00
Rico Sennrich
3a673fc8dc
EMS: support for syntactic metrics for MERT/MIRA
- add "-n-best-trees" to TUNING:decoder-settings
- add "mock-output-parser-references = $output-parser" to GENERAL (and define output-parser)
- TUNING:tuning-settings should include the metric you want to optimize (e.g. "-batch-mira-args='--sctype BLEU,HWCM'")
2015-03-20 17:15:33 +00:00
Phil Williams
fc15e03ebe
Replace truecase-egret.sh with more general tree-converter-wrapper.perl
2015-03-18 09:57:42 +00:00