451 Commits

Author SHA1 Message Date
Barry Haddow
ad8114ddb0 capitalisation 2015-06-15 16:23:12 +01:00
XapaJIaMnu
166bf7365f Forgot to update the weight config path 2015-06-12 16:56:36 +01:00
XapaJIaMnu
ffd3f2bb6e Added basic BilingualNPLM support to EMS and an example config. 2015-06-12 16:21:24 +01:00
Jeroen Vermeulen
85c23ed7dc Fix some JS lint. 2015-06-02 18:05:12 +07:00
Jeroen Vermeulen
0981d23705 Lint-fixing binge. 2015-06-02 16:02:39 +07:00
Jeroen Vermeulen
ef028446f3 Add license notices to scripts.
This is not pleasant to read (and much, much less pleasant to write!) but
sort of necessary in an open project.  Right now it's quite hard to figure
out what is licensed how, which doesn't matter much to most people but can
suddenly become very important when people want to know what they're being
allowed to do.

I kept the notices as short as I could.  As far as I could see, everything
without a clear license notice is LGPL v2.1 or later.
2015-05-29 18:30:26 +07:00
Rico Sennrich
f6f56d11af ems: parse-relax comes last in train; do same for dev/test 2015-05-25 15:52:07 +01:00
Rico Sennrich
98ff2382d0 duplication of existing functionality 2015-05-20 17:35:38 +01:00
Rico Sennrich
6aac7ded9a EMS: more flexible way to concatenate LM training data.
the implementation allows the user to specify which corpora to combine,
and to have multiple LMs on the same data.
2015-05-20 17:20:02 +01:00
Rico Sennrich
8ca6764c7d ems: allow LMs with user-specified training commands and moses.ini config entries
intended for neural LMs, syntactic LMs, and the like. currently doesn't play nice with INTERPOLATED-LM.
2015-05-18 19:07:37 +01:00
Rico Sennrich
fb06a2325e fix broken ems with interpolated lm disabled 2015-05-18 17:26:09 +01:00
Rico Sennrich
f85dd85f6b ignore-unless magic 2015-05-18 16:17:33 +01:00
Rico Sennrich
59376f500b still confused about pass-unless vs. ignore-unless 2015-05-18 14:40:56 +01:00
Rico Sennrich
45a97f9016 EMS: disable concatenated LM by default 2015-05-18 14:10:29 +01:00
Rico Sennrich
27fd45d088 ems: training LM on concatenation of all LM training corpora 2015-05-18 12:18:49 +01:00
Jeroen Vermeulen
e2a632a2b8 JavaScript lint. 2015-05-17 21:36:07 +07:00
Jeroen Vermeulen
5d0bbb6a45 Fix some JavaScript lint. Still a lot left. 2015-05-17 21:24:04 +07:00
Jeroen Vermeulen
a25193cc5d Fix a lot of lint, mostly trailing whitespace.
This is lint reported by the new lint-checking functionality in beautify.py.
(We can change to a different lint checker if we have a better one, but it
would probably still flag these same problems.)

Lint checking can help a lot, but only if we get the lint under control.
2015-05-17 20:04:04 +07:00
Jeroen Vermeulen
61162dd242 Fix more Python lint.
Most of the complaints fixed here were from Pocketlint, but many were also
from Syntastic the vim plugin.
2015-05-16 17:26:56 +07:00
Hieu Hoang
abfc0671a3 osm tweaks and morfessor wrapper 2015-05-12 20:19:39 +04:00
Hieu Hoang
8bb18b9ff0 add no-splitter-training argument. Splitter to be used by mada 2015-05-11 15:26:50 +04:00
Barry Haddow
85c1af4d72 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-08 09:16:55 +01:00
Barry Haddow
f403f5e478 mmsapt doesn't require feature weights on first tuning iteration 2015-05-08 09:16:51 +01:00
Hieu Hoang
2acb590394 output bleu for multi-bleu hack 2015-05-05 17:54:35 +04:00
Hieu Hoang
d006c6ef8c don't output remaining args twice 2015-05-05 12:15:08 +04:00
Hieu Hoang
8f272e04a9 output debugging messages to stderr, not stdout 2015-05-05 12:01:21 +04:00
Hieu Hoang
d456d9229e add multi-bleu-detok. Like multi-bleu scoring but will detokenize/post-process before scoring 2015-05-03 14:07:12 +04:00
Philipp Koehn
a4a7c14593 allow breaking up training data for fast align (to avoid memory blowups for very large corpora) 2015-05-01 17:47:08 -04:00
Philipp Koehn
b369699661 various small changes, mostly related to better compliance with grid engine 2015-05-01 17:44:18 -04:00
Rico Sennrich
e98a2fc980 fix interpolation for LM with parser in pre-processing 2015-04-30 15:46:33 +01:00
Hieu Hoang
4b47e1148c use ignore-unless /Philipp Koehn 2015-04-22 23:02:57 +04:00
Hieu Hoang
40933b4a78 hack to allow target side of tokenized parallel corpus to be used for LM 2015-04-22 19:01:12 +04:00
Hieu Hoang
ab01d30687 make sure GetOptions doesn't consume -T by confusing it with --text 2015-04-21 17:53:46 +04:00
Rico Sennrich
15d3c3f259 be more tolerant about xml input 2015-04-21 14:04:25 +01:00
Rico Sennrich
5a3d5b6bdd EMS: LM:mock-parse can be actual parser 2015-04-21 10:21:24 +01:00
Hieu Hoang
1b9dc6cfae more butinah tweaks 2015-04-19 11:50:50 +04:00
Hieu Hoang
637e8a17e8 add pre-tokenization cleaning script, in case training data has bad, overly long lines which blow up some taggers/segmenters, e.g. mada 2015-04-19 11:21:07 +04:00
Hieu Hoang
6162223690 add use warnings to all perl scripts 2015-04-13 20:42:33 +04:00
Dingyuan Wang
4aba64ed53 Merge pull request #106 from gumblex/master
Fix some problems in EMS
2015-04-11 09:26:25 +08:00
Hieu Hoang
02185a85fb store temp run files in current directory, not /tmp 2015-04-05 17:02:48 +04:00
Hieu Hoang
93ad52d2f9 leave in runPath for debugging 2015-04-05 16:49:12 +04:00
Hieu Hoang
7ffdddef13 script to submit ems job to grid engine as 1 job. Hardcoded for NYUAD at the mo 2015-04-05 16:44:24 +04:00
Dingyuan Wang
aea07b0a19 Fix some problems in EMS:
* remove absolute links
* fix coverage bar highlighting
* change Base64 library to support UTF-8
2015-04-03 23:47:25 +08:00
Hieu Hoang
b2f9ba2b64 revert last commit to add MASTER_PATH. Not needed 2015-04-02 19:29:42 +04:00
Hieu Hoang
27b36e0c96 pass in PATH variable from master node, for when you're running on a grid but really just qsubbing everything to 1 slave node 2015-04-02 19:15:21 +04:00
Hieu Hoang
2d1da3219d consistently use 'env perl' command for environments where the 1st perl in PATH isn't the default perl. Which is kinda stupid 2015-04-02 17:38:56 +04:00
Hieu Hoang
e22d275c32 don't ignore lowercasing of factored LM. Must be consistent with pt 2015-04-01 23:25:57 +04:00
Phil Williams
6ce3060dd8 lmplz-wrapper.perl: use Getopt::Long's "pass_through" option
This avoids the need to duplicate all of lmplz's options in the wrapper and
it prevents --prune 0 0 1 from being truncated to --prune 0 if the user forgets
to quote the arguments.
2015-03-30 10:18:51 +01:00
Rico Sennrich
3a673fc8dc EMS: support for syntactic metrics for MERT/MIRA
- add "-n-best-trees" to TUNING:decoder-settings
- add "mock-output-parser-references = $output-parser" to GENERAL (and define output-parser)
 - TUNING:tuning-settings should include the metric you want to optimize (e.g. "-batch-mira-args='--sctype BLEU,HWCM'")
2015-03-20 17:15:33 +00:00
Phil Williams
fc15e03ebe Replace truecase-egret.sh with more general tree-converter-wrapper.perl 2015-03-18 09:57:42 +00:00