2052 Commits

Author SHA1 Message Date
Hieu Hoang
f66beabf4f Generation error in EMS due to pruning. Let's see if this works. 2015-06-28 14:03:54 +04:00
Hieu Hoang
82edbb98a7 comments in ini file about default weights 2015-06-28 10:40:43 +04:00
Hieu Hoang
57e213ed19 tighten up extract-parallel on osx. Can now use gsplit and bsd split 2015-06-26 12:18:21 +04:00
Hieu Hoang
ca54852641 tighten up extract-parallel on osx. Can now use gsplit and bsd split 2015-06-26 11:55:24 +04:00
Hieu Hoang
b83803203e prune generation table in ems 2015-06-25 18:10:31 +04:00
Hieu Hoang
dce0f33270 prune generation table in ems 2015-06-24 18:35:59 +04:00
Barry Haddow
425118aa5d bugfixes - working directory 2015-06-17 09:32:29 +01:00
Barry Haddow
ad8114ddb0 capitalisation 2015-06-15 16:23:12 +01:00
XapaJIaMnu
166bf7365f Forgot to update the weight config path 2015-06-12 16:56:36 +01:00
XapaJIaMnu
ffd3f2bb6e Added basic BilingualNPLM support to EMS and an example config. 2015-06-12 16:21:24 +01:00
Jeroen Vermeulen
dbcc264506 Remove unneeded script.
Tom Hoar, the author of this script, asked me to remove it because it
doesn't actually do what the current name says, and can't work without
an additional script which isn't in the repository.
2015-06-09 23:10:27 +07:00
Lexi Birch
b76194a16b Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-06-08 17:13:00 +01:00
Lexi Birch
501c51947b Allowing the truecaser to work on uncased ASR input, pass the -a flag 2015-06-08 16:58:50 +01:00
Jeroen Vermeulen
85c23ed7dc Fix some JS lint. 2015-06-02 18:05:12 +07:00
Jeroen Vermeulen
b3e577be76 Fixing lint. Only 600 or so lines of errors left! 2015-06-02 17:29:32 +07:00
Jeroen Vermeulen
0981d23705 Lint-fixing binge. 2015-06-02 16:02:39 +07:00
Rico Sennrich
5d8af9c289 support memory-mapped files for NPLM training 2015-05-29 16:07:26 +01:00
Jeroen Vermeulen
ef028446f3 Add license notices to scripts.
This is not pleasant to read (and much, much less pleasant to write!) but
sort of necessary in an open project.  Right now it's quite hard to figure
out what is licensed how, which doesn't matter much to most people but can
suddenly become very important when people want to know what they're being
allowed to do.

I kept the notices as short as I could.  As far as I could see, everything
without a clear license notice is LGPL v2.1 or later.
2015-05-29 18:30:26 +07:00
Jeroen Vermeulen
26170a4179 Friendlier error reporting in beautify.py. 2015-05-29 09:37:37 +07:00
Barry Haddow
c27aa193ea Revert "Min score parameter". Doesn't work without filter.
This reverts commit ab2d396781.
2015-05-28 17:44:26 +01:00
Barry Haddow
ab2d396781 Min score parameter 2015-05-28 17:10:21 +01:00
Phil Williams
842fc9780e senna2brackets.py: bug fixes + clean-up 2015-05-27 20:33:43 +01:00
Phil Williams
c086a8ee50 Add a wrapper script for parsing English text with SENNA 2015-05-26 16:44:13 +01:00
Rico Sennrich
f6f56d11af ems: parse-relax comes last in train; do same for dev/test 2015-05-25 15:52:07 +01:00
Hieu Hoang
582a845524 don't use zcat 2015-05-24 20:04:01 +04:00
Rico Sennrich
43527c82fc training script for monolingual Neural LM
(+bugfixes and usability improvements for RDLM training)
2015-05-22 15:31:08 +01:00
Rico Sennrich
a1678187fe wrapper for stanford dependency parser 2015-05-22 15:28:42 +01:00
Rico Sennrich
98ff2382d0 duplication of existing functionality 2015-05-20 17:35:38 +01:00
Rico Sennrich
6aac7ded9a EMS: more flexible way to concatenate LM training data.
the implementation allows the user to specify which corpora to combine,
and to have multiple LMs on the same data.
2015-05-20 17:20:02 +01:00
Hieu Hoang
36caf2eb9a escape ^# character, otherwise Morfessor skips the line 2015-05-20 15:28:28 +04:00
Hieu Hoang
79ca96db0a should have tested this 2015-05-19 22:19:11 +04:00
Hieu Hoang
59071bf16c run on all cores if number of cores not given 2015-05-19 18:32:31 +04:00
Rico Sennrich
8ca6764c7d ems: allow LMs with user-specified training commands and moses.ini config entries
intended for neural LMs, syntactic LMs, and the like. currently doesn't play nice with INTERPOLATED-LM.
2015-05-18 19:07:37 +01:00
Rico Sennrich
fb06a2325e fix broken ems with interpolated lm disabled 2015-05-18 17:26:09 +01:00
Rico Sennrich
f85dd85f6b ignore-unless magic 2015-05-18 16:17:33 +01:00
Rico Sennrich
59376f500b still confused about pass-unless vs. ignore-unless 2015-05-18 14:40:56 +01:00
Rico Sennrich
45a97f9016 EMS: disable concatenated LM by default 2015-05-18 14:10:29 +01:00
Hieu Hoang
2c0aecb16b Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-18 16:27:03 +04:00
Hieu Hoang
2f0ee5502e delete debugging info 2015-05-18 16:26:26 +04:00
Rico Sennrich
27fd45d088 ems: training LM on concatenation of all LM training corpora 2015-05-18 12:18:49 +01:00
Hieu Hoang
14d2a67193 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-18 12:28:35 +04:00
Hieu Hoang
5fdcf372ae rename a python file to have .py, instead of .perl. In case beautify script depends on it 2015-05-18 12:27:35 +04:00
Jeroen Vermeulen
5aa70c6cdd Also reformat Perl, using Perltidy. 2015-05-18 00:45:15 +07:00
Jeroen Vermeulen
494c20f634 Add note about Perltidy. 2015-05-17 22:48:03 +07:00
Jeroen Vermeulen
e2a632a2b8 JavaScript lint. 2015-05-17 21:36:07 +07:00
Jeroen Vermeulen
5d0bbb6a45 Fix some JavaScript lint. Still a lot left. 2015-05-17 21:24:04 +07:00
Jeroen Vermeulen
a25193cc5d Fix a lot of lint, mostly trailing whitespace.
This is lint reported by the new lint-checking functionality in beautify.py.
(We can change to a different lint checker if we have a better one, but it
would probably still flag these same problems.)

Lint checking can help a lot, but only if we get the lint under control.
2015-05-17 20:04:04 +07:00
Jeroen Vermeulen
108da16374 Suppress CSS lint checking; accept longer lines.
The CSS style that Pocketlint expects is just too different from what we
have.  Don't check those files for now.

Also, a maximum line length of 300 still gives too many warnings, so I'm
regretfully bumping the default to 400 characters.  The traditional 80
characters are already longer than the measured optimum for human reading,
so I hope some day we can address this!
2015-05-17 20:03:27 +07:00
Jeroen Vermeulen
07a8fe06aa Also support checking for lint.
Choose which action(s) you want for each run: --format and/or --lint.

Many different types of files are lint-checkable, but you need Pocketlint
installed (plus ideally, its plugins for the various languages).

Also, added option to control batching of the commands.
2015-05-17 18:25:06 +07:00
Jeroen Vermeulen
9bdcb5f7c1 Fix more Python lint.
This is about the last that isn't in contrib or generated files.  At this
point we can start doing regular lint checks, at least on the Python files,
without being completely inundated with warnings.
2015-05-16 18:03:54 +07:00