2352 Commits

Author SHA1 Message Date
Pierre Lison
e26bbc215f sending the stderr output of the which command to /dev/null 2015-09-16 11:01:22 +02:00
Ulrich Germann
2e3a82a40c Merge pull request #125 from akivajp/master
Fixed for removed option building cooccurrence table.
2015-09-12 11:32:27 +02:00
Akiva Miura
7b15548144 Fixed for removed option building cooccurrence table. 2015-09-07 06:21:04 +09:00
Barry Haddow
90f15cc619 extra nplm settings 2015-09-04 10:07:50 +01:00
Barry Haddow
4746970bf8 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-09-04 10:06:58 +01:00
Barry Haddow
e58ddf74e8 parameter changes 2015-09-04 10:06:54 +01:00
Tomáš Fulajtár
1a26cb8414 Added a simple support for the factored systems. 2015-08-27 15:15:32 +02:00
Hieu Hoang
d349bf8a94 dos2unix everything 2015-08-23 19:00:19 +04:00
Matthias Huck
d5c41634e8 EMS: fix filtering issue when output-splitter is defined 2015-08-21 18:58:36 +01:00
Matthias Huck
261cfdb024 perl shebang 2015-08-19 16:59:10 +01:00
Hieu Hoang
4ff776f564 dos2unix the whole lot 2015-08-19 16:30:31 +04:00
Hieu Hoang
97eed02301 dos2unix. Revert Matthias' change, use env 2015-08-19 13:25:29 +01:00
Matthias Huck
d583e0888f make-factor-de-pos.perl 2015-08-18 18:31:41 +01:00
Hieu Hoang
3a261c9fc9 don't hardcode amount of mem to be used by lmplz 2015-08-16 20:32:07 +04:00
Phil Williams
01a9dd2305 extract-target-trees.py: support for new-style trace files 2015-08-14 16:53:24 +01:00
Ulrich Germann
883c34aee9 Merge branch 'master' of http://github.com/moses-smt/mosesdecoder into mmt-dev
Conflicts:
	moses/SearchNormalBatch.cpp
	moses/TranslationModel/UG/mm/ug_bitext.h
	moses/TranslationModel/UG/mm/ug_typedefs.h
	moses/TranslationModel/UG/mmsapt.cpp
	moses/TranslationModel/UG/mmsapt.h
2015-08-07 14:14:19 +01:00
Barry Haddow
4c3a6a3f3f remove dash 2015-08-03 21:19:08 +01:00
Barry Haddow
57b0c351c0 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-08-03 16:47:31 +01:00
Barry Haddow
f808b32030 support version of nplm that picks best on heldout 2015-08-03 16:47:25 +01:00
Barry Haddow
a39544bbcb fix inconsistency in the example 2015-08-03 16:46:57 +01:00
Hieu Hoang
bfd45fdfc3 don't use all threads 2015-08-01 11:35:47 +04:00
Hieu Hoang
7ac6f90a4d Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-07-31 22:29:07 +04:00
Hieu Hoang
f894dec0fd multi-threaded decoding by default /Vincent Nguyen 2015-07-31 22:28:45 +04:00
Rico Sennrich
89d16a491a fix ems regression (concatenate-split step) 2015-07-31 11:20:29 +01:00
Philipp Koehn
a5ee3c1b6d script to copy model files to local disk before running the decoder - useful for grid 2015-07-29 11:10:13 -04:00
Philipp Koehn
836ca8212a better support of grid engine cluster 2015-07-29 11:03:24 -04:00
Philipp Koehn
b29166e2fe fix of fix 2015-07-29 11:02:55 -04:00
Philipp Koehn
837bcde69d bug with legacy 2015-07-29 10:46:09 -04:00
Philipp Koehn
ae9cd14948 fixes 2015-07-29 10:44:57 -04:00
Hieu Hoang
5f4b497b63 Merge pull request #119 from cidermole/master
Fix 'Use of uninitialized value' error through explicit setting of 0s…
2015-07-29 16:25:39 +04:00
Barry Haddow
3a2116b2c9 add quotes so arguments don't get lost 2015-07-29 09:35:19 +01:00
Rico Sennrich
0173512ddc separate xml version of combine_factors.pl
xml version causes slow-down for users who use factors, but not xml (which are most)
2015-07-28 23:30:18 +01:00
Phil Williams
2cda286a06 experiment.meta: re-run fast_align symmetrization if symmetrization type changes 2015-07-28 16:55:55 +01:00
Rico Sennrich
a968536176 ems fix: pass-unless doesn't understand AND 2015-07-28 16:37:50 +01:00
Ulrich Germann
d67723fd29 Merge branch 'master' of http://github.com/moses-smt/mosesdecoder into ranked-sampling
Conflicts:
	moses/TargetPhrase.cpp
	moses/TargetPhrase.h
2015-07-28 14:29:49 +01:00
Barry Haddow
e53ad40859 Support for nplm in ems 2015-07-23 10:37:26 +01:00
Philipp Koehn
1a795f549e only extract reordering phrase pairs if use mmsapt phrase table 2015-07-20 11:47:24 -04:00
Philipp Koehn
777a88673d compress sort tmp files by default 2015-07-20 11:46:47 -04:00
Philipp Koehn
496f8c6d85 only extract reordering phrase pairs if use mmsapt phrase table 2015-07-20 11:44:22 -04:00
Philipp Koehn
fcf2934a2f customized phrase table pruning step 2015-07-20 11:43:02 -04:00
Rico Sennrich
7b19d83d43 xml support for combine-factors.pl 2015-07-20 10:45:18 +01:00
Rico Sennrich
1b1bafb1e8 ems: add option to factorize after truecase/split/etc. 2015-07-20 10:43:23 +01:00
Rico Sennrich
e85f353898 code simplification by removing language-specific, unused hack. 2015-07-20 10:39:01 +01:00
Ulrich Germann
df50e454d2 Proper handling of moses parameters with double dash in create_config(). 2015-07-17 14:32:30 +01:00
Phil Williams
c83628a92b Fix errors from multiline `` commands in transliteration Perl scripts
Replace the backslash-newline sequence with backslash-backslash-newline in
multiline backquote command strings.  i.e. replace expressions like this:

  `some-command \
    -option1 \
    -option2`;

with ones like this

  `some-command \\
    -option1 \\
    -option2`;

If I understand this right, the shell converts a backslash-newline sequence
to an empty string (i.e. it discards it), but Perl does not.  Unless the
backslash itself is escaped, using a backslash-newline in a Perl command
string results in errors in most instances.  By escaping the backslash, it
gets passed through to the shell where it is interpreted as intended.
2015-07-16 14:54:00 +01:00
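The shell half of the explanation in the commit above can be checked in isolation. The sketch below is in Python rather than Perl, so it says nothing about Perl's own quoting. It only shows what `sh` does with each form of the command string. The helper name `run_sh` is made up for illustration, and a POSIX `sh` is assumed to be on the PATH.

```python
import subprocess


def run_sh(command: str) -> str:
    """Run a command string through sh and return whatever it wrote to stdout."""
    result = subprocess.run(
        ["sh", "-c", command], capture_output=True, text=True, check=False
    )
    return result.stdout


# The backslash reaches the shell (the situation the fix aims for):
# backslash-newline is a line continuation, so this is a single command.
continued = run_sh("echo first \\\n  true")

# The backslash never reaches the shell (the situation the commit describes):
# the bare newline ends the command, and "true" runs as a second command.
split = run_sh("echo first \n  true")

print(repr(continued))
print(repr(split))
```

In the first call the word `true` is just another argument to `echo`; in the second it is executed as its own command, which is why multi-line option lists fall apart when the continuation is lost.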
Philipp Koehn
66ecf98cf7 minor bug fix 2015-07-14 11:01:22 -04:00
Rico Sennrich
ca72105fdf fix ems regression 2015-07-14 13:16:25 +01:00
David Madl
3c30210dad Fix 'Use of uninitialized value' error through explicit setting of 0s in hash.
Fixes the following errors in bootstrap-hypothesis-difference-significance.pl on Perl v5.14.2:

Use of uninitialized value $coocUpd in numeric gt (>) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 317.
Use of uninitialized value $b in numeric lt (<) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 543.
Use of uninitialized value $coocUpd in addition (+) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 314.
Use of uninitialized value $coocUpd in numeric gt (>) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 317.
Use of uninitialized value $a in numeric gt (>) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 552.
2015-07-14 13:05:22 +01:00
Philipp Koehn
7e3050f7f2 allow saving of model from fast-align (for incremental use) 2015-07-14 05:27:03 -04:00
Barry Haddow
3fdbb00904 Improvements to handling of bilingual LM in EMS 2015-07-10 15:44:24 +01:00
Hieu Hoang
f66beabf4f Generation error in EMS due to pruning. Let's see if this works. 2015-06-28 14:03:54 +04:00
Hieu Hoang
82edbb98a7 comments in ini file about default weights 2015-06-28 10:40:43 +04:00
Hieu Hoang
57e213ed19 tighten up extract-parallel on osx. Can now use gsplit and bsd split 2015-06-26 12:18:21 +04:00
Hieu Hoang
ca54852641 tighten up extract-parallel on osx. Can now use gsplit and bsd split 2015-06-26 11:55:24 +04:00
Hieu Hoang
b83803203e prune generation table in ems 2015-06-25 18:10:31 +04:00
Hieu Hoang
dce0f33270 prune generation table in ems 2015-06-24 18:35:59 +04:00
Barry Haddow
425118aa5d bugfixes - working directory 2015-06-17 09:32:29 +01:00
Barry Haddow
ad8114ddb0 capitalisation 2015-06-15 16:23:12 +01:00
XapaJIaMnu
166bf7365f Forgot to update the weight config path 2015-06-12 16:56:36 +01:00
XapaJIaMnu
ffd3f2bb6e Added basic BilingualNPLM support to EMS and an example config. 2015-06-12 16:21:24 +01:00
Jeroen Vermeulen
dbcc264506 Remove unneeded script.
Tom Hoar, the author of this script, asked me to remove it because it
doesn't actually do what the current name says, and can't work without
an additional script which isn't in the repository.
2015-06-09 23:10:27 +07:00
Lexi Birch
b76194a16b Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-06-08 17:13:00 +01:00
Lexi Birch
501c51947b Allowing the truecaser to work on uncased ASR input, pass the -a flag 2015-06-08 16:58:50 +01:00
Jeroen Vermeulen
85c23ed7dc Fix some JS lint. 2015-06-02 18:05:12 +07:00
Jeroen Vermeulen
b3e577be76 Fixing lint. Only 600 or so lines of errors left! 2015-06-02 17:29:32 +07:00
Jeroen Vermeulen
0981d23705 Lint-fixing binge. 2015-06-02 16:02:39 +07:00
Rico Sennrich
5d8af9c289 support memory-mapped files for NPLM training 2015-05-29 16:07:26 +01:00
Jeroen Vermeulen
ef028446f3 Add license notices to scripts.
This is not pleasant to read (and much, much less pleasant to write!) but
sort of necessary in an open project.  Right now it's quite hard to figure
out what is licensed how, which doesn't matter much to most people but can
suddenly become very important when people want to know what they're being
allowed to do.

I kept the notices as short as I could.  As far as I could see, everything
without a clear license notice is LGPL v2.1 or later.
2015-05-29 18:30:26 +07:00
Jeroen Vermeulen
26170a4179 Friendlier error reporting in beautify.py. 2015-05-29 09:37:37 +07:00
Barry Haddow
c27aa193ea Revert "Min score parameter". Doesn't work without filter.
This reverts commit ab2d396781.
2015-05-28 17:44:26 +01:00
Barry Haddow
ab2d396781 Min score parameter 2015-05-28 17:10:21 +01:00
Phil Williams
842fc9780e senna2brackets.py: bug fixes + clean-up 2015-05-27 20:33:43 +01:00
Phil Williams
c086a8ee50 Add a wrapper script for parsing English text with SENNA 2015-05-26 16:44:13 +01:00
Rico Sennrich
f6f56d11af ems: parse-relax comes last in train; do same for dev/test 2015-05-25 15:52:07 +01:00
Hieu Hoang
582a845524 don't use zcat 2015-05-24 20:04:01 +04:00
Rico Sennrich
43527c82fc training script for monolingual Neural LM
(+bugfixes and usability improvements for RDLM training)
2015-05-22 15:31:08 +01:00
Rico Sennrich
a1678187fe wrapper for stanford dependency parser 2015-05-22 15:28:42 +01:00
Rico Sennrich
98ff2382d0 duplication of existing functionality 2015-05-20 17:35:38 +01:00
Rico Sennrich
6aac7ded9a EMS: more flexible way to concatenate LM training data.
the implementation allows the user to specify which corpora to combine,
and to have multiple LMs on the same data.
2015-05-20 17:20:02 +01:00
Hieu Hoang
36caf2eb9a escape ^# character otherwise morfessors skips line 2015-05-20 15:28:28 +04:00
Hieu Hoang
79ca96db0a should have tested this 2015-05-19 22:19:11 +04:00
Hieu Hoang
59071bf16c run on all cores if number of cores not given 2015-05-19 18:32:31 +04:00
Rico Sennrich
8ca6764c7d ems: allow LMs with user-specified training commands and moses.ini config entries
intended for neural LMs, syntactic LMs, and the like. currently doesn't play nice with INTERPOLATED-LM.
2015-05-18 19:07:37 +01:00
Rico Sennrich
fb06a2325e fix broken ems with interpolated lm disabled 2015-05-18 17:26:09 +01:00
Rico Sennrich
f85dd85f6b ignore-unless magic 2015-05-18 16:17:33 +01:00
Rico Sennrich
59376f500b still confused about pass-unless vs. ignore-unless 2015-05-18 14:40:56 +01:00
Rico Sennrich
45a97f9016 EMS: disable concatenated LM by default 2015-05-18 14:10:29 +01:00
Hieu Hoang
2c0aecb16b Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-18 16:27:03 +04:00
Hieu Hoang
2f0ee5502e delete debugging info 2015-05-18 16:26:26 +04:00
Rico Sennrich
27fd45d088 ems: training LM on concatenation of all LM training corpora 2015-05-18 12:18:49 +01:00
Hieu Hoang
14d2a67193 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-18 12:28:35 +04:00
Hieu Hoang
5fdcf372ae rename a python file to have .py, instead of .perl. In case beautify script depends on it 2015-05-18 12:27:35 +04:00
Jeroen Vermeulen
5aa70c6cdd Also reformat Perl, using Perltidy. 2015-05-18 00:45:15 +07:00
Jeroen Vermeulen
494c20f634 Add note about Perltidy. 2015-05-17 22:48:03 +07:00
Jeroen Vermeulen
e2a632a2b8 JavaScript lint. 2015-05-17 21:36:07 +07:00
Jeroen Vermeulen
5d0bbb6a45 Fix some JavaScript lint. Still a lot left. 2015-05-17 21:24:04 +07:00
Jeroen Vermeulen
a25193cc5d Fix a lot of lint, mostly trailing whitespace.
This is lint reported by the new lint-checking functionality in beautify.py.
(We can change to a different lint checker if we have a better one, but it
would probably still flag these same problems.)

Lint checking can help a lot, but only if we get the lint under control.
2015-05-17 20:04:04 +07:00
Jeroen Vermeulen
108da16374 Suppress CSS lint checking; accept longer lines.
The CSS style that Pocketlint expects is just too different from what we
have.  Don't check those files for now.

Also, a maximum line length of 300 still gives too many warnings, so I'm
regretfully bumping the default to 400 characters.  The traditional 80
characters are already longer than the measured optimum for human reading,
so I hope some day we can address this!
2015-05-17 20:03:27 +07:00
Jeroen Vermeulen
07a8fe06aa Also support checking for lint.
Choose which action(s) you want for each run: --format and/or --lint.

Many different types of files are lint-checkable, but you need Pocketlint
installed (plus ideally, its plugins for the various languages).

Also, added option to control batching of the commands.
2015-05-17 18:25:06 +07:00
Jeroen Vermeulen
9bdcb5f7c1 Fix more Python lint.
This is about the last that isn't in contrib or generated files.  At this
point we can start doing regular lint checks, at least on the Python files,
without being completely inundated with warnings.
2015-05-16 18:03:54 +07:00
Jeroen Vermeulen
61162dd242 Fix more Python lint.
Most of the complaints fixed here were from Pocketlint, but many were also
from Syntastic the vim plugin.
2015-05-16 17:26:56 +07:00
Jeroen Vermeulen
0ffe79579e Fix some python lint.
I used mainly pocketlint, a very good Python linter, but also Syntastic,
a vim plugin.  Didn't get anywhere near fixing all of Syntastic's complaints
though.

Once I've cleaned up all (or at least most) of the Python lint, we can
start doing regular automated lint checks and keep the code clean.
2015-05-16 14:58:03 +07:00
Jeroen Vermeulen
f1ed14eb33 Move ignored path prefixes into config file.
The path prefixes listed in .beautify-ignore, in the project root, will not
be cleaned up.  C and C++ files everywhere else will be.

Also fixes bugs in the prefix-matching code, and makes the matching a little
bit more powerful: the prefix can now extend down into the directory tree.
2015-05-15 16:40:06 +07:00
Jeroen Vermeulen
d7599134b8 Remove the --any-astyle option.
The risk from flip-flopping on styles is too great: 2.04 formatting has
many changes from 2.01 formatting.

Also, fix a broken version check.
2015-05-15 15:24:54 +07:00
Jeroen Vermeulen
3e821b56dd Rewrite beautify script in Python.
The new version is much longer, but hopefully extensible, reusable, and
easy to read.  With a few more changes it will let us do three things:

1. Apply more checks and cleanups.
2. Clean up additional file types.
3. Use the same script for mgiza++, so we get a uniform code style.

The "more cleanups" could be more things like the removal of trailing
whitespace which we just added.  We may even want to run lint checkers of
some sort.

Additional file types could be e.g. Perl scripts, build configuration,
or documentation.

In order to make the script reusable we'll have to generalize the
skip_at_root list in list_c_like_files into something like a configuration
file.
2015-05-15 13:46:00 +07:00
Hieu Hoang
5173b9f617 beautify. Add sed for trailing spaces 2015-05-13 11:29:16 +01:00
Hieu Hoang
87e1f1351f tighten up OSM build. More debugging output, to stderr not stdout. lmplz uses outdir as temp directory 2015-05-13 12:29:56 +04:00
Hieu Hoang
0cd62488bf morfessor wrapper 2015-05-12 20:40:19 +04:00
Hieu Hoang
abfc0671a3 osm tweaks and morfessor wrapper 2015-05-12 20:19:39 +04:00
Hieu Hoang
a922245864 default to using lmplz for convenience and because SRILM uses tonnes of memory 2015-05-12 11:44:05 +04:00
Hieu Hoang
03e507f354 default to using lmplz for convenience and because SRILM uses tonnes of memory 2015-05-12 10:40:52 +04:00
Hieu Hoang
8bb18b9ff0 add no-splitter-training argument. Splitter to be used by mada 2015-05-11 15:26:50 +04:00
Barry Haddow
85c1af4d72 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-08 09:16:55 +01:00
Barry Haddow
f403f5e478 mmsapt doesn't require feature weights on first tuning iteration 2015-05-08 09:16:51 +01:00
Hieu Hoang
2acb590394 output bleu for multi-bleu hack 2015-05-05 17:54:35 +04:00
Hieu Hoang
d006c6ef8c don't output remaining args twice 2015-05-05 12:15:08 +04:00
Hieu Hoang
5fefb0da47 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-05 12:02:13 +04:00
Hieu Hoang
8f272e04a9 output debugging messages to stderr, not stdout 2015-05-05 12:01:21 +04:00
Nicola Bertoldi
90a982e579 merge remote into local 2015-05-04 09:42:44 +02:00
Hieu Hoang
d456d9229e add multi-bleu-detok. Like multi-bleu scoring but will detokenize/post-process before scoring 2015-05-03 14:07:12 +04:00
Hieu Hoang
e5f76ee99e Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-03 11:50:31 +04:00
Hieu Hoang
73ae7d7e20 option not to use parallel 2015-05-03 11:50:10 +04:00
Philipp Koehn
a4a7c14593 allow breaking up training data for fast align (to avoid memory blowups for very large corpora) 2015-05-01 17:47:08 -04:00
Philipp Koehn
de6a9bd1b3 minor updates to factor scripts; brown-cluster may now run other scripts (e.g., truecaser) before assigning classes 2015-05-01 17:46:14 -04:00
Philipp Koehn
b369699661 various small changes, mostly related to better compliance with grid engine 2015-05-01 17:44:18 -04:00
Rico Sennrich
e98a2fc980 fix interpolation for LM with parser in pre-processing 2015-04-30 15:46:33 +01:00
Hieu Hoang
1278b8f5a7 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-30 15:35:34 +04:00
Hieu Hoang
a6d34a660d another madamira wrapper. Just uses the tokenized file it outputs 2015-04-30 15:35:15 +04:00
Hieu Hoang
15e4b16f49 delete unused var 2015-04-30 14:01:03 +04:00
Hieu Hoang
ebc5a51d32 Merge pull request #111 from unhammer/extract-perl-safewait
die if the forked extract exited with error
2015-04-30 11:45:25 +04:00
Hieu Hoang
1c99b2b2b8 Merge pull request #110 from unhammer/extract-perl-abspath-when-ln
avoid bad symlinks in extract-parallel
2015-04-30 11:41:03 +04:00
Kevin Brubeck Unhammer
2af2f2ef36 avoid bad symlinks in extract-parallel
train-model seems to pass a non-absolute path for the
model/aligned-argument, and then extract-parallel creates a bad symlink
2015-04-30 09:36:59 +02:00
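The failure mode named in the commit above comes from how relative symlink targets are resolved: relative to the directory holding the link, not the current working directory. The following is a minimal Python sketch of that behaviour and of the "make it absolute first" remedy. It is not the Perl code from extract-parallel; the function names and the `model/aligned` and `tmp/extract` paths are invented for the demonstration, and a POSIX filesystem is assumed.

```python
import os
import tempfile


def link_with_absolute_target(target: str, link_path: str) -> None:
    """Create link_path pointing at target, resolving target to an absolute path first."""
    os.symlink(os.path.abspath(target), link_path)


def demo() -> tuple[bool, bool]:
    """Return (relative link resolves, absolute link resolves)."""
    with tempfile.TemporaryDirectory() as root:
        old_cwd = os.getcwd()
        os.chdir(root)
        try:
            os.makedirs("model")
            os.makedirs("tmp/extract")
            with open("model/aligned", "w") as handle:
                handle.write("0-0\n")
            # Relative target: the link looks for tmp/extract/model/aligned,
            # which does not exist, so the link dangles.
            os.symlink("model/aligned", "tmp/extract/relative")
            # Absolute target: resolves regardless of where the link lives.
            link_with_absolute_target("model/aligned", "tmp/extract/absolute")
            return (
                os.path.exists("tmp/extract/relative"),
                os.path.exists("tmp/extract/absolute"),
            )
        finally:
            os.chdir(old_cwd)


print(demo())
```

`os.path.exists` follows symlinks, so it reports a dangling link as missing, which makes it a convenient check here.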
Kevin Brubeck Unhammer
c116fa0dbf die if the forked extract exited with error
Should we pass on bad exit codes from RunFork to those waitpids as well?
Seems like the right thing, though I don't know the code.
2015-04-30 09:33:37 +02:00
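The idea in the commit above, failing the parent when a forked worker fails, can be sketched in Python as follows. This is an illustration of the pattern only, not a translation of extract-parallel.perl; `run_children` is a hypothetical name and the child commands are placeholders.

```python
import subprocess
import sys


def run_children(commands: list[list[str]]) -> None:
    """Start every command, wait for all of them, and raise if any exited non-zero."""
    children = [subprocess.Popen(command) for command in commands]
    # Wait for every child before deciding, so none is left running.
    exit_codes = [child.wait() for child in children]
    failed = [code for code in exit_codes if code != 0]
    if failed:
        raise RuntimeError(f"{len(failed)} child process(es) exited with an error")


if __name__ == "__main__":
    ok = [sys.executable, "-c", "pass"]
    run_children([ok, ok])  # both children succeed, so nothing is raised
```

Waiting for all children before raising is a design choice made for this sketch; stopping at the first failure would be an equally reasonable variant.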
Nicola Bertoldi
3400b622c0 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-30 08:35:41 +02:00
Hieu Hoang
8f9bf7ea38 add -config 2015-04-28 15:03:59 +04:00
Hieu Hoang
b7792b227a script to convert arabic to bw, and vice versa 2015-04-28 12:29:58 +04:00
Hieu Hoang
8adad4fc2e exec permission 2015-04-27 17:39:49 +04:00
Hieu Hoang
a47fc00635 option to output factors 2015-04-27 17:35:19 +04:00
Rico Sennrich
da648fd65b fix some RDLM training options 2015-04-27 10:52:16 +01:00
alvations
fa30ea6712 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder into moses-smt-master 2015-04-26 20:38:27 +02:00
alvations
4a68c42b16 syncing to latest moses version 2015-04-26 20:37:10 +02:00
alvations
ec54ea3c4f put back some of the difference made after RELEASE3.0 and incorporated it with the -threads parameter 2015-04-26 20:30:15 +02:00
alvations
c01b0a6262 merging the filter-model-given-input.pl with alvations-master branch 2015-04-26 20:25:15 +02:00
alvations
dda3ddd80b Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-26 20:23:39 +02:00
alvations
e1fcc8082a use integer type when reading options instead of checking for undef. it's more elegant. 2015-04-24 19:30:40 +02:00
alvations
d453ccc9f5 removed wrongly added perl script... 2015-04-24 19:14:04 +02:00
alvations
0ccbcaece6 added $threads option in usage example 2015-04-24 19:11:57 +02:00
alvations
aa9207acfc fixed typo in $thread to $threads 2015-04-24 19:10:10 +02:00
alvations
6c63ca963c checks for undefined $threads 2015-04-24 19:06:37 +02:00
alvations
585784f62a added thread options for filter-model-given-input.pl 2015-04-24 18:57:28 +02:00
Hieu Hoang
4b47e1148c use ignore-unless /Philipp Koehn 2015-04-22 23:02:57 +04:00
Hieu Hoang
40933b4a78 hack to allow target side of tokenized parallel corpus to be used for LM 2015-04-22 19:01:12 +04:00
Nicola Bertoldi
5700fbaabf Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-22 07:50:07 +02:00
Hieu Hoang
c15f3ef068 duplicated functionality with ems/support/lmplz-wrapper.perl 2015-04-21 17:54:34 +04:00
Hieu Hoang
ab01d30687 make sure GetOptions doesn't consume -T by confusing it with --text 2015-04-21 17:53:46 +04:00
Rico Sennrich
15d3c3f259 be more tolerant about xml input 2015-04-21 14:04:25 +01:00
Rico Sennrich
5a3d5b6bdd EMS: LM:mock-parse can be actual parser 2015-04-21 10:21:24 +01:00
Hieu Hoang
95435f2a2e better detection of pigz 2015-04-20 20:40:50 +04:00
Hieu Hoang
eb37437d09 don't output warnings. It wasn't originally there before the 'env perl' change. This script should be tightened up at some point, eg use strict, debug warning messages 2015-04-20 16:18:51 +04:00
Hieu Hoang
1b9dc6cfae more butinah tweaks 2015-04-19 11:50:50 +04:00
Hieu Hoang
637e8a17e8 add pre tokenization cleaning script. In case training has bad, overly long lines which blow up some taggers/segmenters, e.g. mada 2015-04-19 11:21:07 +04:00
Hieu Hoang
6162223690 add use warnings to all perl scripts 2015-04-13 20:42:33 +04:00
Hieu Hoang
8190a5e1d6 Merge pull request #107 from flammie/master
Finnish detokenisation
2015-04-11 14:20:37 +04:00
Dingyuan Wang
4aba64ed53 Merge pull request #106 from gumblex/master
Fix some problems in EMS
2015-04-11 09:26:25 +08:00
Michael Denkowski
2682cc0f9b typo fix 2015-04-07 17:06:18 -04:00
Flammie Pirinen
fc8ee03b8d examples 2015-04-07 16:19:07 +01:00
Flammie Pirinen
ef52bc66f6 full set of cases and caps 2015-04-07 16:16:43 +01:00
Flammie Pirinen
85230e8334 add fi to list to silence warnings 2015-04-07 16:00:48 +01:00
Flammie Pirinen
f9deb6de3b also lowercase if case fail 2015-04-07 15:55:02 +01:00
Flammie Pirinen
5817806ec7 fix detokenising : in abbrev. case suffix case 2015-04-07 15:51:29 +01:00
Hieu Hoang
54e55f2dcb better detection of pigz, sort, split. In case they are not in the default directory 2015-04-06 11:31:44 +04:00
Hieu Hoang
02185a85fb store temp run files in current directory, not /tmp 2015-04-05 17:02:48 +04:00
Hieu Hoang
93ad52d2f9 leave in runPath for debugging 2015-04-05 16:49:12 +04:00
Hieu Hoang
4cb8a1837e Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-05 16:45:17 +04:00
Hieu Hoang
7ffdddef13 script to submit ems job to grid engine as 1 job. Hardcoded for NYUAD at the mo 2015-04-05 16:44:24 +04:00
Dingyuan Wang
aea07b0a19 Fix some problems in EMS:
* remove absolute links
* fix coverage bar highlighting
* change Base64 library to support UTF-8
2015-04-03 23:47:25 +08:00
Rico Sennrich
8d8097632b re-apply commit 1fb51dc (use gunzip -c instead of zcat)
plus be more tolerant about xml input
2015-04-03 15:00:45 +01:00
Hieu Hoang
b2f9ba2b64 revert last commit to add MASTER_PATH. Not needed 2015-04-02 19:29:42 +04:00
Hieu Hoang
27b36e0c96 pass in PATH variable from master node. When you're running of a grid but really just qsubbing everything to 1 slave node 2015-04-02 19:15:21 +04:00
Hieu Hoang
2d1da3219d consistently use 'env perl' command for environments where the 1st perl in PATH isn't the default perl. Which is kinda stupid 2015-04-02 17:38:56 +04:00
Hieu Hoang
035c806059 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-02 14:10:42 +04:00
Hieu Hoang
e76247e19b Conditional import of Thread package for perl installations that don't support threads 2015-04-02 14:07:57 +04:00
Hieu Hoang
e22d275c32 don't ignore lowercasing of factored LM. Must be consistent with pt 2015-04-01 23:25:57 +04:00
Phil Williams
6ce3060dd8 lmplz-wrapper.perl: use Getopt::Long's "pass_through" option
This avoids the need to duplicate all of lmplz's options in the wrapper and
it prevents --prune 0 0 1 from being truncated to --prune 0 if the user forgets
to quote the arguments.
2015-03-30 10:18:51 +01:00
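The pass-through approach described in the commit above has a rough Python counterpart in `argparse.ArgumentParser.parse_known_args`, which returns unrecognised arguments instead of rejecting them. The sketch below is an analogy, not the Perl wrapper itself: `split_wrapper_args` is a hypothetical helper, and treating `--text` as the wrapper's only own option is an assumption made for the example.

```python
import argparse


def split_wrapper_args(argv: list[str]) -> tuple[argparse.Namespace, list[str]]:
    """Parse only the wrapper's own options and return the rest untouched."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--text")  # the one option this sketch's wrapper handles itself
    return parser.parse_known_args(argv)


own, passthrough = split_wrapper_args(
    ["--text", "corpus.txt", "--prune", "0", "0", "1", "-o", "5"]
)
print(own.text)
print(passthrough)
```

Because the wrapper never declares `--prune`, all three of its values stay in the pass-through list and can be handed to the wrapped program as they were given.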
alvations
496a2a716c Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-03-23 15:55:56 +01:00
Matthias Huck
506427368f filter-model-given-input.pl: drop "-encoding None" from phrase table binarization with processPhraseTableMin. Recommended by Marcin. 2015-03-23 14:38:24 +00:00
Hieu Hoang
c4af7d28b5 Merge pull request #100 from alvations/master
Added the Gacha Filter used in WMT14 by the Manawi system
2015-03-21 22:42:55 +00:00
alvations
e5feb1a73e Enforce python3 and also remove extra empty newline from STDOUT 2015-03-20 19:37:34 +01:00
alvations
8f2d687d27 added more description of usage in docstring 2015-03-20 19:00:36 +01:00
alvations
44cd32d058 fixed typo in error message 2015-03-20 18:53:03 +01:00
alvations
93ea5853e8 Added Gacha Filter used by Manawi system from WMT14 translation task 2015-03-20 18:48:47 +01:00
alvations
5536e13213 added Gacha Filter from WMT14 2015-03-20 18:44:59 +01:00
Rico Sennrich
3a673fc8dc EMS: support for syntactic metrics for MERT/MIRA
- add "-n-best-trees" to TUNING:decoder-settings
- add "mock-output-parser-references = $output-parser" to GENERAL (and define output-parser)
- TUNING:tuning-settings should include the metric you want to optimize (e.g. "-batch-mira-args='--sctype BLEU,HWCM'")
2015-03-20 17:15:33 +00:00
Rico Sennrich
ca08b1d205 reduce-factors: port xml support from train-model.perl 2015-03-20 14:44:48 +00:00
Rico Sennrich
b8ca33c34e RDLM training without editing bash scripts 2015-03-20 14:12:41 +00:00
Rico Sennrich
2271f295e6 nplm_train: more options 2015-03-20 14:12:41 +00:00
Rico Sennrich
eab513b635 relational dependency language model 2015-03-18 17:39:45 +00:00
Phil Williams
fc15e03ebe Replace truecase-egret.sh with more general tree-converter-wrapper.perl 2015-03-18 09:57:42 +00:00
Phil Williams
ac51e9f0a8 Always use "SyntaxInputWeight0" as name of SyntaxInputWeight feature 2015-03-18 09:56:46 +00:00
Phil Williams
0a8e5fb3bf EMS: fix TRAINING:use-syntax-input-weight-feature option 2015-03-13 17:18:56 +00:00
Hieu Hoang
ce8b0e0876 fix example for reusing tuned moses.ini file 2015-03-13 15:07:23 +00:00
Phil Williams
05872cf32f Add tree-converter-mosesxml.sh wrapper script 2015-03-12 22:27:43 +00:00
Phil Williams
4685474e9b parse-en-egret.perl: wrap tree in parentheses prior to conversion to XML 2015-03-12 09:49:28 +00:00
Philipp Koehn
530d0f5a11 some more better defaults for recaser 2015-03-11 17:56:02 +00:00
Philipp Koehn
2ce45229f8 better default configuration for recaser 2015-03-11 17:52:30 +00:00
Philipp Koehn
1632c5f39d proper handling of specified configuration file 2015-03-11 16:49:20 +00:00
Matthias Huck
01bed83cf9 GHKM extraction: option to strip non-terminal labels from BitPar syntactic parses right during extraction (i.e., remove any suffix starting with a hyphen from the label) 2015-03-10 21:25:32 +00:00
Phil Williams
c7cf33ee05 parse-en-egret.perl: use "ROOT" instead of "TOP" as label of root tree node
This is to match the label Egret assigns to the root vertices of forests.
2015-03-10 15:43:14 +00:00
Phil Williams
77faaaea6c Add truecase-egret.sh
This is currently just a wrapper for Travatar's tree-converter tool.
2015-03-10 14:36:28 +00:00
Phil Williams
f7b4d403e3 Add parse-en-egret.perl wrapper script. 2015-03-10 14:32:59 +00:00
Phil Williams
9e2eb702dc EMS: add TRAINING:use-syntax-input-weight-feature option 2015-03-10 11:40:49 +00:00
Phil Williams
91abb69cdf train-model.perl: add -use-syntax-input-weight-feature option
Currently only used for forest input.
2015-03-10 11:39:14 +00:00
Phil Williams
7eba58b942 EMS: add TRAINING:dont-tune-glue-grammar option
Adds -dont-tune-glue-grammar to train-model.perl command during config file
generation step.  This is preferable to manually adding -dont-tune-glue-grammar
to TRAINING:training-options because changing its value won't trigger a re-run
of dependent steps that don't really need re-running (like word alignment).
2015-03-10 10:20:19 +00:00
Phil Williams
e79644540c train-model.perl: add -dont-tune-glue-grammar option 2015-03-10 09:53:12 +00:00
Phil Williams
fd3dcb7bb0 filter-model-given-input.pl: add -[no]StripXml and -SyntaxFilterCmd options
-noStripXml is required for tree and forest input in STSG-based models.

-SyntaxFilterCmd can be used to set the command for filtering rule tables in
syntax-based models.  The default is to use

    $SCRIPTS_ROOTDIR/../bin/filter-rule-table

The option -MinNonInitialRuleCount is deprecated.
2015-03-10 08:57:56 +00:00
Phil Williams
70bef90b36 train-model.perl: add -score-command option
This matches the existing -extract-command option.  Given the argument value
<name>, train-model.perl will use the score program in

  $SCRIPTS_ROOTDIR/../bin/<name>

The default value is "score".
2015-03-10 08:48:54 +00:00
Matthias Huck
25f5470216 GHKM: write target parts-of-speech as a factor 2015-03-09 21:54:03 +00:00
Hieu Hoang
cb2e1b8a40 separate variables into lines. Easier to merge with other branches 2015-03-05 21:37:30 +00:00
Hieu Hoang
0f5556f6d9 separate variables into lines. Easier to merge with other branches 2015-03-05 21:28:51 +00:00
Rico Sennrich
2431f514dd fix EMS bug from dca8dd: cleaning step was skipped 2015-03-05 10:55:35 +00:00
Rico Sennrich
47c460fe1d remove unused variable 2015-03-05 08:31:50 +00:00
Matthias Huck
638e9c3f60 POS property: map tags to indices in consolidate 2015-03-04 22:48:34 +00:00
Matthias Huck
06e87d851e GHKM: extract POS phrase property (from preterminals in the syntactic parse tree) 2015-03-04 21:40:56 +00:00
Rico Sennrich
ff5502d323 off-by-one error in previous commit 2015-03-04 17:25:19 +00:00
Rico Sennrich
71ab598435 extract_test.py should also create numberized corpus 2015-03-04 17:10:06 +00:00
Rico Sennrich
dca8ddc746 EMS convenience:
- merge clean-corpus-n-ratio.perl and clean-corpus-n.perl (use variable 'cleaner' in EMS to call cleaning script with extra arguments)
- use low default weight for glue rules in syntax systems (especially useful with 'tuneable=false')
2015-03-04 14:43:05 +00:00
Rico Sennrich
f9ec387a5b typo 2015-03-04 10:06:03 +00:00
Phil Williams
90e8d4940c EMS: add TRAINING:no-glue-grammar option 2015-03-03 12:36:09 +00:00
Nicola Bertoldi
37b7162018 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-02-28 22:09:41 +01:00
Rico Sennrich
e2b1ac1e9d fix option --return-best-dev with hypergraph MIRA (which I broke in commit d39cbca0b9) 2015-02-27 14:47:37 +00:00
Philipp Koehn
39c0068e4f discount_fallback for lmplz 2015-02-26 22:21:50 +00:00
Marcin Junczys-Dowmunt
a3d2adca50 Update filter-model-given-input.pl
Added -encoding None to force single pass for compact phrase table so it works with pipes.
2015-02-26 14:04:06 +01:00
Ondrej Bojar
441a2bb190 safer binarizer execution, bash, sort tempdir 2015-02-24 00:36:29 +01:00
Matthias Huck
8025cbf350 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-02-16 15:10:15 +00:00
Nicola Bertoldi
ff4a103826 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-02-13 14:52:31 +01:00
Barry Haddow
34b139e2ae Remove debug 2015-02-13 12:14:18 +00:00
Phil Williams
92a21f9d3a train-model.perl: fix "argument isn't numeric" warning 2015-02-13 11:55:39 +00:00
Phil Williams
7e54e23fe2 Update transliteration scripts to use the on-disk phrase table
The scripts now use CreateOnDiskPt instead of processPhraseTable (which
is no longer supported and was removed by commit f3a84fc01).
2015-02-13 11:36:16 +00:00
Kenneth Heafield
ee39fdbaa5 Relative path 2015-02-10 10:43:10 -05:00
Charley C
e40606d08f default path update in train-recaser 2015-02-09 18:36:31 -05:00
Matthias Huck
53ce063214 tuneable-components config parameter for feature functions 2015-02-09 13:52:05 +00:00
Philipp Koehn
f69c1dab02 more efficient default recaser training 2015-02-04 09:18:09 +00:00
Nicola Bertoldi
a1539505c8 minor change to make extract-parallel.perl compliant with MacOSX split command 2015-02-04 09:02:51 +01:00
Hieu Hoang
78f79632b9 script to convert moses.ini v2 to v1 /Tom Hoar 2015-02-03 10:59:38 +00:00
Kenneth Heafield
925565a0b9 "just put it in. I'll verify it if i can be bovvered" --Hieu /usr/bin/env 2015-01-29 18:37:05 -05:00
Matthias Huck
449d9b294b Revert "env perl shebang"
This reverts commit 34f2801f8a.

Caused problems because /bin/env doesn't exist on Ubuntu 12.04.
/usr/bin/env does, though.
2015-01-29 21:15:20 +00:00
Kenneth Heafield
34f2801f8a env perl shebang 2015-01-27 18:35:54 -05:00
XapaJIaMnu
6ca1a4718c Expose learning rate as a parameter 2015-01-25 02:13:47 +00:00
Matthias Huck
9987beb453 SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals.
Also added score components based on relative frequency.
(TODO: logprobs right now; are plain probabilities better?)
2015-01-23 18:41:18 +00:00
Hieu Hoang
59c4baec3f use utf8 german model 2015-01-22 16:10:12 +00:00
Kenneth Heafield
7c507bfa74 May is not an abbreviation 2015-01-19 16:37:57 -05:00
Hieu Hoang
30e31d4a95 don't normalise quotes if tokenizing like Penn /Phil Williams 2015-01-16 12:34:22 +00:00
Hieu Hoang
19d7c44aad move normalisation of quotes into normalize-punctuation.perl /Tom Hoar 2015-01-16 11:37:31 +00:00
Hieu Hoang
6d61db28fa use astyle 2.01. It's on Edinburgh server and doesn't screw up enum 2015-01-14 19:21:11 +00:00
Hieu Hoang
90d4b2d713 use pigz rather than gzip if it exists 2015-01-13 15:16:22 +00:00
Hieu Hoang
6186262a3b don't use processPhraseTable in EMS 2015-01-12 12:43:51 +00:00
Hieu Hoang
a8d4b81e71 Revert "Update train-model.perl"
This reverts commit e1e14a91ee.
2015-01-08 16:07:40 +00:00
Hieu Hoang
5336598734 beautify 2015-01-08 08:29:56 +00:00
Philipp Koehn
0441fd6ab9 added informative error message when trying to build a lexicalized reordering model with hierarchical model 2015-01-06 18:46:02 +00:00
Hieu Hoang
0a707597d8 Revert "Added error message on experiment.meta for the filter step 'No phrases in'"
This reverts commit 2105423626.
2015-01-03 21:58:15 +05:30
Eleftherios Avramidis
2105423626 Added error message on experiment.meta for the filter step 'No phrases in' 2014-12-28 18:09:33 +01:00
Philipp Koehn
59fdb3d99c same spec for dedicated script as for train-model.perl and filter-model-given-input.pl 2014-12-21 01:37:05 +00:00
Philipp Koehn
831f947874 long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works 2014-12-21 01:14:42 +00:00
Rico Sennrich
67e101b07a Revert "Update train-model.perl"
This reverts commit 41f06a01c0.
2014-12-17 17:51:02 +00:00
Rico Sennrich
685f18ca1b documentation/readability 2014-12-16 17:42:17 +00:00
Nicola Bertoldi
d0cddf0f2d Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2014-12-16 17:35:47 +01:00
Nicola Bertoldi
4e77665d30 better handling of cache-based models with inconsistent parameters 2014-12-15 17:42:41 +01:00
Xiang Li
41f06a01c0 Update train-model.perl
If the final alignment model is model 3-5, the hmm model will be trained.
2014-12-16 00:37:15 +08:00
Nicola Bertoldi
e4eb201c52 merged master into dynamic-models and solved conflicts 2014-12-13 12:52:47 +01:00
Hieu Hoang
5ae5a630a6 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-12-12 10:04:58 +00:00
Kenneth Heafield
8bbccd441a Fix #85 by changing the default LM. Hieu said it's ok in the issue. 2014-12-11 23:51:48 -05:00
Hieu Hoang
c48a3aadc1 chmod 2014-12-11 16:54:19 +00:00
Hieu Hoang
765d8d1350 Merge pull request #83 from lixiangnlp/patch-1
Update train-model.perl
2014-12-10 15:48:35 +00:00
Phil Williams
1353aa57dc experiment.meta: fixes for $input-parse-relaxer 2014-12-08 16:26:08 +00:00
Phil Williams
60e56efc6b phrase-extract: add syntax-common sub-library
And remove some (near-)duplicate code from pcfg-common and score-stsg.
2014-12-07 14:27:51 +00:00
Kenneth Heafield
f97ed79a70 Month abbreviations shouldn't be causing a sentence split.
Yes this will break existing tokenized data :-(.
2014-12-05 03:41:01 -05:00
Philipp Koehn
9d55ce13c0 change for thot integration 2014-12-02 14:05:56 -05:00
Xiang Li
e1e14a91ee Update train-model.perl
GIZA++ defaults to 5 HMM iterations, so HMM alignment is activated by the training script even when the "hmm-align" option is not set.
2014-12-01 11:26:53 +08:00
Rico Sennrich
4ca730a67c improve bilingualLM alignment heuristics consistency 2014-11-26 10:32:41 +00:00
Rico Sennrich
ee759bfede move bilingual-lm training scripts 2014-11-26 10:32:37 +00:00
Tomáš Musil
4cb81e3093 lmtype now preferred as symbolic name 2014-11-24 12:20:36 +01:00
Hieu Hoang
c0be182bfa makemteval and small change to tokenizer. /Tom Hoar and Tomas Fulajtar 2014-11-21 13:55:13 +00:00
XapaJIaMnu
52c520c042 Resolve merge conflicts 2014-11-20 15:50:32 +00:00
Hieu Hoang
e27f6b0120 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-11-15 14:32:49 +00:00
Hieu Hoang
67ad197d5a take out PYTHONIOENCODING=utf-8. Rely on Rico's python changes 2014-11-15 14:32:31 +00:00
XapaJIaMnu
a343837095 Add option to choose activation function during nplm training 2014-11-15 11:54:47 +00:00
Rico Sennrich
b0b5eef0c6 fix metric interpolation with mert 2014-11-14 14:35:32 +00:00
Hieu Hoang
acd3ac964a set PYTHONIOENCODING=utf-8 before running merge_alignment.py 2014-11-14 14:34:31 +00:00
Hieu Hoang
1c27e05a06 softlink for moses_chart 2014-11-14 13:56:56 +00:00
XapaJIaMnu
d5567b6cfb Training: Do the preparation step ourselves. No validation support yet. No decoder support yet. 2014-11-13 16:14:17 +00:00
Rico Sennrich
8fd3be9e4e add EOS token </s> to each sentence 2014-11-13 16:14:16 +00:00
Rico Sennrich
f26fc251d5 sort vocab by frequency 2014-11-13 16:14:16 +00:00
XapaJIaMnu
bb70f60f67 grrr 2014-11-13 16:14:16 +00:00
XapaJIaMnu
e330ab35d5 Short option must be only one letter 2014-11-13 16:14:16 +00:00
XapaJIaMnu
a74105ea7d Fix a wrong condition 2014-11-13 16:14:16 +00:00
XapaJIaMnu
e54c171850 Make it optional to prepare the validation set 2014-11-13 16:14:16 +00:00
XapaJIaMnu
a300824bd1 Add optional validation during training 2014-11-13 16:14:16 +00:00
XapaJIaMnu
0451142ece Add null token normalization for models to be used with the chart decoder. 2014-11-13 16:13:38 +00:00
XapaJIaMnu
aae894fe6b Add null token in vocabulary during construction 2014-11-13 16:13:38 +00:00
XapaJIaMnu
b4f51c05d1 Add option to reduce the ngrams from an already prepared .ngrams file to train a model with a smaller number of ngrams 2014-11-13 16:13:38 +00:00