Commit Graph

2352 Commits

Author SHA1 Message Date
Matthias Huck
c731851b92 integration of TargetPreferencesFeature in EMS create-config step 2016-01-27 17:54:57 +00:00
Hieu Hoang
38f999fa3f Merge ../mosesdecoder into perf_moses2 2016-01-13 14:57:20 +00:00
Matthias Huck
7764d86fcb tiny changes related to feature functions 2016-01-12 19:44:43 +00:00
Hieu Hoang
d0a48e71ad Merge ../mosesdecoder into perf_moses2 2016-01-12 09:23:07 +00:00
Matthias Huck
885b8b33a1 preparing extraction of Hiero soft syntactic preferences (target syntax) 2016-01-11 20:04:32 +00:00
Hieu Hoang
2c78a26b74 Merge ../mosesdecoder into perf_moses2 2016-01-10 13:21:53 +00:00
Matthias Huck
1d3feba8d0 preparing extraction of Hiero soft syntactic preferences (target syntax) 2016-01-09 23:02:31 +00:00
Hieu Hoang
bf19d71780 Merge ../mosesdecoder into perf_moses2 2016-01-06 21:03:33 +00:00
Barry Haddow
977e8eaf67 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2016-01-06 11:55:16 +00:00
Barry Haddow
7125096c29 enable nplm training on separate host, fix ems for nplm 2016-01-06 11:55:12 +00:00
Hieu Hoang
f179072902 cpu affinity offset 2016-01-04 00:19:35 +00:00
Hieu Hoang
589f0d97f0 cpu affinity offset 2016-01-04 00:05:17 +00:00
Hieu Hoang
abf41d04e9 cpu affinity offset 2016-01-03 23:22:50 +00:00
Matthias Huck
0a39efb6c8 Hiero phrase orientation: modify some parameters 2015-12-18 17:24:42 +00:00
Matthias Huck
bd3f573452 Hiero phrase orientation 2015-12-10 12:56:37 +00:00
Philipp Koehn
33f4e93915 no binarizing/filtering with mmsapt 2015-12-01 23:10:37 +00:00
Ulrich Germann
4c78e7c0b2 scripts/generic/bsbleu.py can now handle gzipped files 2015-11-29 17:57:57 +00:00
Ulrich Germann
c8b859de67 Merge remote-tracking branch 'legacy/master'
Conflicts:
	moses/server/TranslationRequest.cpp
2015-11-24 19:22:37 +00:00
Jeroen Vermeulen
710915c088 Python implementation of parallel scoring.
Re-implementation of score-parallel.perl.  Not a drop-in replacement;
the command line is similar but different and uses the standard Python
command-line parser.

Written without much knowledge of the original script, so documentation
in particular may seem nonsensical to experts.  If you see something
wrong, please help!
2015-11-24 14:37:18 +01:00
Philipp Koehn
94cd1f7433 when building mmsapt phrase table, also use mmsapt reordering table 2015-11-23 18:12:56 -05:00
Barry Haddow
10df006eed Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-11-23 12:16:46 +00:00
Michael Denkowski
b002fade50 Minimal buffering for multi_moses.py
Speeds things up when using multi-threaded instances
2015-11-18 13:54:54 -05:00
Barry Haddow
21d8111287 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-11-18 09:50:29 +00:00
David Madl
e36fb96557 LanguageModel, KenLM: avoid StaticData usage
* drop global lmodel-oov-feature option, and add it to LM FF config line instead
	use oov-feature=1 (bool) option instead
* drop LanguageModel::GetWeight()
* KenLM: use m_verbosity of FF instead of IFVERBOSE macro which uses StaticData

* train-model.perl: move language model OOV feature onto LM feature spec line
2015-11-17 16:15:13 +00:00
Barry Haddow
ccfe8ba018 remove unused method, and misleading comment 2015-11-10 21:35:08 +00:00
Ulrich Germann
ec71c2397b Allow multiple reference files to be specified on the command line; handle gzipped reference files. 2015-11-10 01:16:17 +00:00
Phil Williams
6a37dfd2ce Add a wrapper script for parsing English text with the BLLIP parser 2015-10-26 16:18:54 +00:00
Barry Haddow
d59bb883dc fix extra settings 2015-10-21 09:22:00 +01:00
Hieu Hoang
384e0b06d2 revert Greg's error handling commit. Cruise control breaks with message
sh: 1: cannot create /dev/stderr: Permission denied
2015-10-14 16:42:44 +01:00
Ubuntu
d6dda4f292 explicitly use bash /Uli Germann 2015-10-13 13:40:48 +00:00
Ubuntu
81d106e74d Revert "Revert "Be more aware of errors in subprocesses""
This reverts commit 1fca3f8a75.
2015-10-13 13:33:40 +00:00
Ubuntu
1fca3f8a75 Revert "Be more aware of errors in subprocesses"
This reverts commit f6894f4623.
2015-10-13 12:53:02 +00:00
Matthias Huck
62748f5296 Revert "EMS: fix filtering issue when output-splitter is defined"
This reverts commit d5c41634e8.
2015-10-12 18:05:46 +01:00
Tomáš Fulajtár
dd9eb54ec4 Named group added for the safer 'protected patterns' recognition regexp.
The original code uses numbered references, which might actually collide if any group is used inside the $protected_pattern string. For example, take this protected pattern (loaded from file): (http[s]?|ftp):\/\/[^:\/\s]+(\/\w+)*\/[\w\-\.]+.

With numbered references, $2 will refer to (http[s]?|ftp) instead of the (.*) inside:
 while ($t =~ /($protected_pattern)(.*)$/) {

Naming patterns resolves this issue.
2015-10-12 18:47:45 +02:00
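
The group-numbering problem described in commit dd9eb54ec4 can be reproduced outside Perl. What follows is a minimal Python sketch, not the tokenizer's actual code: the sample text and the names numbered, named, protected and rest are assumptions made for illustration, and Python's (?P<name>...) syntax stands in for Perl's named captures.

```python
import re

# The protected pattern quoted in the commit message; it contains groups of its own.
protected_pattern = r"(http[s]?|ftp):\/\/[^:\/\s]+(\/\w+)*\/[\w\-\.]+"
text = "see http://example.com/a/b.html for details"  # illustrative input

# Numbered groups: the pattern's inner groups take numbers 2 and 3,
# so group 2 is the URL scheme rather than the trailing text.
numbered = re.search("(" + protected_pattern + ")(.*)$", text)
scheme_by_number = numbered.group(2)

# Named groups: the wrapper groups are addressed by name, so groups
# inside the pattern can no longer shift their meaning.
named = re.search("(?P<protected>" + protected_pattern + ")(?P<rest>.*)$", text)
protected = named.group("protected")
rest = named.group("rest")
```

Looking groups up by name keeps the surrounding code correct whatever pattern is loaded from the file, which is the property the commit relies on.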
Tomáš Fulajtár
83e25a3f5e Merge pull request #1 from moses-smt/master
update my local fork
2015-10-12 18:37:26 +02:00
Greg Hanneman
f6894f4623 Be more aware of errors in subprocesses 2015-10-09 16:05:25 +00:00
Michael Denkowski
923f512be0 Extend multi_moses.py to support multi-threaded moses instances 2015-10-08 11:50:34 -04:00
Nadir
15b4aa91b0 Extra Space 2015-10-08 13:16:58 +01:00
Nadir
965aeb9012 Interpolated OSM - Bug Fix 2015-10-08 12:03:09 +01:00
Nadir
2ec6fed898 Interpolated OSM 2015-10-07 13:57:32 +01:00
Michael Denkowski
35538bf894 Actually override threads specified in moses.ini 2015-10-06 15:29:42 -04:00
Rico Sennrich
c0fedd275b change default nplm setting to 1 hidden layer 2015-10-06 11:49:45 +01:00
Michael Denkowski
160a7f254b Parallelization with multiple instances of moses 2015-10-02 18:19:43 -04:00
Barry Haddow
bb1b5d3abd config of dropout 2015-09-25 13:53:39 +01:00
Hieu Hoang
1f5ec65f3c Merge pull request #129 from jimregan/patch-1
Basic tokenizer support for Irish (ga)
2015-09-24 16:07:17 +01:00
Hieu Hoang
d81dfda511 option to cat model files 2015-09-24 09:00:54 -04:00
Jim Regan
36e4951134 ga (mostly) behaves more like fr/it 2015-09-23 14:36:57 +01:00
Jim Regan
dfe682d823 Create nonbreaking_prefix.ga 2015-09-23 14:35:18 +01:00
Jim Regan
09a9f1b061 ga (mostly) behaves more like fr/it 2015-09-23 14:33:18 +01:00
Pierre Lison
31cc22cf14 redirecting output of which and split --help to /dev/null 2015-09-16 12:56:25 +02:00
Pierre Lison
e26bbc215f sending the stderr output of the which command to /dev/null 2015-09-16 11:01:22 +02:00
Ulrich Germann
2e3a82a40c Merge pull request #125 from akivajp/master
Fixed for removed option building cooccurrence table.
2015-09-12 11:32:27 +02:00
Akiva Miura
7b15548144 Fixed for removed option building cooccurrence table. 2015-09-07 06:21:04 +09:00
Barry Haddow
90f15cc619 extra nplm settings 2015-09-04 10:07:50 +01:00
Barry Haddow
4746970bf8 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-09-04 10:06:58 +01:00
Barry Haddow
e58ddf74e8 parameter changes 2015-09-04 10:06:54 +01:00
Tomáš Fulajtár
1a26cb8414 Added a simple support for the factored systems. 2015-08-27 15:15:32 +02:00
Hieu Hoang
d349bf8a94 dos2unix everything 2015-08-23 19:00:19 +04:00
Matthias Huck
d5c41634e8 EMS: fix filtering issue when output-splitter is defined 2015-08-21 18:58:36 +01:00
Matthias Huck
261cfdb024 perl shebang 2015-08-19 16:59:10 +01:00
Hieu Hoang
4ff776f564 dos2unix the whole lot 2015-08-19 16:30:31 +04:00
Hieu Hoang
97eed02301 dos2unix. Revert Matthias' change, use env 2015-08-19 13:25:29 +01:00
Matthias Huck
d583e0888f make-factor-de-pos.perl 2015-08-18 18:31:41 +01:00
Hieu Hoang
3a261c9fc9 don't hardcode amount of mem to be used by lmplz 2015-08-16 20:32:07 +04:00
Phil Williams
01a9dd2305 extract-target-trees.py: support for new-style trace files 2015-08-14 16:53:24 +01:00
Ulrich Germann
883c34aee9 Merge branch 'master' of http://github.com/moses-smt/mosesdecoder into mmt-dev
Conflicts:
	moses/SearchNormalBatch.cpp
	moses/TranslationModel/UG/mm/ug_bitext.h
	moses/TranslationModel/UG/mm/ug_typedefs.h
	moses/TranslationModel/UG/mmsapt.cpp
	moses/TranslationModel/UG/mmsapt.h
2015-08-07 14:14:19 +01:00
Barry Haddow
4c3a6a3f3f remove dash 2015-08-03 21:19:08 +01:00
Barry Haddow
57b0c351c0 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-08-03 16:47:31 +01:00
Barry Haddow
f808b32030 support version of nplm that picks best on heldout 2015-08-03 16:47:25 +01:00
Barry Haddow
a39544bbcb fix inconsistency in the example 2015-08-03 16:46:57 +01:00
Hieu Hoang
bfd45fdfc3 don't use all threads 2015-08-01 11:35:47 +04:00
Hieu Hoang
7ac6f90a4d Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-07-31 22:29:07 +04:00
Hieu Hoang
f894dec0fd multi-threaded decoding by default /Vincent Nguyen 2015-07-31 22:28:45 +04:00
Rico Sennrich
89d16a491a fix ems regression (concatenate-split step) 2015-07-31 11:20:29 +01:00
Philipp Koehn
a5ee3c1b6d script to copy model files to local disk before running the decoder - useful for grid 2015-07-29 11:10:13 -04:00
Philipp Koehn
836ca8212a better support of grid engine cluster 2015-07-29 11:03:24 -04:00
Philipp Koehn
b29166e2fe fix of fix 2015-07-29 11:02:55 -04:00
Philipp Koehn
837bcde69d bug with legacy 2015-07-29 10:46:09 -04:00
Philipp Koehn
ae9cd14948 fixes 2015-07-29 10:44:57 -04:00
Hieu Hoang
5f4b497b63 Merge pull request #119 from cidermole/master
Fix 'Use of uninitialized value' error through explicit setting of 0s…
2015-07-29 16:25:39 +04:00
Barry Haddow
3a2116b2c9 add quotes so arguments don't get lost 2015-07-29 09:35:19 +01:00
Rico Sennrich
0173512ddc separate xml version of combine_factors.pl
xml version causes slow-down for users who use factors, but not xml (which are most)
2015-07-28 23:30:18 +01:00
Phil Williams
2cda286a06 experiment.meta: re-run fast_align symmetrization if symmetrization type changes 2015-07-28 16:55:55 +01:00
Rico Sennrich
a968536176 ems fix: pass-unless doesn't understand AND 2015-07-28 16:37:50 +01:00
Ulrich Germann
d67723fd29 Merge branch 'master' of http://github.com/moses-smt/mosesdecoder into ranked-sampling
Conflicts:
	moses/TargetPhrase.cpp
	moses/TargetPhrase.h
2015-07-28 14:29:49 +01:00
Barry Haddow
e53ad40859 Support for nplm in ems 2015-07-23 10:37:26 +01:00
Philipp Koehn
1a795f549e only extract reordering phrase pairs if use mmsapt phrase table 2015-07-20 11:47:24 -04:00
Philipp Koehn
777a88673d compress sort tmp files by default 2015-07-20 11:46:47 -04:00
Philipp Koehn
496f8c6d85 only extract reordering phrase pairs if use mmsapt phrase table 2015-07-20 11:44:22 -04:00
Philipp Koehn
fcf2934a2f customized phrase table pruning step 2015-07-20 11:43:02 -04:00
Rico Sennrich
7b19d83d43 xml support for combine-factors.pl 2015-07-20 10:45:18 +01:00
Rico Sennrich
1b1bafb1e8 ems: add option to factorize after truecase/split/etc. 2015-07-20 10:43:23 +01:00
Rico Sennrich
e85f353898 code simplification by removing language-specific, unused hack. 2015-07-20 10:39:01 +01:00
Ulrich Germann
df50e454d2 Proper handling of moses parameters with double dash in create_config(). 2015-07-17 14:32:30 +01:00
Phil Williams
c83628a92b Fix errors from multiline `` commands in transliteration Perl scripts
Replace the backslash-newline sequence with backslash-backslash-newline in
multiline backquote command strings.  i.e. replace expressions like this:

  `some-command \
    -option1 \
    -option2`;

with ones like this

  `some-command \\
    -option1 \\
    -option2`;

If I understand this right, the shell converts a backslash-newline sequence
to an empty string (i.e. it discards it), but Perl does not.  Unless the
backslash itself is escaped, using a backslash-newline in a Perl command
string results in errors in most instances.  By escaping the backslash, it
gets passed through to the shell where it is interpreted as intended.
2015-07-16 14:54:00 +01:00
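
The shell half of the explanation in commit c83628a92b can be checked with a small Python sketch. It assumes a POSIX sh is available and, following the commit message, treats the unescaped case as one where only the newline reaches the shell. The helper run_in_shell and the echo commands are illustrative and do not come from the transliteration scripts.

```python
import subprocess

def run_in_shell(command):
    """Run a command string through sh and return its standard output."""
    result = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return result.stdout

# Backslash-newline reaches the shell intact: it is treated as a line
# continuation, so echo receives all three arguments.
continued = run_in_shell("echo some-command \\\n  -option1 \\\n  -option2")

# Only the newline reaches the shell: each line becomes a separate command,
# so echo prints its first argument and the option lines fail on their own.
split = run_in_shell("echo some-command \n  -option1 \n  -option2")
```

In the second call the option lines are run as commands in their own right and fail, which matches the kind of error the commit sets out to fix.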
Philipp Koehn
66ecf98cf7 minor bug fix 2015-07-14 11:01:22 -04:00
Rico Sennrich
ca72105fdf fix ems regression 2015-07-14 13:16:25 +01:00
David Madl
3c30210dad Fix 'Use of uninitialized value' error through explicit setting of 0s in hash.
Fixes the following errors in bootstrap-hypothesis-difference-significance.pl on Perl v5.14.2:

Use of uninitialized value $coocUpd in numeric gt (>) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 317.
Use of uninitialized value $b in numeric lt (<) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 543.
Use of uninitialized value $coocUpd in addition (+) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 314.
Use of uninitialized value $coocUpd in numeric gt (>) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 317.
Use of uninitialized value $a in numeric gt (>) at /fs/lofn0/dmadl/software/mosesdecoder/scripts/analysis/bootstrap-hypothesis-difference-significance.pl line 552.
2015-07-14 13:05:22 +01:00
Philipp Koehn
7e3050f7f2 allow saving of model from fast-align (for incremental use) 2015-07-14 05:27:03 -04:00
Barry Haddow
3fdbb00904 Improvements to handling of bilingual LM in EMS 2015-07-10 15:44:24 +01:00
Hieu Hoang
f66beabf4f Generation error in EMS due to pruning. Let's see if this works. 2015-06-28 14:03:54 +04:00
Hieu Hoang
82edbb98a7 comments in ini file about default weights 2015-06-28 10:40:43 +04:00
Hieu Hoang
57e213ed19 tighten up extract-parallel on osx. Can now use gsplit and bsd split 2015-06-26 12:18:21 +04:00
Hieu Hoang
ca54852641 tighten up extract-parallel on osx. Can now use gsplit and bsd split 2015-06-26 11:55:24 +04:00
Hieu Hoang
b83803203e prune generation table in ems 2015-06-25 18:10:31 +04:00
Hieu Hoang
dce0f33270 prune generation table in ems 2015-06-24 18:35:59 +04:00
Barry Haddow
425118aa5d bugfixes - working directory 2015-06-17 09:32:29 +01:00
Barry Haddow
ad8114ddb0 capitalisation 2015-06-15 16:23:12 +01:00
XapaJIaMnu
166bf7365f Forgot to update the weight config path 2015-06-12 16:56:36 +01:00
XapaJIaMnu
ffd3f2bb6e Added basic BilingualNPLM support to EMS and an example config. 2015-06-12 16:21:24 +01:00
Jeroen Vermeulen
dbcc264506 Remove unneeded script.
Tom Hoar, the author of this script, asked me to remove it because it
doesn't actually do what the current name says, and can't work without
an additional script which isn't in the repository.
2015-06-09 23:10:27 +07:00
Lexi Birch
b76194a16b Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-06-08 17:13:00 +01:00
Lexi Birch
501c51947b Allowing the truecaser to work on uncased ASR input, pass the -a flag 2015-06-08 16:58:50 +01:00
Jeroen Vermeulen
85c23ed7dc Fix some JS lint. 2015-06-02 18:05:12 +07:00
Jeroen Vermeulen
b3e577be76 Fixing lint. Only 600 or so lines of errors left! 2015-06-02 17:29:32 +07:00
Jeroen Vermeulen
0981d23705 Lint-fixing binge. 2015-06-02 16:02:39 +07:00
Rico Sennrich
5d8af9c289 support memory-mapped files for NPLM training 2015-05-29 16:07:26 +01:00
Jeroen Vermeulen
ef028446f3 Add license notices to scripts.
This is not pleasant to read (and much, much less pleasant to write!) but
sort of necessary in an open project.  Right now it's quite hard to figure
out what is licensed how, which doesn't matter much to most people but can
suddenly become very important when people want to know what they're being
allowed to do.

I kept the notices as short as I could.  As far as I could see, everything
without a clear license notice is LGPL v2.1 or later.
2015-05-29 18:30:26 +07:00
Jeroen Vermeulen
26170a4179 Friendlier error reporting in beautify.py. 2015-05-29 09:37:37 +07:00
Barry Haddow
c27aa193ea Revert "Min score parameter". Doesn't work without filter.
This reverts commit ab2d396781.
2015-05-28 17:44:26 +01:00
Barry Haddow
ab2d396781 Min score parameter 2015-05-28 17:10:21 +01:00
Phil Williams
842fc9780e senna2brackets.py: bug fixes + clean-up 2015-05-27 20:33:43 +01:00
Phil Williams
c086a8ee50 Add a wrapper script for parsing English text with SENNA 2015-05-26 16:44:13 +01:00
Rico Sennrich
f6f56d11af ems: parse-relax comes last in train; do same for dev/test 2015-05-25 15:52:07 +01:00
Hieu Hoang
582a845524 don't use zcat 2015-05-24 20:04:01 +04:00
Rico Sennrich
43527c82fc training script for monolingual Neural LM
(+bugfixes and usability improvements for RDLM training)
2015-05-22 15:31:08 +01:00
Rico Sennrich
a1678187fe wrapper for stanford dependency parser 2015-05-22 15:28:42 +01:00
Rico Sennrich
98ff2382d0 duplication of existing functionality 2015-05-20 17:35:38 +01:00
Rico Sennrich
6aac7ded9a EMS: more flexible way to concatenate LM training data.
the implementation allows the user to specify which corpora to combine,
and to have multiple LMs on the same data.
2015-05-20 17:20:02 +01:00
Hieu Hoang
36caf2eb9a escape ^# character otherwise morfessors skips line 2015-05-20 15:28:28 +04:00
Hieu Hoang
79ca96db0a should have tested this 2015-05-19 22:19:11 +04:00
Hieu Hoang
59071bf16c run on all cores if number of cores not given 2015-05-19 18:32:31 +04:00
Rico Sennrich
8ca6764c7d ems: allow LMs with user-specified training commands and moses.ini config entries
intended for neural LMs, syntactic LMs, and the like. currently doesn't play nice with INTERPOLATED-LM.
2015-05-18 19:07:37 +01:00
Rico Sennrich
fb06a2325e fix broken ems with interpolated lm disabled 2015-05-18 17:26:09 +01:00
Rico Sennrich
f85dd85f6b ignore-unless magic 2015-05-18 16:17:33 +01:00
Rico Sennrich
59376f500b still confused about pass-unless vs. ignore-unless 2015-05-18 14:40:56 +01:00
Rico Sennrich
45a97f9016 EMS: disable concatenated LM by default 2015-05-18 14:10:29 +01:00
Hieu Hoang
2c0aecb16b Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-18 16:27:03 +04:00
Hieu Hoang
2f0ee5502e delete debugging info 2015-05-18 16:26:26 +04:00
Rico Sennrich
27fd45d088 ems: training LM on concatenation of all LM training corpora 2015-05-18 12:18:49 +01:00
Hieu Hoang
14d2a67193 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-18 12:28:35 +04:00
Hieu Hoang
5fdcf372ae rename a python file to have .py, instead of .perl. In case beautify script depends on it 2015-05-18 12:27:35 +04:00
Jeroen Vermeulen
5aa70c6cdd Also reformat Perl, using Perltidy. 2015-05-18 00:45:15 +07:00
Jeroen Vermeulen
494c20f634 Add note about Perltidy. 2015-05-17 22:48:03 +07:00
Jeroen Vermeulen
e2a632a2b8 JavaScript lint. 2015-05-17 21:36:07 +07:00
Jeroen Vermeulen
5d0bbb6a45 Fix some JavaScript lint. Still a lot left. 2015-05-17 21:24:04 +07:00
Jeroen Vermeulen
a25193cc5d Fix a lot of lint, mostly trailing whitespace.
This is lint reported by the new lint-checking functionality in beautify.py.
(We can change to a different lint checker if we have a better one, but it
would probably still flag these same problems.)

Lint checking can help a lot, but only if we get the lint under control.
2015-05-17 20:04:04 +07:00
Jeroen Vermeulen
108da16374 Suppress CSS lint checking; accept longer lines.
The CSS style that Pocketlint expects is just too different from what we
have.  Don't check those files for now.

Also, a maximum line length of 300 still gives too many warnings, so I'm
regretfully bumping the default to 400 characters.  The traditional 80
characters are already longer than the measured optimum for human reading,
so I hope some day we can address this!
2015-05-17 20:03:27 +07:00
Jeroen Vermeulen
07a8fe06aa Also support checking for lint.
Choose which action(s) you want for each run: --format and/or --lint.

Many different types of files are lint-checkable, but you need Pocketlint
installed (plus ideally, its plugins for the various languages).

Also, added option to control batching of the commands.
2015-05-17 18:25:06 +07:00
Jeroen Vermeulen
9bdcb5f7c1 Fix more Python lint.
This is about the last that isn't in contrib or generated files.  At this
point we can start doing regular lint checks, at least on the Python files,
without being completely inundated with warnings.
2015-05-16 18:03:54 +07:00
Jeroen Vermeulen
61162dd242 Fix more Python lint.
Most of the complaints fixed here were from Pocketlint, but many were also
from Syntastic the vim plugin.
2015-05-16 17:26:56 +07:00
Jeroen Vermeulen
0ffe79579e Fix some python lint.
I used mainly pocketlint, a very good Python linter, but also Syntastic,
a vim plugin.  Didn't get anywhere near fixing all of Syntastic's complaints
though.

Once I've cleaned up all (or at least most) of the Python lint, we can
start doing regular automated lint checks and keep the code clean.
2015-05-16 14:58:03 +07:00
Jeroen Vermeulen
f1ed14eb33 Move ignored path prefixes into config file.
The path prefixes listed in .beautify-ignore, in the project root, will not
be cleaned up.  C and C++ files everywhere else will be.

Also fixes bugs in the prefix-matching code, and makes the matching a little
bit more powerful: the prefix can now extend down into the directory tree.
2015-05-15 16:40:06 +07:00
Jeroen Vermeulen
d7599134b8 Remove the --any-astyle option.
The risk from flip-flopping on styles is too great: 2.04 formatting has
many changes from 2.01 formatting.

Also, fix a broken version check.
2015-05-15 15:24:54 +07:00
Jeroen Vermeulen
3e821b56dd Rewrite beautify script in Python.
The new version is much longer, but hopefully extensible, reusable, and
easy to read.  With a few more changes it will let us do three things:

1. Apply more checks and cleanups.
2. Clean up additional file types.
3. Use the same script for mgiza++, so we get a uniform code style.

The "more cleanups" could be more things like the removal of trailing
whitespace which we just added.  We may even want to run lint checkers of
some sort.

Additional file types could be e.g. Perl scripts, build configuration,
or documentation.

In order to make the script reusable we'll have to generalize the
skip_at_root list in list_c_like_files into something like a configuration
file.
2015-05-15 13:46:00 +07:00
Hieu Hoang
5173b9f617 beautify. Add sed for trailing spaces 2015-05-13 11:29:16 +01:00
Hieu Hoang
87e1f1351f tighten up OSM build. More debugging output, to stderr not stdout. lmplz uses outdir as temp directory 2015-05-13 12:29:56 +04:00
Hieu Hoang
0cd62488bf morfessor wrapper 2015-05-12 20:40:19 +04:00
Hieu Hoang
abfc0671a3 osm tweaks and morfessor wrapper 2015-05-12 20:19:39 +04:00
Hieu Hoang
a922245864 default to using lmplz for convenience and because SRILM uses tonnes of memory 2015-05-12 11:44:05 +04:00
Hieu Hoang
03e507f354 default to using lmplz for convenience and because SRILM uses tonnes of memory 2015-05-12 10:40:52 +04:00
Hieu Hoang
8bb18b9ff0 add no-splitter-training argument. Splitter to be used by mada 2015-05-11 15:26:50 +04:00
Barry Haddow
85c1af4d72 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-08 09:16:55 +01:00
Barry Haddow
f403f5e478 mmsapt doesn't require feature weights on first tuning iteration 2015-05-08 09:16:51 +01:00
Hieu Hoang
2acb590394 output bleu for multi-bleu hack 2015-05-05 17:54:35 +04:00
Hieu Hoang
d006c6ef8c don't output remaining args twice 2015-05-05 12:15:08 +04:00
Hieu Hoang
5fefb0da47 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-05 12:02:13 +04:00
Hieu Hoang
8f272e04a9 output debugging messages to stderr, not stdout 2015-05-05 12:01:21 +04:00
Nicola Bertoldi
90a982e579 merge remote into local 2015-05-04 09:42:44 +02:00
Hieu Hoang
d456d9229e add multi-bleu-detok. Like multi-bleu scoring but will detokenize/post-process before scoring 2015-05-03 14:07:12 +04:00
Hieu Hoang
e5f76ee99e Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-05-03 11:50:31 +04:00
Hieu Hoang
73ae7d7e20 option not to use parallel 2015-05-03 11:50:10 +04:00
Philipp Koehn
a4a7c14593 allow breaking up training data for fast align (to avoid memory blowups for very large corpora) 2015-05-01 17:47:08 -04:00
Philipp Koehn
de6a9bd1b3 minor updates to factor scripts; brown-cluster may now run other scripts (e.g., truecaser) before assigning classes 2015-05-01 17:46:14 -04:00
Philipp Koehn
b369699661 various small changes, mostly related to better compliance with grid engine 2015-05-01 17:44:18 -04:00
Rico Sennrich
e98a2fc980 fix interpolation for LM with parser in pre-processing 2015-04-30 15:46:33 +01:00
Hieu Hoang
1278b8f5a7 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-30 15:35:34 +04:00
Hieu Hoang
a6d34a660d another madamira wrapper. Just uses the tokenized file it outputs 2015-04-30 15:35:15 +04:00
Hieu Hoang
15e4b16f49 delete unused var 2015-04-30 14:01:03 +04:00
Hieu Hoang
ebc5a51d32 Merge pull request #111 from unhammer/extract-perl-safewait
die if the forked extract exited with error
2015-04-30 11:45:25 +04:00
Hieu Hoang
1c99b2b2b8 Merge pull request #110 from unhammer/extract-perl-abspath-when-ln
avoid bad symlinks in extract-parallel
2015-04-30 11:41:03 +04:00
Kevin Brubeck Unhammer
2af2f2ef36 avoid bad symlinks in extract-parallel
train-model seems to pass a non-absolute path for the
model/aligned-argument, and then extract-parallel creates a bad symlink
2015-04-30 09:36:59 +02:00
Kevin Brubeck Unhammer
c116fa0dbf die if the forked extract exited with error
Should we pass on bad exit codes from RunFork to those waitpids as well?
Seems like the right thing, though I don't know the code.
2015-04-30 09:33:37 +02:00
Nicola Bertoldi
3400b622c0 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-30 08:35:41 +02:00
Hieu Hoang
8f9bf7ea38 add -config 2015-04-28 15:03:59 +04:00
Hieu Hoang
b7792b227a script to convert arabic to bw, and vice versa 2015-04-28 12:29:58 +04:00
Hieu Hoang
8adad4fc2e exec permission 2015-04-27 17:39:49 +04:00
Hieu Hoang
a47fc00635 option to output factors 2015-04-27 17:35:19 +04:00
Rico Sennrich
da648fd65b fix some RDLM training options 2015-04-27 10:52:16 +01:00
alvations
fa30ea6712 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder into moses-smt-master 2015-04-26 20:38:27 +02:00
alvations
4a68c42b16 syncing to latest moses version 2015-04-26 20:37:10 +02:00
alvations
ec54ea3c4f put back some of the difference made after RELEASE3.0 and incorporated it with the -threads parameter 2015-04-26 20:30:15 +02:00
alvations
c01b0a6262 merging the filter-model-given-input.pl with alvations-master branch 2015-04-26 20:25:15 +02:00
alvations
dda3ddd80b Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-26 20:23:39 +02:00
alvations
e1fcc8082a use integer type when reading options instead of checking for undef. it's more elegant. 2015-04-24 19:30:40 +02:00
alvations
d453ccc9f5 removed wrongly added perl script... 2015-04-24 19:14:04 +02:00
alvations
0ccbcaece6 added $threads option in usage example 2015-04-24 19:11:57 +02:00
alvations
aa9207acfc fixed typo in $thread to $threads 2015-04-24 19:10:10 +02:00
alvations
6c63ca963c checks for undefined $threads 2015-04-24 19:06:37 +02:00
alvations
585784f62a added thread options for filter-model-given-input.pl 2015-04-24 18:57:28 +02:00
Hieu Hoang
4b47e1148c use ignore-unless /Philipp Koehn 2015-04-22 23:02:57 +04:00
Hieu Hoang
40933b4a78 hack to allow target side of tokenized parallel corpus to be used for LM 2015-04-22 19:01:12 +04:00
Nicola Bertoldi
5700fbaabf Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-22 07:50:07 +02:00
Hieu Hoang
c15f3ef068 duplicated functionality with ems/support/lmplz-wrapper.perl 2015-04-21 17:54:34 +04:00
Hieu Hoang
ab01d30687 make sure GetOptions doesn't consume -T by confusing it with --text 2015-04-21 17:53:46 +04:00
Rico Sennrich
15d3c3f259 be more tolerant about xml input 2015-04-21 14:04:25 +01:00
Rico Sennrich
5a3d5b6bdd EMS: LM:mock-parse can be actual parser 2015-04-21 10:21:24 +01:00
Hieu Hoang
95435f2a2e better detection of pigz 2015-04-20 20:40:50 +04:00
Hieu Hoang
eb37437d09 don't output warnings. It wasn't originally there before the 'env perl' change. This script should be tightened up at some point, eg use strict, debug warning messages 2015-04-20 16:18:51 +04:00
Hieu Hoang
1b9dc6cfae more butinah tweaks 2015-04-19 11:50:50 +04:00
Hieu Hoang
637e8a17e8 add pre tokenization cleaning script. In case training has bad, overly long lines which blow up some taggers/segmenters, eg. mada 2015-04-19 11:21:07 +04:00
Hieu Hoang
6162223690 add use warnings to all perl scripts 2015-04-13 20:42:33 +04:00
Hieu Hoang
8190a5e1d6 Merge pull request #107 from flammie/master
Finnish detokenisation
2015-04-11 14:20:37 +04:00
Dingyuan Wang
4aba64ed53 Merge pull request #106 from gumblex/master
Fix some problems in EMS
2015-04-11 09:26:25 +08:00
Michael Denkowski
2682cc0f9b typo fix 2015-04-07 17:06:18 -04:00
Flammie Pirinen
fc8ee03b8d examples 2015-04-07 16:19:07 +01:00
Flammie Pirinen
ef52bc66f6 full set of cases and caps 2015-04-07 16:16:43 +01:00
Flammie Pirinen
85230e8334 add fi to list to silence warnings 2015-04-07 16:00:48 +01:00
Flammie Pirinen
f9deb6de3b also lowercase if case fail 2015-04-07 15:55:02 +01:00
Flammie Pirinen
5817806ec7 fix detokenising : in abbrev. case suffix case 2015-04-07 15:51:29 +01:00
Hieu Hoang
54e55f2dcb better detection of pigz, sort, split. In case they are not in the default directory 2015-04-06 11:31:44 +04:00
Hieu Hoang
02185a85fb store temp run files in current directory, not /tmp 2015-04-05 17:02:48 +04:00
Hieu Hoang
93ad52d2f9 leave in runPath for debugging 2015-04-05 16:49:12 +04:00
Hieu Hoang
4cb8a1837e Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-05 16:45:17 +04:00
Hieu Hoang
7ffdddef13 script to submit ems job to grid engine as 1 job. Hardcoded for NYUAD at the mo 2015-04-05 16:44:24 +04:00
Dingyuan Wang
aea07b0a19 Fix some problems in EMS:
* remove absolute links
* fix coverage bar highlighting
* change Base64 library to support UTF-8
2015-04-03 23:47:25 +08:00
Rico Sennrich
8d8097632b re-apply commit 1fb51dc (use gunzip -c instead of zcat)
plus be more tolerant about xml input
2015-04-03 15:00:45 +01:00
Hieu Hoang
b2f9ba2b64 revert last commit to add MASTER_PATH. Not needed 2015-04-02 19:29:42 +04:00
Hieu Hoang
27b36e0c96 pass in PATH variable from master node. When you're running of a grid but really just qsubbing everything to 1 slave node 2015-04-02 19:15:21 +04:00
Hieu Hoang
2d1da3219d consistently use 'env perl' command for environments where the 1st perl in PATH isn't the default perl. Which is kinda stupid 2015-04-02 17:38:56 +04:00
Hieu Hoang
035c806059 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-02 14:10:42 +04:00
Hieu Hoang
e76247e19b Conditional import of Thread package for perl installations that don't support threads 2015-04-02 14:07:57 +04:00
Hieu Hoang
e22d275c32 don't ignore lowercasing of factored LM. Must be consistent with pt 2015-04-01 23:25:57 +04:00
Phil Williams
6ce3060dd8 lmplz-wrapper.perl: use Getopt::Long's "pass_through" option
This avoids the need to duplicate all of lmplz's options in the wrapper and
it prevents --prune 0 0 1 from being truncated to --prune 0 if the user forgets
to quote the arguments.
2015-03-30 10:18:51 +01:00
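
The pass-through idea in commit 6ce3060dd8 has a close analogue in Python, sketched here with argparse's parse_known_args() in place of Getopt::Long's "pass_through". The --lmplz option, the path and the argument list are hypothetical and are not taken from lmplz-wrapper.perl.

```python
import argparse

# Hypothetical wrapper option; the real wrapper's own options are not shown here.
parser = argparse.ArgumentParser()
parser.add_argument("--lmplz", help="path to the wrapped binary (illustrative)")

# Illustrative command line mixing one wrapper option with options meant for the tool.
argv = ["--lmplz", "/path/to/lmplz", "--order", "5", "--prune", "0", "0", "1"]

# parse_known_args() returns the recognised options plus the untouched remainder,
# so the wrapper never has to declare --order or --prune itself.
known, passthrough = parser.parse_known_args(argv)
command = [known.lmplz] + passthrough
```

Because the remainder is forwarded unchanged, all three values after --prune survive, which is the truncation problem the commit message mentions.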
alvations
496a2a716c Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-03-23 15:55:56 +01:00
Matthias Huck
506427368f filter-model-given-input.pl: drop "-encoding None" from phrase table binarization with processPhraseTableMin. Recommended by Marcin. 2015-03-23 14:38:24 +00:00
Hieu Hoang
c4af7d28b5 Merge pull request #100 from alvations/master
Added the Gacha Filter used in WMT14 by the Manawi system
2015-03-21 22:42:55 +00:00
alvations
e5feb1a73e Enforce python3 and also remove extra empty newline from STDOUT 2015-03-20 19:37:34 +01:00
alvations
8f2d687d27 added more description of usage in docstring 2015-03-20 19:00:36 +01:00
alvations
44cd32d058 fixed typo in error message 2015-03-20 18:53:03 +01:00
alvations
93ea5853e8 Added Gacha Filter used by Manawi system from WMT14 translation task 2015-03-20 18:48:47 +01:00
alvations
5536e13213 added Gacha Filter from WMT14 2015-03-20 18:44:59 +01:00
Rico Sennrich
3a673fc8dc EMS: support for syntactic metrics for MERT/MIRA
- add "-n-best-trees" to TUNING:decoder-settings
- add "mock-output-parser-references = $output-parser" to GENERAL (and define output-parser)
- TUNING:tuning-settings should include the metric you want to optimize (e.g. "-batch-mira-args='--sctype BLEU,HWCM'")
2015-03-20 17:15:33 +00:00
Rico Sennrich
ca08b1d205 reduce-factors: port xml support from train-model.perl 2015-03-20 14:44:48 +00:00
Rico Sennrich
b8ca33c34e RDLM training without editing bash scripts 2015-03-20 14:12:41 +00:00
Rico Sennrich
2271f295e6 nplm_train: more options 2015-03-20 14:12:41 +00:00
Rico Sennrich
eab513b635 relational dependency language model 2015-03-18 17:39:45 +00:00
Phil Williams
fc15e03ebe Replace truecase-egret.sh with more general tree-converter-wrapper.perl 2015-03-18 09:57:42 +00:00
Phil Williams
ac51e9f0a8 Always use "SyntaxInputWeight0" as name of SyntaxInputWeight feature 2015-03-18 09:56:46 +00:00
Phil Williams
0a8e5fb3bf EMS: fix TRAINING:use-syntax-input-weight-feature option 2015-03-13 17:18:56 +00:00