2352 Commits

Author SHA1 Message Date
Linas Vepstas
6fb2c97029 Bug-fix: regular Western sentence enders not recognized. 2017-01-05 23:29:00 -06:00
Linas Vepstas
bd9d12351b Create a Cantonese version, distinct from Mandarin.
The content is identical, at this moment, but having distinct
language suffixes solves processing-pipeline problems later on.
2017-01-05 12:53:21 -06:00
Linas Vepstas
1933bcbf33 Whoops, revert cut-n-paste damage in previous commit. 2017-01-05 11:39:01 -06:00
Linas Vepstas
203c7c6387 Preliminary support for Chinese. 2017-01-05 11:34:38 -06:00
Linas Vepstas
144f43495e Preliminary support for Chinese.
Also, cleanup some of the comments.
2017-01-05 11:33:10 -06:00
Linas Vepstas
9f5500a3a8 oops. 2017-01-05 10:09:34 -06:00
Linas Vepstas
ab6816f9a7 Purely cosmetic cleanup.
Use same indentation style throughout; wrap long lines; capitalize
sentences; add punctuation; remove trailing whitespace.
2017-01-05 10:08:06 -06:00
Linas Vepstas
d10ba6f049 More abbreviations for Lithuanian. 2017-01-04 23:52:28 -06:00
Linas Vepstas
3ef84b133c More abbreviations 2017-01-04 22:30:53 -06:00
Linas Vepstas
2a5e40ed60 New file: Lithuanian 2017-01-04 22:01:45 -06:00
Hieu Hoang
ff12a13eaa re-tune if decoder changed. eg moses -> moses2 2017-01-02 16:37:56 -05:00
Hieu Hoang
29b0072eda CreateProbingPT2 -> CreateProbingPT 2017-01-02 06:02:54 -05:00
Hieu Hoang
28c0564589 Merge pull request #170 from moses-smt/alvations-patch-1
Changed \p{Hyphen} to \p{Line_Break} in mteval-v13a.pl
2016-12-23 15:00:49 +00:00
Hieu Hoang
59119c0044 Merge pull request #168 from tofula/master
Named group added for the safer 'protected patterns' recognition regexp
2016-12-23 10:26:19 +00:00
Hieu Hoang
fc8829cdda Merge pull request #169 from lonevvolf/master
Fix for number at the end of a string
2016-12-23 09:50:57 +00:00
alvations
c6c3bc84b7 Changed \p{Hyphen} to \p{LineBreak}
Using Perl v5.18.2, it's reporting this warning:
**Use of 'Hyphen' in \p{} or \P{} is deprecated because: Supplanted by Line_Break property values; see www.unicode.org/reports/tr14**
2016-12-23 14:21:20 +08:00
lonevvolf
d68211cba9 Fix for number at the end of a string 2016-12-06 09:41:32 +01:00
Hieu Hoang
5992f58d99 comment out debugging messages 2016-10-31 12:27:25 -04:00
Hieu Hoang
2679c30c1b --num-scores 2016-10-05 21:35:36 +01:00
Hieu Hoang
bcea640c9a handles hiero models too 2016-10-05 21:33:19 +01:00
Hieu Hoang
16d6a89861 output debugging messages to stderr not stdout 2016-09-29 07:16:57 -04:00
Hieu Hoang
3d5500e698 Merge branch 'perf_moses2' of github.com:hieuhoang/mosesdecoder into perf_moses2 2016-09-27 08:21:34 -04:00
Hieu Hoang
a29f7d5c99 can define srilm-dir in general section 2016-09-27 08:21:18 -04:00
Hieu Hoang
9527fb050d duplicate -T arg for OSM 2016-09-26 12:04:33 +01:00
Hieu Hoang
92f5f868cb add --num-scores arg. To binarize regression test tables with 5 scores 2016-09-25 22:53:08 +01:00
Marcin Junczys-Dowmunt
9ff0af4e85 consistent order of parameters in ini 2016-09-04 21:53:41 +02:00
Hieu Hoang
9d176ce7b5 Merge pull request #162 from da-web/patch-2
Changed NoPhraseCount score-option
2016-09-04 12:25:18 +01:00
Hieu Hoang
9236eeeba9 add brodie to list of machines 2016-08-13 18:27:25 +01:00
Hieu Hoang
260b4776ad binarization with CreateProbingPT2 2016-08-13 18:24:18 +01:00
Hieu Hoang
67d8a10d95 binarization with CreateProbingPT2 2016-08-13 18:16:35 +01:00
Hieu Hoang
c42bb54c04 formatting of -show-weights to make it work with mert script 2016-08-12 09:20:43 +01:00
Hieu Hoang
a8325a3e8e make probing pt work with ems 2016-08-11 15:46:43 +01:00
Hieu Hoang
9a31447c23 add support for CreateProbingPT2 2016-08-10 19:57:35 +01:00
Hieu Hoang
bf4f6b3b90 Merge ../mosesdecoder into perf_moses2 2016-08-05 17:15:18 +01:00
Hieu Hoang
34ffa372c1 Merge pull request #163 from a455bcd9/patch-1
Separate comma after a number end sentence
2016-08-05 17:12:44 +01:00
Antoine Dusséaux
6652068a43 Single lower-case letter French word
"a" is a single lower-case letter French word that can be at the end of a sentence: "Oui, il l'a."
2016-07-31 14:56:37 +02:00
Antoine Dusséaux
d04bdc7440 Separate comma after a number end sentence
Separate "," after a number if it's the end of a sentence.

Example:

He is tall,
He was born in 1800,
He wants to go there in 2000.

He is tall ,
He was born in 1800 ,
He wants to go there in 2000 .
2016-07-31 14:10:07 +02:00
da-web
fb2d7261cf Changed NoPhraseCount score-option
NoPhraseCount score-option was changed to PhraseCount: i.e. per default PhraseCount is omitted.

1. parse PhraseCount instead of NoPhraseCount from "score-options"
2. pass PhraseCount instead of NoPhraseCount to consolidate

fix for issue #157
2016-07-12 14:30:18 +02:00
Hieu Hoang
7fa586aa9b Merge ../mosesdecoder into perf_moses2 2016-07-11 09:50:22 +01:00
Philipp Koehn
ef9d327841 only train input or output truecaser, if only one is needed 2016-07-10 11:17:21 -04:00
Hieu Hoang
65667ad322 Merge ../mosesdecoder into perf_moses2 2016-07-05 23:36:15 +01:00
Hieu Hoang
bf4109655f Revert "Merge pull request #158 from da-web/patch-1"
This reverts commit f7ea8fe0da, reversing
changes made to 03d8747e65.
2016-07-05 11:26:04 +01:00
Hieu Hoang
7f6ce67bac Merge pull request #155 from ypeels/master
avoid name collisions when filtering multiple reordering tables
2016-07-03 20:20:24 +01:00
Hieu Hoang
f7ea8fe0da Merge pull request #158 from da-web/patch-1
Correctly consider score-options NoPhraseCount Argument
2016-07-03 20:20:13 +01:00
Hieu Hoang
0eb0bf642c Merge ../mosesdecoder into perf_moses2 2016-06-23 11:11:11 +01:00
Barry Haddow
aa37aba8aa add threshold pruning option to binarizer 2016-06-20 22:15:51 +01:00
da-web
51b2d0c302 Correctly consider score-options NoPhraseCount Argument
Handle and propagate NoPhraseCount score-option correctly (per default phrasetable is created WITH phrasecount feature):
1. pass --PhraseCount to consolidate (as --NoPhraseCount is not supported by consolidate)
2. consider --NoPhraseCount when calculating the basic_weight correctly (otherwise Moses.ini is wrong)
2016-06-10 13:02:57 +02:00
Hieu Hoang
b75ef6f619 Merge ../mosesdecoder into perf_moses2 2016-06-04 12:45:30 +01:00
Philipp Koehn
defbf8d7c3 barebone support for quality estimation in experiment.perl 2016-06-04 05:15:34 -04:00
Hieu Hoang
36812013bf Merge ../mosesdecoder into perf_moses2 2016-05-31 14:36:15 +01:00
Philipp Koehn
942eb5a8b1 allow configuration of operation sequence model loading, allow specification of KENLM/OSM loading in experiment.perl / train-model.perl 2016-05-29 11:46:42 -04:00
Rico Sennrich
26556d67e2 transliteration module: mertdir should be mosesdecoder/bin 2016-05-27 16:13:21 +01:00
Hieu Hoang
fae65a6ec0 Merge ../mosesdecoder into perf_moses2 2016-05-25 12:50:16 +01:00
Philipp Koehn
c07e6faed8 farasa_moses.sh not default tokenizer 2016-05-25 03:39:00 -04:00
Jonathan Chen
e2c596415d avoid name collisions when filtering multiple reordering tables 2016-05-17 12:00:54 -05:00
Hieu Hoang
2aacaa86ad Merge ../mosesdecoder into perf_moses2 2016-05-09 13:10:51 +01:00
Michael Denkowski
faf1bd5046 Arg handling fixes for mert-moses.pl compatibility 2016-05-05 13:00:13 -04:00
Hieu Hoang
83f2618514 Merge ../mosesdecoder into perf_moses2 2016-05-05 10:56:15 +01:00
Nadir
bb7263f0f9 Arabic Tokenizer 2016-05-04 16:20:55 +01:00
bicici
482e8f744f Update config.basic 2016-05-01 12:00:01 +03:00
bicici
6edf6d2df2 Update config.basic 2016-05-01 11:47:44 +03:00
Ondrej Bojar
0c3108b8ad Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2016-04-28 22:11:35 +02:00
Ondrej Bojar
02f5beefbc minor fixes, warnings 2016-04-28 22:10:25 +02:00
Hieu Hoang
e370e7202e Merge ../mosesdecoder into perf_moses2 2016-04-28 00:21:28 +01:00
Michael Denkowski
631e4845c1 Generic moses wrapper script
Useful for adding pre and post-processing steps to decoding in
mert-moses.pl and similar
2016-04-25 10:15:48 -07:00
Hieu Hoang
42bc056010 Merge ../mosesdecoder into perf_moses2 2016-04-24 00:10:48 +04:00
Hieu Hoang
17800fda1d add old mteval 2016-04-23 20:08:18 +01:00
Hieu Hoang
7d96adb2a7 add script for acquis cleaning 2016-04-19 10:02:46 +01:00
Hieu Hoang
18d96fe051 Merge ../mosesdecoder into perf_moses2 2016-04-15 09:31:37 +04:00
michaelhutt
271aaa67da fixes train-model.perl when specifying 'union' alignment 2016-04-14 16:06:08 -04:00
Hieu Hoang
dcc684ef8f binarize4moses2.perl 2016-03-31 02:25:38 +01:00
Hieu Hoang
ae5e18650a binarize4moses2.perl 2016-03-31 02:16:16 +01:00
Hieu Hoang
ff8caa1226 Merge ../mosesdecoder into perf_moses2 2016-03-28 23:58:20 +01:00
Kenneth Heafield
31698d5eca Remove deprecated lazyken=0 settings. Should be load instead, but don't specify if it's the default. 2016-03-24 12:58:10 +00:00
Michael Denkowski
ab9e408f11 Merge pull request #147 from moses-smt/mert-multi-irst
Fix mert-moses multi-moses for IRSTLM
2016-03-23 23:40:05 -04:00
michaelhutt
77a0b54960 removed invalid '--reduced' from cp command in OSM-Train.perl 2016-03-23 19:31:04 -04:00
Michael Denkowski
c8749aa69a Exit properly when calling moses with --show-weights 2016-03-23 16:37:15 -04:00
Michael Denkowski
c30a4a5611 Fix mert-moses multi-moses for IRSTLM 2016-03-23 12:42:25 -04:00
Hieu Hoang
95f252aa37 Merge ../mosesdecoder into perf_moses2 2016-03-22 10:20:21 +00:00
Hieu Hoang
000de1c1dd Merge pull request #144 from jimregan/patch-2
.' at end of sentence is missed
2016-03-22 08:07:17 +00:00
Michael Denkowski
1ac4ca5735 README formatting fix 2016-03-14 02:23:06 -04:00
Hieu Hoang
f96b48a041 Merge ../mosesdecoder into perf_moses2 2016-03-11 16:38:38 +00:00
Jeroen Vermeulen
a97ce16706 score_parallel.py: fix flexibility scoring.
Forgot to pass the half_file to bzcat when doing flexibility scoring.
2016-03-09 12:58:32 +01:00
Hieu Hoang
b9217cec1d Merge ../mosesdecoder into perf_moses2 2016-03-01 11:08:39 +00:00
Michael Denkowski
c6314d927d N-best re-ranker and trainer 2016-02-23 11:43:51 -05:00
Jim Regan
5d5bf1885d .' at end of sentence is missed 2016-02-23 14:54:20 +00:00
Hieu Hoang
6a363d893c Merge ../mosesdecoder into perf_moses2 2016-02-17 17:39:51 +00:00
Matthias Huck
3c041e80a3 minor code simplification of phrase-based extractor 2016-02-16 21:54:14 +00:00
Hieu Hoang
938ef79404 Merge ../mosesdecoder into perf_moses2 2016-02-12 19:40:04 +00:00
Matthias Huck
1659d6b4c8 Option for target constituent constrained phrase extraction. TargetConstituentAdjacencyFeature. 2016-02-12 17:46:57 +00:00
shuoyangd
7cf3b23962 fix search graph parsing for CYK+ 2016-02-04 17:21:24 -05:00
shuoyangd
1286791ba1 add nnjm-settings to access options in train_nplm.py 2016-02-04 17:18:23 -05:00
Hieu Hoang
06769e648d Merge ../mosesdecoder into perf_moses2 2016-02-04 13:05:36 +00:00
Matthias Huck
16a49d0d8d train-model.perl: don't switch to hierarchical if source or target syntax 2016-02-03 21:33:26 +00:00
Philipp Koehn
9132ab1ded bug fix of bug fix of generation table training 2016-02-01 14:01:06 -05:00
Philipp Koehn
904afd175d prune-generation step invalidated original build-generation step 2016-01-31 11:56:37 -05:00
Philipp Koehn
b4725e1c91 do not interpret $0 as an EMS settings variable 2016-01-31 11:55:44 -05:00
Philipp Koehn
39ba12592b fixed serious bug with reordering tables for different factors (wmt15 results affected) 2016-01-31 11:54:43 -05:00
Hieu Hoang
d4848f3ac4 Merge ../mosesdecoder into perf_moses2 2016-01-28 20:09:35 +00:00
Matthias Huck
80572b43ec undo unintended modification 2016-01-27 17:58:13 +00:00
Matthias Huck
c731851b92 integration of TargetPreferencesFeature in EMS create-config step 2016-01-27 17:54:57 +00:00
Hieu Hoang
38f999fa3f Merge ../mosesdecoder into perf_moses2 2016-01-13 14:57:20 +00:00
Matthias Huck
7764d86fcb tiny changes related to feature functions 2016-01-12 19:44:43 +00:00
Hieu Hoang
d0a48e71ad Merge ../mosesdecoder into perf_moses2 2016-01-12 09:23:07 +00:00
Matthias Huck
885b8b33a1 preparing extraction of Hiero soft syntactic preferences (target syntax) 2016-01-11 20:04:32 +00:00
Hieu Hoang
2c78a26b74 Merge ../mosesdecoder into perf_moses2 2016-01-10 13:21:53 +00:00
Matthias Huck
1d3feba8d0 preparing extraction of Hiero soft syntactic preferences (target syntax) 2016-01-09 23:02:31 +00:00
Hieu Hoang
bf19d71780 Merge ../mosesdecoder into perf_moses2 2016-01-06 21:03:33 +00:00
Barry Haddow
977e8eaf67 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2016-01-06 11:55:16 +00:00
Barry Haddow
7125096c29 enable nplm training on separate host, fix ems for nplm 2016-01-06 11:55:12 +00:00
Hieu Hoang
f179072902 cpu affinity offset 2016-01-04 00:19:35 +00:00
Hieu Hoang
589f0d97f0 cpu affinity offset 2016-01-04 00:05:17 +00:00
Hieu Hoang
abf41d04e9 cpu affinity offset 2016-01-03 23:22:50 +00:00
Matthias Huck
0a39efb6c8 Hiero phrase orientation: modify some parameters 2015-12-18 17:24:42 +00:00
Matthias Huck
bd3f573452 Hiero phrase orientation 2015-12-10 12:56:37 +00:00
Philipp Koehn
33f4e93915 no binarizing/filtering with mmsapt 2015-12-01 23:10:37 +00:00
Ulrich Germann
4c78e7c0b2 scripts/generic/bsbleu.py can now handle gzipped files 2015-11-29 17:57:57 +00:00
Ulrich Germann
c8b859de67 Merge remote-tracking branch 'legacy/master'
Conflicts:
	moses/server/TranslationRequest.cpp
2015-11-24 19:22:37 +00:00
Jeroen Vermeulen
710915c088 Python implementation of parallel scoring.
Re-implementation of score-parallel.perl.  Not a drop-in replacement;
the command line is similar but different and uses the standard Python
command-line parser.

Written without much knowledge of the original script, so documentation
in particular may seem nonsensical to experts.  If you see something
wrong, please help!
2015-11-24 14:37:18 +01:00
Philipp Koehn
94cd1f7433 when building mmsapt phrase table, also use mmsapt reordering table 2015-11-23 18:12:56 -05:00
Barry Haddow
10df006eed Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-11-23 12:16:46 +00:00
Michael Denkowski
b002fade50 Minimal buffering for multi_moses.py
Speeds things up when using multi-threaded instances
2015-11-18 13:54:54 -05:00
Barry Haddow
21d8111287 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-11-18 09:50:29 +00:00
David Madl
e36fb96557 LanguageModel, KenLM: avoid StaticData usage
* drop global lmodel-oov-feature option, and add it to LM FF config line instead
	use oov-feature=1 (bool) option instead
* drop LanguageModel::GetWeight()
* KenLM: use m_verbosity of FF instead of IFVERBOSE macro which uses StaticData

* train-model.perl: move language model OOV feature onto LM feature spec line
2015-11-17 16:15:13 +00:00
Barry Haddow
ccfe8ba018 remove unused method, and misleading comment 2015-11-10 21:35:08 +00:00
Ulrich Germann
ec71c2397b Allow multiple reference files to be specified on the command line; handle gzipped reference files. 2015-11-10 01:16:17 +00:00
Phil Williams
6a37dfd2ce Add a wrapper script for parsing English text with the BLLIP parser 2015-10-26 16:18:54 +00:00
Barry Haddow
d59bb883dc fix extra settings 2015-10-21 09:22:00 +01:00
Hieu Hoang
384e0b06d2 revert Greg's error handling commit. Cruise control breaks with message
sh: 1: cannot create /dev/stderr: Permission denied
2015-10-14 16:42:44 +01:00
Ubuntu
d6dda4f292 explicitly use bash /Uli Germann 2015-10-13 13:40:48 +00:00
Ubuntu
81d106e74d Revert "Revert "Be more aware of errors in subprocesses""
This reverts commit 1fca3f8a75.
2015-10-13 13:33:40 +00:00
Ubuntu
1fca3f8a75 Revert "Be more aware of errors in subprocesses"
This reverts commit f6894f4623.
2015-10-13 12:53:02 +00:00
Matthias Huck
62748f5296 Revert "EMS: fix filtering issue when output-splitter is defined"
This reverts commit d5c41634e8.
2015-10-12 18:05:46 +01:00
Tomáš Fulajtár
dd9eb54ec4 Named group added for the safer 'protected patterns' recognition regexp.
The original code uses numbered references, which can collide if any group is used inside the $protected_pattern string. For example, take this protected pattern (loaded from file): (http[s]?|ftp):\/\/[^:\/\s]+(\/\w+)*\/[\w\-\.]+.

With a numbered reference, $2 refers to the (http[s]?|ftp) group instead of the (.*) in:
 while ($t =~ /($protected_pattern)(.*)$/) {

Naming patterns resolves this issue.
2015-10-12 18:47:45 +02:00
Tomáš Fulajtár
83e25a3f5e Merge pull request #1 from moses-smt/master
update my local fork
2015-10-12 18:37:26 +02:00
Greg Hanneman
f6894f4623 Be more aware of errors in subprocesses 2015-10-09 16:05:25 +00:00
Michael Denkowski
923f512be0 Extend multi_moses.py to support multi-threaded moses instances 2015-10-08 11:50:34 -04:00
Nadir
15b4aa91b0 Extra Space 2015-10-08 13:16:58 +01:00
Nadir
965aeb9012 Interpolated OSM - Bug Fix 2015-10-08 12:03:09 +01:00
Nadir
2ec6fed898 Interpolated OSM 2015-10-07 13:57:32 +01:00
Michael Denkowski
35538bf894 Actually override threads specified in moses.ini 2015-10-06 15:29:42 -04:00
Rico Sennrich
c0fedd275b change default nplm setting to 1 hidden layer 2015-10-06 11:49:45 +01:00
Michael Denkowski
160a7f254b Parallelization with multiple instances of moses 2015-10-02 18:19:43 -04:00
Barry Haddow
bb1b5d3abd config of dropout 2015-09-25 13:53:39 +01:00
Hieu Hoang
1f5ec65f3c Merge pull request #129 from jimregan/patch-1
Basic tokenizer support for Irish (ga)
2015-09-24 16:07:17 +01:00
Hieu Hoang
d81dfda511 option to cat model files 2015-09-24 09:00:54 -04:00
Jim Regan
36e4951134 ga (mostly) behaves more like fr/it 2015-09-23 14:36:57 +01:00
Jim Regan
dfe682d823 Create nonbreaking_prefix.ga 2015-09-23 14:35:18 +01:00
Jim Regan
09a9f1b061 ga (mostly) behaves more like fr/it 2015-09-23 14:33:18 +01:00
Pierre Lison
31cc22cf14 redirecting output of which and split --help to /dev/null 2015-09-16 12:56:25 +02:00