Commit Graph

10581 Commits

Author SHA1 Message Date
Rico Sennrich
b32366ab8c fix future and total cost in multimodel(counts). (was broken since merge of branch weight-new in May) 2013-07-31 14:18:18 +02:00
Rico Sennrich
d0e2c43011 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-30 17:18:32 +02:00
Rico Sennrich
a15bc05a33 rename multimodel weights in moses server (harmonization with the new config format) 2013-07-30 17:02:34 +02:00
Rico Sennrich
29cde2a204 allow overriding table filtering in config (required for multimodelcounts) 2013-07-30 16:46:23 +02:00
Rico Sennrich
7b6239b663 multimodelcounts: use Word objects instead of strings in map (avoid costly conversion and string comparison) 2013-07-30 15:03:25 +02:00
Hieu Hoang
03f767ba84 Add debug out to support regression test on Ken's incremental search algorithm. Ken has his own hypothesis class... 2013-07-30 13:05:13 +01:00
Rico Sennrich
ccdcecc86f multimodel and mosesserver: instead of optimizing first model, select model by name. 2013-07-30 13:54:50 +02:00
Hieu Hoang
b05a443f36 correct arguments to substitute-filtered-tables-and-weights.perl 2013-07-30 11:14:17 +01:00
Ulrich Germann
cb1c06d502 Merge branch 'master' of github.com:moses-smt/mosesdecoder
Conflicts:
	moses/Jamfile
2013-07-28 16:51:13 +01:00
Ulrich Germann
56bb485dd5 Fixed missing #include. 2013-07-28 16:39:13 +01:00
Ulrich Germann
b3ed0d56d7 Fixed missing #include. 2013-07-28 16:38:33 +01:00
Ulrich Germann
a47b6cfafa Added call to tp->Evaluate(src) before adding a phrase table entry to the TargetPhraseCollection during lookup. 2013-07-28 16:37:20 +01:00
Ulrich Germann
1b1771dcc9 Items under 'generic' now included in libmoses 2013-07-28 16:30:41 +01:00
Ulrich Germann
a0c13837e0 Fixed computation of lexical scores. 2013-07-28 16:28:41 +01:00
Hieu Hoang
abe90b5af7 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-27 04:19:16 +01:00
Hieu Hoang
9dab7950fa move closing of filtered file before binarizing. Otherwise file not flushed, causes error in binarizing 2013-07-27 04:18:50 +01:00
Hieu Hoang
38e312f44c Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-25 15:55:16 +01:00
Barry Haddow
29aa9ea153 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-25 15:56:44 +01:00
Barry Haddow
c127c58e9b fix to single thread build 2013-07-25 15:56:20 +01:00
Hieu Hoang
a3e3289b08 In corpus mode, replace number with number symbol 2013-07-25 15:54:47 +01:00
Barry Haddow
7081f06413 Fixes to the shared build 2013-07-25 15:24:34 +01:00
Hieu Hoang
76a9730ca8 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-25 15:23:12 +01:00
Hieu Hoang
e2c2bc59f1 beautify 2013-07-25 15:23:05 +01:00
Hieu Hoang
78381d0213 @NUM@ --> @num@. In case using recaser 2013-07-25 15:16:15 +01:00
Phil Williams
f0b603e6b5 extract-ghkm: write glue grammars for all sentence offsets
extract-parallel now merges separate glue grammars, so remove
previous workaround.
2013-07-25 13:53:32 +01:00
Hieu Hoang
d0172ed5cd create script to convert phrase-table with alignment in Moses' dead-end format to standard format 2013-07-25 12:56:20 +01:00
Hieu Hoang
018998247a create script to convert phrase-table with alignment in Moses' dead-end format to standard format 2013-07-25 12:52:05 +01:00
Hieu Hoang
c0aba71c79 bug processing unknown word with digits 2013-07-25 08:41:59 +01:00
Barry Haddow
f79746b3c2 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 20:49:59 +01:00
Hieu Hoang
6fc21a32fc Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 19:01:57 +01:00
Hieu Hoang
c104dee3b2 merge glue grammars, rather than writing them all to the same file. Required by Phil Williams & others when doing syntax extraction 2013-07-24 19:01:46 +01:00
Achim Ruopp
1813f9784b Additional factoring to allow more NE recognizers; bug fixes 2013-07-24 12:44:53 -04:00
Barry Haddow
46ee1ca42d More lattice fixes squashed by merge 2013-07-24 16:09:32 +01:00
Barry Haddow
0ce50a4c70 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 15:58:08 +01:00
Phil Williams
1238041f98 Add option to do Penn Treebank style tokenization
tokenizer.perl and detokenizer.perl now have an option called -penn
which does Penn Treebank-like tokenization (English only).  This is
useful if your pipeline involves processing the corpus with tools
trained on PTB-tokenized text.

Unlike PTB, the tokenizer splits on slashes (e.g. "Monday/Tuesday"
becomes "Monday", "@/@", "Tuesday").  If using parse-de-berkeley.perl,
the option -split-slash re-joins tokens that are separated by slashes
for parsing then splits them afterwards.
2013-07-24 13:41:21 +01:00
Kenneth Heafield
71ae8c9d19 LM/Factory.cpp -> FF/Factory.cpp oops 2013-07-24 12:13:11 +01:00
Ian Johnson
68779c66b9 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 11:52:21 +01:00
Ian Johnson
08f64dea28 Arrow pipeline submodules now use https protocol. 2013-07-24 11:52:14 +01:00
Barry Haddow
d5e40a5b08 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 11:38:23 +01:00
Phil Williams
b5584fdecf extract-ghkm: workaround for extract-parallel issue
Don't write glue grammar or unknown word label files unless the sentence
offset is 0.  This prevents multiple instances of extract-ghkm writing
to the same two files when extract-parallel is used.

TODO Better solutions might be:
 1. modify extract-parallel so that it only configures one instance of
    extract-ghkm to write the glue / unknown-lhs files (like the current
    workaround, this assumes file chunks are representative of the whole)
 2. add multithreading support directly to extract-ghkm
 3. write distinct output files for each extract-ghkm instance and
    combine them on completion
2013-07-23 14:55:16 +01:00
Hieu Hoang
e6a3df7e97 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-23 13:12:30 +01:00
Hieu Hoang
206b165d14 randlm compile with refactored code. No regression tests yet 2013-07-23 12:56:35 +01:00
Hieu Hoang
9b9e8cc759 eclipse file with randlm 2013-07-23 12:41:02 +01:00
Nadir Durrani
30544ae17e Sample Config File 2013-07-23 12:29:23 +01:00
Nadir Durrani
61e56ecdcd Sample Config File 2013-07-23 12:18:57 +01:00
Barry Haddow
50de0e06d1 Generate correct ini file for lattices 2013-07-23 11:46:37 +01:00
Barry Haddow
8ed8bcafc2 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-23 11:21:47 +01:00
Barry Haddow
887d5dad62 Restore EMS lattice fixes, squashed by merge. 2013-07-23 10:38:11 +01:00
Phil Williams
91cc7c329e parse-de-berkeley.perl: escape @ characters in input 2013-07-23 10:22:56 +01:00
Hieu Hoang
1e906bea73 add ControlRecombination feature function 2013-07-23 01:38:08 +01:00