10591 Commits

Author SHA1 Message Date
Nicola Bertoldi
16e4220f17 functions to handle Document-Level Translation tags 2013-08-14 12:20:51 +02:00
Nicola Bertoldi
614d7a0376 beautify 2013-08-11 23:43:26 +02:00
Nicola Bertoldi
5868653bd6 beautify 2013-08-11 23:41:23 +02:00
Nicola Bertoldi
7411227305 clean up related to the PhrasePenalty producer
transform the PhrasePenalty basic feature functions into a FF like
WordPenalty
2013-08-11 23:32:54 +02:00
phikoehn
e50fc722e9 bug fix alternative weight setting 2013-08-07 15:35:40 +01:00
phikoehn
67c3063574 Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder 2013-08-07 05:32:59 +01:00
phikoehn
ab4e3c63a6 enriched trace 2013-08-07 05:31:45 +01:00
Hieu Hoang
302eec8283 beautify 2013-08-05 12:11:59 +01:00
Kenneth Heafield
78cdf82de8 Log10/loge weight change for incremental. TODO: debug n-best list generation 2013-08-02 17:56:41 +01:00
Hieu Hoang
f234aa203f number recognizer treats each word as atomic: it replaces the whole word or nothing at all. The recognizer is designed to be run after the text has been tokenized, not before 2013-08-01 16:55:11 +01:00
Rico Sennrich
b32366ab8c fix future and total cost in multimodel(counts). (broken since the merge of branch weight-new in May) 2013-07-31 14:18:18 +02:00
Rico Sennrich
d0e2c43011 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-30 17:18:32 +02:00
Rico Sennrich
a15bc05a33 rename multimodel weights in moses server (harmonization with the new config format) 2013-07-30 17:02:34 +02:00
Rico Sennrich
29cde2a204 allow overriding table filtering in config (required for multimodelcounts) 2013-07-30 16:46:23 +02:00
Rico Sennrich
7b6239b663 multimodelcounts: use Word objects instead of strings in map (avoid costly conversion and string comparison) 2013-07-30 15:03:25 +02:00
Hieu Hoang
03f767ba84 Add debug output to support regression test on Ken's incremental search algorithm. Ken has his own hypothesis class... 2013-07-30 13:05:13 +01:00
Rico Sennrich
ccdcecc86f multimodel and mosesserver: instead of optimizing first model, select model by name. 2013-07-30 13:54:50 +02:00
Hieu Hoang
b05a443f36 correct arguments to substitute-filtered-tables-and-weights.perl 2013-07-30 11:14:17 +01:00
Ulrich Germann
cb1c06d502 Merge branch 'master' of github.com:moses-smt/mosesdecoder
Conflicts:
	moses/Jamfile
2013-07-28 16:51:13 +01:00
Ulrich Germann
56bb485dd5 Fixed missing #include. 2013-07-28 16:39:13 +01:00
Ulrich Germann
b3ed0d56d7 Fixed missing #include. 2013-07-28 16:38:33 +01:00
Ulrich Germann
a47b6cfafa Added call to tp->Evaluate(src) before adding a phrase table entry to the TargetPhraseCollection during lookup. 2013-07-28 16:37:20 +01:00
Ulrich Germann
1b1771dcc9 Items under 'generic' now included in libmoses 2013-07-28 16:30:41 +01:00
Ulrich Germann
a0c13837e0 Fixed computation of lexical scores. 2013-07-28 16:28:41 +01:00
Hieu Hoang
abe90b5af7 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-27 04:19:16 +01:00
Hieu Hoang
9dab7950fa close the filtered file before binarizing; otherwise the file is not flushed, which causes an error in binarizing 2013-07-27 04:18:50 +01:00
Hieu Hoang
38e312f44c Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-25 15:55:16 +01:00
Barry Haddow
29aa9ea153 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-25 15:56:44 +01:00
Barry Haddow
c127c58e9b fix to single thread build 2013-07-25 15:56:20 +01:00
Hieu Hoang
a3e3289b08 In corpus mode, replace number with number symbol 2013-07-25 15:54:47 +01:00
Barry Haddow
7081f06413 Fixes to the shared build 2013-07-25 15:24:34 +01:00
Hieu Hoang
76a9730ca8 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-25 15:23:12 +01:00
Hieu Hoang
e2c2bc59f1 beautify 2013-07-25 15:23:05 +01:00
Hieu Hoang
78381d0213 @NUM@ --> @num@, in case a recaser is used 2013-07-25 15:16:15 +01:00
Phil Williams
f0b603e6b5 extract-ghkm: write glue grammars for all sentence offsets
extract-parallel now merges separate glue grammars, so remove
previous workaround.
2013-07-25 13:53:32 +01:00
Hieu Hoang
d0172ed5cd create script to convert phrase-table with alignment in Moses' dead-end format to standard format 2013-07-25 12:56:20 +01:00
Hieu Hoang
018998247a create script to convert phrase-table with alignment in Moses' dead-end format to standard format 2013-07-25 12:52:05 +01:00
Hieu Hoang
c0aba71c79 bug processing unknown word with digits 2013-07-25 08:41:59 +01:00
Barry Haddow
f79746b3c2 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 20:49:59 +01:00
Hieu Hoang
6fc21a32fc Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 19:01:57 +01:00
Hieu Hoang
c104dee3b2 merge glue grammars, rather than writing them all to the same file. Required by Phil Williams & others when doing syntax extraction 2013-07-24 19:01:46 +01:00
Achim Ruopp
1813f9784b Additional factoring to allow more NE recognizers; bug fixes 2013-07-24 12:44:53 -04:00
Barry Haddow
46ee1ca42d More lattice fixes squashed by merge 2013-07-24 16:09:32 +01:00
Barry Haddow
0ce50a4c70 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 15:58:08 +01:00
Phil Williams
1238041f98 Add option to do Penn Treebank style tokenization
tokenizer.perl and detokenizer.perl now have an option called -penn
which does Penn Treebank-like tokenization (English only).  This is
useful if your pipeline involves processing the corpus with tools
trained on PTB-tokenized text.

Unlike PTB, the tokenizer splits on slashes (e.g. "Monday/Tuesday"
becomes "Monday", "@/@", "Tuesday").  If using parse-de-berkeley.perl,
the option -split-slash re-joins tokens that are separated by slashes
for parsing then splits them afterwards.
2013-07-24 13:41:21 +01:00
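
The slash handling described in the commit above can be illustrated with a minimal Python sketch. This is not the code in tokenizer.perl or parse-de-berkeley.perl; the function names `split_slashes` and `rejoin_slashes` are hypothetical, and the only behaviour taken from the commit message is that "Monday/Tuesday" becomes "Monday", "@/@", "Tuesday" and that the tokens can be re-joined for parsing.

```python
SLASH = "@/@"  # marker token named in the commit message


def split_slashes(text):
    """Split whitespace-separated tokens on '/' and emit '@/@' for each slash."""
    out = []
    for token in text.split():
        parts = token.split("/")
        for i, part in enumerate(parts):
            if i > 0:
                out.append(SLASH)
            if part:  # skip empty pieces, e.g. from a leading or trailing '/'
                out.append(part)
    return out


def rejoin_slashes(tokens):
    """Re-join tokens that are separated by the '@/@' marker (hypothetical helper)."""
    out = []
    glue = False  # True when the next token should be attached to the previous one
    for tok in tokens:
        if tok == SLASH and out:
            out[-1] += "/"
            glue = True
        elif glue:
            out[-1] += tok
            glue = False
        else:
            out.append(tok)
    return out


tokens = split_slashes("Monday/Tuesday")
print(tokens)
print(rejoin_slashes(tokens))
```

The real scripts may treat edge cases (leading slashes, URLs, escaped characters) differently; the sketch only covers the simple case from the commit message.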
Kenneth Heafield
71ae8c9d19 LM/Factory.cpp -> FF/Factory.cpp oops 2013-07-24 12:13:11 +01:00
Ian Johnson
68779c66b9 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 11:52:21 +01:00
Ian Johnson
08f64dea28 Arrow pipeline submodules now use https protocol. 2013-07-24 11:52:14 +01:00
Barry Haddow
d5e40a5b08 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 11:38:23 +01:00
Phil Williams
b5584fdecf extract-ghkm: workaround for extract-parallel issue
Don't write glue grammar or unknown word label files unless the sentence
offset is 0.  This prevents multiple instances of extract-ghkm writing
to the same two files when extract-parallel is used.

TODO Better solutions might be:
 1. modify extract-parallel so that it only configures one instance of
    extract-ghkm to write the glue / unknown-lhs files (like the current
    workaround, this assumes file chunks are representative of the whole)
 2. add multithreading support directly to extract-ghkm
 3. write distinct output files for each extract-ghkm instance and
    combine them on completion
2013-07-23 14:55:16 +01:00
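
Option 3 in the TODO list above (write distinct output files per instance and combine them on completion) could look roughly like the following Python sketch. It is an assumption-laden illustration, not the extract-parallel implementation: it assumes the per-instance outputs are line-oriented and that combining them means keeping each distinct line once, in first-seen order. The function name `merge_unique_lines` and the sample rule strings are hypothetical.

```python
def merge_unique_lines(chunks):
    """Combine several line sequences, keeping each distinct line once.

    `chunks` is an iterable of iterables of lines, e.g. one open file
    per extract-ghkm instance (an assumption for this sketch).
    """
    seen = set()
    merged = []
    for chunk in chunks:
        for line in chunk:
            line = line.rstrip("\n")
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return merged


# Hypothetical per-instance outputs; real glue grammar lines look different.
instance_a = ["rule-1\n", "rule-2\n"]
instance_b = ["rule-2\n", "rule-3\n"]
print(merge_unique_lines([instance_a, instance_b]))
```

Because file objects iterate line by line, the same function could be given a list of open files, one per instance, with the result written to a single combined file.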