Rico Sennrich
b32366ab8c
fix future and total cost in multimodel(counts). (was broken since merge of branch weight-new in May)
2013-07-31 14:18:18 +02:00
Rico Sennrich
d0e2c43011
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-30 17:18:32 +02:00
Rico Sennrich
a15bc05a33
rename multimodel weights in moses server (harmonization with the new config format)
2013-07-30 17:02:34 +02:00
Rico Sennrich
29cde2a204
allow overriding table filtering in config (required for multimodelcounts)
2013-07-30 16:46:23 +02:00
Rico Sennrich
7b6239b663
multimodelcounts: use Word objects instead of strings in map (avoid costly conversion and string comparison)
2013-07-30 15:03:25 +02:00
Hieu Hoang
03f767ba84
Add debug out to support regression test on Ken's incremental search algorithm. Ken has his own hypothesis class...
2013-07-30 13:05:13 +01:00
Rico Sennrich
ccdcecc86f
multimodel and mosesserver: instead of optimizing first model, select model by name.
2013-07-30 13:54:50 +02:00
Hieu Hoang
b05a443f36
correct arguments to substitute-filtered-tables-and-weights.perl
2013-07-30 11:14:17 +01:00
Ulrich Germann
cb1c06d502
Merge branch 'master' of github.com:moses-smt/mosesdecoder
Conflicts:
moses/Jamfile
2013-07-28 16:51:13 +01:00
Ulrich Germann
56bb485dd5
Fixed missing #include.
2013-07-28 16:39:13 +01:00
Ulrich Germann
b3ed0d56d7
Fixed missing #include.
2013-07-28 16:38:33 +01:00
Ulrich Germann
a47b6cfafa
Added call to tp->Evaluate(src) before adding a phrase table entry to the TargetPhraseCollection during lookup.
2013-07-28 16:37:20 +01:00
Ulrich Germann
1b1771dcc9
Items under 'generic' now included in libmoses
2013-07-28 16:30:41 +01:00
Ulrich Germann
a0c13837e0
Fixed computation of lexical scores.
2013-07-28 16:28:41 +01:00
Hieu Hoang
abe90b5af7
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-27 04:19:16 +01:00
Hieu Hoang
9dab7950fa
move closing of filtered file before binarizing. Otherwise file not flushed, causes error in binarizing
2013-07-27 04:18:50 +01:00
Hieu Hoang
38e312f44c
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-25 15:55:16 +01:00
Barry Haddow
29aa9ea153
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-25 15:56:44 +01:00
Barry Haddow
c127c58e9b
fix to single thread build
2013-07-25 15:56:20 +01:00
Hieu Hoang
a3e3289b08
In corpus mode, replace number with number symbol
2013-07-25 15:54:47 +01:00
Barry Haddow
7081f06413
Fixes to the shared build
2013-07-25 15:24:34 +01:00
Hieu Hoang
76a9730ca8
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-25 15:23:12 +01:00
Hieu Hoang
e2c2bc59f1
beautify
2013-07-25 15:23:05 +01:00
Hieu Hoang
78381d0213
@NUM@ --> @num@. In case using recaser
2013-07-25 15:16:15 +01:00
Phil Williams
f0b603e6b5
extract-ghkm: write glue grammars for all sentence offsets
extract-parallel now merges separate glue grammars, so remove
previous workaround.
2013-07-25 13:53:32 +01:00
Hieu Hoang
d0172ed5cd
create script to convert phrase-table with alignment in Moses' dead-end format to standard format
2013-07-25 12:56:20 +01:00
Hieu Hoang
018998247a
create script to convert phrase-table with alignment in Moses' dead-end format to standard format
2013-07-25 12:52:05 +01:00
Hieu Hoang
c0aba71c79
bug processing unknown word with digits
2013-07-25 08:41:59 +01:00
Barry Haddow
f79746b3c2
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 20:49:59 +01:00
Hieu Hoang
6fc21a32fc
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 19:01:57 +01:00
Hieu Hoang
c104dee3b2
merge glue grammars, rather than writing them all to the same file. Required by Phil Williams & others when doing syntax extraction
2013-07-24 19:01:46 +01:00
Achim Ruopp
1813f9784b
Additional factoring to allow more NE recognizers; bug fixes
2013-07-24 12:44:53 -04:00
Barry Haddow
46ee1ca42d
More lattice fixes squashed by merge
2013-07-24 16:09:32 +01:00
Barry Haddow
0ce50a4c70
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 15:58:08 +01:00
Phil Williams
1238041f98
Add option to do Penn Treebank style tokenization
tokenizer.perl and detokenizer.perl now have an option called -penn
which does Penn Treebank-like tokenization (English only). This is
useful if your pipeline involves processing the corpus with tools
trained on PTB-tokenized text.
Unlike PTB, the tokenizer splits on slashes (e.g. "Monday/Tuesday"
becomes "Monday", "@/@", "Tuesday"). If using parse-de-berkeley.perl,
the option -split-slash re-joins tokens that are separated by slashes
for parsing then splits them afterwards.
2013-07-24 13:41:21 +01:00
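The slash handling described in commit 1238041f98 above can be illustrated with a minimal Python sketch. This is not the code in tokenizer.perl or parse-de-berkeley.perl; the function names `split_slashes` and `join_slashes` are hypothetical, and the handling of leading, trailing, or doubled slashes is an assumption made for the example.

```python
def split_slashes(tokens):
    """Split each token on '/' and mark every slash as the token '@/@'.

    Illustrative only: tokens with an empty side (e.g. '/usr', 'a//b')
    are left untouched, which is an assumption of this sketch.
    """
    out = []
    for token in tokens:
        parts = token.split("/")
        if len(parts) == 1 or any(part == "" for part in parts):
            out.append(token)
            continue
        for i, part in enumerate(parts):
            if i > 0:
                out.append("@/@")
            out.append(part)
    return out


def join_slashes(tokens):
    """Re-join tokens separated by '@/@' (the idea behind -split-slash)."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "@/@" and out and i + 1 < len(tokens):
            # Glue the following token onto the previous one with a slash.
            out[-1] = out[-1] + "/" + tokens[i + 1]
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    split = split_slashes(["on", "Monday/Tuesday"])
    print(split)
    print(join_slashes(split))
```

Re-joining before parsing and splitting again afterwards lets a parser trained on PTB-tokenized text see "Monday/Tuesday" as one token, while the rest of the pipeline keeps the split form.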
Kenneth Heafield
71ae8c9d19
LM/Factory.cpp -> FF/Factory.cpp oops
2013-07-24 12:13:11 +01:00
Ian Johnson
68779c66b9
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 11:52:21 +01:00
Ian Johnson
08f64dea28
Arrow pipeline submodules now use https protocol.
2013-07-24 11:52:14 +01:00
Barry Haddow
d5e40a5b08
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 11:38:23 +01:00
Phil Williams
b5584fdecf
extract-ghkm: workaround for extract-parallel issue
Don't write glue grammar or unknown word label files unless the sentence
offset is 0. This prevents multiple instances of extract-ghkm writing
to the same two files when extract-parallel is used.
TODO Better solutions might be:
1. modify extract-parallel so that it only configures one instance of
extract-ghkm to write the glue / unknown-lhs files (like the current
workaround, this assumes file chunks are representative of the whole)
2. add multithreading support directly to extract-ghkm
3. write distinct output files for each extract-ghkm instance and
combine them on completion
2013-07-23 14:55:16 +01:00
Hieu Hoang
e6a3df7e97
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-23 13:12:30 +01:00
Hieu Hoang
206b165d14
randlm compile with refactored code. No regression tests yet
2013-07-23 12:56:35 +01:00
Hieu Hoang
9b9e8cc759
eclipse file with randlm
2013-07-23 12:41:02 +01:00
Nadir Durrani
30544ae17e
Sample Config File
2013-07-23 12:29:23 +01:00
Nadir Durrani
61e56ecdcd
Sample Config File
2013-07-23 12:18:57 +01:00
Barry Haddow
50de0e06d1
Generate correct ini file for lattices
2013-07-23 11:46:37 +01:00
Barry Haddow
8ed8bcafc2
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-23 11:21:47 +01:00
Barry Haddow
887d5dad62
Restore EMS lattice fixes, squashed by merge.
2013-07-23 10:38:11 +01:00
Phil Williams
91cc7c329e
parse-de-berkeley.perl: escape @ characters in input
2013-07-23 10:22:56 +01:00
Hieu Hoang
1e906bea73
add ControlRecombination feature function
2013-07-23 01:38:08 +01:00