Nicola Bertoldi
16e4220f17
functions to handle Document-Level Translation tags
2013-08-14 12:20:51 +02:00
Nicola Bertoldi
614d7a0376
beautify
2013-08-11 23:43:26 +02:00
Nicola Bertoldi
5868653bd6
beautify
2013-08-11 23:41:23 +02:00
Nicola Bertoldi
7411227305
clean up related to the PhrasePenalty producer
...
transform the PhrasePenalty basic feature functions into a FF like
WordPenalty
2013-08-11 23:32:54 +02:00
phikoehn
e50fc722e9
bug fix alternative weight setting
2013-08-07 15:35:40 +01:00
phikoehn
67c3063574
Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder
2013-08-07 05:32:59 +01:00
phikoehn
ab4e3c63a6
enriched trace
2013-08-07 05:31:45 +01:00
Hieu Hoang
302eec8283
beautify
2013-08-05 12:11:59 +01:00
Kenneth Heafield
78cdf82de8
Log10/loge weight change for incremental. TODO: debug n-best list generation
2013-08-02 17:56:41 +01:00
Hieu Hoang
f234aa203f
number recognizer treats each word as atomic: it replaces the whole word or nothing at all. The recognizer is designed to be run after the text has been tokenized, not before
2013-08-01 16:55:11 +01:00
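The "whole word or nothing" rule in the commit above can be illustrated with a minimal sketch. This is not the Moses recognizer: the function name and the number pattern are assumptions made for illustration, and only the `@num@` symbol comes from this log. The point is that a token is replaced only when the entire token matches, so the input must already be tokenized.

```python
import re

NUM_SYMBOL = "@num@"  # symbol named elsewhere in this log

# Assumed number pattern for illustration only; the real recognizer's
# definition of a number is not given in the commit message.
NUMBER_RE = re.compile(r"[+-]?\d+(?:[.,]\d+)*")

def replace_numbers(tokens):
    """Replace a token only if the whole token is a number."""
    # fullmatch (not search) enforces the atomic rule: "A380" stays as-is.
    return [NUM_SYMBOL if NUMBER_RE.fullmatch(tok) else tok for tok in tokens]

tokens = "he paid 1,200.50 for the A380".split()
print(replace_numbers(tokens))
```

A token such as `A380` is left untouched because only part of it is numeric.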
Rico Sennrich
b32366ab8c
fix future and total cost in multimodel(counts). (was broken since merge of branch weight-new in May)
2013-07-31 14:18:18 +02:00
Rico Sennrich
d0e2c43011
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-30 17:18:32 +02:00
Rico Sennrich
a15bc05a33
rename multimodel weights in moses server (harmonization with the new config format)
2013-07-30 17:02:34 +02:00
Rico Sennrich
29cde2a204
allow overriding table filtering in config (required for multimodelcounts)
2013-07-30 16:46:23 +02:00
Rico Sennrich
7b6239b663
multimodelcounts: use Word objects instead of strings in map (avoid costly conversion and string comparison)
2013-07-30 15:03:25 +02:00
Hieu Hoang
03f767ba84
Add debug out to support regression test on Ken's incremental search algorithm. Ken has his own hypothesis class...
2013-07-30 13:05:13 +01:00
Rico Sennrich
ccdcecc86f
multimodel and mosesserver: instead of optimizing first model, select model by name.
2013-07-30 13:54:50 +02:00
Hieu Hoang
b05a443f36
correct arguments to substitute-filtered-tables-and-weights.perl
2013-07-30 11:14:17 +01:00
Ulrich Germann
cb1c06d502
Merge branch 'master' of github.com:moses-smt/mosesdecoder
...
Conflicts:
moses/Jamfile
2013-07-28 16:51:13 +01:00
Ulrich Germann
56bb485dd5
Fixed missing #include.
2013-07-28 16:39:13 +01:00
Ulrich Germann
b3ed0d56d7
Fixed missing #include.
2013-07-28 16:38:33 +01:00
Ulrich Germann
a47b6cfafa
Added call to tp->Evaluate(src) before adding a phrase table entry to the TargetPhraseCollection during lookup.
2013-07-28 16:37:20 +01:00
Ulrich Germann
1b1771dcc9
Items under 'generic' now included in libmoses
2013-07-28 16:30:41 +01:00
Ulrich Germann
a0c13837e0
Fixed computation of lexical scores.
2013-07-28 16:28:41 +01:00
Hieu Hoang
abe90b5af7
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-27 04:19:16 +01:00
Hieu Hoang
9dab7950fa
move closing of filtered file before binarizing. Otherwise file not flushed, causes error in binarizing
2013-07-27 04:18:50 +01:00
Hieu Hoang
38e312f44c
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-25 15:55:16 +01:00
Barry Haddow
29aa9ea153
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-25 15:56:44 +01:00
Barry Haddow
c127c58e9b
fix to single thread build
2013-07-25 15:56:20 +01:00
Hieu Hoang
a3e3289b08
In corpus mode, replace number with number symbol
2013-07-25 15:54:47 +01:00
Barry Haddow
7081f06413
Fixes to the shared build
2013-07-25 15:24:34 +01:00
Hieu Hoang
76a9730ca8
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-25 15:23:12 +01:00
Hieu Hoang
e2c2bc59f1
beautify
2013-07-25 15:23:05 +01:00
Hieu Hoang
78381d0213
@NUM@ --> @num@. In case using recaser
2013-07-25 15:16:15 +01:00
Phil Williams
f0b603e6b5
extract-ghkm: write glue grammars for all sentence offsets
...
extract-parallel now merges separate glue grammars, so remove
previous workaround.
2013-07-25 13:53:32 +01:00
Hieu Hoang
d0172ed5cd
create script to convert phrase-table with alignment in Moses' dead-end format to standard format
2013-07-25 12:56:20 +01:00
Hieu Hoang
018998247a
create script to convert phrase-table with alignment in Moses' dead-end format to standard format
2013-07-25 12:52:05 +01:00
Hieu Hoang
c0aba71c79
bug processing unknown word with digits
2013-07-25 08:41:59 +01:00
Barry Haddow
f79746b3c2
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 20:49:59 +01:00
Hieu Hoang
6fc21a32fc
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 19:01:57 +01:00
Hieu Hoang
c104dee3b2
merge glue grammars, rather than writing them all to the same file. Required by Phil Williams & others when doing syntax extraction
2013-07-24 19:01:46 +01:00
Achim Ruopp
1813f9784b
Additional factoring to allow more NE recognizers; bug fixes
2013-07-24 12:44:53 -04:00
Barry Haddow
46ee1ca42d
More lattice fixes squashed by merge
2013-07-24 16:09:32 +01:00
Barry Haddow
0ce50a4c70
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 15:58:08 +01:00
Phil Williams
1238041f98
Add option to do Penn Treebank style tokenization
...
tokenizer.perl and detokenizer.perl now have an option called -penn
which does Penn Treebank-like tokenization (English only). This is
useful if your pipeline involves processing the corpus with tools
trained on PTB-tokenized text.
Unlike PTB, the tokenizer splits on slashes (e.g. "Monday/Tuesday"
becomes "Monday", "@/@", "Tuesday"). If using parse-de-berkeley.perl,
the option -split-slash re-joins tokens that are separated by slashes
for parsing then splits them afterwards.
2013-07-24 13:41:21 +01:00
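The slash handling described in the commit above (split "Monday/Tuesday" into "Monday", "@/@", "Tuesday", then re-join for parsing) can be sketched as follows. This is a hedged illustration, not the code in tokenizer.perl or parse-de-berkeley.perl: the function names are hypothetical, and the treatment of edge cases (a bare "/" or a leading or trailing slash is left alone) is an assumption.

```python
SLASH_TOKEN = "@/@"  # marker token named in the commit message

def split_slashes(tokens):
    """Split tokens such as 'Monday/Tuesday' around each slash."""
    out = []
    for tok in tokens:
        parts = tok.split("/")
        # Assumption: leave tokens without a slash, or with an empty
        # side (e.g. "/" or "a/"), unchanged.
        if len(parts) == 1 or any(p == "" for p in parts):
            out.append(tok)
            continue
        for i, part in enumerate(parts):
            if i > 0:
                out.append(SLASH_TOKEN)
            out.append(part)
    return out

def join_slashes(tokens):
    """Re-join tokens that are separated by the slash marker."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == SLASH_TOKEN and out and i + 1 < len(tokens):
            # Glue the next token onto the previous one with a plain slash.
            out[-1] = out[-1] + "/" + tokens[i + 1]
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

split = split_slashes(["on", "Monday/Tuesday"])
print(split)
print(join_slashes(split))
```

Joining before parsing and splitting again afterwards keeps the parser's input close to PTB conventions while the rest of the pipeline sees the split form.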
Kenneth Heafield
71ae8c9d19
LM/Factory.cpp -> FF/Factory.cpp oops
2013-07-24 12:13:11 +01:00
Ian Johnson
68779c66b9
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 11:52:21 +01:00
Ian Johnson
08f64dea28
Arrow pipeline submodules now use https protocol.
2013-07-24 11:52:14 +01:00
Barry Haddow
d5e40a5b08
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 11:38:23 +01:00
Phil Williams
b5584fdecf
extract-ghkm: workaround for extract-parallel issue
...
Don't write glue grammar or unknown word label files unless the sentence
offset is 0. This prevents multiple instances of extract-ghkm writing
to the same two files when extract-parallel is used.
TODO Better solutions might be:
1. modify extract-parallel so that it only configures one instance of
extract-ghkm to write the glue / unknown-lhs files (like the current
workaround, this assumes file chunks are representative of the whole)
2. add multithreading support directly to extract-ghkm
3. write distinct output files for each extract-ghkm instance and
combine them on completion
2013-07-23 14:55:16 +01:00