9527 Commits

Author SHA1 Message Date
Tetsuo Kiso
2a3c9fc679 Further optimization for extractor.
Fixes inefficient updating of N-gram counts.

NOTE: Using the '--binary' option (not yet enabled by default)
to save outputs leads to a significant speed-up.
2012-12-07 08:45:47 +09:00
Hieu Hoang
5a783f166e Merge branch 'master' into weight-new 2012-12-06 22:05:14 +00:00
Tetsuo Kiso
8fdec9bf30 Use boost::unordered_map instead of std::map.
It stores the word vocabulary used when computing
BLEU scores. This change reduces the running time
of the extractor by about 2-3 seconds (a 9% reduction).
2012-12-07 05:12:24 +09:00
Tetsuo Kiso
6c04c4ad9c Add more tests to the Data class. 2012-12-07 02:46:59 +09:00
Tetsuo Kiso
c7f6e38326 Use FilePiece to load N-best lists.
FilePiece works well with StringPiece.
2012-12-07 02:39:02 +09:00
Hieu Hoang
3d6d53bf49 delete hardcoded if() statements to show scores in n-best list. Excluded UnknownWordPenalty and made sure PhraseModel & Generation are in a particular order 2012-12-06 17:28:56 +00:00
Hieu Hoang
08ca44b34e delete hardcoded if() statements for showing weights. Excluded UnknownWordPenalty and made sure PhraseModel & Generation are in a particular order 2012-12-06 17:13:00 +00:00
Hieu Hoang
28b70a5697 delete hardcoded if() statements for showing weights. Excluded UnknownWordPenalty and made sure PhraseModel & Generation are in a particular order 2012-12-06 16:59:54 +00:00
Tetsuo Kiso
38e145e556 Use util::TokenIter to tokenize n-best lists.
Also reduce the creation of std::string objects. In both ScoreArray
and FeatureArray classes, the private members to track sentence
indices (namely, "m_index") were unnecessarily declared as
std::string, but it's better to directly declare them as 'int'.
2012-12-07 01:39:22 +09:00
Hieu Hoang
e3def0bc78 convert all other weight-* to [weight] 2012-12-06 16:19:18 +00:00
Hieu Hoang
af459277b8 correct name of syntactic LM 2012-12-06 15:23:48 +00:00
Hieu Hoang
eb12e4c808 Merge branch 'master' into weight-new 2012-12-06 14:48:20 +00:00
Hieu Hoang
6b58f88df9 Works with multiple phrase-tables 2012-12-06 14:46:52 +00:00
Tetsuo Kiso
cd3fb3b831 Untabify. 2012-12-06 23:46:22 +09:00
Tetsuo Kiso
ac045a11c1 Speed up N-gram counts when running extractor.
By replacing std::map with boost::unordered_map.

Runtime of extractor on 100-best lists of 2679 sentences:

Before:
real    0m35.314s
user    0m34.030s
sys     0m1.280s

After:
real    0m26.729s
user    0m25.420s
sys     0m1.310s
2012-12-06 22:08:33 +09:00
Hieu Hoang
da9cd0e3aa clean up weights code for confusion networks & lattices. Works, except for multiple phrase-tables or factors 2012-12-05 20:21:33 +00:00
Hieu Hoang
b8d4c64d6d deprecate -translation-systems 2012-12-05 17:58:45 +00:00
Hieu Hoang
9f767d4eba lexical reordering model works with new weight setup 2012-12-05 17:19:10 +00:00
Hieu Hoang
768d165600 works ok for plain phrase-based decoding. No lexical reordering model 2012-12-05 17:12:01 +00:00
Hieu Hoang
4f3805a0d7 change \!UnknownWordPenalty to UnknownWordPenalty 2012-12-05 11:58:21 +00:00
Hieu Hoang
cb270a76be all regression tests passed 2012-12-04 18:39:06 +00:00
Hieu Hoang
08b508d686 indentation 2012-12-04 17:54:39 +00:00
Hieu Hoang
9fe742ce52 get rid of function GetScoreProducerWeightShortName(). Fails 1 regression test 2012-12-04 17:09:23 +00:00
Hieu Hoang
33105a7ba7 get rid of int argument from GetScoreProducerWeightShortName() 2012-12-04 13:08:00 +00:00
Hieu Hoang
55f65c3104 race condition in chart decoding with -T arg 2012-12-03 14:57:33 +00:00
phikoehn
ab2effb6fe train MML in-/out-of-domain language models with same vocabulary 2012-12-01 13:46:59 +00:00
phikoehn
269883fedd Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2012-12-01 13:45:00 +00:00
phikoehn
0c5d000192 my change to weight-wt 2012-12-01 13:44:57 +00:00
Marcin Junczys-Dowmunt
205cea8644 Allow .minlexr suffix and bugfix 2012-12-01 00:38:20 +01:00
Eva Hasler
650d31fe73 don't need to specify weight-wt 2012-11-30 18:04:50 +00:00
Hieu Hoang
a07f71d095 race condition on letter sed cache. Requires locking 2012-11-30 17:15:32 +00:00
Hieu Hoang
7abb3c878a remove locking. Make wordIndex variable local 2012-11-30 13:50:59 +00:00
Hieu Hoang
5fd9cbb529 delete reference to numpy. Doesn't need it 2012-11-30 10:28:51 +00:00
Hieu Hoang
017bbe78e8 forgotten misc programs for Compact pt 2012-11-30 09:49:36 +00:00
phikoehn
338b7656a6 ooops 2012-11-30 07:36:59 +00:00
phikoehn
84cb04c05a fixes and extensions to modified Moore-Lewis filtering, now works with domain features 2012-11-30 07:28:31 +00:00
phikoehn
1f7ee0e6c5 change of settings for sigtest filtering 2012-11-29 23:44:10 +00:00
Hieu Hoang
d4ead15066 fuzzy match phrase-table is multi-threaded 2012-11-29 15:27:38 +00:00
Hieu Hoang
9aad7c65c9 move CompactPt to TranslationModel/ 2012-11-27 18:04:01 +00:00
Hieu Hoang
152064086f Merge branch 'master' of github.com:moses-smt/mosesdecoder 2012-11-27 17:33:42 +00:00
Hieu Hoang
b317ac1a34 compile error on misc programs 2012-11-27 17:33:04 +00:00
Hieu Hoang
bc1e96730d move CKY+Parser to TranslationModel/ 2012-11-27 17:23:31 +00:00
Hieu Hoang
ae8a48b022 move Score3Parser to TranslationModel/ 2012-11-27 17:09:23 +00:00
Hieu Hoang
1aae9aa23c move RuleTable to TranslationModel/ 2012-11-27 16:57:23 +00:00
Hieu Hoang
6bf2870f18 move the rest of DynSA to TranslationModel/ 2012-11-27 16:31:42 +00:00
Hieu Hoang
4d8e4ae6d8 move DynSAInclude to TranslationModel/ 2012-11-27 16:16:30 +00:00
Barry Haddow
f0e12912e7 mml-score.py. Support for combining with domain features. 2012-11-27 15:58:55 +00:00
Hieu Hoang
75108c0aaf minor debug messages 2012-11-27 15:39:08 +00:00
Hieu Hoang
0b54d32038 move fuzzy-match to TranslationModel/ 2012-11-27 15:36:24 +00:00
Hieu Hoang
59449f2925 make TranslationModel subdirectory and move files from moses/ into it 2012-11-27 15:08:31 +00:00
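Commits ac045a11c1 and 8fdec9bf30 attribute the extractor speed-ups (35.3 s to 26.7 s real time on 100-best lists of 2679 sentences) to replacing std::map with boost::unordered_map for N-gram counts and the BLEU vocabulary. The sketch below illustrates only the general idea: a hash map gives average O(1) lookups where std::map's ordered tree costs O(log n) per lookup. It is not the extractor's actual code. std::unordered_map stands in for boost::unordered_map so the example needs only the standard library, and the names CountNgrams and NgramCounts are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type: maps an n-gram (words joined by single spaces) to its count.
// A hash map replaces an ordered std::map because iteration order is never needed.
using NgramCounts = std::unordered_map<std::string, int>;

// Hypothetical helper: counts every n-gram of order 1..maxOrder in one tokenized sentence.
NgramCounts CountNgrams(const std::vector<std::string>& words, std::size_t maxOrder) {
  assert(maxOrder > 0);
  NgramCounts counts;
  for (std::size_t order = 1; order <= maxOrder; ++order) {
    for (std::size_t start = 0; start + order <= words.size(); ++start) {
      // Build the key for words[start .. start + order - 1].
      std::string key = words[start];
      for (std::size_t i = start + 1; i < start + order; ++i) {
        key += ' ';
        key += words[i];
      }
      ++counts[key];  // operator[] value-initializes a missing count to 0
    }
  }
  return counts;
}
```

For example, CountNgrams({"the", "cat", "the", "cat"}, 2) yields four distinct entries: "the" and "cat" twice each, "the cat" twice, and "cat the" once.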