9498 Commits

Author SHA1 Message Date
Kenneth Heafield
1530ae4f5f Fix state comparison (impacted 32-bit) 2013-01-09 13:15:04 +00:00
Barry Haddow
936dbf6516 Instance weighting 2013-01-08 16:40:00 +00:00
Barry Haddow
c86c11abbe instance weighting of lex weights 2013-01-08 15:34:29 +00:00
Barry Haddow
a55a936182 remove warning 2013-01-08 14:28:16 +00:00
Hieu Hoang
1b0f9f2e88 re-enable some score regression tests. They were failing due to rounding errors 2013-01-08 13:54:35 +00:00
Hieu
5fe686888f Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2013-01-07 18:04:41 +00:00
Hieu
8e4f5a927d eclipse 2013-01-07 18:03:40 +00:00
Hieu
9973832624 eclipse 2013-01-07 18:03:32 +00:00
Hieu
86327b1e77 eclipse 2013-01-07 18:03:20 +00:00
Hieu
3bf83af8d1 eclipse 2013-01-07 18:03:11 +00:00
Hieu
18b162266c eclipse 2013-01-07 18:02:55 +00:00
Hieu
133e375e24 eclipse 2013-01-07 18:02:40 +00:00
Kenneth Heafield
366ca8afff Remove if that doesn't do anything, define _FILE_OFFSET_BITS 64 2013-01-07 17:38:04 +00:00
Hieu Hoang
506cb8e9d1 change lseek64() to lseek() and off64_t to off_t. Cygwin doesn't have those functions/types. They should be 64-bit if _FILE_OFFSET_BITS == 64 is set 2013-01-07 14:14:39 +00:00
Kenneth Heafield
ad209c8944 Collapse lseek macro 2013-01-04 22:42:32 +00:00
Hieu Hoang
77a3f32445 mac doesn't have lseek64() 2013-01-04 22:32:48 +00:00
Kenneth Heafield
f9ee7ae4b3 KenLM 0e5d259 including read_compressed fix 2013-01-04 21:02:47 +00:00
Kenneth Heafield
3203f7c92d If statically linking, check static libs for -mt 2013-01-04 21:02:13 +00:00
Hieu Hoang
c8448ded97 Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2013-01-04 20:00:23 +00:00
Hieu Hoang
43337dfde3 get rid of HAVE_BOOST macro. Always has boost now 2013-01-04 20:00:04 +00:00
Barry Haddow
459acf87b1 Add support for instance weights file 2013-01-04 14:55:24 +00:00
hieu
1dfbe1113c delete MergeScorer in mert/ 2013-01-03 15:01:30 +00:00
hieu
15c776eda4 delete check and exit of Suffix Array phrase table implementation erroneously checked in 2013-01-03 14:35:39 +00:00
Barry Haddow
861792bfc5 extract can read an instance weights file.
Still have to parallelise.
2012-12-21 15:39:25 +00:00
Barry Haddow
8fe900d312 Testing of phrase length feature 2012-12-21 12:35:11 +00:00
Tetsuo Kiso
ce1b650b53 Fix memory leak.
The object was allocated with new, but it was not deleted.
This may not be a serious problem because the program mostly runs
for a short time. However, it is not good practice.
2012-12-21 03:06:41 +09:00
Phil Williams
139148bc8f extract-ghkm and friends: don't unescape special characters
Don't unescape special characters when reading XML parse trees in
extract-ghkm, extract-rules, and relax-parse.
2012-12-17 20:08:02 +00:00
Phil Williams
0ca5b8932a extract-ghkm: tweak label collection for unknown words
Produce a better label set when unary rule elimination is enabled.
2012-12-17 19:43:42 +00:00
Phil Williams
fb8d20a22f extract-ghkm: --UnknownWordMinRelFreq, --UnknownWordUniform 2012-12-17 19:02:30 +00:00
Phil Williams
06081f7ddb extract-target-trees.py: minor fixes, code style 2012-12-17 18:49:50 +00:00
phikoehn
b275c94dbf allow for inclusion of extract from previous run 2012-12-12 07:02:59 +00:00
phikoehn
24e1df7520 support for use of baseline alignment model 2012-12-12 03:59:14 +00:00
phikoehn
438dcb1a34 bug fix in experiment.perl wrt. get-corpus-script 2012-12-10 23:50:14 +00:00
Barry Haddow
16ea68f55f Fix bug in mml scoring
Line length calculation was out of step with LM scoring.
2012-12-10 15:54:24 +00:00
Ales Tamchyna
598d65bcfd adding a simple command-line utility for computing sentence-level BLEU (+1) 2012-12-10 13:12:34 +01:00
phikoehn
ed2d191821 allow specification of end point for experiment.perl 2012-12-10 05:56:51 +00:00
phikoehn
ccf9e13d8e bug fix with multicore parallelizer 2012-12-09 22:27:02 +00:00
phikoehn
466b502ae0 minor bug fixes with MML 2012-12-09 20:31:20 +00:00
Tetsuo Kiso
2a3c9fc679 Further optimization for extractor.
Fixes inefficient updating of N-gram counts.

NOTE: Using the '--binary' option (not yet enabled by default)
for saving outputs would lead to a significant speedup.
2012-12-07 08:45:47 +09:00
Tetsuo Kiso
8fdec9bf30 Use boost::unordered_map instead of std::map.
For storing the word vocabulary used in computation of
BLEU scores. This change reduces the running time
of extractor by about 2-3 seconds (a 9% reduction).
2012-12-07 05:12:24 +09:00
Tetsuo Kiso
6c04c4ad9c Add more tests to the Data class. 2012-12-07 02:46:59 +09:00
Tetsuo Kiso
c7f6e38326 Use FilePiece to load N-best lists.
Since FilePiece is friendly with StringPiece.
2012-12-07 02:39:02 +09:00
Tetsuo Kiso
38e145e556 Use util::TokenIter to tokenize n-best lists.
Reduce creating std::string objects, too. In both ScoreArray
and FeatureArray classes, the private members to track sentence
indices (namely, "m_index") were unnecessarily declared as
std::string, but it's better to directly declare them as 'int'.
2012-12-07 01:39:22 +09:00
Tetsuo Kiso
cd3fb3b831 Untabify. 2012-12-06 23:46:22 +09:00
Tetsuo Kiso
ac045a11c1 Speed up N-gram counts when running extractor.
By replacing std::map with boost::unordered_map.

Runtime of extractor on 100-best lists of 2679 sentences:

Before:
real    0m35.314s
user    0m34.030s
sys     0m1.280s

After:
real    0m26.729s
user    0m25.420s
sys     0m1.310s
2012-12-06 22:08:33 +09:00
Hieu Hoang
55f65c3104 race condition in chart decoding with -T arg 2012-12-03 14:57:33 +00:00
phikoehn
ab2effb6fe train MML in-/out-of-domain language models with same vocabulary 2012-12-01 13:46:59 +00:00
phikoehn
269883fedd Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2012-12-01 13:45:00 +00:00
phikoehn
0c5d000192 my change to weight-wt 2012-12-01 13:44:57 +00:00
Marcin Junczys-Dowmunt
205cea8644 Allow .minlexr suffix and bugfix 2012-12-01 00:38:20 +01:00