Hieu Hoang
fd9a39d655
Merge pull request #25 from neubig/ems-external-bin-dir
...
EMS should rerun mkcls and GIZA++ on change of external-bin-dir
2013-01-10 02:42:55 -08:00
Graham Neubig
c55a1474df
Updated experiment.meta
2013-01-10 16:16:23 +09:00
Barry Haddow
2e8bad22e4
lex reordering scoring uses FilePiece/StringPiece
2013-01-09 17:38:48 +00:00
Kenneth Heafield
1530ae4f5f
Fix state comparison (impacted 32-bit)
2013-01-09 13:15:04 +00:00
Barry Haddow
936dbf6516
Instance weighting
2013-01-08 16:40:00 +00:00
Barry Haddow
c86c11abbe
instance weighting of lex weights
2013-01-08 15:34:29 +00:00
Barry Haddow
a55a936182
remove warning
2013-01-08 14:28:16 +00:00
Hieu Hoang
1b0f9f2e88
re-enable some score regression tests. They were failing due to rounding errors
2013-01-08 13:54:35 +00:00
Hieu
5fe686888f
Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
2013-01-07 18:04:41 +00:00
Hieu
8e4f5a927d
eclipse
2013-01-07 18:03:40 +00:00
Hieu
9973832624
eclipse
2013-01-07 18:03:32 +00:00
Hieu
86327b1e77
eclipse
2013-01-07 18:03:20 +00:00
Hieu
3bf83af8d1
eclipse
2013-01-07 18:03:11 +00:00
Hieu
18b162266c
eclipse
2013-01-07 18:02:55 +00:00
Hieu
133e375e24
eclipse
2013-01-07 18:02:40 +00:00
Kenneth Heafield
366ca8afff
Remove if that doesn't do anything, define _FILE_OFFSET_BITS 64
2013-01-07 17:38:04 +00:00
Hieu Hoang
506cb8e9d1
change lseek64() to lseek() and off64_t to off_t. Cygwin doesn't have those functions/types. They should be 64-bit if _FILE_OFFSET_BITS == 64 is set
2013-01-07 14:14:39 +00:00
Kenneth Heafield
ad209c8944
Collapse lseek macro
2013-01-04 22:42:32 +00:00
Hieu Hoang
77a3f32445
mac doesn't have lseek64()
2013-01-04 22:32:48 +00:00
Kenneth Heafield
f9ee7ae4b3
KenLM 0e5d259 including read_compressed fix
2013-01-04 21:02:47 +00:00
Kenneth Heafield
3203f7c92d
If statically linking, check static libs for -mt
2013-01-04 21:02:13 +00:00
Hieu Hoang
c8448ded97
Merge branch 'master' of git://github.com/moses-smt/mosesdecoder
2013-01-04 20:00:23 +00:00
Hieu Hoang
43337dfde3
get rid of HAVE_BOOST macro. Always has boost now
2013-01-04 20:00:04 +00:00
Barry Haddow
459acf87b1
Add support for instance weights file
2013-01-04 14:55:24 +00:00
hieu
1dfbe1113c
delete MergeScorer in mert/
2013-01-03 15:01:30 +00:00
hieu
15c776eda4
delete check and exit of Suffix Array phrase table implementation erroneously checked in
2013-01-03 14:35:39 +00:00
Barry Haddow
861792bfc5
extract can read an instance weights file.
...
Still have to parallelise.
2012-12-21 15:39:25 +00:00
Barry Haddow
8fe900d312
Testing of phrase length feature
2012-12-21 12:35:11 +00:00
Tetsuo Kiso
ce1b650b53
Fix memory leak.
...
The object was allocated with new, but it was not deleted.
This may not be a serious problem because the program mostly runs
for a short time. However, it is not good practice.
2012-12-21 03:06:41 +09:00
Phil Williams
139148bc8f
extract-ghkm and friends: don't unescape special characters
...
Don't unescape special characters when reading XML parse trees in
extract-ghkm, extract-rules, and relax-parse.
2012-12-17 20:08:02 +00:00
Phil Williams
0ca5b8932a
extract-ghkm: tweak label collection for unknown words
...
Produce a better label set when unary rule elimination is enabled.
2012-12-17 19:43:42 +00:00
Phil Williams
fb8d20a22f
extract-ghkm: --UnknownWordMinRelFreq, --UnknownWordUniform
2012-12-17 19:02:30 +00:00
Phil Williams
06081f7ddb
extract-target-trees.py: minor fixes, code style
2012-12-17 18:49:50 +00:00
phikoehn
b275c94dbf
allow for inclusion of extract from previous run
2012-12-12 07:02:59 +00:00
phikoehn
24e1df7520
support for use of baseline alignment model
2012-12-12 03:59:14 +00:00
phikoehn
438dcb1a34
bug fix in experiment.perl wrt. get-corpus-script
2012-12-10 23:50:14 +00:00
Barry Haddow
16ea68f55f
Fix bug in mml scoring
...
Line length calculation was out of step with LM scoring.
2012-12-10 15:54:24 +00:00
Ales Tamchyna
598d65bcfd
adding a simple command-line utility for computing sentence-level BLEU (+1)
2012-12-10 13:12:34 +01:00
phikoehn
ed2d191821
allow specification of end point for experiment.perl
2012-12-10 05:56:51 +00:00
phikoehn
ccf9e13d8e
bug fix with multicore parallelizer
2012-12-09 22:27:02 +00:00
phikoehn
466b502ae0
minor bug fixes with MML
2012-12-09 20:31:20 +00:00
Tetsuo Kiso
2a3c9fc679
Further optimization for extractor.
...
Fixes inefficient updating of N-gram counts.
NOTE: Using the '--binary' option (this option is not enabled by default yet)
for saving outputs would lead to a significant speedup.
2012-12-07 08:45:47 +09:00
Tetsuo Kiso
8fdec9bf30
Use boost::unordered_map instead of std::map.
...
For storing the word vocabulary used in computation of
BLEU scores. This change will reduce the running time
of extractor by about 2-3 seconds (9% reduction).
2012-12-07 05:12:24 +09:00
Tetsuo Kiso
6c04c4ad9c
Add more tests to the Data class.
2012-12-07 02:46:59 +09:00
Tetsuo Kiso
c7f6e38326
Use FilePiece to load N-best lists.
...
Since FilePiece is friendly with StringPiece.
2012-12-07 02:39:02 +09:00
Tetsuo Kiso
38e145e556
Use util::TokenIter to tokenize n-best lists.
...
Reduce creating std::string objects, too. In both ScoreArray
and FeatureArray classes, the private members to track sentence
indices (namely, "m_index") were unnecessarily declared as
std::string, but it's better to directly declare them as 'int'.
2012-12-07 01:39:22 +09:00
Tetsuo Kiso
cd3fb3b831
Untabify.
2012-12-06 23:46:22 +09:00
Tetsuo Kiso
ac045a11c1
Speed up N-gram counts when running extractor.
...
By replacing std::map with boost::unordered_map.
Runtime of extractor on 100-best lists of 2679 sentences:
Before:
real 0m35.314s
user 0m34.030s
sys 0m1.280s
After:
real 0m26.729s
user 0m25.420s
sys 0m1.310s
2012-12-06 22:08:33 +09:00
Hieu Hoang
55f65c3104
race condition in chart decoding with -T arg
2012-12-03 14:57:33 +00:00
phikoehn
ab2effb6fe
train MML in-/out-of-domain language models with same vocabulary
2012-12-01 13:46:59 +00:00