Hieu Hoang
cc8c6b7b10
beautify
2015-05-02 11:45:24 +01:00
Matthias Huck
4ee8f2dec1
sentence-bleu less greedy regarding memory
...
Don't load all references, read them line by line.
Corpora with millions of sentences can now be evaluated without consuming gigabytes of RAM.
2015-04-30 22:26:30 +01:00
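Note on the commit above: the streaming idea can be sketched roughly as follows. This is a minimal illustration, not the actual sentence-bleu code; `ForEachSentencePair` and its callback signature are hypothetical names introduced here. It assumes hypotheses and references arrive as two line-aligned streams.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the Moses sources): visits hypothesis and
// reference lines pairwise, holding only one pair in memory at a time.
// Returns the number of pairs visited; stops as soon as either stream ends.
std::size_t ForEachSentencePair(
    std::istream& hypotheses, std::istream& references,
    const std::function<void(const std::string&, const std::string&)>& visit) {
  assert(static_cast<bool>(visit));  // the callback must be callable
  std::string hyp, ref;
  std::size_t count = 0;
  while (std::getline(hypotheses, hyp) && std::getline(references, ref)) {
    visit(hyp, ref);  // e.g. score this single sentence pair
    ++count;
  }
  return count;
}
```

Because nothing is accumulated across iterations, memory use depends on the longest sentence rather than on the corpus size.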
Matthias Huck
34d1d3a904
sentence-bleu-nbest
2015-04-30 19:44:29 +01:00
Jeroen Vermeulen
789a2e2bc3
Fix some compile warnings (gcc 4.9.2).
...
Mostly signed/unsigned comparisons and reordered member
initializations; also a few unused variables.
There are more, but if I chip away at them for a while, who knows, it
may catch on and warnings may eventually become socially stigmatizing.
:)
2015-03-29 18:10:51 +07:00
Rico Sennrich
3d00e5dc8c
basic support for more metrics with kbmira
...
metrics need getReferenceLength (for background smoothing) to work with kbmira
2014-09-22 10:49:20 +01:00
Rico Sennrich
6810b225cc
calculateScore with float (for smoothing support)
2014-09-22 10:49:20 +01:00
Barry Haddow
efee2695c3
Merge 08811deb17337356cd8dae9c59c0160590679a35 from joshua
2014-07-21 11:04:43 +01:00
Hieu Hoang
d9be81596e
replace CHECK with UTIL_THROW_IF in mert
2013-11-18 18:13:10 +00:00
Hieu Hoang
6249432407
beautify
2013-05-29 18:16:15 +01:00
Barry Haddow
9ca364fb22
Implement brevity penalty smoothing for PRO
...
As in Nakov et al (Coling 2012)
2013-02-18 11:11:20 +00:00
Tetsuo Kiso
2a3c9fc679
Further optimization for extractor.
...
Fixes inefficient updating of N-gram counts.
NOTE: Using the '--binary' option (not yet enabled by default)
for saving outputs would lead to a significant speed-up.
2012-12-07 08:45:47 +09:00
Tetsuo Kiso
8fdec9bf30
Use boost::unordered_map instead of std::map.
...
For storing the word vocabulary used in the computation of
BLEU scores. This change reduces the running time
of extractor by about 2-3 seconds (a 9% reduction).
2012-12-07 05:12:24 +09:00
Tetsuo Kiso
ac045a11c1
Speed up N-gram counts when running extractor.
...
By replacing std::map with boost::unordered_map.
Runtime of extractor on 100-best lists of 2679 sentences:
Before:
real 0m35.314s
user 0m34.030s
sys 0m1.280s
After:
real 0m26.729s
user 0m25.420s
sys 0m1.310s
2012-12-06 22:08:33 +09:00
Tetsuo Kiso
cccfb9a0c9
Using namespace std in a header file pollutes the global namespace.
...
Using directives should be put into the implementation files.
2012-11-05 00:43:36 +09:00
Barry Haddow
2b4e61d826
Merge branch 'trunk' into miramerge
...
Compiles, not tested.
Conflicts:
Jamroot
OnDiskPt/PhraseNode.h
OnDiskPt/TargetPhrase.cpp
OnDiskPt/TargetPhrase.h
OnDiskPt/TargetPhraseCollection.cpp
mert/BleuScorer.cpp
mert/Data.cpp
mert/FeatureData.cpp
moses-chart-cmd/src/Main.cpp
moses/src/AlignmentInfo.h
moses/src/ChartManager.cpp
moses/src/LM/Ken.cpp
moses/src/LM/Ken.h
moses/src/LMList.h
moses/src/LexicalReordering.h
moses/src/PhraseDictionaryTree.h
moses/src/ScoreIndexManager.h
moses/src/StaticData.h
moses/src/TargetPhrase.h
moses/src/Word.cpp
scripts/ems/experiment.meta
scripts/ems/experiment.perl
scripts/training/train-model.perl
2012-07-17 13:36:50 +01:00
Hieu Hoang
e3dd3a8d2c
namespace all classes in mert directory
2012-06-30 20:23:45 +01:00
Hieu Hoang
0cb63edcb9
merge Lexi Birch's LRScore from mert_mtm5 branch. Compiles and runs. Hack, must double-check with Barry or Lexi
2012-06-23 22:51:48 -04:00
Eva Hasler
e1c1a5343c
merge
2012-06-07 11:16:52 +01:00
Eva Hasler
6a6a35c65e
fix start weights in experiment.perl, add hypothesis queue for picking hope and fear translations, add variations to 1slack formulation
2012-06-01 01:49:42 +01:00
Colin Cherry
fd577d7a65
Batch k-best MIRA is written and integrated into mert-moses.pl
...
Regression tests all check out, and kbmira seems to work fine
on a Hansard French->English task.
HypPackEnumerator class may be of interest to pro.cpp and future
optimizers, as it abstracts a lot of the boilerplate involved in
enumerating multiple k-best lists.
MiraWeightVector is not really mira-specific - just a weight vector
that enables efficient averaging. Could be useful to a perceptron
as well. Same goes for MiraFeatureVector.
Interaction with sparse features is written, but untested.
2012-05-29 13:38:57 -04:00
Eva Hasler
30deedde9f
changed permission, everything changed..
2012-05-10 18:54:24 +01:00
Tetsuo Kiso
9c9d88a78a
Avoid "using namespace std" in headers.
2012-05-10 07:51:05 +09:00
Matous Machacek
440650bd6e
Added support for external unix filters to preprocess sentences in mert and evaluator
2012-05-09 19:21:41 +02:00
Eva
6c2a58a48e
clean up mira, add sampling from hope/model/fear
2012-04-29 21:29:18 -07:00
Eva
6f39ad0b3e
test
2012-04-28 23:11:30 -07:00
Tetsuo Kiso
d034eeb703
Add test cases for BLEU and sentence-level BLEU+1.
...
- Move a definition of sentenceLevelBleuPlusOne() from pro.cpp
to BleuScorer.cpp.
- Add check for the length of an input vector.
2012-04-07 01:02:32 +09:00
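Note on the commit above: one common formulation of sentence-level BLEU+1 applies add-one smoothing to the n-gram precisions for n > 1. The sketch below illustrates that formulation and is not claimed to match `sentenceLevelBleuPlusOne()` line for line. The function name `SentenceBleuPlusOne`, the statistics layout (matched and total counts for orders 1 to 4, followed by the reference length) and the exponential form of the brevity penalty are assumptions made here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative BLEU+1 (assumed layout, hypothetical name):
// stats = {match1, total1, match2, total2, match3, total3, match4, total4, ref_len}
double SentenceBleuPlusOne(const std::vector<double>& stats) {
  const std::size_t kOrder = 4;
  assert(stats.size() == 2 * kOrder + 1);  // check the length of the input vector
  if (stats[0] == 0.0) return 0.0;         // no unigram match: score is zero

  double log_bleu = 0.0;
  for (std::size_t n = 0; n < kOrder; ++n) {
    const double matched = stats[2 * n];
    const double total = stats[2 * n + 1];
    if (n == 0) {
      log_bleu += std::log(matched) - std::log(total);  // unigrams: unsmoothed
    } else {
      log_bleu += std::log(matched + 1.0) - std::log(total + 1.0);  // add-one
    }
  }
  log_bleu /= static_cast<double>(kOrder);

  // Brevity penalty: applied only when the hypothesis is shorter than the reference.
  const double brevity = 1.0 - stats[2 * kOrder] / stats[1];
  if (brevity < 0.0) log_bleu += brevity;
  return std::exp(log_bleu);
}
```

The smoothing keeps a sentence with no 3-gram or 4-gram matches from scoring exactly zero, which is what makes the metric usable for per-sentence ranking.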
Tetsuo Kiso
eaa0ab486a
Add a test case for BLEU's clipped counts.
...
- Make BleuScorer::setReferenceFiles() more testable by
adding OpenReference() and OpenReferenceStream().
2012-04-04 22:33:30 +09:00
Tetsuo Kiso
8987fed667
Add thread unsafe Singleton class.
...
- Add Vocabulary factory and the unit test.
- Remove Scorer::ClearVocabulary().
2012-03-20 05:49:10 +09:00
Tetsuo Kiso
525f06452c
Change the Encoder class to Vocabulary.
...
- Introduce the namespace to avoid naming collisions. The class name
is used in KenLM.
- Add the unit test.
2012-03-20 03:43:04 +09:00
Tetsuo Kiso
2b28072f7a
Move Encoder class from Scorer.h to Ngram.h.
...
To add unit tests.
2012-03-19 23:21:02 +09:00
Tetsuo Kiso
f686e8771a
Add some functions to BleuScorer for unit testing.
...
This commit also includes
- Fix typo.
- Fix indentations.
- Add 'const' to Scorer::applyFactors().
2012-03-19 22:45:15 +09:00
Tetsuo Kiso
6b95a19eda
Create Reference class to clean up BleuScorer.
...
- Add a unit test for Reference.
- Move functions to calculate the reference length from
BleuScorer to Reference.
2012-03-18 05:58:40 +09:00
Tetsuo Kiso
c6536a134b
Clean up BleuScorer.
2012-03-14 22:44:51 +09:00
Tetsuo Kiso
5007f129d8
Clean up BleuScorer with lookup().
2012-03-14 22:41:29 +09:00
Tetsuo Kiso
fba01c7cdf
Create a header file for NgramCounts class.
...
The reason is that we want to add the unit test.
2012-03-14 22:14:11 +09:00
Tetsuo Kiso
ed6e6f00b1
Minor change for calculating BLEU.
...
To avoid defining similar variables twice when calculating
document-wise and sentence-wise BLEU scores.
2012-03-10 02:49:31 +09:00
Matous Machacek
ba987c94ba
Support for using factors in mert and evaluator
...
example:
Use --factor "0|2" to use only the first and third factors from the n-best list and from the reference.
If you use an interpolated scorer, separate the records with commas (e.g. --factor "0|2,1").
2012-02-28 02:27:23 +01:00
Tetsuo Kiso
c26e83fd09
Remove obsolete and unused logging statements.
2012-02-26 02:19:40 +09:00
Tetsuo Kiso
224c654fa5
Don't repeat calling functions many times.
...
Consider storing the result in a constant where possible.
2012-02-26 02:12:59 +09:00
Tetsuo Kiso
669b9d9c7a
Minor change to the logging utility for n-gram counts.
...
Use std::ostream instead of directly using std::cerr.
2012-02-26 02:01:03 +09:00
Tetsuo Kiso
8e0a61d0d7
Clean up calculation of effective reference length.
2012-02-26 01:54:51 +09:00
Tetsuo Kiso
c4fa8a3865
Add a more efficient member to set up ScoreStats.
...
- Remove unnecessary conversions.
- Add 'const' to local variables.
2012-02-26 01:41:17 +09:00
Tetsuo Kiso
2c2bd63bbd
Replace string objects with const char[].
2012-02-26 01:18:08 +09:00
Tetsuo Kiso
17f06a3250
Hide the implementation details of Ngram counts from the header.
2012-02-26 01:11:56 +09:00
Tetsuo Kiso
0c9023abc6
Clean up commented out code snippets for debugging purposes.
2012-02-25 18:14:00 +09:00
Tetsuo Kiso
17e864e446
Create private class to encapsulate the encoding process.
...
Instead of using typedefs inside a class only,
it might be better to create a private class to do the same things.
2012-02-01 21:19:25 +09:00
Tetsuo Kiso
b19e7777ce
Add prefix 'm_' to private and protected members in Scorer classes.
2012-02-01 20:54:20 +09:00
Tetsuo Kiso
30fa97e404
Move reference length type into a private member of BleuScorer.
...
The reason is that the type is used for internal purposes only.
2012-02-01 20:24:48 +09:00
Tetsuo Kiso
3ef03a77c4
Change casts to C++ style casts.
2012-02-01 18:13:00 +09:00
Tetsuo Kiso
142342f8be
Change casts to C++ style casts, and delete unnecessary casts.
2012-02-01 17:17:58 +09:00