74 Commits

Author SHA1 Message Date
Hieu Hoang
cc8c6b7b10 beautify 2015-05-02 11:45:24 +01:00
Matthias Huck
4ee8f2dec1 sentence-bleu less greedy regarding memory
Don't load all references, read them line by line.
Corpora with millions of sentences can now be evaluated without consuming gigabytes of RAM.
2015-04-30 22:26:30 +01:00
Matthias Huck
34d1d3a904 sentence-bleu-nbest 2015-04-30 19:44:29 +01:00
Jeroen Vermeulen
789a2e2bc3 Fix some compile warnings (gcc 4.9.2).
Mostly signed/unsigned comparisons and reordered member
initializations; also a few unused variables.

There are more, but if I chip away at them for a while, who knows, it
may catch on and warnings may eventually become socially stigmatizing.
:)
2015-03-29 18:10:51 +07:00
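The two warning classes named in this commit can be illustrated with a small sketch. The class `Window` and function `Sum` are hypothetical examples, not code from the repository. The constructor lists initializers in declaration order (which avoids gcc's -Wreorder), and the loop uses an unsigned index to match `size()` (which avoids -Wsign-compare).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example class, not from the Moses sources.
class Window {
 public:
  // Initializers appear in the same order as the member declarations below,
  // so the compiler has nothing to reorder.
  Window(std::size_t begin, std::size_t size) : begin_(begin), size_(size) {}
  std::size_t end() const { return begin_ + size_; }

 private:
  std::size_t begin_;
  std::size_t size_;
};

// Uses std::size_t for the index so the comparison with values.size()
// is unsigned on both sides.
int Sum(const std::vector<int>& values) {
  int total = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    total += values[i];
  }
  return total;
}
```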
Rico Sennrich
3d00e5dc8c basic support for more metrics with kbmira
metrics need getReferenceLength (for background smoothing) to work with kbmira
2014-09-22 10:49:20 +01:00
Rico Sennrich
6810b225cc calculateScore with float (for smoothing support) 2014-09-22 10:49:20 +01:00
Barry Haddow
efee2695c3 Merge 08811deb17337356cd8dae9c59c0160590679a35 from joshua 2014-07-21 11:04:43 +01:00
Hieu Hoang
d9be81596e replace CHECK with UTIL_THROW_IF in mert 2013-11-18 18:13:10 +00:00
Hieu Hoang
6249432407 beautify 2013-05-29 18:16:15 +01:00
Barry Haddow
9ca364fb22 Implement brevity penalty smoothing for PRO
As in Nakov et al. (COLING 2012).
2013-02-18 11:11:20 +00:00
Tetsuo Kiso
2a3c9fc679 Further optimization for extractor.
Fixes inefficient updating of N-gram counts.

NOTE: Using the '--binary' option (not yet enabled by default)
for saving outputs would lead to a significant speed-up.
2012-12-07 08:45:47 +09:00
Tetsuo Kiso
8fdec9bf30 Use boost::unordered_map instead of std::map.
The map stores the word vocabulary used in the computation of
BLEU scores. This change reduces the running time
of extractor by about 2-3 seconds (a 9% reduction).
2012-12-07 05:12:24 +09:00
Tetsuo Kiso
ac045a11c1 Speed up N-gram counts when running extractor.
By replacing std::map with boost::unordered_map.

Runtime of extractor on 100-best lists of 2679 sentences:

Before:
real    0m35.314s
user    0m34.030s
sys     0m1.280s

After:
real    0m26.729s
user    0m25.420s
sys     0m1.310s
2012-12-06 22:08:33 +09:00
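The timings above amount to roughly a 24% drop in wall-clock time (35.3s to 26.7s). The idea behind the change can be sketched as below, under stated assumptions: `std::unordered_map` stands in for `boost::unordered_map`, n-grams are vectors of integer word ids, and the names `Ngram`, `NgramHash`, `NgramCountMap` and `CountNgrams` are illustrative rather than taken from the extractor. A hash map gives average constant-time updates, where `std::map` needs a logarithmic number of vector comparisons per update.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

typedef std::vector<int> Ngram;  // sequence of word ids (assumed encoding)

// Hash functor for n-grams; the mixing step is modelled on the classic
// boost::hash_combine formula.
struct NgramHash {
  std::size_t operator()(const Ngram& ngram) const {
    std::size_t seed = 0;
    for (std::size_t i = 0; i < ngram.size(); ++i) {
      seed ^= std::hash<int>()(ngram[i]) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
  }
};

typedef std::unordered_map<Ngram, int, NgramHash> NgramCountMap;

// Counts every n-gram of order 1..max_order in a sentence of word ids.
NgramCountMap CountNgrams(const std::vector<int>& sentence, std::size_t max_order) {
  NgramCountMap counts;
  for (std::size_t n = 1; n <= max_order; ++n) {
    for (std::size_t i = 0; i + n <= sentence.size(); ++i) {
      Ngram ngram(sentence.begin() + i, sentence.begin() + i + n);
      ++counts[ngram];  // average O(1) lookup and update
    }
  }
  return counts;
}
```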
Tetsuo Kiso
cccfb9a0c9 Using namespace std in a header file pollutes the global namespace.
Using directives should be put into the implementation files.
2012-11-05 00:43:36 +09:00
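A minimal sketch of the convention this commit describes, shown as one listing with comments marking where each part would live. The file names `Tokens.h` and `Tokens.cpp`, the namespace `example` and the function `SplitOnSpaces` are hypothetical.

```cpp
#include <string>
#include <vector>

// --- would live in a header (hypothetical Tokens.h) ---
// Names are fully qualified, so including this header adds nothing
// to the global namespace of the including file.
namespace example {
std::vector<std::string> SplitOnSpaces(const std::string& line);
}

// --- would live in the implementation file (hypothetical Tokens.cpp) ---
// A using directive here affects only this translation unit.
using namespace std;

namespace example {
vector<string> SplitOnSpaces(const string& line) {
  vector<string> tokens;
  string current;
  for (string::size_type i = 0; i < line.size(); ++i) {
    if (line[i] == ' ') {
      if (!current.empty()) tokens.push_back(current);
      current.clear();
    } else {
      current += line[i];
    }
  }
  if (!current.empty()) tokens.push_back(current);
  return tokens;
}
}  // namespace example
```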
Barry Haddow
2b4e61d826 Merge branch 'trunk' into miramerge
Compiles, not tested.

Conflicts:
	Jamroot
	OnDiskPt/PhraseNode.h
	OnDiskPt/TargetPhrase.cpp
	OnDiskPt/TargetPhrase.h
	OnDiskPt/TargetPhraseCollection.cpp
	mert/BleuScorer.cpp
	mert/Data.cpp
	mert/FeatureData.cpp
	moses-chart-cmd/src/Main.cpp
	moses/src/AlignmentInfo.h
	moses/src/ChartManager.cpp
	moses/src/LM/Ken.cpp
	moses/src/LM/Ken.h
	moses/src/LMList.h
	moses/src/LexicalReordering.h
	moses/src/PhraseDictionaryTree.h
	moses/src/ScoreIndexManager.h
	moses/src/StaticData.h
	moses/src/TargetPhrase.h
	moses/src/Word.cpp
	scripts/ems/experiment.meta
	scripts/ems/experiment.perl
	scripts/training/train-model.perl
2012-07-17 13:36:50 +01:00
Hieu Hoang
e3dd3a8d2c namespace all classes in mert directory 2012-06-30 20:23:45 +01:00
Hieu Hoang
0cb63edcb9 merge Lexi Birch's LRScore from mert_mtm5 branch. Compiles and runs. Hack, must double check with Barry or Lexi 2012-06-23 22:51:48 -04:00
Eva Hasler
e1c1a5343c merge 2012-06-07 11:16:52 +01:00
Eva Hasler
6a6a35c65e fix start weights in experiment.perl, add hypothesis queue for picking hope and fear translations, add variations to 1slack formulation 2012-06-01 01:49:42 +01:00
Colin Cherry
fd577d7a65 Batch k-best MIRA is written and integrated into mert-moses.pl
Regression tests all check out, and kbmira seems to work fine
on a Hansard French->English task.

HypPackEnumerator class may be of interest to pro.cpp and future
optimizers, as it abstracts a lot of the boilerplate involved in
enumerating multiple k-best lists.

MiraWeightVector is not really mira-specific - just a weight vector
that enables efficient averaging. Could be useful to a perceptron
as well. Same goes for MiraFeatureVector.

Interaction with sparse features is written, but untested.
2012-05-29 13:38:57 -04:00
Eva Hasler
30deedde9f changed permission, everything changed.. 2012-05-10 18:54:24 +01:00
Tetsuo Kiso
9c9d88a78a Avoid "using namespace std" in headers. 2012-05-10 07:51:05 +09:00
Matous Machacek
440650bd6e Added support for external unix filters to preprocess sentences in mert and evaluator 2012-05-09 19:21:41 +02:00
Eva
6c2a58a48e clean up mira, add sampling from hope/model/fear 2012-04-29 21:29:18 -07:00
Eva
6f39ad0b3e test 2012-04-28 23:11:30 -07:00
Tetsuo Kiso
d034eeb703 Add test cases for BLEU and sentence-level BLEU+1.
- Move a definition of sentenceLevelBleuPlusOne() from pro.cpp
  to BleuScorer.cpp.
- Add check for the length of an input vector.
2012-04-07 01:02:32 +09:00
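For readers unfamiliar with sentence-level BLEU+1, here is a hedged sketch of one common formulation. It is not a copy of `sentenceLevelBleuPlusOne()` from BleuScorer.cpp. Two things are assumptions: the layout of the statistics vector (match and total counts per n-gram order, followed by the reference length), and the choice to add one only to orders above unigram. The function name `SentenceBleuPlusOne` is also illustrative. The length check mirrors the input-vector check this commit mentions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed stats layout: {match_1, total_1, ..., match_N, total_N, reference_length}.
// Returns a smoothed sentence-level BLEU in [0, 1].
double SentenceBleuPlusOne(const std::vector<double>& stats) {
  // Reject vectors that cannot hold at least one order plus a reference length.
  if (stats.size() < 3 || stats.size() % 2 == 0) return 0.0;
  const std::size_t order = (stats.size() - 1) / 2;

  double log_bleu = 0.0;
  for (std::size_t n = 0; n < order; ++n) {
    // Add-one smoothing for bigrams and above; unigrams stay unsmoothed
    // (an assumption of this sketch).
    const double add = (n == 0) ? 0.0 : 1.0;
    const double match = stats[2 * n] + add;
    const double total = stats[2 * n + 1] + add;
    if (match <= 0.0 || total <= 0.0) return 0.0;
    log_bleu += std::log(match / total);
  }
  log_bleu /= static_cast<double>(order);

  // Standard brevity penalty: exp(1 - r/c) when the hypothesis is shorter.
  const double hyp_len = stats[1];
  const double ref_len = stats.back();
  if (hyp_len < ref_len) log_bleu += 1.0 - ref_len / hyp_len;
  return std::exp(log_bleu);
}
```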
Tetsuo Kiso
eaa0ab486a Add a test case for BLEU's clipped counts.
- Make BleuScorer::setReferenceFiles() more testable by
  adding OpenReference() and OpenReferenceStream().
2012-04-04 22:33:30 +09:00
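Clipped counts, which this test case covers, cap each hypothesis n-gram count at the largest count seen in any single reference. A minimal sketch, assuming n-grams are represented as strings and using the illustrative names `Counts` and `ClippedMatches` (not the BleuScorer API):

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, int> Counts;  // n-gram -> occurrence count

// Sums, over all hypothesis n-grams, min(hypothesis count,
// max count of that n-gram in any one reference).
int ClippedMatches(const Counts& hypothesis, const std::vector<Counts>& references) {
  int matches = 0;
  for (Counts::const_iterator it = hypothesis.begin(); it != hypothesis.end(); ++it) {
    int max_ref = 0;
    for (std::size_t r = 0; r < references.size(); ++r) {
      Counts::const_iterator found = references[r].find(it->first);
      if (found != references[r].end()) {
        max_ref = std::max(max_ref, found->second);
      }
    }
    matches += std::min(it->second, max_ref);
  }
  return matches;
}
```

For a hypothesis that repeats "the" seven times against references containing it twice and once, the clipped match count is 2 rather than 7.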
Tetsuo Kiso
8987fed667 Add thread-unsafe Singleton class.
- Add Vocabulary factory and the unit test.
- Remove Scorer::ClearVocabulary().
2012-03-20 05:49:10 +09:00
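A thread-unsafe singleton of the kind named in this commit might look like the sketch below. It illustrates the pattern only and is not the class from the mert directory; `ExampleVocabulary` is a hypothetical payload type. The unguarded null check in `GetInstance()` is what makes it unsafe under concurrent first use.

```cpp
// Generic singleton holder; deliberately has no locking.
template <typename T>
class Singleton {
 public:
  // Creates the instance on first use. Two threads racing here could
  // both see a null pointer and construct twice.
  static T* GetInstance() {
    if (instance_ == 0) {
      instance_ = new T;
    }
    return instance_;
  }

  // Destroys the instance so the next GetInstance() starts fresh,
  // which is convenient in unit tests.
  static void Delete() {
    delete instance_;
    instance_ = 0;
  }

 private:
  Singleton();  // not instantiable
  static T* instance_;
};

template <typename T>
T* Singleton<T>::instance_ = 0;

// Hypothetical payload type used only for illustration.
struct ExampleVocabulary {
  ExampleVocabulary() : size(0) {}
  int size;
};
```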
Tetsuo Kiso
525f06452c Change the Encoder class to Vocabulary.
- Introduce the namespace to avoid naming collisions. The class name
  is used in KenLM.
- Add the unit test.
2012-03-20 03:43:04 +09:00
Tetsuo Kiso
2b28072f7a Move Encoder class from Scorer.h to Ngram.h.
To add unit tests.
2012-03-19 23:21:02 +09:00
Tetsuo Kiso
f686e8771a Add some functions to BleuScorer for unit testing.
This commit also includes
- Fix typo.
- Fix indentations.
- Add 'const' to Scorer::applyFactors().
2012-03-19 22:45:15 +09:00
Tetsuo Kiso
6b95a19eda Create Reference class to clean up BleuScorer.
- Add a unit test for Reference.
- Move functions to calculate the reference length from
  BleuScorer to Reference.
2012-03-18 05:58:40 +09:00
Tetsuo Kiso
c6536a134b Clean up BleuScorer. 2012-03-14 22:44:51 +09:00
Tetsuo Kiso
5007f129d8 Clean up BleuScorer with lookup(). 2012-03-14 22:41:29 +09:00
Tetsuo Kiso
fba01c7cdf Create a header file for NgramCounts class.
This makes it possible to add a unit test.
2012-03-14 22:14:11 +09:00
Tetsuo Kiso
ed6e6f00b1 Minor change for calculating BLEU.
Avoids defining similar variables twice when calculating
document-wise and sentence-wise BLEU scores.
2012-03-10 02:49:31 +09:00
Matous Machacek
ba987c94ba Support for using factors in mert and evaluator
example:
Use --factor "0|2" to use only first and third factor from nbest list and from reference.
If you use interpolated scorer, separate records with comma (e.g. --factor "0|2,1").
2012-02-28 02:27:23 +01:00
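The effect of a factor selection such as "0|2" can be sketched as follows, assuming tokens carry factors separated by '|' (for example `word|pos|lemma`). The helpers `Split` and `SelectFactors` are hypothetical and do not reproduce the mert implementation. Parsing of the `--factor` argument itself is left out, and the indices to keep are passed in directly.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits text on a single-character delimiter.
std::vector<std::string> Split(const std::string& text, char delim) {
  std::vector<std::string> parts;
  std::string part;
  std::istringstream in(text);
  while (std::getline(in, part, delim)) parts.push_back(part);
  return parts;
}

// Keeps only the requested factor indices of every whitespace-separated token.
// Indices beyond a token's factor count are skipped (a choice of this sketch).
std::string SelectFactors(const std::string& sentence,
                          const std::vector<std::size_t>& keep) {
  std::istringstream words(sentence);
  std::string word, result;
  bool first_word = true;
  while (words >> word) {
    const std::vector<std::string> factors = Split(word, '|');
    std::string selected;
    bool first_factor = true;
    for (std::size_t i = 0; i < keep.size(); ++i) {
      if (keep[i] >= factors.size()) continue;
      if (!first_factor) selected += '|';
      selected += factors[keep[i]];
      first_factor = false;
    }
    if (!first_word) result += ' ';
    result += selected;
    first_word = false;
  }
  return result;
}
```

With indices 0 and 2, the input `the|DT|the cats|NNS|cat` becomes `the|the cats|cat`.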
Tetsuo Kiso
c26e83fd09 Remove obsolete and unused logging statements. 2012-02-26 02:19:40 +09:00
Tetsuo Kiso
224c654fa5 Don't call the same function repeatedly.
Store the result in a constant where possible.
2012-02-26 02:12:59 +09:00
Tetsuo Kiso
669b9d9c7a Minor change to the logging utility for n-gram counts.
Use std::ostream instead of directly using std::cerr.
2012-02-26 02:01:03 +09:00
Tetsuo Kiso
8e0a61d0d7 Clean up calculation of effective reference length. 2012-02-26 01:54:51 +09:00
Tetsuo Kiso
c4fa8a3865 Add a more efficient member to set up ScoreStats.
- Remove unnecessary conversions.

- Add 'const' to local variables.
2012-02-26 01:41:17 +09:00
Tetsuo Kiso
2c2bd63bbd Replace string objects with const char[]. 2012-02-26 01:18:08 +09:00
Tetsuo Kiso
17f06a3250 Hide the implementation details of Ngram counts from the header. 2012-02-26 01:11:56 +09:00
Tetsuo Kiso
0c9023abc6 Clean up commented out code snippets for debugging purposes. 2012-02-25 18:14:00 +09:00
Tetsuo Kiso
17e864e446 Create private class to encapsulate encoding process.
Instead of using typedefs only inside a class,
it might be better to create a private class to do the same thing.
2012-02-01 21:19:25 +09:00
Tetsuo Kiso
b19e7777ce Add prefix 'm_' to private and protected members in Scorer classes. 2012-02-01 20:54:20 +09:00
Tetsuo Kiso
30fa97e404 Move reference length type into a private member of BleuScorer.
The reason is that the type is used for internal purposes only.
2012-02-01 20:24:48 +09:00
Tetsuo Kiso
3ef03a77c4 Change casts to C++ style casts. 2012-02-01 18:13:00 +09:00
Tetsuo Kiso
142342f8be Change casts to C++ style casts, and delete unnecessary casts. 2012-02-01 17:17:58 +09:00
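As a small illustration of the cast cleanup in the last two commits, the sketch below uses `static_cast` where a C-style cast such as `(double)matched` might have appeared before. The function `Ratio` is a hypothetical example, not code from the repository.

```cpp
// Returns matched / total as a floating-point ratio, or 0 when total is 0.
// static_cast states the intended conversion explicitly and is easier to
// search for than a C-style cast.
double Ratio(int matched, int total) {
  if (total == 0) return 0.0;
  return static_cast<double>(matched) / static_cast<double>(total);
}
```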