The accuracy of getrusage() is limited by the resolution
of software clock as described in
http://www.kernel.org/doc/man-pages/online/pages/man7/time.7.html
The assertions required a timer with microsecond accuracy.
However, we don't necessarily want the timer, and we don't
want to add some time-consuming processes to the test code because
we normally build programs again and again, which means
we want to run unit tests as quickly as possible.
example:
Use --factor "0|2" to use only first and third factor from nbest list and from reference.
If you use interpolated scorer, separate records with comma (e.g. --factor "0|2,1").
In the commit 4b6232b757,
I thought I had fixed the bug around the tuning on a subset of
features by checking whether pdim and the length of the
active features which you want to optimize in the tuning.
However, it was wrong. I should set Point::optindices
appropriately according to specified the subset.
- The Scorer and ScoreData objects allocated by the new
operator are now released using the ScopedVector class.
- Add 'virtual' to inherited functions from the Scorer
class.
- Add a high resolution timing function to measure the
wall-clock time by gettimeofday().
- Now the Timer class use getrusage() to measure the elapsed
CPU time as KenLM does.
- Revive Timer::restart().
- Add Timer::ToString() for reporting the detail statistics
as well as for debugging.
- Add a simple unit test for Timer.
example: to interpolate BLEU and CDER use --sctype=BLEU,CDER
to specify weights use --scconfig=weights:0.3+0.7
This scorer should replace MergeScorer (which requires mert-moses-multi.pl) soon.
Interpolated scorer is more universal and is used in the same way as other scorers.
This change might be useful to avoid duplicating the names.
The reason is that although MERT programs are standalone
applications, some header files such as data.h and
point.h have common guard macro names like "DATA_H" and
"POINT_H", and this is not good naming conventions
when you want to include external headers.
Some files actually include headers in Moses and KenLM's util.
We do not allow clients to access the following variables.
Instead, use the APIs which we provide for that.
Also, remove the unused function, and fix smoke tests.
When tokenizing a string delimited by spaces (say, "9 9 8 7 ")
with Tokenize(), resulting a sequence of strings are
{"9", "9", "8", "7", "" }, which is different
from we have expected. We are not interested in empty strings.
This commit fix this issue, and add unit tests for
the tokenize functions.
Replace README with a bunch of shell script
for smoke testing of MERT.
The README file was not a typical README file.
It was like a sample script to run mert and
extractor, so I renamed it as smoke tests stuff.