30 Commits

Author SHA1 Message Date
Jeroen Vermeulen
38d790cac0 Add cross-platform randomizer module.
The code uses two mechanisms for generating random numbers: srand()/rand(),
which is not thread-safe, and srandom()/random(), which is POSIX-specific.

Here I add a util/random.cc module that centralizes these calls, and unifies
some common usage patterns.  If the implementation is not good enough, we can
now change it in a single place.

To keep things simple, this uses the portable srand()/rand() but protects them
with a lock to avoid concurrency problems.

The hard part was to keep the regression tests passing: they rely on fixed
sequences of random numbers, so a small code change could break them very
thoroughly.  Util::rand(), for wide types like size_t, calls std::rand() not
once but twice.  This behaviour was generalized into utils::wide_rand() and
friends.
2015-04-23 23:46:04 +07:00
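
The "call std::rand() twice for wide types" idea from the message above can be sketched roughly as follows. This is an illustration only: the name wide_rand_sketch is hypothetical, and the way the two results are combined is an assumption, since the message does not say how util/random.cc does it.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch, not the code from util/random.cc.
// Builds one wider value out of two consecutive std::rand() results.
inline std::size_t wide_rand_sketch() {
  // Two separate statements, so the order of the two calls is fixed.
  // A fixed seed therefore always produces the same wide value.
  const std::size_t high = static_cast<std::size_t>(std::rand());
  const std::size_t low = static_cast<std::size_t>(std::rand());
  // Shift the first result past the range of the second one.
  // Unsigned arithmetic wraps around on overflow, which is well defined.
  return high * (static_cast<std::size_t>(RAND_MAX) + 1) + low;
}
```

Because each wide value consumes two numbers from the underlying sequence, changing the number of calls would shift every later random number, which is presumably why the regression tests were so sensitive to it.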
Jeroen Vermeulen
75bfb75882 Thread-safe, platform-agnostic randomizer.
Some places in mert use srandom()/random(), but these are POSIX-specific.
The standard alternative, srand()/rand(), is not thread-safe.  This module
wraps srand()/rand() in mutexes (very short-lived, so should not cost much)
so that it relies on just Boost and the C standard library, not on a Unix-like
environment.

This may reduce the width of the random numbers on some platforms: it goes
from "long int" to just "int".  If that is a problem, we may have to use
Boost's randomizer utilities, or eventually, the C++ ones.
2015-04-22 20:43:29 +07:00
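
A minimal sketch of the locking approach described above, assuming C++11. The message says the real module relies on Boost for its mutexes; std::mutex is swapped in here only to keep the example self-contained. The namespace and function names are hypothetical.

```cpp
#include <cstdlib>
#include <mutex>

namespace sketch {  // hypothetical namespace, not the module's real one

// One process-wide lock guarding the hidden state shared by srand()/rand().
inline std::mutex& rand_lock() {
  static std::mutex lock;
  return lock;
}

// Seed the generator while holding the lock.
inline void rand_init(unsigned int seed) {
  std::lock_guard<std::mutex> guard(rand_lock());
  std::srand(seed);
}

// Draw one number while holding the lock. The lock is held only for the
// duration of the std::rand() call, so contention should stay low.
inline int locked_rand() {
  std::lock_guard<std::mutex> guard(rand_lock());
  return std::rand();
}

}  // namespace sketch
```

The lock only helps if every caller goes through the wrappers; a direct std::rand() call elsewhere would still race with them.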
Jeroen Vermeulen
b8793fb788 Address two TODO notes in mert/evaluator.cpp.
The notes were about two objects which were created on the free store
using "new", then cleaned up using "delete".  May have been a Java
habit; the solution was as simple as creating them on the stack.
2015-04-10 13:25:51 +07:00
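
The kind of change described above might look like this. ScoreBuffer and both functions are hypothetical stand-ins, not the objects from mert/evaluator.cpp.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an object that used to live on the free store.
struct ScoreBuffer {
  std::vector<double> values;
  void add(double v) { values.push_back(v); }
  double sum() const {
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) total += values[i];
    return total;
  }
};

// Before: manual new/delete. The object leaks if anything throws
// between the allocation and the delete.
inline double total_heap(const std::vector<double>& in) {
  ScoreBuffer* buf = new ScoreBuffer;
  for (std::size_t i = 0; i < in.size(); ++i) buf->add(in[i]);
  const double result = buf->sum();
  delete buf;
  return result;
}

// After: automatic storage. The destructor runs on every exit path.
inline double total_stack(const std::vector<double>& in) {
  ScoreBuffer buf;
  for (std::size_t i = 0; i < in.size(); ++i) buf.add(in[i]);
  return buf.sum();
}
```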
Jeroen Vermeulen
8a3ae2fd5c Portability and include fixes.
Add <cstdlib> include for srand()/rand(), and <unistd.h> for open() etc.
Include <unistd.h> on Windows if using MinGW.  Disable MeteorScorer on
Windows, since it doesn't have fork() and pipe().
2015-04-10 12:54:34 +07:00
Jeroen Vermeulen
536c6e375f Modernize "C" includes in mert.
This is one of those little chores in managing a long-lived C++
project: standard C headers like stdio.h and math.h now have their own
place in the C++ standard as cstdio and cmath respectively, and so on.
In this branch the #include names are updated for the mert/
subdirectory; more branches to follow.

C++11 adds cstdint, but to support compilation with the previous
standard, that change is left for later.
2015-03-28 20:20:58 +07:00
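
As an illustration of the header change described above (the functions are hypothetical and not taken from mert/), the C++ header names are used together with std::-qualified calls:

```cpp
#include <cmath>    // instead of <math.h>
#include <cstdio>   // instead of <stdio.h>
#include <cstdlib>  // instead of <stdlib.h>

// Parse a number from text and return its square root.
inline double root_of(const char* text) {
  const double value = std::strtod(text, NULL);
  return std::sqrt(value);
}

// Print the result using the C I/O functions from <cstdio>.
inline void print_root(const char* text) {
  std::printf("sqrt(%s) = %f\n", text, root_of(text));
}
```

The c-prefixed headers guarantee that the names are declared in namespace std, which is why the calls are written with the std:: prefix.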
Hieu Hoang
05ead45e71 beautify 2015-01-14 11:07:42 +00:00
Rico Sennrich
84ad576750 explicitly set BLEU as default scorer (for return-best-dev)
(evaluator doesn't accept --scconfig without --sctype)
2014-09-24 14:47:58 +01:00
Rico Sennrich
d39cbca0b9 (optionally) use n-best file for evaluator/return-best-dev
this adds support for metrics that rely on alignment / trees
2014-09-22 10:49:20 +01:00
jiejiang
744376b3fb moses windows build, with some TODO list 2013-12-18 20:15:39 +00:00
Hieu Hoang
6249432407 beautify 2013-05-29 18:16:15 +01:00
Tetsuo Kiso
38e145e556 Use util::TokenIter to tokenize n-best lists.
Reduce creating std::string objects, too. In both ScoreArray
and FeatureArray classes, the private members to track sentence
indices (namely, "m_index") were unnecessarily declared as
std::string, but it's better to directly declare them as 'int'.
2012-12-07 01:39:22 +09:00
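
A rough sketch of the two ideas in the message above: tokenizing without creating a std::string per field, and keeping the sentence index as an int. This does not use util::TokenIter; it is a plain standard-library stand-in. The " ||| " field separator and all names here are assumptions made for illustration.

```cpp
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// (offset, length) of one field inside the original line; no copy is made.
typedef std::pair<std::string::size_type, std::string::size_type> Span;

// Locate every field of `line`, where fields are separated by `delim`.
inline std::vector<Span> field_spans(const std::string& line,
                                     const std::string& delim) {
  std::vector<Span> spans;
  std::string::size_type start = 0;
  for (;;) {
    const std::string::size_type pos = line.find(delim, start);
    if (pos == std::string::npos) {
      spans.push_back(Span(start, line.size() - start));
      break;
    }
    spans.push_back(Span(start, pos - start));
    start = pos + delim.size();
  }
  return spans;
}

// Read the sentence index (assumed to be the first field) directly as an int.
inline int sentence_index(const std::string& line) {
  const std::vector<Span> spans = field_spans(line, " ||| ");
  // strtol stops at the first character that is not part of the number.
  return static_cast<int>(std::strtol(line.c_str() + spans[0].first, NULL, 10));
}
```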
Hieu Hoang
121e258e84 namespace all classes in mert directory 2012-06-30 21:39:10 +01:00
Matous Machacek
440650bd6e Added support for external Unix filters to preprocess sentences in mert and evaluator 2012-05-09 19:21:41 +02:00
Tetsuo Kiso
b5bcf48b17 Pass by pointers to Scorer instead of references. 2012-03-10 17:28:38 +09:00
Matous Machacek
ba987c94ba Support for using factors in mert and evaluator
example:
Use --factor "0|2" to use only the first and third factors from the n-best list and from the reference.
If you use an interpolated scorer, separate the records with commas (e.g. --factor "0|2,1").
2012-02-28 02:27:23 +01:00
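
What selecting factors 0 and 2 could mean for a single token is sketched below. It assumes that the factors of a token are separated by '|', mirroring the "0|2" notation of the option; the function names are hypothetical and this is not the code from mert.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split `text` on every occurrence of `sep`.
inline std::vector<std::string> split_on(const std::string& text, char sep) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  for (;;) {
    const std::string::size_type pos = text.find(sep, start);
    if (pos == std::string::npos) {
      parts.push_back(text.substr(start));
      break;
    }
    parts.push_back(text.substr(start, pos - start));
    start = pos + 1;
  }
  return parts;
}

// Keep only the factors whose zero-based indices are listed in `keep`.
inline std::string select_factors(const std::string& token,
                                  const std::vector<std::size_t>& keep) {
  const std::vector<std::string> factors = split_on(token, '|');
  std::string out;
  bool first = true;
  for (std::size_t i = 0; i < keep.size(); ++i) {
    if (keep[i] >= factors.size()) continue;  // token lacks this factor
    if (!first) out += '|';
    out += factors[keep[i]];
    first = false;
  }
  return out;
}
```

Under these assumptions, selecting indices 0 and 2 from the token house|NN|haus leaves house|haus.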
Matous Machacek
e3f0280f27 Change of evaluator usage (see mert/evaluator --help). 2012-02-26 23:04:02 +01:00
Tetsuo Kiso
cad03f7a03 Create a struct for command line options in extractor. 2012-02-01 12:23:15 +09:00
Tetsuo Kiso
4d3fd9fd4b Create a wrapper function to init seed.
Move g_bootstrap from a global variable to
a member of struct ProgramOption.
2012-02-01 11:49:26 +09:00
Tetsuo Kiso
1452f88ed5 Create a struct for command line options.
Add a wrapper function to parse the options, too.
2012-02-01 11:27:17 +09:00
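
A minimal sketch of an options struct with a parsing wrapper, in the spirit of this commit. Only the name ProgramOption and the bootstrap member are mentioned in this log; the other members, the ParseOptions name, and the hand-written argument loop are assumptions. The flag names are the long options from the evaluator usage text.

```cpp
#include <cstdlib>
#include <string>

// All command-line settings in one place instead of separate globals.
struct ProgramOption {
  std::string scorer_type;
  std::string reference;
  std::string candidate;
  int bootstrap;      // number of bootstrap samples, 0 means none
  bool has_seed;      // false: the caller falls back to the system clock
  unsigned int seed;

  ProgramOption()
      : scorer_type("BLEU"), bootstrap(0), has_seed(false), seed(0) {}
};

// Hypothetical wrapper: fills `opt` from the arguments.
// Returns false on an unknown flag or a flag without a value.
inline bool ParseOptions(int argc, const char* const* argv, ProgramOption* opt) {
  for (int i = 1; i < argc; ++i) {
    const std::string flag = argv[i];
    if (i + 1 >= argc) return false;  // every flag handled here takes a value
    const char* value = argv[++i];
    if (flag == "--sctype") {
      opt->scorer_type = value;
    } else if (flag == "--reference") {
      opt->reference = value;
    } else if (flag == "--candidate") {
      opt->candidate = value;
    } else if (flag == "--bootstrap") {
      opt->bootstrap = std::atoi(value);
    } else if (flag == "--rseed") {
      opt->has_seed = true;
      opt->seed = static_cast<unsigned int>(std::strtoul(value, NULL, 10));
    } else {
      return false;
    }
  }
  return true;
}
```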
Tetsuo Kiso
037af96a6e Create a utility class for mert/evaluator.cpp to avoid name collisions, just in case.
Also introduce an anonymous namespace for the class and the global
variables.
2012-01-27 04:06:36 +09:00
Tetsuo Kiso
940dadaa4c Add whitespaces. 2012-01-27 03:39:13 +09:00
Tetsuo Kiso
f9eac588e7 Add prefix 'g_' to global variables in mert/evaluator.cpp
While the size of mert/evaluator.cpp is still relatively small,
adding the marker to the variables allows us to easily distinguish
them from local variables.
2012-01-27 03:24:51 +09:00
Matous Machacek
5254e7917b mert/evaluator should now compute confidence interval correctly 2012-01-24 21:25:15 +01:00
Matous Machacek
6cbdfc513b fixed bugs in mert/evaluator, nicer printing of results 2012-01-24 19:18:44 +01:00
Matous Machacek
b4a50ec50b mert/evaluator can compute more metrics at once 2012-01-22 01:01:08 +01:00
Tetsuo Kiso
29c16d252a Minimize #include directives in headers.
Include what is needed in the .cpp files instead.
2011-11-14 15:15:30 +09:00
Tetsuo Kiso
3d70b2e1a5 Small change: modify initialization of the Data class. 2011-11-12 22:04:22 +09:00
Tetsuo Kiso
664ffe0130 Fix indentation. 2011-11-12 09:24:19 +09:00
Tetsuo Kiso
c2121695c2 Fix memory leaks in mert. 2011-11-11 20:40:59 +09:00
machacekmatous
642e8dce95 Added evaluator to MERT directory. This tool computes a metric score for given candidate and reference files:
evaluator --sctype PER --reference ref.file --candidate cand.file

usage: evaluator [options] --reference ref1[,ref2[,ref3...]] --candidate cand1[,cand2[,cand3...]]
[--sctype|-s] the scorer type (default BLEU)
[--scconfig|-c] configuration string passed to scorer
        This is of the form NAME1:VAL1,NAME2:VAL2 etc
[--reference|-R] comma separated list of reference files
[--candidate|-C] comma separated list of candidate files
[--bootstrap|-b] number of bootstrapped samples (default 0 - no bootstrapping)
[--rseed|-r] the random seed for bootstraping (defaults to system clock)
[--help|-h] print this message and exit


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@4153 1f5c12ca-751b-0410-a591-d2e778427230
2011-08-20 15:25:19 +00:00
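
To illustrate what the --bootstrap and --rseed options are about, here is a generic sketch of bootstrap resampling. It is not the evaluator's code. It assumes a metric that is a simple mean of per-sentence scores (corpus-level metrics such as BLEU are not computed this way) and a 95% percentile interval; all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Returns an approximate 95% interval (lower, upper) for the mean score.
inline std::pair<double, double> bootstrap_interval(
    const std::vector<double>& scores, int samples, unsigned int seed) {
  if (scores.empty() || samples <= 0) return std::make_pair(0.0, 0.0);

  std::srand(seed);  // fixed seed gives a reproducible resampling sequence
  const std::size_t n = scores.size();
  std::vector<double> means;
  means.reserve(static_cast<std::size_t>(samples));

  for (int s = 0; s < samples; ++s) {
    double sum = 0.0;
    // Draw n sentences with replacement. The modulo introduces a slight
    // bias, which is acceptable for a sketch.
    for (std::size_t i = 0; i < n; ++i) {
      sum += scores[static_cast<std::size_t>(std::rand()) % n];
    }
    means.push_back(sum / static_cast<double>(n));
  }

  // Take the 2.5th and 97.5th percentiles of the resampled means.
  std::sort(means.begin(), means.end());
  const std::size_t lo = static_cast<std::size_t>(0.025 * (samples - 1));
  const std::size_t hi = static_cast<std::size_t>(0.975 * (samples - 1));
  return std::make_pair(means[lo], means[hi]);
}
```

With a fixed seed the interval is reproducible, which matches the purpose of an explicit random seed option.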