The code uses two mechanisms for generating random numbers: srand()/rand(),
which is not thread-safe, and srandom()/random(), which is POSIX-specific.
Here I add a util/random.cc module that centralizes these calls and unifies
some common usage patterns. If the implementation is not good enough, we can
now change it in a single place.
To keep things simple, this uses the portable srand()/rand() but protects them
with a lock to avoid concurrency problems.
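A minimal sketch of that locking scheme follows. It assumes one process-wide mutex guards every srand()/rand() call. The commit itself builds on Boost, but this sketch substitutes C++11 std::mutex. The names locked_srand and locked_rand are hypothetical and are not the util/random.cc API.

```cpp
#include <cstdlib>
#include <mutex>

namespace sketch {

// One mutex shared by all callers; a function-local static avoids
// static-initialization-order problems.
inline std::mutex& rand_mutex() {
  static std::mutex m;
  return m;
}

// Seed the C library generator while holding the lock.
inline void locked_srand(unsigned int seed) {
  std::lock_guard<std::mutex> guard(rand_mutex());
  std::srand(seed);
}

// Draw one value in [0, RAND_MAX]; the lock is held only for this call.
inline int locked_rand() {
  std::lock_guard<std::mutex> guard(rand_mutex());
  return std::rand();
}

}  // namespace sketch
```

Because each lock is held for a single srand() or rand() call, contention stays low. However, threads still share one sequence, so the values any particular thread sees depend on scheduling.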
The hard part was to keep the regression tests passing: they rely on fixed
sequences of random numbers, so a small code change could break them very
thoroughly. Util::rand(), for wide types like size_t, calls std::rand() not
once but twice. This behaviour was generalized into utils::wide_rand() and
friends.
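Here is a sketch of how two std::rand() draws can fill a wider unsigned type. It assumes the first draw is scaled by RAND_MAX + 1 and the second is added. This is one plausible combination, not necessarily the arithmetic utils::wide_rand() actually uses. The name sketch::wide_rand is hypothetical.

```cpp
#include <cstdlib>
#include <type_traits>

namespace sketch {

// Combine two std::rand() draws into one wider unsigned value.
template <typename T>
T wide_rand() {
  static_assert(std::is_unsigned<T>::value, "wide_rand needs an unsigned type");
  // Narrower types would be promoted to int, risking signed overflow.
  static_assert(sizeof(T) >= sizeof(unsigned int), "T too narrow");
  // Draw into separate variables: the evaluation order of two rand()
  // calls inside one expression is unspecified, so a one-liner could
  // yield compiler-dependent sequences and break fixed-sequence tests.
  const T high = static_cast<T>(std::rand());
  const T low = static_cast<T>(std::rand());
  // Scale the first draw past the range of the second. If T cannot hold
  // the product, unsigned arithmetic wraps modulo 2^N (well defined).
  return high * (static_cast<T>(RAND_MAX) + 1) + low;
}

}  // namespace sketch
```

The order of the two calls is exactly the kind of detail that keeps regression tests stable. Any rewrite must consume the same draws in the same order.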
Some places in mert use srandom()/random(), but these are POSIX-specific.
The standard alternative, srand()/rand(), is not thread-safe. This module
wraps srand()/rand() in mutexes (each lock is held only briefly, so it should
not cost much) so that the module relies only on Boost and the C standard
library rather than on a Unix-like environment.
This may reduce the width of the random numbers on some platforms: it goes
from "long int" to just "int". If that is a problem, we may have to use
Boost's random-number utilities or, eventually, the standard C++ ones.
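If the standard C++ facilities were adopted, one possible shape is shown below. It assumes C++11 `<random>` and `thread_local`, and the names rand_init and rand_below are hypothetical. Giving each thread its own engine removes the need for a lock. Switching engines would change every sequence the regression tests depend on. Also, std::uniform_int_distribution may produce different values across standard library implementations, although the raw output of std::mt19937_64 is fully specified.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

namespace sketch {

// One engine per thread: threads never share state, so no mutex is needed.
inline std::mt19937_64& engine() {
  thread_local std::mt19937_64 eng(5489u);  // fixed default seed
  return eng;
}

// Reseed the calling thread's engine only.
inline void rand_init(std::uint64_t seed) { engine().seed(seed); }

// Uniform integer in [0, n); requires n > 0.
inline std::uint64_t rand_below(std::uint64_t n) {
  assert(n > 0);
  std::uniform_int_distribution<std::uint64_t> dist(0, n - 1);
  return dist(engine());
}

}  // namespace sketch
```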
Also reduce the creation of std::string objects. In both the ScoreArray
and FeatureArray classes, the private members that track sentence
indices (namely, "m_index") were unnecessarily declared as
std::string; it is better to declare them directly as 'int'.
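The sketch below shows the index stored as a plain int, so reading or updating it involves no string allocation or numeric conversion. It uses a stripped-down, hypothetical class (IndexedArraySketch, with getIndex/setIndex accessors that are assumptions). The real ScoreArray and FeatureArray classes have more members.

```cpp
// Hypothetical stand-in for the index-tracking part of ScoreArray/FeatureArray.
class IndexedArraySketch {
 public:
  explicit IndexedArraySketch(int index) : m_index(index) {}

  int getIndex() const { return m_index; }
  void setIndex(int index) { m_index = index; }

 private:
  // Sentence index kept as an int; a std::string here would need
  // conversion whenever the index is used numerically.
  int m_index;
};
```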
This commit fixes compilation problems related to
snprintf() for Windows users.
Thanks to Raka Prasetya for reporting the errors.
Thanks also to Kenneth Heafield and Barry Haddow for suggestions.
- Use the function in Data::InitFeatureMap().
- Add a unit test for InitFeatureMap().
- Move helper functions for Data::loadnbest() to public for unit testing.
Squashed commit of the following:
- Clean up PRO.
- Clean up ScoreStats.
- Clean up ScoreData.
- Clean up ScoreArray.
- Remove unnecessary headers.
- Clean up ScopedVector.
- Clean up Point.
- Clean up PerScorer.
- Clean up Optimizer.
- Clean up MergeScorer.
- Clean up InterpolatedScorer.
- Clean up FileStream.
- Clean up FeatureStats.
- Remove inefficient string concatenation.
- Clean up FeatureData.
- Clean up FeatureArray.
- Clean up Data.