The case where !m_fixed passed m_map_size to mmap(), but the "else"
clause passed map_size. In replacing mmap() with the portable wrapper,
I accidentally changed that to be m_map_size as well.
Besides fixing that, I'm changing the name of the variable to be more
clearly distinguishable from m_map_size.
Some FF (Mmsapt, LexicalReordering, Many single-value FF) provide this number during "registration";
when missing, a default weight vector of uniform 1.0 is automatically generated. This eliminates the
need for the user to figure out what the exact number of features is for each FF, which can get complicated,
e.g. in the case of Mmsapt/PhraseDictionaryBitextSampling.
The code uses two mechanisms for generating random numbers: srand()/rand(),
which is not thread-safe, and srandom()/random(), which is POSIX-specific.
Here I add a util/random.cc module that centralizes these calls, and unifies
some common usage patterns. If the implementation is not good enough, we can
now change it in a single place.
To keep things simple, this uses the portable srand()/rand() but protects them
with a lock to avoid concurrency problems.
The hard part was to keep the regression tests passing: they rely on fixed
sequences of random numbers, so a small code change could break them very
thoroughly. Util::rand(), for wide types like size_t, calls std::rand() not
once but twice. This behaviour was generalized into utils::wide_rand() and
friends.
Windows does not have popen()/pclose(), so FileHandler.cpp #define's them to
_popen()/_pclose(). But MinGW has similar macros built into <cstdio>, leading
to warnings. So skip the workaround on MinGW.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.
In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions. In theory this may help performance, but it's mainly for clarity.
The comments are based on reverse-engineering, and the unit tests are based
on the comments. It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!
I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.