Some FFs (Mmsapt, LexicalReordering, many single-value FFs) provide this number during "registration";
when the weights are missing, a default weight vector of uniform 1.0 is generated automatically. This
eliminates the need for the user to figure out the exact number of features for each FF, which can get
complicated, e.g. in the case of Mmsapt/PhraseDictionaryBitextSampling.
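
A minimal sketch of the fallback described above, not the actual Moses code. WeightTable and WeightsOrDefault are made-up names, and num_features stands for whatever number the FF reported at registration:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the configured weights, keyed by FF name.
typedef std::map<std::string, std::vector<float> > WeightTable;

// Return the weights configured for `name`. If none were given, generate a
// uniform vector of 1.0 with one entry per feature the FF registered.
std::vector<float> WeightsOrDefault(const WeightTable& configured,
                                    const std::string& name,
                                    std::size_t num_features) {
  WeightTable::const_iterator it = configured.find(name);
  if (it != configured.end()) return it->second;
  return std::vector<float>(num_features, 1.0f);
}
```

With this shape the user only has to list weights for the FFs they actually want to tune by hand.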
The code uses two mechanisms for generating random numbers: srand()/rand(),
which is not thread-safe, and srandom()/random(), which is POSIX-specific.
Here I add a util/random.cc module that centralizes these calls, and unifies
some common usage patterns. If the implementation is not good enough, we can
now change it in a single place.
To keep things simple, this uses the portable srand()/rand() but protects them
with a lock to avoid concurrency problems.
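
Roughly, the idea looks like the sketch below. It is illustrative only: the names in namespace sketch are invented, and it assumes C++11 for std::mutex, whereas the real module may well use a different lock type.

```cpp
#include <cstdlib>
#include <mutex>

namespace sketch {

// One global lock guarding the hidden state behind std::srand()/std::rand().
inline std::mutex& rand_lock() {
  static std::mutex m;
  return m;
}

// Seed the generator under the lock.
inline void rand_init(unsigned int seed) {
  std::lock_guard<std::mutex> guard(rand_lock());
  std::srand(seed);
}

// Draw one value in [0, RAND_MAX] under the lock.
inline int rand_int() {
  std::lock_guard<std::mutex> guard(rand_lock());
  return std::rand();
}

// Common usage pattern: a value in [0, top), for top > 0.
inline int rand_excl(int top) { return rand_int() % top; }

}  // namespace sketch
```

Since every caller goes through these few functions, swapping in a better generator later means touching one file.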
The hard part was to keep the regression tests passing: they rely on fixed
sequences of random numbers, so a small code change could break them very
thoroughly. Util::rand(), for wide types like size_t, calls std::rand() not
once but twice. This behaviour was generalized into utils::wide_rand() and
friends.
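
To illustrate the "twice" part, here is a hedged sketch of building one wide value from two std::rand() draws. wide_rand_sketch is a made-up name, and the way the two draws are combined is an assumption, not necessarily what the real code does:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: combine two std::rand() draws into one std::size_t.
// The draws are taken in separate statements so their order is well defined,
// which matters when tests depend on a fixed sequence of random numbers.
inline std::size_t wide_rand_sketch() {
  const std::size_t high = static_cast<std::size_t>(std::rand());
  const std::size_t low = static_cast<std::size_t>(std::rand());
  return high * (static_cast<std::size_t>(RAND_MAX) + 1) + low;
}
```

The point for the regression tests is that each call consumes exactly two values from the sequence, so anything that changes that count shifts every later draw.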
Compiling with clang++ at the default warning/error levels produces
some interesting warnings. Here's a pair of fixes for the simplest
instances:
moses/TranslationModel/RuleTable/PhraseDictionaryFuzzyMatch.cpp:133:7:
warning: comparison of array 'path' equal to a null pointer is always
false [-Wtautological-pointer-compare]
  if (path == NULL) {
      ^~~~    ~~~~
(The code unnecessarily checks that an automatic variable has a
non-null address).
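
For anyone who has not run into this warning, a made-up example of the same pattern (CopyPath and the 16-byte buffer are invented for illustration, this is not the Moses code):

```cpp
#include <cstddef>
#include <cstring>

// Copy `src` into a fixed-size automatic array and report whether a
// non-empty path ended up in it.
bool CopyPath(const char* src) {
  char path[16];
  // if (path == NULL) return false;  // always false: `path` is an array
  if (src == NULL) return false;       // the pointer is what can be null
  if (std::strlen(src) >= sizeof(path)) return false;
  std::strcpy(path, src);
  return path[0] != '\0';              // test the contents, not the address
}
```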
moses/TranslationModel/DynSAInclude/onlineRLM.h:305:20:
warning: unsequenced modification and access to 'den_val' [-Wunsequenced]
      if(((den_val = query(&ngram[len - num_fnd], num_fnd - 1)) > 0) &&
                   ^
(The code tries to cram too much into an "if" condition.)
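
The general shape of the fix is to do the assignment in its own statement. A made-up example with the same class of problem (Query and RatioAboveOne are hypothetical, not the onlineRLM.h code):

```cpp
// Hypothetical stand-in for a lookup that returns a count.
inline int Query(int key) { return key * 2; }

// Problematic shape, shown only as a comment:
//   int r = (den = Query(key)) + den;  // den is modified and read, unsequenced
//
// Safer shape: assign first, then use the value in the condition.
inline bool RatioAboveOne(int key, int numerator) {
  const int den = Query(key);
  if (den > 0 && numerator > den) return true;
  return false;
}
```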
Mostly signed/unsigned comparisons and reordered member
initializations; also a few unused variables.
There are more, but if I chip away at them for a while, who knows, it
may catch on and warnings may eventually become socially stigmatizing.
:)
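
For reference, a small invented example of the two most common fixes. Span and CountPositive are made up, and the warning names are the usual clang/gcc ones:

```cpp
#include <cstddef>
#include <vector>

class Span {
 public:
  // Initializers are listed in declaration order (m_start, then m_end),
  // which is the order they run in anyway; this avoids -Wreorder.
  Span(std::size_t start, std::size_t len)
      : m_start(start), m_end(start + len) {}
  std::size_t size() const { return m_end - m_start; }

 private:
  std::size_t m_start;
  std::size_t m_end;
};

inline int CountPositive(const std::vector<int>& v) {
  int n = 0;
  // A std::size_t index avoids -Wsign-compare against v.size().
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (v[i] > 0) ++n;
  }
  return n;
}
```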
This is one of those little chores in managing a long-lived C++
project: standard C headers like stdio.h and math.h now have their own
place in the C++ standard as cstdio, cmath, and so on, respectively.
In this branch the #include names are updated for the moses/
subdirectory; more branches to follow.
C++11 adds cstdint, but to support compilation with the previous
standard, that change is left for later.
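
In practice the change looks like this small sketch (FormatRoot is a made-up function, and the caller is assumed to pass a large enough buffer):

```cpp
#include <cmath>    // was: #include <math.h>
#include <cstdio>   // was: #include <stdio.h>
#include <cstring>  // was: #include <string.h>

// With the c-prefixed headers the functions are available in namespace std.
// Writes sqrt(x) with one decimal into buf and returns the length written.
inline int FormatRoot(double x, char* buf) {
  return std::sprintf(buf, "%.1f", std::sqrt(x));
}
```

Only std::sprintf is used here because std::snprintf, like cstdint, only arrives with C++11.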
Phrase.CreateFromString() and Sentence.CreateFromString(),
as it was never used in those functions anyway ---
Word.CreateFromString() retrieves the factor delimiter
from StaticData directly.
The speed-up of decoding depends on how much time is spent in the parser:
a 10-50% speed-up was observed for string-to-tree systems (more on long sentences and with a high max-chart-span).
If you only use hiero or string-to-tree models (but none with source syntax), use the compile option --unlabelled-source for (small) efficiency gains.