This works around a problem when building against MinGW and then running
the resulting Windows binary on WINE. (Perverse, I know.) For some
reason the ftruncate() to 0 bytes succeeds, but the subsequent one to a
larger size fails. Even if the size is just 1 byte.
This happened where GenericModel::InitializeFromARPA called
BinaryFormat::SetupJustVocab, which called MapZeroedWrite, which calls
ResizeOrThrow twice; the second one failed.
* file_piece.cc used isnan() instead of std::isnan().
* Fdstream.h used close() but Windows doesn't have unistd.h.
Fixed Fdstream.h by using util::scoped_fd. Thanks Ken.
The code uses two mechanisms for generating random numbers: srand()/rand(),
which is not thread-safe, and srandom()/random(), which is POSIX-specific.
Here I add a util/random.cc module that centralizes these calls, and unifies
some common usage patterns. If the implementation is not good enough, we can
now change it in a single place.
To keep things simple, this uses the portable srand()/rand() but protects them
with a lock to avoid concurrency problems.
The hard part was to keep the regression tests passing: they rely on fixed
sequences of random numbers, so a small code change could break them very
thoroughly. Util::rand(), for wide types like size_t, calls std::rand() not
once but twice. This behaviour was generalized into utils::wide_rand() and
friends.
Some places in mert use srandom()/random(), but these are POSIX-specific.
The standard alternative, srand()/rand(), is not thread-safe. This module
wraps srand()/rand() in mutexes (very short-lived, so should not cost much)
so that it relies on just Boost and the C standard library, not on a Unix-like
environment.
This may reduce the width of the random numbers on some platforms: it goes
from "long int" to just "int". If that is a problem, we may have to use
Boost's randomizer utilities, or eventually, the C++ ones.
This makes temp_file and temp_dir work both on POSIX-like platforms and on
Windows.
It also fixes a bug where the temporary files/directories were created in
the current working directory, instead of in the system's standard
location for temporary files. Unfortunately the Windows and POSIX code
diverge quite a bit on that point.
I'm adding these because boost::filesystem::unique_path introduces
encoding issues: on Windows the path is in wchar_t, breaking use of
those strings in various places! Encoding the strings is just too
much work.
It's still possible that the current temp_file implementation won't
build on Windows (it uses POSIX mkstemp() and close()) but that can
be fixed underneath the API.
The MmapAllocator header made use of sys/mman.h and mmap(), which are
Unix-specific. But util has a wrapper which also works on Windows.
This also fixes the error handling: when mmap() failed, the old code would
return an invalid (but non-NULL!) pointer — leading to a crash. The wrapper
will throw an exception with a helpful error message.
Add <cstdlib> include for srand()/rand(), and <unistd.h> for open() etc.
Include <unistd.h> on Windows if using MinGW. Disable MeteorScorer on
Windows, since it doesn't have fork() and pipe().
This is one of those little chores in managing a long-lived C++
project: standard C headers like stdio.h and math.h now have their own
place in the C++ standard as resp. cstdio, cmath, and so on. In this
branch the #include names are updated for the util/ subdirectory; more
branches to follow.
C++11 adds cstdint, but to support compilation with the previous
standard, that change is left for later.
Avoid unspecified behavior of mmap when a file is resized reported by Christian Hardmeier
Fixes for Mavericks and a workaround for Boost's broken semaphore
Clean clang compile (of kenlm)
Merged some of 744376b3fb but also undid some of it because it was just masking a fundaemntal problem with pread rather than working around windows limitations