Commit Graph

4563 Commits

Author SHA1 Message Date
Ulrich Germann
315610c02a Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-27 16:39:40 +01:00
Ulrich Germann
ba6e17ce26 Code reorganization. 2015-04-27 16:31:22 +01:00
Ulrich Germann
37bb1de9ed Unused variable. 2015-04-27 16:30:59 +01:00
Ulrich Germann
fbf8b1f8b8 Code design debizarrification: Indexes of feature functions into the dense vector of all feature
values are now stored on the feature function instead of in a global map that is a static
member of ScoreComponentCollection.
2015-04-26 16:46:36 +01:00
Ulrich Germann
e63561ae7f Unused variable. 2015-04-26 15:41:32 +01:00
Hieu Hoang
41529227b2 boost unique lock 2015-04-26 18:11:11 +04:00
Ulrich Germann
bafe60c3a1 Make sure things work when curl-based biasing is disabled. 2015-04-26 03:14:40 +01:00
Ulrich Germann
0d72cdd72c Merge branch 'master' of https://github.com/moses-smt/mosesdecoder into mmt-dev
Conflicts:
	moses/Syntax/F2S/Manager-inl.h
	moses/TranslationModel/UG/mmsapt.cpp
2015-04-26 02:12:16 +01:00
Matthias Huck
55a4789a8b cleanup 2015-04-23 18:38:01 +01:00
Matthias Huck
3920e22c98 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-23 18:25:13 +01:00
Jeroen Vermeulen
8ac91c8d97 Fix unqualified call to rand_excl().
The call needed to be made explicitly to util::rand_excl().  Sorry.
2015-04-24 00:22:25 +07:00
Matthias Huck
bbcc8bf23b Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-23 18:14:40 +01:00
Matthias Huck
f24f31f965 n-best list creation in phrase-based decoding: improved efficiency with sparse features 2015-04-23 18:13:02 +01:00
Jeroen Vermeulen
38d790cac0 Add cross-platform randomizer module.
The code uses two mechanisms for generating random numbers: srand()/rand(),
which is not thread-safe, and srandom()/random(), which is POSIX-specific.

Here I add a util/random.cc module that centralizes these calls, and unifies
some common usage patterns.  If the implementation is not good enough, we can
now change it in a single place.

To keep things simple, this uses the portable srand()/rand() but protects them
with a lock to avoid concurrency problems.

The hard part was to keep the regression tests passing: they rely on fixed
sequences of random numbers, so a small code change could break them very
thoroughly.  util::rand(), for wide types like size_t, calls std::rand() not
once but twice.  This behaviour was generalized into util::wide_rand() and
friends.
2015-04-23 23:46:04 +07:00
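The locking approach described in this commit message can be sketched roughly as follows. This is an illustration, not the contents of util/random.cc: the `sketch` namespace, the use of `std::mutex` (C++11), and the way the two `std::rand()` draws are combined in `wide_rand()` are all assumptions.

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>

namespace sketch {  // hypothetical namespace, not the real util module

// One process-wide lock guards every call into srand()/rand().
inline std::mutex &rand_lock() {
  static std::mutex lock;
  return lock;
}

// Seed the generator; a fixed seed gives a fixed sequence on a given platform.
inline void rand_init(unsigned int seed) {
  std::lock_guard<std::mutex> guard(rand_lock());
  std::srand(seed);
}

// Return a value in [0, top), i.e. excluding top.
inline int rand_excl(int top) {
  assert(top > 0);
  std::lock_guard<std::mutex> guard(rand_lock());
  return std::rand() % top;
}

// Wide variant: draws from std::rand() twice under a single lock.
// How the two draws are combined here is an assumption for illustration.
inline unsigned long long wide_rand() {
  std::lock_guard<std::mutex> guard(rand_lock());
  const unsigned long long high = static_cast<unsigned long long>(std::rand());
  const unsigned long long low = static_cast<unsigned long long>(std::rand());
  return high * (static_cast<unsigned long long>(RAND_MAX) + 1ULL) + low;
}

}  // namespace sketch
```

Because every draw goes through the same functions, re-seeding with the same value reproduces the same sequence in a single-threaded run, which is the property the regression tests mentioned above depend on.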
Matthias Huck
7457099f51 SparseReordering: option to use pre-tuned feature weights internally 2015-04-23 17:25:02 +01:00
Jeroen Vermeulen
02d1d9a4af Don't work around missing popen() in MinGW.
Windows does not have popen()/pclose(), so FileHandler.cpp #define's them to
_popen()/_pclose().  But MinGW has similar macros built into <cstdio>, leading
to warnings.  So skip the workaround on MinGW.
2015-04-22 11:24:32 +07:00
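A guard of the kind this commit message describes might look like the sketch below. The exact condition used in FileHandler.cpp is not shown in the log, so the test on `_WIN32` and `__MINGW32__` is an assumption, and `first_line_of()` is a hypothetical helper added only to exercise the calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Assumed guard: define the aliases on Windows, except under MinGW, whose
// headers are described above as already providing similar macros.
#if defined(_WIN32) && !defined(__MINGW32__)
#define popen _popen
#define pclose _pclose
#endif

namespace sketch {  // hypothetical namespace

// Run a command and return the first line of its output (empty on failure).
inline std::string first_line_of(const char *command) {
  assert(command != NULL);
  FILE *pipe = popen(command, "r");
  if (pipe == NULL) return std::string();
  char buffer[256];
  std::string line;
  if (std::fgets(buffer, sizeof buffer, pipe) != NULL) line = buffer;
  pclose(pipe);
  return line;
}

}  // namespace sketch
```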
Jeroen Vermeulen
32722ab5b1 Support tokenize(const std::string &) as well.
Convenience wrapper: the actual function takes a const char[], but many of
the call sites want to pass a string and have to call its c_str() first.
2015-04-22 10:35:18 +07:00
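The convenience wrapper amounts to a one-line overload that forwards to the `const char[]` version. In the sketch below only that forwarding pattern is taken from the commit message; the whitespace-splitting body and the `sketch` namespace are assumptions standing in for the real tokenize().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace

// Stand-in for the underlying function: split on spaces and tabs.
inline std::vector<std::string> tokenize(const char input[]) {
  assert(input != NULL);
  std::vector<std::string> tokens;
  const char *p = input;
  while (*p != '\0') {
    while (*p == ' ' || *p == '\t') ++p;  // skip leading separators
    const char *start = p;
    while (*p != '\0' && *p != ' ' && *p != '\t') ++p;  // scan one token
    if (p != start) tokens.push_back(std::string(start, p));
  }
  return tokens;
}

// Convenience overload: callers holding a std::string no longer need c_str().
inline std::vector<std::string> tokenize(const std::string &input) {
  return tokenize(input.c_str());
}

}  // namespace sketch
```

A call site can then write `const std::vector<std::string> words = sketch::tokenize(line);` with `line` being a `std::string`.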
Jeroen Vermeulen
b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00
Ulrich Germann
7603ec95f7 Recognize lexicalized reordering scores on TranslationOption instances provided e.g. by phrase tables. 2015-04-21 17:54:40 +01:00
Ulrich Germann
b60d1427f9 Minor code reformatting. 2015-04-21 17:52:51 +01:00
Ulrich Germann
73b5561fe3 Added member function GetTranslationTask(). 2015-04-21 17:51:19 +01:00
Ulrich Germann
48b3b88c4a Minor edits for code readability. 2015-04-21 17:50:34 +01:00
Ulrich Germann
2c0851099b Work on integrating hierarchical lexicalized reordering models with sampled phrase tables. 2015-04-21 17:48:48 +01:00
Ulrich Germann
0d13edae24 Added entry for bitext-find. 2015-04-21 17:47:39 +01:00
Ulrich Germann
9a9e43ea2c Initial check-in: search utility for bi-concordancing. 2015-04-21 17:47:09 +01:00
Ulrich Germann
e7246686bf New constructor. 2015-04-21 17:46:12 +01:00
Ulrich Germann
1791f47bfb mmBitext now maintains a vector of document names. 2015-04-21 17:43:51 +01:00
Ulrich Germann
8a921f5dc9 Initial check-in. 2015-04-21 17:41:33 +01:00
Ulrich Germann
adc80953e4 Minor edits for better readability. 2015-04-21 17:40:31 +01:00
Ulrich Germann
70f83e5be9 Additions for writing out alignments in yawat format (for kwipc). 2015-04-21 17:39:06 +01:00
Ulrich Germann
f98de4dc83 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-18 18:04:20 +01:00
Ulrich Germann
28d9e55379 Bug fix. 2015-04-18 16:53:57 +01:00
Ulrich Germann
e028eb7847 A single output factor in Mmsapt can now be specified externally. (Before: hard-coded to 0.) 2015-04-17 23:12:32 +01:00
Jeroen Vermeulen
4647986a12 Include winsock2.h on Windows.
This makes socket code build successfully on Windows.
2015-04-18 00:59:40 +07:00
Jeroen Vermeulen
d56f317f2e New helper classes: temp_dir & temp_file.
I'm adding these because boost::filesystem::unique_path introduces
encoding issues: on Windows the path is in wchar_t, breaking use of
those strings in various places!  Encoding the strings is just too
much work.

It's still possible that the current temp_file implementation won't
build on Windows (it uses POSIX mkstemp() and close()) but that can
be fixed underneath the API.
2015-04-17 22:57:55 +07:00
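A helper along these lines could be sketched as a small RAII class around POSIX mkstemp() and close(), as the message mentions. The class below is an illustration only: its interface, the `sketch` namespace, and the `/tmp/sketch.XXXXXX` pattern are assumptions, and it builds on POSIX systems only.

```cpp
#include <cassert>
#include <cstdlib>  // mkstemp() on POSIX systems
#include <stdexcept>
#include <string>
#include <vector>
#include <unistd.h>  // close(), unlink()

namespace sketch {  // hypothetical namespace

// Creates an empty temporary file and deletes it again on destruction.
class temp_file {
 public:
  temp_file() {
    // mkstemp() rewrites the trailing XXXXXX in place, so it needs a
    // writable, NUL-terminated buffer rather than a string literal.
    const std::string pattern = "/tmp/sketch.XXXXXX";  // assumed location
    std::vector<char> buffer(pattern.begin(), pattern.end());
    buffer.push_back('\0');
    const int fd = mkstemp(&buffer[0]);
    if (fd == -1) throw std::runtime_error("mkstemp() failed");
    close(fd);
    path_ = &buffer[0];
    assert(path_.size() == pattern.size());
  }

  ~temp_file() { unlink(path_.c_str()); }

  temp_file(const temp_file &) = delete;
  temp_file &operator=(const temp_file &) = delete;

  // The path is a plain narrow string, which avoids the wchar_t issue.
  const std::string &path() const { return path_; }

 private:
  std::string path_;
};

}  // namespace sketch
```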
Barry Haddow
c5e9a58ae2 Hypergraph output shouldn't crash when nbest list in current directory 2015-04-17 16:17:12 +01:00
Jeroen Vermeulen
70b5fee1ed Merge branch 'wrap-mmap'
Replace use of mmap() with the cross-platform wrapper in util.
2015-04-17 18:58:18 +07:00
Jeroen Vermeulen
1e3e445e3f Use cross-platform mmap() wrapper in CompactPT.
The MmapAllocator header made use of sys/mman.h and mmap(), which are
Unix-specific.  But util has a wrapper which also works on Windows.

This also fixes the error handling: when mmap() failed, the old code would
return an invalid (but non-NULL!) pointer — leading to a crash.  The wrapper
will throw an exception with a helpful error message.
2015-04-17 18:53:46 +07:00
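The error-handling point is that mmap() signals failure with MAP_FAILED, which is not a null pointer, so a check against NULL never fires. A minimal POSIX-only sketch of a checked wrapper follows; `sketch::map_checked` is a hypothetical name and is not the signature of the wrapper in util.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <sys/mman.h>

namespace sketch {  // hypothetical namespace

// Map memory or throw; never hands back the MAP_FAILED sentinel.
inline void *map_checked(std::size_t length, int prot, int flags, int fd,
                         off_t offset) {
  assert(length > 0);
  void *address = mmap(NULL, length, prot, flags, fd, offset);
  if (address == MAP_FAILED) {  // comparing against NULL would miss this
    throw std::runtime_error(std::string("mmap() failed: ") +
                             std::strerror(errno));
  }
  return address;
}

}  // namespace sketch
```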
Ales Tamchyna
2180730a58 check word alignment points in VW training to avoid cryptic segfaults 2015-04-16 15:28:23 +02:00
Jeroen Vermeulen
6a4943ca41 Replace deprecated bcopy() with memcpy().
The bcopy() function is POSIX-specific and deprecated.  The recommended
replacement (at least for non-overlapping source and destination ranges)
is memcpy(), which is in the standard C library.

Note that the source and destination parameters are in a different order
between these two functions.
2015-04-16 19:19:34 +07:00
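The parameter-order difference noted above is easy to get wrong during such a replacement. A small sketch, with `copy_bytes()` as a hypothetical helper that keeps the old source-first order at its own interface:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch {  // hypothetical namespace

// Copies n bytes from src to dst; the two ranges must not overlap.
// Old call:  bcopy(src, dst, n);        (source first)
// New call:  std::memcpy(dst, src, n);  (destination first)
inline void copy_bytes(const void *src, void *dst, std::size_t n) {
  assert(src != NULL && dst != NULL);
  std::memcpy(dst, src, n);
}

}  // namespace sketch
```

For ranges that may overlap, std::memmove() takes the same argument order as std::memcpy().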
Jeroen Vermeulen
21a93421dc Replace deprecated bzero() with memset().
The bzero() function is POSIX-specific and deprecated.  The recommended
replacement is memset(), which is in the standard C library.
2015-04-16 19:03:57 +07:00
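For reference, the equivalent memset() call takes the fill value as its second argument and the length as its third. A brief sketch, with `zero_bytes()` as a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch {  // hypothetical namespace

// Old call:  bzero(buffer, n);
// New call:  std::memset(buffer, 0, n);  (value second, length third)
inline void zero_bytes(void *buffer, std::size_t n) {
  assert(buffer != NULL);
  std::memset(buffer, 0, n);
}

}  // namespace sketch
```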
Phil Williams
05b31b53f2 Implement -output-unknowns for search algorithms 7 and 9 (T2S/F2S) 2015-04-13 16:31:58 +01:00
Kenneth Heafield
0698da8b0f log(1 + ...) -> log1p(...) 2015-04-08 10:08:05 -04:00
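The usual motivation for this kind of change is accuracy for small arguments: `1.0 + x` is rounded to a double before the logarithm is taken, so a tiny `x` is lost entirely. The sketch below contrasts the two forms; the function names are hypothetical and std::log1p requires C++11.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {  // hypothetical namespace

// Naive form: for x far below machine epsilon the sum rounds to exactly 1.0.
inline double log_one_plus_naive(double x) {
  assert(x > -1.0);
  return std::log(1.0 + x);
}

// Dedicated form: computes log(1 + x) without forming the rounded sum.
inline double log_one_plus(double x) {
  assert(x > -1.0);
  return std::log1p(x);
}

}  // namespace sketch
```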
Jeroen Vermeulen
464615a0c3 Fix some clang++ warnings.
Compiling with clang++ at the default warning/error levels produces
some interesting warnings.  Here's a pair of fixes for the simplest
instances:

moses/TranslationModel/RuleTable/PhraseDictionaryFuzzyMatch.cpp:133:7:
warning: comparison of array 'path' equal to a null pointer is always
false [-Wtautological-pointer-compare]
  if (path == NULL) {
      ^~~~    ~~~~

(The code unnecessarily checks that an automatic variable has a
 non-null address).

moses/TranslationModel/DynSAInclude/onlineRLM.h:305:20:
warning: unsequenced modification and access to 'den_val' [-Wunsequenced]
  if(((den_val = query(&ngram[len - num_fnd], num_fnd - 1)) > 0) &&
               ^

(The code tries to cram too much into an "if" condition.)
2015-04-07 22:58:17 +07:00
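One general way to address the second warning is to move the assignment out of the condition into its own statement, so the value is fully written before anything reads it. The sketch below shows that pattern only; `query()` here is a hypothetical stand-in that sums counts, and the code is not the actual fix applied to onlineRLM.h.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {  // hypothetical namespace

// Stand-in for the real query(): sums the first `length` entries.
inline int query(const int *ngram, int length) {
  int total = 0;
  for (int i = 0; i < length; ++i) total += ngram[i];
  return total;
}

// Assign first, test afterwards: no modification is buried in the condition.
inline bool positive_denominator(const int *ngram, int len, int num_fnd,
                                 int &den_val) {
  assert(ngram != NULL && num_fnd >= 1 && num_fnd <= len);
  den_val = query(&ngram[len - num_fnd], num_fnd - 1);
  return den_val > 0;
}

}  // namespace sketch
```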
Ulrich Germann
e110e7df6b Bug fix. 2015-04-05 16:18:09 +01:00
Ulrich Germann
3e2f878576 Merge branch 'master' into mmt-dev
Conflicts:
	Jamroot
	moses/TranslationModel/UG/mmsapt.h
2015-04-05 15:51:50 +01:00
Ulrich Germann
46e31a285c - Code refactoring for Bitext class.
- Bug fixes and conceptual improvements in biased sampling. The sampling now
  tries to stick to the bias, even when an unsuitable corpus dominates
  the occurrences.
2015-04-05 14:29:00 +01:00
Michael Denkowski
66cfd14159 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-03 16:50:19 -04:00
Michael Denkowski
fdf4f5f571 Consistent line ending behavior for alignment printing options 2015-04-03 16:49:41 -04:00
Ulrich Germann
05c4e382ff Better logging during biased sampling in Mmsapt. 2015-04-03 21:12:44 +01:00