995 Commits

Author SHA1 Message Date
Ulrich Germann
98c03dc047 Merge branch 'mmt-dev' into ranked-sampling.
Conflicts:
	moses/TranslationModel/UG/mm/ug_bitext.h
	moses/TranslationModel/UG/mm/ug_bitext_jstats.cc
	moses/TranslationModel/UG/mm/ug_bitext_jstats.h
	moses/TranslationModel/UG/mm/ug_bitext_pstats.cc
	moses/TranslationModel/UG/mm/ug_sampling_bias.cc
	moses/TranslationModel/UG/mm/ug_sampling_bias.h
2015-07-03 15:30:33 +01:00
Ulrich Germann
9dae3eb785 Code cleanup. 2015-07-03 15:11:21 +01:00
Ulrich Germann
f78bb4a6e9 Bigger K-best list to accommodate phrase extraction failure. 2015-07-02 23:56:53 +01:00
Ulrich Germann
1c25b29ebb Show from which documents phrase translations were collected. 2015-07-02 23:55:14 +01:00
Ulrich Germann
64ec34df5d Proper indentation with spaces (no tabs). 2015-07-02 23:49:00 +01:00
Ulrich Germann
b05ca8cb80 Fixes to make code compile on various versions of gcc. 2015-07-02 18:06:55 +01:00
Ulrich Germann
e94921dc44 Removal of 'using namespace ...' from several header files. 2015-07-02 01:32:34 +01:00
Ulrich Germann
61067b4fa5 Merge branch 'FF_ttptr' of http://github.com/moses-smt/mosesdecoder 2015-07-01 13:25:26 +01:00
Ulrich Germann
41a11dfe8a Allow ports other than 80 as the server ports for the context bias server. 2015-06-25 18:37:28 +01:00
XapaJIaMnu
fcc9bb1e60 when using the suffix array PT, set the ttask in the targetPhrase 2015-06-25 16:52:14 +01:00
XapaJIaMnu
a3ecd9f2a7 Revert "Break everything by trying to add ttasksptr to TargetPhrase" and try an easier approach
This reverts commit afdc1b480e.
2015-06-25 15:47:39 +01:00
XapaJIaMnu
afdc1b480e Break everything by trying to add ttasksptr to TargetPhrase 2015-06-25 15:47:17 +01:00
Ulrich Germann
22cc22064c Changed implementation of indocs (to keep track of which documents phrases come from) from vector to map. 2015-06-25 15:43:13 +01:00
XapaJIaMnu
47a488767e Enable the bias weights to be (re)set by the server. 2015-06-25 13:12:33 +01:00
Ulrich Germann
78b2810cfe Allow context server to use ports other than 80. 2015-06-24 18:09:22 +01:00
XapaJIaMnu
5a0168a6fa forgot to negate a condition 2015-06-23 17:27:01 +01:00
XapaJIaMnu
e50926abf6 Enable the Suffix array to get context_weights from command line 2015-06-23 16:58:58 +01:00
MosesAdmin
e57ca5ec34 daily automatic beautifier 2015-06-22 00:00:43 +01:00
Marcin Junczys-Dowmunt
58f0187e8b Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-06-21 19:24:53 +02:00
Marcin Junczys-Dowmunt
6151003c13 Remove C++11 oddities 2015-06-21 19:24:43 +02:00
Hieu Hoang
0f943dd9c1 clang compile errors 2015-06-21 21:16:12 +04:00
Marcin Junczys-Dowmunt
1bd10e104c workaround/cleaning for weird copy-constructor behaviour with C++11 2015-06-21 18:27:56 +02:00
Ulrich Germann
65bd46df65 Added feature with cumulative bias. 2015-06-19 21:50:01 +01:00
Phil Williams
90470e878d Fix some C++11-related compilation errors (clang) 2015-06-19 15:58:14 +01:00
Ulrich Germann
a627fd3cc6 Bug fix: set_bias_for_ranking needs to lock. 2015-06-15 14:22:32 +01:00
Ulrich Germann
9d46c5efa1 Rearrangement of members to match initialization order. 2015-06-15 14:20:45 +01:00
Ulrich Germann
4582e43473 Merge branch 'master' into ranked-sampling 2015-06-08 15:45:04 +01:00
Ulrich Germann
bca0f651da Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-06-08 15:44:37 +01:00
Ulrich Germann
3a5acb56cc Added some logging messages. 2015-06-08 15:32:53 +01:00
Ulrich Germann
b2a3bd280e Allow intrusive pointers to const objects. 2015-06-08 14:10:48 +01:00
Ulrich Germann
2f125eddc3 Bug fix. Readability. 2015-06-08 14:08:35 +01:00
Ulrich Germann
5e2e63f678 Integration of ranked sampling. 2015-06-08 14:06:54 +01:00
Ulrich Germann
5dc9d68d2d Initial check-in. 2015-06-08 14:05:41 +01:00
Ulrich Germann
78b0aab65b Work in progress. 2015-06-08 14:04:19 +01:00
Ulrich Germann
36c3f9dda8 Work in progress. Bug fix (release pstats in destructor!). Various other changes. 2015-06-08 14:03:20 +01:00
Ulrich Germann
d34b107b91 Initial check-in. 2015-06-08 14:00:31 +01:00
Ulrich Germann
69f15d0c5a New member function wait that won't return until sampling is done. 2015-06-08 14:00:17 +01:00
Ulrich Germann
ac99ec519f Have SentenceBias keep track of document ids. 2015-06-08 13:53:39 +01:00
Ulrich Germann
ff97627e30 Update to emacs variables at top. 2015-06-08 13:52:34 +01:00
Ulrich Germann
4fcb9b98f7 Keeping track of cumulative bias scores. 2015-06-08 13:51:54 +01:00
Ulrich Germann
f1de677530 SentenceBias now has access to mapping from sentence IDs to document IDs. 2015-06-08 13:50:37 +01:00
Ulrich Germann
3c767fc333 New field to store cumulative bias scores. 2015-06-08 13:48:40 +01:00
Ulrich Germann
e8a4a9b10a New member function to expose mapping from sentence IDs to document ids. 2015-06-08 13:45:51 +01:00
Ulrich Germann
d5b0ec7562 Initial check-in. 2015-06-08 12:20:25 +01:00
Ulrich Germann
7a57ce4dc2 Missing #pragma once. 2015-06-06 13:22:04 +01:00
Ulrich Germann
56dee1d4ac Bug fixes: missing #include and const declaration of find_trg_phr-bounds(). 2015-06-06 13:21:33 +01:00
Ulrich Germann
d4234847cd Added #include. 2015-06-05 22:51:58 +01:00
Ulrich Germann
f87f123366 Added member function find_trg_phrase_bound(PhraseExtractionRecord& rec) to Bitext class. 2015-06-05 22:50:17 +01:00
Ulrich Germann
8ae2894107 Initial check-in. 2015-06-05 22:29:26 +01:00
Ulrich Germann
53752f70a7 Added member function find_trg_phr_bound(PhraseExtractionRecord& rec). 2015-06-05 22:28:02 +01:00
Ulrich Germann
c7fffab82c Bug fixes. 2015-06-05 22:27:10 +01:00
Ulrich Germann
8a547ea82f Added missing #include. 2015-06-05 22:25:49 +01:00
Ulrich Germann
704432cf0f Bug fixes. 2015-06-05 22:25:13 +01:00
Ulrich Germann
623eb7bb77 Instantiation of btfix via boost::intrusive_ptr in Mmsapt.
This is in preparation for distinct bitext samplers which need to
ensure the lifetime of the bitext while sampling.
2015-06-05 21:15:47 +01:00
Ulrich Germann
e8ee56876e Initial check-in. 2015-06-05 17:24:53 +01:00
Ulrich Germann
8f4b2afe26 #include a few more things. 2015-06-05 16:30:07 +01:00
Ulrich Germann
1b4b3a5103 Mmsapt: btfix now instantiated via intrusive pointer
... to prevent deletion while Mmsapt is live.
2015-06-05 16:27:49 +01:00
Ulrich Germann
47fa99b61b Added member function size() to LRU_Cache. 2015-06-05 16:26:47 +01:00
Ulrich Germann
243a6a8b3b Added #define for intrusive pointer. 2015-06-05 16:23:00 +01:00
Ulrich Germann
576c743aee Simplified #include. 2015-06-05 16:22:03 +01:00
Ulrich Germann
5cb1d95e09 Added member function for retrieving nbest list items without sorting. 2015-06-05 16:21:09 +01:00
Ulrich Germann
5a56a5b496 Added target for forced relinking only (no forced recompilation); temporarily disabled tcmalloc. 2015-06-05 16:20:08 +01:00
Ulrich Germann
83fa1b6a88 Initial check-in. 2015-06-03 12:59:32 +01:00
Ulrich Germann
0afe139810 Initial check-in. 2015-06-03 12:55:58 +01:00
Ulrich Germann
debdd21899 Optional initialization of SentenceBias. 2015-06-03 12:53:38 +01:00
Ulrich Germann
f024eede74 Added ca() as short replacement for approxOccurrenceCount() to tsa_tree_iterator. 2015-06-03 12:51:44 +01:00
Hieu Hoang
3ea5faead8 codelite 2015-06-02 21:44:58 +04:00
Jeroen Vermeulen
35cf55d4d2 Trailing spaces. 2015-06-02 15:03:18 +07:00
Ulrich Germann
d62d2dc95f Bug fix. 2015-06-01 23:10:50 +01:00
Ulrich Germann
aa4eed93d5 Bug fix related to getting rid of using namespace std; . 2015-06-01 18:55:40 +01:00
Ulrich Germann
cc800742b1 Updated Makefile for local compiles. 2015-06-01 18:26:27 +01:00
Ulrich Germann
99896cfd2c Untangling bitext class from Moses dependencies, so that the class can be used
independently of Moses again.
2015-06-01 18:25:04 +01:00
Ulrich Germann
349163f3fd Bug fix and in-line code documentation. 2015-06-01 18:21:52 +01:00
Ulrich Germann
25f98a446e Bug fix in building imTtrack directly from input stream. 2015-06-01 18:19:34 +01:00
Ulrich Germann
c82ee9a4e9 Bug fix. 2015-05-24 16:44:41 +01:00
Ulrich Germann
da052b7f2b Removed dependency on libcurlpp, as it was difficult to link statically. 2015-05-24 16:05:14 +01:00
Ulrich Germann
dcb8e5d3e0 Preparation for allowing context-aware decoding. 2015-05-19 02:35:39 +01:00
Hieu Hoang
39139e7a64 beautify. 2015-05-15 18:09:38 +01:00
Marcin Junczys-Dowmunt
7652ab9118 quick fix for out-of-bound alignment points 2015-05-15 09:12:51 +02:00
Jeroen Vermeulen
0859e9a844 Remove trailing whitespace from C++ files. 2015-05-13 17:05:43 +07:00
Jeroen Vermeulen
1364a7d599 Fix typo in mmap call.
The case where !m_fixed passed m_map_size to mmap(), but the "else"
clause passed map_size.  In replacing mmap() with the portable wrapper,
I accidentally changed that to be m_map_size as well.

Besides fixing that, I'm changing the name of the variable to be more
clearly distinguishable from m_map_size.
2015-05-12 09:58:47 +07:00
Ulrich Germann
7da7ce52da Added context buffering in IOWrapper for context-sensitive decoding.
Unfortunately, this seems to slow things down quite a bit.
2015-05-11 00:34:24 +01:00
Ulrich Germann
db5ccff364 Tweaks to logging for biased sampling. 2015-05-11 00:33:21 +01:00
Ulrich Germann
1778238d73 Logging of latency of bias lookup via server. 2015-05-11 00:32:20 +01:00
Ulrich Germann
8a174beb44 Additional check for document map if document bias is requested. 2015-05-11 00:30:32 +01:00
Nicola Bertoldi
90a982e579 merge remote into local 2015-05-04 09:42:44 +02:00
Nicola Bertoldi
c4f04670c2 made ProbingPT constructor compliant with PhraseDictionary signature 2015-05-04 09:25:50 +02:00
Hieu Hoang
cc8c6b7b10 beautify 2015-05-02 11:45:24 +01:00
Jeroen Vermeulen
eca5824100 Remove trailing whitespace in C++ files. 2015-04-30 12:05:11 +07:00
Ulrich Germann
324b1a9b56 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-29 20:20:54 +01:00
Ulrich Germann
e4f5c69109 One step closer to eliminating the requirement to provide num-features=... in the config file.
Some FF (Mmsapt, LexicalReordering, Many single-value FF) provide this number during "registration";
when missing, a default weight vector of uniform 1.0 is automatically generated. This eliminates the
need for the user to figure out what the exact number of features is for each FF, which can get complicated,
e.g. in the case of Mmsapt/PhraseDictionaryBitextSampling.
2015-04-29 20:16:52 +01:00
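The fallback described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: `resolve_weights` and its parameters are hypothetical names, not the Moses API. The idea is that the feature count reported at registration sizes a uniform 1.0 weight vector whenever the configuration supplies no weights.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical helper (not a Moses function): pick the weight vector for a
// feature function. If the config supplied weights, use them unchanged;
// otherwise generate a uniform vector of 1.0 with one entry per feature,
// where num_features is the count the feature function reported when it
// registered.
inline std::vector<float> resolve_weights(std::size_t num_features,
                                          const std::vector<float>& configured) {
  if (!configured.empty()) return configured;
  return std::vector<float>(num_features, 1.0f);
}

}  // namespace sketch
```

With an empty configured vector and a registered count of 3, the sketch yields three weights of 1.0; a non-empty configured vector is passed through as-is.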
Ulrich Germann
c76f1c338d Uninitialized variable. 2015-04-29 20:16:43 +01:00
Jeroen Vermeulen
616b589da3 Fix a bunch of compiler warnings.
Warnings are useful, but only if there are few!
2015-04-29 21:18:51 +07:00
Ulrich Germann
315610c02a Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-27 16:39:40 +01:00
Ulrich Germann
37bb1de9ed Unused variable. 2015-04-27 16:30:59 +01:00
Ulrich Germann
fbf8b1f8b8 Code design debizarrification: Indexes of feature functions into the dense vector of all feature
values are now stored on the feature function instead of in a global map that is a static
member of ScoreComponentCollection.
2015-04-26 16:46:36 +01:00
Ulrich Germann
e63561ae7f Unused variable. 2015-04-26 15:41:32 +01:00
Hieu Hoang
41529227b2 boost unique lock 2015-04-26 18:11:11 +04:00
Ulrich Germann
bafe60c3a1 Make sure things work when curl-based biasing is disabled. 2015-04-26 03:14:40 +01:00
Ulrich Germann
0d72cdd72c Merge branch 'master' of https://github.com/moses-smt/mosesdecoder into mmt-dev
Conflicts:
	moses/Syntax/F2S/Manager-inl.h
	moses/TranslationModel/UG/mmsapt.cpp
2015-04-26 02:12:16 +01:00
Jeroen Vermeulen
8ac91c8d97 Fix unqualified call to rand_excl().
The call needed to be made explicitly to util::rand_excl().  Sorry.
2015-04-24 00:22:25 +07:00
Jeroen Vermeulen
38d790cac0 Add cross-platform randomizer module.
The code uses two mechanisms for generating random numbers: srand()/rand(),
which is not thread-safe, and srandom()/random(), which is POSIX-specific.

Here I add a util/random.cc module that centralizes these calls, and unifies
some common usage patterns.  If the implementation is not good enough, we can
now change it in a single place.

To keep things simple, this uses the portable srand()/rand() but protects them
with a lock to avoid concurrency problems.

The hard part was to keep the regression tests passing: they rely on fixed
sequences of random numbers, so a small code change could break them very
thoroughly.  Util::rand(), for wide types like size_t, calls std::rand() not
once but twice.  This behaviour was generalized into utils::wide_rand() and
friends.
2015-04-23 23:46:04 +07:00
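A minimal sketch of the approach this commit message describes: portable srand()/rand() behind a single lock, plus a "wide" draw that calls rand() twice. The names `rand_excl` and `wide_rand` appear in the commit messages, but the signatures, the `sketch` namespace, the use of `std::mutex` (C++11), and the way the two draws are combined are all assumptions made for illustration, not the actual util/random.cc code.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

namespace sketch {

// One process-wide lock guards every call into std::srand()/std::rand(),
// which are not guaranteed to be thread-safe.
inline std::mutex& rand_lock() {
  static std::mutex m;
  return m;
}

// Seed the generator (assumed name).
inline void rand_init(unsigned int seed) {
  std::lock_guard<std::mutex> guard(rand_lock());
  std::srand(seed);
}

// A single locked draw in [0, RAND_MAX].
inline int locked_rand() {
  std::lock_guard<std::mutex> guard(rand_lock());
  return std::rand();
}

// Wide draw: two consecutive std::rand() calls under one lock, combined into
// a single size_t. The combination formula is an assumption.
inline std::size_t wide_rand() {
  std::lock_guard<std::mutex> guard(rand_lock());
  const std::size_t hi = static_cast<std::size_t>(std::rand());
  const std::size_t lo = static_cast<std::size_t>(std::rand());
  return hi * (static_cast<std::size_t>(RAND_MAX) + 1) + lo;
}

// Draw in [0, top); top must be non-zero. Simple modulo, so slightly biased.
inline std::size_t rand_excl(std::size_t top) {
  return wide_rand() % top;
}

}  // namespace sketch
```

Because both draws of the wide variant happen under one lock, reseeding with the same value reproduces the same sequence, which is the property the regression tests mentioned in the message depend on.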
Jeroen Vermeulen
02d1d9a4af Don't work around missing popen() in MinGW.
Windows does not have popen()/pclose(), so FileHandler.cpp #define's them to
_popen()/_pclose().  But MinGW has similar macros built into <cstdio>, leading
to warnings.  So skip the workaround on MinGW.
2015-04-22 11:24:32 +07:00
Jeroen Vermeulen
32722ab5b1 Support tokenize(const std::string &) as well.
Convenience wrapper: the actual function takes a const char[], but many of
the call sites want to pass a string and have to call its c_str() first.
2015-04-22 10:35:18 +07:00
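The convenience wrapper described here is just an overload that forwards to the `const char[]` version. In the sketch below only the two signatures follow the commit message; the tokenizer body is a stand-in that splits on spaces and tabs, since the real util implementation is not shown in this log.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Stand-in tokenizer (assumed behaviour): split on runs of spaces and tabs.
inline std::vector<std::string> tokenize(const char input[]) {
  std::vector<std::string> tokens;
  const char* p = input;
  while (*p != '\0') {
    while (*p == ' ' || *p == '\t') ++p;                 // skip separators
    const char* start = p;
    while (*p != '\0' && *p != ' ' && *p != '\t') ++p;   // scan one token
    if (p != start) tokens.push_back(std::string(start, p));
  }
  return tokens;
}

// Convenience overload: callers holding a std::string no longer need to call
// c_str() themselves.
inline std::vector<std::string> tokenize(const std::string& input) {
  return tokenize(input.c_str());
}

}  // namespace sketch
```

A string literal still resolves to the `const char[]` overload, so existing call sites keep working while call sites with a `std::string` can drop the explicit `c_str()`.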
Jeroen Vermeulen
b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00
Ulrich Germann
2c0851099b Work on integrating hierarchical lexicalized reordering models with sampled phrase tables. 2015-04-21 17:48:48 +01:00
Ulrich Germann
0d13edae24 Added entry for bitext-find. 2015-04-21 17:47:39 +01:00
Ulrich Germann
9a9e43ea2c Initial check-in: search utility for bi-concordancing. 2015-04-21 17:47:09 +01:00
Ulrich Germann
e7246686bf New constructor. 2015-04-21 17:46:12 +01:00
Ulrich Germann
1791f47bfb mmBitext now maintains a vector of document names. 2015-04-21 17:43:51 +01:00
Ulrich Germann
8a921f5dc9 Initial check-in. 2015-04-21 17:41:33 +01:00
Ulrich Germann
adc80953e4 Minor edits for better readability. 2015-04-21 17:40:31 +01:00
Ulrich Germann
70f83e5be9 Additions for writing out alignments in yawat format (for kwipc). 2015-04-21 17:39:06 +01:00
Ulrich Germann
f98de4dc83 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-18 18:04:20 +01:00
Ulrich Germann
28d9e55379 Bug fix. 2015-04-18 16:53:57 +01:00
Ulrich Germann
e028eb7847 A single output factor in Mmsapt can now be specified externally. (Before: hard-coded to 0.) 2015-04-17 23:12:32 +01:00
Jeroen Vermeulen
d56f317f2e New helper classes: temp_dir & temp_file.
I'm adding these because boost::filesystem::unique_path introduces
encoding issues: on Windows the path is in wchar_t, breaking use of
those strings in various places!  Encoding the strings is just too
much work.

It's still possible that the current temp_file implementation won't
build on Windows (it uses POSIX mkstemp() and close()) but that can
be fixed underneath the API.
2015-04-17 22:57:55 +07:00
Jeroen Vermeulen
1e3e445e3f Use cross-platform mmap() wrapper in CompactPT.
The MmapAllocator header made use of sys/mman.h and mmap(), which are
Unix-specific.  But util has a wrapper which also works on Windows.

This also fixes the error handling: when mmap() failed, the old code would
return an invalid (but non-NULL!) pointer — leading to a crash.  The wrapper
will throw an exception with a helpful error message.
2015-04-17 18:53:46 +07:00
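The error-handling point in this message is that a failed mmap() returns MAP_FAILED, which is non-NULL, so a plain NULL check lets an invalid pointer through. A minimal POSIX-only sketch of a throwing wrapper follows; `map_read_or_throw` is a hypothetical name, and the real util wrapper (which also covers Windows) is not reproduced here.

```cpp
#include <sys/mman.h>

#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical wrapper: map `size` bytes of `fd` read-only, or throw.
// mmap() signals failure with MAP_FAILED, i.e. (void*)-1, not with NULL,
// so the result must be compared against MAP_FAILED explicitly.
inline void* map_read_or_throw(int fd, std::size_t size) {
  void* p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) {
    throw std::runtime_error(std::string("mmap failed: ") +
                             std::strerror(errno));
  }
  return p;
}

}  // namespace sketch
```

Calling the sketch with an invalid descriptor such as -1 raises an exception that carries the system error text, instead of handing back a pointer that crashes on first use.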
Jeroen Vermeulen
464615a0c3 Fix some clang++ warnings.
Compiling with clang++ at the default warning/error levels produces
some interesting warnings.  Here's a pair of fixes for the simplest
instances:

moses/TranslationModel/RuleTable/PhraseDictionaryFuzzyMatch.cpp:133:7:
warning: comparison of array 'path' equal to a null pointer is always
false [-Wtautological-pointer-compare]
  if (path == NULL) {
      ^~~~    ~~~~

(The code unnecessarily checks that an automatic variable has a
 non-null address).

moses/TranslationModel/DynSAInclude/onlineRLM.h:305:20:
warning: unsequenced modification and access to 'den_val' [-Wunsequenced]
  if(((den_val = query(&ngram[len - num_fnd], num_fnd - 1)) > 0) &&
               ^

(The code tries to cram too much into an "if" condition.)
2015-04-07 22:58:17 +07:00
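The two warning classes quoted in this message can be illustrated with a small sketch. Everything here is hypothetical: `query` is a stand-in, the part of the original condition after `&&` is not shown in the commit message, so the second test below is invented, and the actual fixes in the commit may differ.

```cpp
#include <cstddef>

namespace sketch {

// Stand-in for the query() call in the warning text: sums `len` values.
inline float query(const int* ngram, int len) {
  float sum = 0.0f;
  for (int i = 0; i < len; ++i) sum += static_cast<float>(ngram[i]);
  return sum;
}

// -Wtautological-pointer-compare: an array object always has a non-null
// address, so `path == NULL` is always false. If a check is wanted at all,
// it has to look at the contents instead (assumed intent).
inline bool has_path(const char (&path)[64]) {
  return path[0] != '\0';
}

// -Wunsequenced: do the assignment in its own statement, then test the
// value, rather than assigning and reading den_val inside one condition.
inline bool usable(const int* ngram, int len, int num_fnd, float& den_val) {
  den_val = query(&ngram[len - num_fnd], num_fnd - 1);
  if (den_val <= 0) return false;
  return den_val < 100.0f;  // invented second condition that reads den_val
}

}  // namespace sketch
```

Splitting the assignment out costs one extra line and makes the order of evaluation explicit.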
Ulrich Germann
e110e7df6b Bug fix. 2015-04-05 16:18:09 +01:00
Ulrich Germann
3e2f878576 Merge branch 'master' into mmt-dev
Conflicts:
	Jamroot
	moses/TranslationModel/UG/mmsapt.h
2015-04-05 15:51:50 +01:00
Ulrich Germann
46e31a285c - Code refactoring for Bitext class.
- Bug fixes and conceptual improvements in biased sampling. The sampling now
  tries to stick to the bias, even when an unsuitable corpus dominates
  the occurrences.
2015-04-05 14:29:00 +01:00
Ulrich Germann
05c4e382ff Better logging during biased sampling in Mmsapt. 2015-04-03 21:12:44 +01:00
Ulrich Germann
b6c887b370 Minor bug fix in logging biased sampling for phrase lookup. 2015-04-03 20:18:55 +01:00
Ulrich Germann
93ce2423df 1. A context string for biased sampling in Mmsapt can now be provided on the
command line with --context-string. Not available in server mode yet.
2. Numerous bug fixes related to biased sampling.
3. Biased sampling now checks that the sampling sticks to the bias. If
   the distribution of samples deviates too much from the bias, samples
   whose selection would push the sample distribution even further from the bias
   are not considered, even if that means that fewer samples are chosen in total.
2015-04-03 16:16:52 +01:00
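Point 3 of this message describes rejecting samples that would push the sample distribution further from the bias. One possible acceptance test is sketched below. It is an assumption-laden illustration, not the Mmsapt implementation: `accept_sample`, the per-document count vector, and the `tolerance` parameter are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical acceptance test. counts[d] is the number of samples already
// taken from document d; bias[d] is the target share for d. A candidate from
// document `doc` is rejected when that document's current share already
// exceeds its target by more than `tolerance`, because taking it would move
// the sample distribution even further from the bias.
inline bool accept_sample(std::size_t doc,
                          const std::vector<std::size_t>& counts,
                          const std::vector<double>& bias,
                          double tolerance) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < counts.size(); ++i) total += counts[i];
  if (total == 0) return true;  // nothing sampled yet, nothing to deviate from
  const double share =
      static_cast<double>(counts[doc]) / static_cast<double>(total);
  return share <= bias[doc] + tolerance;
}

}  // namespace sketch
```

As the message notes, a rule of this kind can leave fewer samples in total, since rejected candidates are not replaced.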
Jeroen Vermeulen
ebc0930500 Replace use of tmpnam with boost::filesystem.
Silences a few annoying warnings from gcc: "tmpnam is dangerous" (and
the suggestion to use mkstemp instead).
2015-04-02 10:42:06 +07:00
XapaJIaMnu
29a729c99b Remove old obsolete probingPT tests 2015-04-01 16:58:21 +01:00
Ulrich Germann
a9dbced81d Bug fix. 2015-03-30 02:56:49 +01:00
Ulrich Germann
fcbfc5a535 Feature functions and the constructors of TranslationOptionCollections
now have access to the current translation task.

This was done to allow context-sensitive processing (if provided by the FF).
2015-03-30 01:20:17 +01:00
Ulrich Germann
79cd40d2c4 Disabled temporarily. Needs to be adapted to API changes in Mmsapt. 2015-03-29 23:58:17 +01:00
Ulrich Germann
2899645992 Cleanup. 2015-03-29 23:57:14 +01:00
Ulrich Germann
3541838a46 Included TargetPhraseCollectionCache.* in fakelib mmsapt. 2015-03-29 23:55:47 +01:00
Ulrich Germann
1525f1ea62 Cleanup. 2015-03-29 23:44:06 +01:00
Ulrich Germann
529a766da7 Initial check-in. 2015-03-29 23:43:50 +01:00
Jeroen Vermeulen
b124d99330 Use boost::filesystem for "rm -rf".
Replaces a system() call (which was a portability problem) and fixes,
en passant, a warning about its return value being ignored.
2015-03-29 18:33:58 +07:00
Jeroen Vermeulen
789a2e2bc3 Fix some compile warnings (gcc 4.9.2).
Mostly signed/unsigned comparisons and reordered member
initializations; also a few unused variables.

There are more, but if I chip away at them for a while, who knows, it
may catch on and warnings may eventually become socially stigmatizing.
:)
2015-03-29 18:10:51 +07:00
Ulrich Germann
1b23edf62f Cache for the N most recently used TargetPhraseCollections. Refactored out of mmsapt.h. 2015-03-28 14:41:08 +00:00
Jeroen Vermeulen
a9c8f44896 Modernize "C" includes in moses.
This is one of those little chores in managing a long-lived C++
project: standard C headers like stdio.h and math.h now have their own
place in the C++ standard as resp. cstdio, cmath, and so on.  In this
branch the #include names are updated for the moses/ subdirectory; more
branches to follow.

C++11 adds cstdint, but to support compilation with the previous
standard, that change is left for later.
2015-03-28 20:09:03 +07:00
Hieu Hoang
1064aaacbe delete typedefs for UINT32 and UINT64. MSVC now has uint32_t and uint64_t /Ken 2015-03-25 00:55:39 +00:00
Ulrich Germann
8ca11d941d 1. Lifetime of tasks in ThreadPool is now managed via shared pointers.
2. Code cleanup in IOWrapper and a bit elsewhere.
2015-03-21 16:12:52 +00:00
Ulrich Germann
ee4e396a4d Removed pointer to TranslationTask in InputTypes again. Not the right place to store this information. 2015-03-21 15:29:37 +00:00
Ulrich Germann
dcffbb5f4d Made LRModel::ReorderingType an enumerated type. 2015-03-16 00:24:11 +00:00
Ulrich Germann
085c88cc7b Eliminated sources of some compiler warnings (unused variables; signed/unsigned comparisons). 2015-03-15 22:45:01 +00:00
Ulrich Germann
ad805c133b Instances of InputType (and derived classes) now know which TranslationTask (if any) created them.
This is a first step towards providing phrase tables etc. access to context information etc.
associated with specific translation tasks.
2015-03-15 20:38:31 +00:00
Ulrich Germann
2a66a55c85 Added document map (maps from sentences to document ids) to Bitext class.
Minor overhaul to the bias regime, which allows specifying bias by document
name (as provided in the document map) rather than by sentence in the static
parallel corpus.
2015-03-15 13:32:09 +00:00
Ulrich Germann
51824355f9 Sampling now keeps track of counts for hierarchical lexicalized reordering. 2015-03-10 10:41:41 +00:00
Ulrich Germann
524376fad4 Code cleanup. 2015-03-09 00:34:47 +00:00
Hieu Hoang
32de075022 beautify 2015-02-19 12:27:23 +00:00
Ulrich Germann
ccf44f39fb Code cleanup and reorganization. A few classes have been renamed to shorter names. 2015-02-15 01:45:22 +00:00
Hieu Hoang
755bd609f5 Using boost for prefix/suffix checks /Jeroen Vermeulen 2015-02-06 15:52:25 +00:00
Hieu Hoang
70e8eb54ce Using boost for prefix/suffix checks /Jeroen Vermeulen 2015-02-05 16:23:47 +00:00
Marcin Junczys-Dowmunt
4140756fdf Add missing check for empty range while flushing 2015-01-22 22:18:19 +01:00
Marcin Junczys-Dowmunt
7d9013a85b Work-around for temporary translation option collection size during phrase table binarization 2015-01-19 23:15:08 +01:00
Marcin Junczys-Dowmunt
fbcf2dcb56 Fixed thread-safety 2015-01-19 21:56:04 +01:00
Marcin Junczys-Dowmunt
82c603213a Thread-safety and constness 2015-01-18 23:58:28 +01:00
Marcin Junczys-Dowmunt
16ffc2c978 Added new VW feature and exception to Simple9 2015-01-18 23:26:32 +01:00
Hieu Hoang
6d61db28fa use astyle 2.01. It's on Edinburgh server and doesn't screw up enum 2015-01-14 19:21:11 +00:00
Hieu Hoang
05ead45e71 beautify 2015-01-14 11:07:42 +00:00
Phil Williams
e5ebf30664 Fix a few warnings. 2015-01-13 21:13:55 +00:00
Hieu Hoang
be0ab92d16 delete oov pt 2015-01-09 22:32:08 +00:00
Hieu Hoang
e195bdf6d9 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-01-08 02:37:01 +04:00
Rico Sennrich
7123d1cc80 eliminate spurious copy / memory leak 2015-01-07 18:42:20 +00:00
Hieu Hoang
ff7fbd55ee add oovpt 2015-01-07 15:33:42 +04:00
Hieu Hoang
99b4b63c0c change signature of GetChartRuleCollection() 2015-01-07 12:59:08 +04:00
Hieu Hoang
b9bef2fc44 add oovpt 2015-01-07 12:18:09 +04:00
Hieu Hoang
3b3f11365d delete UserMessage. Too difficult to police 2015-01-07 10:01:10 +04:00
Hieu Hoang
1e0a2835bf add oovpt 2015-01-04 19:10:48 +05:30
XapaJIaMnu
d0807c45f2 Fixed crash in probingPT when probability is precisely 0 2014-12-23 15:21:06 +00:00
Nicola Bertoldi
d0cddf0f2d Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2014-12-16 17:35:47 +01:00
Nikolay Bogoychev
d0f4402e86 Fix incorrect hashing in ProbingPT 2014-12-16 11:15:12 +00:00
Nicola Bertoldi
4e77665d30 better handling of cache-based models with inconsistent parameters 2014-12-15 17:42:41 +01:00
Nicola Bertoldi
e4eb201c52 merged master into dynamic-models and solved conflicts 2014-12-13 12:52:47 +01:00
Nicola Bertoldi
cea2d9d8bb beautify 2014-12-09 12:39:37 +01:00
Hieu Hoang
8c6310bf4c Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-12-05 23:26:24 +00:00
Matthias Huck
bfeb7d641f log output 2014-12-05 22:31:54 +00:00
Hieu Hoang
4b10c59bea add OutputSearchGraphHypergraph() to API framework. Move m_source to BaseManager 2014-12-05 21:33:59 +00:00
Rico Sennrich
56921cae3b small simplification of recursive CYK+
(following Chris Dyer's suggestion and Phil's refactoring in S2T decoder)
2014-12-01 11:05:17 +00:00
Ulrich Germann
7aa4d5d8d5 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
Conflicts:
	moses-cmd/simulate-pe.cc
2014-11-20 17:55:51 +00:00
XapaJIaMnu
52c520c042 Resolve merge conflicts 2014-11-20 15:50:32 +00:00
Ulrich Germann
bda7ace530 Minor changes due to changes in the Moses API. Removed from list of standard programs to be compiled and installed. May need some work to get it working again. 2014-11-16 16:31:12 +00:00
XapaJIaMnu
4bea830188 doesn't work 2014-11-13 15:50:05 +00:00
Hieu Hoang
e1092c0dad merge 2014-11-07 14:35:36 +00:00
Laura Kieras
ecae85e9a8 mm2dTable now opens its data file read-only, using mapped_file_source, so that we don't need write permissions on the file 2014-11-04 16:30:46 -05:00
Ulrich Germann
07202c544c Added ptable-describe-features to list features used by PhraseDictionaryBitextSampling. 2014-10-25 12:06:38 -07:00
Ulrich Germann
44215b79c0 Added ptable-describe-features to list features used by PhraseDictionaryBitextSampling. 2014-10-25 12:06:24 -07:00
Ulrich Germann
53ef6c5c38 Added demo program for use of suffix arrays. 2014-10-23 11:11:28 -07:00
Barry Haddow
562cf7e007 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-10-21 15:11:22 +01:00
Hieu Hoang
cce818015d Merge ../mosesdecoder into merge-cmd 2014-10-10 15:50:12 +01:00
Phil Williams
05ecc914c2 Fix a few more compiler warnings (from Clang mostly). 2014-10-10 15:47:53 +01:00
Phil Williams
ee57e59f2b Fix a few compiler warnings (from Clang mostly). 2014-10-10 14:22:53 +01:00
Hieu Hoang
1743f7eeb2 Merge ../mosesdecoder into merge-cmd 2014-10-08 17:55:07 +01:00
Ulrich Germann
576931b088 Mmsapt now adds word alignment info to target phrases. 2014-10-07 18:08:31 +01:00
Hieu Hoang
33ed15ef19 move misc common functions into moses/ 2014-09-30 14:22:38 +01:00
Barry Haddow
091948bff0 Improved debug 2014-09-18 17:03:19 +01:00
Ulrich Germann
1d834e2b48 Fixed bug with respect to adding check option to Mmsapt::Load(). 2014-09-10 18:51:20 +02:00
Ulrich Germann
a58c7ceb18 Fixed issues with ambiguity in typedef of uint64_t (conflict between boost typedef and stdint typedef). 2014-09-10 12:07:57 +02:00
Ulrich Germann
31578d4915 Finished code for bias loading from Mmsapt config file. 2014-09-09 18:07:26 +01:00
Ulrich Germann
cda94c7d85 Fix in biased sampling. Started code on loading and using bias in Mmsapt. 2014-09-09 17:45:48 +01:00
Ulrich Germann
f86fa65a6f Added utility count-ptable-features to count features in Mmsapt given a moses.ini config line. 2014-09-08 16:56:45 +01:00
Ulrich Germann
db6e5de641 Added initial code for utility to count features of PhraseDictionaryBitextSampling. 2014-09-08 11:03:05 +01:00
Ulrich Germann
5571ec91c6 Code cleanup. 2014-09-08 09:26:09 +01:00
Ulrich Germann
a86d49fc88 Added bias to bitext sampling. 2014-09-08 09:26:08 +01:00
Ulrich Germann
cef6460981 Initial check-in. 2014-09-08 09:26:08 +01:00
Ulrich Germann
a87a9ff207 Moved class PhrasePair back to ug_bitext.
Moved function expand() from mmsapt.cc to ug_bitext.h.
Added new lookup function to class Bitext.
Bug fixes related to inverse lookup in class Bitext.
2014-09-08 09:26:08 +01:00
Ulrich Germann
b588df77f0 Bug fix related to threading. 2014-09-08 09:26:08 +01:00
Ulrich Germann
2405293aaa Fiddling around with the code. Not for production. 2014-09-08 09:26:08 +01:00
Ulrich Germann
90c91ae9bb Added fakelib stringdist. 2014-09-08 09:26:08 +01:00
Ulrich Germann
9af3a61678 Added try-align2. 2014-09-08 09:26:08 +01:00
Ulrich Germann
a028fec7af Work in progress. 2014-09-08 09:26:08 +01:00
Michael Denkowski
3304030a4e Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2014-09-04 11:19:32 -04:00
Michael Denkowski
6c33bc99dc Option to add TM-specific word and phrase counts 2014-09-04 11:17:42 -04:00
Michael Denkowski
756bcf0f15 Option to add TM-specific word and phrase counts 2014-09-04 01:49:26 -04:00
Rico Sennrich
2a46e8ccea parse chart compression for faster CYK+ parsing with syntax systems. 2014-09-01 18:16:22 +01:00
Michael Denkowski
1c45d780d4 all-restrict mode for MultiModel (restrict to phrases in first model) 2014-08-26 13:43:23 -04:00
Hieu Hoang
97e5a30d3a compiles with clang on osx 2014-08-25 18:07:42 +01:00
Michael Denkowski
da0ed4df81 tunable=false option for mmsapt 2014-08-18 19:22:50 -04:00
Michael Denkowski
93e99be108 Mode to pass through "all" scores in MultiModel 2014-08-18 17:57:05 -04:00
Nicola Bertoldi
77e9e91b08 minor fixes 2014-08-18 19:13:51 +02:00
Hieu Hoang
00a338d576 clang only function 2014-08-14 16:44:20 +01:00
Hieu Hoang
303387f9ac compiles with clang on osx 2014-08-14 16:17:21 +01:00
Hieu Hoang
fcbd64b3ac eclipse 2014-08-14 14:04:25 +01:00
Hieu Hoang
2bbaf69409 Merge branch 'master' into bo-safe 2014-08-13 18:52:14 +01:00
Hieu Hoang
94c44c03d5 merge 2014-08-13 18:03:05 +01:00
Hieu Hoang
18c1c4a132 method rename 2014-08-08 18:11:30 +01:00
Hieu Hoang
efa5befb16 method rename 2014-08-08 15:59:34 +01:00
Ulrich Germann
95b04d2558 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2014-08-05 21:28:06 +01:00
Ulrich Germann
5480499309 Fixed (?) problem with multiple identical extractable target phrases per source phrase occurrence. 2014-08-05 21:26:29 +01:00
Michael Denkowski
13942b77ab Add alias PhraseDictionaryBitextSampling 2014-08-05 14:47:07 -04:00
Ulrich Germann
f32a313a05 Mmsapt now uses timespec on Linux, timeval on MacOS for time stamps. 2014-08-05 02:22:20 +01:00
Hieu Hoang
11471de9b8 mac osx 2014-08-04 18:50:10 +01:00
Ulrich Germann
c269abb083 Added num_read_write.cc to fakelib mm. 2014-08-04 17:52:08 +01:00
Ulrich Germann
9fad5d3eb0 Eliminated dependence on endian.h and related byte swapping on big-endian machines. 2014-08-04 17:52:08 +01:00
Hieu Hoang
3f29ed10f1 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2014-08-05 11:00:01 +01:00
Hieu Hoang
84d6b25802 TargetPhrase to have pointer to the phrase table that creates it 2014-08-05 10:59:48 +01:00
Hieu Hoang
f447a23067 TargetPhrase to have pointer to the phrase table that creates it 2014-08-05 10:26:42 +01:00
Hieu Hoang
e863592f40 TargetPhrase to have pointer to the phrase table that creates it 2014-08-04 19:28:04 +01:00
Hieu Hoang
abe68be588 initialise m_container 2014-08-04 15:59:32 +01:00
Hieu Hoang
3f3912772d initialise m_container 2014-08-04 15:46:40 +01:00
Hieu Hoang
5f90ccdb13 initialise m_container 2014-08-04 15:20:22 +01:00
Marcin Junczys-Dowmunt
5c9017c632 Forgot to add SetFeaturesToApply 2014-08-03 19:44:43 +02:00
Marcin Junczys-Dowmunt
ff6ed8cd21 Fixed segfault for features depending on factors not in phrase table (i.e. added by generation models) 2014-08-03 18:03:42 +02:00
Hieu Hoang
688bf4c061 each target phrase knows what decode graph created it 2014-08-02 17:15:01 +01:00
hieu
5741ef2635 compile error in gcc 4.4 2014-07-30 18:01:51 +01:00
Ulrich Germann
f9d167345a Changed feature and parameter names for Mmsapt / PhraseDictionaryBitextSampling as requested by PK. 2014-07-29 13:57:00 +01:00
Ulrich Germann
6a1beb770d Cleanup work to get rid of compiler warnings. 2014-07-29 13:51:44 +01:00
Nicola Bertoldi
1063012892 added a flag to disable the decaying in the cache 2014-07-22 11:25:03 +02:00
Nicola Bertoldi
02bf6d5d5e fixings about file loading and precomputation of ascores 2014-07-22 09:45:41 +02:00
Hieu Hoang
b10760f428 delete PhraseTableImplementation. Old enum 2014-07-18 20:36:53 +01:00
Hieu Hoang
1347b153ee compiles with c++11. Used by oxlm 2014-07-17 23:13:06 +01:00
Ulrich Germann
f06b145735 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2014-07-10 17:24:42 +01:00