995 Commits

Author SHA1 Message Date
mjdenkowski
28f4e3b1a7 Compatibility: mmsapt and factored translation with generation models 2015-09-02 01:55:06 -04:00
Ulrich Germann
e8f010b9af Removed ORLM. 2015-08-17 18:11:04 +01:00
Ulrich Germann
8b3f2d4338 Bye-bye, PhraseDictionaryDynSuffixArray. 2015-08-17 15:35:35 +01:00
MosesAdmin
8af06a6f0d daily automatic beautifier 2015-08-12 00:01:03 +01:00
Hieu Hoang
4a3363479e remove namespace pollution from old dynamic suffix array and randlm 2015-08-11 12:44:42 +04:00
Ulrich Germann
1dcd077806 More namespace fixes. 2015-08-10 15:14:44 +01:00
Ulrich Germann
0e2dc56360 Namespace cleanup 2015-08-10 11:03:31 +01:00
Ulrich Germann
03463facd7 Cleanup. 2015-08-10 10:14:28 +01:00
Ulrich Germann
16c637b8a5 Back to intrusive pointer for btfix in Mmsapt. Shared pointer causes segfaults. Needs further investigation. 2015-08-10 09:32:26 +01:00
Ulrich Germann
7d1987121f Minor update of declaration of binread(). 2015-08-10 09:30:53 +01:00
Ulrich Germann
c40082f94c Bug fix: restored passing of server information to bias client. 2015-08-10 00:09:56 +01:00
Ulrich Germann
a68b77c300 Minor fix in logging of interaction with bias server. 2015-08-10 00:08:44 +01:00
Ulrich Germann
6b084a0587 clang can't handle boost intrusive pointers, it seems. 2015-08-09 22:49:35 +01:00
Ulrich Germann
9702084641 Deleted unused code. 2015-08-09 22:48:26 +01:00
Ulrich Germann
f20c4cbbc0 Namespace refactoring in mmsapt. 2015-08-09 20:51:21 +01:00
Ulrich Germann
fcd0c17af3 Choice of map type in pstats. 2015-08-07 19:16:27 +01:00
Ulrich Germann
883c34aee9 Merge branch 'master' of http://github.com/moses-smt/mosesdecoder into mmt-dev
Conflicts:
	moses/SearchNormalBatch.cpp
	moses/TranslationModel/UG/mm/ug_bitext.h
	moses/TranslationModel/UG/mm/ug_typedefs.h
	moses/TranslationModel/UG/mmsapt.cpp
	moses/TranslationModel/UG/mmsapt.h
2015-08-07 14:14:19 +01:00
Ulrich Germann
a07eb65118 sptr<> -> SPTR<> in preparation for merge with legacy master 2015-08-07 13:35:02 +01:00
U-DESKTOP-ONHNTIV\hieuh
8fce5c783d cygwin 2015-08-05 19:57:41 +01:00
Hieu Hoang
d94c2d210b #define name clashed with internal class name when using clang 2015-08-02 06:33:32 +04:00
Ulrich Germann
51bc36d131 Report bias server errors. 2015-07-31 23:21:56 +00:00
Ulrich Germann
fd93df3b8f Added virtual destructor. 2015-07-31 23:21:01 +00:00
Ulrich Germann
362f26d5eb Logging in constructor of http_client. Member function to return server error message. 2015-07-31 23:19:03 +00:00
Ulrich Germann
3439de6341 Bug fix in uri_encode. Logging of lookup in http_client constructor. 2015-07-31 23:18:08 +00:00
Ulrich Germann
ecfc8d8b1a Logging in query_bias_server(). 2015-07-31 17:06:31 +01:00
Ulrich Germann
6ab2d4d3eb Removed ranking stuff prior to merge with master. 2015-07-28 20:15:14 +01:00
Ulrich Germann
451016891e No ranked sampling for the time being. 2015-07-28 19:35:27 +01:00
Ulrich Germann
d67723fd29 Merge branch 'master' of http://github.com/moses-smt/mosesdecoder into ranked-sampling
Conflicts:
	moses/TargetPhrase.cpp
	moses/TargetPhrase.h
2015-07-28 14:29:49 +01:00
Ulrich Germann
d1cb249a7f Removed building of cooccurrence table from mmlex-build. 2015-07-28 14:24:06 +01:00
Ulrich Germann
2faa9e6fe4 Multi-threaded sorting when building suffix array. 2015-07-28 14:23:23 +01:00
Ulrich Germann
70a1c88614 New dummy bias that always returns 1.
Purpose: to keep track of phrase counts per document. If no bias is given,
no per-document counts are stored.
2015-07-26 21:23:13 +01:00
Ulrich Germann
f26e2008ca work in progress 2015-07-26 21:21:19 +01:00
Ulrich Germann
a1652b4a97 Fix typo. 2015-07-26 21:20:40 +01:00
Ulrich Germann
8da4804631 Initial check-in. 2015-07-23 00:13:19 +01:00
Ulrich Germann
8e393c79ab Logging of priming time in ranked sampling. 2015-07-23 00:12:34 +01:00
Ulrich Germann
f1bde0af05 Map instead of vector for bias map in SamplingBias. 2015-07-23 00:12:00 +01:00
Ulrich Germann
09d0909e0f Trying to make sampling more efficient for large document collections underlying the sampling phrase table. 2015-07-23 00:10:52 +01:00
Ulrich Germann
053037816b Increased verbosity threshold for logging document map. 2015-07-23 00:08:33 +01:00
Ulrich Germann
5aaa8fcbfa 1. Fixed concurrency issue in context handling.
2. Added phrase table feature function PScoreLengthRatio.
2015-07-21 15:36:28 +01:00
Ulrich Germann
506e02bdec Added utility function len_from_pid(). 2015-07-21 15:32:47 +01:00
Ulrich Germann
80db5487bc Fixed typo in comment. 2015-07-21 15:31:46 +01:00
Philipp Koehn
b4b30cff7a fix some compile time warnings about unsigned / signed int 2015-07-20 11:45:23 -04:00
Ulrich Germann
ad4fdc59c2 Re-enabled actual caching (get always returned NULL). Moved point of locking in release() in an attempt to battle segfaults. 2015-07-18 19:02:47 +01:00
Ulrich Germann
e94007c7f4 Mmsapt can now handle factorized phrase tables with more than one factor. 2015-07-13 17:51:44 +01:00
Ulrich Germann
cc5f128944 Allow 'ranked' as alias for sampling method 'rank'. 2015-07-11 00:24:20 +01:00
Ulrich Germann
03e19dd915 Commented out m_rnd_denom. Not used. 2015-07-07 20:16:41 +01:00
Ulrich Germann
8bdbfe583f 1. Added initialization of pstats cache on ContextForQuery.
2. Code cleanup: removed obsolete code.
2015-07-07 00:12:56 +01:00
Ulrich Germann
47eb0bd41e Added seeding of random generator to produce the same results across repeated runs of the decoder. 2015-07-07 00:12:20 +01:00
Ulrich Germann
4dd2ea3117 Added random sampling to BitextSampler. 2015-07-05 13:08:57 +01:00
Ulrich Germann
e1f31666c3 Fixes to make things compile after merging with branch mmt-dev. 2015-07-03 17:20:27 +01:00
Ulrich Germann
98c03dc047 Merge branch 'mmt-dev' into ranked-sampling.
Conflicts:
	moses/TranslationModel/UG/mm/ug_bitext.h
	moses/TranslationModel/UG/mm/ug_bitext_jstats.cc
	moses/TranslationModel/UG/mm/ug_bitext_jstats.h
	moses/TranslationModel/UG/mm/ug_bitext_pstats.cc
	moses/TranslationModel/UG/mm/ug_sampling_bias.cc
	moses/TranslationModel/UG/mm/ug_sampling_bias.h
2015-07-03 15:30:33 +01:00
Ulrich Germann
9dae3eb785 Code cleanup. 2015-07-03 15:11:21 +01:00
Ulrich Germann
f78bb4a6e9 Bigger K-best list to accommodate phrase extraction failure. 2015-07-02 23:56:53 +01:00
Ulrich Germann
1c25b29ebb Show from which documents phrase translations were collected. 2015-07-02 23:55:14 +01:00
Ulrich Germann
64ec34df5d Proper indentation with spaces (no tabs). 2015-07-02 23:49:00 +01:00
Ulrich Germann
b05ca8cb80 Fixes to make code compile on various versions of gcc. 2015-07-02 18:06:55 +01:00
Ulrich Germann
e94921dc44 Removal of 'using namespace ...' from several header files. 2015-07-02 01:32:34 +01:00
Ulrich Germann
61067b4fa5 Merge branch 'FF_ttptr' of http://github.com/moses-smt/mosesdecoder 2015-07-01 13:25:26 +01:00
Ulrich Germann
41a11dfe8a Allow ports other than 80 as the server ports for the context bias server. 2015-06-25 18:37:28 +01:00
XapaJIaMnu
fcc9bb1e60 when using the suffix array PT, set the ttask in the targetPhrase 2015-06-25 16:52:14 +01:00
XapaJIaMnu
a3ecd9f2a7 Revert "Break everything by trying to add ttasksptr to TargetPhrase" and try an easier approach
This reverts commit afdc1b480e.
2015-06-25 15:47:39 +01:00
XapaJIaMnu
afdc1b480e Break everything by trying to add ttasksptr to TargetPhrase 2015-06-25 15:47:17 +01:00
Ulrich Germann
22cc22064c Changed implementation of indocs (to keep track of which documents phrases come from) from vector to map. 2015-06-25 15:43:13 +01:00
XapaJIaMnu
47a488767e Enable the bias weights to be (re)set by the server. 2015-06-25 13:12:33 +01:00
Ulrich Germann
78b2810cfe Allow context server to use ports other than 80. 2015-06-24 18:09:22 +01:00
XapaJIaMnu
5a0168a6fa forgot to negate a condition 2015-06-23 17:27:01 +01:00
XapaJIaMnu
e50926abf6 Enable the Suffix array to get context_weights from command line 2015-06-23 16:58:58 +01:00
MosesAdmin
e57ca5ec34 daily automatic beautifier 2015-06-22 00:00:43 +01:00
Marcin Junczys-Dowmunt
58f0187e8b Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-06-21 19:24:53 +02:00
Marcin Junczys-Dowmunt
6151003c13 Remove C++11 oddities 2015-06-21 19:24:43 +02:00
Hieu Hoang
0f943dd9c1 clang compile errors 2015-06-21 21:16:12 +04:00
Marcin Junczys-Dowmunt
1bd10e104c workaround/cleaning for weird copy-constructor behaviour with C++11 2015-06-21 18:27:56 +02:00
Ulrich Germann
65bd46df65 Added feature with cumulative bias. 2015-06-19 21:50:01 +01:00
Phil Williams
90470e878d Fix some C++11-related compilation errors (clang) 2015-06-19 15:58:14 +01:00
Ulrich Germann
a627fd3cc6 Bug fix: set_bias_for_ranking needs to lock. 2015-06-15 14:22:32 +01:00
Ulrich Germann
9d46c5efa1 Rearrangement of members to match initialization order. 2015-06-15 14:20:45 +01:00
Ulrich Germann
4582e43473 Merge branch 'master' into ranked-sampling 2015-06-08 15:45:04 +01:00
Ulrich Germann
bca0f651da Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-06-08 15:44:37 +01:00
Ulrich Germann
3a5acb56cc Added some logging messages. 2015-06-08 15:32:53 +01:00
Ulrich Germann
b2a3bd280e Allow intrusive pointers to const objects. 2015-06-08 14:10:48 +01:00
Ulrich Germann
2f125eddc3 Bug fix. Readability. 2015-06-08 14:08:35 +01:00
Ulrich Germann
5e2e63f678 Integration of ranked sampling. 2015-06-08 14:06:54 +01:00
Ulrich Germann
5dc9d68d2d Initial check-in. 2015-06-08 14:05:41 +01:00
Ulrich Germann
78b0aab65b Work in progress. 2015-06-08 14:04:19 +01:00
Ulrich Germann
36c3f9dda8 Work in progress. Bug fix (release pstats in destructor!). Various other changes. 2015-06-08 14:03:20 +01:00
Ulrich Germann
d34b107b91 Initial check-in. 2015-06-08 14:00:31 +01:00
Ulrich Germann
69f15d0c5a New member function wait that won't return until sampling is done. 2015-06-08 14:00:17 +01:00
Ulrich Germann
ac99ec519f Have SentenceBias keep track of document ids. 2015-06-08 13:53:39 +01:00
Ulrich Germann
ff97627e30 Update to emacs variables at top. 2015-06-08 13:52:34 +01:00
Ulrich Germann
4fcb9b98f7 Keeping track of cumulative bias scores. 2015-06-08 13:51:54 +01:00
Ulrich Germann
f1de677530 SentenceBias now has access to mapping from sentence IDs to document IDs. 2015-06-08 13:50:37 +01:00
Ulrich Germann
3c767fc333 New field to store cumulative bias scores. 2015-06-08 13:48:40 +01:00
Ulrich Germann
e8a4a9b10a New member function to expose mapping from sentence IDs to document ids. 2015-06-08 13:45:51 +01:00
Ulrich Germann
d5b0ec7562 Initial check-in. 2015-06-08 12:20:25 +01:00
Ulrich Germann
7a57ce4dc2 Missing #pragma once. 2015-06-06 13:22:04 +01:00
Ulrich Germann
56dee1d4ac Bug fixes: missing #include and const declaration of find_trg_phr-bounds(). 2015-06-06 13:21:33 +01:00
Ulrich Germann
d4234847cd Added #include. 2015-06-05 22:51:58 +01:00
Ulrich Germann
f87f123366 Added member function find_trg_phrase_bound(PhraseExtractionRecord& rec) to Bitext class. 2015-06-05 22:50:17 +01:00
Ulrich Germann
8ae2894107 Initial check-in. 2015-06-05 22:29:26 +01:00
Ulrich Germann
53752f70a7 Added member function find_trg_phr_bound(PhraseExtractionRecord& rec). 2015-06-05 22:28:02 +01:00
Ulrich Germann
c7fffab82c Bug fixes. 2015-06-05 22:27:10 +01:00
Ulrich Germann
8a547ea82f Added missing #include. 2015-06-05 22:25:49 +01:00
Ulrich Germann
704432cf0f Bug fixes. 2015-06-05 22:25:13 +01:00
Ulrich Germann
623eb7bb77 Instantiation of btfix via boost::intrusive_ptr in Mmsapt.
This is in preparation for distinct bitext samplers which need to
ensure the lifetime of the bitext while sampling.
2015-06-05 21:15:47 +01:00
Ulrich Germann
e8ee56876e Initial check-in. 2015-06-05 17:24:53 +01:00
Ulrich Germann
8f4b2afe26 #include a few more things. 2015-06-05 16:30:07 +01:00
Ulrich Germann
1b4b3a5103 Mmsapt: btfix now instantiated via intrusive pointer
... to prevent deletion while Mmsapt is live.
2015-06-05 16:27:49 +01:00
Ulrich Germann
47fa99b61b Added member function size() to LRU_Cache. 2015-06-05 16:26:47 +01:00
Ulrich Germann
243a6a8b3b Added #define for intrusive pointer. 2015-06-05 16:23:00 +01:00
Ulrich Germann
576c743aee Simplified #include. 2015-06-05 16:22:03 +01:00
Ulrich Germann
5cb1d95e09 Added member function for retrieving nbest list items without sorting. 2015-06-05 16:21:09 +01:00
Ulrich Germann
5a56a5b496 Added target for forced relinking only (no forced recompilation); temporarily disabled tcmalloc. 2015-06-05 16:20:08 +01:00
Ulrich Germann
83fa1b6a88 Initial check-in. 2015-06-03 12:59:32 +01:00
Ulrich Germann
0afe139810 Initial check-in. 2015-06-03 12:55:58 +01:00
Ulrich Germann
debdd21899 Optional initialization of SentenceBias. 2015-06-03 12:53:38 +01:00
Ulrich Germann
f024eede74 Added ca() as short replacement for approxOccurrenceCount() to tsa_tree_iterator. 2015-06-03 12:51:44 +01:00
Hieu Hoang
3ea5faead8 codelite 2015-06-02 21:44:58 +04:00
Jeroen Vermeulen
35cf55d4d2 Trailing spaces. 2015-06-02 15:03:18 +07:00
Ulrich Germann
d62d2dc95f Bug fix. 2015-06-01 23:10:50 +01:00
Ulrich Germann
aa4eed93d5 Bug fix related to getting rid of using namespace std; . 2015-06-01 18:55:40 +01:00
Ulrich Germann
cc800742b1 Updated Makefile for local compiles. 2015-06-01 18:26:27 +01:00
Ulrich Germann
99896cfd2c Untangling bitext class from Moses dependencies, so that the class can be used
independently of Moses again.
2015-06-01 18:25:04 +01:00
Ulrich Germann
349163f3fd Bug fix and in-line code documentation. 2015-06-01 18:21:52 +01:00
Ulrich Germann
25f98a446e Bug fix in building imTtrack directly from input stream. 2015-06-01 18:19:34 +01:00
Ulrich Germann
c82ee9a4e9 Bug fix. 2015-05-24 16:44:41 +01:00
Ulrich Germann
da052b7f2b Removed dependency on libcurlpp, as it was difficult to link it statically. 2015-05-24 16:05:14 +01:00
Ulrich Germann
dcb8e5d3e0 Preparation for allowing context-aware decoding. 2015-05-19 02:35:39 +01:00
Hieu Hoang
39139e7a64 beautify. 2015-05-15 18:09:38 +01:00
Marcin Junczys-Dowmunt
7652ab9118 quick fix for out-of-bound alignment points 2015-05-15 09:12:51 +02:00
Jeroen Vermeulen
0859e9a844 Remove trailing whitespace from C++ files. 2015-05-13 17:05:43 +07:00
Jeroen Vermeulen
1364a7d599 Fix typo in mmap call.
The case where !m_fixed passed m_map_size to mmap(), but the "else"
clause passed map_size.  In replacing mmap() with the portable wrapper,
I accidentally changed that to be m_map_size as well.

Besides fixing that, I'm changing the name of the variable to be more
clearly distinguishable from m_map_size.
2015-05-12 09:58:47 +07:00
Ulrich Germann
7da7ce52da Added context buffering in IOWrapper for context-sensitive decoding.
Unfortunately, this seems to slow things down quite a bit.
2015-05-11 00:34:24 +01:00
Ulrich Germann
db5ccff364 Tweaks to logging for biased sampling. 2015-05-11 00:33:21 +01:00
Ulrich Germann
1778238d73 Logging of latency of bias lookup via server. 2015-05-11 00:32:20 +01:00
Ulrich Germann
8a174beb44 Additional check for document map if document bias is requested. 2015-05-11 00:30:32 +01:00
Nicola Bertoldi
90a982e579 merge remote into local 2015-05-04 09:42:44 +02:00
Nicola Bertoldi
c4f04670c2 made ProbingPT constructor compliant with PhraseDictionary signature 2015-05-04 09:25:50 +02:00
Hieu Hoang
cc8c6b7b10 beautify 2015-05-02 11:45:24 +01:00
Jeroen Vermeulen
eca5824100 Remove trailing whitespace in C++ files. 2015-04-30 12:05:11 +07:00
Ulrich Germann
324b1a9b56 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-29 20:20:54 +01:00
Ulrich Germann
e4f5c69109 One step closer to eliminating the requirement to provide num-features=... in the config file.
Some FFs (Mmsapt, LexicalReordering, many single-value FFs) provide this number during "registration";
when missing, a default weight vector of uniform 1.0 is automatically generated. This eliminates the
need for the user to figure out what the exact number of features is for each FF, which can get complicated,
e.g. in the case of Mmsapt/PhraseDictionaryBitextSampling.
2015-04-29 20:16:52 +01:00
Ulrich Germann
c76f1c338d Uninitialized variable. 2015-04-29 20:16:43 +01:00
Jeroen Vermeulen
616b589da3 Fix a bunch of compiler warnings.
Warnings are useful, but only if there are few!
2015-04-29 21:18:51 +07:00
Ulrich Germann
315610c02a Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-27 16:39:40 +01:00
Ulrich Germann
37bb1de9ed Unused variable. 2015-04-27 16:30:59 +01:00
Ulrich Germann
fbf8b1f8b8 Code design debizarrification: Indexes of feature functions into the dense vector of all feature
values are now stored on the feature function instead of in a global map that is a static
member of ScoreComponentCollection.
2015-04-26 16:46:36 +01:00
Ulrich Germann
e63561ae7f Unused variable. 2015-04-26 15:41:32 +01:00
Hieu Hoang
41529227b2 boost unique lock 2015-04-26 18:11:11 +04:00
Ulrich Germann
bafe60c3a1 Make sure things work when curl-based biasing is disabled. 2015-04-26 03:14:40 +01:00
Ulrich Germann
0d72cdd72c Merge branch 'master' of https://github.com/moses-smt/mosesdecoder into mmt-dev
Conflicts:
	moses/Syntax/F2S/Manager-inl.h
	moses/TranslationModel/UG/mmsapt.cpp
2015-04-26 02:12:16 +01:00
Jeroen Vermeulen
8ac91c8d97 Fix unqualified call to rand_excl().
The call needed to be made explicitly to util::rand_excl().  Sorry.
2015-04-24 00:22:25 +07:00
Jeroen Vermeulen
38d790cac0 Add cross-platform randomizer module.
The code uses two mechanisms for generating random numbers: srand()/rand(),
which is not thread-safe, and srandom()/random(), which is POSIX-specific.

Here I add a util/random.cc module that centralizes these calls, and unifies
some common usage patterns.  If the implementation is not good enough, we can
now change it in a single place.

To keep things simple, this uses the portable srand()/rand() but protects them
with a lock to avoid concurrency problems.

The hard part was to keep the regression tests passing: they rely on fixed
sequences of random numbers, so a small code change could break them very
thoroughly.  util::rand(), for wide types like size_t, calls std::rand() not
once but twice.  This behaviour was generalized into util::wide_rand() and
friends.
2015-04-23 23:46:04 +07:00
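The locking scheme described in commit 38d790cac0 can be illustrated with a minimal sketch. The code below is not the contents of util/random.cc: the namespace `sketch`, the function names, the use of `std::mutex` (a C++11 facility) and the way the two `std::rand()` draws are combined are all assumptions made for illustration. It only shows the general idea of guarding `std::srand()`/`std::rand()` with one lock and drawing twice for wide types.

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>

namespace sketch {  // hypothetical namespace, not the real util:: module

// One process-wide lock guarding the non-thread-safe std::srand()/std::rand().
inline std::mutex& rand_lock() {
  static std::mutex m;
  return m;
}

// Seed the generator while holding the lock.
inline void rand_init(unsigned int seed) {
  std::lock_guard<std::mutex> guard(rand_lock());
  std::srand(seed);
}

// Single draw in [0, RAND_MAX], serialized across threads.
inline int rand_int() {
  std::lock_guard<std::mutex> guard(rand_lock());
  return std::rand();
}

// "Wide" draw: two std::rand() calls under a single lock acquisition, so no
// other thread can interleave a call between them. How the two values are
// combined here is an assumption.
inline unsigned long long wide_rand() {
  std::lock_guard<std::mutex> guard(rand_lock());
  const unsigned long long hi = static_cast<unsigned long long>(std::rand());
  const unsigned long long lo = static_cast<unsigned long long>(std::rand());
  return hi * (static_cast<unsigned long long>(RAND_MAX) + 1ULL) + lo;
}

// Draw from [0, top), i.e. with an exclusive upper bound.
inline unsigned long long wide_rand_excl(unsigned long long top) {
  assert(top > 0);
  return wide_rand() % top;
}

}  // namespace sketch
```

Because every draw goes through the same lock and the same generator, reseeding with a fixed value reproduces the same sequence, which is the property the regression tests mentioned in the commit message depend on.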
Jeroen Vermeulen
02d1d9a4af Don't work around missing popen() in MinGW.
Windows does not have popen()/pclose(), so FileHandler.cpp #define's them to
_popen()/_pclose().  But MinGW has similar macros built into <cstdio>, leading
to warnings.  So skip the workaround on MinGW.
2015-04-22 11:24:32 +07:00
Jeroen Vermeulen
32722ab5b1 Support tokenize(const std::string &) as well.
Convenience wrapper: the actual function takes a const char[], but many of
the call sites want to pass a string and have to call its c_str() first.
2015-04-22 10:35:18 +07:00
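The convenience wrapper from commit 32722ab5b1 might look roughly like the sketch below. The commit message only states that the underlying function takes a `const char[]`; the return type, the whitespace-splitting behaviour and the namespace `sketch` are assumptions made for illustration and may differ from the real util code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace, not the real util:: module

// Stand-in for the "actual" function: split a C string on spaces and tabs.
inline std::vector<std::string> tokenize(const char input[]) {
  assert(input != NULL);
  std::vector<std::string> tokens;
  const char* p = input;
  while (*p != '\0') {
    // Skip leading separators.
    while (*p == ' ' || *p == '\t') ++p;
    const char* start = p;
    // Advance to the end of the current token.
    while (*p != '\0' && *p != ' ' && *p != '\t') ++p;
    if (p > start) tokens.push_back(std::string(start, p));
  }
  return tokens;
}

// Convenience overload: callers holding a std::string no longer need to
// call c_str() themselves.
inline std::vector<std::string> tokenize(const std::string& input) {
  return tokenize(input.c_str());
}

}  // namespace sketch
```

A call site can then write `const std::vector<std::string> toks = sketch::tokenize(line);` directly on a `std::string`, with the result declared const as the neighbouring commit b2d821a141 suggests.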
Jeroen Vermeulen
b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00
Ulrich Germann
2c0851099b Work on integrating hierarchical lexicalized reordering models with sampled phrase tables. 2015-04-21 17:48:48 +01:00
Ulrich Germann
0d13edae24 Added entry for bitext-find. 2015-04-21 17:47:39 +01:00
Ulrich Germann
9a9e43ea2c Initial check-in: search utility for bi-concordancing. 2015-04-21 17:47:09 +01:00
Ulrich Germann
e7246686bf New constructor. 2015-04-21 17:46:12 +01:00
Ulrich Germann
1791f47bfb mmBitext now maintains a vector of document names. 2015-04-21 17:43:51 +01:00
Ulrich Germann
8a921f5dc9 Initial check-in. 2015-04-21 17:41:33 +01:00
Ulrich Germann
adc80953e4 Minor edits for better readability. 2015-04-21 17:40:31 +01:00
Ulrich Germann
70f83e5be9 Additions for writing out alignments in yawat format (for kwipc). 2015-04-21 17:39:06 +01:00
Ulrich Germann
f98de4dc83 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-18 18:04:20 +01:00
Ulrich Germann
28d9e55379 Bug fix. 2015-04-18 16:53:57 +01:00
Ulrich Germann
e028eb7847 A single output factor in Mmsapt can now be specified externally. (Before: hard-coded to 0.) 2015-04-17 23:12:32 +01:00
Jeroen Vermeulen
d56f317f2e New helper classes: temp_dir & temp_file.
I'm adding these because boost::filesystem::unique_path introduces
encoding issues: on Windows the path is in wchar_t, breaking use of
those strings in various places!  Encoding the strings is just too
much work.

It's still possible that the current temp_file implementation won't
build on Windows (it uses POSIX mkstemp() and close()) but that can
be fixed underneath the API.
2015-04-17 22:57:55 +07:00
Jeroen Vermeulen
1e3e445e3f Use cross-platform mmap() wrapper in CompactPT.
The MmapAllocator header made use of sys/mman.h and mmap(), which are
Unix-specific.  But util has a wrapper which also works on Windows.

This also fixes the error handling: when mmap() failed, the old code would
return an invalid (but non-NULL!) pointer — leading to a crash.  The wrapper
will throw an exception with a helpful error message.
2015-04-17 18:53:46 +07:00
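The error-handling problem fixed in commit 1e3e445e3f comes from the fact that POSIX mmap() reports failure by returning MAP_FAILED, which is not a null pointer, so a plain NULL check lets the bad pointer through. The sketch below shows a throwing wrapper in that spirit. It is POSIX-only and is not the wrapper from util; the name `map_or_throw` and the namespace `sketch` are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <sys/mman.h>

namespace sketch {  // hypothetical namespace, not the real util:: module

// Map a region, or throw with a readable message instead of returning
// MAP_FAILED (which is (void*)-1 and therefore passes a "!= NULL" check).
inline void* map_or_throw(std::size_t size, int prot, int flags,
                          int fd, off_t offset) {
  void* p = mmap(NULL, size, prot, flags, fd, offset);
  if (p == MAP_FAILED) {
    throw std::runtime_error(std::string("mmap failed: ") +
                             std::strerror(errno));
  }
  // Sanity check: without MAP_FIXED a successful mapping is not expected
  // to start at address 0.
  assert(p != NULL);
  return p;
}

}  // namespace sketch
```

With such a wrapper the caller either gets a usable mapping or an exception carrying the errno text, rather than a pointer that crashes on first use.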
Jeroen Vermeulen
464615a0c3 Fix some clang++ warnings.
Compiling with clang++ at the default warning/error levels produces
some interesting warnings.  Here's a pair of fixes for the simplest
instances:

moses/TranslationModel/RuleTable/PhraseDictionaryFuzzyMatch.cpp:133:7:
warning: comparison of array 'path' equal to a null pointer is always
false [-Wtautological-pointer-compare]
  if (path == NULL) {
      ^~~~    ~~~~

(The code unnecessarily checks that an automatic variable has a
 non-null address).

moses/TranslationModel/DynSAInclude/onlineRLM.h:305:20:
warning: unsequenced modification and access to 'den_val' [-Wunsequenced]
  if(((den_val = query(&ngram[len - num_fnd], num_fnd - 1)) > 0) &&
               ^

(The code tries to cram too much into an "if" condition.)
2015-04-07 22:58:17 +07:00
Ulrich Germann
e110e7df6b Bug fix. 2015-04-05 16:18:09 +01:00
Ulrich Germann
3e2f878576 Merge branch 'master' into mmt-dev
Conflicts:
	Jamroot
	moses/TranslationModel/UG/mmsapt.h
2015-04-05 15:51:50 +01:00
Ulrich Germann
46e31a285c - Code refactoring for Bitext class.
- Bug fixes and conceptual improvements in biased sampling. The sampling now
  tries to stick to the bias, even when an unsuitable corpus dominates
  the occurrences.
2015-04-05 14:29:00 +01:00
Ulrich Germann
05c4e382ff Better logging during biased sampling in Mmsapt. 2015-04-03 21:12:44 +01:00
Ulrich Germann
b6c887b370 Minor bug fix in logging biased sampling for phrase lookup. 2015-04-03 20:18:55 +01:00
Ulrich Germann
93ce2423df 1. A context string for biased sampling in Mmsapt can now be provided on the
command line with --context-string. Not available in server mode yet.
2. Numerous bug fixes related to biased sampling.
3. Biased sampling now checks that the sampling sticks to the bias. If
   the distribution of samples deviates too much from the bias, samples
   whose selection would push the sample distribution even further from the bias
   are not considered, even if that means that fewer samples are chosen in total.
2015-04-03 16:16:52 +01:00
Jeroen Vermeulen
ebc0930500 Replace use of tmpnam with boost::filesystem.
Silences a few annoying warnings from gcc: "tmpnam is dangerous" (and
the suggestion to use mkstemp instead).
2015-04-02 10:42:06 +07:00
XapaJIaMnu
29a729c99b Remove old obsolete probingPT tests 2015-04-01 16:58:21 +01:00
Ulrich Germann
a9dbced81d Bug fix. 2015-03-30 02:56:49 +01:00
Ulrich Germann
fcbfc5a535 Feature functions and the constructors of TranslationOptionCollections
now have access to the current translation task.

This was done to allow context-sensitive processing (if provided by the FF).
2015-03-30 01:20:17 +01:00
Ulrich Germann
79cd40d2c4 Disabled temporarily. Needs to be adapted to API changes in Mmsapt. 2015-03-29 23:58:17 +01:00
Ulrich Germann
2899645992 Cleanup. 2015-03-29 23:57:14 +01:00
Ulrich Germann
3541838a46 Included TargetPhraseCollectionCache.* in fakelib mmsapt. 2015-03-29 23:55:47 +01:00
Ulrich Germann
1525f1ea62 Cleanup. 2015-03-29 23:44:06 +01:00
Ulrich Germann
529a766da7 Initial check-in. 2015-03-29 23:43:50 +01:00
Jeroen Vermeulen
b124d99330 Use boost::filesystem for "rm -rf".
Replaces a system() call (which was a portability problem) and fixes,
en passant, a warning about its return value being ignored.
2015-03-29 18:33:58 +07:00
Jeroen Vermeulen
789a2e2bc3 Fix some compile warnings (gcc 4.9.2).
Mostly signed/unsigned comparisons and reordered member
initializations; also a few unused variables.

There are more, but if I chip away at them for a while, who knows, it
may catch on and warnings may eventually become socially stigmatizing.
:)
2015-03-29 18:10:51 +07:00
Ulrich Germann
1b23edf62f Cache for the N most recently used TargetPhraseCollections. Refactored out of mmsapt.h. 2015-03-28 14:41:08 +00:00
Jeroen Vermeulen
a9c8f44896 Modernize "C" includes in moses.
This is one of those little chores in managing a long-lived C++
project: standard C headers like stdio.h and math.h now have their own
place in the C++ standard as resp. cstdio, cmath, and so on.  In this
branch the #include names are updated for the moses/ subdirectory; more
branches to follow.

C++11 adds cstdint, but to support compilation with the previous
standard, that change is left for later.
2015-03-28 20:09:03 +07:00
Hieu Hoang
1064aaacbe delete typedefs for UINT32 and UINT64. MSVC now has uint32_t and uint64_t /Ken 2015-03-25 00:55:39 +00:00
Ulrich Germann
8ca11d941d 1. Lifetime of tasks in ThreadPool is now managed via shared pointers.
2. Code cleanup in IOWrapper and a bit elsewhere.
2015-03-21 16:12:52 +00:00
Ulrich Germann
ee4e396a4d Removed pointer to TranslationTask in InputTypes again. Not the right place to store this information. 2015-03-21 15:29:37 +00:00
Ulrich Germann
dcffbb5f4d Made LRModel::ReorderingType an enumerated type. 2015-03-16 00:24:11 +00:00
Ulrich Germann
085c88cc7b Eliminated sources of some compiler warnings (unused variables; signed/unsigned comparisons). 2015-03-15 22:45:01 +00:00
Ulrich Germann
ad805c133b Instances of InputType (and derived classes) now know which TranslationTask (if any) created them.
This is a first step towards providing phrase tables etc. access to context information etc.
associated with specific translation tasks.
2015-03-15 20:38:31 +00:00
Ulrich Germann
2a66a55c85 Added document map (maps from sentences to document ids) to Bitext class.
Minor overhaul to the bias regime, which allows specifying the bias by document
name (as provided in the document map) rather than by sentence in the static
parallel corpus.
2015-03-15 13:32:09 +00:00
Ulrich Germann
51824355f9 Sampling now keeps track of counts for hierarchical lexicalized reordering. 2015-03-10 10:41:41 +00:00
Ulrich Germann
524376fad4 Code cleanup. 2015-03-09 00:34:47 +00:00
Hieu Hoang
32de075022 beautify 2015-02-19 12:27:23 +00:00
Ulrich Germann
ccf44f39fb Code cleanup and reorganization. A few classes have been renamed to shorter names. 2015-02-15 01:45:22 +00:00
Hieu Hoang
755bd609f5 Using boost for prefix/suffix checks /Jeroen Vermeulen 2015-02-06 15:52:25 +00:00