mosesdecoder

mirror of https://github.com/moses-smt/mosesdecoder.git synced 2024-09-19 07:07:24 +03:00

Author	SHA1	Message	Date
Ulrich Germann	315610c02a	Merge branch 'master' of https://github.com/moses-smt/mosesdecoder	2015-04-27 16:39:40 +01:00
Ulrich Germann	ba6e17ce26	Code reorganization.	2015-04-27 16:31:22 +01:00
Ulrich Germann	37bb1de9ed	Unused variable.	2015-04-27 16:30:59 +01:00
Ulrich Germann	fbf8b1f8b8	Code design debizarrification: Indexes of feature functions into the dense vector of all feature values are now stored on the feature function instead of in a global map that is a static member of ScoreComponentCollection.	2015-04-26 16:46:36 +01:00
Ulrich Germann	e63561ae7f	Unused variable.	2015-04-26 15:41:32 +01:00
Hieu Hoang	41529227b2	boost unique lock	2015-04-26 18:11:11 +04:00
Ulrich Germann	bafe60c3a1	Make sure things work when curl-based biasing is disabled.	2015-04-26 03:14:40 +01:00
Ulrich Germann	0d72cdd72c	Merge branch 'master' of https://github.com/moses-smt/mosesdecoder into mmt-dev Conflicts: moses/Syntax/F2S/Manager-inl.h moses/TranslationModel/UG/mmsapt.cpp	2015-04-26 02:12:16 +01:00
Matthias Huck	55a4789a8b	cleanup	2015-04-23 18:38:01 +01:00
Matthias Huck	3920e22c98	Merge branch 'master' of https://github.com/moses-smt/mosesdecoder	2015-04-23 18:25:13 +01:00
Jeroen Vermeulen	8ac91c8d97	Fix unqualified call to rand_excl(). The call needed to be made explicitly to util::rand_excl(). Sorry.	2015-04-24 00:22:25 +07:00
Matthias Huck	bbcc8bf23b	Merge branch 'master' of https://github.com/moses-smt/mosesdecoder	2015-04-23 18:14:40 +01:00
Matthias Huck	f24f31f965	n-best list creation in phrase-based decoding: improved efficiency with sparse features	2015-04-23 18:13:02 +01:00
Jeroen Vermeulen	38d790cac0	Add cross-platform randomizer module. The code uses two mechanisms for generating random numbers: srand()/rand(), which is not thread-safe, and srandom()/random(), which is POSIX-specific. Here I add a util/random.cc module that centralizes these calls, and unifies some common usage patterns. If the implementation is not good enough, we can now change it in a single place. To keep things simple, this uses the portable srand()/rand() but protects them with a lock to avoid concurrency problems. The hard part was to keep the regression tests passing: they rely on fixed sequences of random numbers, so a small code change could break them very thoroughly. Util::rand(), for wide types like size_t, calls std::rand() not once but twice. This behaviour was generalized into utils::wide_rand() and friends.	2015-04-23 23:46:04 +07:00
Matthias Huck	7457099f51	SparseReordering: option to use pre-tuned feature weights internally	2015-04-23 17:25:02 +01:00
Jeroen Vermeulen	02d1d9a4af	Don't work around missing popen() in MinGW. Windows does not have popen()/pclose(), so FileHandler.cpp #define's them to _popen()/_pclose(). But MinGW has similar macros built into <cstdio>, leading to warnings. So skip the workaround on MinGW.	2015-04-22 11:24:32 +07:00
Jeroen Vermeulen	32722ab5b1	Support tokenize(const std::string &) as well. Convenience wrapper: the actual function takes a const char[], but many of the call sites want to pass a string and have to call its c_str() first.	2015-04-22 10:35:18 +07:00
Jeroen Vermeulen	b2d821a141	Unify tokenize() into util, and unit-test it. The duplicate definition works fine in environments where the inline definition becomes a weak symbol in the object file, but if it gets generated as a regular definition, the duplicate definition causes link problems. In most call sites the return value could easily be made const, which gives both the reader and the compiler a bit more certainty about the code's intentions. In theory this may help performance, but it's mainly for clarity. The comments are based on reverse-engineering, and the unit tests are based on the comments. It's possible that some of what's in there is not essential, in which case, don't feel bad about changing it! I left a third identical definition in place, though I updated it with my changes to avoid creeping divergence, and noted the duplication in a comment. It would be nice to get rid of this definition as well, but it'd introduce headers from the main Moses tree into biconcor, which may be against policy.	2015-04-22 09:59:05 +07:00
Ulrich Germann	7603ec95f7	Recognize lexicalized reordering scores on TranslationOption instances provided e.g. by phrase tables.	2015-04-21 17:54:40 +01:00
Ulrich Germann	b60d1427f9	Minor code reformatting.	2015-04-21 17:52:51 +01:00
Ulrich Germann	73b5561fe3	Added member function GetTranslationTask().	2015-04-21 17:51:19 +01:00
Ulrich Germann	48b3b88c4a	Minor edits for code readability.	2015-04-21 17:50:34 +01:00
Ulrich Germann	2c0851099b	Work on integrating hierarchical lexicalized reordering models with sampled phrase tables.	2015-04-21 17:48:48 +01:00
Ulrich Germann	0d13edae24	Added entry for bitext-find.	2015-04-21 17:47:39 +01:00
Ulrich Germann	9a9e43ea2c	Initial check-in: search utility for bi-concordancing.	2015-04-21 17:47:09 +01:00
Ulrich Germann	e7246686bf	New constructor.	2015-04-21 17:46:12 +01:00
Ulrich Germann	1791f47bfb	mmBitext now maintains a vector of document names.	2015-04-21 17:43:51 +01:00
Ulrich Germann	8a921f5dc9	Initial check-in.	2015-04-21 17:41:33 +01:00
Ulrich Germann	adc80953e4	Minor edits for better readability.	2015-04-21 17:40:31 +01:00
Ulrich Germann	70f83e5be9	Additions for writing out alignments in yawat format (for kwipc).	2015-04-21 17:39:06 +01:00
Ulrich Germann	f98de4dc83	Merge branch 'master' of https://github.com/moses-smt/mosesdecoder	2015-04-18 18:04:20 +01:00
Ulrich Germann	28d9e55379	Bug fix.	2015-04-18 16:53:57 +01:00
Ulrich Germann	e028eb7847	A single output factor in Mmsapt can now be specified externally. (Before: hard-coded to 0.)	2015-04-17 23:12:32 +01:00
Jeroen Vermeulen	4647986a12	Include winsock2.h on Windows. This makes socket code build successfully on Windows.	2015-04-18 00:59:40 +07:00
Jeroen Vermeulen	d56f317f2e	New helper classes: temp_dir & temp_file. I'm adding these because boost::filesystem::unique_path introduces encoding issues: on Windows the path is in wchar_t, breaking use of those strings in various places! Encoding the strings is just too much work. It's still possible that the current temp_file implementation won't build on Windows (it uses POSIX mkstemp() and close()) but that can be fixed underneath the API.	2015-04-17 22:57:55 +07:00
Barry Haddow	c5e9a58ae2	Hypergraph output shouldn't crash when nbest list in current directory	2015-04-17 16:17:12 +01:00
Jeroen Vermeulen	70b5fee1ed	Merge branch 'wrap-mmap' Replace use of mmap() with the cross-platform wrapper in util.	2015-04-17 18:58:18 +07:00
Jeroen Vermeulen	1e3e445e3f	Use cross-platform mmap() wrapper in CompactPT. The MmapAllocator header made use of sys/mman.h and mmap(), which are Unix-specific. But util has a wrapper which also works on Windows. This also fixes the error handling: when mmap() failed, the old code would return an invalid (but non-NULL!) pointer — leading to a crash. The wrapper will throw an exception with a helpful error message.	2015-04-17 18:53:46 +07:00
Ales Tamchyna	2180730a58	check word alignment points in VW training to avoid cryptic segfaults	2015-04-16 15:28:23 +02:00
Jeroen Vermeulen	6a4943ca41	Replace deprecated bcopy() with memcpy(). The bcopy() function is POSIX-specific and deprecated. The recommended replacement (at least for non-overlapping source and destination ranges) is memcpy(), which is in the standard C library. Note that the source and destination parameters are in a different order between these two functions.	2015-04-16 19:19:34 +07:00
Jeroen Vermeulen	21a93421dc	Replace deprecated bzero() with memset(). The bzero() function is POSIX-specific and deprecated. The recommended replacement is memset(), which is in the standard C library.	2015-04-16 19:03:57 +07:00
Phil Williams	05b31b53f2	Implement -output-unknowns for search algorithms 7 and 9 (T2S/F2S)	2015-04-13 16:31:58 +01:00
Kenneth Heafield	0698da8b0f	log(1 + ...) -> log1p(...)	2015-04-08 10:08:05 -04:00
Jeroen Vermeulen	464615a0c3	Fix some clang++ warnings. Compiling with clang++ at the default warning/error levels produces some interesting warnings. Here's a pair of fixes for the simplest instances: moses/TranslationModel/RuleTable/PhraseDictionaryFuzzyMatch.cpp:133:7: warning: comparison of array 'path' equal to a null pointer is always false [-Wtautological-pointer-compare] if (path == NULL) { ^~~~ ~~~~ (The code unnecessarily checks that an automatic variable has a non-null address). moses/TranslationModel/DynSAInclude/onlineRLM.h:305:20: warning: unsequenced modification and access to 'den_val' [-Wunsequenced] if(((den_val = query(&ngram[len - num_fnd], num_fnd - 1)) > 0) && ^ (The code tries to cram too much into an "if" condition.)	2015-04-07 22:58:17 +07:00
Ulrich Germann	e110e7df6b	Bug fix.	2015-04-05 16:18:09 +01:00
Ulrich Germann	3e2f878576	Merge branch 'master' into mmt-dev Conflicts: Jamroot moses/TranslationModel/UG/mmsapt.h	2015-04-05 15:51:50 +01:00
Ulrich Germann	46e31a285c	- Code refactoring for Bitext class. - Bug fixes and conceptual improvements in biased sampling. The sampling now tries to stick to the bias, even when an unsuitable corpus dominates the occurrences.	2015-04-05 14:29:00 +01:00
Michael Denkowski	66cfd14159	Merge branch 'master' of https://github.com/moses-smt/mosesdecoder	2015-04-03 16:50:19 -04:00
Michael Denkowski	fdf4f5f571	Consistent line ending behavior for alignment printing options	2015-04-03 16:49:41 -04:00
Ulrich Germann	05c4e382ff	Better logging during biased sampling in Mmsapt.	2015-04-03 21:12:44 +01:00
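
Commits 6a4943ca41 and 21a93421dc above replace the deprecated POSIX calls bcopy() and bzero() with memcpy() and memset(). The sketch below only illustrates that substitution and the swapped argument order noted in the bcopy() commit. The helper names copy_bytes, zero_bytes and demo are hypothetical and do not come from the Moses tree, and the real commits may well have edited each call site directly rather than adding wrappers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the Moses tree): keeps the old
// bcopy(src, dst, n) argument order at the call site, but forwards to
// memcpy(), which takes (dst, src, n). The two ranges must not overlap.
inline void copy_bytes(const void *src, void *dst, std::size_t n) {
  std::memcpy(dst, src, n);
}

// Hypothetical helper: stand-in for bzero(buf, n), using memset().
inline void zero_bytes(void *buf, std::size_t n) {
  std::memset(buf, 0, n);
}

// Exercises both helpers; returns true when the copy and the zeroing
// both behave as expected.
inline bool demo() {
  const char src[] = "moses";
  char dst[sizeof src];

  copy_bytes(src, dst, sizeof src);            // copies the trailing '\0' too
  const bool copied = std::strcmp(dst, "moses") == 0;

  zero_bytes(dst, sizeof dst);
  bool zeroed = true;
  for (std::size_t i = 0; i < sizeof dst; ++i) {
    zeroed = zeroed && dst[i] == 0;
  }
  return copied && zeroed;
}
```

If source and destination can overlap, memmove() is the standard function to use instead of memcpy(), since bcopy() tolerated overlapping ranges.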

4563 Commits