Commit Graph

14011 Commits

Author SHA1 Message Date
Ulrich Germann
8a921f5dc9 Initial check-in. 2015-04-21 17:41:33 +01:00
Ulrich Germann
adc80953e4 Minor edits for better readability. 2015-04-21 17:40:31 +01:00
Ulrich Germann
70f83e5be9 Additions for writing out alignments in yawat format (for kwipc). 2015-04-21 17:39:06 +01:00
Hieu Hoang
c15f3ef068 duplicated functionality with ems/support/lmplz-wrapper.perl 2015-04-21 17:54:34 +04:00
Hieu Hoang
ab01d30687 make sure GetOptions doesn't consume -T by confusing it with --text 2015-04-21 17:53:46 +04:00
Rico Sennrich
15d3c3f259 be more tolerant about xml input 2015-04-21 14:04:25 +01:00
Rico Sennrich
5a3d5b6bdd EMS: LM:mock-parse can be actual parser 2015-04-21 10:21:24 +01:00
Hieu Hoang
95435f2a2e better detection of pigz 2015-04-20 20:40:50 +04:00
Hieu Hoang
eb37437d09 don't output warnings. It wasn't originally there before the 'env perl' change. This script should be tightened up at some point, eg use strict, debug warning messages 2015-04-20 16:18:51 +04:00
Hieu Hoang
1b9dc6cfae more butinah tweaks 2015-04-19 11:50:50 +04:00
Hieu Hoang
637e8a17e8 add pre-tokenization cleaning script. In case training data has bad, overly long lines which blow up some taggers/segmenters, e.g. mada 2015-04-19 11:21:07 +04:00
Ulrich Germann
f98de4dc83 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-18 18:04:20 +01:00
Ulrich Germann
28d9e55379 Bug fix. 2015-04-18 16:53:57 +01:00
Ulrich Germann
e028eb7847 A single output factor in Mmsapt can now be specified externally. (Before: hard-coded to 0.) 2015-04-17 23:12:32 +01:00
Jeroen Vermeulen
4647986a12 Include winsock2.h on Windows.
This makes socket code build successfully on Windows.
2015-04-18 00:59:40 +07:00
Jeroen Vermeulen
abfdb61bc9 Cross-platform tempfile implementation.
This makes temp_file and temp_dir work both on POSIX-like platforms and on
Windows.

It also fixes a bug where the temporary files/directories were created in
the current working directory, instead of in the system's standard
location for temporary files.  Unfortunately the Windows and POSIX code
diverge quite a bit on that point.
2015-04-18 00:21:18 +07:00
Jeroen Vermeulen
d56f317f2e New helper classes: temp_dir & temp_file.
I'm adding these because boost::filesystem::unique_path introduces
encoding issues: on Windows the path is in wchar_t, breaking use of
those strings in various places!  Encoding the strings is just too
much work.

It's still possible that the current temp_file implementation won't
build on Windows (it uses POSIX mkstemp() and close()) but that can
be fixed underneath the API.
2015-04-17 22:57:55 +07:00
Barry Haddow
c5e9a58ae2 Hypergraph output shouldn't crash when nbest list is in current directory 2015-04-17 16:17:12 +01:00
Jeroen Vermeulen
70b5fee1ed Merge branch 'wrap-mmap'
Replace use of mmap() with the cross-platform wrapper in util.
2015-04-17 18:58:18 +07:00
Jeroen Vermeulen
1e3e445e3f Use cross-platform mmap() wrapper in CompactPT.
The MmapAllocator header made use of sys/mman.h and mmap(), which are
Unix-specific.  But util has a wrapper which also works on Windows.

This also fixes the error handling: when mmap() failed, the old code would
return an invalid (but non-NULL!) pointer — leading to a crash.  The wrapper
will throw an exception with a helpful error message.
2015-04-17 18:53:46 +07:00
Barry Haddow
e45c41e665 Testing of Viterbi decoding on hypergraph. 2015-04-17 12:29:41 +01:00
Ales Tamchyna
2180730a58 check word alignment points in VW training to avoid cryptic segfaults 2015-04-16 15:28:23 +02:00
Jeroen Vermeulen
6a4943ca41 Replace deprecated bcopy() with memcpy().
The bcopy() function is POSIX-specific and deprecated.  The recommended
replacement (at least for non-overlapping source and destination ranges)
is memcpy(), which is in the standard C library.

Note that the source and destination parameters are in a different order
between these two functions.
2015-04-16 19:19:34 +07:00
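The argument-order difference noted in the commit above is easy to get wrong, so here is a minimal sketch of the translation. The helper name `copy_block` is hypothetical and not taken from the Moses sources; the only library call used is the standard `std::memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper illustrating the bcopy() -> memcpy() translation.
// Old call:  bcopy(src, dst, n);   // source first, destination second
// New call:  memcpy(dst, src, n);  // destination first, source second
// The ranges must not overlap; overlapping copies need memmove() instead.
void copy_block(void* dst, const void* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, n);
}
```

Keeping the destination as the first parameter of the helper mirrors `memcpy()`, so a swapped pair of arguments is easier to spot in review.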
Jeroen Vermeulen
21a93421dc Replace deprecated bzero() with memset().
The bzero() function is POSIX-specific and deprecated.  The recommended
replacement is memset(), which is in the standard C library.
2015-04-16 19:03:57 +07:00
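A minimal sketch of the replacement described in the commit above, assuming a plain byte buffer. The helper name `clear_block` is hypothetical; note that `memset()` takes the fill value as an extra middle argument that `bzero()` did not have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper illustrating the bzero() -> memset() translation.
// Old call:  bzero(buf, n);
// New call:  memset(buf, 0, n);  // fill value comes before the length
void clear_block(void* buf, std::size_t n) {
    assert(buf != nullptr);
    std::memset(buf, 0, n);
}
```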
Hieu Hoang
e279135d78 create file stream and delete it at the end if user has specified a file for input & output 2015-04-16 00:50:54 +04:00
Hieu Hoang
54c16ee86e Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-15 18:54:26 +04:00
Hieu Hoang
c544dba7ad don't use /dev/stdin and /dev/stdout. Compatibility issues with some Redhat 2015-04-15 18:53:13 +04:00
Hieu Hoang
044968bb4b Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-14 11:30:33 +04:00
Hieu Hoang
7af653ac80 misc script to parallelize madamira on grid engine 2015-04-14 11:29:56 +04:00
Hieu Hoang
6162223690 add use warnings to all perl scripts 2015-04-13 20:42:33 +04:00
Phil Williams
05b31b53f2 Implement -output-unknowns for search algorithms 7 and 9 (T2S/F2S) 2015-04-13 16:31:58 +01:00
Hieu Hoang
2f7c328db9 codelite 2015-04-11 20:21:50 +04:00
Hieu Hoang
8190a5e1d6 Merge pull request #107 from flammie/master
Finnish detokenisation
2015-04-11 14:20:37 +04:00
Dingyuan Wang
4aba64ed53 Merge pull request #106 from gumblex/master
Fix some problems in EMS
2015-04-11 09:26:25 +08:00
Kenneth Heafield
d6a66d39bd Delete unused code 2015-04-10 09:36:57 -04:00
Jeroen Vermeulen
b8793fb788 Address two TODO notes in mert/evaluator.cpp.
The notes were about two objects which were created on the free store
using "new", then cleaned up using "delete".  May have been a Java
habit; the solution was as simple as creating them on the stack.
2015-04-10 13:25:51 +07:00
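The change described in the commit above can be sketched as follows. `RunningMean` and both functions are hypothetical stand-ins, not the actual objects in mert/evaluator.cpp; the point is only the difference between free-store and automatic storage.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an object that only lives inside one function.
struct RunningMean {
    long sum = 0;
    long count = 0;
    void add(int x) { sum += x; ++count; }
    double value() const { return static_cast<double>(sum) / count; }
};

// Before: allocated with new, so every exit path needs a matching delete.
double mean_with_new(const std::vector<int>& xs) {
    assert(!xs.empty());
    RunningMean* m = new RunningMean;
    for (int x : xs) m->add(x);
    double result = m->value();
    delete m;  // would be skipped if anything above threw
    return result;
}

// After: automatic storage; the object is destroyed when the scope ends.
double mean_on_stack(const std::vector<int>& xs) {
    assert(!xs.empty());
    RunningMean m;
    for (int x : xs) m.add(x);
    return m.value();
}
```

Both functions compute the same value; the second simply has no cleanup to forget.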
Jeroen Vermeulen
8a3ae2fd5c Portability and include fixes.
Add <cstdlib> include for srand()/rand(), and <unistd.h> for open() etc.
Include <unistd.h> on Windows if using MinGW.  Disable MeteorScorer on
Windows, since it doesn't have fork() and pipe().
2015-04-10 12:54:34 +07:00
Kenneth Heafield
0698da8b0f log(1 + ...) -> log1p(...) 2015-04-08 10:08:05 -04:00
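The commit message does not state the motivation, but a likely one is numerical accuracy for small arguments: `1 + x` is rounded before the logarithm is taken, so the direct form loses the information in a tiny `x`. The following sketch uses hypothetical function names and only the standard `std::log` and `std::log1p`.

```cpp
#include <cassert>
#include <cmath>

// Direct form: 1.0 + x rounds to exactly 1.0 once |x| is far below
// double precision (about 1e-16), so the result collapses to 0.
double log_one_plus_naive(double x) {
    assert(x > -1.0);
    return std::log(1.0 + x);
}

// log1p() evaluates log(1 + x) without forming the rounded sum first,
// so small x keeps its significant digits.
double log_one_plus(double x) {
    assert(x > -1.0);
    return std::log1p(x);
}
```

For arguments that are not small, the two forms agree to within rounding error, so the replacement is safe as a general rewrite.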
Michael Denkowski
2682cc0f9b typo fix 2015-04-07 17:06:18 -04:00
Jeroen Vermeulen
464615a0c3 Fix some clang++ warnings.
Compiling with clang++ at the default warning/error levels produces
some interesting warnings.  Here's a pair of fixes for the simplest
instances:

moses/TranslationModel/RuleTable/PhraseDictionaryFuzzyMatch.cpp:133:7:
warning: comparison of array 'path' equal to a null pointer is always
false [-Wtautological-pointer-compare]
  if (path == NULL) {
      ^~~~    ~~~~

(The code unnecessarily checks that an automatic variable has a
 non-null address).

moses/TranslationModel/DynSAInclude/onlineRLM.h:305:20:
warning: unsequenced modification and access to 'den_val' [-Wunsequenced]
  if(((den_val = query(&ngram[len - num_fnd], num_fnd - 1)) > 0) &&
               ^

(The code tries to cram too much into an "if" condition.)
2015-04-07 22:58:17 +07:00
Flammie Pirinen
fc8ee03b8d examples 2015-04-07 16:19:07 +01:00
Flammie Pirinen
ef52bc66f6 full set of cases and caps 2015-04-07 16:16:43 +01:00
Flammie Pirinen
85230e8334 add fi to list to silence warnings 2015-04-07 16:00:48 +01:00
Flammie Pirinen
f9deb6de3b also lowercase if case fail 2015-04-07 15:55:02 +01:00
Flammie Pirinen
5817806ec7 fix detokenising : in abbrev. case suffix case 2015-04-07 15:51:29 +01:00
Hieu Hoang
54e55f2dcb better detection of pigz, sort, split. In case they are not in the default directory 2015-04-06 11:31:44 +04:00
Ulrich Germann
e110e7df6b Bug fix. 2015-04-05 16:18:09 +01:00
Ulrich Germann
3e2f878576 Merge branch 'master' into mmt-dev
Conflicts:
	Jamroot
	moses/TranslationModel/UG/mmsapt.h
2015-04-05 15:51:50 +01:00
Ulrich Germann
34776c5e7c Merge branch 'mmt-dev' of https://github.com/ugermann/mosesdecoder into mmt-dev 2015-04-05 14:29:45 +01:00
Ulrich Germann
46e31a285c - Code refactoring for Bitext class.
- Bug fixes and conceptual improvements in biased sampling. The sampling now
  tries to stick to the bias, even when an unsuitable corpus dominates
  the occurrences.
2015-04-05 14:29:00 +01:00