13823 Commits

Author SHA1 Message Date
Jeroen Vermeulen
b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00
Hieu Hoang
c15f3ef068 duplicated functionality with ems/support/lmplz-wrapper.perl 2015-04-21 17:54:34 +04:00
Hieu Hoang
ab01d30687 make sure GetOptions doesn't consume -T by confusing it with --text 2015-04-21 17:53:46 +04:00
Rico Sennrich
15d3c3f259 be more tolerant about xml input 2015-04-21 14:04:25 +01:00
Rico Sennrich
5a3d5b6bdd EMS: LM:mock-parse can be actual parser 2015-04-21 10:21:24 +01:00
Hieu Hoang
95435f2a2e better detection of pigz 2015-04-20 20:40:50 +04:00
Hieu Hoang
eb37437d09 don't output warnings. It wasn't originally there before the 'env perl' change. This script should be tightened up at some point, eg use strict, debug warning messages 2015-04-20 16:18:51 +04:00
Hieu Hoang
1b9dc6cfae more butinah tweaks 2015-04-19 11:50:50 +04:00
Hieu Hoang
637e8a17e8 add pre tokenization cleaning script. In case training has bad, overly long lines which blow up some taggers/segmenters, eg. mada 2015-04-19 11:21:07 +04:00
Ulrich Germann
f98de4dc83 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-18 18:04:20 +01:00
Ulrich Germann
28d9e55379 Bug fix. 2015-04-18 16:53:57 +01:00
Ulrich Germann
e028eb7847 A single output factor in Mmsapt can now be specified externally. (Before: hard-coded to 0.) 2015-04-17 23:12:32 +01:00
Jeroen Vermeulen
4647986a12 Include winsock2.h on Windows.
This makes socket code build successfully on Windows.
2015-04-18 00:59:40 +07:00
Jeroen Vermeulen
abfdb61bc9 Cross-platform tempfile implementation.
This makes temp_file and temp_dir work both on POSIX-like platforms and on
Windows.

It also fixes a bug where the temporary files/directories were created in
the current working directory, instead of in the system's standard
location for temporary files.  Unfortunately the Windows and POSIX code
diverge quite a bit on that point.
2015-04-18 00:21:18 +07:00
Jeroen Vermeulen
d56f317f2e New helper classes: temp_dir & temp_file.
I'm adding these because boost::filesystem::unique_path introduces
encoding issues: on Windows the path is in wchar_t, breaking use of
those strings in various places!  Encoding the strings is just too
much work.

It's still possible that the current temp_file implementation won't
build on Windows (it uses POSIX mkstemp() and close()) but that can
be fixed underneath the API.
2015-04-17 22:57:55 +07:00
Barry Haddow
c5e9a58ae2 Hypergraph output shouldn't crash when nbest list in current directory 2015-04-17 16:17:12 +01:00
Jeroen Vermeulen
70b5fee1ed Merge branch 'wrap-mmap'
Replace use of mmap() with the cross-platform wrapper in util.
2015-04-17 18:58:18 +07:00
Jeroen Vermeulen
1e3e445e3f Use cross-platform mmap() wrapper in CompactPT.
The MmapAllocator header made use of sys/mman.h and mmap(), which are
Unix-specific.  But util has a wrapper which also works on Windows.

This also fixes the error handling: when mmap() failed, the old code would
return an invalid (but non-NULL!) pointer — leading to a crash.  The wrapper
will throw an exception with a helpful error message.
2015-04-17 18:53:46 +07:00
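
The error-handling point in the commit above (1e3e445e3f) can be illustrated with a minimal POSIX-only sketch. The helper name map_read_only is hypothetical and is not the util wrapper the commit refers to; it only shows why a failed mmap() yields a non-NULL pointer and how throwing an exception avoids handing that pointer to callers.

```cpp
#include <sys/mman.h>

#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption for illustration, not the Moses util
// wrapper): map `length` bytes of file descriptor `fd` read-only, or throw.
void *map_read_only(int fd, std::size_t length) {
  void *addr = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
  // mmap() reports failure with MAP_FAILED, which is not a null pointer,
  // so a comparison against NULL would let the error through.
  if (addr == MAP_FAILED) {
    throw std::runtime_error(std::string("mmap failed: ") +
                             std::strerror(errno));
  }
  return addr;
}
```

A caller that only tested the result against NULL would go on to dereference MAP_FAILED; with the exception, the failure surfaces at the call site together with the errno text.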
Barry Haddow
e45c41e665 Testing of Viterbi decoding on hypergraph. 2015-04-17 12:29:41 +01:00
Ales Tamchyna
2180730a58 check word alignment points in VW training to avoid cryptic segfaults 2015-04-16 15:28:23 +02:00
Jeroen Vermeulen
6a4943ca41 Replace deprecated bcopy() with memcpy().
The bcopy() function is POSIX-specific and deprecated.  The recommended
replacement (at least for non-overlapping source and destination ranges)
is memcpy(), which is in the standard C library.

Note that the source and destination parameters are in a different order
between these two functions.
2015-04-16 19:19:34 +07:00
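
The argument-order note in the commit above (6a4943ca41) is easy to get wrong, so here is a minimal sketch. The helper name copy_bytes is hypothetical and does not come from the Moses sources.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration: copy `n` bytes from `src` to `dst`.
//   Old call: bcopy(src, dst, n);    // source first
//   New call: memcpy(dst, src, n);   // destination first
// The two ranges must not overlap when memcpy() is used.
void copy_bytes(const char *src, char *dst, std::size_t n) {
  std::memcpy(dst, src, n);
}
```

Where the ranges may overlap, memmove() takes the same argument order as memcpy() and handles the overlap, as bcopy() did.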
Jeroen Vermeulen
21a93421dc Replace deprecated bzero() with memset().
The bzero() function is POSIX-specific and deprecated.  The recommended
replacement is memset(), which is in the standard C library.
2015-04-16 19:03:57 +07:00
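
A minimal sketch of the replacement described in the commit above (21a93421dc). The helper name clear_bytes is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration: zero out `n` bytes starting at `buf`.
//   Old call: bzero(buf, n);
//   New call: memset(buf, 0, n);   // fill value comes before the length
void clear_bytes(void *buf, std::size_t n) {
  std::memset(buf, 0, n);
}
```

Swapping the last two arguments of memset() compiles without error but fills zero bytes, so the order is worth checking at each converted call site.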
Hieu Hoang
e279135d78 create file stream and delete it at the end if user has specified a file for input & output 2015-04-16 00:50:54 +04:00
Hieu Hoang
54c16ee86e Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-15 18:54:26 +04:00
Hieu Hoang
c544dba7ad don't use /dev/stdin and /dev/stdout. Compatibility issues with some Redhat 2015-04-15 18:53:13 +04:00
Hieu Hoang
044968bb4b Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-14 11:30:33 +04:00
Hieu Hoang
7af653ac80 misc script to parallelize madamira on grid engine 2015-04-14 11:29:56 +04:00
Hieu Hoang
6162223690 add use warnings to all perl scripts 2015-04-13 20:42:33 +04:00
Phil Williams
05b31b53f2 Implement -output-unknowns for search algorithms 7 and 9 (T2S/F2S) 2015-04-13 16:31:58 +01:00
Hieu Hoang
2f7c328db9 codelite 2015-04-11 20:21:50 +04:00
Hieu Hoang
8190a5e1d6 Merge pull request #107 from flammie/master
Finnish detokenisation
2015-04-11 14:20:37 +04:00
Dingyuan Wang
4aba64ed53 Merge pull request #106 from gumblex/master
Fix some problems in EMS
2015-04-11 09:26:25 +08:00
Kenneth Heafield
d6a66d39bd Delete unused code 2015-04-10 09:36:57 -04:00
Jeroen Vermeulen
b8793fb788 Address two TODO notes in mert/evaluator.cpp.
The notes were about two objects which were created on the free store
using "new", then cleaned up using "delete".  May have been a Java
habit; the solution was as simple as creating them on the stack.
2015-04-10 13:25:51 +07:00
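
The change described in the commit above (b8793fb788) can be sketched as follows. The Scores type and the total_score function are hypothetical stand-ins, since the log does not show which objects in mert/evaluator.cpp were involved.

```cpp
#include <vector>

// Hypothetical stand-in type; not a class from mert/evaluator.cpp.
struct Scores {
  std::vector<double> values;
  double total() const {
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum;
  }
};

double total_score(const std::vector<double> &input) {
  // Before (free store, manual cleanup):
  //   Scores *scores = new Scores;
  //   scores->values = input;
  //   double result = scores->total();
  //   delete scores;
  //   return result;
  // After: an automatic object, destroyed when the function returns,
  // including when an exception is thrown.
  Scores scores;
  scores.values = input;
  return scores.total();
}
```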
Jeroen Vermeulen
8a3ae2fd5c Portability and include fixes.
Add <cstdlib> include for srand()/rand(), and <unistd.h> for open() etc.
Include <unistd.h> on Windows if using MinGW.  Disable MeteorScorer on
Windows, since it doesn't have fork() and pipe().
2015-04-10 12:54:34 +07:00
Kenneth Heafield
0698da8b0f log(1 + ...) -> log1p(...) 2015-04-08 10:08:05 -04:00
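
A short sketch of why the substitution in the commit above (0698da8b0f) matters for small arguments. Both function names are hypothetical.

```cpp
#include <cmath>

// Computes log(1 + x) the direct way. For very small x the sum 1.0 + x
// rounds to exactly 1.0, so the result collapses to 0.
double log_one_plus_naive(double x) { return std::log(1.0 + x); }

// Computes the same quantity with std::log1p, which stays accurate
// when x is close to zero.
double log_one_plus(double x) { return std::log1p(x); }
```

For x = 1e-20 the naive version returns 0, while std::log1p returns a value close to x itself.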
Michael Denkowski
2682cc0f9b typo fix 2015-04-07 17:06:18 -04:00
Jeroen Vermeulen
464615a0c3 Fix some clang++ warnings.
Compiling with clang++ at the default warning/error levels produces
some interesting warnings.  Here's a pair of fixes for the simplest
instances:

moses/TranslationModel/RuleTable/PhraseDictionaryFuzzyMatch.cpp:133:7:
warning: comparison of array 'path' equal to a null pointer is always
false [-Wtautological-pointer-compare]
  if (path == NULL) {
      ^~~~    ~~~~

(The code unnecessarily checks that an automatic variable has a
 non-null address).

moses/TranslationModel/DynSAInclude/onlineRLM.h:305:20:
warning: unsequenced modification and access to 'den_val' [-Wunsequenced]
  if(((den_val = query(&ngram[len - num_fnd], num_fnd - 1)) > 0) &&
               ^

(The code tries to cram too much into an "if" condition.)
2015-04-07 22:58:17 +07:00
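
For the second warning in the commit above (464615a0c3), one common remedy is to move the assignment out of the condition. The sketch below is an illustration under assumptions: query() is a stand-in with an invented body, and the second half of the condition is made up, because the log shows only the first operand of the original "if".

```cpp
// Hypothetical stand-in for the query() call quoted in the warning.
int query(int key) { return key * 2; }

// Hypothetical function: the `num <= den_val` test is invented for
// illustration and is not taken from onlineRLM.h.
bool has_usable_denominator(int key, int num) {
  // Assign in its own statement first...
  const int den_val = query(key);
  // ...then test, so every read of den_val is sequenced after the write.
  return den_val > 0 && num <= den_val;
}
```

Splitting the statement also lets den_val be declared const, which matches the preference for const return values stated in the tokenize() commit at the top of this log.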
Flammie Pirinen
fc8ee03b8d examples 2015-04-07 16:19:07 +01:00
Flammie Pirinen
ef52bc66f6 full set of cases and caps 2015-04-07 16:16:43 +01:00
Flammie Pirinen
85230e8334 add fi to list to silence warnings 2015-04-07 16:00:48 +01:00
Flammie Pirinen
f9deb6de3b also lowercase if case fail 2015-04-07 15:55:02 +01:00
Flammie Pirinen
5817806ec7 fix detokenising : in abbrev. case suffix case 2015-04-07 15:51:29 +01:00
Hieu Hoang
54e55f2dcb better detection of pigz, sort, split. In case they are not in the default directory 2015-04-06 11:31:44 +04:00
Hieu Hoang
02185a85fb store temp run files in current directory, not /tmp 2015-04-05 17:02:48 +04:00
Hieu Hoang
93ad52d2f9 leave in runPath for debugging 2015-04-05 16:49:12 +04:00
Hieu Hoang
4cb8a1837e Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-04-05 16:45:17 +04:00
Hieu Hoang
7ffdddef13 script to submit ems job to grid engine as 1 job. Hardcoded for NYUAD at the mo 2015-04-05 16:44:24 +04:00
Michael Denkowski
66cfd14159 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-04-03 16:50:19 -04:00
Michael Denkowski
fdf4f5f571 Consistent line ending behavior for alignment printing options 2015-04-03 16:49:41 -04:00