- Fix the case where "foo bar baz" appears but "bar baz" does not. Previously, probing silently returned the wrong answer and the trie silently broke.
- More aggressive recombination: if "baz quux" is never followed by any word, then do not include "bar" in the state.
- kenlm assumes that "foo bar" is present if "foo bar baz" is. This is now checked.
- Binary format version number bump because the format has changed to support the above.
- Trie building uses less memory, but it takes longer in order to ensure correct handling of blanks and aggressive recombination.
- Fix progress bar newlines on trie building.
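The first and third bullets describe two consistency conditions on an n-gram set: the suffix of an n-gram ("bar baz" for "foo bar baz") may be missing and is now handled, while a missing prefix ("foo bar") violates kenlm's assumption and is now checked. A minimal sketch of detecting both, assuming the n-grams are available as word tuples; `check_ngrams` is a hypothetical helper for illustration, not kenlm's code.

```python
from typing import Iterable, List, Set, Tuple

Words = Tuple[str, ...]

def check_ngrams(ngrams: Iterable[Words]) -> Tuple[List[Words], List[Words]]:
    """Return (missing_prefixes, missing_suffixes).

    missing_prefixes: n-grams like "foo bar baz" whose "foo bar" is absent.
    missing_suffixes: n-grams like "foo bar baz" whose "bar baz" is absent.
    """
    present: Set[Words] = set(ngrams)
    # Drop the last word: the context this n-gram extends.
    missing_prefixes = sorted(
        ng for ng in present if len(ng) > 1 and ng[:-1] not in present)
    # Drop the first word: the shorter n-gram a backoff would land on.
    missing_suffixes = sorted(
        ng for ng in present if len(ng) > 1 and ng[1:] not in present)
    return missing_prefixes, missing_suffixes
```

For example, a model containing "foo", "bar", "baz", "foo bar" and "foo bar baz" but not "bar baz" reports `([], [("foo", "bar", "baz")])`: the prefix is fine, and the suffix is the case the first bullet fixes.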
Agrees with SRI's 1-best outputs on the WMT 10 evaluation set.
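The recombination rule in the second bullet can be read as: keep only the longest suffix of the history that some n-gram extends, since words to its left cannot affect any future probability. A minimal sketch, assuming n-grams are word tuples and, as is usual in ARPA files, that an n-gram never used as a context carries a zero backoff weight. `contexts_of` and `minimize_state` are hypothetical names, not kenlm's API.

```python
from typing import Iterable, Set, Tuple

Words = Tuple[str, ...]

def contexts_of(ngrams: Iterable[Words]) -> Set[Words]:
    """Every word sequence that is followed by at least one word in the model."""
    return {ng[:-1] for ng in ngrams if len(ng) > 1}

def minimize_state(history: Words, contexts: Set[Words]) -> Words:
    """Keep the longest suffix of `history` (oldest word first) that is a context.

    Histories that differ only to the left of that suffix map to the same
    state, so a decoder may recombine them.
    """
    for start in range(len(history)):
        suffix = history[start:]
        if suffix in contexts:
            return suffix
    return ()
```

With the n-grams "foo", "bar", "baz", "quux", "x", "baz quux", "quux x" and "bar baz quux", nothing follows "baz quux", so both "bar baz quux" and "foo baz quux" reduce to `("quux",)` and can share a state.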
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3847 1f5c12ca-751b-0410-a591-d2e778427230
mmap works; utility to build binary format included.
Configuration struct (including options for handling unknown words).
Config option to build a binary format while loading an ARPA file.
Doesn't require Boost or ICU.
Works on 32 and 64 bit.
query appends </s>.
Reduced memory consumption: 12 bytes per 5-gram instead of 16 bytes on 64-bit machines.
Reduced memory consumption: vocabulary takes 8 bytes/word instead of 12 bytes/word if sorted is used.
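The log does not say how these savings were achieved. One plausible reading, which is an assumption and not taken from the source, is byte arithmetic: a 64-bit hashed key plus a 4-byte float is padded to 16 bytes under 64-bit alignment but needs only 12 when packed, and a sorted vocabulary can store just an 8-byte hash per word (the word id being its position) where a hash table would also store a 4-byte id. The sketch below only does that arithmetic with `ctypes`; the struct names and layouts are hypothetical.

```python
import ctypes

class NGramEntry(ctypes.Structure):
    # Hypothetical highest-order entry: 64-bit hashed key + float log-probability.
    _fields_ = [("key", ctypes.c_uint64), ("prob", ctypes.c_float)]

class ProbingVocabEntry(ctypes.Structure):
    # Hypothetical hash-table vocabulary entry: 64-bit word hash + explicit 32-bit id.
    _fields_ = [("hash", ctypes.c_uint64), ("word_id", ctypes.c_uint32)]

# Hypothetical sorted vocabulary: an array of 64-bit hashes; the id is the index.
SORTED_VOCAB_BYTES_PER_WORD = ctypes.sizeof(ctypes.c_uint64)

def packed_size(struct_type):
    """Bytes the fields need with no alignment padding (as with #pragma pack(1))."""
    return sum(ctypes.sizeof(field_type) for _, field_type in struct_type._fields_)

if __name__ == "__main__":
    # On a typical 64-bit machine this prints "16 12", then "12 8".
    print(ctypes.sizeof(NGramEntry), packed_size(NGramEntry))
    print(packed_size(ProbingVocabEntry), SORTED_VOCAB_BYTES_PER_WORD)
```

On 32-bit x86, `c_uint64` is typically only 4-byte aligned, so the padded entry is already 12 bytes there, which fits the log's "on 64-bit machines" qualifier.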
Removed some cruft that wasn't needed by this code.
Compiles on Mac OS X.
Add script to run tests; these depend on Boost.
SRI wrapper works again, is slightly faster, no longer depends on Boost, and has a test.
Debugging code only appears with -DDEBUG, so the default is fast.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3447 1f5c12ca-751b-0410-a591-d2e778427230