- Fix case where "foo bar baz" appears but "bar baz" does not. Previously the probing model silently returned the wrong answer and the trie model silently broke.
- More aggressive recombination: if "baz quux" is never followed by any word, then do not include "bar" in the state.
- kenlm assumes that "foo bar" is present if "foo bar baz" is. This is now checked.
- Binary format version number bump because the format has changed to support the above.
- Lower memory consumption during trie building. Building takes longer, though, in order to ensure correct handling of blanks and aggressive recombination.
- Fix progress bar newlines on trie building.
Agrees with SRI's 1-best outputs on the WMT 10 evaluation set.
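
Illustrative sketch only (Python, not kenlm code): the functions below restate the n-gram conditions from the list above on a toy model given as a list of space-separated n-grams. The names missing_contexts and minimal_state are hypothetical, and the sketch assumes the recombination state is the longest suffix of the history that some n-gram in the model extends. kenlm's actual data structures and checks may differ.

```python
def missing_contexts(ngrams):
    """Return (missing_prefixes, missing_suffixes) for a toy model.

    A prefix drops the last word ("foo bar baz" -> "foo bar").
    A suffix drops the first word ("foo bar baz" -> "bar baz").
    """
    present = {tuple(g.split()) for g in ngrams}
    missing_prefixes = set()
    missing_suffixes = set()
    for gram in present:
        if len(gram) < 2:
            continue  # unigrams have no shorter n-gram to look for
        if gram[:-1] not in present:
            missing_prefixes.add(gram[:-1])
        if gram[1:] not in present:
            missing_suffixes.add(gram[1:])
    return missing_prefixes, missing_suffixes


def minimal_state(history, ngrams):
    """Longest suffix of `history` that is followed by some word in the model.

    Assumption for this sketch: words to the left of that suffix cannot
    affect any later lookup, so they can be dropped from the state.
    """
    # Every context (n-gram minus its last word) that has a continuation.
    extendable = {tuple(g.split())[:-1] for g in ngrams if len(g.split()) > 1}
    words = tuple(history.split())
    for start in range(len(words)):
        if words[start:] in extendable:
            return words[start:]
    return ()
```

With the toy model ["foo", "bar", "baz", "foo bar", "foo bar baz"], missing_contexts reports no missing prefix and one missing suffix, "bar baz", which is the case fixed in the first item. In a model where "baz quux" is never followed by any word, minimal_state("bar baz quux", ...) leaves "bar" out of the state.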
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3847 1f5c12ca-751b-0410-a591-d2e778427230
Improved portability:
Hopefully handle big endian architectures (trie will fail at runtime if this isn't working yet).
Remove dependence on err.h.
Handle some Solaris weirdness wrt mmap and strerror.
Clean up murmur_hash header.
Add comparison and ZeroRemaining requested by Chris Dyer.
More number parsing in FilePiece.
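
Illustrative sketch only (Python, not kenlm code): the big endian item above concerns the byte order of multi-byte integers. The snippet shows one way to detect the host's byte order and to decode a 4-byte unsigned integer with an explicit byte order. The helper names host_is_little_endian and read_u32 are hypothetical, and this does not describe how kenlm itself handles endianness.

```python
import struct


def host_is_little_endian():
    """True if the host stores the least significant byte first."""
    # "=I" packs a 4-byte unsigned int in native byte order; the first
    # byte is 1 only when the least significant byte comes first.
    return struct.pack("=I", 1)[0] == 1


def read_u32(data, little_endian=True):
    """Decode 4 bytes as an unsigned int using an explicit byte order."""
    fmt = "<I" if little_endian else ">I"
    return struct.unpack(fmt, data)[0]
```

Decoding with an explicit byte order gives the same value on any host, whereas a structure that reads raw memory in native order depends on the machine it runs on.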
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3737 1f5c12ca-751b-0410-a591-d2e778427230