9 Commits

Author SHA1 Message Date
Kenneth Heafield
d732f63ec2 KenLM update including progress on ARM and MinGW from NICT 2011-11-10 20:46:59 +00:00
Hieu Hoang
a93f4691f6 win32 2011-10-23 09:37:47 +07:00
heafield
3402bdfe7a Merge mtm_lm into trunk.
A fair number of files show no content change; somebody must have touched them in the branch, so only metadata is being recorded.
Updates kenlm binary file format, sorry. 
It looks like OOV isn't being computed in EvaluateChart anyway, just phrasal.  
  


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@4247 1f5c12ca-751b-0410-a591-d2e778427230
2011-09-21 16:06:48 +00:00
heafield
22ce1d2f19 kenlm update
- Fix case where "foo bar baz" appears but "bar baz" does not.  Previously probing silently returned the wrong answer and trie silently broke.  
- More aggressive recombination: if "baz quux" is never followed by any word, then do not include "bar" in the state.  
- kenlm assumes that "foo bar" is present if "foo bar baz" is.  This is now checked.  
- Binary format version number bump because the format has changed to support the above.  
- Lower memory consumption during trie building, but building takes longer in order to ensure correct handling of blanks and aggressive recombination.
- Fix progress bar newlines on trie building.

Agrees with SRI's 1-best outputs on the WMT 10 evaluation set.  
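
The two n-gram assumptions described in this message can be illustrated with a minimal sketch. This is not kenlm code: `missing_contexts` and the toy model are hypothetical names made up for illustration. It checks a set of n-grams for entries whose prefix ("foo bar" for "foo bar baz") or suffix ("bar baz" for "foo bar baz") is absent.

```python
def missing_contexts(ngrams):
    """Return (missing_prefixes, missing_suffixes) for a collection of n-grams.

    Hypothetical helper, not part of kenlm. Each n-gram is a
    space-separated string such as "foo bar baz".
    """
    present = {tuple(g.split()) for g in ngrams}
    missing_prefixes = set()
    missing_suffixes = set()
    for gram in present:
        if len(gram) < 2:
            continue  # unigrams have no shorter prefix or suffix to check
        if gram[:-1] not in present:
            # e.g. "foo bar baz" present but "foo bar" absent
            missing_prefixes.add(gram[:-1])
        if gram[1:] not in present:
            # e.g. "foo bar baz" present but "bar baz" absent
            missing_suffixes.add(gram[1:])
    return missing_prefixes, missing_suffixes


# Toy model reproducing the case from the commit message:
# "foo bar baz" appears, but "bar baz" does not.
model = ["foo", "bar", "baz", "foo bar", "foo bar baz"]
prefixes, suffixes = missing_contexts(model)
```

In this toy model every prefix is present, so only the missing suffix "bar baz" is reported.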



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3847 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-25 19:11:48 +00:00
heafield
2784923899 Rename a bunch of kenlm files. Running ./regenerate-makefiles.sh is required.
Make the default loading method MAP_POPULATE on Linux and read on other OSes.
Use LM #9 for lazy loading, as recommended by other devs.  
Slightly faster trie.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3688 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-06 00:40:16 +00:00
heafield
614d6002a6 Integrate heafield-refactorlm. Faster kenlm with new binary format. Stateful language model framework.



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3671 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-27 17:50:40 +00:00
heafield
770df2a92d Unbodge kenlm by moving compilation to kenlm/ instead of kenlm/lm. Changing the headers every time I copied to Moses was getting annoying.



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3587 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-28 16:26:55 +00:00
hieuhoang1972
32d3565b04 ken lm integration
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3543 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-21 22:43:29 +00:00
heafield
a02268a7c1 Fix memory corruption with exceptions.
Fix compilation with -m64 in murmur_hash.  
Extract most mmap calls.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3494 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-16 19:53:33 +00:00