mosesdecoder/kenlm
hieuhoang1972 473e0e3e96 Ken's LM
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3421 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-10 00:36:07 +00:00
lm
util
compile.sh
COPYING
COPYING.LESSER
LICENSE
README

This is a language model under active development, but the API is mostly stable.

Currently, it loads an ARPA file in 2/3 the time SRI takes and uses 6.5 GB of memory where SRI uses 11 GB.  I'm working on optimizing this further.

A binary format is coming soon.  The code already uses mmap; the only change needed is to pass an fd to the mmap call.
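
To illustrate why passing an fd is a small change, here is a minimal sketch of the difference between an anonymous mapping and a file-backed one.  MapRegion and UnmapRegion are hypothetical names for illustration, not functions from this package, and the sketch assumes a POSIX system where MAP_ANONYMOUS is available.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper, not part of this package's API.
// fd == -1: anonymous, writable memory that a loader fills while parsing.
// fd >= 0:  the same call, but backed read-only by an already-open file.
void *MapRegion(std::size_t bytes, int fd = -1) {
  assert(bytes > 0);  // mmap rejects zero-length mappings
  const bool anonymous = (fd == -1);
  const int prot = anonymous ? (PROT_READ | PROT_WRITE) : PROT_READ;
  const int flags = anonymous ? (MAP_PRIVATE | MAP_ANONYMOUS) : MAP_SHARED;
  void *ret = mmap(nullptr, bytes, prot, flags, fd, 0);
  // Report failure as a null pointer instead of MAP_FAILED.
  return ret == MAP_FAILED ? nullptr : ret;
}

// Releases a region obtained from MapRegion; a null base is ignored.
void UnmapRegion(void *base, std::size_t bytes) {
  if (base) munmap(base, bytes);
}
```

With a file-backed mapping the kernel pages data in from disk on demand, so nothing has to be parsed at load time.  The on-disk layout of the binary format itself is not specified here.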

Currently it depends on Boost (mostly lexical_cast) and ICU (only StringPiece).  I am actively working on removing these dependencies.  My normal build system is Boost Jam; I've stripped it out and simplified the build to a shell script, ./compile.sh, for you.
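
As a rough idea of what dropping lexical_cast could look like, the sketch below parses a float with the standard library only.  ParseFloat is a hypothetical name, not a function from this package, and its behavior is only an approximation: unlike boost::lexical_cast, strtof accepts leading whitespace.

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for boost::lexical_cast<float>(text).
float ParseFloat(const std::string &text) {
  const char *begin = text.c_str();
  char *end = nullptr;
  errno = 0;
  const float value = std::strtof(begin, &end);
  // Reject empty input, trailing characters, and out-of-range values.
  if (end == begin || *end != '\0' || errno == ERANGE) {
    throw std::invalid_argument("bad float: " + text);
  }
  return value;
}
```

A call such as ParseFloat("-1.5") returns the parsed value, while input with trailing characters such as "1.0x" throws std::invalid_argument.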

I recommend copying the code and distributing it with your decoder.  However, please send improvements to me so that they can be integrated into the core package.  

Also included is a wrapper around SRI with the same interface.