Language model inference code by Kenneth Heafield <infer at kheafield.com>
The official website is http://kheafield.com/code/kenlm/ . If you're a decoder developer, please download the latest version from there instead of copying from Moses.

While the primary means of building kenlm for use in Moses is the Moses build system, you can also compile independently using:
./compile.sh to compile the code
./test.sh to compile and run tests; requires Boost
./clean.sh to clean

The rest of the documentation is directed at decoder developers.

Binary format via mmap is supported. Run ./build_binary to make one, then pass the binary file name in place of the ARPA file name.

Currently, the code assumes POSIX APIs for errno, strerror_r, open, close, mmap, munmap, ftruncate, fstat, and read. This is tested on Linux and the non-UNIX Mac OS X. I welcome submissions porting (via #ifdef) to other systems (e.g. Windows) but proudly have no machine on which to test it.
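
For readers unfamiliar with the POSIX calls listed above, here is a minimal sketch of how a file can be mapped read-only with open, fstat, mmap, munmap, and close. It is not kenlm's loader: the MappedFile class and its Fail helper are hypothetical names made up for illustration, and the sketch reports errors with std::strerror rather than strerror_r to avoid the GNU/XSI signature difference.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not a kenlm class): maps a whole file read-only.
class MappedFile {
 public:
  explicit MappedFile(const char *path) : data_(MAP_FAILED), size_(0) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd == -1) Fail("open", path, -1);
    struct stat info;
    if (fstat(fd, &info) == -1) Fail("fstat", path, fd);
    size_ = static_cast<std::size_t>(info.st_size);
    // mmap rejects zero-length mappings, so an empty file stays unmapped.
    if (size_ != 0) {
      data_ = mmap(nullptr, size_, PROT_READ, MAP_SHARED, fd, 0);
      if (data_ == MAP_FAILED) Fail("mmap", path, fd);
    }
    // The mapping remains valid after the descriptor is closed.
    close(fd);
  }

  ~MappedFile() {
    if (data_ != MAP_FAILED) munmap(data_, size_);
  }

  // Returns nullptr for an empty file.
  const char *data() const {
    return data_ == MAP_FAILED ? nullptr : static_cast<const char *>(data_);
  }
  std::size_t size() const { return size_; }

 private:
  MappedFile(const MappedFile &) = delete;
  MappedFile &operator=(const MappedFile &) = delete;

  // Captures the errno message first, then releases fd (if any) and throws.
  static void Fail(const char *call, const char *path, int fd) {
    const std::string why(std::strerror(errno));
    if (fd != -1) close(fd);
    throw std::runtime_error(std::string(call) + " failed on " + path + ": " + why);
  }

  void *data_;
  std::size_t size_;
};
```

Constructing MappedFile with a path gives a pointer and a byte count; the pages are read from disk lazily, which is the main appeal of an mmap-based binary format.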
A brief note to Mac OS X users: your gcc is too old to recognize the pack pragma. The warning effectively means that, on 64-bit machines, the model will use 16 bytes instead of 12 bytes per n-gram of maximum order (those of lower order are already 16 bytes) in the probing and sorted models. The trie is not impacted by this.
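
The 12-versus-16-byte difference can be reproduced with a small sketch. The struct and field names below are hypothetical, not kenlm's actual definitions; the only assumption is an entry holding a 64-bit key and a 32-bit float, compiled on a typical 64-bit ABI where a 64-bit integer is 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Hypothetical entry without packing: the compiler pads it to a multiple of
// the 8-byte alignment of the key, giving 16 bytes on common 64-bit ABIs.
struct UnpackedEntry {
  uint64_t key;
  float prob;
};

// Same fields with 4-byte packing: no tail padding, 12 bytes.
// A compiler that ignores the pragma would produce the 16-byte layout instead.
#pragma pack(push, 4)
struct PackedEntry {
  uint64_t key;
  float prob;
};
#pragma pack(pop)

static_assert(sizeof(PackedEntry) == 12, "pack pragma was not honored");

// Bytes needed to store `count` entries of the given layout.
template <class Entry> std::size_t TableBytes(std::size_t count) {
  return count * sizeof(Entry);
}
```

Under these assumptions, a table of one million such entries takes 12,000,000 bytes packed and 16,000,000 bytes unpacked, so ignoring the pragma costs memory but not correctness.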
It does not depend on Boost or ICU. However, if you use Boost and/or ICU in the rest of your code, you should define HAVE_BOOST and/or HAVE_ICU in util/have.hh. Defining HAVE_BOOST will let you hash StringPiece. Defining HAVE_ICU will use ICU's StringPiece to prevent a conflict with the one provided here.
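
One way such compile-time switches might be wired is sketched below. This is not the contents of util/have.hh or of kenlm's StringPiece header: the Piece and HashPiece names are hypothetical, and the FNV-1a fallback is added only so the sketch compiles without Boost (in the package itself, hashing is described as available only with HAVE_BOOST).

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Uncomment whichever libraries the rest of your code already uses.
// #define HAVE_ICU
// #define HAVE_BOOST

#ifdef HAVE_ICU
// Reuse ICU's class so two StringPiece definitions never collide.
#include <unicode/stringpiece.h>
typedef icu::StringPiece Piece;
#else
// Hypothetical minimal stand-in: a pointer plus a length, no ownership.
class Piece {
 public:
  Piece(const char *data, int length) : data_(data), length_(length) {}
  const char *data() const { return data_; }
  int length() const { return length_; }
 private:
  const char *data_;
  int length_;
};
#endif

#ifdef HAVE_BOOST
#include <boost/functional/hash.hpp>
// With Boost present, hash the bytes through boost::hash_range.
inline std::size_t HashPiece(const Piece &piece) {
  return boost::hash_range(piece.data(), piece.data() + piece.length());
}
#else
// Assumption for this sketch only: FNV-1a over the bytes.
inline std::size_t HashPiece(const Piece &piece) {
  uint64_t hash = 14695981039346656037ULL;
  for (int i = 0; i < piece.length(); ++i) {
    hash ^= static_cast<unsigned char>(piece.data()[i]);
    hash *= 1099511628211ULL;
  }
  return static_cast<std::size_t>(hash);
}
#endif
```

The point of the pattern is that callers see one type name and one hash function regardless of which libraries are present, so the choice is made once at the top of a single header.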
The recommended way to use this:
Copy the code and distribute with your decoder.
Set HAVE_ICU and HAVE_BOOST at the top of util/have.hh as instructed above.
Look at compile.sh and reimplement using your build system.
Use either the interface in lm/model.hh or lm/virtual_interface.hh.
Interface documentation is in comments of lm/virtual_interface.hh (including for lm/model.hh).

I recommend copying the code and distributing it with your decoder. However, please send improvements to me so that they can be integrated into the package.

Also included:
A wrapper to SRI with the same interface.