amuNN
A C++ decoder for Neural Machine Translation (NMT) models trained with Theano-based scripts from DL4MT (https://github.com/nyu-dl/dl4mt-tutorial)
Requirements:
- CMake 3.5.1 (due to CUDA-related bugs in earlier versions)
- Boost 1.5
- CUDA 7.5
- yaml-cpp 0.5 (https://github.com/jbeder/yaml-cpp.git)
Optional:
- KenLM for n-gram language models (https://github.com/kpu/kenlm, current master)
Compilation
The project is a standard CMake out-of-source build:
mkdir build
cd build
cmake ..
make -j
Or with KenLM support:
cmake .. -DKENLM=path/to/kenlm
On Ubuntu 16.04, you need g++ 4.9, CUDA 7.5, and a Boost version compiled with g++ 4.9:
CUDA_BIN_PATH=/usr/local/cuda-7.5 BOOST_ROOT=/path/to/custom/boost cmake .. \
-DCMAKE_CXX_COMPILER=g++-4.9 -DCUDA_HOST_COMPILER=/usr/bin/g++-4.9 -DKENLM=path/to/kenlm
Vocabularies (*.pkl extension) need to be converted to text with the scripts in the scripts folder, for example:
python scripts/vocab2txt.py vocab.en.pkl > vocab.en
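For readers curious what such a conversion involves, here is a minimal sketch. It is not the actual vocab2txt.py. It assumes the pickle holds a dictionary mapping each word to an integer id and that the text form is one word per line in ascending id order; both are assumptions, and the names vocab_to_lines and load_vocab are hypothetical.

```python
import io
import pickle


def vocab_to_lines(vocab):
    """Return the words of a word -> id mapping, ordered by ascending id."""
    return [word for word, _ in sorted(vocab.items(), key=lambda item: item[1])]


def load_vocab(stream):
    """Load a pickled vocabulary from a binary file-like object."""
    return pickle.load(stream)


if __name__ == "__main__":
    # Toy vocabulary standing in for vocab.en.pkl (hypothetical contents).
    toy = {"the": 2, "</s>": 0, "UNK": 1}
    buffer = io.BytesIO(pickle.dumps(toy))
    for word in vocab_to_lines(load_vocab(buffer)):
        print(word)
```

Only unpickle vocabulary files from sources you trust, since loading a pickle can execute arbitrary code.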