
amuNN

A C++ decoder for Neural Machine Translation (NMT) models trained with Theano-based scripts from Nematus (https://github.com/rsennrich/nematus) or DL4MT (https://github.com/nyu-dl/dl4mt-tutorial).

We aim to keep compatibility with Nematus (at least as long as there is no training framework in amuNN); continued compatibility with DL4MT is not guaranteed.

Requirements

Optional

Compilation

The project is a standard CMake out-of-source build:

mkdir build
cd build
cmake ..
make -j

Or with KenLM support:

cmake .. -DKENLM=path/to/kenlm

On Ubuntu 16.04, you currently need g++ 4.9 and CUDA 7.5 to compile. This also requires a custom Boost build compiled with g++ 4.9 instead of the default g++ 5.3, as binaries built with the two compilers are not compatible. g++ 5 support will probably arrive with CUDA 8.0.

CUDA_BIN_PATH=/usr/local/cuda-7.5 BOOST_ROOT=/path/to/custom/boost cmake .. \
-DCMAKE_CXX_COMPILER=g++-4.9 -DCUDA_HOST_COMPILER=/usr/bin/g++-4.9
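
One possible way to produce such a custom Boost build (a minimal sketch, assuming an unpacked Boost source tree, g++ 4.9 installed at /usr/bin/g++-4.9, and an illustrative install prefix):

# run inside the unpacked Boost source directory
echo "using gcc : 4.9 : /usr/bin/g++-4.9 ;" > ~/user-config.jam
./bootstrap.sh --prefix=/path/to/custom/boost
./b2 toolset=gcc-4.9 install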

Vocabulary files

Vocabulary files (and all other config files) in amuNN are by default YAML files. amuNN also reads gzipped yml.gz files.

  • Vocabulary files from models trained with Nematus can be used directly as JSON is a proper subset of YAML.
  • Vocabularies for models trained with DL4MT (*.pkl extension) need to be converted to JSON/YAML with either of the two scripts below:
python scripts/pkl2json.py vocab.en.pkl > vocab.json
python scripts/pkl2yaml.py vocab.en.pkl > vocab.yml
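
For reference, a converted vocabulary is a flat mapping from tokens to integer IDs; the entries below are purely illustrative and not taken from a real model:

# vocab.en.yml (illustrative)
eos: 0
UNK: 1
the: 2
house: 3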

Running amuNN

./bin/amunn -c config.yml <<< "This is a test ."
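
Since the decoder reads source sentences from standard input, an entire tokenized test set can be translated by redirecting files (file names are illustrative):

./bin/amunn -c config.yml < test.src.tok > test.trg.tok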

Configuration files

An example configuration:

# performance
beam-size: 12
devices: [0]
normalize: true
threads-per-device: 1

# scorer configuration
scorers: 
  F0:
    path: model.en-de.npz 
    type: Nematus
  
# vocabularies
source-vocab: [ vocab.en.yml.gz ]
target-vocab: vocab.de.yml.gz

# scorer weights
weights: 
  F0: 1.0
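
For ensemble decoding, additional models should be addable as further scorer entries with corresponding weights. A hedged sketch (model file names are illustrative, and the equal weights are just an example):

# scorer configuration
scorers:
  F0:
    path: model1.en-de.npz
    type: Nematus
  F1:
    path: model2.en-de.npz
    type: Nematus

# scorer weights
weights:
  F0: 0.5
  F1: 0.5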