Fast Neural Machine Translation in C++

AmuNMT

Join the chat at https://gitter.im/emjotde/amunmt

A C++ decoder for Neural Machine Translation (NMT) models trained with Theano-based scripts from Nematus (https://github.com/rsennrich/nematus) or DL4MT (https://github.com/nyu-dl/dl4mt-tutorial).

We aim to keep compatibility with Nematus (at least as long as AmuNMT has no training framework of its own). Continued compatibility with DL4MT is not guaranteed.

Requirements:

  • CMake 3.5.1 (due to CUDA related bugs in earlier versions)
  • Boost 1.5
  • CUDA 7.5 (8.0 recommended)

Optional

Compilation

The project is a standard CMake out-of-source build:

mkdir build
cd build
cmake ..
make -j

Or with KenLM support:

cmake .. -DKENLM=path/to/kenlm

On Ubuntu 16.04, you currently need g++ 4.9 and cuda-7.5 to compile. This also requires a custom Boost build compiled with g++ 4.9 instead of the standard g++ 5.3, because the binaries are not compatible.

CUDA_BIN_PATH=/usr/local/cuda-7.5 BOOST_ROOT=/path/to/custom/boost cmake .. \
-DCMAKE_CXX_COMPILER=g++-4.9 -DCUDA_HOST_COMPILER=/usr/bin/g++-4.9

With cuda-8.0 (RC), it is possible to use g++ 5 (but not g++ 6), which makes most of the above workarounds unnecessary.

Vocabulary files

Vocabulary files (and all other config files) in AmuNMT are YAML files by default. AmuNMT also reads gzipped yml.gz files.

  • Vocabulary files from models trained with Nematus can be used directly as JSON is a proper subset of YAML.
  • Vocabularies for models trained with DL4MT (*.pkl extension) need to be converted to JSON/YAML with either of the two scripts below:
python scripts/pkl2json.py vocab.en.pkl > vocab.json
python scripts/pkl2yaml.py vocab.en.pkl > vocab.yml
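
For orientation, the conversion can be sketched in a few lines of Python. This is not the code of scripts/pkl2json.py; it is a minimal illustration that assumes the pickle holds a plain token-to-id dictionary. The function names `pkl_to_json` and `load_json_vocab` are hypothetical, and pickles written by Python 2 may need extra arguments to `pickle.load` (for example `encoding="latin1"`).

```python
import gzip
import json
import pickle


def pkl_to_json(pkl_path, json_path):
    """Convert a pickled {token: id} dictionary to a JSON file.

    Assumption: the pickle contains a dict-like object mapping tokens
    to integer ids. Only unpickle files from a source you trust.
    """
    with open(pkl_path, "rb") as f:
        vocab = pickle.load(f)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(dict(vocab), f, ensure_ascii=False, indent=2)


def load_json_vocab(path):
    """Read a JSON vocabulary, handling an optional .gz suffix."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Because the output is JSON, it can also be treated as YAML, and gzipping it afterwards gives a file in the compressed form mentioned above.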

Running AmuNMT

./bin/amun -c config.yml <<< "This is a test ."
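
The same invocation can be driven from a script. The following Python sketch only wraps the command shown above; it assumes that the decoder reads one sentence per line from standard input and writes translations to standard output, which the example suggests but this README does not state. The helper names `build_command` and `translate` are hypothetical.

```python
import subprocess


def build_command(config_path, binary="./bin/amun"):
    """Return the argument list for the invocation shown above."""
    return [binary, "-c", config_path]


def translate(sentences, config_path, binary="./bin/amun"):
    """Feed sentences on stdin, one per line, and return stdout lines.

    Assumption: translations are written to stdout, one line per input.
    """
    result = subprocess.run(
        build_command(config_path, binary),
        input="\n".join(sentences) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.splitlines()
```

With a compiled binary and a valid configuration, `translate(["This is a test ."], "config.yml")` would correspond to the shell command above.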

Configuration files

An example configuration:

# Paths are relative to config file location
relative-paths: yes

# performance settings
beam-size: 12
devices: [0]
normalize: yes
threads-per-device: 1

# scorer configuration
scorers: 
  F0:
    path: model.en-de.npz 
    type: Nematus

# scorer weights
weights: 
  F0: 1.0

# vocabularies
source-vocab: vocab.en.yml.gz
target-vocab: vocab.de.yml.gz
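
In the example, the scorer name F0 appears both under scorers and under weights. A quick consistency check for such a configuration can be sketched in Python as follows. It assumes that every scorer should have exactly one weight with the same name, which the example implies but does not spell out, and it works on an already-parsed dictionary (for instance the result of a YAML parser such as PyYAML's `yaml.safe_load`). The function name `mismatched_names` is hypothetical.

```python
def mismatched_names(config):
    """Compare the names under 'scorers' and 'weights'.

    Returns two sorted lists: scorers without a weight, and weights
    that do not belong to any scorer.
    """
    scorers = set(config.get("scorers") or {})
    weights = set(config.get("weights") or {})
    return sorted(scorers - weights), sorted(weights - scorers)


# The relevant part of the example configuration as a parsed dictionary.
config = {
    "scorers": {"F0": {"path": "model.en-de.npz", "type": "Nematus"}},
    "weights": {"F0": 1.0},
}

missing_weights, unused_weights = mismatched_names(config)
if missing_weights or unused_weights:
    print("scorers without weight:", missing_weights)
    print("weights without scorer:", unused_weights)
```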

Example usage