Fast Neural Machine Translation in C++

AmuNMT

Join the chat at https://gitter.im/emjotde/amunmt

A C++ decoder for Neural Machine Translation (NMT) models trained with the Theano-based scripts of Nematus (https://github.com/rsennrich/nematus) or DL4MT (https://github.com/nyu-dl/dl4mt-tutorial).

We aim to keep compatibility with Nematus (at least as long as there is no training framework in AmuNMT); continued compatibility with DL4MT is not guaranteed.

If you use this, please cite:

Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang (2016). Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions (https://arxiv.org/abs/1610.01108)

Tested on Ubuntu 14.04 LTS

  • CMake 3.5.1 (due to CUDA-related bugs in earlier versions)
  • GCC/G++ 4.9
  • Boost 1.54
  • CUDA 7.5

Tested on Ubuntu 16.04 LTS

  • CMake 3.5.1 (due to CUDA-related bugs in earlier versions)
  • GCC/G++ 5.4
  • Boost 1.61
  • CUDA 8.0

Also compiles the CPU version.

The CPU-only version is compiled automatically if CMake cannot detect CUDA. Tested on different machines and distributions:

  • CMake 3.5.1
  • The CPU version should be much more forgiving regarding GCC/G++ and Boost versions.

Compilation

The project is a standard CMake out-of-source build:

mkdir build
cd build
cmake ..
make -j

To compile only the CPU version on a machine with CUDA, add the -DNOCUDA=ON flag:

cmake -DNOCUDA=ON ..

Vocabulary files

Vocabulary files (and all other config files) in AmuNMT are YAML files by default. AmuNMT can also read gzipped YAML files (*.yml.gz).

  • Vocabulary files from models trained with Nematus can be used directly as JSON is a proper subset of YAML.
  • Vocabularies for models trained with DL4MT (*.pkl extension) need to be converted to JSON/YAML with either of the two scripts below:
python scripts/pkl2json.py vocab.en.pkl > vocab.json
python scripts/pkl2yaml.py vocab.en.pkl > vocab.yml
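Whatever the file format, a vocabulary is a mapping from tokens to integer ids. The sketch below illustrates that lookup; it is not AmuNMT's own class. It assumes the Nematus dictionary convention of an "eos" entry (id 0) and an "UNK" entry (id 1) for unknown words, and it skips file parsing entirely. `Vocab` and `Encode` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical vocabulary: token -> id, with unknown tokens mapped to "UNK".
// Assumes the dictionary contains "eos" and "UNK" entries (Nematus-style).
class Vocab {
 public:
  explicit Vocab(std::unordered_map<std::string, int> entries)
      : entries_(std::move(entries)) {}

  // Look up a single token, falling back to the UNK id.
  int operator[](const std::string& token) const {
    auto it = entries_.find(token);
    return it != entries_.end() ? it->second : entries_.at("UNK");
  }

  // Split an already tokenized sentence on whitespace and append the eos id.
  std::vector<int> Encode(const std::string& sentence) const {
    std::vector<int> ids;
    std::istringstream in(sentence);
    std::string token;
    while (in >> token) ids.push_back((*this)[token]);
    ids.push_back(entries_.at("eos"));
    return ids;
  }

 private:
  std::unordered_map<std::string, int> entries_;
};
```

With a toy dictionary {"eos": 0, "UNK": 1, "This": 2, "is": 3}, encoding "This is a test" yields 2 3 1 1 0: the two unseen words fall back to UNK and the sentence ends with eos.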

Running AmuNMT

./bin/amun -c config.yml <<< "This is a test ."

Configuration files

An example configuration:

# Paths are relative to config file location
relative-paths: yes

# performance settings
beam-size: 12
devices: [0]
normalize: yes
threads-per-device: 1

# scorer configuration
scorers:
  F0:
    path: model.en-de.npz
    type: Nematus

# scorer weights
weights:
  F0: 1.0

# vocabularies
source-vocab: vocab.en.yml.gz
target-vocab: vocab.de.yml.gz
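The scorers and weights sections suggest a log-linear combination: each scorer's log-probability for a candidate is multiplied by the weight with the same name (F0 above) and the products are summed. With normalize: yes, finished hypotheses are presumably compared by score per output token, since a raw sum of log-probabilities favors short translations. The sketch below illustrates both ideas under those assumptions; `CombinedScore` and `NormalizedScore` are hypothetical helpers, not AmuNMT functions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: weighted sum of per-scorer log-probabilities.
// Keys mirror the scorer names in the config (e.g. "F0"); weights.at()
// throws std::out_of_range if a scorer has no weight.
double CombinedScore(const std::map<std::string, double>& logProbs,
                     const std::map<std::string, double>& weights) {
  double total = 0.0;
  for (const auto& entry : logProbs) {
    total += weights.at(entry.first) * entry.second;
  }
  return total;
}

// Hypothetical length normalization: average score per output token.
double NormalizedScore(double score, std::size_t length, bool normalize) {
  if (!normalize || length == 0) return score;
  return score / static_cast<double>(length);
}
```

For example, with a second (hypothetical) scorer F1 weighted 0.5, log-probabilities of -2.0 from F0 and -4.0 from F1 combine to 1.0 * -2.0 + 0.5 * -4.0 = -4.0, and normalizing that score over a two-token output gives -2.0.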

BPE Support

AmuNMT has integrated support for byte-pair encoding (BPE). There are two options, bpe and debpe. The bpe option takes the path to a file with BPE codes. To turn on deBPE for the output, set debpe to yes, e.g.:

bpe: bpe.codes
debpe: yes
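BPE segmentation as produced by subword-nmt marks every non-final piece of a split word with a trailing "@@", so "Neural" may appear as "Neu@@ ral". Assuming debpe follows that convention, undoing the segmentation amounts to deleting every "@@ " sequence, which has the same effect as the common sed 's/@@ //g' post-processing step. The sketch below shows this; `Debpe` is a hypothetical name, not AmuNMT's implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical deBPE step: remove every "@@ " marker so that subword
// segments such as "Neu@@ ral" are joined back into "Neural".
// Assumes the subword-nmt "@@" separator convention.
std::string Debpe(const std::string& text) {
  const std::string marker = "@@ ";
  std::string out;
  out.reserve(text.size());
  std::string::size_type pos = 0;
  while (true) {
    std::string::size_type hit = text.find(marker, pos);
    if (hit == std::string::npos) {
      out.append(text, pos, std::string::npos);  // copy the remaining tail
      break;
    }
    out.append(text, pos, hit - pos);  // copy text before the marker
    pos = hit + marker.size();         // skip the marker itself
  }
  return out;
}
```

Applied to "Neu@@ ral Machine Trans@@ lation", this returns "Neural Machine Translation"; text without markers passes through unchanged.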

CPU|GPU Mode

Even if you compile AmuNMT with CUDA, you can run amun on the CPU. To do so, set mode to CPU:

mode: CPU

Example usage