Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)

This page includes instructions for training models described in Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019).

Training a joint alignment-translation model on WMT'18 En-De

1. Extract and preprocess the WMT'18 En-De data
./prepare-wmt18en2de_no_norm_no_escape_no_agressive.sh
2. Generate alignments with a statistical alignment toolkit, e.g. Giza++ or FastAlign.

In this example, we use FastAlign.

git clone https://github.com/clab/fast_align.git
pushd fast_align
mkdir build
cd build
cmake ..
make
popd
ALIGN=fast_align/build/fast_align
paste bpe.32k/train.en bpe.32k/train.de | awk -F '\t' '{print $1 " ||| " $2}' > bpe.32k/train.en-de
$ALIGN -i bpe.32k/train.en-de -d -o -v > bpe.32k/train.align
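
To make the input format concrete, here is a small sketch that runs the same paste/awk pipeline on a two-sentence toy corpus. The `toy_fa/` directory and its contents are made up for illustration. FastAlign expects one `source ||| target` pair per line and, with the flags above, writes one line of `i-j` index pairs per input line.

```shell
# Toy corpus; the toy_fa/ directory and its sentences are illustrative only.
mkdir -p toy_fa
printf 'hello world\ngood morning\n' > toy_fa/train.en
printf 'hallo welt\nguten morgen\n' > toy_fa/train.de

# Same pipeline as above: join the two sides with a tab, then rewrite the
# tab as the " ||| " separator that FastAlign reads.
paste toy_fa/train.en toy_fa/train.de \
    | awk -F '\t' '{print $1 " ||| " $2}' > toy_fa/train.en-de

cat toy_fa/train.en-de
```

If either side contains a literal tab inside a sentence, the awk step would cut that sentence short, so it may be worth checking for tabs first.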
3. Preprocess the dataset with the above generated alignments.
fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref bpe.32k/train \
    --validpref bpe.32k/valid \
    --testpref bpe.32k/test \
    --align-suffix align \
    --destdir binarized/ \
    --joined-dictionary \
    --workers 32
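
A quick sanity check that can be run before the preprocessing command above: since `--align-suffix align` makes `fairseq-preprocess` read `bpe.32k/train.align` alongside `train.en` and `train.de`, the three files should have the same number of lines. The helper name `check_prefix` and the `toy_check/` data are hypothetical; this is a minimal sketch, not part of fairseq.

```shell
# check_prefix PREFIX: succeed only if PREFIX.en, PREFIX.de and PREFIX.align
# all exist and have the same number of lines.
check_prefix() {
    prefix=$1
    expected=""
    for suffix in en de align; do
        file="$prefix.$suffix"
        [ -f "$file" ] || { echo "missing: $file"; return 1; }
        count=$(wc -l < "$file" | tr -d ' ')
        # The first file sets the reference count; later files must match it.
        [ -z "$expected" ] && expected=$count
        [ "$count" = "$expected" ] || { echo "line count mismatch: $file"; return 1; }
    done
    echo "ok: $prefix ($expected lines)"
}

# Toy data standing in for bpe.32k/train.* (illustrative only).
mkdir -p toy_check
printf 'a b\nc d\n' > toy_check/train.en
printf 'x y\nz w\n' > toy_check/train.de
printf '0-0 1-1\n0-0 1-1\n' > toy_check/train.align
check_prefix toy_check/train
```

On the real data the call would be `check_prefix bpe.32k/train`.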
4. Train a model
fairseq-train \
    binarized \
    --arch transformer_wmt_en_de_big_align --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --activation-fn relu \
    --lr 0.0002 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \
    --max-tokens 3500 --label-smoothing 0.1 \
    --save-dir ./checkpoints --log-interval 1000 --max-update 60000 \
    --keep-interval-updates -1 --save-interval-updates 0 \
    --load-alignments --criterion label_smoothed_cross_entropy_with_alignment \
    --fp16

Note that the --fp16 flag requires CUDA 9.1 or greater and a Volta GPU or newer.

If you want to train the above model with big batches (assuming your machine has 8 GPUs):

  • add --update-freq 8 to simulate training on 8x8=64 GPUs
  • increase the learning rate; 0.0007 works well for big batches
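
The arithmetic behind the first bullet, as a rough sketch: with gradient accumulation, each optimizer step covers `GPUs x update-freq` batches, and each batch holds at most `--max-tokens` tokens. The variable names are illustrative; 3500 is the `--max-tokens` value from the training command above.

```shell
num_gpus=8        # GPUs on the machine, as assumed in the text above
update_freq=8     # value passed to --update-freq
max_tokens=3500   # value passed to --max-tokens (per GPU, per batch)

# Accumulating over update_freq batches on each GPU behaves like having
# num_gpus * update_freq GPUs that each process a single batch.
simulated_gpus=$((num_gpus * update_freq))

# Upper bound on the number of tokens contributing to one parameter update.
tokens_per_update=$((simulated_gpus * max_tokens))

echo "simulated GPUs: $simulated_gpus"
echo "max tokens per update: $tokens_per_update"
```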
5. Evaluate and generate the alignments (BPE level)
fairseq-generate \
    binarized --gen-subset test --print-alignment \
    --source-lang en --target-lang de \
    --path checkpoints/checkpoint_best.pt --beam 5 --nbest 1
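
A sketch of pulling the alignments out of the generation log, assuming the alignment lines are prefixed with `A-<sentence id>` followed by a tab and the `i-j` pairs, and that sentences are not printed in corpus order. Check your own output first, since the exact format may differ between fairseq versions. The file `toy_gen/gen.out` is a hand-written stand-in for the redirected output of the command above.

```shell
# Hand-written stand-in for: fairseq-generate ... > gen.out
# (line format assumed; fields are tab-separated).
mkdir -p toy_gen
printf 'S-1\tgood morning\nH-1\t-0.2\tguten morgen\nA-1\t0-0 1-1\n' > toy_gen/gen.out
printf 'S-0\thello world\nH-0\t-0.1\thallo welt\nA-0\t0-1 1-0\n' >> toy_gen/gen.out

# Keep only alignment lines, strip the "A-" prefix, restore corpus order
# by sorting numerically on the sentence id, then drop the id column.
grep '^A-' toy_gen/gen.out \
    | sed 's/^A-//' \
    | sort -n -k1,1 \
    | cut -f2 > toy_gen/test.align

cat toy_gen/test.align
```

The resulting file has one line of BPE-level `i-j` pairs per test sentence, in the order of the original test set.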
6. Other resources.

The code for

  1. preparing alignment test sets
  2. converting BPE-level alignments to token-level alignments
  3. symmetrizing bidirectional alignments
  4. evaluating alignments using the AER metric

can be found here.
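
To illustrate what the second item involves, here is a minimal sketch of a BPE-to-token conversion. It is not the code referred to above. It assumes subwords that continue a word end in `@@` (the fastBPE convention), that the alignment file holds zero-based `source-target` subword index pairs, and that duplicate word-level links should be collapsed. All file names under `toy_bpe/` are made up.

```shell
# Toy BPE-level data (illustrative only).
mkdir -p toy_bpe
printf 'hel@@ lo world\n' > toy_bpe/test.en
printf 'hal@@ lo wel@@ t\n' > toy_bpe/test.de
printf '0-0 1-1 2-2 2-3\n' > toy_bpe/bpe.align

awk -v src=toy_bpe/test.en -v tgt=toy_bpe/test.de '
# Fill map[i] with the word index of subword i. A subword ending in "@@"
# continues into the next subword, so the word counter is not advanced.
function word_index(line, map,    parts, n, i, w) {
    n = split(line, parts, " ")
    w = 0
    for (i = 1; i <= n; i++) {
        map[i - 1] = w
        if (parts[i] !~ /@@$/) w++
    }
}
{
    # Read the matching source and target sentences for this alignment line.
    getline s < src
    getline t < tgt
    delete smap; delete tmap; delete seen
    word_index(s, smap)
    word_index(t, tmap)

    # Map each subword link to a word link, keeping the first occurrence only.
    out = ""
    for (i = 1; i <= NF; i++) {
        split($i, p, "-")
        link = smap[p[1]] "-" tmap[p[2]]
        if (!(link in seen)) {
            seen[link] = 1
            out = out (out == "" ? "" : " ") link
        }
    }
    print out
}' toy_bpe/bpe.align > toy_bpe/tok.align

cat toy_bpe/tok.align
```

Here the four subword links collapse into two word links, because `hel@@ lo` and `hal@@ lo` each form one word, as do `wel@@ t` on the target side.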

Citation

@inproceedings{garg2019jointly,
  title = {Jointly Learning to Align and Translate with Transformer Models},
  author = {Garg, Sarthak and Peitz, Stephan and Nallasamy, Udhyakumar and Paulik, Matthias},
  booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  address = {Hong Kong},
  month = {November},
  url = {https://arxiv.org/abs/1909.02074},
  year = {2019},
}