# <img src="fairseq_logo.png" width="30"> Introduction
Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling and other text generation tasks.
### What's New:
- November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example)
- November 2019: [CamemBERT model and code released](examples/camembert/README.md)
- November 2019: [BART model and code released](examples/bart/README.md)
- November 2019: [XLM-R models and code released](examples/xlmr/README.md)
- September 2019: [Nonautoregressive translation code released](examples/nonautoregressive_translation/README.md)
- August 2019: [WMT'19 models released](examples/wmt19/README.md)
- July 2019: fairseq relicensed under MIT license
- July 2019: [RoBERTa models and code released](examples/roberta/README.md)
- June 2019: [wav2vec models and code released](examples/wav2vec/README.md)
### Features:
Fairseq provides reference implementations of various sequence-to-sequence models, including:
- **Convolutional Neural Networks (CNN)**
  - [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)
  - [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
  - [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
  - [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
  - [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
- **LightConv and DynamicConv models**
  - [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
- **Long Short-Term Memory (LSTM) networks**
  - Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
- **Transformer (self-attention) networks**
  - Attention Is All You Need (Vaswani et al., 2017)
  - [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
  - [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
  - [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/transformer_lm/README.md)
  - [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
  - [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
  - [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
  - [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)
- **Non-autoregressive Transformers**
  - Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
  - Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)
  - Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 2019)
  - Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
  - [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)
**Additionally:**
- multi-GPU (distributed) training on one machine or across multiple machines
- fast generation on both CPU and GPU with multiple search algorithms implemented:
- beam search
- Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
- sampling (unconstrained, top-k and top-p/nucleus)
- large mini-batch training even on a single GPU via delayed updates
- mixed precision training (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores)); see the training sketch after this list
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
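Several of these features come together on the `fairseq-train` command line. The invocation below is a hypothetical sketch, not a tuned recipe: the data directory, architecture and hyperparameters are placeholders, and it assumes a dataset already binarized with `fairseq-preprocess`:

```bash
# Hypothetical example (placeholder data directory and hyperparameters):
#   --update-freq 16  accumulates gradients over 16 mini-batches before each
#                     update (delayed updates), emulating large-batch training
#   --fp16            enables mixed precision training on tensor-core GPUs
fairseq-train data-bin/my-corpus \
    --arch transformer --optimizer adam --lr 0.0005 --max-tokens 4096 \
    --update-freq 16 --fp16
```

Single-machine multi-GPU training uses all visible GPUs by default; multi-machine setups go through the distributed training options described in the documentation.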
We also provide [pre-trained models for translation and language modeling](#pre-trained-models-and-examples)
with a convenient `torch.hub` interface:
```python
import torch

# Load an English-to-German Transformer from the fairseq hub
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'
```
See the PyTorch Hub tutorials for [translation](https://pytorch.org/hub/pytorch_fairseq_translation/)
and [RoBERTa](https://pytorch.org/hub/pytorch_fairseq_roberta/) for more examples.
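The hub interface also exposes the search strategies listed above. In this sketch the keyword arguments mirror `fairseq-generate`'s `--sampling`, `--sampling-topk` and `--sampling-topp` options; the exact pass-through behavior of the hub wrapper is an assumption here, not official documentation:

```python
# Sampling instead of beam search; kwargs mirror fairseq-generate options
en2de.sample('Hello world', sampling=True, sampling_topk=10)   # top-k sampling
en2de.sample('Hello world', sampling=True, sampling_topp=0.9)  # top-p/nucleus sampling
```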
![Model](fairseq.gif)
# Requirements and Installation
* [PyTorch](http://pytorch.org/) version >= 1.2.0
* Python version >= 3.6
* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)
* **For faster training** install NVIDIA's [apex](https://github.com/NVIDIA/apex) library with the `--cuda_ext` option
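The exact apex install command changes over time; at the time of writing, apex's own README suggests building the CUDA extensions roughly as follows:

```bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```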
To install fairseq:
```bash
pip install fairseq
```

On macOS:
```bash
CFLAGS="-stdlib=libc++" pip install fairseq
```
If you use Docker, make sure to increase the shared memory size, either with
`--ipc=host` or `--shm-size`, as command line options to `nvidia-docker run`.
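For example (the image name below is a placeholder):

```bash
# Give the container a host-sized shared memory segment for data loader workers
nvidia-docker run --ipc=host -it --rm my-fairseq-image bash
# ...or set an explicit size instead:
nvidia-docker run --shm-size=8g -it --rm my-fairseq-image bash
```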
**Installing from source**

To install fairseq from source and develop locally:
```bash
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
```
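A quick sanity check that the editable install is picked up (the `__version__` attribute is defined in fairseq's package `__init__`):

```bash
python -c "import fairseq; print(fairseq.__version__)"
```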
# Getting Started

The [full documentation](https://fairseq.readthedocs.io/) contains instructions
for getting started, training new models and extending fairseq with new model
types and tasks.
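As a taste of those extension points, the sketch below registers a custom criterion so it can be selected with `--criterion my_nll`. The registry decorator and base class are fairseq's; the criterion name and the plain NLL loss are hypothetical, shown only to illustrate the plugin shape (it closely follows fairseq's built-in cross-entropy criterion):

```python
import torch.nn.functional as F

from fairseq.criterions import FairseqCriterion, register_criterion


@register_criterion('my_nll')  # hypothetical name; selected via --criterion my_nll
class MyNLLCriterion(FairseqCriterion):
    """Token-level negative log-likelihood, for illustration only."""

    def forward(self, model, sample, reduce=True):
        # Run the model on the batch and normalize its output to log-probs
        net_output = model(**sample['net_input'])
        lprobs = model.get_normalized_probs(net_output, log_probs=True)
        lprobs = lprobs.view(-1, lprobs.size(-1))
        target = model.get_targets(sample, net_output).view(-1)
        # Sum NLL over non-padding target tokens
        loss = F.nll_loss(
            lprobs, target,
            ignore_index=self.padding_idx,
            reduction='sum' if reduce else 'none',
        )
        sample_size = sample['ntokens']
        logging_output = {
            'loss': loss.data.item() if reduce else loss.data,
            'ntokens': sample['ntokens'],
            'sample_size': sample_size,
        }
        return loss, sample_size, logging_output
```

Models, tasks, optimizers and learning rate schedulers follow the same pattern with their respective `register_*` decorators.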
# Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for the tasks listed below,
as well as example training and evaluation commands.

- [Translation](examples/translation/README.md): convolutional and transformer models are available
- [Language Modeling](examples/language_model/README.md): convolutional and transformer models are available
- [wav2vec](examples/wav2vec/README.md): the wav2vec large model is available
We also have more detailed READMEs to reproduce results from specific papers:

- [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)
- [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)
- [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
- [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
- [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
- [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
- [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
- [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
- [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
- [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
- [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
- [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)
# Join the fairseq community
* Facebook page: https://www.facebook.com/groups/fairseq.users
* Google group: https://groups.google.com/forum/#!forum/fairseq-users
# License
fairseq(-py) is MIT-licensed.
The license applies to the pre-trained models as well.
# Citation
Please cite as:
```bibtex
@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}
```