Commit Graph

309 Commits

Author SHA1 Message Date
Myle Ott
fa113ff1de Add test for activation checkpointing (#1453)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1453

Test Plan: Imported from OSS

Reviewed By: sshleifer

Differential Revision: D25108463

Pulled By: myleott

fbshipit-source-id: 3cebce9be7fe503401eabba3f483c26847e7a3c0
2020-11-20 12:42:33 -08:00
Myle Ott
94f59bb67b Remove unused train_masked_language_model helper (#1452)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1452

Test Plan: Imported from OSS

Reviewed By: lematt1991

Differential Revision: D25108462

Pulled By: myleott

fbshipit-source-id: 3c17a9937a4c3edb69f64130dfd866c5f42a4aaf
2020-11-20 12:42:29 -08:00
Yuqing Tang
d7dd683b3b Add option to skip virtual epoch
Summary:
The current translation_multi_simple_epoch adds an extra layer of virtual-epoch abstraction in order to load part of the data and start training earlier. However, for smaller datasets this is not necessary.

This diff skips the virtual epoch layer if --virtual-epoch-size is not specified.
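The intended behavior can be sketched in plain Python (a minimal illustration with a hypothetical `build_epochs` helper; this is not the actual fairseq data-loading code):

```python
def build_epochs(dataset, virtual_epoch_size=None):
    """Split a dataset into virtual epochs, or return it whole.

    If virtual_epoch_size is None (i.e. --virtual-epoch-size was not
    specified), skip the virtual-epoch layer and treat the full dataset
    as one epoch. Otherwise, chunk the dataset so training can start
    after loading only the first chunk.
    """
    if virtual_epoch_size is None:
        return [dataset]  # one real epoch, no extra indirection
    return [
        dataset[i:i + virtual_epoch_size]
        for i in range(0, len(dataset), virtual_epoch_size)
    ]

data = list(range(10))
assert build_epochs(data) == [data]                        # layer skipped
assert [len(c) for c in build_epochs(data, 4)] == [4, 4, 2]
```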

Reviewed By: pipibjc

Differential Revision: D24962835

fbshipit-source-id: 7de4293a6996ed075a1ed0c1ff2de94c8ae3df14
2020-11-16 14:39:57 -08:00
Myle Ott
0a848245f3 Add Truncated BPTT example + TransformerXL (#1410)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1410

Test Plan:
- reproduced Transformer-XL results (see README)
- added integration test

Reviewed By: jingfeidu

Differential Revision: D24928966

Pulled By: myleott

fbshipit-source-id: 86376c17ab24d37e72e7c097b6dcec71b1a087a7
2020-11-15 19:47:42 -08:00
alexeib
b58f4f017e end to end hydra configs (#1393)
Summary:
this adds a hydra_train binary that uses hydra configs/command line overrides instead of argparse

use case 1: built in configs + overrides from command line

```
python fairseq_cli/hydra_train.py distributed_training.distributed_world_size=1 dataset.batch_size=2 task.data=/private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/ model=transformer_lm/transformer_lm_gpt task=language_modeling optimization.max_update=5000
```

use case 2: use an external config file in place of the bundled configs (dataclass defaults still work)

```
python fairseq_cli/hydra_train.py --config-path ~/fairseq-py-dev/lm --config-name wiki103
```

the config file contains this:

```
# @package _group_

model:
  _name: transformer_lm
distributed_training:
  distributed_world_size: 1
dataset:
  batch_size: 2
task:
  _name: language_modeling
  data: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/
  add_bos_token: false
  max_target_positions: 1024
optimization:
  max_update: 50000
  lr: [ 0.25 ]
criterion: cross_entropy
optimizer: adam
lr_scheduler:
  _name: cosine
```

use case 3: use an external config directory that provides additional configs for e.g. models

```
python fairseq_cli/hydra_train.py distributed_training.distributed_world_size=1 dataset.batch_size=2 task.data=/private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/ model=transformer_lm/2_layers task=language_modeling optimization.max_update=5000 --config-dir ~/fairseq-py-dev/lm/hydra
```

where ~/fairseq-py-dev/lm/hydra has the following structure:

- model
  - transformer_lm
    - 2_layers.yaml

and inside 2_layers.yaml is a copy of transformer_lm_gpt.yaml but with decoder_layers set to 2

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1393

Reviewed By: myleott

Differential Revision: D24722252

Pulled By: alexeib

fbshipit-source-id: 758ea431fa099cd7c0e4daf41eff680df1d3b841
2020-11-04 18:20:12 -08:00
Yuqing Tang
de859692ff Enable translation_multi_simple_epoch to have different source and target dictionaries
Summary: In the past, we always used a shared dictionary for multilingual experiments. This diff re-enables different dictionaries for source and target languages by changing the assertion criteria and reverting to using language-specific source_dict and target_dict.

Reviewed By: chtran

Differential Revision: D24637682

fbshipit-source-id: a982e4f1e48395cc5bf10dc03b98fbe970062f8d
2020-10-30 18:25:25 -07:00
Myle Ott
a4356b1da2 Simplify --user-dir and require user-dir module name to be globally unique (#2815)
Summary:
This PR reverts recent changes that attempted to make `--user-dir` work with non-unique module names. But that new approach introduced other issues (e.g., poor compatibility with multiprocessing and Windows), so let's revert to the previous simpler implementation.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2815

Reviewed By: alexeib

Differential Revision: D24611571

Pulled By: myleott

fbshipit-source-id: cecfe28395585ca0401f844f10bd0d49d014c4d8
2020-10-29 17:08:20 -07:00
Myle Ott
1bc83c703a Misc fixes (#2786)
Summary:
- Rename type -> key in fairseq/tasks/sentence_prediction.py (fixes https://github.com/pytorch/fairseq/issues/2746)
- Update preprocessing docs (fixes https://github.com/pytorch/fairseq/issues/2565)
- Turn off logging in test_fp16_optimizer.TestGradientScaling
- Documentation updates
- Remove some unused code
- Fix noisychannel example (fixes https://github.com/pytorch/fairseq/issues/2213)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2786

Reviewed By: shruti-bh

Differential Revision: D24515146

Pulled By: myleott

fbshipit-source-id: 86b0f5516c57610fdca801c60e58158ef052fc3a
2020-10-27 11:26:07 -07:00
alexeib
3b27ed7996 Enable Hydra configs in fairseq (#1343) (#1510)
Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1510

this is the main pr that switches on hydra functionality in fairseq

we migrate the "args" object into an omegaconf "DictConfig" at all legacy entry points

in addition, this migrates various components from secondary registries (like bpe encoders and tokenizers) to make the migration smoother

i am going through code that references migrated fairseq components and changing it to inherit from "Legacy*" components instead. hopefully tests will catch most of this

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1343

Reviewed By: myleott

Differential Revision: D23973928

Pulled By: alexeib

fbshipit-source-id: dd9554981fff51ea75c1ff343874d1d6e61793c9
2020-10-20 00:32:26 -07:00
Myle Ott
9b8b464070 Package config and examples with fairseq (#1356)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1356

Reviewed By: alexeib

Differential Revision: D24385688

Pulled By: myleott

fbshipit-source-id: 72c4a702d93d2854a6409d42913d7413207cb61e
2020-10-19 09:24:04 -07:00
Myle Ott
a48f235636 Apply black+isort (#1357)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1357

Reviewed By: alexeib

Differential Revision: D24377772

fbshipit-source-id: 51581af041d42d62166b33a35a1a4228b1a76f0c
2020-10-18 18:14:51 -07:00
Myle Ott
2d900bf308 Fix tests (#1352)
Summary:
We need to keep `--num-workers=0` during tests

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1352

Reviewed By: alexeib

Differential Revision: D24375411

Pulled By: myleott

fbshipit-source-id: 9975ed5405f3b19b4dd0877ca15ee3081b185942
2020-10-16 17:36:13 -07:00
Armen Aghajanyan
f2fa07106c RXF OS Implementation (#2455)
Summary:
## What does this PR do?
Implements R3F and R4F coming from Facebook Research: https://arxiv.org/abs/2008.03156

This code was used to generate all the results from the paper excluding probing results.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2455

Reviewed By: myleott

Differential Revision: D23444863

Pulled By: AkshatSh

fbshipit-source-id: b724a6d6cc9cebfdb4bd219828afbb5679f2259b
2020-10-16 14:32:12 -07:00
Xian Li
573c2f4b60 Opensource code for Deep Transformer with Latent Depth (#2703)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Opensource code for Deep Transformer with Latent Depth (https://arxiv.org/pdf/2009.13102.pdf).

New features and design choices made:

- New feature: allow the non-residual block to be weighted by a sample z (generated per batch) instead of `x = residual + x`.
- Design choice: move `x = residual + x` in transformer_layer.py into a function that the subclass (with latent depth) can override to `x = residual + z*x`.

- New feature: allow TransformerEncoder or TransformerDecoder to have additional logits parameters which generate the samples z.
- Design choice: added subclasses LatentTransformerEncoder and LatentTransformerDecoder, which have additional attributes for the logits parameters and instantiate the corresponding LatentTransformerEncoderLayer and LatentTransformerDecoderLayer.

- New feature: allow multilingual_translation task to train with latent depth (results in the paper).
- Design choice:
  - added additional arguments in the multilingual_translation task.
  - added option for multilingual_transformer to use LatentTransformerEncoder and LatentTransformerDecoder besides standard TransformerEncoder.
  - added option in multilingual_translation task's `train_step` to generate the samples z and compute the KL (and sparsity) loss per batch.
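The residual-connection design choice above can be sketched as follows (illustrative plain Python, not the actual transformer_layer.py code; the class and attribute names are assumptions):

```python
class TransformerLayer:
    """Base layer: the residual connection is isolated in one method."""

    def residual_connection(self, x, residual):
        return residual + x


class LatentTransformerLayer(TransformerLayer):
    """Latent-depth subclass: weight the non-residual branch by a
    per-batch sample z instead of always adding it in full."""

    def __init__(self, z):
        self.z = z  # sampled per batch from the layer-selection logits

    def residual_connection(self, x, residual):
        return residual + self.z * x


base = TransformerLayer()
skipped = LatentTransformerLayer(z=0.0)  # z == 0 -> layer output suppressed
assert base.residual_connection(5.0, 2.0) == 7.0
assert skipped.residual_connection(5.0, 2.0) == 2.0
```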

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙂

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2703

Reviewed By: myleott

Differential Revision: D24155059

Pulled By: xianxl

fbshipit-source-id: f3e41639429f9664ec5565839709aa857a643668
2020-10-15 09:26:05 -07:00
Changhan Wang
1d1c145387 speech-to-text OSS
Summary:
Imported from https://github.com/fairinternal/fairseq-py/pull/1284. Updated according to PR comments.

Main changes:
* New task: `fairseq.tasks.speech_to_text`
  * Multilingual support: multiple train sub-splits, temperature-based sampling, language ID tokens
* New dataset: `fairseq.data.audio.speech_to_text_dataset`
* Added accuracy metrics and BOS prefix removal to label smoothed cross entropy
* New models: Transformer (`fairseq.models.speech_to_text.s2t_transformer`) and BLSTM (`fairseq.models.speech_to_text.berard`)
* Extended scorers:
  * Added a base scorer class: `fairseq.scorers.BaseScorer` (the parent class for all scorers except the BLEU scorer in CPP)
  * Added an evaluation tokenizer: `fairseq.scorers.eval_tokenizer` which leverages sacreBLEU's built-in tokenizers and allows character-level tokenization as well as punctuation removal (for WER scoring).
  * Added chrF scorer: `fairseq.scorers.chrf`
* Online Mel-filter bank speech feature extraction (via CPP-based pyKaldi or Python-based TorchAudio): `fairseq.data.audio.audio_utils`
* Online speech feature transforms: `fairseq.data.audio.feature_transforms.*`
* Fixed the subsampled sequence lengths in VGGTransformer (`examples.speech_recognition.models.vggtransformer`)
* Examples under `examples/speech_to_text`:
  * LibriSpeech (ASR): better results than VGGTransformer with smaller Transformer-based models
  * MuST-C (ST): comparable to [SOTA results](https://arxiv.org/pdf/2004.10234.pdf) but with fewer tricks

Reviewed By: jmp84

Differential Revision: D24065273

fbshipit-source-id: 5f842ca9c826f92d4af660705611885fe440a9ab
2020-10-14 12:30:05 -07:00
alexeib
e3c4282551 remove max_sentences from args, use batch_size instead (#1333)
Summary:
now that we are moving to dataclasses to define the fairseq configuration, having aliases for options is no longer practical. this pr removes the "max-sentences" argument while keeping its alias "batch-size", which is more appropriate

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1333

Reviewed By: shruti-bh

Differential Revision: D24121305

Pulled By: alexeib

fbshipit-source-id: 34343cea54c8f2c8b059c38ef9f29b66e76df9fb
2020-10-05 19:09:01 -07:00
Myle Ott
7c292af66f Fix hub (#2687)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2687

Reviewed By: alexeib

Differential Revision: D24095130

Pulled By: myleott

fbshipit-source-id: 7d371bccb550ec68b2b9b39dfa4c0718356508d6
2020-10-02 19:02:01 -07:00
Seppo Enarvi
c049749c7a Fix full-context alignment with transformer_align model (#2675)
Summary:
Fixes https://github.com/pytorch/fairseq/issues/2673.

# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2673 (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙂

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2675

Reviewed By: ngoyal2707

Differential Revision: D24001793

Pulled By: myleott

fbshipit-source-id: 6b4e9270e5f5a31ba1b65ae2ae717019108af913
2020-10-01 12:37:16 -07:00
Myle Ott
caea771afa Fix tests (#2670)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2670

Reviewed By: ngoyal2707

Differential Revision: D23982491

Pulled By: myleott

fbshipit-source-id: 629b791d6c05dd67b63dcc2da0313c6799f777f8
2020-09-29 07:27:56 -07:00
Myle Ott
a524832d1d Publish Linformer to public fairseq
Summary: Initial open source release for Linformer

Reviewed By: madian9

Differential Revision: D22771263

fbshipit-source-id: bf08c64c5ecb899db9da00b79d09f6308347c915
2020-09-28 15:32:20 -07:00
Seppo Enarvi
3b7d85c91f Transformer with integrated pointer-generator network (#2529)
Summary:
This pull request implements a variant of the Transformer model that uses an attention distribution for pointing to input words. The attention distribution over the input words is interpolated with the normal output distribution over the vocabulary words, as in [See et al. (2017)](https://arxiv.org/abs/1704.04368). This allows the model to generate words that appear in the input, even if they don't appear in the vocabulary, helping especially with small vocabularies.

The mechanism for copying out-of-vocabulary words from the input has been implemented differently to See et al. In their [implementation](https://github.com/abisee/pointer-generator) they convey the word identities through the model in order to be able to produce out-of-vocabulary words. We wanted to minimize changes to the Fairseq code base and took a different approach, which I'll describe below. The entire implementation is contained in one file (plus there's one new test).

Copying out-of-vocabulary words is possible by pre-processing the input and post-processing the output. The user may add special words to the end of the vocabulary that can be used in place of `<unk>` tokens to identify different input positions (e.g. `<unk-0>`, `<unk-1>`, `<unk-2>`, ...). The number of these special words is given to the model with the `--source-position-markers` argument; the model simply maps all of these to the same word embedding as `<unk>`. With simple post-processing, the user can retrieve the word at position N in the original text and use it in place of `<unk-N>`.
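The post-processing step could look like this (a hypothetical `restore_oov_words` helper following the `<unk-N>` convention described above; this is not code from the PR):

```python
import re

def restore_oov_words(hypothesis, source_tokens):
    """Replace each <unk-N> marker with the source word at position N."""
    def replace(match):
        position = int(match.group(1))
        return source_tokens[position]
    return re.sub(r"<unk-(\d+)>", replace, hypothesis)

# "Transformator" is out of vocabulary; the model emitted <unk-1> for it.
source = "der Transformator speist das Netz".split()
output = "the <unk-1> feeds the grid"
assert restore_oov_words(output, source) == "the Transformator feeds the grid"
```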

I didn't find a good place to document this usage of this model, so let me know if you think I should improve documentation somewhere.

This feature has not yet been discussed via a GitHub issue, but I'll open a new issue for discussion.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2529

Reviewed By: ngoyal2707

Differential Revision: D23398430

Pulled By: myleott

fbshipit-source-id: f2f26c8ce8802ae6cf95515637660348ff3fc457
2020-09-25 08:29:10 -07:00
Mu Tian
42c5dcbd18 hydra fairseq 3 - inherit from legacy for fairseq classes
Summary: hydra fairseq 3 - inherit from legacy for fairseq classes

Reviewed By: alexeib

Differential Revision: D23375457

fbshipit-source-id: ef9d19f2d02f2326eea44a70f1f6e1668b420840
2020-09-09 17:02:13 -07:00
Alex Xiao
e171c8d86a Account for checkpoint updates when calling take on CountingIterator
Summary:
Recently some of our runs are getting:

"RuntimeError: Mismatch between actual and expected iterable length. Please report this to the fairseq developers."

f214567466

We never ran into this before because this is a new check by fairseq to be more strict with iterators.

Fix is to:

1. Account for the offset (i.e. load from checkpoint mid epoch) when propagating `take`. This fixes the issue of `next` returning too many things, which is what causes the error.

2. Update the underlying iterator when calling `take` on `BufferedIterator` and the length of the `BufferedIterator`. Although this doesn't cause the error, it is necessary to maintain consistency.

Reviewed By: myleott

Differential Revision: D23443012

fbshipit-source-id: 73c26db8392e5508a61acfda7ca40a24df89fabb
2020-09-04 14:26:53 -07:00
Yuqing Tang
0cde6b4e50 Added shared dictionary check for translation_multi_simple_epoch task.
Summary: The translation_multi_simple_epoch task supports only a shared dictionary across all languages, so this diff adds that check to the task setup.

Reviewed By: pipibjc

Differential Revision: D23288388

fbshipit-source-id: 4236a096bcb75429b486ef8a9244e3ef0d5095f0
2020-08-28 10:11:30 -07:00
Alex Xiao
49940c8d25 fix mismatch length of counting iterator when truncated
Summary:
PySpeech integration training tests have recently been stuck at end of epoch.

Digging into it, it looks like this is because the end of epoch check relies on this (https://fburl.com/diffusion/xt09z6n9):

```
def end_of_epoch(self) -> bool:
     """Returns whether the most recent epoch iterator has been exhausted"""
     return not self._cur_epoch_itr.has_next()
```

which is implemented like this in CountingIterator:

```
def has_next(self):
    """Whether the iterator has been exhausted."""
    return self.n < len(self)
```

It seems like D23172408 (110f9f0cc7) modified CountingIterator such that `len(self) > len(iter(self))` when `take()` is used. This mismatch causes `has_next` to return `True` for some PySpeech processes even when all elements of `iter(self)` have been consumed, causing training to get stuck.

My proposed fix is to remove the `self.early_stop`  variable and just directly modify `self.total` and `self.iterable`, ensuring `len(self) == len(iter(self))`
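The proposed fix can be illustrated with a stripped-down CountingIterator (a sketch, not fairseq's actual class): `take()` truncates `self.total` and the underlying iterable together, so `len(self)` always matches the number of elements that will actually be yielded.

```python
import itertools

class CountingIterator:
    """Wraps an iterable and counts consumed elements."""

    def __init__(self, iterable, total=None):
        self.iterable = iter(iterable)
        self.total = total if total is not None else len(iterable)
        self.n = 0  # number of elements consumed so far

    def __len__(self):
        return self.total

    def __iter__(self):
        for item in self.iterable:
            self.n += 1
            yield item

    def has_next(self):
        return self.n < len(self)

    def take(self, n):
        # Truncate total AND the underlying iterable in lockstep, so
        # len(self) stays equal to n consumed + elements remaining.
        # Subtracting self.n accounts for a mid-epoch offset (e.g.
        # resuming from a checkpoint).
        self.total = min(self.total, n)
        remaining = max(self.total - self.n, 0)
        self.iterable = itertools.islice(self.iterable, remaining)

it = CountingIterator(range(10))
it.take(3)
assert list(it) == [0, 1, 2]
assert not it.has_next()  # no stale len(self) > consumed mismatch
```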

Reviewed By: myleott

Differential Revision: D23250734

fbshipit-source-id: efb5a38216783bded67f501135b2f68b9246b9dd
2020-08-20 20:08:38 -07:00
Matt Post
bd1b35d9b7 Added constrained decoding (#1536) (#2402)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

This PR implements constrained decoding ([Hokamp & Liu, 2017](https://www.aclweb.org/anthology/P17-1141/); [Post & Vilar, 2018](https://www.aclweb.org/anthology/N18-1119/)) with vectorization for batching ([Hu et al., 2019](https://www.aclweb.org/anthology/N19-1090/)). In addition, it adds *ordered constraints*, where the constraints are generated on the target side in order, with zero or more unconstrained tokens in between. This variant allows for optimizations that increase speed and BLEU scores (when testing with random scraps from the references).

### Usage and quick start

It works with `fairseq-interactive` via a new command-line option: `fairseq-interactive --constraints [ordered,unordered]`, defaulting to `ordered` if nothing is provided. When active, it will split lines from STDIN on `\t`, with separate constraints each separated by a tab. For example (after downloading the [Fairseq WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md)):

```bash
echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\tinfluence" \
  | normalize.py \
  | tok.py \
  | PYTHONPATH=$HOME/code/fairseq-constraints fairseq-interactive $modeldir \
      --bpe fastbpe \
      --bpe-codes $modeldir/bpecodes \
      --constraints \
      --constraints-both \
      -s de -t en \
      --path $modeldir/model1.pt \
      --max-tokens 1000 \
      --beam 5
```

(`normalize.py` is https://gist.github.com/mjpost/4c54446b7030d7c64b57461d27090650 and `tok.py` is https://gist.github.com/mjpost/ed7456f6a987c533102fc121678ed302.)

Adding the `--constraints-both` option causes it to batch-decode the input sentence both with and without the constraints. When run with the Fairseq WMT19 German--English model, the following results are produced (here run on a CPU, don't be alarmed by the times!)

```text
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
C-0     hard
C-0     influence
H-0     -1.5333266258239746     Mach@@ ine trans@@ lation is hard to influence .
D-0     -1.5333266258239746     Machine translation is hard to influence .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.8031 -0.1701 -11.7727 -0.1815 -0.1511
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
H-0     -0.3731671869754791     Mach@@ ine trans@@ lation is difficult to control .
D-0     -0.3731671869754791     Machine translation is difficult to control .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.1430 -0.1665 -0.8482 -0.1678 -0.1514
2020-07-31 12:17:55 | INFO | fairseq_cli.interactive | Total time: 12.803 seconds; translation time: 3.688
```

Note the new tags present in the output:

* `C-#` records active constraints (after applying preprocessing) for a sentence
* `W-#` reports the sentence-level translation time (a useful unrelated feature I hope you'll accept)

Some unit tests are written (`fairseq/test_constraints.py`) but not yet integrated. Advice here on where to place this is welcome. I also have not run this through lint; if someone can tell me the command to run, I'd appreciate it.

### Implementation notes

This is largely self-contained, implemented in a new `LexicallyConstrainedBeamSearch` class in `search.py`. It does require a few minimal hooks from `_generate()` in `sequence_generator.py`, to ensure that constraints are updated at each timestep. (Edit: most changes in that file are documentation clarifications, corrections, and updates). Unconstrained sentences that are intermingled with constrained ones will not incur any time penalty, so long as they do not occur in the same batch.

Addresses https://github.com/pytorch/fairseq/issues/1536.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙂

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2402

Reviewed By: alexeib

Differential Revision: D23188945

Pulled By: myleott

fbshipit-source-id: 9f5ed855f7a1dcf535b091c0ccf98b07fb9cbdd6
2020-08-20 11:59:53 -07:00
Myle Ott
adbd89fd4b Misc fixes (#2492)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2492

Reviewed By: ngoyal2707

Differential Revision: D23177728

Pulled By: myleott

fbshipit-source-id: 32424f61cab57f759f87e16e8d5144d3eed5ae36
2020-08-20 06:42:10 -07:00
Jun Ru Anderson
68c87f0abf optimize mixed precision (#1248)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Brings the multiply_factor optimization used in memory-efficient fp16 training to mixed precision training. The methods multiply_grads and clip_grad_norm do not touch each gradient; instead they adjust a "multiply factor" that is then factored in when unscaling gradients.
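The multiply-factor idea can be sketched without tensors (plain Python over lists of floats; a simplification of the optimizer logic with illustrative names, not fairseq's actual implementation):

```python
import math

class MultiplyFactorGrads:
    """Defer gradient scaling: multiply_grads / clip_grad_norm only
    adjust a scalar factor; the gradients themselves are touched once,
    when unscaling."""

    def __init__(self, grads):
        self.grads = grads
        self.factor = 1.0  # pending multiplier

    def multiply_grads(self, c):
        self.factor *= c  # O(1), no pass over the gradients

    def clip_grad_norm(self, max_norm):
        # The norm of the *effective* grads is factor * raw norm.
        norm = self.factor * math.sqrt(sum(g * g for g in self.grads))
        if norm > max_norm:
            self.multiply_grads(max_norm / norm)
        return norm

    def unscale(self):
        # Single pass applying the accumulated factor.
        self.grads = [g * self.factor for g in self.grads]
        self.factor = 1.0
        return self.grads

opt = MultiplyFactorGrads([3.0, 4.0])     # raw norm = 5.0
opt.multiply_grads(0.5)                   # effective norm = 2.5, still O(1)
assert opt.clip_grad_norm(max_norm=1.0) == 2.5
clipped = opt.unscale()                   # one pass; norm is now ~1.0
assert math.isclose(math.sqrt(sum(g * g for g in clipped)), 1.0)
```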

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙂

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1248

Reviewed By: myleott

Differential Revision: D23201396

Pulled By: andersonic

fbshipit-source-id: 6c6f64542893e0ecac72e132464bb334dcb9874d
2020-08-19 16:04:40 -07:00
Myle Ott
9831634946 Misc fixes (#2448)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2448

Reviewed By: ngoyal2707

Differential Revision: D23011193

Pulled By: myleott

fbshipit-source-id: 1a29481707108e4465aca78ec1581fb79f05efba
2020-08-14 10:24:51 -07:00
Yuqing Tang
0bb7bc3777 Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: add finetuning options
Summary:
A first version of the XLNMT multilingual project code release: Multilingual Training with multiple bitext

- Minor changes to
    - fairseq/checkpoint_utils.py to add a finetuning option instead of using restore_file, which would restore from the original model when the job is requeued.

Reviewed By: myleott

Differential Revision: D22483494

fbshipit-source-id: 733300fd6a4d185e561c793ea668047c96f616c6
2020-08-06 10:20:39 -07:00
Rakesh Chada
b040dae714 Fixes checkpoint_path while loading a model-parallel checkpoint (#2365)
Summary:
Fixes https://github.com/pytorch/fairseq/issues/2351

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2365

Reviewed By: pipibjc

Differential Revision: D22727384

Pulled By: myleott

fbshipit-source-id: e2ff703181a6b8f10df9b4ee7aa3f9e128c04b4e
2020-08-04 08:25:50 -07:00
Stanislau Hlebik
698e3b91ff remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:21:51 -07:00
Stanislau Hlebik
7ea5e3b341 remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:21:45 -07:00
Yuqing Tang
e52d071ee8 Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: new multilingual task
Summary:
A first version of the XLNMT multilingual project code release: Multilingual Training with multiple bitext

- A new task to glue all things together: fairseq/tasks/translation_multi_simple_epoch.py
- Minor changes to
    - fairseq/data/iterators.py to allow a dynamic batch sampler
    - fairseq/checkpoint_utils.py to add a finetuning option instead of using restore_file, which would restore from the original model when the job is requeued.

Reviewed By: pipibjc

Differential Revision: D22483484

fbshipit-source-id: 283b67e538508f330b0968609b7dae64d26bea05
2020-07-16 09:34:29 -07:00
m_fomicheva
2887663811 Implemented applying dropout at inference time (#2308)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2308

Implemented Monte Carlo dropout. Added README to reproduce the results from our paper
that applies this idea for unsupervised quality estimation of NMT (joint work of Facebook AI and the University of Sheffield):

Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia. Unsupervised Quality Estimation for Neural Machine Translation. Accepted to TACL

Retaining dropout at test time is not possible in the current code base. The statement
```
if not self.retain_dropout:
  model.eval()
```
in `SequenceGenerator` does not have any effect, since the model's `training` attribute is already set to False by the method `make_generate_fast_`, which is applied before `SequenceGenerator` is initialized in `generate.py`. `make_generate_fast_` throws an exception when trying to set `training` to True after its application. Also, if I am not mistaken, `self.training=True` can have other effects, so setting it to True only for the purpose of retaining dropout at test time might be confusing. I propose an alternative implementation where `retain_dropout` is an attribute of the FairseqModel class.
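The proposed alternative can be sketched in framework-agnostic form (illustrative toy classes, not the actual fairseq or PyTorch API):

```python
class Dropout:
    """Toy stand-in for a dropout module with an on/off switch."""
    def __init__(self):
        self.active = True

class Model:
    """retain_dropout as a model attribute: eval() leaves dropout on
    when the flag is set, so there is no need to flip the model's
    training mode back to True after make_generate_fast_."""

    def __init__(self, retain_dropout=False):
        self.retain_dropout = retain_dropout
        self.dropout = Dropout()

    def eval(self):
        if not self.retain_dropout:
            self.dropout.active = False

deterministic = Model()
deterministic.eval()
assert deterministic.dropout.active is False

mc = Model(retain_dropout=True)  # Monte Carlo dropout at test time
mc.eval()
assert mc.dropout.active is True
```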

# Before submitting

- [N] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [Y] Did you make sure to update the docs?
- [Y] Did you write any new necessary tests?

## What does this PR do?
New feature.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙂
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2151

Reviewed By: ngoyal2707

Differential Revision: D22048889

Pulled By: myleott

fbshipit-source-id: 0d0d4784a7314fc7a45b76341fd3b8232b3e2cf0
2020-07-08 13:06:13 -07:00
Mike Ruberry
320bf8cf96 Updates full to no longer use deprecated integer fill_value type inference
Summary:
In PyTorch 1.5, calling torch.full with an integer fill_value and no dtype or out kwarg was deprecated, and will soon throw a runtime error. In the future, torch.full will infer its dtype from the fill_value, and these calls would then produce integer, not float, tensors. This update maintains the current behavior.

Created from Diffusion's 'Open in Editor' feature.

Reviewed By: myleott

Differential Revision: D22161456

fbshipit-source-id: b5d687e4de83dba6e76cae6e61b5106bf5b320db
2020-06-22 11:56:58 -07:00
Joshua Meier
152a3fe143 Support residual connections in LSTM models (#1103)
Summary:
Adds support for residual connections in LSTM models.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1103

Reviewed By: myleott

Differential Revision: D21639942

Pulled By: joshim5

fbshipit-source-id: a02ddfe080a847fd91a9c6a5074cb6dc782f7727
2020-06-05 12:15:10 -07:00
Myle Ott
5abc774eea Re-enable test_transformer_fp16 GPU test
Reviewed By: theweiho

Differential Revision: D21890628

fbshipit-source-id: 4088884dd2a82a831f1c129e675eb233c469242a
2020-06-05 06:06:20 -07:00
Wei Ho
ea092c2aa6 Split out fairseq GPU tests & add new deeplearning_fairseq_gpu contbuild using remote execution
Reviewed By: myleott

Differential Revision: D21472387

fbshipit-source-id: efde278baf6a05e8a81a9630b44c7e7e7c7fe7fc
2020-06-03 18:53:35 -07:00
Marco Gaido
5453e4355b Avoid NaN in speech_recognition with input having only 1 spec… (#1864)
Summary:
…trogram

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1863.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙂
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1864

Reviewed By: yqwangustc

Differential Revision: D21663642

Pulled By: myleott

fbshipit-source-id: f411c5c01c7505375bec6d47554e85fb70877e9c
2020-05-27 07:50:34 -07:00
Myle Ott
803c0a6d11 Update iterators to support counting, rename CountingIterator.count -> n and add tests (#1166)
Summary:
A few changes here:
- update GroupedIterator and ShardedIterator to support counting. This will be useful on TPUs, since the TPU dataloading threads may advance faster than we can process them.
- add tests for the above
- in CountingIterator, rename `count` -> `n`. This is needed because `count` is overloaded for iterables (e.g., `list` defines a different `count` method, which is actually a search function).
- in CountingIterator, rename `override_len` -> `total` to be more consistent with other iterators (e.g., tqdm). This functionality was unused previously (it's only needed for TPUs), so the rename is easy.
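The renamed fields can be sketched in a minimal standalone counting iterator (a simplified illustration with `n` and `total`, not fairseq's actual implementation):

```python
import itertools

class CountingIterator:
    """Wrap an iterable and count consumed elements.

    `n` is the number of elements yielded so far; `total` optionally
    overrides the reported length (simplified sketch).
    """

    def __init__(self, iterable, start=0, total=None):
        self._itr = iter(iterable)
        self.n = start
        self.total = total if total is not None else start + len(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        if self.n >= self.total:
            raise StopIteration
        self.n += 1
        return next(self._itr)

    def __len__(self):
        return self.total

    def has_next(self):
        return self.n < self.total

    def skip(self, n):
        """Fast-forward n elements, e.g. to resume mid-epoch."""
        for _ in itertools.islice(self._itr, n):
            pass
        self.n += n
        return self
```

Because `n` tracks consumption independently of the wrapped iterable, wrappers like a grouped or sharded iterator can report how far the dataloader has advanced even when it runs ahead of processing.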
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1166

Reviewed By: ngoyal2707

Differential Revision: D21373525

Pulled By: myleott

fbshipit-source-id: 102f3d50ed1a5163a7d1216ca5a179564a05dfe4
2020-05-14 13:57:04 -07:00
Myle Ott
9a718e2985 Various fixes (#2127)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2127

Reviewed By: ngoyal2707

Differential Revision: D21550962

Pulled By: myleott

fbshipit-source-id: ddbe3f287f170862378e0702fc378a4fe400793a
2020-05-14 10:23:34 -07:00
Myle Ott
6209d7d6b2 Fix eval_lm (fixes #2083) and a few other small things (#2100)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2100

Reviewed By: ngoyal2707

Differential Revision: D21456309

Pulled By: myleott

fbshipit-source-id: 291711589fca9f158e0fdbf01194da3e66fbd0aa
2020-05-11 12:43:14 -07:00
Marco Gaido
11345a7608 Pass all net_inputs in SequenceGenerator (#2090)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2022.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2090

Reviewed By: cndn

Differential Revision: D21385984

Pulled By: myleott

fbshipit-source-id: 1428e02e625b8625df71a83c05dcf933c3f899df
2020-05-10 06:13:06 -07:00
Myle Ott
89d18af127 Cleanup transformer (#1160)
Summary:
This also fixes https://github.com/pytorch/fairseq/issues/2079
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1160

Reviewed By: ngoyal2707

Differential Revision: D21338290

Pulled By: myleott

fbshipit-source-id: 266bda0921a42b218127f83ab7aa8cc8282582cd
2020-05-04 07:16:30 -07:00
Myle Ott
7a6519f84f Bugfixes (#1159)
Summary:
Several bugfixes to get tests passing on OSS master
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1159

Reviewed By: ngoyal2707

Differential Revision: D21331993

Pulled By: myleott

fbshipit-source-id: 327ae19f6797f92b8c6083a49d5f5edb0872223e
2020-05-01 04:09:37 -07:00
Halil Akin
c748571417 Add a test for fp16 to fairseq
Reviewed By: myleott

Differential Revision: D21315518

fbshipit-source-id: df17efeec6fb2b576371b124d78e9294cef3e74c
2020-04-30 10:35:15 -07:00
Ning Dong
b1af3e33d5 Modify gated unit tests to fix Fairseq OSS (#2059)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2059

test_ensemble_sequence_generator and test_export_ensemble_model are green on fbcode master, but the PyTorch 1.5 release cut happened before the TorchScript fix, so updating the gate to 1.6.
Remove the quantization test from fairseq, as FBGEMM is bound on the OSS side. Will add the test back in fbtranslate, but land this first to fix OSS-side failures.

Reviewed By: myleott

Differential Revision: D21231873

fbshipit-source-id: 8a2ad7dbed118ca8e3f4c351c399a82fd9740445
2020-04-24 13:29:50 -07:00
Myle Ott
d502958b4d Fix LSTM LM unit tests (#2021)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2021

Reviewed By: cndn

Differential Revision: D21092383

Pulled By: myleott

fbshipit-source-id: c6074fe14cc977b3674d77c1c1bc8fb726108934
2020-04-21 10:48:37 -07:00
Angela Fan
1c8ab79ca5 quant noise code, readme, start of adding quantization (#1896)
Summary:
FUNCTIONALITY:
This diff provides two core pieces of functionality
- Adds training with quantization noise from "Training with Quantization Noise for Extreme Model Compression" - controlled by the "quant_noise" and "quant_noise_block_size" parameters. Added in embeddings, attention, FFN for BERT and Transformer LM training
- Adds quantization with product quantization based on code from "And the bit goes down: Revisiting the quantization of neural networks" (Stock et al, 2019). This is applied to a fairseq trained model to quantize after training.
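The quantization-noise idea above can be illustrated with a scalar sketch (a deliberately simplified stand-in for the block-wise fairseq implementation; function and parameter names here are hypothetical):

```python
import random

def apply_quant_noise(weights, p, scale=0.1, training=True, seed=0):
    """Quant-Noise sketch: during training, replace a random fraction p of
    weights with their quantized value round(w / scale) * scale, leaving
    the rest in full precision so gradients still flow through them.
    (Illustrative only -- not fairseq's block-wise implementation.)"""
    if not training or p <= 0:
        return list(weights)
    rng = random.Random(seed)
    out = []
    for w in weights:
        if rng.random() < p:
            # fake-quantize this weight for the current step
            out.append(round(w / scale) * scale)
        else:
            out.append(w)
    return out
```

At evaluation time (`training=False`) no noise is applied, matching test case 6 below.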

TODO:
-> Pierre, look at quantization code
-> int4 and int8 quantization will be added soon.

EVALUATED TEST CASES:

0. Training of LM and BERT models starts from scratch with no errors -> yes

1. Retrain LM from scratch with code, no quantization, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_noise

2. Reload previously trained LM from scratch, not trained with quant noise, reproduces Wikitext-103 LM results -> yes

3. Train LM from scratch with code, not trained with quant noise, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_baseline

4. Train BERT model from scratch with code, no quantization, training curve looks the same as before -> yes

5. Check wps during training and wps during inference, no large change from before -> yes

6. Check structured dropout isn't being applied at eval time -> yes

7. Works in combination with LayerDrop -> yes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1896

Reviewed By: myleott

Differential Revision: D20609420

Pulled By: huihuifan

fbshipit-source-id: 94468dd811c4caaaef46a9fab2b8d381f9d2b955
2020-04-21 09:28:56 -07:00
Marco Gaido
4ec169b988 Fix max_position resolution with tuples having len > 2 (#2028)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2027 .

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2028

Reviewed By: ngoyal2707

Differential Revision: D21134466

Pulled By: myleott

fbshipit-source-id: 070d7f971bc8d88ec1ca43d52797e2f0b07fb6af
2020-04-21 06:01:14 -07:00
Xianfeng Rui
57526c6343 Update Fairseq LSTM to jitable version (#2016)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2016

This updates the Fairseq LSTM to a jitable (TorchScript-compatible) version

Reviewed By: cndn

Differential Revision: D20937370

fbshipit-source-id: 26f677fcb58bbeaa507d303e9a81060ff78f0502
2020-04-16 15:49:56 -07:00
Nayan Singhal
89e75fa315 Fix BMUF using 1 GPU
Summary:
With 1 GPU, BMUF is no longer required; training proceeds like simple model training.

Also add a unit test for single-GPU BMUF.

Reviewed By: jay-mahadeokar

Differential Revision: D21033060

fbshipit-source-id: 9030187c05d49548222c8d1e2fe9534a6c6c4389
2020-04-16 11:25:35 -07:00
James Cross
c4697e83cb TorchScript support for AANTransformer
Summary: Moving `_test_save_and_load()` up to the top level for possible reuse across classes.

Reviewed By: cndn

Differential Revision: D20971566

fbshipit-source-id: b9d9c554d03f26cd43eee9f209e1c1367679af72
2020-04-10 18:23:50 -07:00
Ning Dong
b142b7d9ec Script _no_repeat_ngram in fb_simple_sequence_generator (#1963)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1963

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1128

It's a common issue that short inputs (< 5 tokens) get repeated due to default length constraint (max_len_a=1.1, max_len_b=5) https://fb.workplace.com/groups/2286753504877951/permalink/2674177509468880/.

In the future we want to use no_repeat_ngram to handle the issue. The functionality is in the sequence generator, but it needs to be scripted for production use.
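The interaction described above can be sketched as follows (hypothetical simplified logic, not the scripted fairseq implementation). With max_len_a=1.1 and max_len_b=5, a 3-token source allows up to int(1.1 * 3 + 5) = 8 output tokens, leaving room for repeats; ngram blocking then bans any token that would complete an already-generated n-gram:

```python
def calc_banned_tokens(prev_tokens, no_repeat_ngram_size):
    """Return the token ids that would complete an n-gram already present
    in prev_tokens (one hypothesis). Simplified sketch of ngram blocking."""
    n = no_repeat_ngram_size
    if len(prev_tokens) < n:
        return set()
    # map each (n-1)-gram prefix to the set of tokens that followed it
    seen = {}
    for i in range(len(prev_tokens) - n + 1):
        prefix = tuple(prev_tokens[i : i + n - 1])
        seen.setdefault(prefix, set()).add(prev_tokens[i + n - 1])
    # tokens following the current (n-1)-gram suffix are banned
    current_prefix = tuple(prev_tokens[len(prev_tokens) - n + 1 :])
    return seen.get(current_prefix, set())
```

In a generator loop, the banned tokens' log-probabilities would be set to -inf before selecting the next token.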

Reviewed By: liuchen9494

Differential Revision: D20801865

fbshipit-source-id: c3085f19921adb85415636d16ce31e3826642335
2020-04-10 14:44:42 -07:00
Ning Dong
08691f8d0b Support quantization in Fairseq Sequence generator
Summary: The fix in MHA is suggested by driazati, to avoid JIT compilation for if branch in MHA forward when in scripting. Without this quantization wouldn't work. Details in https://fb.workplace.com/groups/2240361332735959/permalink/626166461295703/

Reviewed By: jhcross

Differential Revision: D20881076

fbshipit-source-id: b50347b45cd7dbdef02ac7b71316ba734019f57e
2020-04-08 17:48:54 -07:00
Chen Liu
d37529ed23 Script reorder_incremental_state in fairseq baseline model (#1127)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1127

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1953

Script the `reorder_incremental_states` in the base FairseqModel
Remove the overriding scriptable `reorder_incremental_states` in the TransformerModel
Change the decoder_len, since len(Tuple) is supported in TorchScript

Relanded reverted diff D20797390

Reviewed By: myleott

Differential Revision: D20896200

fbshipit-source-id: cc4ae34f89f16007656cce6ec6f7e01b13899278
2020-04-07 15:01:31 -07:00
Chen Liu
1b749f4a34 Deprecate the SequenceGenerator with the scripted version (#1120)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1120

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1940

Deprecate the SequenceGenerator in Fairseq with the scripted version.

Pass all integration unit tests

- Copy ScriptSequenceGenerator to SequenceGenerator:
  - Modified forward_decoder to fix a bug when using adaptive_softmax in `get_prob_normalize` (marked with an inline comment)
  - Add support for other EnsembleModels as an input arg (marked with an inline comment)
- Add `FBEnsembleModelWithFork` to support fork/join in EnsembleModel
  - Add `test_fb_ensemble_model` to test the fork/join feature
  - Still have bugs in the fork/join feature when running through the Fairseq interface (like generation and interactive). Needs further investigation P128130029. cc cndn, jhcross
- Modified the SequenceGenerator initialization interface
- Clean up the code: delete unused functions `get_normalized_probs` and `_decode`

Reland reverted diff D20685075

Reviewed By: cndn

Differential Revision: D20895977

fbshipit-source-id: 424ee318e67d5d6ffed3edb92c7fa78485ba34af
2020-04-07 13:28:30 -07:00
Aapo Kyrola
966436403e Revert D20685075: Deprecate the SequenceGenerator with the scripted version
Differential Revision:
D20685075

Original commit changeset: 046b76874465

fbshipit-source-id: 7ec2a2ca3b90251a560e2323c22b52ec7436fecb
2020-04-07 00:59:53 -07:00
Aapo Kyrola
8a528888e4 Revert D20797390: Script reorder_incremental_state in fairseq baseline model
Differential Revision:
D20797390

Original commit changeset: ab29874973ad

fbshipit-source-id: efd2d720c96ee90d1e8dc36178e04f0bf5510278
2020-04-07 00:59:48 -07:00
Chen Liu
d369c88019 Script reorder_incremental_state in fairseq baseline model (#1127)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1127

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1953

Script the `reorder_incremental_states` in the base FairseqModel
Remove the overriding scriptable `reorder_incremental_states` in the TransformerModel
Change the decoder_len, since len(Tuple) is supported in TorchScript

Reviewed By: myleott

Differential Revision: D20797390

fbshipit-source-id: ab29874973adc5dbd556c591942a0e071c81fc52
2020-04-06 20:40:40 -07:00
Chen Liu
bc93681348 Deprecate the SequenceGenerator with the scripted version (#1120)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1120

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1940

Deprecate the SequenceGenerator in Fairseq with the scripted version.

Pass all integration unit tests

- Copy ScriptSequenceGenerator to SequenceGenerator:
  - Modified forward_decoder to fix a bug when using adaptive_softmax in `get_prob_normalize` (marked with an inline comment)
  - Add support for other EnsembleModels as an input arg (marked with an inline comment)
- Add `FBEnsembleModelWithFork` to support fork/join in EnsembleModel
  - Add `test_fb_ensemble_model` to test the fork/join feature
  - Still have bugs in the fork/join feature when running through the Fairseq interface (like generation and interactive). Needs further investigation P128130029. cc cndn, jhcross
- Modified the SequenceGenerator initialization interface
- Clean up the code: delete unused functions `get_normalized_probs` and `_decode`

Reviewed By: myleott

Differential Revision: D20685075

fbshipit-source-id: 046b76874465a70d8118a97ad670311c6ce1d1c8
2020-04-06 17:47:47 -07:00
Louis MARTIN
18831f9f83 Fix validation happening twice at the end of epoch (#1934)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes validation happening twice at the end of epoch after the refactor. Spotted by freewym here: b5dad3b7e0 (r38103577)

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1934

Reviewed By: myleott

Differential Revision: D20724205

Pulled By: louismartin

fbshipit-source-id: 8c26c39b9904508780e8542813797c8e1306ca80
2020-04-03 16:38:39 -07:00
Anchit Gupta
f6f092f489 Make TransformerDecoupled model scriptable (#1125)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1125

Pull Request resolved: https://github.com/pytorch/translate/pull/695

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1927

- Switches the model to the scripted sequence generator recently implemented in fairseq. Involved making the input/output format of this model conform to that of the fairseq TransformerEncoder/Decoder
- Modify the `EncoderOut` format for the fairseq transformer and add optional fields needed for the copy ptr decoder
- Switches to using WordEmbedding directly instead of the non-scriptable EmbeddingList for the src/trg embedding layer
- Small assorted syntactic changes to make it jit scriptable
- Adds a torchscriptify method for this model. Preliminary latency seems similar to the unexported model. Also verified that the outputs match
- Currently the Roberta decoupled model is not scriptable because the base TransformerSentenceEncoder it is based on is not scriptable. We can look at adding that later

Reviewed By: einolghozati

Differential Revision: D20687247

fbshipit-source-id: 8232972bba2f1b2df4100f3c1776b6bad08a71db
2020-04-01 17:53:49 -07:00
Myle Ott
f2ae57908b Fix tests (#1110)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1110

Reviewed By: ngoyal2707

Differential Revision: D20649232

Pulled By: myleott

fbshipit-source-id: 55bc18284ac792012aaa794d5102c877ff781f8c
2020-03-26 07:59:28 -07:00
James Cross
fd76cb5b41 TestEncoder to return type EncoderOut (#1894)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1894

Having a uniform return type for `FairseqEncoder` makes these test models function more similarly to real models.

Reviewed By: myleott, cndn

Differential Revision: D20596971

fbshipit-source-id: a744614c015af9b150f2b0ae8381b1368556f738
2020-03-23 16:10:02 -07:00
David Příhoda
42f65d6577 Support multiple regression targets in sentence prediction (#1831)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1830
Adds tests for RoBERTa (masked_lm, classification, single regression, multiple regression)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1831

Reviewed By: ngoyal2707

Differential Revision: D20446010

Pulled By: myleott

fbshipit-source-id: 9f37bcedf0910d85446245d71bc234bc74c62da5
2020-03-21 16:55:26 -07:00
Myle Ott
5028ed1b6b Reduce device-to-host transfers (#1082)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1082

Differential Revision: D20365765

Pulled By: myleott

fbshipit-source-id: 7b6c14303b46b42db1a1e279c84dbe9cb2cf72f2
2020-03-11 05:57:16 -07:00
Marco Gaido
431d604f69 Fix generation with encoder which return an output of shape different from the input (#1792)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1791.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1792

Reviewed By: jmp84

Differential Revision: D20322704

Pulled By: myleott

fbshipit-source-id: 3cfa1bddda06b966e9dc9bc8ff183009d844b23c
2020-03-10 11:51:08 -07:00
Myle Ott
937535dba0 Allow dictionaries to overwrite entries with #fairseq:overwrite comment (#1073)
Summary:
[This commit](dd1298e15f) made it so that duplicate entries in a dictionary are ignored. Unfortunately the Camembert model depends on overwriting `<unk>`, `<s>` and `</s>`.

The proposed solution here is to allow the dictionary to have entries like:
```
<unk> 999 #fairseq:overwrite
<s> 999 #fairseq:overwrite
</s> 999 #fairseq:overwrite
, 999
▁de 999
. 999
(...)
```

These will preserve the old overwriting behavior. Thus we can release a new `camembert.v0.tar.gz` with a dictionary like above and it works.
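A minimal sketch of how such dictionary lines could be parsed (assuming the simplified `<symbol> <count> [#fairseq:overwrite]` layout shown above; this is not the actual fairseq `Dictionary` code):

```python
def parse_dictionary(lines):
    """Load '<symbol> <count>' lines, allowing a duplicate symbol only when
    its line carries the '#fairseq:overwrite' flag (simplified sketch)."""
    counts = {}
    for line in lines:
        word, count = line.rstrip().rsplit(" ", 1)
        overwrite = False
        if count == "#fairseq:overwrite":
            overwrite = True
            word, count = word.rsplit(" ", 1)
        if word in counts and not overwrite:
            raise RuntimeError(f"Duplicate word found: '{word}'")
        counts[word] = int(count)
    return counts
```

Lines without the flag keep the strict duplicate check introduced by the earlier commit, while flagged entries like `<unk>` can silently replace the built-in specials.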
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1073

Reviewed By: kahne

Differential Revision: D20284569

Pulled By: myleott

fbshipit-source-id: bf78fbff13c94bf8a6485cbdda62305ddc30c056
2020-03-08 06:52:00 -07:00
Elijah Rippeth
46b773a393 refactor namespaces in criterion interface (#1729)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1672 in part (part 1: [context](https://github.com/pytorch/fairseq/pull/1714#issuecomment-587507040))

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1729

Differential Revision: D20049353

Pulled By: myleott

fbshipit-source-id: 732077a1cc339c9f7ebe26dae42a7e8d7b5a07b4
2020-03-04 16:43:59 -08:00
Myle Ott
aa79bb9c37 Use 1-based indexing for epochs everywhere (#1053)
Summary:
We are somewhat inconsistent in whether we're using 0-based or 1-based indexing for epochs. This should fix things so that internal state matches the 1-based indexing already used for logging and checkpoint naming.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1053

Reviewed By: spencerp

Differential Revision: D20160715

Pulled By: myleott

fbshipit-source-id: 4ed94f9c371e1bfe29bcfa087fa6756507d6e627
2020-03-04 16:37:24 -08:00
alexeib
3335de5f44 add vq-wav2vec (#1029)
Summary:
Sanitized vq-wav2vec implementation. I will also add docs for this. I have a fixed-up checkpoint that this code can load, and verified that it produces the same results as what we used in the paper.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1029

Differential Revision: D20129246

Pulled By: alexeib

fbshipit-source-id: f72f455e0c309168e644ab86ec18c768c308da98
2020-02-29 18:25:34 -08:00
Chen Liu
fdfdbec9e2 Rewrite the unit test of sequence generator
Summary:
1. Overwrite the base class function `get_normalized_probs` in scriptable TransformerModel
2. Change the unit test setup to match the Transformer decoder output format
3. Initialize the buffer in the simple sequence generator [WIP]
   1. It is the initial step to script the sequence generator from simple scriptable version.
4. Refactor the unit test of simple sequence generator.
5. Change the input format of simple sequence generator and unit test.

Reviewed By: myleott

Differential Revision: D20017859

fbshipit-source-id: a3e93b57c22e49840e460469fa2b1c530346886d
2020-02-26 11:09:20 -08:00
Myle Ott
8845dcf5ff Move MoE files into examples (#1040)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1040

Differential Revision: D20030279

Pulled By: myleott

fbshipit-source-id: 76b48a62409020039225cf98e8fcf7a494d0b7f8
2020-02-21 14:13:37 -08:00
Myle Ott
12ab22e06c Fix deprecation warnings in unit tests (#1043)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1043

Differential Revision: D20030274

Pulled By: myleott

fbshipit-source-id: 34962c1caaf3879af8b527a266852e443b59ffe4
2020-02-21 11:44:28 -08:00
Ning Dong
3df10a9529 Add save and load tests to fairseq export test (#1653)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1653

Earlier we had some issues at pickling. Type information gets lost. Fixed in https://github.com/pytorch/pytorch/pull/32569.

These save_and_load tests are added for protection in the future.

Reviewed By: myleott

Differential Revision: D19435988

fbshipit-source-id: 560ea65ed3493bebcf394327818364b3fcd6fc92
2020-01-30 16:14:35 -08:00
Ning Dong
a07cb6f404 Script Fairseq transformer (#1011)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1011

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1620

Make Fairseq transformer scriptable. Discussion points on possible code refactoring:

(1) Original decoder output is a tuple (x, {"attn": attn, "inner_states": inner_states}). TorchScript does not support dictionary with values of different types (attn: Tensor, inner_states: List[Tensor]). Current workaround is to use [attn] for attention field and access via output["attn"][0] in downstream. This is currently used in fairspeq custom transformer code. Another (maybe) cleaner alternative is to use namedtuple for decoder output but involves tons of downstream changes too.

(2) Currently TorchScript doesn't support **kwargs. Some unused arguments might get passed in due to polymorphism. Now the only workaround I can think of is to add possible unused arguments, (e.g. line 666 in transformer.py)

Reviewed By: myleott

Differential Revision: D19234599

fbshipit-source-id: db3dd364ecf3ae14fb7ac8c0928bd0ebe250f19d
2020-01-30 15:59:15 -08:00
Myle Ott
61aad8f9cd Force certain optimizers to set --fp16-no-flatten-grads (#1010)
Summary:
When training with `--fp16` we usually flatten the grads since it's faster. But flat grads are not semantically equivalent for certain optimizers (e.g., Adafactor, LAMB), thus the user needed to be aware of this and set `--fp16-no-flatten-grads`. Let's raise a RuntimeError in this case instead.
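A sketch of the guard described above (the function name and the set of optimizer names are illustrative, not fairseq's exact API):

```python
def check_grad_flattening(optimizer_name, flatten_grads):
    """Raise if grads would be flattened into one contiguous buffer for an
    optimizer whose update rule depends on each parameter's shape.

    Flat grads are fine for elementwise optimizers (Adam, SGD), but e.g.
    Adafactor's factored second moments and LAMB's per-layer trust ratio
    need per-parameter grads. Illustrative sketch of the guard."""
    shape_dependent = {"adafactor", "lamb"}
    if flatten_grads and optimizer_name.lower() in shape_dependent:
        raise RuntimeError(
            f"--fp16 with {optimizer_name} requires --fp16-no-flatten-grads"
        )
```

Failing fast here replaces a silent semantic mismatch with an actionable error message.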
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1010

Differential Revision: D19575773

Pulled By: myleott

fbshipit-source-id: bac99c3026f9870e6127e0fa55f70e8a3e4507dc
2020-01-28 08:02:30 -08:00
Myle Ott
88185fcc3f Cleanup new incremental state API (#1005)
Summary:
* Now that we have `FairseqIncrementalState`, we can move `get_incremental_state` and `set_incremental_state` as methods in that class, instead of having the helper functions in `utils.py`. I think this will eventually help with type checking too.
* The incremental ID logic was overly complicated, we can just use `uuid` to generate a unique ID for every instance.
* Add missing `with_incremental_state` to light/dynamic conv modules.
* Add additional unit test: `test_incremental_state_multihead_attention`
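The uuid-based ID scheme can be sketched like this (a simplified stand-in for `FairseqIncrementalState`, with hypothetical names):

```python
import uuid

class WithIncrementalState:
    """Mixin sketch: each instance gets a uuid, so two modules sharing one
    incremental_state dict can never collide on the same key (simplified
    version of the API described above)."""

    def __init__(self):
        self._state_id = str(uuid.uuid4())

    def _full_key(self, key):
        return f"{self._state_id}.{key}"

    def get_incremental_state(self, incremental_state, key):
        return incremental_state.get(self._full_key(key))

    def set_incremental_state(self, incremental_state, key, value):
        incremental_state[self._full_key(key)] = value
```

Compared with a global counter, `uuid` needs no shared mutable state, which keeps the modules self-contained.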

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1005

Test Plan:
* unit tests

Also confirmed this matches master:
```
$ python generate.py ~/data/data-bin/wmt16_en_de_bpe32k --path /checkpoint/myleott/s3/models/wmt16.en-de.joined-dict.transformer/model.pt --beam 4 --lenpen 0.6 --remove-bpe --quiet
(...)
2020-01-22 09:53:38 | INFO | fairseq_cli.generate | Generate test with beam=4: BLEU4 = 29.28, 60.8/35.1/22.8/15.3 (BP=0.997, ratio=0.997, syslen=62859, reflen=63078)
```

Reviewed By: cndn

Differential Revision: D19517908

Pulled By: myleott

fbshipit-source-id: a406490e342d0d30a9231bf823d3350999bda4c0
2020-01-27 10:25:33 -08:00
Joshua Meier
9f4256edf6 Standalone LSTM decoder language model (#934)
Summary:
Currently, the LSTM models in Fairseq master can only be used in an encoder/decoder setting, for example, in `class LSTMModel(FairseqEncoderDecoderModel)`. This PR adds a standalone LSTM decoder language model.

Changes:
- adds support for `LSTMDecoder` in cases where an encoder is not present, for instance, where `encoder_output_units=0`.
- fixes bugs in `LSTMDecoder` that only become apparent when using it in a standalone fashion, for example, not handling `src_lengths` as an optional argument.
- adds `class LSTMLanguageModel(FairseqLanguageModel)` for training LSTM language models.
- tests for the `LSTMLanguageModel`. Changes to the `LSTMDecoder` are handled by existing test cases.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/934

Reviewed By: myleott

Differential Revision: D18816310

Pulled By: joshim5

fbshipit-source-id: 4773695a7f5d36aa773da8a45db2e02f76c968a9
2020-01-24 13:16:22 -08:00
Elijah Rippeth
f1d856e006 fix Windows build (#1007)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1007

# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1622

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1631

Differential Revision: D19555401

Pulled By: myleott

fbshipit-source-id: c62dfc109e09a7d732a9fc73ac6feef63a8dd341
2020-01-24 10:32:20 -08:00
Myle Ott
f4a9bc2ea6 Clean up tests
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1004

Differential Revision: D19517900

Pulled By: myleott

fbshipit-source-id: a588efeabd3119dd058067e82d1b21e4d81ae218
2020-01-22 11:29:20 -08:00
Ning Dong
89a2a0ccde Script SinusoidalPositionalEmbedding (#683)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/683

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1612

Make SinusoidalPositionalEmbedding scriptable, mostly by adding types. The only change that affects lots of downstream code is to have max_positions as a member variable instead of a method.
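For reference, a plain-Python sketch of the sinusoidal table (following the standard "Attention Is All You Need" formulation; fairseq's scripted version differs in layout and half-dimension handling):

```python
import math

def sinusoidal_embedding(num_positions, dim, padding_idx=None):
    """Build a sin/cos positional table of shape [num_positions, dim].

    First half of each row holds sin terms, second half cos terms; odd
    dims are zero-padded, and padding_idx (if given) gets an all-zero row.
    Simplified sketch, not fairseq's implementation."""
    half = dim // 2
    emb_scale = math.log(10000.0) / max(half - 1, 1)
    table = []
    for pos in range(num_positions):
        freqs = [pos * math.exp(-emb_scale * i) for i in range(half)]
        row = [math.sin(f) for f in freqs] + [math.cos(f) for f in freqs]
        if dim % 2 == 1:
            row.append(0.0)  # zero-pad odd embedding dims
        if padding_idx is not None and pos == padding_idx:
            row = [0.0] * dim
        table.append(row)
    return table
```

Precomputing the table as a buffer is what makes keeping `max_positions` as a member variable natural.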

Reviewed By: myleott

Differential Revision: D18924939

fbshipit-source-id: 2b6486563e9ec5cc34bcf11acdff9054658f4674
2020-01-22 10:55:28 -08:00
Ning Dong
4e48c4ae5d Script MultiheadAttention (#1002)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1002

Pull Request resolved: https://github.com/pytorch/translate/pull/681

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1524

Make fairseq MultiheadAttention scriptable. Looking for feedbacks.

1. Add types
2. Move incremental state management logic from util functions to initializers. TorchScript in general doesn't support global dicts. As a result, modules with multihead attention in them assign themselves a fairseq_instance_id in the initializer.
3. There might be opportunities to make assertions and annotations cleaner.

Reviewed By: myleott

Differential Revision: D18772594

fbshipit-source-id: 377aef4bbb7ef51da5b6bac9a87a6f7b03b16fe1
2020-01-21 18:35:28 -08:00
Myle Ott
9f961964aa Fix logging of training sets (fixes #1632) (#1634)
Summary:
* fix: mid-epoch validation metrics were previously polluting training metrics
* fix: mid-epoch metrics were not properly saved/restored in checkpoints
* added tests, both for metrics and for mid-epoch reproducibility
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1634

Differential Revision: D19470714

Pulled By: myleott

fbshipit-source-id: 491fa8d830b653cdd6a86095645aabcac758d214
2020-01-20 16:34:33 -08:00
Jiatao Gu
60fbf64f30 Add --eval-bleu for translation
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/989

Reviewed By: MultiPath

Differential Revision: D19411162

Pulled By: myleott

fbshipit-source-id: 74842f0174f58e39a13fb90f3cc1170c63bc89be
2020-01-17 12:17:46 -08:00
Myle Ott
b488e1fe56 Reverse symlinks in root and fairseq_cli (2/3)
Summary: This is needed to support other build environments (e.g., Windows)

Reviewed By: ngoyal2707

Differential Revision: D19409984

fbshipit-source-id: e970510781abf92f1b02d0961bc30e1210b524dd
2020-01-17 08:26:20 -08:00
Myle Ott
fb76dac1c4 Switch to Python logging (+ lint) (#1627)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1627

Python logging offers a number of benefits, such as logging timestamps, better
cross-library compatibility, ability to add multiple output handlers, etc.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/646

Reviewed By: spencerp

Differential Revision: D15815620

Pulled By: myleott

fbshipit-source-id: 5e64e9929b5e4b9dd5bb49bcdf7c510631907134
2020-01-16 16:14:45 -08:00
Aleksandra Piktus
fab2e86e51 Add a diverse beam search variant to sequence_generator.py (#953)
Summary:
This PR implements a new generation strategy that we experimented with in project Pinocchio (https://github.com/fairinternal/Pinocchio), see the paper submission in: https://fburl.com/hduj2me7.

Specifically in this PR:
- added a Diverse Beam Search variant as described in https://arxiv.org/abs/1611.08562
- moved the Search object generation out of `sequence_generation.py`, which allows for limiting the number of kwargs passed around
- made sure the above changes are backward compatible based on grep - P124083926
- added test cases covering these scenarios
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/953

Test Plan:
- `python -m unittest tests.test_binaries -v`- including added test cases, see issues below for some details
- `python -m unittest tests.test_sequence_generator -v` - including added test cases
- tested locally in conjunction with the Pinocchio repo
- grepped for all instantiations of `SequenceGeneration`, made sure they're backward compatible

# Issues
- when I try to run all tests with the `python -m unittest tests.test_binaries -v` command, the execution gets stuck on `test_binaries.TestTranslation.test_generation` - the test otherwise passes without problems when run individually. Is this a known problem?
- discovered T59235948 - assigned to fairseq oncall

Reviewed By: myleott, fabiopetroni

Differential Revision: D19142394

Pulled By: ola13

fbshipit-source-id: d24543424c14a9537e7b6485951d9f841da62b07
2020-01-06 08:24:02 -08:00
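The diversity mechanism in the cited paper (https://arxiv.org/abs/1611.08562) penalizes each parent beam's k-th ranked expansion by gamma * k. A minimal, framework-free sketch of that rank penalty (hypothetical helper names; the fairseq implementation operates on batched log-prob tensors):

```python
def rank_penalized_scores(sibling_logprobs, gamma):
    """Diversity by sibling rank: within one parent beam, the k-th
    ranked expansion (k = 1, 2, ...) loses gamma * k from its score,
    discouraging many near-duplicate children of the same parent."""
    order = sorted(range(len(sibling_logprobs)),
                   key=lambda v: sibling_logprobs[v], reverse=True)
    scores = list(sibling_logprobs)
    for rank, v in enumerate(order, start=1):
        scores[v] -= gamma * rank
    return scores
```

With `gamma = 0` this reduces to ordinary beam scoring, which is why the change can stay backward compatible.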
Myle Ott
fb2d29d2aa Fix multilingual translation
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/972

Differential Revision: D19265750

Pulled By: myleott

fbshipit-source-id: 4c432b0d3616a6194c2c0f61f97012937d22db6f
2020-01-06 07:13:10 -08:00
Peng-Jen Chen
4c5934ac61 Fix multilingual translation errors and add unit test
Summary:
- Fix github issue [1393](https://github.com/pytorch/fairseq/issues/1393), [1315](https://github.com/pytorch/fairseq/issues/1315).
- Add unit test to cover training, validation and generation for multilingual model to make sure they can run without problem. (didn't test the correctness)

Reviewed By: lematt1991

Differential Revision: D19149575

fbshipit-source-id: 9ec9000d037cc5c3bd8457feb527f2305375a442
2019-12-19 07:08:59 -08:00
Myle Ott
dfde36bc66 Create build.yml
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1515

Differential Revision: D19151562

Pulled By: myleott

fbshipit-source-id: 426eca1e449cac914d49877678323a6487c0adbe
2019-12-17 20:45:11 -08:00
Sujit Verma
28b131359b Added unit test for PathManager file io (with or without fvcore).
Summary: Added unit test for PathManager file io (with or without fvcore).

Reviewed By: theweiho

Differential Revision: D18880067

fbshipit-source-id: 969c2be90415d22041b8276b7a5ff264571561d0
2019-12-09 14:19:51 -08:00
Jiatao Gu
b0fb74f143 REFACTOR: NAT Implementation (#925)
Summary:
This diff mainly first contains the implementation for NAT-CRF model:
- Fast Structured Decoding for Sequence Models (NAT-CRF, Sun et al., 2019)

We implemented a dynamic CRF module and incorporated it into the implementation of the vanilla NAT model, in order to reproduce the performance reported in the paper.

We implemented the length beam as well as reranking with a learned autoregressive model in the iterative-refinement-generator.
We also implemented new ensemble code which enables ensembling for all NAT models, not only the Levenshtein Transformer itself. We refactored the code and moved the models into ``fairseq/models/nat``.

Finally, we updated the README.md for NAT models.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/925

Differential Revision: D18738085

Pulled By: MultiPath

fbshipit-source-id: 4e421c5d52d2456fbe99e7863d715c756b1fd49b
2019-12-03 18:39:28 -08:00
Myle Ott
1c56594001 Fix lightconv_lm and add test (#932)
Summary:
Fixes https://github.com/fairinternal/fairseq-py/issues/536
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/932

Differential Revision: D18783032

Pulled By: myleott

fbshipit-source-id: a520faccc20be78296a228214923ee1495fb536f
2019-12-03 09:25:52 -08:00
Myle Ott
cb6c67bcdb Make torch.hub interface automatically apply tokenization and BPE
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/926

Differential Revision: D18685772

Pulled By: myleott

fbshipit-source-id: 0f99d79ed6ee72e9d3ced786d75ab9504d0dfcf0
2019-11-26 07:49:37 -08:00
Myle Ott
e26ee47a8c Fix LM generation and add unit test
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/896

Differential Revision: D18250948

Pulled By: myleott

fbshipit-source-id: 7a515311e18795670b29f5e24eeba7619a625da7
2019-11-13 14:37:12 -08:00
Myle Ott
27568a7ebe Merge TracingCompliantTransformer and regular Transformer, fix NAT tests
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/899

Differential Revision: D18373060

Pulled By: myleott

fbshipit-source-id: bb5510ec15799a0a10a7c0669e76d8200e1ba479
2019-11-13 09:12:13 -08:00
Spencer Poff
68dd3e171b Fixing key padding mask during transformer generation
Summary:
https://github.com/pytorch/fairseq/pull/1097 added key padding mask history in TransformerDecoderLayer, but in an edge case where only the current or only the previous key_padding_mask exists, the resulting key_padding_mask has the wrong size.

This diff adds empty columns in such a case to ensure key_padding_mask is a usable size.

Reviewed By: myleott

Differential Revision: D18224313

fbshipit-source-id: c9fb7266baf0a2d79a66704e00a5ea8bd2987ff6
2019-11-05 06:50:53 -08:00
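The fix described above amounts to substituting an all-False block of the right shape when one side of the concatenation is missing. A simplified list-based sketch (hypothetical helper; the real code works on boolean tensors):

```python
def combine_padding_masks(prev_mask, curr_mask, batch_size, prev_len, curr_len):
    """Concatenate previous and current key_padding_masks along the
    time axis; if only one exists, fill the other side with False
    (i.e. "not padded") so the result always spans prev_len + curr_len
    columns."""
    if prev_mask is None and curr_mask is None:
        return None
    def empty(n):
        # All-False block: no position is treated as padding.
        return [[False] * n for _ in range(batch_size)]
    prev_mask = prev_mask if prev_mask is not None else empty(prev_len)
    curr_mask = curr_mask if curr_mask is not None else empty(curr_len)
    return [p + c for p, c in zip(prev_mask, curr_mask)]
```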
Nayan Singhal
b5f41f828b Add Unit test cases for BMUF
Summary:
This unit test guards the bmuf code.

change:
1. distributed_init assumes we are always using a CUDA device, which is not the case if you are using the "gloo" backend on a CPU machine.

Reviewed By: jay-mahadeokar

Differential Revision: D17821391

fbshipit-source-id: 28e1bb39f7a4889b1dc6bd636b7c499e55bfc69a
2019-10-15 09:59:36 -07:00
Sarthak Garg
1c66792948 Implementation of the paper "Jointly Learning to Align and Translate with Transformer Models" (#877)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877

This PR implements guided alignment training described in  "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)".

In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095

Differential Revision: D17170337

Pulled By: myleott

fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
2019-09-30 06:57:32 -07:00
Stephan Peitz
4ac2c5f2cc Implementation of the WeCNLP abstract "Cross+Self-Attention for Transformer Models" (#1097)
Summary:
This PR implements a new attention module which combines cross-attention (encoder-decoder attention) and the decoder self-attention. This work was accepted as an abstract at WeCNLP 2019 (https://www.wecnlp.ai/wecnlp-2019).

Cross+Self-Attention reduces the number of parameters and increases the inference speed without any degradation in translation quality.
More details can be found in the attached [abstract](https://github.com/pytorch/fairseq/files/3561282/paper.pdf)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1097

Differential Revision: D17653168

Pulled By: myleott

fbshipit-source-id: deb834c2c78a229d7418ffbfea20ba3ce252991c
2019-09-29 05:09:42 -07:00
Changhan Wang
86857a58bf Levenshtein Transformer paper code
Summary:
Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
* Added Levenshtein Transformer model, task and criterion class
* Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines
* Add an option for prepending BOS to dictionary class and translation task class

Reviewed By: myleott

Differential Revision: D17297372

fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
2019-09-27 13:58:45 -07:00
Jerry Ma
a8a85c2676 Add dataset class for weighted sampling with replacement. (#861)
Summary:
As discussed with Naman earlier today. Weighted sampling with
replacement can be done on a per-epoch basis using `set_epoch()`
functionality, which generates the samples as a function of random seed
and epoch.

Additionally, `FairseqTask` needs to set the starting epoch for the
dataset at the very beginning of iterator construction.

Not yet implemented is the per-epoch iterator construction, which
is necessary to actually regenerate the batches for each epoch.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861

Differential Revision: D17460687

Pulled By: jma127

fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658
2019-09-19 10:36:00 -07:00
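The seed-and-epoch idea above can be sketched as follows: the sample indices are a pure function of (seed, epoch), so any replica calling `set_epoch()` regenerates the exact same weighted samples. A hypothetical minimal class, not the actual fairseq dataset:

```python
import random

class EpochWeightedSampler:
    """Weighted sampling with replacement whose indices depend only on
    (seed, epoch): calling set_epoch(e) on any worker reproduces the
    same sample order for epoch e."""
    def __init__(self, weights, num_samples, seed=0):
        self.weights = list(weights)
        self.num_samples = num_samples
        self.seed = seed
        self.set_epoch(0)

    def set_epoch(self, epoch):
        # Mix seed and epoch into one deterministic RNG seed.
        rng = random.Random(self.seed * 1000003 + epoch)
        self.indices = rng.choices(range(len(self.weights)),
                                   weights=self.weights,
                                   k=self.num_samples)
```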
Myle Ott
6ce55e4b01 Small fixes
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/835

Differential Revision: D16904038

Pulled By: myleott

fbshipit-source-id: 2c9d0b913f8d688297ac80fcabd905bd1397f66a
2019-08-19 15:08:25 -07:00
Myle Ott
7c89e13f64 Fix tests
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/822

Differential Revision: D16800078

Pulled By: myleott

fbshipit-source-id: b86e08e01f2fe13c64b77f1d23a5f6800f252bf7
2019-08-13 20:36:00 -07:00
Myle Ott
d015d23a1f Add fairseq-validate
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/765

Differential Revision: D16763357

Pulled By: myleott

fbshipit-source-id: 758b03158e486ee82786e2d5bf4e46073b50c503
2019-08-13 13:07:04 -07:00
Dmytro Okhonko
72f9364cc6 Asr initial push (#810)
Summary:
Initial code for speech recognition task.
Right now only one ASR model added - https://arxiv.org/abs/1904.11660

unit test testing:
python -m unittest discover tests

also run model training with this code and obtained
5.0 test_clean | 13.4 test_other
on librispeech with pytorch/audio features
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/810

Reviewed By: cpuhrsch

Differential Revision: D16706659

Pulled By: okhonko

fbshipit-source-id: 89a5f9883e50bc0e548234287aa0ea73f7402514
2019-08-08 02:46:12 -07:00
Myle Ott
4abadbdf77 Fix sampling with beam>1
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/792

Differential Revision: D16591987

Pulled By: myleott

fbshipit-source-id: d27c490ae75f80ded19226b8384f4776485dd694
2019-08-01 07:34:06 -07:00
Myle Ott
e75cff5f2c Relicense fairseq under MIT license (#786)
Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786

Differential Revision: D16560654

Pulled By: myleott

fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034
2019-07-30 07:48:23 -07:00
Sara Hanson
a03fe6faf3 Implement sparse transformer fixed attention pattern (#804)
Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746

Pull Request resolved: https://github.com/pytorch/fairseq/pull/894

Adding an implementation of the sparse transformer to multi-head attention using the fixed attention pattern specified https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, a mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.

Four inputs are added to the config: sparse, is_bidirectional, stride, expressivity. If we are using the sparse transformer, is_bidirectional, stride, and expressivity must be specified (there are defaults). If is_bidirectional is False, the mask values using the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window and a summary from every stride window--all other values are masked. Stride (L in the paper) controls the window size and expressivity (c in the paper) controls the size of the summary.

Reviewed By: borguz

Differential Revision: D16042988

fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5
2019-07-22 16:42:55 -07:00
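A toy causal version of the fixed pattern with the -inf additive mask described above (lists for readability; parameter names mirror the config options, but the helper itself is hypothetical):

```python
NEG_INF = float("-inf")

def fixed_sparse_mask(seq_len, stride, expressivity):
    """Build a (seq_len x seq_len) additive attention mask for the
    'fixed' sparse pattern: position i attends inside its own stride
    window and to the last `expressivity` summary positions of every
    earlier window; disallowed entries are -inf, which softmax turns
    into exactly 0."""
    mask = [[NEG_INF] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        window = i // stride
        for j in range(i + 1):                         # causal: j <= i
            if j // stride == window:                  # same stride window
                mask[i][j] = 0.0
            elif j % stride >= stride - expressivity:  # summary position
                mask[i][j] = 0.0
    return mask
```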
Myle Ott
47fd985269 Move Masked LM components to legacy/ -- new ones are coming
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740

Differential Revision: D16377797

Pulled By: myleott

fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f
2019-07-21 19:38:00 -07:00
Xing Zhou
e46b924dea Nucleus (top-P) sampling (#710)
Summary:
Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p.

To test it:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710

Test Plan:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3

python tests/test_sequence_generator.py

python tests/test_binaries.py

Reviewed By: myleott

Differential Revision: D16286688

Pulled By: xingz9

fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f
2019-07-17 06:21:33 -07:00
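The rule above — "sample among the smallest set of elements whose cumulative probability mass exceeds p" — can be sketched framework-free (hypothetical helper; the fairseq version operates on log-probability tensors inside the search module):

```python
import random

def sample_topp(probs, p, rand=random.random):
    """Nucleus (top-p) sampling: keep the smallest prefix of the
    sorted distribution whose cumulative mass reaches p, renormalize
    over that prefix, then sample from it."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:           # smallest set whose mass exceeds p
            break
    r = rand() * mass           # scaling by mass renormalizes the nucleus
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]
```

With a small p, sampling collapses toward greedy decoding; with p = 1.0 it becomes ordinary ancestral sampling.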
Myle Ott
efb4345042 Fix resuming training when using --memory-efficient-fp16
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/678

Differential Revision: D15956712

Pulled By: myleott

fbshipit-source-id: 5048d06ddfbec0045558a22c777a966cca1ec396
2019-06-23 14:19:16 -07:00
Bairen Yi
a8f28ecb63 Python3.5 compat (#794)
Summary:
See #467. Ping myleott to review.

This is a work-related contribution. Ping lark to review.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/794

Differential Revision: D15756816

Pulled By: myleott

fbshipit-source-id: 6dce3ff3a713bf5f60e5782bc260b2ca9d2c0a9b
2019-06-11 04:10:08 -07:00
Matt Le
fa7791df9a Change encoder_learned_pos default back to True for xlm_base
Reviewed By: pipibjc

Differential Revision: D15635402

fbshipit-source-id: e92fab914de40775d7bad851420355240d822bde
2019-06-06 07:38:17 -07:00
Matt Le
5408bc0821 Fix loading XLM pretraining
Summary: We never actually load the model parameters from an XLM model when using transformer_from_pretrained_xlm.  Also, change encoder_learned_pos from True -> False

Reviewed By: liezl200

Differential Revision: D15629061

fbshipit-source-id: 759eadc88041eae94505477960de57dd78a99dcb
2019-06-04 15:36:55 -07:00
Myle Ott
ffc3bb5806 Add --reset-dataloader
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/613

Differential Revision: D15541384

Pulled By: myleott

fbshipit-source-id: ef2c0b0a51cdf37af2ccff0546f524d49f87e65d
2019-05-30 11:41:40 -07:00
Yongqiang Wang
8ce2c35d8e Implement reducing footprint of average checkpoint correctly (#747)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/747

In https://github.com/pytorch/fairseq/pull/647, checkpoint averaging
is not Implemented correctly when it comes to shared parameters. This diff
has the right Implementation and a test case to guard future change.

Reviewed By: myleott

Differential Revision: D15402943

fbshipit-source-id: 8004836d5c2571814ea54844650618008a9ee522
2019-05-24 12:12:24 -07:00
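The footprint reduction comes down to a running average: fold each checkpoint into a single accumulator rather than keeping all checkpoints resident at once. A toy sketch over plain lists (hypothetical; real parameters are tensors, and shared parameters must be visited only once):

```python
def running_average(checkpoints):
    """Average parameter values across checkpoints while holding only
    one accumulator in memory: avg += (x - avg) / n for the n-th
    checkpoint seen."""
    avg = None
    for n, state in enumerate(checkpoints, start=1):
        if avg is None:
            avg = {k: list(v) for k, v in state.items()}
            continue
        for key, vals in state.items():
            for i, x in enumerate(vals):
                avg[key][i] += (x - avg[key][i]) / n
    return avg
```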
Ning Dong
ee28411f76 Make ConcatDataset work in PytorchTranslateTask multi-path dataset loading (#730)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/730

Pull Request resolved: https://github.com/pytorch/translate/pull/528

Add/modify necessary functions for ConcatDataset to work in PytorchTranslateTask and replace MultiCorpusSampledDataset, which doesn't support mixed batches.

Any idea on how to implement collater here for mixed batch? Now I'm just using the collater of the first dataset.

Reviewed By: liezl200

Differential Revision: D15260872

fbshipit-source-id: 14b148c506e9f8ebf4fe60a49f95444d4123d76f
2019-05-20 11:31:53 -07:00
Myle Ott
3bfbb49ba5 Clean up sharded train iterator
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/586

Differential Revision: D15372949

Pulled By: myleott

fbshipit-source-id: c1cf1c645e8d55fc8568f23a47c45677ac9ab1da
2019-05-16 21:03:08 -07:00
Myle Ott
dffb167449 Updates to model API (#561)
Summary:
- `FairseqModel` -> `FairseqEncoderDecoderModel`
- add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
- `encoder_out_dict` -> `encoder_out`
- rm unused `remove_head` functions
- update docs
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561

Differential Revision: D15271142

Pulled By: myleott

fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264
2019-05-15 07:12:41 -07:00
Myle Ott
7432130eb0 rm default_key from MultiCorpusSampledDataset
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/575

Differential Revision: D15318004

Pulled By: myleott

fbshipit-source-id: ad918d71b1bd8074decf5ec3463dd9bc9487bbe9
2019-05-14 16:45:21 -07:00
Dmytro Okhonko
cd1e5c09fa Move save/load checkpoint functions to utils
Summary:
Move `load_checkpoint`, `save_checkpoint` and `reload_train` from train.py to checkpoint_utils.py
Move `get_perplexity` from train.py to utils.py.
This will make train.py lighter and allow us to reuse all this utils functionality when fairseq is used as an external library.

Reviewed By: myleott

Differential Revision: D15289607

fbshipit-source-id: 4b7c95225ac22e402bcda3497811361809110df1
2019-05-14 12:57:12 -07:00
Jingfei Du
93ec8d0bc6 expose arguments for bias_kv and zero_attn for masked_lm
Summary: the old no_bias_kv argument for masked_lm models is not used. Split it into 2 arguments and expose them.

Reviewed By: myleott

Differential Revision: D15266154

fbshipit-source-id: 60b041f8370ca1d8869ed3402fb9a67d1cd8e0e8
2019-05-08 17:48:29 -07:00
Davide Caroselli
a1c997bd9a Memory-Mapped IndexedDataset implementation (#589)
Summary:
Following discussion in https://github.com/pytorch/fairseq/issues/574:

 - Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder compatible with IndexedDataset/IndexedDatasetBuilder
- Update scripts/read_binarized.py to support new MMapIndexedDataset
- Options '--raw-text' and '--lazy-load' replaced with '--dataset-impl', and the option definition moved from custom task args to the more high-level options.add_dataset_args() (more appropriate)
- Also implemented utility functions in indexed_dataset: make_dataset(), dataset_exists()
Pull Request resolved: https://github.com/pytorch/fairseq/pull/589

Differential Revision: D14597128

Pulled By: myleott

fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
2019-05-07 07:13:52 -07:00
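A miniature analogue of the idea — a flat binary data file plus a fixed-width index of (offset, length) records, read back through mmap so a sample is only touched on access. The file layout here is hypothetical and much simpler than the real MMapIndexedDataset format:

```python
import mmap
import os
import struct
import tempfile

def write_indexed(prefix, samples):
    """Write int32 samples to prefix.bin and (offset, length) int64
    pairs to prefix.idx."""
    with open(prefix + ".bin", "wb") as data, open(prefix + ".idx", "wb") as idx:
        offset = 0
        for sample in samples:
            blob = struct.pack(f"{len(sample)}i", *sample)
            data.write(blob)
            idx.write(struct.pack("qq", offset, len(sample)))
            offset += len(blob)

def read_indexed(prefix, i):
    """Fetch sample i: seek its index record, then slice the mmap."""
    with open(prefix + ".idx", "rb") as idx:
        idx.seek(i * 16)                      # each record is two int64s
        offset, length = struct.unpack("qq", idx.read(16))
    with open(prefix + ".bin", "rb") as data:
        with mmap.mmap(data.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return list(struct.unpack(f"{length}i",
                                      mm[offset:offset + 4 * length]))

prefix = os.path.join(tempfile.mkdtemp(), "toy")
write_indexed(prefix, [[1, 2, 3], [7, 8]])
```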
Myle Ott
e4edf27a97 Improve init speed of TokenBlockDataset and EpochBatchIterator
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/704

Differential Revision: D15221549

Pulled By: myleott

fbshipit-source-id: b0021acdc2d7792ce51421f1432e1f2bd8218f7b
2019-05-07 07:08:53 -07:00
Naman Goyal
0add50c2e0 allowing sharded dataset (#696)
Summary:
Co-authored-by: myleott <myleott@fb.com>

Changing `data` to be a `str` with a colon-separated list for loading sharded datasets. This change is useful for loading large datasets that cannot fit into memory. The large dataset can be sharded, and each shard is then loaded in one epoch in a round-robin manner.

For example, if there are `5` shards of data and `10` epochs then the shards will be iterated upon `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]`.

myleott We need to look into `translation.py` as it currently already expects a list and then concatenates the datasets.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/696

Differential Revision: D15214049

fbshipit-source-id: 03e43a7b69c7aefada2ca668abf1eac1969fe013
2019-05-06 15:27:17 -07:00
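The round-robin behaviour described above is just a modulo over the colon-separated shard list; a sketch with a hypothetical helper (the example uses 0-based epochs to match the `[0, 1, 2, 3, 4, 0, ...]` iteration order in the summary):

```python
def shard_path_for_epoch(data, epoch):
    """Pick the shard to load this epoch from a colon-separated path
    list, cycling round-robin: epoch e loads shard e % num_shards."""
    shards = data.split(":")
    return shards[epoch % len(shards)]
```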
Myle Ott
96ac28d33d Fix and generalize --temperature option (#508)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/508

The previous version applied the temperature after the softmax. Fix that, and
also generalize so it works with other search approaches.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/694

Differential Revision: D15175160

Pulled By: myleott

fbshipit-source-id: cc87ff0e97a8a1dd37f9983163f58a8641155ab0
2019-05-04 16:39:32 -07:00
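The fix is an ordering issue: dividing the logits by the temperature before the softmax rescales and renormalizes in one step, whereas scaling the probabilities afterwards breaks normalization. A minimal sketch with hypothetical helper names:

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def probs_with_temperature(logits, temperature):
    """Correct order: scale logits by 1/T *before* the softmax, so
    T > 1 flattens the distribution and T < 1 sharpens it, while the
    output still sums to 1."""
    return softmax([x / temperature for x in logits])
```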
Myle Ott
d45db80431 Merge internal changes (#654)
Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/654

Differential Revision: D15041794

Pulled By: myleott

fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
2019-04-29 19:50:58 -07:00
Liezl Puzon
57b6a6dbfb Fix fairseq unittest timeouts (#667)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/667

Use smaller models so that unittests won't time out

Reviewed By: pipibjc

Differential Revision: D15056894

fbshipit-source-id: af9fbda6ea6e56cf82d52555620121b189e2f013
2019-04-25 08:39:36 -07:00
Liezl Puzon
5008fd4e5a XLM for NMT: option to only load encoder or decoder (#666)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/666

Option to load the XLM weights into only the encoder or the decoder

Reviewed By: pipibjc

Differential Revision: D14881004

fbshipit-source-id: 6d0d598ea9c445ec468f71b8e855712de89a5dac
2019-04-25 05:57:02 -07:00
Liezl Puzon
8da9b1c530 Load a XLM model into transformer encoder / decoder for MT training (#629)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/629

Use GeLU as an alternate activation layer for ReLU.

Reviewed By: lematt1991

Differential Revision: D14689851

fbshipit-source-id: 7ec81fa34bc7bd0e1e43b337847ae932dcbf8b15
2019-04-25 05:57:02 -07:00
Ning Dong
90d6eac2b3 Enable custom sampling strategy in MultiCorpusSampledDataset (#639)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/639

Add argument sampling_func in the constructor to enable custom sampling over a list of dataset keys. The default strategy is to sample uniformly as it did previously.

Reviewed By: liezl200

Differential Revision: D14965774

fbshipit-source-id: f3285688a9ae3729c0ba12c22254c1144d0eea9e
2019-04-16 23:29:02 -07:00
Myle Ott
e12e1d254c Simplify and generalize utils.make_positions
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/625

Differential Revision: D14822123

Pulled By: myleott

fbshipit-source-id: 8a263d30020588577ee02fb8c6959ff918705103
2019-04-15 07:32:11 -07:00
Peng-Jen Chen
d7e19573fa Back translation + denoising in MultilingualTranslation task (#620)
Summary:
- Add language token to MultilingualTranslation task
- Add back translation and denoising loss to MultilingualTranslation task
Pull Request resolved: https://github.com/pytorch/fairseq/pull/620

Reviewed By: liezl200

Differential Revision: D14756873

Pulled By: pipibjc

fbshipit-source-id: 89d668db26848fd95f446edf5923bab2113636f7
2019-04-10 10:56:51 -07:00
Dmytro Okhonko
860010e907 Handle 3+ dimensional input in sequence_generator + nits
Summary: sequence_generator assumes that model input is a 2D tensor of longs. But it can be something like a 3D tensor of floats, and we should be able to handle this as long as the first dimension is batch size, followed by source lengths.

Reviewed By: myleott

Differential Revision: D14420044

fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
2019-03-12 15:12:21 -07:00
Dmytro Okhonko
d17fa85135 Adadelta optimizer
Summary: Adding Adadelta optimizer to fairseq as wrapper around torch.optim.Adadelta

Reviewed By: myleott

Differential Revision: D14418635

fbshipit-source-id: 6bf5ec008e905a4a2cbf7415e9492f5eea3ff07f
2019-03-12 15:12:21 -07:00
Vladimir Karpukhin
f296824f40 Move string line encoding logic from tokenizer to Dictionary (unified diff). (#541)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/541

Just a combo of the stacked pair D14057943 & D14176011.
Made this a separate diff because there seems to be some issue with porting a stacked change into the github repo

Differential Revision: D14251048

fbshipit-source-id: 0a47f534a69d6ab2ebe035fba40fd51748cccfb8
2019-02-28 09:19:12 -08:00
Myle Ott
bc919276a1 Add test for mixture of experts
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/543

Differential Revision: D14259481

Pulled By: myleott

fbshipit-source-id: fcb0a150b8e851cf86ea5ed1f083f56e1600588e
2019-02-28 08:56:24 -08:00
Myle Ott
44d27e645b Add Tensorboard support (#530)
Summary:
Enable with the `--tensorboard-logdir` option.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/530

Differential Revision: D14218430

Pulled By: myleott

fbshipit-source-id: e7a54f66f928e3bb02ae03fda09b22fa4fa7d053
2019-02-25 18:40:18 -08:00
Myle Ott
b65c579bed Modularize generate.py (#351)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/351

This makes it easier for tasks to plugin to generate.py/interactive.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/520

Differential Revision: D14183881

Pulled By: myleott

fbshipit-source-id: ede5e53ddc1215ed3b12b8f1eba048c946913c33
2019-02-22 10:08:52 -08:00
Davide Caroselli
bbb4120b00 Support custom Dictionary implementations in 'preprocess.py' (#448)
Summary:
The `preprocess.py` script has been refactored in order to:

1. Use the `options` module for command line arguments  parsing. This will give to `preprocess.py` the ability to load custom modules with `--user-dir` flag (already implemented to all other binaries)
2. Dictionary loading and building code has moved to Task implementation. This allows custom Dictionary classes to be used during the data generation step.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/448

Differential Revision: D13674819

Pulled By: myleott

fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
2019-02-01 09:45:59 -08:00
Myle Ott
3dce7c9fc0 Add --input option to interactive.py to support reading from file
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/484

Differential Revision: D13880636

Pulled By: myleott

fbshipit-source-id: 984b2e1c3b281c28243102eb971ea45ec891d94e
2019-01-30 09:46:05 -08:00
Myle Ott
42be3ebd41 Merge internal changes (#483)
Summary:
Changelog:
- `4889802`: can now detokenize sentencepiece output with `--remove-bpe=sentencepiece` (fixes #331). Also added `--sacrebleu` for computing detokenized BLEU.
- `0d76427`: fix assertion error when training language model with dataset containing empty sentences
- minor bug and style fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/483

Differential Revision: D13867899

Pulled By: myleott

fbshipit-source-id: 25c940b847fe270262ac8f5ac838407b3977fdda
2019-01-30 09:01:10 -08:00
Myle Ott
b41c74dc5b Add code for "Pay Less Attention with Lightweight and Dynamic Convolutions" (#473)
Summary:
Changelog:
- `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper
- `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu
- update READMEs
- misc fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/473

Differential Revision: D13819717

Pulled By: myleott

fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9
2019-01-25 15:40:26 -08:00
Myle Ott
7633129ba8 Merge internal changes (#283)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/283

Pull Request resolved: https://github.com/pytorch/fairseq/pull/428

Differential Revision: D13564190

Pulled By: myleott

fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
2019-01-04 20:03:19 -08:00
Myle Ott
3c19878f71 Refactor BacktranslationDataset to be more reusable (#354)
Summary:
- generalize AppendEosDataset -> TransformEosDataset
- remove EOS logic from BacktranslationDataset (use TransformEosDataset instead)
- BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself
Pull Request resolved: https://github.com/pytorch/fairseq/pull/354

Reviewed By: liezl200

Differential Revision: D12970233

Pulled By: myleott

fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
2018-11-25 21:26:03 -08:00
Myle Ott
0864a9c49d Fix build for docs
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/372

Differential Revision: D13114426

Pulled By: myleott

fbshipit-source-id: 6c24b96a3556a0ecd3d1f350642a884254a40bd3
2018-11-18 08:32:59 -08:00