Commit Graph

101 Commits

Author SHA1 Message Date
Alex Liu
a0ceabc287 include wav2vec-u 2.0 (#2826)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
include wav2vec-u 2.0

!!! TODO !!! update title/link of paper in readme

X-link: https://github.com/fairinternal/fairseq-py/pull/2826

Reviewed By: michaelauli

Differential Revision: D37162174

Pulled By: alexeib

fbshipit-source-id: b985ebb9bb94c25d30b6fc53d8c79088cb9798f9
2022-06-14 21:54:56 -07:00
dianaml0
3a72168bd8 Add CircleCI status badge (#4473)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/facebookresearch/fairseq/pull/4473

Reviewed By: cbalioglu

Differential Revision: D37052250

Pulled By: dianaml0

fbshipit-source-id: e5e4c38a9108c769953ef2202c7adb8aa335771a
2022-06-10 06:55:45 -07:00
dianaml0
51478ad3a1 xformer integration (#2263)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
This PR is a cleaned up version of https://github.com/fairinternal/fairseq-py/issues/2138. It is based on the `main` branch instead of the `gshard` branch. Removed call to xFormers MultiHeadDispatch, only using xFormers Attention.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

X-link: https://github.com/fairinternal/fairseq-py/pull/2263

Reviewed By: blefaudeux

Differential Revision: D33800377

Pulled By: dianaml0

fbshipit-source-id: 658d52214c782212b12881b30c4d908a763b4cf2
2022-05-04 09:15:36 -07:00
Dmitry Vinnik
592c1227f4 docs: add social button in support of Ukraine (#4249)
Summary:
Our mission at Meta Open Source is to empower communities through open source, and we believe that means building a welcoming and safe environment for all. As a part of this work, we are adding this banner in support of Ukraine during this crisis.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4249

Reviewed By: arbabu123

Differential Revision: D34635479

Pulled By: dmitryvinn-fb

fbshipit-source-id: 488d30f0967ae9542ead968c5cb951ecf0e02a64
2022-03-04 16:28:09 -08:00
Ann Lee
5cd7a21cc1 S2ST oss (#2756)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Releasing code, model & recipe for the work "Direct speech-to-speech translation with discrete units".
Main changes:
1. examples/speech_to_speech
2. tasks/speech_to_speech
3. data/audio/speech_to_speech_dataset
4. models/speech_to_speech
5. criterions/speech_to_speech_criterion

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2756

Reviewed By: sravyapopuri388, kahne

Differential Revision: D32923969

Pulled By: an918tw

fbshipit-source-id: 838ba42457f4684e9767d15b5b514681a9572b39
2021-12-28 08:07:55 -08:00
Arun Babu
7105d7f4b1 attempt5 (#2658)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2658

Reviewed By: ngoyal2707

Differential Revision: D32520446

Pulled By: arbabu123

fbshipit-source-id: a4cbc12624c9c8c1b5bc3d64eb47c2fdec01eb87
2021-11-17 20:56:37 -08:00
Sam Shleifer
c5ff181125 NormFormer: flags and docs (#2460)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2460

Reviewed By: myleott

Differential Revision: D31731798

Pulled By: sshleifer

fbshipit-source-id: 938456c17aa004cacffdcdd124aebe390da83d5f
2021-10-19 17:13:04 -07:00
Hu Xu
862efab86f MMPT bug fixes (#2428)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes the argument for `lr_scheduler.total_num_update`, a missing import of `dsprocessor` for COIN, and `vmasks` handling in demo inference; also updates fairseq's README.md for examples/MMPT.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2428

Reviewed By: berniebear

Differential Revision: D31528947

Pulled By: howardhsu

fbshipit-source-id: 1fecf34bdab82cbf6001e3905a532e4e6eb38e01
2021-10-10 02:21:43 -07:00
Qiantong Xu
6639842016 zero-shot model release (#2407)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?

zero-shot model release

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2407

Reviewed By: alexeib

Differential Revision: D31417241

Pulled By: xuqiantong

fbshipit-source-id: 576644694638d3b2606f1751b74feb0531b50eb7
2021-10-05 18:31:47 -07:00
Diana Liskovich
5adfeaccf9 Rename references from master -> main in preparation for branch name change (#2297)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2297

Reviewed By: alexeib

Differential Revision: D30906090

Pulled By: dianaml0

fbshipit-source-id: 941d30db7f766c9077a1b5bb2a04680f57e2e070
2021-09-20 08:29:38 -07:00
dianaml0
f6abcc2a67 update on branch renaming (#3879)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3879

Reviewed By: myleott

Differential Revision: D30969142

Pulled By: dianaml0

fbshipit-source-id: 902154c03fd68ae6645d3e0ac07b7d729dfc7934
2021-09-16 10:03:02 -07:00
alexeib
2513524a16 add finetuned robust w2v models and update readme (#2196)
Summary:
adds finetuned robust w2v models and updates readme

fixes #3721

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2196

Reviewed By: wnhsu

Differential Revision: D30367999

Pulled By: alexeib

fbshipit-source-id: 616b373bf31265c89f694fba7dccce2961d394f3
2021-08-17 06:56:49 -07:00
Ann Lee
7f2fb5caa8 Release code for the paper "Discriminative Reranking for Neural Machine Translation" (#2044)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Release the code for the paper "Discriminative Reranking for Neural Machine Translation"

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2044

Reviewed By: michaelauli

Differential Revision: D29628590

Pulled By: an918tw

fbshipit-source-id: 7a52602d495b736573187cc721829aa545d24770
2021-07-09 17:26:02 -07:00
alexeib
d18e44a289 add robust w2v model (#2046)
Summary:
add robust wav2vec model

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2046

Reviewed By: wnhsu

Differential Revision: D29628639

Pulled By: alexeib

fbshipit-source-id: 296cd2da579a969a71a0f9ffe1062002b73a8d86
2021-07-08 19:26:53 -07:00
Naman Goyal
2fd9d8a972 released xlmr xl and xxl model weights (#1944)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1944

Reviewed By: jingfeidu

Differential Revision: D28944206

fbshipit-source-id: 583837f7dd387341574d27dd9acc145455d640a8
2021-06-07 15:05:53 -07:00
Myle Ott
00d5b7adbe Add README/tutorial for Fully Sharded Data Parallel (#3327)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/3327

Reviewed By: sshleifer

Differential Revision: D26899416

Pulled By: myleott

fbshipit-source-id: bbb493a5c4e0a51f3b26fe8f94e3962b6206d6f6
2021-03-09 06:31:53 -08:00
Myle Ott
cfbf0dddbc Small changes to make tests more reliable (#1572)
Summary:
After this, `python setup.py test` should be more reliable (including when multiple GPUs are present)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1572

Reviewed By: alexeib

Differential Revision: D25984113

Pulled By: myleott

fbshipit-source-id: 7fef27ae90c079c07f592ed9fb350ccf8b56d23d
2021-01-21 07:33:54 -08:00
Myle Ott
336942734c Better support for WandB (#1530)
Summary:
Logs full config

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1530

Reviewed By: sshleifer

Differential Revision: D25723894

Pulled By: myleott

fbshipit-source-id: bb4b168c774bef498e336bbb3ba92eda4b08df3b
2021-01-02 10:24:03 -08:00
Dexter Ju
d740093bac Porting adaptive span to fairseq (#1428)
Summary:
## What does this PR do?
1. We add an enwik8 character-level LM task sweep for Transformer-XL, which lands at 1.05 and roughly matches the reported performance (1.06): https://github.com/kimiyoung/transformer-xl/tree/master/pytorch
Eval with
```
 PYTHONPATH=. python fairseq_cli/eval_lm.py /private/home/daju/data/enwik8/eos-data-bin/ --path /checkpoint/daju/2020-11-19/enwiki8.transformer_xl.fp16.transformer_xl.adam.cl0.25.cosine.lr0.00025.s2.ngpu4/checkpoint_best.pt --user-dir examples/truncated_bptt/ --task truncated_bptt_lm --batch-size 8 --tokens-per-sample 80  --model-overrides '{"mem_len":2100,"clamp_len":820,"same_length":True}'
```
2. Implements adaptive span in fairseq. It reproduces the enwik8 result at 1.03, compared to 1.02 (for the 12-layer model) reported in https://github.com/facebookresearch/adaptive-span, which is a consistent improvement over the Transformer-XL baseline listed above with a smaller model. (A small sketch of the span mask appears after the paper link below.)
You can evaluate the example run with:
```
PYTHONPATH=. python fairseq_cli/eval_lm.py /private/home/daju/data/enwik8/eos-data-bin/ --path  /checkpoint/daju/2020-11-20/enwiki8.adaptivespan.headwise.adaptive_span.adagrad_with_grad_clip.ag_cl0.03.fixed.wu32000.lr0.07.s2.loss5e-07.ngpu4/checkpoint_best.pt --user-dir examples/truncated_bptt/ --task truncated_bptt_lm --batch-size 8 --tokens-per-sample 512 --gen-subset test
```

Paper: https://arxiv.org/abs/1905.07799
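
For reference, the span masking behind adaptive span can be sketched in a few hedged lines (an illustration of the formula in the paper above, not the code in `examples/truncated_bptt`; the span value `z=100` and ramp `R=32` are made-up numbers):

```python
import torch

# Each attention head learns a span z; the attention weight at distance d is scaled
# by the soft mask m_z(d) = clamp((R + z - d) / R, 0, 1), so distant positions fade
# out smoothly and the effective context length can be learned and penalized.
def adaptive_span_mask(z: float, max_len: int, ramp: int = 32) -> torch.Tensor:
    distances = torch.arange(max_len, dtype=torch.float32)
    return torch.clamp((ramp + z - distances) / ramp, min=0.0, max=1.0)

mask = adaptive_span_mask(z=100.0, max_len=512)
print(mask[:3], mask[-3:])  # nearby positions are ~1.0, far positions are 0.0
```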

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1428

Reviewed By: myleott

Differential Revision: D25495754

Pulled By: dexterju

fbshipit-source-id: 15a875a5f82d506a4964dea934a374132ce39f8b
2020-12-14 11:37:18 -08:00
Raphael Scheible
f3d5045a71 add German RoBERTa model (GottBERT) (#2992)
Summary:
# Before submitting

- There is no related issue for this pull request.
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- We did not see any necessity for tests.

## What does this PR do?
Add German RoBERTa model (GottBERT)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2992

Reviewed By: alexeib

Differential Revision: D25494927

Pulled By: myleott

fbshipit-source-id: b6790124d7c3c8dc387c141706cd8a527cc950ab
2020-12-11 19:10:49 -08:00
Myle Ott
0a848245f3 Add Truncated BPTT example + TransformerXL (#1410)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1410

Test Plan:
- reproduced Transformer-XL results (see README)
- added integration test

Reviewed By: jingfeidu

Differential Revision: D24928966

Pulled By: myleott

fbshipit-source-id: 86376c17ab24d37e72e7c097b6dcec71b1a087a7
2020-11-15 19:47:42 -08:00
alexeib
bd2e804b9c add and link hydra docs (#1405)
Summary:
updates hydra integration doc and links to it

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1405

Reviewed By: myleott

Differential Revision: D24808779

Pulled By: alexeib

fbshipit-source-id: a50160e196e469e30e39d6ee47440a569c0154bd
2020-11-07 17:25:18 -08:00
Myle Ott
b4d57c6d49 Move TPU grad reductions out of Trainer into TPUDistributedDataParallel (#1397)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1397

Data parallel command: `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --share-decoder-input-output-embed --lr 0.0001`

Data parallel before:
```
2020-11-04 08:20:13 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:20:13 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:20:13 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:20:13 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:20:14 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:20:14 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:20:14 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:20:19 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=5
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108002, wps=160870, ups=4.91, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.8, wps=517232, ups=15.77, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.4, wps=537322, ups=16.38, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492.1, wps=540488, ups=16.48, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603.2, wps=543411, ups=16.57, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=540359, ups=16.47, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:20 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=533446, ups=16.26, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:20 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.4, wps=539734, ups=16.46, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=6
```

Data parallel after:
```
2020-11-04 08:14:02 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:14:02 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:14:02 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:14:02 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:14:03 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:14:03 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:14:03 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:14:08 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108002, wps=157099, ups=4.79, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.8, wps=560049, ups=17.08, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.4, wps=558507, ups=17.03, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492.1, wps=514194, ups=15.68, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603.2, wps=552676, ups=16.85, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=546402, ups=16.66, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=508472, ups=15.5, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.4, wps=552493, ups=16.84, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=6
```

Data parallel command (no_c10d): `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --share-decoder-input-output-embed --lr 0.0001 --ddp-backend no_c10d`

Data parallel before:
```
2020-11-04 08:19:25 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:19:25 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:19:25 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:19:25 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:19:25 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:19:26 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:19:26 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:19:31 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:19:31 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108001, wps=141659, ups=4.32, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.9, wps=503762, ups=15.36, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.5, wps=488599, ups=14.9, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492, wps=507855, ups=15.48, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603, wps=503270, ups=15.34, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=467778, ups=14.26, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=503800, ups=15.36, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.3, wps=468486, ups=14.28, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=7
```

Data parallel after:
```
2020-11-04 08:14:50 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:14:50 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:14:50 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:14:50 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:14:50 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:14:51 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:14:51 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:14:56 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108001, wps=137677, ups=4.2, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.9, wps=519541, ups=15.84, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.5, wps=517063, ups=15.76, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492, wps=490728, ups=14.95, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603, wps=505262, ups=15.41, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=508874, ups=15.52, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:57 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=518028, ups=15.79, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:57 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.3, wps=515996, ups=15.73, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=7
```

Model parallel command: `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm_megatron --decoder-layers 4 --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --model-parallel-size 2 --share-decoder-input-output-embed --lr 0.0001`

Model parallel before:
```
2020-11-04 08:18:38 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:18:38 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:18:38 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 08:18:38 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:18:38 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:18:39 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:18:39 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:18:44 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:18:45 | INFO | train_inner | epoch 001:      2 / 7173 loss=55.997, ppl=7.19017e+16, wps=0, ups=0, wpb=16384, bsz=32, num_updates=1, lr=0.0001, gnorm=14.03, loss_scale=64, train_wall=1, wall=7
2020-11-04 08:18:45 | INFO | train_inner | epoch 001:      3 / 7173 loss=28.372, ppl=3.47501e+08, wps=48371.7, ups=2.95, wpb=16384, bsz=32, num_updates=2, lr=0.0001, gnorm=15.339, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      4 / 7173 loss=15.855, ppl=59276.8, wps=72422.5, ups=4.42, wpb=16384, bsz=32, num_updates=3, lr=0.0001, gnorm=4.189, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      5 / 7173 loss=14.713, ppl=26858.7, wps=72933.5, ups=4.45, wpb=16384, bsz=32, num_updates=4, lr=0.0001, gnorm=4.751, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      6 / 7173 loss=13.901, ppl=15299.7, wps=71974.8, ups=4.39, wpb=16384, bsz=32, num_updates=5, lr=0.0001, gnorm=4.361, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      7 / 7173 loss=13.312, ppl=10169.5, wps=72897.8, ups=4.45, wpb=16384, bsz=32, num_updates=6, lr=0.0001, gnorm=3.307, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:      8 / 7173 loss=12.914, ppl=7720.21, wps=73044.6, ups=4.46, wpb=16384, bsz=32, num_updates=7, lr=0.0001, gnorm=5.473, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:      9 / 7173 loss=12.56, ppl=6036.72, wps=73453.1, ups=4.48, wpb=16384, bsz=32, num_updates=8, lr=0.0001, gnorm=6.112, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:     10 / 7173 loss=12.116, ppl=4437.77, wps=73442.6, ups=4.48, wpb=16384, bsz=32, num_updates=9, lr=0.0001, gnorm=4.415, loss_scale=64, train_wall=0, wall=9
```

Model parallel after:
```
2020-11-04 08:12:09 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:12:09 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:12:09 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 08:12:09 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:12:09 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:12:10 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:12:10 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:12:16 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      2 / 7173 loss=55.997, ppl=7.19017e+16, wps=0, ups=0, wpb=16384, bsz=32, num_updates=1, lr=0.0001, gnorm=14.03, loss_scale=64, train_wall=1, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      3 / 7173 loss=28.372, ppl=3.47501e+08, wps=53097, ups=3.24, wpb=16384, bsz=32, num_updates=2, lr=0.0001, gnorm=15.339, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      4 / 7173 loss=15.855, ppl=59276.8, wps=72355.5, ups=4.42, wpb=16384, bsz=32, num_updates=3, lr=0.0001, gnorm=4.189, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      5 / 7173 loss=14.713, ppl=26858.7, wps=70526.4, ups=4.3, wpb=16384, bsz=32, num_updates=4, lr=0.0001, gnorm=4.751, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      6 / 7173 loss=13.901, ppl=15299.7, wps=73063.5, ups=4.46, wpb=16384, bsz=32, num_updates=5, lr=0.0001, gnorm=4.361, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      7 / 7173 loss=13.312, ppl=10169.5, wps=73559.4, ups=4.49, wpb=16384, bsz=32, num_updates=6, lr=0.0001, gnorm=3.307, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      8 / 7173 loss=12.914, ppl=7720.21, wps=72693.2, ups=4.44, wpb=16384, bsz=32, num_updates=7, lr=0.0001, gnorm=5.473, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      9 / 7173 loss=12.56, ppl=6036.72, wps=73531.2, ups=4.49, wpb=16384, bsz=32, num_updates=8, lr=0.0001, gnorm=6.112, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:19 | INFO | train_inner | epoch 001:     10 / 7173 loss=12.116, ppl=4437.77, wps=73187.6, ups=4.47, wpb=16384, bsz=32, num_updates=9, lr=0.0001, gnorm=4.415, loss_scale=64, train_wall=0, wall=10
```

Test Plan: Imported from OSS

Reviewed By: ngoyal2707

Differential Revision: D24729295

Pulled By: myleott

fbshipit-source-id: beee8bdece3eaa0419a2e813990420411e507c75
2020-11-05 15:29:33 -08:00
Myle Ott
dd52ed0f38 Small fixes (#1392)
Summary:
- Set default value of clip-norm back to 0.0 (disabled)
- Add a comment explaining that we divide the loss by log(2) to convert the base from nats to bits (see the short example after this list)
- Fix `--zero-optimizer=os` (fixes #2811)
- Update requirements to PyTorch >= 1.5
- Fix bug in fixed LR schedule
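
For context on the log(2) item above, a minimal example of the base conversion (the loss value is made up):

```python
import math

loss_nats = 2.5                      # cross-entropy per token in nats (natural log)
loss_bits = loss_nats / math.log(2)  # dividing by ln(2) converts the base to bits
assert abs(2 ** loss_bits - math.exp(loss_nats)) < 1e-9  # perplexity is unchanged
```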

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1392

Reviewed By: alexeib

Differential Revision: D24714231

Pulled By: myleott

fbshipit-source-id: 63dc8cfc74683bbccbf05b44228014eb12ddbfc7
2020-11-03 20:45:06 -08:00
Armen Aghajanyan
f2fa07106c RXF OS Implementation (#2455)
Summary:
## What does this PR do?
Implements R3F and R4F coming from Facebook Research: https://arxiv.org/abs/2008.03156

This code was used to generate all the results from the paper excluding probing results.
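
A hedged sketch of the R3F idea from the linked paper (a toy illustration, not the fairseq criterion; the linear model, `eps`, and `lam` values are placeholders, and a recent PyTorch is assumed for `log_target`):

```python
import torch
import torch.nn.functional as F

def r3f_loss(model, embeddings, labels, eps=1e-5, lam=1.0):
    # perturb the input embeddings with small uniform noise and penalize the
    # symmetric KL divergence between the clean and noisy output distributions
    logits = model(embeddings)
    noisy_logits = model(embeddings + torch.empty_like(embeddings).uniform_(-eps, eps))
    task_loss = F.cross_entropy(logits, labels)
    p = F.log_softmax(logits, dim=-1)
    q = F.log_softmax(noisy_logits, dim=-1)
    sym_kl = 0.5 * (F.kl_div(p, q, reduction="batchmean", log_target=True)
                    + F.kl_div(q, p, reduction="batchmean", log_target=True))
    return task_loss + lam * sym_kl

# toy usage: a linear classifier over pre-computed embeddings
model = torch.nn.Linear(16, 4)
loss = r3f_loss(model, torch.randn(8, 16), torch.randint(0, 4, (8,)))
loss.backward()
```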

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2455

Reviewed By: myleott

Differential Revision: D23444863

Pulled By: AkshatSh

fbshipit-source-id: b724a6d6cc9cebfdb4bd219828afbb5679f2259b
2020-10-16 14:32:12 -07:00
Xian Li
573c2f4b60 Opensource code for Deep Transformer with Latent Depth (#2703)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Opensource code for Deep Transformer with Latent Depth (https://arxiv.org/pdf/2009.13102.pdf).

New features and design choices made:

- New feature: allow the non-residual block to be weighted by a sample z (generated per batch), i.e. `x = residual + z*x` instead of `x = residual + x`.
- Design choice: move `x = residual + x` in transformer_layer.py into a function which the subclass (with latent depth) can override to `x = residual + z*x` (a toy sketch of this appears after this list).

- New feature: allow TransformerEncoder or TransformerDecoder to have additional logits parameters which will generate the samples z.
- Design choice: added subclass LatentTransformerEncoder and LatentTransformerDecoder, which has additional attributes for the logits parameters, and instantiate the corresponding LatentTransformerEncoderLayer and LatentTransformerDecoderLayer.

- New feature: allow multilingual_translation task to train with latent depth (results in the paper).
- Design choice:
  - added additional arguments in the multilingual_translation task.
  - added option for multilingual_transformer to use LatentTransformerEncoder and LatentTransformerDecoder besides standard TransformerEncoder.
  - added option in multilingual_translation task's `train_step` to generate the samples z and compute the KL (and sparsity) loss per batch.
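
A toy, hedged sketch of the weighted-residual idea described above (not the LatentTransformerEncoder/Decoder classes; the Gaussian relaxation of the Bernoulli sample and the layer contents are simplifications):

```python
import torch
import torch.nn as nn

class LatentResidualLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.logit = nn.Parameter(torch.zeros(1))  # learned layer-selection logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.ffn(x)
        # crude continuous relaxation of a per-batch Bernoulli sample z in [0, 1];
        # the KL/sparsity losses over z mentioned above are omitted here
        z = torch.sigmoid(self.logit + torch.randn_like(self.logit))
        return residual + z * x  # instead of the usual residual + x

out = LatentResidualLayer(16)(torch.randn(2, 16))
```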

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2703

Reviewed By: myleott

Differential Revision: D24155059

Pulled By: xianxl

fbshipit-source-id: f3e41639429f9664ec5565839709aa857a643668
2020-10-15 09:26:05 -07:00
Chau Tran
a2d0be4989 Add CRISS README and code to fairseq (#1344)
Summary:
# Before submitting

- [N] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [Y] Did you make sure to update the docs?
- [N/A] Did you write any new necessary tests?

## What does this PR do?
Add code to reproduce results from Cross-lingual Retrieval for Iterative Self-supervised Training.
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1344

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

See https://github.com/fairinternal/fairseq-py/tree/criss_pr/examples/criss

Reviewed By: myleott

Differential Revision: D24268469

Pulled By: chtran

fbshipit-source-id: d4dd36b22bde3c364ce6e935bd39baf8f96e0735
2020-10-14 10:34:51 -07:00
Myle Ott
a524832d1d Publish Linformer to public fairseq
Summary: Initial open source release for Linformer

Reviewed By: madian9

Differential Revision: D22771263

fbshipit-source-id: bf08c64c5ecb899db9da00b79d09f6308347c915
2020-09-28 15:32:20 -07:00
Myle Ott
703fd48bb1 Fix README and #2496 (#2505)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2505

Reviewed By: shruti-bh

Differential Revision: D23247882

Pulled By: myleott

fbshipit-source-id: 1cfc9e0128e1aa55a1aca31d8dd30f231558e70f
2020-08-20 15:44:17 -07:00
Matt Post
bd1b35d9b7 Added constrained decoding (#1536) (#2402)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

This PR implements constrained decoding ([Hokamp & Liu, 2017](https://www.aclweb.org/anthology/P17-1141/); [Post & Vilar, 2018](https://www.aclweb.org/anthology/N18-1119/)) with vectorization for batching ([Hu et al., 2019](https://www.aclweb.org/anthology/N19-1090/)). In addition, it adds *ordered constraints*, where the constraints are generated on the target side in order, with zero or more unconstrained tokens in between. This variant allows for optimizations that increase speed and BLEU scores (when testing with random scraps from the references).

### Usage and quick start

It works with `fairseq-interactive` via a new command-line option: `fairseq-interactive --constraints [ordered,unordered]`, defaulting to `ordered` if nothing is provided. When active, it will split lines from STDIN on `\t`, with separate constraints each separated by a tab. For example (after downloading the [Fairseq WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md)):

```bash
echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\tinfluence" \
  | [normalize.py](https://gist.github.com/mjpost/4c54446b7030d7c64b57461d27090650) \
  | [tok.py](https://gist.github.com/mjpost/ed7456f6a987c533102fc121678ed302) \
  | PYTHONPATH=$HOME/code/fairseq-constraints fairseq-interactive $modeldir \
  --bpe fastbpe \
  --bpe-codes $modeldir/bpecodes \
  --constraints \
  --constraints-both \
  -s de -t en \
  --path $modeldir/model1.pt \
  --max-tokens 1000 \
  --beam 5
```

Adding the `--constraints-both` option causes it to batch-decode the input sentence both with and without the constraints. When run with the Fairseq WMT19 German--English model, the following results are produced (here run on a CPU, don't be alarmed by the times!)

```text
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
C-0     hard
C-0     influence
H-0     -1.5333266258239746     Mach@@ ine trans@@ lation is hard to influence .
D-0     -1.5333266258239746     Machine translation is hard to influence .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.8031 -0.1701 -11.7727 -0.1815 -0.1511
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
H-0     -0.3731671869754791     Mach@@ ine trans@@ lation is difficult to control .
D-0     -0.3731671869754791     Machine translation is difficult to control .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.1430 -0.1665 -0.8482 -0.1678 -0.1514
2020-07-31 12:17:55 | INFO | fairseq_cli.interactive | Total time: 12.803 seconds; translation time: 3.688
```

Note the new tags present in the output:

* `C-#` records active constraints (after applying preprocessing) for a sentence
* `W-#` reports the sentence-level translation time (a useful unrelated feature I hope you'll accept)

Some unit tests are written (`fairseq/test_constraints.py`) but not yet integrated. Advice here on where to place this is welcome. I also have not run this through lint; if someone can tell me the command to run, I'd appreciate it.

### Implementation notes

This is largely self-contained, implemented in a new `LexicallyConstrainedBeamSearch` class in `search.py`. It does require a few minimal hooks from `_generate()` in `sequence_generator.py`, to ensure that constraints are updated at each timestep. (Edit: most changes in that file are documentation clarifications, corrections, and updates). Unconstrained sentences that are intermingled with constrained ones will not incur any time penalty, so long as they do not occur in the same batch.

Addresses https://github.com/pytorch/fairseq/issues/1536.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2402

Reviewed By: alexeib

Differential Revision: D23188945

Pulled By: myleott

fbshipit-source-id: 9f5ed855f7a1dcf535b091c0ccf98b07fb9cbdd6
2020-08-20 11:59:53 -07:00
Myle Ott
9831634946 Misc fixes (#2448)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2448

Reviewed By: ngoyal2707

Differential Revision: D23011193

Pulled By: myleott

fbshipit-source-id: 1a29481707108e4465aca78ec1581fb79f05efba
2020-08-14 10:24:51 -07:00
alexeib
621e834103 wav2vec 2.0 (#1220)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1220

Test Plan: Please see examples/wav2vec/README.md for instructions

Reviewed By: edunov

Differential Revision: D22707565

Pulled By: alexeib

fbshipit-source-id: 0c0d4ca7acc933ef7c0062f8dce550b94e414680
2020-08-04 14:19:56 -07:00
Myle Ott
f0a61a2774 Miscellaneous fixes (#1196)
Summary:
Incorporate several fixes, incl. from OSS contributors:
- fix model argument in sequence generator in semisupervised_translation.py
- fix aggregate logging in semisupervised_translation.py
- Fix EOS token in multilingual_denoising
- Handle missing eos_idx in data_utils.collate_tokens
- Better OOM handling for single-GPU training
- fix prepend_bos argument in translation_from_pretrained_bart.py …
- Fix eos_idx in multilingual_denoising
- Small logging fixes
- Fix fb_hub on PyTorch 1.6
- Better variable names
- Add support for model parallel to interactive.py
- Use `//` operator to fix Integer division warning
- Set default `--clip-norm=0.0`
- Cleanup some binaries in root directory

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1196

Reviewed By: ngoyal2707

Differential Revision: D22162202

Pulled By: myleott

fbshipit-source-id: 835b0c0ad9246827f9d915fdb4e89d7b5be2475d
2020-06-24 10:08:53 -07:00
Myle Ott
145bc9de12 Several small fixes (incl. set default --data-buffer-size=10) (#2163)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2163

Reviewed By: ngoyal2707

Differential Revision: D21665601

Pulled By: myleott

fbshipit-source-id: 47673ff7f07acf0002c4e28380aa08ff917618ee
2020-05-26 15:59:59 -07:00
Xutai Ma
12d5e6ff16 Monotonic multihead attention (#1707)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Add code for published paper from FB

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

*Still WIP*
jmp84
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1707

Reviewed By: jmp84

Differential Revision: D21304498

Pulled By: xutaima

fbshipit-source-id: 073d522e0eeef3e02c83e4617b8e5b697ff6979b
2020-04-30 13:26:32 -07:00
Angela Fan
1c8ab79ca5 quant noise code, readme, start of adding quantization (#1896)
Summary:
FUNCTIONALITY:
This diff provides two core pieces of functionality
- Adds training with quantization noise from "Training with Quantization Noise for Extreme Model Compression", controlled by the "quant_noise" and "quant_noise_block_size" parameters. Added in embeddings, attention, and FFN for BERT and Transformer LM training (a toy sketch of the idea follows this list).
- Adds quantization with product quantization based on code from "And the bit goes down: Revisiting the quantization of neural networks" (Stock et al., 2019). This is applied to a fairseq-trained model to quantize it after training.
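
A toy, hedged sketch of the quant-noise idea (not the fairseq `quant_noise` wrapper; the int8-style fake-quantizer and block layout are simplifications):

```python
import torch

def quant_noise(weight: torch.Tensor, p: float, block_size: int) -> torch.Tensor:
    """Fake-quantize a random fraction p of weight blocks during the forward pass."""
    out_features, in_features = weight.shape
    assert in_features % block_size == 0
    # one Bernoulli draw per (row, block) pair, expanded over each block
    n_blocks = in_features // block_size
    mask = torch.rand(out_features, n_blocks, device=weight.device) < p
    mask = mask.repeat_interleave(block_size, dim=1).to(weight.dtype)
    # crude int8-style fake quantization of the selected blocks
    scale = weight.abs().max() / 127.0
    quantized = torch.round(weight / scale) * scale
    noisy = mask * quantized + (1.0 - mask) * weight
    # straight-through estimator: forward uses `noisy`, gradients flow to `weight`
    return weight + (noisy - weight).detach()

# toy usage inside a linear layer's forward, applied only at training time
x = torch.randn(4, 64)
w = torch.nn.Parameter(torch.randn(32, 64))
y = torch.nn.functional.linear(x, quant_noise(w, p=0.1, block_size=8))
```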

TODO:
-> Pierre, look at quantization code
-> int4 and int8 quantization will be added soon.

EVALUATED TEST CASES:

0. Training of LM and BERT models starts from scratch with no errors -> yes

1. Retrain LM from scratch with code, no quantization, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_noise

2. Reload previously trained LM from scratch, not trained with quant noise, reproduces Wikitext-103 LM results -> yes

3. Train LM from scratch with code, not trained with quant noise, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_baseline

4. Train BERT model from scratch with code, no quantization, training curve looks the same as before -> yes

5. Check wps during training and wps during inference, no large change from before -> yes

6. Check structured dropout isn't being applied at eval time -> yes

7. Works in combination with LayerDrop -> yes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1896

Reviewed By: myleott

Differential Revision: D20609420

Pulled By: huihuifan

fbshipit-source-id: 94468dd811c4caaaef46a9fab2b8d381f9d2b955
2020-04-21 09:28:56 -07:00
Naman Goyal
78a995db2f adding readme and releasing megatron big model (#1124)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1124

Reviewed By: myleott

Differential Revision: D20749898

fbshipit-source-id: 42bca96d8d65158ae858ceaa7386afedf1696ebb
2020-04-03 12:03:56 -07:00
Myle Ott
5065077dfc Use cross entropy from apex for improved memory efficiency (#1122)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1122

Reviewed By: ngoyal2707

Differential Revision: D20745717

Pulled By: myleott

fbshipit-source-id: 877a1185f17952461ef204d8ad7f05b8d37b1fd9
2020-03-31 08:59:59 -07:00
Changhan Wang
fdac9bbce1 Byte-Level BPE paper code
Summary:
Implemented byte-level BPE described in ["Neural Machine Translation with Byte-Level Subwords"](https://arxiv.org/abs/1909.03341)
* Added bytes/characters/byte-level BPE tokenizers to fairseq.data.encoder
* Added detokenization option to generate.py
* Added an example under examples/byte_level_bpe
* Implemented Transformer model with Bi-GRU embedding contextualization: `examples/byte_level_bpe/gru_transformer.py`
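
The byte-level idea itself fits in a few hedged lines (a toy illustration, not the fairseq tokenizer; the `<xx>` token format is made up):

```python
# encode text as its UTF-8 bytes, so any string maps onto a fixed 256-symbol vocabulary,
# then recover the original string by reassembling and decoding the bytes
text = "Übersetzung"
byte_tokens = [f"<{b:02x}>" for b in text.encode("utf-8")]
print(byte_tokens[:4])  # ['<c3>', '<9c>', '<62>', '<65>']
decoded = bytes(int(tok[1:3], 16) for tok in byte_tokens).decode("utf-8")
assert decoded == text
```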

Reviewed By: myleott

Differential Revision: D20600963

fbshipit-source-id: 3eca4d046056c07f65333123416017a4eac04c8a
2020-03-24 07:59:05 -07:00
Myle Ott
11cc356395 fairseq requires PyTorch >= 1.4 (#1844)
Summary:
Fixes https://github.com/pytorch/fairseq/issues/1843
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1844

Differential Revision: D20468391

Pulled By: myleott

fbshipit-source-id: 0b2e2ba35c94eeb49d0e6bb05a8fefa4b847f46d
2020-03-16 07:45:50 -07:00
Stefan Schweter
3dd221c90f Readme: Fix link to mBART documentation (#1789)
Summary:
Hi,

This PR updates the link to the mBART documentation in the main README.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1789

Differential Revision: D20322673

Pulled By: myleott

fbshipit-source-id: b59c94f49176ba5bbd664791818b5b8ce7402698
2020-03-07 10:04:46 -08:00
Yinhan Liu
5e79322b3a open source mbart (#1033)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1033

Differential Revision: D20122520

Pulled By: yinhanliu

fbshipit-source-id: e2fd93e2fa9b7a8e276acc4316a176ba3ceae4ed
2020-02-27 08:30:43 -08:00
Myle Ott
37bd90845c Update README (#1051)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1051

Differential Revision: D20119560

Pulled By: myleott

fbshipit-source-id: caf089341931990777393b916c846a332b76e9dc
2020-02-26 09:13:47 -08:00
Pengcheng YIN
9090dad8d1 Update README for apex (#1563)
Summary:
Recent releases of apex removed the `fused_adam_cuda` function used in 3f4fc50163/fairseq/optim/adam.py (L220). Users need to use the `--deprecated_fused_adam` option to install `fused_adam_cuda`.

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1563

Differential Revision: D19260517

Pulled By: myleott

fbshipit-source-id: 69af015f3ef1fa85b98d138c28876ada194c9437
2020-01-02 11:37:30 -08:00
Myle Ott
6b4700cea4 Update README.md
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1521

Differential Revision: D19159323

Pulled By: myleott

fbshipit-source-id: 7e3fefc29229a90bcffe8bb1c5cff3507712d94c
2019-12-18 07:04:19 -08:00
Myle Ott
dfde36bc66 Create build.yml
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1515

Differential Revision: D19151562

Pulled By: myleott

fbshipit-source-id: 426eca1e449cac914d49877678323a6487c0adbe
2019-12-17 20:45:11 -08:00
Myle Ott
05514f8a82 Update README to indicate we only support Python >= 3.6 (fixes #1317)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/952

Differential Revision: D19133348

Pulled By: myleott

fbshipit-source-id: 51f96ddb13386143fe0088f19f7cb0674755811f
2019-12-16 19:46:53 -08:00
Changhan Wang
7612eefc6d add VizSeq to README
Summary: add VizSeq to README

Reviewed By: MultiPath

Differential Revision: D18877679

fbshipit-source-id: f1de226e37b19ec967dfcec91216521d4e5b6e22
2019-12-08 11:08:46 -08:00
Myle Ott
cb6c67bcdb Make torch.hub interface automatically apply tokenization and BPE
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/926
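
For reference, a hedged usage sketch of the hub interface with tokenization and BPE applied automatically (model name, checkpoint file, and arguments follow the fairseq README; weights are downloaded on first use):

```python
import torch

# `tokenizer` and `bpe` make the hub wrapper apply Moses tokenization and fastBPE
# around translate(), so raw strings go in and raw strings come out
en2de = torch.hub.load(
    "pytorch/fairseq", "transformer.wmt19.en-de",
    checkpoint_file="model1.pt", tokenizer="moses", bpe="fastbpe",
)
print(en2de.translate("Machine translation is hard to control."))
```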

Differential Revision: D18685772

Pulled By: myleott

fbshipit-source-id: 0f99d79ed6ee72e9d3ced786d75ab9504d0dfcf0
2019-11-26 07:49:37 -08:00
Louis Martin
b31849aa92 Camembert model and code (#904)
Summary:
Checked locally that everything works fine.
Model is uploaded to fbaipublicfiles.

I fixed a few inconsistencies in the bpe encoding along the way, e.g. related to https://github.com/pytorch/fairseq/issues/1306.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/904

Reviewed By: ngoyal2707

Differential Revision: D18418345

Pulled By: louismartin

fbshipit-source-id: 53acb4d021581968d70430ee9babee07d6573c17
2019-11-10 11:29:07 -08:00