* fix imports referencing moved metrics.py file
* Make representation computation branchless in TransformerEncoderBase (#4818)
Summary:
We want to make this computation branchless because fairseq code may be
exported and traced for deployment purposes, and tracing can break the
correctness of a captured program if it depends on input data. In this diff
we rewrite the code to remove one branch so that the tracer can proceed
and preserve the correct semantics of the model.
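For illustration, here is a minimal sketch of this kind of rewrite (hypothetical tensor names and shapes, not the exact fairseq code): the data-dependent branch is replaced by arithmetic that is a no-op when nothing is padded, so torch.jit.trace captures a single graph that stays valid for all inputs.
```
import torch

# Branchy version: torch.jit.trace freezes whichever side of the `if` the
# example input exercises, silently breaking inputs that take the other side.
def forward_branchy(x, padding_mask):
    # x: (batch, time, dim); padding_mask: (batch, time), bool
    if padding_mask.any():
        x = x * (1 - padding_mask.unsqueeze(-1).type_as(x))
    return x

# Branchless version: the multiply is a no-op when the mask is all False,
# so the traced graph is correct for every input.
def forward_branchless(x, padding_mask):
    return x * (1 - padding_mask.unsqueeze(-1).type_as(x))
```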
Test Plan:
CI
* Fix Torchscript typing in transformer_encoder.py (#4847)
* Add Generative Spoken Dialogue Language Modeling (#4879)
* Update deprecated torch.qr in glow.py example (#4685)
torch.qr has been deprecated for a long time and is being removed by https://github.com/pytorch/pytorch/pull/70989.
This PR makes the example compatible with both new and old PyTorch versions.
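A small compatibility shim in the spirit of that fix (a sketch, not the exact change in glow.py):
```
import torch

def qr_compat(a):
    # torch.linalg.qr is available since PyTorch 1.8; mode="reduced"
    # matches the old torch.qr(a, some=True) behavior.
    if hasattr(torch, "linalg") and hasattr(torch.linalg, "qr"):
        return torch.linalg.qr(a, mode="reduced")
    return torch.qr(a, some=True)
```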
* Emotion Conversion Paper Open Source (#4895)
* data2vec v2.0 (#4903)
data2vec 2.0
Co-authored-by: Arun Babu <arbabu@fb.com>
Co-authored-by: Wei-Ning Hsu <wnhsu@csail.mit.edu>
* remove missing config entries when loading task from checkpoint (#4905)
* make apex optional (#4906)
* Add file to generate manifests for stop dataset. (#4891)
* Update STOP dataset README to include proper link. (#4892)
* Update README.md (#4893)
* using foreach ops to reduce kernel launches (#4904; see the sketch after this entry)
* using foreach to reduce kernel
* set reproducibility to looser threshold
* revert optimizer
* update
Co-authored-by: juntengjia <juntengjia@fb.com>
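A minimal sketch of the foreach idea above (illustrative tensors, not the actual optimizer diff): the per-tensor Python loop issues one kernel launch per parameter, while the _foreach variant covers all parameters in one fused launch.
```
import torch

params = [torch.randn(1024, 1024) for _ in range(16)]
grads = [torch.randn_like(p) for p in params]

# Per-tensor loop: one kernel launch per parameter.
for p, g in zip(params, grads):
    p.add_(g, alpha=-0.1)

# foreach variant: a single fused launch across all parameters.
torch._foreach_add_(params, grads, alpha=-0.1)
```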
* Update README.md to add data2vec blog post (#4913)
* Update README.md
* Update config to fix circleci failure (#4949)
https://app.circleci.com/pipelines/github/fairinternal/fairseq-py/12635/workflows/3befbae2-79c4-458d-9fc4-aad4484183b4/jobs/26767
* Generative Spoken Dialogue Language Modeling Paper Open Source (#4957)
* wav2vec2_laser (#4968)
* ASR BLEU tool copied from ust branch into main (#4914)
* Add transcript option for asr-bleu (#4981)
---------
Co-authored-by: zhxchen17 <zhxchen17@outlook.com>
Co-authored-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Nguyen Tu Anh <nguyentuanh208@gmail.com>
Co-authored-by: Sergii Dymchenko <kit1980@gmail.com>
Co-authored-by: Felix Kreuk <felixkreuk@gmail.com>
Co-authored-by: Alexei Baevski <alexei.b@gmail.com>
Co-authored-by: padentomasello <pdtomasello@gmail.com>
Co-authored-by: Junteng Jia <juntengjia@hotmail.com>
Co-authored-by: juntengjia <juntengjia@fb.com>
Co-authored-by: arbabu123 <arbabu@fb.com>
Co-authored-by: dianaml0 <82468439+dianaml0@users.noreply.github.com>
Co-authored-by: Pierre Andrews <mortimer@fb.com>
Co-authored-by: Ilia Kulikov <kulikov@cs.nyu.edu>
Co-authored-by: Xutai Ma <xutaima@gmail.com>
Summary:
Pull Request resolved: https://github.com/facebookresearch/fairseq/pull/4513
With some fixes to torchscript using dual copies.
Reland this diff.
Reviewed By: erichan1
Differential Revision: D37371293
fbshipit-source-id: 4fcfc4083955b6f5fc4ef8600f1b517b6ba69aae
Summary:
Context: https://fburl.com/7vdj7vhl
Backing out due to breaking our TorchScript test:
```
RuntimeError:
method cannot be used as a value:
File "/dev/shm/uid-30041/54641b26-seed-nspid4026533396_cgpid7154327-ns-4026533393/fairseq/modules/transformer_layer.py", line 307
self.in_proj_weight,
self.in_proj_bias,
self.self_attn.out_proj.weight,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
self.self_attn.out_proj.bias,
self.activation_relu_or_gelu == 2,
Stack trace:
Exception type: torch::jit::ErrorReport
```
https://fburl.com/sandcastle/4pzqemf5
Original commit changeset: 984266f850fc
Original Phabricator Diff: D37082681 (3a757d7ab2)
Differential Revision: D37303846
fbshipit-source-id: 1757ea5dae98be5beb4d08f70b0c3001d6ea336f
Summary:
Pull Request resolved: https://github.com/facebookresearch/fairseq/pull/4480
as titled and depends on D36057338
Fork the inference path inside the forward function: if a checkpoint has been loaded and we are running inference, we deploy BetterTransformer (BT); otherwise the regular fairseq path is used.
In summary:
Accuracy: there is some accuracy loss due to fp16; the maximum diff is around 0.009. If we set it to fp32, there is no accuracy loss.
Perf: the current fairseq has similar speed to the vanilla version. After the enablement, the speedup is similar to the standalone BT test.
With batch size = 64:
For V100, the speedup reaches 1.23x.
For A100, the speedup reaches 1.38x.
After enabling nested tensors:
For V100, the speedup reaches 2.46x.
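A minimal sketch of the forward-path fork described above (class and method names are illustrative, not the actual fairseq/BT API):
```
import torch.nn as nn

class EncoderWithFastPath(nn.Module):
    def __init__(self):
        super().__init__()
        # Set True after loading a BT-compatible checkpoint.
        self.can_use_fastpath = False

    def _forward_bettertransformer(self, x):
        return x  # placeholder for the fused BetterTransformer path

    def _forward_fairseq(self, x):
        return x  # placeholder for the regular fairseq path

    def forward(self, x):
        # Take the fast path only at inference time with a compatible
        # checkpoint loaded; training keeps the original fairseq path.
        if not self.training and self.can_use_fastpath:
            return self._forward_bettertransformer(x)
        return self._forward_fairseq(x)
```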
Reviewed By: mikekgfb
Differential Revision: D37082681
fbshipit-source-id: 984266f850fc30603e48be56e41ac2c67da080f5
Summary:
1. Add joint pre-training scripts
2. Replace prepend_tgt_lang_tag_no_change with prepend_tgt_lang_tag_as_bos
3. Add readme for the joint pre-training
4. Add test case for the Librispeech model
Reviewed By: hygong-fb
Differential Revision: D36300953
fbshipit-source-id: cb749689787ed97c1250d122bdefb7f7a2252292
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
- [x] formatting fix
- [x] optional import of xFormers
- [x] enabled doc building as part of CI
- [x] remove mask arguments for attentions that do not support them
- [x] remove masks for blocksparse tests, no longer supported
- [ ] use pytest instead of deprecated `setup.py test`
- [ ] CircleCI xFormers tests
Will submit without the last two done to unblock people using the repo
X-link: https://github.com/fairinternal/fairseq-py/pull/3362
Reviewed By: blefaudeux
Differential Revision: D36169572
Pulled By: dianaml0
fbshipit-source-id: 3b20ae5f377144a0854e016771af703f0d0d694b
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?
## What does this PR do?
This PR is a cleaned up version of https://github.com/fairinternal/fairseq-py/issues/2138. It is based on the `main` branch instead of the `gshard` branch. Removed call to xFormers MultiHeadDispatch, only using xFormers Attention.
X-link: https://github.com/fairinternal/fairseq-py/pull/2263
Reviewed By: blefaudeux
Differential Revision: D33800377
Pulled By: dianaml0
fbshipit-source-id: 658d52214c782212b12881b30c4d908a763b4cf2
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
X-link: https://github.com/fairinternal/fairseq-py/pull/3350
Reviewed By: shruti-bh
Differential Revision: D36009526
Pulled By: dianaml0
fbshipit-source-id: 9cdc3d53086b8d40a780bcb64cfe28108091ab98
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Pulling out some changes from https://github.com/fairinternal/fairseq-py/pull/2263 unrelated to xformers to make the PR cleaner
X-link: https://github.com/fairinternal/fairseq-py/pull/3068
Reviewed By: blefaudeux
Differential Revision: D34149016
Pulled By: dianaml0
fbshipit-source-id: 6442a5f451d56cc47106227298a624516b19a9ad
Summary:
# Before submitting
- [X] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [X] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [X] Did you make sure to update the docs?
- [X] Did you write any new necessary tests?
## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/4300
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?
Big time!
Note:
I had to update `black` because of [this known issue](https://github.com/psf/black/issues/2964):
```
black....................................................................Failed
- hook id: black
- exit code: 1
Traceback (most recent call last):
File "/Users/azzhipa/.cache/pre-commit/repoxt83whf2/py_env-python3.8/bin/black", line 8, in <module>
sys.exit(patched_main())
File "/Users/azzhipa/.cache/pre-commit/repoxt83whf2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1423, in patched_main
patch_click()
File "/Users/azzhipa/.cache/pre-commit/repoxt83whf2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1409, in patch_click
from click import _unicodefun
ImportError: cannot import name '_unicodefun' from 'click' (/Users/azzhipa/.cache/pre-commit/repoxt83whf2/py_env-python3.8/lib/python3.8/site-packages/click/__init__.py)
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/4344
Reviewed By: zhengwy888
Differential Revision: D35691648
Pulled By: dianaml0
fbshipit-source-id: 4bdf408bc9d9cca76c9c08e138cf85b1d00d14d4
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3107
Reviewed By: cndn
Differential Revision: D34354339
Pulled By: sravyapopuri388
fbshipit-source-id: 50888706123d246c13d2cbb22d0e043740ff6bf5
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3065
Reviewed By: Mortimerp9
Differential Revision: D34144674
Pulled By: dianaml0
fbshipit-source-id: 842b0d29c9c85d4b56b640f2823fcb4e3f912f98
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3059
Reviewed By: kahne
Differential Revision: D34083178
Pulled By: sravyapopuri388
fbshipit-source-id: a33af1696570be4826973b19fe34177bcf851e06
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3001
Reviewed By: kahne
Differential Revision: D33904550
Pulled By: sravyapopuri388
fbshipit-source-id: f55f8121d83e5abebdfcf7ac90dcba39f65cafaf
Summary: The GPU test was broken after D33809223 (1b61bbad32)
Reviewed By: cruvadom
Differential Revision: D33931570
fbshipit-source-id: 37962a437d8e25b1dafc58db0efa55c1afa5f3ee
Summary:
This is the same as https://github.com/fairinternal/fairseq-py/issues/3003 but for main instead of gshard.
The lint test runs the latest version of black, which is 22.1.0 right now and seems to be incompatible with the 21.12b0 version that is set up in pre-commit. This means that some files were validly formatted in the past, but are not anymore...
This PR formats these files with 22.1.0 and auto-updates the pre-commit config to use that black version too.
(Note: this is the second time this has happened. A solution would be to pin the lint test to the same black version as the pre-commit hook, i.e. the version used to format everything clean, so that formatting stays stable.)
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3004
Reviewed By: dianaml0
Differential Revision: D33917490
Pulled By: Mortimerp9
fbshipit-source-id: d55e800b976f94545cdab4132daa7c45cbd0e34c
Summary: EMA has been broken since D33649708 (995c204337) due to an indentation error.
Reviewed By: cruvadom
Differential Revision: D33809223
fbshipit-source-id: c6c4d0d327443bfea787817040e1832eef0f50e4
Summary:
Preliminaries for the data2vec release, including some minor improvements and bug fixes.
The most important change is that we now default to raising an exception when fields in the config do not have a corresponding field in the model dataclass.
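A minimal sketch of the stricter default (illustrative names, not the actual fairseq dataclass utilities): unknown config keys now raise instead of being silently dropped.
```
from dataclasses import dataclass, fields

def validate_cfg(cfg: dict, dc_type):
    # Raise if the config carries keys with no matching dataclass field.
    known = {f.name for f in fields(dc_type)}
    unknown = set(cfg) - known
    if unknown:
        raise ValueError(f"unknown config fields for {dc_type.__name__}: {sorted(unknown)}")

@dataclass
class ModelConfig:
    encoder_embed_dim: int = 768

validate_cfg({"encoder_embed_dim": 1024}, ModelConfig)  # ok
validate_cfg({"encodr_embed_dim": 1024}, ModelConfig)   # raises ValueError
```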
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2929
Reviewed By: wnhsu
Differential Revision: D33649708
Pulled By: alexeib
fbshipit-source-id: 629bdb4c361550740b451c570c2005bb956c6fcb
Summary:
Support FFN pruning for Fairseq. For example, the user can apply pruning on top of a RoBERTa base model by specifying the argument "--ffn-blocks-to-remove 1024". The user also needs to provide a checkpoint which is already pruned so that the pruned checkpoint can be loaded correctly.
The pruning procedure can be summarized as (a sketch of step 2 follows the list):
1. Fine-tune the model (e.g. the RoBERTa encoder) on a given dataset with regularization.
2. After the model is trained, use the _get_fc_rank and _prune_fc_layer functions to get the top X blocks with the most importance in each transformer layer, then use that rank to prune a new RoBERTa encoder and save the pruned checkpoint manually.
3. Fine-tune the new RoBERTa encoder via the checkpoint saved above.
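A minimal sketch of the rank-and-prune step for one FFN (assumed shapes and a simple magnitude score; the real helpers are _get_fc_rank and _prune_fc_layer on the transformer layer):
```
import torch
import torch.nn as nn

def get_fc_rank(fc1: nn.Linear, blocks_to_keep: int) -> torch.Tensor:
    # Score each intermediate unit (row of fc1.weight) by L1 magnitude
    # and return the indices of the top-scoring units.
    scores = fc1.weight.abs().sum(dim=1)
    return torch.topk(scores, blocks_to_keep).indices.sort().values

def prune_fc_layer(fc1: nn.Linear, fc2: nn.Linear, keep: torch.Tensor):
    # Keep only the selected units: rows of fc1, columns of fc2.
    fc1_new = nn.Linear(fc1.in_features, keep.numel())
    fc1_new.weight.data = fc1.weight.data[keep].clone()
    fc1_new.bias.data = fc1.bias.data[keep].clone()
    fc2_new = nn.Linear(keep.numel(), fc2.out_features)
    fc2_new.weight.data = fc2.weight.data[:, keep].clone()
    fc2_new.bias.data = fc2.bias.data.clone()
    return fc1_new, fc2_new
```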
Reviewed By: dianaml0
Differential Revision: D33525055
fbshipit-source-id: 5087140ee891d6ec9266726e3a477947c233412c
Summary:
This is the equivalent of PR https://github.com/fairinternal/fairseq-py/issues/2697 but on top of main instead of gshard (cherry-picked and merged the squash):
* reorganize preprocess.py code a bit
* use Binarizers objects in the multiprocess code
* clean up the make_binary
* multiprocess logic
* learn to count
* format and doc string
* add basic test for vocab binarizer
* generalize to one line
* move multiprocess in binarizer
Testing:
```
python -m fairseq_cli.preprocess --only-source --trainpref ~/fixathon/small_vocab_test/train.in --destdir ~/fixathon/small_vocab_test/data-bin.cherry --workers 20
python -m fairseq_cli.preprocess --only-source --trainpref ~/fixathon/small_vocab_test/train.in --destdir ~/fixathon/small_vocab_test/data-bin.main --workers 20
```
```
md5sum ~/fixathon/small_vocab_test/data-bin.cherry/train.bin == md5sum ~/fixathon/small_vocab_test/data-bin.main/train.bin
```
```
diff ~/fixathon/small_vocab_test/data-bin.main/dict.txt ~/fixathon/small_vocab_test/data-bin.cherry/dict.txt
```
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2738
Reviewed By: sshleifer, dianaml0
Differential Revision: D32830875
Pulled By: Mortimerp9
fbshipit-source-id: e7463d5cdd96a877691bf39666daa319ebb3dcb8
Summary:
Support multihead attention pruning for Fairseq. For example, the user can apply pruning on top of a RoBERTa base model by specifying the argument "--mha-heads-to-keep 8". The user also needs to provide a checkpoint which is already pruned so that the pruned checkpoint can be loaded correctly.
The pruning procedure can be summarized as (a sketch of step 2 follows):
1. Fine-tune the model (e.g. the RoBERTa encoder) on a given dataset with regularization.
2. After the model is trained, use the get_reserve_head_index and _adaptive_prune_heads functions to get the top X heads with the most importance, then use that rank to prune a new RoBERTa encoder and save the pruned checkpoint manually.
3. Fine-tune the new RoBERTa encoder via the checkpoint saved above.
To avoid registering a separate pruned version of RoBERTa, the argument --mha-heads-to-keep is used to prune the RoBERTa model into a pruned version which matches the pruned checkpoint.
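A minimal sketch of the head-ranking step (the magnitude heuristic and weight layout here are assumptions; the real logic lives in get_reserve_head_index and _adaptive_prune_heads):
```
import torch

def reserve_head_index(out_proj_weight, num_heads, heads_to_keep):
    # out_proj_weight: (embed_dim, num_heads * head_dim). Score each head
    # by the L1 magnitude of its slice of the output projection.
    embed_dim = out_proj_weight.shape[0]
    head_dim = out_proj_weight.shape[1] // num_heads
    per_head = out_proj_weight.reshape(embed_dim, num_heads, head_dim)
    scores = per_head.abs().sum(dim=(0, 2))
    return torch.topk(scores, heads_to_keep).indices.sort().values
```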
Reviewed By: dianaml0
Differential Revision: D32449003
fbshipit-source-id: a952fd9ad723a6dbc5c2af574c42f2e9a1fa27dc
Summary:
**This PR**
- Adds a conformer layer based on https://arxiv.org/pdf/2005.08100.pdf (see the schematic sketch after this summary).
- The conformer implementation supports multihead attention based on 3 different positional embedding types: absolute positional embedding, relative positional encoding, and rotary positional embedding.
- Adds conformer encoder with conv1d subsampling, positional embedding followed by N conformer layers
- Adds S2T_Conformer model based on the conformer encoder and transformer decoder.
- Add conformer support in Wav2Vec2
- Add unit tests for core modules
**Verification**
- Verified the set up on MUST-C En-De S2T, Covost2 Es-En S2T, Librispeech ASR to ensure the implementation is correct.
- For S2T setups, the performance is either similar to the transformer based models or better.
- Wav2vec2 pretraining and finetuning based on librispeech showed improvements over corresponding transformer baselines.
- [WIP] Experiment log: https://docs.google.com/document/d/1QI-ROWVenUEXPJoHTaKD85Fq7T8ZXNc8bc54MzgwJjA/edit#
**Next steps**
- Add regression tests
- Add README and open source checkpoints
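A schematic conformer block in the macaron ordering from the paper (half-step FFN, self-attention, convolution module, half-step FFN, final layer norm). Dimensions, the attention variant, and module details are simplifications, not the fairseq implementation:
```
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvModule(nn.Module):
    # Pointwise conv -> GLU -> depthwise conv -> norm -> SiLU -> pointwise conv.
    def __init__(self, d, kernel=31):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.pw1 = nn.Conv1d(d, 2 * d, 1)
        self.dw = nn.Conv1d(d, d, kernel, padding=kernel // 2, groups=d)
        self.bn = nn.BatchNorm1d(d)
        self.pw2 = nn.Conv1d(d, d, 1)

    def forward(self, x):                 # x: (B, T, d)
        y = self.norm(x).transpose(1, 2)  # (B, d, T) for the conv stack
        y = F.glu(self.pw1(y), dim=1)
        y = self.pw2(F.silu(self.bn(self.dw(y))))
        return x + y.transpose(1, 2)      # residual

class ConformerBlock(nn.Module):
    def __init__(self, d=256, heads=4):
        super().__init__()
        self.ffn1, self.ffn2 = self._ffn(d), self._ffn(d)
        self.attn_norm, self.out_norm = nn.LayerNorm(d), nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.conv = ConvModule(d)

    @staticmethod
    def _ffn(d):
        return nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d), nn.SiLU(), nn.Linear(4 * d, d))

    def forward(self, x):                 # x: (B, T, d)
        x = x + 0.5 * self.ffn1(x)        # macaron half-step FFN
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        x = self.conv(x)                  # conv module includes its residual
        x = x + 0.5 * self.ffn2(x)
        return self.out_norm(x)
```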
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2859
Reviewed By: kahne
Differential Revision: D33434092
Pulled By: sravyapopuri388
fbshipit-source-id: 62f22b917a332481370750e04a439e05832a2282
Summary: Add test for DualInputS2TTransformerModel at examples/speech_text_joint_to_text/models/s2t_dualinputtransformer.py
Reviewed By: kahne
Differential Revision: D33284188
fbshipit-source-id: c02b697fc7734425661e00bbb606852b5d94a587
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Applied `black` and `isort` to fix failing CI
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2834
Reviewed By: vedanuj
Differential Revision: D33262876
Pulled By: dianaml0
fbshipit-source-id: 03215c276fcddda9f7c78971bf6ed7c5ac21b2ee
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Add readme and task for xglm models.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2808
Reviewed By: punitkoura
Differential Revision: D33237928
Pulled By: xianxl
fbshipit-source-id: 7773cf56e896210dab1f4311ae69f0e00c6d9aff
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
fix `black` failures
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2816
Reviewed By: alexeib
Differential Revision: D33172615
Pulled By: dianaml0
fbshipit-source-id: 36b141f42941670f1bfa981041d878042feb0428
Summary: Adding integration test (based on test set scores on pre-trained checkpoints) for fastspeech2
Reviewed By: yuntang
Differential Revision: D33143301
fbshipit-source-id: dca0841b43dd1cb2933ce5c652ed3cdff0fc4a52
Summary:
Adding the first batch of speech integration tests (based on test set scores on pre-trained checkpoints) for
- S2T transformer
- TTS transformer
Reviewed By: yuntang
Differential Revision: D33050653
fbshipit-source-id: fb5bb9f46e8e17cb705971ca1990c8e1cb99d5f9
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Reverting to fix the issue mentioned [here](https://github.com/pytorch/fairseq/issues/3913). A follow-up PR will fix the original issue later.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2763
Reviewed By: myleott
Differential Revision: D33000411
Pulled By: jingfeidu
fbshipit-source-id: 95a54cbdc612129a0eab4b5e6aa576a5bcf00588
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
- [x] applies flake8 fixes to main branch (https://github.com/fairinternal/fairseq-py/issues/2546) - still more to be fixed
Fix GPU tests:
- [x] when torch.ao.quantization import doesn't work use torch.quantization
- [x] build apex from earlier commit in circleci so that its compatible with pytorch 1.8 and 1.9
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2570
Reviewed By: Mortimerp9
Differential Revision: D32955312
Pulled By: dianaml0
fbshipit-source-id: e163cbd4998f171f819e31b0682c1c0f1986f9e1
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2678
Reviewed By: Mortimerp9
Differential Revision: D32653381
Pulled By: dianaml0
fbshipit-source-id: 2810d14867cd7d64f4d340740e2b590b82de47fe
Summary:
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?
## What does this PR do?
SlowMo is being moved to [Fairscale](https://fairscale.readthedocs.io/en/latest/). This commit updates the implementation of SlowMo to the Fairscale version. It also adds tests for SlowMo.
Note: This PR is currently up for review and will be merged at a later date, once SlowMo has been moved to Fairscale. SlowMo is being merged to Fairscale as part of [a PR](https://github.com/facebookresearch/fairscale/pull/378); once that PR lands in Fairscale, this PR on fairseq will be ready for merge.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/3996
Reviewed By: dianaml0
Differential Revision: D32280163
Pulled By: vtantia
fbshipit-source-id: 70c97b04a7cdc90ada7099375c2a31b0c978ba70
Summary:
CPLTaskImpl provides an implementation to augment existing tasks to take an additional ema_model input in their train_step and valid_step for continuous pseudo-labeling (CPL) during training. It passes this ema_model to the criterion.
See Kaizen semi-supervised training paper for more details https://arxiv.org/abs/2106.07759.
This implementation also supports using CPLDataset which enables using unsupervised data only for `cpl_finetune_epoch > epochs >= cpl_start_epoch`. CPLDataset is like MultiCorpusDataset but ignores the unsupervised datasets while sampling.
Another addition in this diff is to skip dataset in MultiCorpusDataset if the sampling probability is 0.
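A minimal sketch of the augmented step (hypothetical signature mirroring the description above, not the exact CPLTaskImpl code): the task threads ema_model through to the criterion so the criterion can derive pseudo-labels from the teacher.
```
# Inside a task class; the criterion is assumed to accept an ema_model kwarg.
def train_step(self, sample, model, criterion, optimizer, update_num, ignore_grad=False, ema_model=None):
    model.train()
    loss, sample_size, logging_output = criterion(model, sample, ema_model=ema_model)
    if ignore_grad:
        loss = loss * 0
    optimizer.backward(loss)
    return loss, sample_size, logging_output
```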
Reviewed By: cruvadom
Differential Revision: D30701536
fbshipit-source-id: 1d840eacfd538ed7aed3baaefc8b254390642b45
Summary:
Adds Exponential moving average (EMA) model for Kaizen semi-supervised training https://arxiv.org/abs/2106.07759
1. Add `ema.store_ema` to enable storing EMA. EMA will be written to extra_state in the state dict while saving checkpoint.
2. `ema.ema_start_update` to control when the EMA starts accumulating
3. Tasks can use `uses_ema` property to decide if the EMA should be passed to the task. (Default is False)
4. `load_ema_from_checkpoint` can be used to load the EMA model in place of the regular model for evaluation. Pyspeech has an eval-ema option for this.
```
This module has the EMA class used to store a copy of the exponentially decayed
model params.
Typical usage of EMA class involves initializing an object using an existing
model (random or from a seed model) and setting the config like ema_decay,
ema_start_update which determine how the EMA model is updated. After every
update of the model i.e. at the end of the train_step, the EMA should be updated
by passing the new model to the EMA.step function. The EMA model state dict
can be stored in the extra state under the key of "ema" and dumped
into a checkpoint and loaded. The EMA object can be passed to tasks
by setting task.uses_ema property.
EMA is a smoothed/ensemble model which might have better performance
when used for inference or further fine-tuning. EMA class has a
reverse function to load the EMA params into a model and use it
like a regular model.
```
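A minimal, self-contained sketch of the usage pattern this docstring describes (simplified API; the actual fairseq EMA additionally handles fp32 param copies, config, and extra_state round-tripping):
```
import copy
import torch

class EMA:
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.model = copy.deepcopy(model).eval()
        for p in self.model.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def step(self, new_model):
        # Called at the end of every train_step with the updated model.
        for ema_p, p in zip(self.model.parameters(), new_model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

    @torch.no_grad()
    def reverse(self, model):
        # Load the smoothed params back into a regular model for inference.
        model.load_state_dict(self.model.state_dict())
        return model
```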
Reviewed By: cruvadom
Differential Revision: D24238379
fbshipit-source-id: 879d3ba5070a614b7d365f9503af357001e875b2