Commit Graph

2148 Commits

Each entry lists: Author, SHA1, Message, Date.
Gor Arakelyan
06c65c8297 Add Aim support for logging (#4311)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Enables logging of params and metrics with Aim. Aim is an open-source experiment tracker - https://github.com/aimhubio/aim

1. Added two arguments to CommonConfig:
- aim_repo: defines the Aim repository location; it can also be set to a remote URL (e.g. `aim://<ip>:<port>`)
- aim_run_hash: defines the run hash. If omitted, the run is created or continued based on the `save_dir` argument: if an existing run has the same `save_dir`, it is reopened/continued; otherwise a new run is created.

2. Implemented the AimProgressBarWrapper class to handle logging.
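
For illustration, a rough sketch of how the run resolution described above could look with the Aim SDK (the `find_run_hash_by_save_dir` helper is hypothetical, standing in for whatever lookup the actual implementation performs):

```
from aim import Run  # Aim SDK


def find_run_hash_by_save_dir(aim_repo, save_dir):
    # Hypothetical helper: a real implementation would query the Aim repo for a
    # run whose stored save_dir matches; returning None creates a fresh run.
    return None


def resolve_aim_run(aim_repo, aim_run_hash, save_dir):
    # Reuse an existing run for this save_dir when no explicit hash is given.
    run_hash = aim_run_hash or find_run_hash_by_save_dir(aim_repo, save_dir)
    run = Run(run_hash=run_hash, repo=aim_repo)  # repo may be local or aim://<ip>:<port>
    run["save_dir"] = save_dir                   # remember save_dir so the run can be found later
    return run


# Logging params and metrics on the resolved run, e.g.:
# run["hparams"] = {"lr": 5e-4, "max_tokens": 4096}
# run.track(2.31, name="loss", step=100, context={"subset": "train"})
```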

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4311

Reviewed By: ArmenAg

Differential Revision: D35177412

Pulled By: dianaml0

fbshipit-source-id: 287afe3a77e1048e497a4e1bdc42efd46ec9c2fe
2022-03-29 10:38:10 -07:00
Diana Liskovich
7d72f28db5 formatting fix (#4313)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4313

Reviewed By: shruti-bh

Differential Revision: D35200613

Pulled By: dianaml0

fbshipit-source-id: c011f89f4a7ee9404bec61728b52fcea8640d292
2022-03-29 07:20:46 -07:00
Angela Fan
54ea689ac5 adding readme (#4314)
Summary:
Adding a README for the generating-biographies paper.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4314

Reviewed By: edunov

Differential Revision: D35205567

Pulled By: huihuifan

fbshipit-source-id: 7698672dcffbdb8a10bfea4f72920e1f508a4104
2022-03-29 07:06:16 -07:00
Diana Liskovich
fef5006caa formatting fix (#4310)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fix issue with `black` causing build error.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4310

Reviewed By: shruti-bh

Differential Revision: D35151101

Pulled By: dianaml0

fbshipit-source-id: 63d80b848fdd3c004d784add3bf74e4c5281e952
2022-03-28 15:13:02 -07:00
Ann Lee
c8a8e2c392 release pre-trained models (#3245)
Summary:
Releasing the pre-trained mHuBERT, vocoder, and speech normalizer for the paper "Textless Speech-to-Speech Translation on Real Data"

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

X-link: https://github.com/fairinternal/fairseq-py/pull/3245

Reviewed By: sravyapopuri388

Differential Revision: D35135891

Pulled By: an918tw

fbshipit-source-id: 96e0a6354dc61d5cbfce9943893bebadfb21b642
2022-03-28 13:26:07 -07:00
Changhan Wang
52658402c5 add CTC auxiliary loss to S2T Transformer
Summary: Add CTC auxiliary loss to S2T Transformer

Reviewed By: sravyapopuri388

Differential Revision: D33305481

fbshipit-source-id: d866a924e39beb03a2f8a59f7051b6c81980ad35
2022-03-25 18:01:44 -07:00
Vineel Pratap
e9b89525b5 Fix an indentation issue for decoder sweep config
Summary:
As per title

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: arbabu123

Differential Revision: D35151134

fbshipit-source-id: bb97ae583542c8e7983b9d9042d8a3084b8fbef5
2022-03-25 16:18:47 -07:00
Sravya Popuri
7b9118bd93 Open source code for "Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation" (#3233)
Summary:
OSS "Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation" paper code
- Update xm_transformer to add two new arguments: encoder_proj (which ensures the encoder embedding dim and decoder embedding dim match) and max_positions (related to the embedding size of the conformer).
- Add documentation and pretrained models related to the paper

X-link: https://github.com/fairinternal/fairseq-py/pull/3233

Reviewed By: pipibjc

Differential Revision: D35119604

Pulled By: sravyapopuri388

fbshipit-source-id: bbe517c4803c5808f8cce0e5d16cf5ffa96f425c
2022-03-25 11:52:07 -07:00
Wei Ho
f71c03fba8 Don't fsdp_wrap transformer encoder & decoder
Summary:
Per anj-s's suggestion - this seems to fix the
```
      assert len(self.flat_params) == 1, "Incorrect access to flat_param"
  AssertionError: Incorrect access to flat_param
```
error when training transformer models with a large number of params

~~(not sure why the number of params affects fairscale FSDP wrapping???)~~ Did this maybe only manifest when the encoder/decoder individually had > 1e8 params due to the default of `min_params_to_wrap`?

Looking at D26771144 (656d7e5779) & https://github.com/fairinternal/fairseq-py/pull/1667 where this code was added, it's unclear why wrapping was specifically necessary when share_all_embeddings=False. Is it OK to just delete this code?

(And did the gshard model avoid this issue because it used share_all_embeddings=True?)
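
For reference, a generic illustration of the `min_params_to_wrap` behaviour discussed above (not the fairseq/fairscale code): wrapping only kicks in once a module's parameter count reaches the threshold.

```
import torch.nn as nn


def maybe_fsdp_wrap(module: nn.Module, wrap_fn, min_params_to_wrap: int = int(1e8)):
    # Illustrative helper only: wrap a module with FSDP only if its parameter
    # count reaches the threshold, which is why very large encoders/decoders
    # can behave differently from smaller ones.
    num_params = sum(p.numel() for p in module.parameters())
    if num_params >= min_params_to_wrap:
        return wrap_fn(module)
    return module
```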

Reviewed By: huihuifan

Differential Revision: D35084649

fbshipit-source-id: ad5b394c9920e3bea2767a0771f6de36aecb3687
2022-03-24 11:33:07 -07:00
Hongyu Gong
b554f5ec90 replace "prepend-tgt-lang-tag" with "prepend-tgt-lang-tag-as-bos" to avoid confusion
Summary: Replace "prepend-tgt-lang-tag" with "prepend-tgt-lang-tag-as-bos" in s2s data loading and s2s task.

Reviewed By: yuntang

Differential Revision: D34912239

fbshipit-source-id: 654d0eafafc275be6c2470b08a323f57a4f9b9cb
2022-03-15 22:55:21 -07:00
Hongyu Gong
f9d07a9209 support tgt_lang_tag in speech-to-speech (#3187)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Support tgt-lang-tag in speech-to-speech task.
1. If we set prepend_tgt_lang_tag: true, a dictionary with units and lang tags is loaded from vocab_filename; otherwise, a dictionary with units only is created in setup_task.
2. prepend_tgt_lang_tag adds the target language token to the beginning of prev_output_tokens during data loading (see the sketch below).
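
A toy illustration of point 2 (indices and tag name are made up; this is not the actual collater code):

```
import torch

# Toy setup: the target-language tag has its own index in the dictionary.
lang_tag_idx = 10                                   # e.g. index of "<lang:es>"
prev_output_tokens = torch.tensor([[2, 5, 6, 7],    # batch of target unit sequences
                                   [2, 8, 9, 1]])
lang_tags = torch.full((prev_output_tokens.size(0), 1), lang_tag_idx, dtype=torch.long)
# Prepend the tag so the decoder sees the target language first.
prev_output_tokens = torch.cat([lang_tags, prev_output_tokens], dim=1)
```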

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

X-link: https://github.com/fairinternal/fairseq-py/pull/3187

Reviewed By: yuntang

Differential Revision: D34768755

Pulled By: hygong-fb

fbshipit-source-id: fa395c3319907221f95333283689671b194f3ccc
2022-03-14 09:06:49 -07:00
Sravya Popuri
d03f4e7714 Minor fixes (#3198)
Summary:
- Fix error introduced in e55e094b96 in the case where net_input doesn't have a prev_output_tokens key
- Fix typo in covost README.

X-link: https://github.com/fairinternal/fairseq-py/pull/3198

Reviewed By: cndn, kahne

Differential Revision: D34810092

Pulled By: sravyapopuri388

fbshipit-source-id: 9be6e6f06586cd2a2d44415ebf7c3596a5334b81
2022-03-11 09:23:12 -08:00
Wei Ho
0f078de343 Re-land D34058196 [sacrebleu==2.0.0] buckification [Back out D34503161]
Reviewed By: shreyanb98

Differential Revision: D34541824

fbshipit-source-id: 1dffc28bca971310920e1b1fdfe4016cc1aa1ceb
2022-03-07 17:04:45 -08:00
Dmitry Vinnik
592c1227f4 docs: add social button in support of Ukraine (#4249)
Summary:
Our mission at Meta Open Source is to empower communities through open source, and we believe that it means building a welcoming and safe environment for all. As a part of this work, we are adding this banner in support for Ukraine during this crisis.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4249

Reviewed By: arbabu123

Differential Revision: D34635479

Pulled By: dmitryvinn-fb

fbshipit-source-id: 488d30f0967ae9542ead968c5cb951ecf0e02a64
2022-03-04 16:28:09 -08:00
Wei-Ning Hsu
2e0b961a0e fix import_user_module (#3144)
Summary:
## What does this PR do?
Avoid throwing a ValueError when attempting to load a user-defined module from common.user_dir that has the same module name and module path as an already-loaded module. This occurs when a job is preempted and restarts using submitit_slurm.
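
A hedged sketch of the kind of guard this implies (illustrative only, not the actual fix):

```
import importlib
import os
import sys


def import_user_module_once(module_name: str, module_dir: str):
    # Sketch only: if a module with this name is already loaded from the same
    # directory (e.g. after a preempted job restarts), reuse it instead of
    # raising; otherwise import it from module_dir.
    existing = sys.modules.get(module_name)
    if existing is not None:
        existing_dir = os.path.dirname(getattr(existing, "__file__", "") or "")
        if os.path.abspath(existing_dir) == os.path.abspath(module_dir):
            return existing
    sys.path.insert(0, module_dir)
    return importlib.import_module(module_name)
```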

X-link: https://github.com/fairinternal/fairseq-py/pull/3144

Reviewed By: Abdel-rahmanMohamed

Differential Revision: D34521450

Pulled By: wnhsu

fbshipit-source-id: eed00d4238a66dc524eee400a55ad2c011e1543c
2022-03-02 23:58:52 -08:00
Ann Lee
1479d311d5 update s2st vocoder training instructions (#3156)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Release training instructions for the unit-based HiFi-GAN vocoder with duration prediction

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

X-link: https://github.com/fairinternal/fairseq-py/pull/3156

Reviewed By: sravyapopuri388

Differential Revision: D34582951

Pulled By: an918tw

fbshipit-source-id: 2e575fb15aa8cd5444272c3c31426ac64da84e97
2022-03-02 12:19:20 -08:00
Igor Shalyminov
e55e094b96 AddTargetDataset now first adds EOS then pads target sequences (#4243)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)

https://groups.google.com/g/fairseq-users/c/YoSm5J2To1A

- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/4242
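
For context, a minimal illustration of the ordering fix (toy indices, not the AddTargetDataset code): EOS is appended to each target first, then padding is applied, so EOS never lands after padding.

```
import torch
from torch.nn.utils.rnn import pad_sequence

eos, pad = 2, 1
targets = [torch.tensor([5, 6, 7]), torch.tensor([8, 9])]
targets = [torch.cat([t, torch.tensor([eos])]) for t in targets]     # 1) add EOS
batch = pad_sequence(targets, batch_first=True, padding_value=pad)   # 2) then pad
# batch == [[5, 6, 7, 2],
#           [8, 9, 2, 1]]
```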

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4243

Reviewed By: arbabu123

Differential Revision: D34538164

Pulled By: alexeib

fbshipit-source-id: cf2fdaa7663bee34571fb3d3bd9bdaf79d756206
2022-02-28 20:43:50 -08:00
Hetarth Chopra
a24fdf2d1b Fixing an error related to Floor Division (#4221)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/4058
While using the library, the following warnings are shown, which sometimes hinder the workflow. The warnings are:

`<USER_PATH>/fairseq/search.py:140: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  beams_buf = indices_buf // vocab_size`

`<USER_PATH>/fairseq/sequence_generator.py:666: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  unfin_idx = bbsz_idx // beam_size`

The methodology was simple: instead of using `//`, it was replaced by `torch.div(arg1, arg2, rounding_mode='trunc')`. The variable values do not change before and after; only the warning is resolved.
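
A minimal before/after example of the substitution (tensor values are arbitrary):

```
import torch

indices_buf = torch.tensor([5, 7, 12])
vocab_size = 4
# Before (the form that produced the warnings quoted above):
#   beams_buf = indices_buf // vocab_size
# After:
beams_buf = torch.div(indices_buf, vocab_size, rounding_mode="trunc")
# tensor([1, 1, 3]) in both cases; only the warning goes away.
```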

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Yes, I did! Thanks!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4221

Reviewed By: arbabu123

Differential Revision: D34538147

Pulled By: alexeib

fbshipit-source-id: 143897a249129a163b6a30ba9b5cf5595ef42330
2022-02-28 20:21:53 -08:00
Sravya Popuri
d421749323 Add s2s_conformer model to support conformer encoder in S2UT model (#3113)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

X-link: https://github.com/fairinternal/fairseq-py/pull/3113

Reviewed By: an918tw, kahne

Differential Revision: D34365606

Pulled By: sravyapopuri388

fbshipit-source-id: aa4f0ab24ca191101b9eca0f5e08dcbedf9fadbb
2022-02-28 09:49:11 -08:00
Igor Shalyminov
41847528fb Best metric is now only logged for the first of all the validation subsets (#4180)
Summary:
Best metric is now only logged for the first of all the validation subsets

# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  https://groups.google.com/g/fairseq-users/c/7nk3rJmvlg8
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/4162

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4180

Reviewed By: michaelauli

Differential Revision: D34365416

Pulled By: alexeib

fbshipit-source-id: 872f77da2cbf064ed838ebc7959365b0b33fe723
2022-02-25 14:29:43 -08:00
spopuri
5175fd5c26 update readme for conformer based models (#3104)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

X-link: https://github.com/fairinternal/fairseq-py/pull/3104

Reviewed By: kahne

Differential Revision: D34323889

Pulled By: sravyapopuri388

fbshipit-source-id: da7216bc5918fd0e57e10395044088a555af2e07
2022-02-23 15:49:12 -08:00
eugene-kharitonov
0c0ef06780 Prosody-aware Generative Spoken Language Modelling (#3063)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3063

Reviewed By: eugene-kharitonov

Differential Revision: D34323605

Pulled By: wnhsu

fbshipit-source-id: 9dc779a6c399cda710863596e0880b9277ff2919
2022-02-23 00:30:22 -08:00
spopuri
420136acd2 fix failing convtransformer test (#3107)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3107

Reviewed By: cndn

Differential Revision: D34354339

Pulled By: sravyapopuri388

fbshipit-source-id: 50888706123d246c13d2cbb22d0e043740ff6bf5
2022-02-22 11:24:11 -08:00
Sravya Popuri
5b87224417 Open source conformer models and update documentation
Summary: TSIA

Reviewed By: kahne

Differential Revision: D34115270

fbshipit-source-id: aa5a226dae4539afc0aed9b7d43ba1fa2e40ae70
2022-02-17 10:34:41 -08:00
Sravya Popuri
67eaecd2fc Add regression test for SimulConvTransformerModel (#3031)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3031

Reviewed By: kahne

Differential Revision: D34018108

Pulled By: sravyapopuri388

fbshipit-source-id: 4db96653658a998b15c0cdbc2e588198d951a420
2022-02-16 09:32:21 -08:00
Victoria Lin
cfc4d8475c add missing transformer arch and update PadDataset (#4212)
Summary:
Fix issues https://github.com/pytorch/fairseq/issues/4209 and #4210

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4212

Reviewed By: sshleifer

Differential Revision: D34208212

Pulled By: todpole3

fbshipit-source-id: 64a4777b8721b692ad339df0fc0495d823d58c07
2022-02-16 05:26:11 -08:00
dianaml0
5f2515e676 Fix failing test (#3065)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3065

Reviewed By: Mortimerp9

Differential Revision: D34144674

Pulled By: dianaml0

fbshipit-source-id: 842b0d29c9c85d4b56b640f2823fcb4e3f912f98
2022-02-10 12:17:47 -08:00
Alban Desmaison
5551a1995b Change ParameterList and ParameterDict to be able to contain any kind of objects (#70499)
Summary:
The only difference with plain list/dict now is that nn.Parameters are
handled specially and registered as parameters properly.

test_nn and parametrization works locally.
Will see in CI if DP is fixed as well.

Tentative fix for https://github.com/pytorch/pytorch/issues/36035
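
A small example of the resulting behaviour (assumes a PyTorch build that includes this change):

```
import torch
import torch.nn as nn


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter entries are registered as module parameters; per this
        # change, non-Parameter objects may also be stored in the container,
        # but only Parameters show up in named_parameters().
        self.weights = nn.ParameterList([nn.Parameter(torch.randn(3)) for _ in range(2)])


m = Toy()
print([name for name, _ in m.named_parameters()])  # ['weights.0', 'weights.1']
```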

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70499

Reviewed By: jbschlosser, alexeib

Differential Revision: D34005332

Pulled By: albanD

fbshipit-source-id: 7e76b0873d0fec345cb537e2a6ecba0258e662b9
2022-02-09 10:47:56 -08:00
Sravya Popuri
8b02f00e8a fix s2s test - disable multitasking by setting multitask_config_yaml to None (#3059)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3059

Reviewed By: kahne

Differential Revision: D34083178

Pulled By: sravyapopuri388

fbshipit-source-id: a33af1696570be4826973b19fe34177bcf851e06
2022-02-09 10:05:22 -08:00
alexeib
327cff24a5 Create a separate EMA implementation for in-model tracking (#3036)
Summary:
ema.py, initially used by data2vec, was actually created for trainer-level EMA tracking.

Since data2vec creates and uses EMA tracking within the model, we split EMA into a separate module-level implementation.
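
For illustration, a generic sketch of module-level EMA tracking along these lines (not the fairseq implementation):

```
import copy

import torch


class ModuleEMA:
    """Keep an exponentially-moving-average copy of a module's parameters."""

    def __init__(self, module: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(module).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def step(self, module: torch.nn.Module):
        # shadow = decay * shadow + (1 - decay) * current
        for ema_p, p in zip(self.shadow.parameters(), module.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```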

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3036

Reviewed By: wnhsu

Differential Revision: D34034479

Pulled By: alexeib

fbshipit-source-id: f8c65552d446f1104c36380f5d1ff22a75e6e405
2022-02-07 15:38:52 -08:00
Sravya Popuri
11b2830d29 Refactor speech tests and add missing regression tests (#3001)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3001

Reviewed By: kahne

Differential Revision: D33904550

Pulled By: sravyapopuri388

fbshipit-source-id: f55f8121d83e5abebdfcf7ac90dcba39f65cafaf
2022-02-04 14:35:02 -08:00
Vimal Manohar
6b7a7d6457 Fix EMA GPU test
Summary: The GPU test was broken after D33809223 (1b61bbad32)

Reviewed By: cruvadom

Differential Revision: D33931570

fbshipit-source-id: 37962a437d8e25b1dafc58db0efa55c1afa5f3ee
2022-02-04 09:10:06 -08:00
Changhan Wang
53cc55c9c8 add BERTScore scorer
Summary: add BERTScore scorer

Reviewed By: yuntang

Differential Revision: D33881724

fbshipit-source-id: 89f7f7b71a9def28cd8b0366f540e445e74efabb
2022-02-03 14:39:09 -08:00
Wei-Ning Hsu
272c4c5197 Fix hubert (#3019)
Summary:
## PR review
1. Update HuBERT to work with the TransformerEncoder in wav2vec2.py
2. Remove dictionary loading issue when loading fine-tuned HuBERT checkpoints to make the checkpoints self-contained
3. Add unit-test for HuBERT fine-tuned checkpoints
4. Avoid divide-by-zero error in infer.py when inference time is zero (e.g., when inferring just one utterance)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3019

Reviewed By: andrewyeh

Differential Revision: D33970620

Pulled By: wnhsu

fbshipit-source-id: c523dd6ddb0f6a496be8b0b4b56f0c32c1d3dbc5
2022-02-03 10:17:10 -08:00
Pierre Andrews
f591cc94ca upgrade black for lints (#3004)
Summary:
This is the same as https://github.com/fairinternal/fairseq-py/issues/3003 but for main instead of gshard.

The lint test will run the latest version of black, which is 22.1.0 right now and seems to be incompatible with the 21.12b0 version that is set up in pre-commit. This means that some files that had valid formatting in the past no longer do...

This PR formats these files with 22.1.0 and auto-updates the pre-commit config to use that black version too.

(Note: this is the second time this has happened. A solution would be to pin the lint test to the same black version as the pre-commit hook, the one that was used to format everything cleanly, so that we have stable formatting.)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3004

Reviewed By: dianaml0

Differential Revision: D33917490

Pulled By: Mortimerp9

fbshipit-source-id: d55e800b976f94545cdab4132daa7c45cbd0e34c
2022-02-02 04:31:33 -08:00
Wei-Ning Hsu
5d2be954bb add defaults again after importing user_module (#3007)
Summary:
## What does this PR do?
Default values for the configs imported from `user_dir` were not added properly.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3007

Reviewed By: alexeib

Differential Revision: D33926315

Pulled By: wnhsu

fbshipit-source-id: 914eecec769964686342d66c96d6ba76f12e1277
2022-02-01 21:42:15 -08:00
Victoria X Lin
6b770134a2 Add citation details and other wording fixes to model card (#4172)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4172

Reviewed By: punitkoura

Differential Revision: D33911169

Pulled By: todpole3

fbshipit-source-id: d3e111ab4b9a646e1799ad9335c70ec1ee8d25a4
2022-02-01 10:57:35 -08:00
Victoria X Lin
790f3be15a Add XGLM pre-training data format explanation (#4158)
Summary:
1. Add XGLM pre-training data format explanation
2. Add back pointer to pre-print

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4158

Reviewed By: xianxl

Differential Revision: D33825440

Pulled By: todpole3

fbshipit-source-id: 379aa55d55ef3c9016987d1f05de023b7a7aee04
2022-02-01 10:30:07 -08:00
Changhan Wang
d839d84f1e Miscellaneous S2T & S2 bug fixes
Summary: Miscellaneous S2T & S2 bug fixes

Reviewed By: yuntang

Differential Revision: D33469556

fbshipit-source-id: 430c2cad01dd7ea862a6c1564ad609887d66b788
2022-01-31 20:44:43 -08:00
Vimal Manohar
1b61bbad32 Fix broken EMA in fairseq
Summary: EMA broken since D33649708 (995c204337) due to an indentation error.

Reviewed By: cruvadom

Differential Revision: D33809223

fbshipit-source-id: c6c4d0d327443bfea787817040e1832eef0f50e4
2022-01-27 13:02:58 -08:00
Wei-Ning Hsu
4a7835b794 Hubert unit test (#2766)
Summary:
## What does this PR do?
- Add unit test for HuBERT
- Update model args to comply with the wav2vec2 TransformerEncoder

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2766

Reviewed By: Abdel-rahmanMohamed

Differential Revision: D32965218

Pulled By: wnhsu

fbshipit-source-id: 036a1644179c35b875c9ba30d75b4ef039fb328f
2022-01-24 16:24:50 -08:00
Victoria X Lin
509e83e432 Add XGLM downstream task evaluation examples (#4154)
Summary:
1. Add XGLM downstream task evaluation examples
2. Add bibtex citation of XGLM arXiv paper

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4154

Reviewed By: xianxl

Differential Revision: D33748846

Pulled By: todpole3

fbshipit-source-id: ce4dfce2fccf92742f124f12a0d9a388280320fa
2022-01-24 16:24:47 -08:00
Tony Bruguier
5fd38e3d5b Fix breakage from D33649708
Summary: https://www.internalfb.com/diff/D33649708 (995c204337)?src_version_fbid=1030479880843010&dst_version_fbid=247617347518523&transaction_fbid=1601081576900014

Reviewed By: alexeib

Differential Revision: D33696937

fbshipit-source-id: 9a17610e3f4eb3dd2b2131a3f9fb42732a31b47f
2022-01-21 09:27:49 -08:00
alexeib
fc758bbf79 fix readme (#2939)
Summary:
minor fix

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2939

Reviewed By: michaelauli

Differential Revision: D33685330

Pulled By: alexeib

fbshipit-source-id: 4d6c6edb1fab9d0d56a6e03c0a2b43a864f1d07a
2022-01-20 08:48:41 -08:00
alexeib
c71870f370 Data2vec (#2936)
Summary:
new data2vec models

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2936

Reviewed By: jacobkahn

Differential Revision: D33674643

Pulled By: alexeib

fbshipit-source-id: 2c2b4fae541974587b50a78a44d34033e9b5192d
2022-01-20 08:28:47 -08:00
alexeib
995c204337 Data2vec prelim (#2929)
Summary:
Preliminaries for data2vec release, include some minor improvements and bug fixes

The most important change is that we now default to raising an exception when fields in the config do not have a corresponding field in the model dataclass.
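
A small illustration of the stricter default (toy dataclass and field names, not the fairseq config code):

```
from dataclasses import dataclass, fields


@dataclass
class ToyModelConfig:
    encoder_layers: int = 12
    dropout: float = 0.1


def strict_config(cfg: dict, dc=ToyModelConfig):
    # Unknown config fields now raise instead of being silently dropped.
    unknown = set(cfg) - {f.name for f in fields(dc)}
    if unknown:
        raise ValueError(f"config fields with no matching dataclass field: {sorted(unknown)}")
    return dc(**cfg)


strict_config({"encoder_layers": 6})    # OK
# strict_config({"encoder_layer": 6})   # raises ValueError
```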

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2929

Reviewed By: wnhsu

Differential Revision: D33649708

Pulled By: alexeib

fbshipit-source-id: 629bdb4c361550740b451c570c2005bb956c6fcb
2022-01-20 00:02:16 -08:00
Hongyu Gong
a59cea5944 attn head selection
Summary:
Add scripts for multihead attention selection in multilingual and multi-domain training from the following paper:
"Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling", NeurIPS 2021.

Reviewed By: yuntang

Differential Revision: D31802221

fbshipit-source-id: 8c69b89bda29e6857bd3af02979c07e1b5cf49f1
2022-01-18 21:15:34 -08:00
Vimal Manohar
a075481d0d Decode using EMA model in IPL recipe
Summary: Add an option to use the EMA model for decoding in the transducer IPL recipe by passing --ipl-decode-ema. Note that EMA should be enabled as in diff D24238379 (8feccf9441) using the options --store-ema, --ema-start-update, and --ema-decay.

Reviewed By: cruvadom

Differential Revision: D31983366

fbshipit-source-id: 2bf63b3f7d1b5fa8804b3a7e9bfab71a463ca957
2022-01-18 19:29:52 -08:00
Hongyu Gong
40eb7310be Code cleanup
Summary:
Add scripts for multihead attention selection in multilingual and multi-domain training from the following paper:
"Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling", NeurIPS 2021.

Reviewed By: yuntang

Differential Revision: D31781212

fbshipit-source-id: 8e1a596826f682f80730c251ec31c68df0de6516
2022-01-18 16:50:41 -08:00
Liang Tan
1575f30dd0 Add ffn prune to fairseq
Summary:
Support FFN pruning for fairseq. For example, a user can apply pruning on top of the RoBERTa base model by specifying the argument "--ffn-blocks-to-remove 1024". The user also needs to provide a ckpt which is already pruned so that the pruned ckpt can be loaded correctly.
The pruning idea can be summarized as:
1. Fine-tune the model (e.g. the RoBERTa encoder) on a certain dataset with regularization.
2. After the model is trained, use the _get_fc_rank and _prune_fc_layer functions to get the top X blocks with the most importance in each transformer layer, then use that rank to prune a new RoBERTa encoder and save the pruned ckpt manually.
3. Fine-tune the new RoBERTa encoder via the ckpt saved above (see the sketch below).
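
A rough sketch of what block ranking and pruning along these lines could look like (the function names echo the ones mentioned above, but the scoring heuristic and shapes are illustrative, not the fairseq implementation):

```
import torch
import torch.nn as nn


def get_fc_rank(fc1: nn.Linear, num_keep: int) -> torch.Tensor:
    # Rank FFN blocks (rows of fc1 / columns of fc2) by an L1 importance score
    # and return the indices of the top num_keep blocks, in ascending order.
    scores = fc1.weight.abs().sum(dim=1)
    return torch.topk(scores, num_keep).indices.sort().values


def prune_fc_layer(fc1: nn.Linear, fc2: nn.Linear, keep_idx: torch.Tensor):
    # Build smaller linears that keep only the selected FFN blocks.
    new_fc1 = nn.Linear(fc1.in_features, len(keep_idx), bias=fc1.bias is not None)
    new_fc2 = nn.Linear(len(keep_idx), fc2.out_features, bias=fc2.bias is not None)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep_idx])
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep_idx])
        new_fc2.weight.copy_(fc2.weight[:, keep_idx])
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2
```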

Reviewed By: dianaml0

Differential Revision: D33525055

fbshipit-source-id: 5087140ee891d6ec9266726e3a477947c233412c
2022-01-14 16:26:59 -08:00