Commit Graph

1 Commit

Author: Sravya Popuri
SHA1: 40ff55abbe
Message: conformer (#2859)
Summary:
**This PR**

- Adds a Conformer layer based on https://arxiv.org/pdf/2005.08100.pdf.
- The Conformer implementation supports multi-head attention with 3 different positional embedding types: absolute positional embedding, relative positional encoding, and rotary positional embedding.
- Adds a Conformer encoder with Conv1d subsampling and positional embedding, followed by N Conformer layers.
- Adds an S2T_Conformer model based on the Conformer encoder and a Transformer decoder.
- Adds Conformer support in Wav2Vec2.
- Adds unit tests for the core modules.
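The three positional embedding types listed above differ in where position information enters the attention computation; rotary embedding is the most compact to illustrate. The sketch below is a minimal NumPy illustration of the idea, not the fairseq implementation — the function name and the `base` constant are assumptions chosen for the example:

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary positional embedding to x of shape (seq_len, dim).

    Each consecutive feature pair (x[2i], x[2i+1]) at position p is
    rotated by an angle p * theta_i, with theta_i = base^(-2i / dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies, highest for the first feature pair.
    freqs = base ** (-np.arange(half) * 2.0 / dim)            # (half,)
    # Rotation angle for every (position, pair) combination.
    angles = np.arange(seq_len)[:, None] * freqs[None, :]     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # standard 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

A useful property of this encoding is that the dot product between a rotated query and a rotated key depends only on their relative offset, which is why it composes naturally with attention.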

**Verification**

- Verified the setup on MuST-C En-De S2T, CoVoST 2 Es-En S2T, and LibriSpeech ASR to confirm the implementation is correct.
- For the S2T setups, performance is similar to or better than the Transformer-based models.
- Wav2Vec2 pretraining and fine-tuning on LibriSpeech showed improvements over the corresponding Transformer baselines.
- [WIP] Experiment log: https://docs.google.com/document/d/1QI-ROWVenUEXPJoHTaKD85Fq7T8ZXNc8bc54MzgwJjA/edit#

**Next steps**
- Add regression tests
- Add README and open source checkpoints

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2859

Reviewed By: kahne

Differential Revision: D33434092

Pulled By: sravyapopuri388

fbshipit-source-id: 62f22b917a332481370750e04a439e05832a2282
Date: 2022-01-10 16:18:38 -08:00