
11 Commits

Author SHA1 Message Date
Changhan Wang
d839d84f1e Miscellaneous S2T & S2 bug fixes
Summary: Miscellaneous S2T & S2 bug fixes

Reviewed By: yuntang

Differential Revision: D33469556

fbshipit-source-id: 430c2cad01dd7ea862a6c1564ad609887d66b788
2022-01-31 20:44:43 -08:00
Gerard I. Gállego
98ebe4f1ad Fix bugs in MuST-C preprocessing (#3887)
Summary:
# Before submitting

- [X] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [X] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/3882
Fixes https://github.com/pytorch/fairseq/issues/3884

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3887

Reviewed By: yuntang

Differential Revision: D33152073

Pulled By: kahne

fbshipit-source-id: 7f5c90a9876320e7c5c406ed032681452c7c5056
2021-12-21 18:18:16 -08:00
Juan Miguel Pino
72bb4447d7 Bug fix for speech translation data preparation (#3921)
Summary:
I believe this bug was introduced in d974c709bf.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3921

Reviewed By: kahne

Differential Revision: D31296530

Pulled By: jmp84

fbshipit-source-id: cd24728ef06575853579496a9062c3dbd5dd2e93
2021-10-01 19:21:17 -07:00
Changhan Wang
d974c709bf update S2T
Summary: [fairseq-py] update S2T

Reviewed By: wnhsu

Differential Revision: D30720434

fbshipit-source-id: dc4e46b0cc3dec24943baeabe59424dabd5be38f
2021-09-12 22:22:09 -07:00
Xutai Ma
dd74992d0d Several updates for simul speech transition example (#1703)
Summary:
Fix several issues in the simultaneous speech translation example, including:
- Load the pretrained encoder when loading the model.
- Fix the broken config.yaml generated when using gcmvn (global CMVN).
- Fix the preprocessed databin.
- Fix some errors in the instructions.
- Add detailed instructions on evaluating a pretrained model.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1703

Reviewed By: jmp84

Differential Revision: D27071600

Pulled By: xutaima

fbshipit-source-id: bfe72005190d7936caeef4f805bd99c8d2cf9c37
2021-03-15 23:45:49 -07:00
Changhan Wang
05255f9641 update audio_utils and fix mTEDx example
Summary:
update audio_utils and fix mTEDx example
- Updated `audio_utils`
  - Added support for OGG Vorbis (the only supported lossy compressed format)
  - Added a separate `convert_to_mono()` helper function
  - Updated `get_waveform()`
    - added new arguments `frames` and `start` for reading part of an audio file
    - added new argument `mono` for auto conversion to mono-channel audio
    - unified returned waveform shape to channels x length (same as torchaudio default)
- Updated mTEDx and MUST-C data prep scripts
  - Replaced `torchaudio.info()` with `soundfile.info()` (the latter is faster, and the former's interface is incompatible between versions <0.8 and the latest 0.8)
  - Replaced `torchaudio.load()` with `get_waveform()` for automatic conversion to mono-channel audio

Reviewed By: jmp84

Differential Revision: D26901114

fbshipit-source-id: fa9560c9714d51a91157d5141564574d4eee454d
2021-03-09 16:27:58 -08:00
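The `get_waveform()` changes in this commit (partial reads via `start`/`frames`, optional mono conversion, and a channels x length output) can be illustrated with a minimal NumPy sketch. This is not fairseq's implementation: `to_mono` and `read_segment` are hypothetical names, the input is assumed to be the (length, channels) array that `soundfile.read(..., always_2d=True)` returns, and averaging channels is an assumed down-mixing strategy.

```python
import numpy as np


def to_mono(waveform: np.ndarray) -> np.ndarray:
    """Hypothetical helper: down-mix (channels, length) to (1, length).

    Averaging the channels is an assumption, not necessarily what fairseq does.
    """
    if waveform.ndim != 2:
        raise ValueError("expected a 2-D (channels, length) array")
    return waveform.mean(axis=0, keepdims=True)


def read_segment(samples: np.ndarray, start: int = 0, frames: int = -1,
                 mono: bool = True) -> np.ndarray:
    """Hypothetical helper: slice decoded audio and return channels x length.

    `samples` has shape (length, channels); frames=-1 means "read to the end",
    mirroring the convention of soundfile.read().
    """
    end = samples.shape[0] if frames < 0 else start + frames
    waveform = samples[start:end].T  # transpose to channels x length
    return to_mono(waveform) if mono else waveform


# 4 frames of 2-channel audio
stereo = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.2, 0.4]])
seg = read_segment(stereo, start=1, frames=2)
print(seg.shape)  # (1, 2)
print(seg)        # [[0.5 0.5]]
```

Returning channels x length keeps the result consistent with torchaudio's default layout, as the commit message notes.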
Xutai Ma
12e21b9a6e Add global cmvn for mustc data preparation (#1660)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1660

Reviewed By: jmp84, kahne

Differential Revision: D26708521

Pulled By: xutaima

fbshipit-source-id: c53e9052298c559706ceffeb359dadfede2f1a09
2021-03-02 13:30:25 -08:00
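Global CMVN, as added to the MuST-C preparation in this commit, normalizes every utterance with one corpus-level mean and standard deviation per feature dimension rather than with per-utterance statistics. Here is a minimal NumPy sketch of the idea. The function names are hypothetical, and the variance floor is an assumed safeguard; neither is taken from the fairseq script.

```python
import numpy as np


def compute_global_cmvn(features):
    """Hypothetical helper: per-dimension mean/std over all frames of all utterances.

    `features` is a list of (num_frames, feat_dim) arrays, e.g. log Mel-filter banks.
    """
    stacked = np.concatenate(features, axis=0)
    mean = stacked.mean(axis=0)
    # Assumed variance floor so constant dimensions do not cause division by zero.
    std = np.sqrt(np.maximum(stacked.var(axis=0), 1e-8))
    return mean, std


def apply_cmvn(feature, mean, std):
    """Normalize one utterance with the corpus-level statistics."""
    return (feature - mean) / std


utts = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[5.0, 6.0]])]
mean, std = compute_global_cmvn(utts)
print(mean)  # [3. 4.]
```

The statistics are computed once at data-preparation time and then reused for every utterance.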
Changhan Wang
a9f5741f58 update S2T examples and small fixes for S2T
Summary:
- Update S2T examples: documentation (rendered version: https://github.com/fairinternal/fairseq-py/tree/v2_fairseq_s2t/examples/speech_to_text), bug fixes and pre-trained models
- Revert `--share-decoder-input-output-embed`'s default value to `False` (for s2t_transformer)

Reviewed By: yuntang

Differential Revision: D25821616

fbshipit-source-id: 3dba2eb5566bff39305d0056daf1b9f5adf1a926
2021-01-08 15:27:07 -08:00
Changhan Wang
ee450dde19 S2T multilingual example + bug fix
Summary:
* S2T multilingual example on MuST-C
* A bug fix for `speech_to_text_dataset` (for multilingual setting)

Reviewed By: jmp84

Differential Revision: D24339394

fbshipit-source-id: ef0c0be08137884897b532e45ebc56551d20be48
2020-10-21 08:10:47 -07:00
Myle Ott
a48f235636 Apply black+isort (#1357)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1357

Reviewed By: alexeib

Differential Revision: D24377772

fbshipit-source-id: 51581af041d42d62166b33a35a1a4228b1a76f0c
2020-10-18 18:14:51 -07:00
Changhan Wang
1d1c145387 speech-to-text OSS
Summary:
Imported from https://github.com/fairinternal/fairseq-py/pull/1284. Updated according to PR comments.

Main changes:
* New task: `fairseq.tasks.speech_to_text`
  * Multilingual support: multiple train sub-splits, temperature-based sampling, language ID tokens
* New dataset: `fairseq.data.audio.speech_to_text_dataset`
* Added accuracy metrics and BOS prefix removal to label smoothed cross entropy
* New models: Transformer (`fairseq.models.speech_to_text.s2t_transformer`) and BLSTM (`fairseq.models.speech_to_text.berard`)
* Extended scorers:
  * Added a base scorer class: `fairseq.scorers.BaseScorer` (the parent class for all scorers except the BLEU scorer in CPP)
  * Added an evaluation tokenizer: `fairseq.scorers.eval_tokenizer` which leverages sacreBLEU's built-in tokenizers and allows character-level tokenization as well as punctuation removal (for WER scoring).
  * Added chrF scorer: `fairseq.scorers.chrf`
* Online Mel-filter bank speech feature extraction (via CPP-based pyKaldi or Python-based TorchAudio): `fairseq.data.audio.audio_utils`
* Online speech feature transforms: `fairseq.data.audio.feature_transforms.*`
* Fixed the subsampled sequence lengths in VGGTransformer (`examples.speech_recognition.models.vggtransformer`)
* Examples under `examples/speech_to_text`:
  * LibriSpeech (ASR): better results than VGGTransformer with smaller Transformer-based models
  * MuST-C (ST): comparable to [SOTA results](https://arxiv.org/pdf/2004.10234.pdf) but with fewer tricks

Reviewed By: jmp84

Differential Revision: D24065273

fbshipit-source-id: 5f842ca9c826f92d4af660705611885fe440a9ab
2020-10-14 12:30:05 -07:00
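The temperature-based sampling mentioned for multilingual training sub-splits is commonly defined as p_i proportional to (n_i / N)^(1/T), where n_i is the size of split i and N is the total. Under that assumed formulation, T = 1 keeps the natural proportions and a larger T moves the probabilities toward uniform, upsampling smaller splits. This is a minimal sketch with a hypothetical function name, not fairseq's code.

```python
def temperature_sampling_probs(sizes, temperature):
    """Hypothetical helper: p_i proportional to (n_i / N) ** (1 / T).

    sizes: number of examples in each train sub-split.
    temperature: T > 0; T = 1 reproduces the natural data distribution.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(sizes)
    smoothed = [(n / total) ** (1.0 / temperature) for n in sizes]
    norm = sum(smoothed)
    return [s / norm for s in smoothed]


# Two sub-splits with a 9:1 size ratio.
print(temperature_sampling_probs([900, 100], 1.0))  # close to [0.9, 0.1]
print(temperature_sampling_probs([900, 100], 2.0))  # close to [0.75, 0.25]
```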