Summary:
# Before submitting
- [X] Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
- [X] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/3882
Fixes https://github.com/pytorch/fairseq/issues/3884
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/3887
Reviewed By: yuntang
Differential Revision: D33152073
Pulled By: kahne
fbshipit-source-id: 7f5c90a9876320e7c5c406ed032681452c7c5056
Summary:
Fix several issues in the simultaneous speech translation example, including:
- Load the pretrained encoder when loading the model.
- Fix the broken config.yaml generated when using gcvm.
- Fix the preprocessed databin.
- Fix some errors in the instructions.
- Add detailed instructions on evaluating a pretrained model.
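The first fix above (loading only the pretrained encoder into a full model) can be sketched as follows. This is a hypothetical illustration, not fairseq's actual loader: the checkpoint layout (a `"model"` key holding a flat state dict with `encoder.`-prefixed parameters) and the helper name are assumptions.

```python
import torch

def load_pretrained_encoder(model: torch.nn.Module, checkpoint_path: str) -> None:
    """Copy only encoder weights from a checkpoint into `model.encoder`."""
    state = torch.load(checkpoint_path, map_location="cpu")["model"]
    # Keep only encoder parameters and strip the "encoder." prefix so the
    # keys match the standalone encoder module's state dict.
    encoder_state = {
        k[len("encoder."):]: v
        for k, v in state.items()
        if k.startswith("encoder.")
    }
    model.encoder.load_state_dict(encoder_state)
```

The decoder and any other submodules keep their freshly initialized weights; only the encoder is overwritten.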
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1703
Reviewed By: jmp84
Differential Revision: D27071600
Pulled By: xutaima
fbshipit-source-id: bfe72005190d7936caeef4f805bd99c8d2cf9c37
Summary:
update audio_utils and fix mTEDx example
- Updated `audio_utils`
- Added support for OGG Vorbis (the only supported lossy compressed format)
- Added a separate `convert_to_mono()` helper function
- Updated `get_waveform()`
    - added new arguments `frames` and `start` for reading part of an audio file
- added new argument `mono` for auto conversion to mono-channel audio
- unified returned waveform shape to channels x length (same as torchaudio default)
- Updated mTEDx and MUST-C data prep scripts
  - Replaced `torchaudio.info()` with `soundfile.info()` (the latter is faster, and the former has an incompatible interface between versions <0.8 and the latest 0.8)
- Replaced `torchaudio.load()` with `get_waveform` for auto conversion to mono channel
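The `get_waveform()` behavior described above can be sketched roughly as below. This is a simplified illustration of the interface, not the actual fairseq implementation; the exact signature and defaults are assumptions, and only the parts named in the list (partial reads via `frames`/`start`, `mono` conversion, channels x length output) are modeled.

```python
import numpy as np

def convert_to_mono(waveform: np.ndarray) -> np.ndarray:
    """Average the channel dimension of a (channels x length) waveform."""
    if waveform.shape[0] > 1:
        return waveform.mean(axis=0, keepdims=True)
    return waveform

def get_waveform(path: str, start: int = 0, frames: int = -1, mono: bool = True):
    """Read (part of) an audio file; return (channels x length array, sample rate)."""
    import soundfile as sf  # imported lazily; only needed when reading files

    data, sample_rate = sf.read(
        path, start=start, frames=frames, dtype="float32", always_2d=True
    )
    # soundfile returns (length x channels); transpose to channels x length,
    # matching the torchaudio default mentioned above.
    waveform = data.T
    if mono:
        waveform = convert_to_mono(waveform)
    return waveform, sample_rate
```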
Reviewed By: jmp84
Differential Revision: D26901114
fbshipit-source-id: fa9560c9714d51a91157d5141564574d4eee454d
Summary:
# Before submitting
- [ ] Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1660
Reviewed By: jmp84, kahne
Differential Revision: D26708521
Pulled By: xutaima
fbshipit-source-id: c53e9052298c559706ceffeb359dadfede2f1a09
Summary:
Imported from https://github.com/fairinternal/fairseq-py/pull/1284. Updated according to PR comments.
Main changes:
* New task: `fairseq.tasks.speech_to_text`
* Multilingual support: multiple train sub-splits, temperature-based sampling, language ID tokens
* New dataset: `fairseq.data.audio.speech_to_text_dataset`
* Added accuracy metrics and BOS prefix removal to label smoothed cross entropy
* New models: Transformer (`fairseq.models.speech_to_text.s2t_transformer`) and BLSTM (`fairseq.models.speech_to_text.berard`)
* Extended scorers:
* Added a base scorer class: `fairseq.scorers.BaseScorer` (the parent class for all scorers except the BLEU scorer in CPP)
* Added an evaluation tokenizer: `fairseq.scorers.eval_tokenizer` which leverages sacreBLEU's built-in tokenizers and allows character-level tokenization as well as punctuation removal (for WER scoring).
* Added chrF scorer: `fairseq.scorers.chrf`
* Online Mel-filter bank speech feature extraction (via CPP-based pyKaldi or Python-based TorchAudio): `fairseq.data.audio.audio_utils`
* Online speech feature transforms: `fairseq.data.audio.feature_transforms.*`
* Fixed the subsampled sequence lengths in VGGTransformer (`examples.speech_recognition.models.vggtransformer`)
* Examples under `examples/speech_to_text`:
* LibriSpeech (ASR): better results than VGGTransformer with smaller Transformer-based models
  * MuST-C (ST): comparable to [SOTA results](https://arxiv.org/pdf/2004.10234.pdf) but with fewer tricks
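The temperature-based sampling mentioned above (for balancing multilingual train sub-splits) can be sketched as follows. This is a generic illustration of the technique, not fairseq's exact code; the function name and example sizes are made up.

```python
def temperature_sampling_probs(sizes: dict, temperature: float = 1.5) -> dict:
    """Sampling probability per sub-split: p_i ∝ (n_i / N) ** (1 / T).

    T = 1 keeps the natural data distribution; larger T flattens it,
    upsampling low-resource languages relative to high-resource ones.
    """
    total = sum(sizes.values())
    weights = {k: (n / total) ** (1.0 / temperature) for k, n in sizes.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}
```

With illustrative sizes like `{"en-de": 100000, "en-vi": 1000}`, raising the temperature increases the share of the low-resource pair at the expense of the high-resource one.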
Reviewed By: jmp84
Differential Revision: D24065273
fbshipit-source-id: 5f842ca9c826f92d4af660705611885fe440a9ab