
# End-to-end NLU

End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device.

This page releases the code for reproducing the results in *STOP: A Dataset for Spoken Task-Oriented Semantic Parsing*.

The dataset can be downloaded here: download link

The low-resource splits can be downloaded here: download link

## Pretrained End-to-end NLU Models

| Speech Pretraining | ASR Pretraining | Test EM Accuracy | Test EM-Tree Accuracy | Link |
| --- | --- | --- | --- | --- |
| None | None | 36.54 | 57.01 | link |
| Wav2Vec | None | 68.05 | 82.53 | link |
| HuBERT | None | 68.40 | 82.85 | link |
| Wav2Vec | STOP | 68.70 | 82.78 | link |
| HuBERT | STOP | 69.23 | 82.87 | link |
| Wav2Vec | Librispeech | 68.47 | 82.49 | link |
| HuBERT | Librispeech | 68.70 | 82.78 | link |
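Exact-match (EM) accuracy in the table above counts a prediction as correct only when the decoded semantic parse matches the reference exactly. A minimal sketch of that metric (the bracketed `[IN:...]`/`[SL:...]` parse strings follow the TOP convention used by STOP; the helper itself is illustrative, not part of fairseq):

```python
def em_accuracy(predictions, references):
    """Fraction of predicted parses that exactly match the reference parse."""
    assert len(predictions) == len(references)
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)


preds = ["[IN:GET_WEATHER [SL:LOCATION boston ] ]",
         "[IN:CREATE_ALARM [SL:DATE_TIME at 7 am ] ]"]
refs = ["[IN:GET_WEATHER [SL:LOCATION boston ] ]",
        "[IN:CREATE_ALARM [SL:DATE_TIME at 8 am ] ]"]
print(em_accuracy(preds, refs))  # 0.5 -- only the first parse matches
```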

## Pretrained ASR Models

| Speech Pretraining | ASR Dataset | STOP Eval WER | STOP Test WER | dev_other WER | dev_clean WER | test_clean WER | test_other WER | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HuBERT | Librispeech | 8.47 | 2.99 | 3.25 | 8.06 | 25.68 | 26.19 | link |
| Wav2Vec | Librispeech | 9.215 | 3.204 | 3.334 | 9.006 | 27.257 | 27.588 | link |
| HuBERT | STOP | 46.31 | 31.30 | 31.52 | 47.16 | 4.29 | 4.26 | link |
| Wav2Vec | STOP | 43.103 | 27.833 | 28.479 | 28.479 | 4.679 | 4.667 | link |
| HuBERT | Librispeech + STOP | 9.015 | 3.211 | 3.372 | 8.635 | 5.133 | 5.056 | link |
| Wav2Vec | Librispeech + STOP | 9.549 | 3.537 | 3.625 | 9.514 | 5.59 | 5.562 | link |
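Word error rate (WER), as reported above, is the word-level Levenshtein edit distance between hypothesis and reference, normalized by reference length. A self-contained sketch of the standard computation (not the exact scorer used in fairseq):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)


# One deletion ("an") and one substitution ("seven" -> "eleven") over 5 words:
print(wer("set an alarm for seven", "set alarm for eleven"))  # 0.4
```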

## Creating the fairseq datasets from STOP

First, create the audio file manifests and label files:

```
python examples/audio_nlp/nlu/generate_manifests.py --stop_root $STOP_DOWNLOAD_DIR/stop --output $FAIRSEQ_DATASET_OUTPUT/
```
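fairseq audio manifests are TSV files whose first line is the audio root directory, followed by one `relative_path<TAB>num_frames` entry per utterance, with labels in a parallel file. A hedged sketch of that layout (the file names, paths, and `.parse` extension here are illustrative assumptions, not the exact output of `generate_manifests.py`):

```python
import pathlib
import tempfile


def write_manifest(out_dir, audio_root, utterances):
    """Write a fairseq-style audio manifest plus a parallel label file.

    utterances: list of (relative_wav_path, num_frames, parse_string).
    """
    out = pathlib.Path(out_dir)
    with open(out / "train.tsv", "w") as tsv, open(out / "train.parse", "w") as lab:
        tsv.write(f"{audio_root}\n")  # first line: audio root directory
        for rel_path, n_frames, parse in utterances:
            tsv.write(f"{rel_path}\t{n_frames}\n")  # one utterance per line
            lab.write(parse + "\n")  # label line i pairs with manifest line i+1


# Hypothetical example utterance:
utts = [("weather/utt_00001.wav", 48000, "[IN:GET_WEATHER [SL:LOCATION boston ] ]")]
tmp = tempfile.mkdtemp()
write_manifest(tmp, "/data/stop/audio", utts)
print(open(pathlib.Path(tmp) / "train.tsv").read())
```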

Run `./examples/audio_nlp/nlu/create_dict_stop.sh $FAIRSEQ_DATASET_OUTPUT` to generate the fairseq dictionaries.
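A fairseq dictionary is a text file with one `token count` pair per line, sorted by descending frequency. A minimal sketch of building one from the label files (illustrative only; `create_dict_stop.sh` is the authoritative script):

```python
from collections import Counter


def build_dict(label_lines):
    """Count whitespace tokens and emit fairseq 'token count' entries, most frequent first."""
    counts = Counter(tok for line in label_lines for tok in line.split())
    return [f"{tok} {n}" for tok, n in counts.most_common()]


lines = ["[IN:GET_WEATHER [SL:LOCATION boston ] ]",
         "[IN:GET_WEATHER [SL:LOCATION denver ] ]"]
for entry in build_dict(lines):
    print(entry)  # closing bracket "]" is the most frequent token here
```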

## Training an End-to-end NLU Model

Download a Wav2Vec or HuBERT model from link or link.

```
python fairseq_cli/hydra_train.py --config-dir examples/audio_nlp/nlu/configs/ --config-name nlu_finetuning task.data=$FAIRSEQ_DATASET_OUTPUT model.w2v_path=$PRETRAINED_MODEL_PATH
```