# Better Fine-Tuning by Reducing Representational Collapse

This repo contains the code to replicate all experiments from the [Better Fine-Tuning by Reducing Representational Collapse](https://arxiv.org/abs/2008.03156) paper, excluding the probing results.

The R3F sentence-prediction criterion is registered as `sentence_prediction_r3f`, while its label-smoothed version is implemented as `label_smoothed_cross_entropy_r3f`. The R4F variant of the sentence-prediction criterion is obtained by additionally applying spectral normalization to the classification head via the `--spectral-norm-classification-head` flag.
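Conceptually, the spectral-norm option wraps the classification head's output projection in PyTorch's built-in spectral normalization, which rescales the weight matrix so that its largest singular value is 1. Below is a minimal sketch of that idea, assuming a simple two-layer head; the class is illustrative, not fairseq's actual classification-head implementation:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Illustrative sentence-level classification head."""

    def __init__(self, hidden_dim, num_classes, spectral_norm=False):
        super().__init__()
        self.dense = nn.Linear(hidden_dim, hidden_dim)
        self.out_proj = nn.Linear(hidden_dim, num_classes)
        if spectral_norm:
            # R4F: constrain the head by normalizing the output
            # projection's largest singular value to 1.
            self.out_proj = nn.utils.spectral_norm(self.out_proj)

    def forward(self, features):
        x = torch.tanh(self.dense(features))
        return self.out_proj(x)
```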

## Hyper-parameters

Our methods introduce three new hyper-parameters: `--eps`, which sets the standard deviation (or range) of the noise distribution we sample from; `--r3f-lambda`, which controls the weighting between the logistic loss and the noisy KL loss; and `--noise-type`, which selects the parametric noise distribution (`normal` or `uniform`).
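Together, these three flags parameterize the R3F consistency term: sample a perturbation of scale `--eps` from the `--noise-type` distribution, run the model on both the clean and the noised input embeddings, and add a symmetric KL penalty weighted by `--r3f-lambda` to the task loss. A rough sketch of that objective, simplified from the registered criterion (the function and argument names here are illustrative):

```python
import torch
import torch.nn.functional as F

def r3f_loss(model, embeddings, targets,
             eps=1e-5, r3f_lambda=0.7, noise_type="uniform"):
    """Illustrative R3F objective: task loss plus a symmetric KL
    between predictions on clean and noise-perturbed embeddings."""
    if noise_type == "uniform":
        noise = torch.empty_like(embeddings).uniform_(-eps, eps)
    else:  # "normal"
        noise = torch.randn_like(embeddings) * eps

    clean_logits = model(embeddings)
    noised_logits = model(embeddings + noise)

    # Symmetric KL between the clean and noised output distributions.
    p = F.log_softmax(clean_logits, dim=-1)
    q = F.log_softmax(noised_logits, dim=-1)
    sym_kl = (
        F.kl_div(p, q, reduction="batchmean", log_target=True)
        + F.kl_div(q, p, reduction="batchmean", log_target=True)
    )

    task_loss = F.cross_entropy(clean_logits, targets)
    return task_loss + r3f_lambda * sym_kl
```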

For example, to run R3F on RTE from GLUE:

```bash
TOTAL_NUM_UPDATES=3120
WARMUP_UPDATES=187
LR=1e-05
NUM_CLASSES=2
MAX_SENTENCES=8        # Batch size.
ROBERTA_PATH=/path/to/roberta/model.pt

CUDA_VISIBLE_DEVICES=0 fairseq-train RTE-bin \
    --restore-file $ROBERTA_PATH \
    --max-positions 512 \
    --max-sentences $MAX_SENTENCES \
    --max-tokens 4400 \
    --task sentence_prediction \
    --reset-optimizer --reset-dataloader --reset-meters \
    --required-batch-size-multiple 1 \
    --init-token 0 --separator-token 2 \
    --arch roberta_large \
    --criterion sentence_prediction_r3f \
    --num-classes $NUM_CLASSES \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \
    --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \
    --max-epoch 10 \
    --find-unused-parameters \
    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \
    --noise-type uniform --r3f-lambda 0.7 \
    --user-dir examples/rxf/rxf_src
```

## Citation

```bibtex
@article{aghajanyan2020better,
  title={Better Fine-Tuning by Reducing Representational Collapse},
  author={Aghajanyan, Armen and Shrivastava, Akshat and Gupta, Anchit and Goyal, Naman and Zettlemoyer, Luke and Gupta, Sonal},
  journal={arXiv preprint arXiv:2008.03156},
  year={2020}
}
```