Mirror of https://github.com/facebookresearch/fairseq.git (synced 2024-11-12 21:52:01 +03:00)

Commit 54423d3b22
Summary:
This is long overdue, but we are finally deprecating the RobertaEncoder components and using TransformerEncoder directly. This will make some upcoming online backtranslation changes easier, and will eventually make migrating RoBERTa to dataclasses/Hydra easier too. It also fixes some longstanding inconsistencies in layernorm placement in the model-parallel RoBERTa code.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1560
Test Plan:
- confirmed that training gives identical losses as before: https://gist.github.com/myleott/9a4d213fb88a02b00094ea074f5a2e2d
- confirmed that old roberta models can be loaded and produce identical results (see the loading sketch below)
- confirmed that old linformer models can be loaded and produce identical results (reran commands from D25938236)
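One way to reproduce the "old checkpoints still load and produce identical results" check is through fairseq's RoBERTa hub interface. This is a minimal sketch, not the exact commands used in the test plan; the checkpoint directory and file name are placeholders.

```python
from fairseq.models.roberta import RobertaModel

# Load a checkpoint trained before this change under the new code path.
# "checkpoints/roberta.base" and "model.pt" are placeholder paths.
roberta = RobertaModel.from_pretrained(
    "checkpoints/roberta.base",
    checkpoint_file="model.pt",
)
roberta.eval()  # disable dropout so outputs are deterministic

tokens = roberta.encode("Hello world!")
features = roberta.extract_features(tokens)
print(features.shape)  # (1, num_tokens, embed_dim); compare against pre-change outputs
```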
examples/linformer/README.md:
# Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)

This example contains code to train Linformer models as described in our paper [Linformer: Self-Attention with Linear Complexity](https://arxiv.org/abs/2006.04768).
## Training a new Linformer RoBERTa model
You can mostly follow the RoBERTa pretraining README, updating your training command with `--user-dir examples/linformer/linformer_src --arch linformer_roberta_base`.
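Once training finishes, the resulting checkpoint can be loaded back through the standard RoBERTa hub interface, provided the Linformer user module has been imported so that `linformer_roberta_base` is registered. This loading step is not part of the original README; the sketch below is an assumption based on fairseq's regular RoBERTa workflow, and the checkpoint and data paths are placeholders.

```python
from argparse import Namespace

from fairseq import utils
from fairseq.models.roberta import RobertaModel

# Import the Linformer user module so its model and architecture are registered
# with fairseq before the checkpoint is loaded.
utils.import_user_module(Namespace(user_dir="examples/linformer/linformer_src"))

# "checkpoints", "checkpoint_best.pt", and "data-bin/my_corpus" are placeholders
# for your own training run.
linformer = RobertaModel.from_pretrained(
    "checkpoints",
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="data-bin/my_corpus",
)
linformer.eval()

tokens = linformer.encode("Hello world!")
features = linformer.extract_features(tokens)
print(features.shape)  # (1, num_tokens, embed_dim)
```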
## Citation
If you use our work, please cite:
```bibtex
@article{wang2020linformer,
    title={Linformer: Self-Attention with Linear Complexity},
    author={Wang, Sinong and Li, Belinda and Khabsa, Madian and Fang, Han and Ma, Hao},
    journal={arXiv preprint arXiv:2006.04768},
    year={2020}
}
```