Mirror of https://github.com/facebookresearch/fairseq.git (synced 2024-11-12 21:52:01 +03:00)

Commit 54423d3b22
Summary:
This is long overdue, but we are finally deprecating the RobertaEncoder components and using TransformerEncoder directly. This will make some upcoming online backtranslation changes easier, and will eventually make migrating RoBERTa to dataclasses/Hydra easier too. It also fixes some longstanding inconsistencies in layernorm placement in the model-parallel RoBERTa code.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1560
Test Plan:
- confirmed that training gives identical losses as before: https://gist.github.com/myleott/9a4d213fb88a02b00094ea074f5a2e2d
- confirmed that old roberta models can be loaded and produce identical results (see the loading sketch below)
- confirmed that old linformer models can be loaded and produce identical results (reran commands from D25938236)
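One way to reproduce the "old checkpoints still load and produce identical results" check is through fairseq's RoBERTa hub interface. This is a minimal sketch, not the exact commands used in the test plan; the checkpoint directory and file name are placeholders.

```python
from fairseq.models.roberta import RobertaModel

# Load a checkpoint trained before this change under the new code path.
# "checkpoints/roberta.base" and "model.pt" are placeholder paths.
roberta = RobertaModel.from_pretrained(
    "checkpoints/roberta.base",
    checkpoint_file="model.pt",
)
roberta.eval()  # disable dropout so outputs are deterministic

tokens = roberta.encode("Hello world!")
features = roberta.extract_features(tokens)
print(features.shape)  # (1, num_tokens, embed_dim); compare against pre-change outputs
```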
examples/linformer/README.md:
# Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)

This example contains code to train Linformer models as described in our paper [Linformer: Self-Attention with Linear Complexity](https://arxiv.org/abs/2006.04768).
## Training a new Linformer RoBERTa model
You can mostly follow the RoBERTa pretraining README, updating your training command with `--user-dir examples/linformer/linformer_src --arch linformer_roberta_base`.
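Once training finishes, the resulting checkpoint can be loaded back through the standard RoBERTa hub interface, provided the Linformer user module has been imported so that `linformer_roberta_base` is registered. This loading step is not part of the original README; the sketch below is an assumption based on fairseq's regular RoBERTa workflow, and the checkpoint and data paths are placeholders.

```python
from argparse import Namespace

from fairseq import utils
from fairseq.models.roberta import RobertaModel

# Import the Linformer user module so its model and architecture are registered
# with fairseq before the checkpoint is loaded.
utils.import_user_module(Namespace(user_dir="examples/linformer/linformer_src"))

# "checkpoints", "checkpoint_best.pt", and "data-bin/my_corpus" are placeholders
# for your own training run.
linformer = RobertaModel.from_pretrained(
    "checkpoints",
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="data-bin/my_corpus",
)
linformer.eval()

tokens = linformer.encode("Hello world!")
features = linformer.extract_features(tokens)
print(features.shape)  # (1, num_tokens, embed_dim)
```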
## Citation
If you use our work, please cite:
```bibtex
@article{wang2020linformer,
    title={Linformer: Self-Attention with Linear Complexity},
    author={Wang, Sinong and Li, Belinda and Khabsa, Madian and Fang, Han and Ma, Hao},
    journal={arXiv preprint arXiv:2006.04768},
    year={2020}
}
```