
# Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

https://arxiv.org/abs/2104.06644

## Introduction

In this work, we pre-train RoBERTa (base) on several word-shuffled variants of the BookWiki corpus (16GB). We observe that models pre-trained on word-shuffled data achieve surprisingly good scores on GLUE, PAWS, and several parametric probing tasks. Please read our paper for the full experimental details.
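
For illustration, the snippet below shows one simple way to produce such n-gram sentence word shuffles. It is our own sketch of the idea (function name and details are ours), not the preprocessing script used for the released models:

```python
import random

def shuffle_sentence_ngrams(sentence, n=1, seed=0):
    """Split a sentence into consecutive n-grams and permute them.

    With n=1 every word position is shuffled; larger n preserves local
    word order inside each n-gram while destroying global sentence order.
    """
    words = sentence.split()
    ngrams = [words[i:i + n] for i in range(0, len(words), n)]
    random.Random(seed).shuffle(ngrams)
    return " ".join(word for gram in ngrams for word in gram)

# Prints the same words with the 2-grams in a random (seeded) order.
print(shuffle_sentence_ngrams("the quick brown fox jumps over the lazy dog", n=2))
```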

## Pre-trained models

| Model | Description | Download |
| --- | --- | --- |
| roberta.base.orig | RoBERTa (base) trained on natural corpus | roberta.base.orig.tar.gz |
| roberta.base.shuffle.n1 | RoBERTa (base) trained on n=1 gram sentence word shuffled data | roberta.base.shuffle.n1.tar.gz |
| roberta.base.shuffle.n2 | RoBERTa (base) trained on n=2 gram sentence word shuffled data | roberta.base.shuffle.n2.tar.gz |
| roberta.base.shuffle.n3 | RoBERTa (base) trained on n=3 gram sentence word shuffled data | roberta.base.shuffle.n3.tar.gz |
| roberta.base.shuffle.n4 | RoBERTa (base) trained on n=4 gram sentence word shuffled data | roberta.base.shuffle.n4.tar.gz |
| roberta.base.shuffle.512 | RoBERTa (base) trained on unigram 512 word block shuffled data | roberta.base.shuffle.512.tar.gz |
| roberta.base.shuffle.corpus | RoBERTa (base) trained on unigram corpus word shuffled data | roberta.base.shuffle.corpus.tar.gz |
| roberta.base.shuffle.corpus_uniform | RoBERTa (base) trained on unigram corpus word shuffled data, where all words are uniformly sampled | roberta.base.shuffle.corpus_uniform.tar.gz |
| roberta.base.nopos | RoBERTa (base) without positional embeddings, trained on natural corpus | roberta.base.nopos.tar.gz |

## Results

### GLUE (Wang et al., 2019) & PAWS (Zhang et al., 2019)

(dev set, single model, single-task fine-tuning, median of 5 seeds)

| Model | CoLA | MNLI | MRPC | PAWS | QNLI | QQP | RTE | SST-2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| roberta.base.orig | 61.4 | 86.11 | 89.19 | 94.46 | 92.53 | 91.26 | 74.64 | 93.92 |
| roberta.base.shuffle.n1 | 35.15 | 82.64 | 86 | 89.97 | 89.02 | 91.01 | 69.02 | 90.47 |
| roberta.base.shuffle.n2 | 54.37 | 83.43 | 86.24 | 93.46 | 90.44 | 91.36 | 70.83 | 91.79 |
| roberta.base.shuffle.n3 | 48.72 | 83.85 | 86.36 | 94.05 | 91.69 | 91.24 | 70.65 | 92.02 |
| roberta.base.shuffle.n4 | 58.64 | 83.77 | 86.98 | 94.32 | 91.69 | 91.4 | 70.83 | 92.48 |
| roberta.base.shuffle.512 | 12.76 | 77.52 | 79.61 | 84.77 | 85.19 | 90.2 | 56.52 | 86.34 |
| roberta.base.shuffle.corpus | 0 | 71.9 | 70.52 | 58.52 | 71.11 | 85.52 | 53.99 | 83.35 |
| roberta.base.shuffle.corpus_random | 9.19 | 72.33 | 70.76 | 58.42 | 77.76 | 85.93 | 53.99 | 84.04 |
| roberta.base.nopos | 0 | 63.5 | 72.73 | 57.08 | 77.72 | 87.87 | 54.35 | 83.24 |

For more results on probing tasks, please refer to our paper.

## Example Usage

Follow the same usage as in RoBERTa to load and test your models:

```bash
# Download the roberta.base.shuffle.n1 model
wget https://dl.fbaipublicfiles.com/unnatural_pretraining/roberta.base.shuffle.n1.tar.gz
tar -xzvf roberta.base.shuffle.n1.tar.gz

# Download the GPT-2 BPE dictionary files into the extracted model directory
cd roberta.base.shuffle.n1
wget -O dict.txt https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt
wget -O encoder.json https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
wget -O vocab.bpe https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
cd ..
```

```python
# Load the model in fairseq
from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('/path/to/roberta.base.shuffle.n1', checkpoint_file='model.pt')
roberta.eval()  # disable dropout (or leave in train mode to finetune)
```
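
After loading, a quick sanity check using the standard fairseq RoBERTa hub interface (the input sentence and printed shape are only illustrative):

```python
# Encode a sentence and extract features from the final layer
tokens = roberta.encode('Hello world!')
features = roberta.extract_features(tokens)  # (1, num_tokens, 768) for the base model
print(features.shape)
```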

We also provide a Google Colab notebook demonstrating how to load the model. The models were trained on top of fairseq at commit `62cff008ebeeed855093837507d5e6bf52065ee6`.

Note: The model trained without positional embeddings (`roberta.base.nopos`) is a modified RoBERTa model in which positional embeddings are not used. As a result, the typical `from_pretrained` method on the fairseq version of RoBERTa cannot load the above model weights directly. To load them, construct a new `RobertaModel` object with the flag `use_positional_embeddings` set to `False` (or, in the latest code, `no_token_positional_embeddings` set to `True`), and then load the individual weights, as sketched below.
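
A minimal sketch of that procedure, assuming a recent fairseq where the override is named `no_token_positional_embeddings` and the extracted checkpoint directory contains `dict.txt` (both assumptions on our part; the exact flag name may differ across versions):

```python
from fairseq import checkpoint_utils

# Load the checkpoint while overriding the positional-embedding flag.
# strict=False allows the load even though the checkpoint stores no
# positional-embedding weights.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ['/path/to/roberta.base.nopos/model.pt'],
    arg_overrides={'no_token_positional_embeddings': True},
    strict=False,
)
model = models[0]
model.eval()  # disable dropout
```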

## Fine-tuning Evaluation

For quick evaluation, we provide MNLI fine-tuned checkpoints for each of the models above (one seed per model). Please refer to [README.finetuning.md](README.finetuning.md) for the fine-tuning hyperparameters, and follow the RoBERTa instructions to evaluate these models; a sketch of the evaluation loop is given below the table.

| Model | MNLI-m Dev Accuracy | Link |
| --- | --- | --- |
| roberta.base.orig.mnli | 86.14 | Download |
| roberta.base.shuffle.n1.mnli | 82.55 | Download |
| roberta.base.shuffle.n2.mnli | 83.21 | Download |
| roberta.base.shuffle.n3.mnli | 83.89 | Download |
| roberta.base.shuffle.n4.mnli | 84.00 | Download |
| roberta.base.shuffle.512.mnli | 77.22 | Download |
| roberta.base.shuffle.corpus.mnli | 71.88 | Download |
| roberta.base.shuffle.corpus_uniform.mnli | 72.46 | Download |
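
Below is a sketch of the usual fairseq RoBERTa GLUE evaluation loop applied to one of these checkpoints. The checkpoint path, the `MNLI-bin` data directory, and the column layout of `dev_matched.tsv` are assumptions based on the standard GLUE setup, not files shipped with this example:

```python
from fairseq.models.roberta import RobertaModel

# Assumed paths: an extracted MNLI checkpoint and GLUE data preprocessed into MNLI-bin
roberta = RobertaModel.from_pretrained(
    '/path/to/roberta.base.shuffle.n1.mnli',
    checkpoint_file='model.pt',
    data_name_or_path='MNLI-bin',
)
roberta.eval()

# Map predicted class indices back to label strings via the task's label dictionary
label_fn = lambda label: roberta.task.label_dictionary.string(
    [label + roberta.task.label_dictionary.nspecial]
)

ncorrect, nsamples = 0, 0
with open('glue_data/MNLI/dev_matched.tsv') as fin:
    fin.readline()  # skip header
    for line in fin:
        cols = line.strip().split('\t')
        sent1, sent2, target = cols[8], cols[9], cols[-1]
        tokens = roberta.encode(sent1, sent2)
        prediction = roberta.predict('sentence_classification_head', tokens).argmax().item()
        ncorrect += int(label_fn(prediction) == target)
        nsamples += 1
print('MNLI-m dev accuracy:', ncorrect / float(nsamples))
```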

## Citation

```bibtex
@misc{sinha2021masked,
      title={Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little},
      author={Koustuv Sinha and Robin Jia and Dieuwke Hupkes and Joelle Pineau and Adina Williams and Douwe Kiela},
      year={2021},
      eprint={2104.06644},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Contact

For questions and comments, please reach out to Koustuv Sinha (koustuv.sinha@mail.mcgill.ca).