
# Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

https://arxiv.org/abs/2104.06644

## Introduction

In this work, we pre-train RoBERTa (base) on several word-shuffled variants of the BookWiki corpus (16GB). We observe that models pre-trained on word-shuffled data achieve surprisingly good scores on GLUE, PAWS, and several parametric probing tasks. Please read our paper for the full experimental details.
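
For illustration, the snippet below shows one simple way to produce such n-gram sentence word shuffles. It is our own sketch of the idea (function name and details are ours), not the preprocessing script used for the released models:

```python
import random

def shuffle_sentence_ngrams(sentence, n=1, seed=0):
    """Split a sentence into consecutive n-grams and permute them.

    With n=1 every word position is shuffled; larger n preserves local
    word order inside each n-gram while destroying global sentence order.
    """
    words = sentence.split()
    ngrams = [words[i:i + n] for i in range(0, len(words), n)]
    random.Random(seed).shuffle(ngrams)
    return " ".join(word for gram in ngrams for word in gram)

# Prints the same words with the 2-grams in a random (seeded) order.
print(shuffle_sentence_ngrams("the quick brown fox jumps over the lazy dog", n=2))
```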

## Pre-trained models

| Model | Description | Download |
| --- | --- | --- |
| roberta.base.orig | RoBERTa (base) trained on natural corpus | roberta.base.orig.tar.gz |
| roberta.base.shuffle.n1 | RoBERTa (base) trained on n=1 gram sentence word shuffled data | roberta.base.shuffle.n1.tar.gz |
| roberta.base.shuffle.n2 | RoBERTa (base) trained on n=2 gram sentence word shuffled data | roberta.base.shuffle.n2.tar.gz |
| roberta.base.shuffle.n3 | RoBERTa (base) trained on n=3 gram sentence word shuffled data | roberta.base.shuffle.n3.tar.gz |
| roberta.base.shuffle.n4 | RoBERTa (base) trained on n=4 gram sentence word shuffled data | roberta.base.shuffle.n4.tar.gz |
| roberta.base.shuffle.512 | RoBERTa (base) trained on unigram 512 word block shuffled data | roberta.base.shuffle.512.tar.gz |
| roberta.base.shuffle.corpus | RoBERTa (base) trained on unigram corpus word shuffled data | roberta.base.shuffle.corpus.tar.gz |
| roberta.base.shuffle.corpus_uniform | RoBERTa (base) trained on unigram corpus word shuffled data, where all words are uniformly sampled | roberta.base.shuffle.corpus_uniform.tar.gz |
| roberta.base.nopos | RoBERTa (base) without positional embeddings, trained on natural corpus | roberta.base.nopos.tar.gz |

## Results

### GLUE (Wang et al., 2019) & PAWS (Zhang et al., 2019)

(dev set, single model, single-task fine-tuning, median of 5 seeds)

| Model | CoLA | MNLI | MRPC | PAWS | QNLI | QQP | RTE | SST-2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| roberta.base.orig | 61.4 | 86.11 | 89.19 | 94.46 | 92.53 | 91.26 | 74.64 | 93.92 |
| roberta.base.shuffle.n1 | 35.15 | 82.64 | 86 | 89.97 | 89.02 | 91.01 | 69.02 | 90.47 |
| roberta.base.shuffle.n2 | 54.37 | 83.43 | 86.24 | 93.46 | 90.44 | 91.36 | 70.83 | 91.79 |
| roberta.base.shuffle.n3 | 48.72 | 83.85 | 86.36 | 94.05 | 91.69 | 91.24 | 70.65 | 92.02 |
| roberta.base.shuffle.n4 | 58.64 | 83.77 | 86.98 | 94.32 | 91.69 | 91.4 | 70.83 | 92.48 |
| roberta.base.shuffle.512 | 12.76 | 77.52 | 79.61 | 84.77 | 85.19 | 90.2 | 56.52 | 86.34 |
| roberta.base.shuffle.corpus | 0 | 71.9 | 70.52 | 58.52 | 71.11 | 85.52 | 53.99 | 83.35 |
| roberta.base.shuffle.corpus_random | 9.19 | 72.33 | 70.76 | 58.42 | 77.76 | 85.93 | 53.99 | 84.04 |
| roberta.base.nopos | 0 | 63.5 | 72.73 | 57.08 | 77.72 | 87.87 | 54.35 | 83.24 |

For more results on probing tasks, please refer to our paper.

## Example Usage

Follow the same usage as in RoBERTa to load and test your models:

```bash
# Download the roberta.base.shuffle.n1 model
wget https://dl.fbaipublicfiles.com/unnatural_pretraining/roberta.base.shuffle.n1.tar.gz
tar -xzvf roberta.base.shuffle.n1.tar.gz

# Download the GPT-2 BPE dictionary files into the extracted model directory
cd roberta.base.shuffle.n1
wget -O dict.txt https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt
wget -O encoder.json https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
wget -O vocab.bpe https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
cd ..
```

```python
# Load the model in fairseq
from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('/path/to/roberta.base.shuffle.n1', checkpoint_file='model.pt')
roberta.eval()  # disable dropout (or leave in train mode to finetune)
```
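
After loading, a quick sanity check using the standard fairseq RoBERTa hub interface (the input sentence and printed shape are only illustrative):

```python
# Encode a sentence and extract features from the final layer
tokens = roberta.encode('Hello world!')
features = roberta.extract_features(tokens)  # (1, num_tokens, 768) for the base model
print(features.shape)
```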

We also provide a Google Colab notebook demonstrating how to load the model. The models were trained on top of fairseq at commit `62cff008ebeeed855093837507d5e6bf52065ee6`.

Note: The model trained without positional embeddings (`roberta.base.nopos`) is a modified RoBERTa model in which positional embeddings are not used. As a result, the typical `from_pretrained` method on the fairseq version of RoBERTa cannot load the above model weights directly. To load them, construct a new `RobertaModel` object with the flag `use_positional_embeddings` set to `False` (or, in the latest code, `no_token_positional_embeddings` set to `True`), and then load the individual weights, as sketched below.
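
A minimal sketch of that procedure, assuming a recent fairseq where the override is named `no_token_positional_embeddings` and the extracted checkpoint directory contains `dict.txt` (both assumptions on our part; the exact flag name may differ across versions):

```python
from fairseq import checkpoint_utils

# Load the checkpoint while overriding the positional-embedding flag.
# strict=False allows the load even though the checkpoint stores no
# positional-embedding weights.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ['/path/to/roberta.base.nopos/model.pt'],
    arg_overrides={'no_token_positional_embeddings': True},
    strict=False,
)
model = models[0]
model.eval()  # disable dropout
```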

## Fine-tuning Evaluation

For quick evaluation, we provide MNLI fine-tuned checkpoints for each of the models above (one seed per model). Please refer to [README.finetuning.md](README.finetuning.md) for the fine-tuning hyperparameters, and follow the RoBERTa instructions to evaluate these models; a sketch of the evaluation loop is given below the table.

| Model | MNLI-m Dev Accuracy | Link |
| --- | --- | --- |
| roberta.base.orig.mnli | 86.14 | Download |
| roberta.base.shuffle.n1.mnli | 82.55 | Download |
| roberta.base.shuffle.n2.mnli | 83.21 | Download |
| roberta.base.shuffle.n3.mnli | 83.89 | Download |
| roberta.base.shuffle.n4.mnli | 84.00 | Download |
| roberta.base.shuffle.512.mnli | 77.22 | Download |
| roberta.base.shuffle.corpus.mnli | 71.88 | Download |
| roberta.base.shuffle.corpus_uniform.mnli | 72.46 | Download |
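
Below is a sketch of the usual fairseq RoBERTa GLUE evaluation loop applied to one of these checkpoints. The checkpoint path, the `MNLI-bin` data directory, and the column layout of `dev_matched.tsv` are assumptions based on the standard GLUE setup, not files shipped with this example:

```python
from fairseq.models.roberta import RobertaModel

# Assumed paths: an extracted MNLI checkpoint and GLUE data preprocessed into MNLI-bin
roberta = RobertaModel.from_pretrained(
    '/path/to/roberta.base.shuffle.n1.mnli',
    checkpoint_file='model.pt',
    data_name_or_path='MNLI-bin',
)
roberta.eval()

# Map predicted class indices back to label strings via the task's label dictionary
label_fn = lambda label: roberta.task.label_dictionary.string(
    [label + roberta.task.label_dictionary.nspecial]
)

ncorrect, nsamples = 0, 0
with open('glue_data/MNLI/dev_matched.tsv') as fin:
    fin.readline()  # skip header
    for line in fin:
        cols = line.strip().split('\t')
        sent1, sent2, target = cols[8], cols[9], cols[-1]
        tokens = roberta.encode(sent1, sent2)
        prediction = roberta.predict('sentence_classification_head', tokens).argmax().item()
        ncorrect += int(label_fn(prediction) == target)
        nsamples += 1
print('MNLI-m dev accuracy:', ncorrect / float(nsamples))
```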

## Citation

```bibtex
@misc{sinha2021masked,
      title={Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little},
      author={Koustuv Sinha and Robin Jia and Dieuwke Hupkes and Joelle Pineau and Adina Williams and Douwe Kiela},
      year={2021},
      eprint={2104.06644},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Contact

For questions and comments, please reach out to Koustuv Sinha (koustuv.sinha@mail.mcgill.ca).