Rename references from master -> main in preparation for branch name change (#2297)

Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2297

Reviewed By: alexeib

Differential Revision: D30906090

Pulled By: dianaml0

fbshipit-source-id: 941d30db7f766c9077a1b5bb2a04680f57e2e070
Authored by Diana Liskovich on 2021-09-20 08:04:06 -07:00; committed by Facebook GitHub Bot
parent f6abcc2a67
commit 5adfeaccf9
23 changed files with 57 additions and 57 deletions

View File

@@ -19,7 +19,7 @@ Steps to reproduce the behavior (**always include the command you ran**):
#### Code sample
-<!-- Ideally attach a minimal code sample to reproduce the decried issue.
+<!-- Ideally attach a minimal code sample to reproduce the decried issue.
Minimal means having the shortest code but still preserving the bug. -->
### Expected behavior
@@ -28,7 +28,7 @@ Minimal means having the shortest code but still preserving the bug. -->
### Environment
-- fairseq Version (e.g., 1.0 or master):
+- fairseq Version (e.g., 1.0 or main):
- PyTorch Version (e.g., 1.0)
- OS (e.g., Linux):
- How you installed fairseq (`pip`, source):

View File

@@ -6,9 +6,9 @@ labels: 'question, needs triage'
## ❓ Questions and Help
-### Before asking:
-1. search the issues.
-2. search the docs.
+### Before asking:
+1. search the issues.
+2. search the docs.
<!-- If you still can't find what you need: -->
@@ -16,13 +16,13 @@ labels: 'question, needs triage'
#### Code
-<!-- Please paste a code snippet if your question requires it! -->
+<!-- Please paste a code snippet if your question requires it! -->
#### What have you tried?
#### What's your environment?
-- fairseq Version (e.g., 1.0 or master):
+- fairseq Version (e.g., 1.0 or main):
- PyTorch Version (e.g., 1.0)
- OS (e.g., Linux):
- How you installed fairseq (`pip`, source):

View File

@@ -1,15 +1,15 @@
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
-- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
-- [ ] Did you make sure to update the docs?
-- [ ] Did you write any new necessary tests?
+- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
+- [ ] Did you make sure to update the docs?
+- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
-## PR review
-Anyone in the community is free to review the PR once the tests have passed.
+## PR review
+Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?

View File

@@ -1,10 +1,10 @@
name: build
on:
-# Trigger the workflow on push to master or any pull request
+# Trigger the workflow on push to main or any pull request
push:
branches:
-- master
+- main
pull_request:
jobs:

View File

@@ -5,7 +5,7 @@ possible.
## Pull Requests
We actively welcome your pull requests.
-1. Fork the repo and create your branch from `master`.
+1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.

View File

@@ -2,7 +2,7 @@
<img src="docs/fairseq_logo.png" width="150">
<br />
<br />
-<a href="https://github.com/pytorch/fairseq/blob/master/LICENSE"><img alt="MIT License" src="https://img.shields.io/badge/license-MIT-blue.svg" /></a>
+<a href="https://github.com/pytorch/fairseq/blob/main/LICENSE"><img alt="MIT License" src="https://img.shields.io/badge/license-MIT-blue.svg" /></a>
<a href="https://github.com/pytorch/fairseq/releases"><img alt="Latest Release" src="https://img.shields.io/github/release/pytorch/fairseq.svg" /></a>
<a href="https://github.com/pytorch/fairseq/actions?query=workflow:build"><img alt="Build Status" src="https://github.com/pytorch/fairseq/workflows/build/badge.svg" /></a>
<a href="https://fairseq.readthedocs.io/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/fairseq/badge/?version=latest" /></a>
@@ -48,7 +48,7 @@ We provide reference implementations of various sequence modeling papers:
+ [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md)
+ [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)
+ [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md)
-+ [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979)
++ [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979)
+ [Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu, et al., 2021)](https://arxiv.org/abs/2104.01027)
+ [Unsupervised Speech Recognition (Baevski, et al., 2021)](https://arxiv.org/abs/2105.11084)
* **Non-autoregressive Transformers**
@@ -93,7 +93,7 @@ We provide reference implementations of various sequence modeling papers:
* April 2020: [Initial model parallel support and 11B parameters unidirectional LM released](examples/megatron_11b/README.md)
* March 2020: [Byte-level BPE code released](examples/byte_level_bpe/README.md)
* February 2020: [mBART model and code released](examples/mbart/README.md)
-* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/master/examples/backtranslation#training-your-own-model-wmt18-english-german)
+* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/main/examples/backtranslation#training-your-own-model-wmt18-english-german)
* December 2019: [fairseq 0.9.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.9.0)
* November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example)
* November 2019: [CamemBERT model and code released](examples/camembert/README.md)

View File

@@ -55,7 +55,7 @@ project = "fairseq"
copyright = "Facebook AI Research (FAIR)"
author = "Facebook AI Research (FAIR)"
-github_doc_root = "https://github.com/pytorch/fairseq/tree/master/docs/"
+github_doc_root = "https://github.com/pytorch/fairseq/tree/main/docs/"
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the

View File

@@ -4,7 +4,7 @@ Adaptive Span is a novel self-attention mechanism that can learn its optimal
attention span. This allows us to extend significantly the maximum context size
used in Transformer, while maintaining control over their memory footprint
and computational time. It uses the Truncated BPTT technique for training,
-as in [transformerXL](https://github.com/pytorch/fairseq/blob/master/examples/truncated_bptt/README.md).
+as in [transformerXL](https://github.com/pytorch/fairseq/blob/main/examples/truncated_bptt/README.md).
Adaptive Span was introduced by paper:
[Adaptive Attention Span in Transformers](https://arxiv.org/abs/1905.07799),

View File

@@ -12,7 +12,7 @@ Constrained search is enabled by adding the command-line argument `--constraints`
Constraints are appended to each line of input, separated by tabs. Each constraint (one or more tokens)
is a separate field.
-The following command, using [Fairseq's WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md),
+The following command, using [Fairseq's WMT19 German--English model](https://github.com/pytorch/fairseq/blob/main/examples/wmt19/README.md),
translates the sentence *Die maschinelle Übersetzung ist schwer zu kontrollieren.* with the constraints
"hard" and "to influence".

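As a reference for the tab-separated input format described in this README, here is a minimal Python sketch of assembling one constrained line; it is illustrative only, and the surrounding `fairseq-interactive` invocation is not part of this diff:

```python
# Minimal sketch: build one constrained-decoding input line.
# The source sentence and each constraint are tab-separated fields on a single line,
# which is then piped to fairseq-interactive together with --constraints.
source = "Die maschinelle Übersetzung ist schwer zu kontrollieren."
constraints = ["hard", "to influence"]
print("\t".join([source] + constraints))
```
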
View File

@@ -38,7 +38,7 @@ source_sentence_L_hypo_1
source_sentence_L_hypo_N
```
-2. Download the [XLMR model](https://github.com/fairinternal/fairseq-py/tree/master/examples/xlmr#pre-trained-models).
+2. Download the [XLMR model](https://github.com/fairinternal/fairseq-py/tree/main/examples/xlmr#pre-trained-models).
```
wget https://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.tar.gz
tar zxvf xlmr.base.tar.gz

View File

@@ -29,9 +29,9 @@ This framework provides a great way to utlize strong target language models trai
### Training Translation Models and Language Models
-For training Transformer models in fairseq for machine translation, refer to instructions [here](https://github.com/pytorch/fairseq/tree/master/examples/translation)
+For training Transformer models in fairseq for machine translation, refer to instructions [here](https://github.com/pytorch/fairseq/tree/main/examples/translation)
-For training Transformer models in fairseq for language modeling, refer to instructions [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model)
+For training Transformer models in fairseq for language modeling, refer to instructions [here](https://github.com/pytorch/fairseq/tree/main/examples/language_model)
### Generation with Language Model for German-English translation with fairseq

View File

@@ -126,9 +126,9 @@ This model override command overrides the training parameters and updates the mo
Looking to reproduce the results in the paper?
-1. For Translation on WMT16 en-de, we followed this setting [here](https://github.com/pytorch/fairseq/blob/master/examples/scaling_nmt/README.md)
-2. To train RoBERTa, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta)
-3. To train Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model)
+1. For Translation on WMT16 en-de, we followed this setting [here](https://github.com/pytorch/fairseq/blob/main/examples/scaling_nmt/README.md)
+2. To train RoBERTa, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/roberta)
+3. To train Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/language_model)
## Tips

View File

@@ -82,7 +82,7 @@ fairseq-preprocess \
3. **Training Scripts**
-To reproduce the training of our models, we train with fairseq-py's multilingual translation [task](https://github.com/pytorch/fairseq/tree/master/examples/multilingual). If you are interested in model parallel training, also check out [fairscale](https://github.com/facebookresearch/fairscale).
+To reproduce the training of our models, we train with fairseq-py's multilingual translation [task](https://github.com/pytorch/fairseq/tree/main/examples/multilingual). If you are interested in model parallel training, also check out [fairscale](https://github.com/facebookresearch/fairscale).
4. **Generation**

View File

@@ -17,9 +17,9 @@ This work is for training multilingual translation models with multiple bitext d
- --finetune-from-model to specify the path from which to load the pretrained model
## Preprocessing data
-Multilingual training requires a joint BPE vocab. Please follow [mBART's preprocessing steps](https://github.com/pytorch/fairseq/tree/master/examples/mbart#bpe-data) to reuse our pretrained sentence-piece model.
+Multilingual training requires a joint BPE vocab. Please follow [mBART's preprocessing steps](https://github.com/pytorch/fairseq/tree/main/examples/mbart#bpe-data) to reuse our pretrained sentence-piece model.
-You can also train a joint BPE model on your own dataset and then follow the steps in [[link]](https://github.com/pytorch/fairseq/tree/master/examples/translation#multilingual-translation).
+You can also train a joint BPE model on your own dataset and then follow the steps in [[link]](https://github.com/pytorch/fairseq/tree/main/examples/translation#multilingual-translation).
## Training
@@ -49,7 +49,7 @@ fairseq-train $path_2_data \
```
## Finetuning
-We can also finetune multilingual models from a monolingual pretrained models, e.g. [mMBART](https://github.com/pytorch/fairseq/tree/master/examples/mbart).
+We can also finetune multilingual models from a monolingual pretrained models, e.g. [mMBART](https://github.com/pytorch/fairseq/tree/main/examples/mbart).
```bash
lang_pairs=<language pairs to be trained, e.g. "en-cs,cs-en">
path_2_data=<set to data path>

View File

@@ -33,7 +33,7 @@ Unlike the section [Iterative Product Quantization](#iterative-product-quantizat
#### Training
-Scalar quantization with Quant-Noise consists in randomly quantizing a proportion `p` of the weights during training. Scalar quantization is implemented [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/scalar) under the form of Fake Quantization, meaning that we emulate int8 on GPU by quantizing and de-quantizing both the weights and the activations. We rely on PyTorch's [quantization primitives](https://github.com/pytorch/pytorch/tree/master/torch/quantization).
+Scalar quantization with Quant-Noise consists in randomly quantizing a proportion `p` of the weights during training. Scalar quantization is implemented [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quantization/scalar) under the form of Fake Quantization, meaning that we emulate int8 on GPU by quantizing and de-quantizing both the weights and the activations. We rely on PyTorch's [quantization primitives](https://github.com/pytorch/pytorch/tree/master/torch/quantization).
To train a model with Quant-Noise, add the following flag:
```
@@ -49,7 +49,7 @@ When evaluating a network, all quantized modules and activation hooks automatica
#### Integration with your own code
Looking to quantize your own models with Quant-Noise + Scalar Quantization?
-- Use the function `quantize_model_` implemented [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/scalar/utils.py) to (1) replace all your modules by their quantized counterparts and (2) add hooks to those modules to quantize the activations.
+- Use the function `quantize_model_` implemented [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quantization/scalar/utils.py) to (1) replace all your modules by their quantized counterparts and (2) add hooks to those modules to quantize the activations.
- Then, perform your training as usual. Note that in `eval()` mode, the network is always fully quantized (weights and activations) by default (`p=1`).
@@ -66,12 +66,12 @@ To train a model with Quant-Noise, add the following flags:
--quant-noise-pq 0.1 --quant-noise-pq-block-size 8
```
`quant-noise-pq` controls how much dropout is applied to the blocks of the weight matrix. `quant-noise-pq-block-size` controls the size of the weight matrix blocks.
-We recommend training with 0.05 to 0.2 Quant-Noise, a value that worked well in our experiments. For the block-size, we recommend training with block-size of 8. Note that the block size must be a multiple of `input_features`, see the size checks [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py). Large block sizes result in higher compression ratio but may induce a loss in accuracy.
+We recommend training with 0.05 to 0.2 Quant-Noise, a value that worked well in our experiments. For the block-size, we recommend training with block-size of 8. Note that the block size must be a multiple of `input_features`, see the size checks [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quant_noise.py). Large block sizes result in higher compression ratio but may induce a loss in accuracy.
-We currently support training Transformer based models, such as sequence-to-sequence, language models, and BERT architectures. The `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py) wraps a module. It splits a weight matrix into blocks and applies random dropout to these blocks.
+We currently support training Transformer based models, such as sequence-to-sequence, language models, and BERT architectures. The `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quant_noise.py) wraps a module. It splits a weight matrix into blocks and applies random dropout to these blocks.
In the Transformer architectures, quant-noise is applied to the input and output embeddings, the attention, and the FFN.
-Quant-Noise can also be combined with **LayerDrop** (see [here](https://github.com/pytorch/fairseq/tree/master/examples/layerdrop)) to add its pruning effect to the quantized model and make the model even smaller. We recommend training with LayerDrop 0.1 or 0.2.
+Quant-Noise can also be combined with **LayerDrop** (see [here](https://github.com/pytorch/fairseq/tree/main/examples/layerdrop)) to add its pruning effect to the quantized model and make the model even smaller. We recommend training with LayerDrop 0.1 or 0.2.
#### Quantization
@@ -84,8 +84,8 @@ For the particular case of PQ, quantization is made sequentially. We recommend f
#### Integration with your own code
Looking to quantize your own models with Quant-Noise + iPQ?
-- First wrap your modules with the `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py), which is module-agnostic and train your favorite model.
-- Then, quantize your trained model using the code [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/pq). This can be done *without any changes to your training loop*. Below is an example code for integration.
+- First wrap your modules with the `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quant_noise.py), which is module-agnostic and train your favorite model.
+- Then, quantize your trained model using the code [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quantization/pq). This can be done *without any changes to your training loop*. Below is an example code for integration.
Note that we tried our approach only on Transformers and various Convolutional Models such as EfficientNets.
```python
@@ -128,7 +128,7 @@ We detail below how to reproduce the state-of-the-art results in reported in the
### Training with Quant-Noise
-To **train** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta).
+To **train** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/roberta).
The following command can be used to train a RoBERTa Base + QuantNoise model:
```bash
@@ -158,7 +158,7 @@ fairseq-train $DATA_DIR \
--quant-noise-pq 0.2 --quant-noise-pq-block-size 8 --untie-weights-roberta
```
-To **finetune** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md).
+To **finetune** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/blob/main/examples/roberta/README.glue.md).
The following command can be used to finetune a RoBERTa Base + QuantNoise model on the RTE dataset:
```bash
@@ -193,7 +193,7 @@ fairseq-train /path/to/rte/data/ \
--quant-noise-pq 0.2 --quant-noise-pq-block-size 8
```
-To **train** Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model).
+To **train** Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/language_model).
The following command can be used to train a Transformer + QuantNoise model on Wikitext-103:
```bash
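
For orientation, the `quant_noise` wrapper referenced throughout this README can be exercised on its own. The following is a minimal sketch assuming the `quant_noise(module, p, block_size)` helper in `fairseq/modules/quant_noise.py`; it is illustrative and not taken from this diff:

```python
# Minimal sketch: wrap a Linear layer with Quant-Noise so that blocks of its weight
# matrix are randomly dropped out during training (assumes fairseq is installed and
# that quant_noise has the signature quant_noise(module, p, block_size)).
import torch
import torch.nn as nn
from fairseq.modules.quant_noise import quant_noise

d_model = 512  # block_size must divide the layer's input features
ffn = quant_noise(nn.Linear(d_model, 4 * d_model), p=0.1, block_size=8)

x = torch.randn(2, 16, d_model)
y = ffn(x)  # noise is applied only in training mode; ffn.eval() disables it
```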

View File

@@ -8,13 +8,13 @@ RoBERTa iterates on BERT's pretraining procedure, including training the model l
### What's New:
-- December 2020: German model (GottBERT) is available: [GottBERT](https://github.com/pytorch/fairseq/tree/master/examples/gottbert).
+- December 2020: German model (GottBERT) is available: [GottBERT](https://github.com/pytorch/fairseq/tree/main/examples/gottbert).
- January 2020: Italian model (UmBERTo) is available from Musixmatch Research: [UmBERTo](https://github.com/musixmatchresearch/umberto).
-- November 2019: French model (CamemBERT) is available: [CamemBERT](https://github.com/pytorch/fairseq/tree/master/examples/camembert).
-- November 2019: Multilingual encoder (XLM-RoBERTa) is available: [XLM-R](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).
+- November 2019: French model (CamemBERT) is available: [CamemBERT](https://github.com/pytorch/fairseq/tree/main/examples/camembert).
+- November 2019: Multilingual encoder (XLM-RoBERTa) is available: [XLM-R](https://github.com/pytorch/fairseq/tree/main/examples/xlmr).
- September 2019: TensorFlow and TPU support via the [transformers library](https://github.com/huggingface/transformers).
- August 2019: RoBERTa is now supported in the [pytorch-transformers library](https://github.com/huggingface/pytorch-transformers).
-- August 2019: Added [tutorial for finetuning on WinoGrande](https://github.com/pytorch/fairseq/tree/master/examples/roberta/wsc#roberta-training-on-winogrande-dataset).
+- August 2019: Added [tutorial for finetuning on WinoGrande](https://github.com/pytorch/fairseq/tree/main/examples/roberta/wsc#roberta-training-on-winogrande-dataset).
- August 2019: Added [tutorial for pretraining RoBERTa using your own data](README.pretraining.md).
## Pre-trained models

View File

@@ -96,4 +96,4 @@ print('Accuracy: ' + str(ncorrect / float(nsamples)))
```
The above snippet is not batched, which makes it quite slow. See [instructions
-for batched prediction with RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta#batched-prediction).
+for batched prediction with RoBERTa](https://github.com/pytorch/fairseq/tree/main/examples/roberta#batched-prediction).

View File

@@ -40,7 +40,7 @@ For more results on probing tasks, please refer to [our paper](https://arxiv.org
## Example Usage
-Follow the same usage as in [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) to load and test your models:
+Follow the same usage as in [RoBERTa](https://github.com/pytorch/fairseq/tree/main/examples/roberta) to load and test your models:
```python
# Download roberta.base.shuffle.n1 model
@@ -53,11 +53,11 @@ roberta = RoBERTaModel.from_pretrained('/path/to/roberta.base.shuffle.n1', check
roberta.eval() # disable dropout (or leave in train mode to finetune)
```
-**Note**: The model trained without positional embeddings (`roberta.base.nopos`) is a modified `RoBERTa` model, where the positional embeddings are not used. Thus, the typical `from_pretrained` method on fairseq version of RoBERTa will not be able to load the above model weights. To do so, construct a new `RoBERTaModel` object by setting the flag `use_positional_embeddings` to `False` (or [in the latest code](https://github.com/pytorch/fairseq/blob/master/fairseq/models/roberta/model.py#L543), set `no_token_positional_embeddings` to `True`), and then load the individual weights.
+**Note**: The model trained without positional embeddings (`roberta.base.nopos`) is a modified `RoBERTa` model, where the positional embeddings are not used. Thus, the typical `from_pretrained` method on fairseq version of RoBERTa will not be able to load the above model weights. To do so, construct a new `RoBERTaModel` object by setting the flag `use_positional_embeddings` to `False` (or [in the latest code](https://github.com/pytorch/fairseq/blob/main/fairseq/models/roberta/model.py#L543), set `no_token_positional_embeddings` to `True`), and then load the individual weights.
## Fine-tuning Evaluation
-We provide the trained fine-tuned models on MNLI here for each model above for quick evaluation (1 seed for each model). Please refer to [finetuning details](README.finetuning.md) for the parameters of these models. Follow [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) instructions to evaluate these models.
+We provide the trained fine-tuned models on MNLI here for each model above for quick evaluation (1 seed for each model). Please refer to [finetuning details](README.finetuning.md) for the parameters of these models. Follow [RoBERTa](https://github.com/pytorch/fairseq/tree/main/examples/roberta) instructions to evaluate these models.
| Model | MNLI M Dev Accuracy | Link |
| :----------------------------------------- | :------------------ | :--------------------------------------------------------------------------------------------------------------- |

View File

@@ -38,7 +38,7 @@ For your convenience, we provide pre-computed
[force-alignment](https://dl.fbaipublicfiles.com/fairseq/s2/ljspeech_mfa.zip) from
[Montreal Forced Aligner](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) and
[pseudo-text units](s3://dl.fbaipublicfiles.com/fairseq/s2/ljspeech_hubert.tsv) from
-[HuBERT](https://github.com/pytorch/fairseq/tree/master/examples/hubert). You can also generate them by yourself using
+[HuBERT](https://github.com/pytorch/fairseq/tree/main/examples/hubert). You can also generate them by yourself using
a different software or model.
@@ -106,7 +106,7 @@ use `--sample-rate 16000` for `get_eval_manifest.py`.
#### WER/CER metric
-We use wav2vec 2.0 ASR model as example. [Download](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec)
+We use wav2vec 2.0 ASR model as example. [Download](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec)
the model checkpoint and dictionary, then compute WER/CER with
```bash
python -m examples.speech_synthesis.evaluation.eval_asr \

View File

@@ -3,7 +3,7 @@
* [Paper](https://arxiv.org/abs/2102.01192)
* [Demo](https://speechbot.github.io/gslm/index.html)
-We build and evaluate generative speech2speech systems using [Log Mel Filtebank](https://pytorch.org/audio/stable/compliance.kaldi.html#fbank), [Modified CPC](https://github.com/facebookresearch/CPC_audio), [HuBERT Base](https://github.com/pytorch/fairseq/tree/master/examples/hubert) and [Wav2Vec 2.0 Large](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec). Our system is composed of three components, namely, *speech2unit*, *ulm* and *unit2speech*. We explain about models and usage of these components in their respective sub-directories. See the links below.
+We build and evaluate generative speech2speech systems using [Log Mel Filtebank](https://pytorch.org/audio/stable/compliance.kaldi.html#fbank), [Modified CPC](https://github.com/facebookresearch/CPC_audio), [HuBERT Base](https://github.com/pytorch/fairseq/tree/main/examples/hubert) and [Wav2Vec 2.0 Large](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec). Our system is composed of three components, namely, *speech2unit*, *ulm* and *unit2speech*. We explain about models and usage of these components in their respective sub-directories. See the links below.
## Speech to Unit Model (speech2unit)
Speech to unit model is used for quantizing raw speech into learned discrete speech units. [More details](speech2unit)
@@ -18,4 +18,4 @@ Unit to speech model is used for synthesizing speech from discrete speech units.
We show how to compute ASR based metrics as well as zero-shot metrics proposed in our paper [here](metrics).
## Tools
-We share two tools to resynthesize a given spoken utterance, and generate novel spoken language given a spoken prompt. [More detail](tools)
+We share two tools to resynthesize a given spoken utterance, and generate novel spoken language given a spoken prompt. [More detail](tools)

View File

@@ -1,6 +1,6 @@
# wav2vec Unsupervised (wav2vec-U)
-Wav2vec Unsupervised (wav2vec-U) is a framework for building speech recognition systems without any labeled training data as described in [Unsupervised Speech Recognition (Baevski et al., 2021)](https://ai.facebook.com/research/publications/unsupervised-speech-recognition). The model takes as input wav2vec 2.0 or XLSR representations (see [pretrained models](https://github.com/pytorch/fairseq/blob/master/examples/wav2vec)) as well as unlabeled speech and text data.
+Wav2vec Unsupervised (wav2vec-U) is a framework for building speech recognition systems without any labeled training data as described in [Unsupervised Speech Recognition (Baevski et al., 2021)](https://ai.facebook.com/research/publications/unsupervised-speech-recognition). The model takes as input wav2vec 2.0 or XLSR representations (see [pretrained models](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec)) as well as unlabeled speech and text data.
The wav2vec-U training procedure consists of three consecutive main steps:
* Preparation of speech representations and text data
@@ -8,7 +8,7 @@ Wav2vec Unsupervised (wav2vec-U) is a framework for building speech recognition
* Iterative self-training + Kaldi LM-decoding
## Preparation of speech and text data
-Similar to [wav2vec 2.0](https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md), data folders contain {train,valid,test}.{tsv,wrd,phn} files, where audio paths are stored in tsv files, and word, letter or phoneme transcriptions are stored in .{wrd,ltr,phn}.
+Similar to [wav2vec 2.0](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/README.md), data folders contain {train,valid,test}.{tsv,wrd,phn} files, where audio paths are stored in tsv files, and word, letter or phoneme transcriptions are stored in .{wrd,ltr,phn}.
In **/path/to/data/with_silence** you need a *train.tsv* file as well as (optionally) *{valid,test}.{tsv,wrd,phn}*. It is nice to have *10h.{tsv,phn}* files there too for reproducing the ablation study on layer selection. In **/path/to/data/without_silence** you have the same files, except *.tsv* files contain audios with silences removed using rVAD.
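
For context on the *.tsv* files mentioned above, a minimal sketch of building one manifest; it assumes the layout used by the wav2vec examples (audio root directory on the first line, then one tab-separated relative path and sample count per utterance) and the `soundfile` package, and is not part of this diff:

```python
# Minimal sketch: write a train.tsv manifest (assumed layout: root dir on the first
# line, then "<relative path>\t<number of samples>" per audio file).
import os
import soundfile as sf

root = "/path/to/audio/without_silence"
with open("train.tsv", "w") as f:
    print(root, file=f)
    for name in sorted(os.listdir(root)):
        if name.endswith(".wav"):
            n_samples = sf.info(os.path.join(root, name)).frames
            print(f"{name}\t{n_samples}", file=f)
```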

View File

@@ -23,7 +23,7 @@ logger = logging.getLogger(__name__)
class BARTHubInterface(GeneratorHubInterface):
"""A simple PyTorch Hub interface to BART.
-Usage: https://github.com/pytorch/fairseq/tree/master/examples/bart
+Usage: https://github.com/pytorch/fairseq/tree/main/examples/bart
"""
def __init__(self, cfg, task, model):

View File

@@ -14,7 +14,7 @@ from fairseq.data import encoders
class RobertaHubInterface(nn.Module):
"""A simple PyTorch Hub interface to RoBERTa.
-Usage: https://github.com/pytorch/fairseq/tree/master/examples/roberta
+Usage: https://github.com/pytorch/fairseq/tree/main/examples/roberta
"""
def __init__(self, cfg, task, model):
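
These hub interfaces are normally reached through `torch.hub`; a minimal sketch of that entry point follows (the model name and the `encode`/`extract_features` calls come from the public RoBERTa examples and are not part of this diff):

```python
# Minimal sketch: load RoBERTa through PyTorch Hub and run it on one sentence.
import torch

roberta = torch.hub.load("pytorch/fairseq", "roberta.base")
roberta.eval()  # disable dropout

tokens = roberta.encode("Hello world!")      # BPE-encode to a tensor of token ids
features = roberta.extract_features(tokens)  # (1, num_tokens, hidden_size)
print(features.shape)
```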