Summary:
- Set default value of clip-norm back to 0.0 (disabled)
- Add a comment explaining that we divide the loss by log(2) to convert the base (a small sketch of this conversion follows the list below)
- Fix `--zero-optimizer=os` (fixes #2811)
- Update requirements to PyTorch >= 1.5
- Fix bug in fixed LR schedule
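As a small illustration of the base conversion mentioned above (a sketch, not the fairseq criterion code itself): a cross-entropy loss computed with a natural-log softmax is measured in nats, and dividing by ln(2) converts it to bits, which is what base-2 perplexity / bits-per-character reporting expects.
```python
import math

# Hypothetical value for illustration only; the real loss comes from a criterion
# that uses log_softmax (natural log), so it is measured in nats.
loss_nats = 4.85
loss_bits = loss_nats / math.log(2)  # divide by ln(2) to convert nats -> bits
print(f"{loss_nats:.3f} nats = {loss_bits:.3f} bits")
```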
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1392
Reviewed By: alexeib
Differential Revision: D24714231
Pulled By: myleott
fbshipit-source-id: 63dc8cfc74683bbccbf05b44228014eb12ddbfc7
Summary:
## What does this PR do?
Implements R3F and R4F from Facebook Research: https://arxiv.org/abs/2008.03156
This code was used to generate all of the results in the paper except the probing results.
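A minimal sketch of the R3F objective as described in the paper, under assumed names and interfaces (this is not the fairseq criterion added by this PR): perturb the input embeddings with small noise, run a second forward pass, and add a symmetric KL term between the clean and noisy output distributions to the task loss.
```python
import torch
import torch.nn.functional as F

def r3f_loss(model, embeddings, targets, eps=1e-5, r3f_lambda=1.0):
    # Clean forward pass; `model` is assumed to map embeddings to logits.
    logits = model(embeddings)
    # Noisy forward pass with small uniform noise added to the embeddings.
    noise = torch.empty_like(embeddings).uniform_(-eps, eps)
    noised_logits = model(embeddings + noise)

    p = F.log_softmax(logits, dim=-1)
    q = F.log_softmax(noised_logits, dim=-1)
    # Symmetric KL between the clean and noisy output distributions.
    sym_kl = (F.kl_div(p, q, reduction="sum", log_target=True)
              + F.kl_div(q, p, reduction="sum", log_target=True))

    # Standard task loss plus the R3F regularizer.
    task_loss = F.nll_loss(p.view(-1, p.size(-1)), targets.view(-1))
    return task_loss + r3f_lambda * sym_kl
```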
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2455
Reviewed By: myleott
Differential Revision: D23444863
Pulled By: AkshatSh
fbshipit-source-id: b724a6d6cc9cebfdb4bd219828afbb5679f2259b
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Open-sources code for Deep Transformer with Latent Depth (https://arxiv.org/pdf/2009.13102.pdf).
New features and design choices made:
- New feature: allow the non-residual branch to be weighted by a sample z (generated per batch) instead of the plain `x = residual + x`.
- Design choice: move `x = residual + x` in transformer_layer.py into a function that a subclass (with latent depth) can override to `x = residual + z*x` (see the sketch after this list).
- New feature: allow TransformerEncoder or TransformerDecoder to have additional logits parameters which generate the samples z.
- Design choice: added subclasses LatentTransformerEncoder and LatentTransformerDecoder, which have additional attributes for the logits parameters and instantiate the corresponding LatentTransformerEncoderLayer and LatentTransformerDecoderLayer.
- New feature: allow multilingual_translation task to train with latent depth (results in the paper).
- Design choice:
- added additional arguments in the multilingual_translation task.
- added option for multilingual_transformer to use LatentTransformerEncoder and LatentTransformerDecoder besides standard TransformerEncoder.
- added option in multilingual_translation task's `train_step` to generate the samples z and compute the KL (and sparsity) loss per batch.
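A toy sketch of the residual-connection hook described above, with hypothetical class names (the actual classes live in transformer_layer.py and the latent-depth example code):
```python
import torch.nn as nn

class TransformerLayerSketch(nn.Module):
    """Toy stand-in for a transformer layer; only the residual hook matters here."""
    def residual_connection(self, x, residual):
        # Base behaviour: plain residual connection, x = residual + x.
        return residual + x

class LatentLayerSketch(TransformerLayerSketch):
    """Latent-depth variant: the non-residual branch is weighted by a per-batch sample z."""
    def __init__(self):
        super().__init__()
        self.z = 1.0  # set each batch from the layer-selection logits; z -> 0 skips the layer

    def residual_connection(self, x, residual):
        return residual + self.z * x
```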
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding 🙂
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2703
Reviewed By: myleott
Differential Revision: D24155059
Pulled By: xianxl
fbshipit-source-id: f3e41639429f9664ec5565839709aa857a643668
Summary:
# Before submitting
- [N] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [Y] Did you make sure to update the docs?
- [N/A] Did you write any new necessary tests?
## What does this PR do?
Add code to reproduce results from Cross-lingual Retrieval for Iterative Self-supervised Training.
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding 🙂
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1344
Test Plan:
Imported from GitHub, without a `Test Plan:` line.
See https://github.com/fairinternal/fairseq-py/tree/criss_pr/examples/criss
Reviewed By: myleott
Differential Revision: D24268469
Pulled By: chtran
fbshipit-source-id: d4dd36b22bde3c364ce6e935bd39baf8f96e0735
Summary:
# Before submitting
- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?
## What does this PR do?
This PR implements constrained decoding ([Hokamp & Liu, 2017](https://www.aclweb.org/anthology/P17-1141/); [Post & Vilar, 2018](https://www.aclweb.org/anthology/N18-1119/)) with vectorization for batching ([Hu et al., 2019](https://www.aclweb.org/anthology/N19-1090/)). In addition, it adds *ordered constraints*, where the constraints are generated on the target side in order, with zero or more unconstrained tokens in between. This variant allows for optimizations that increase speed and BLEU scores (when testing with random scraps from the references).
### Usage and quick start
It works with `fairseq-interactive` via a new command-line option: `fairseq-interactive --constraints [ordered,unordered]`, defaulting to `ordered` if nothing is provided. When active, it splits lines from STDIN on `\t`: the source sentence comes first, followed by the constraints, each separated by a tab. For example (after downloading the [Fairseq WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md)):
```bash
echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\tinfluence" \
| [normalize.py](https://gist.github.com/mjpost/4c54446b7030d7c64b57461d27090650) \
| [tok.py](https://gist.github.com/mjpost/ed7456f6a987c533102fc121678ed302) \
| PYTHONPATH=$HOME/code/fairseq-constraints fairseq-interactive $modeldir \
--bpe fastbpe \
--bpe-codes $modeldir/bpecodes \
--constraints \
--constraints-both \
-s de -t en \
--path $modeldir/model1.pt \
--max-tokens 1000 \
--beam 5
```
Adding the `--constraints-both` option causes it to batch-decode the input sentence both with and without the constraints. When run with the Fairseq WMT19 German--English model, the following results are produced (here run on a CPU, so don't be alarmed by the times!):
```text
S-0 Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0 1.844 seconds
C-0 hard
C-0 influence
H-0 -1.5333266258239746 Mach@@ ine trans@@ lation is hard to influence .
D-0 -1.5333266258239746 Machine translation is hard to influence .
P-0 -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.8031 -0.1701 -11.7727 -0.1815 -0.1511
S-0 Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0 1.844 seconds
H-0 -0.3731671869754791 Mach@@ ine trans@@ lation is difficult to control .
D-0 -0.3731671869754791 Machine translation is difficult to control .
P-0 -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.1430 -0.1665 -0.8482 -0.1678 -0.1514
2020-07-31 12:17:55 | INFO | fairseq_cli.interactive | Total time: 12.803 seconds; translation time: 3.688
```
Note the new tags present in the output:
* `C-#` records active constraints (after applying preprocessing) for a sentence
* `W-#` reports the sentence-level translation time (a useful unrelated feature I hope you'll accept)
Some unit tests are written (`fairseq/test_constraints.py`) but not yet integrated. Advice here on where to place this is welcome. I also have not run this through lint; if someone can tell me the command to run, I'd appreciate it.
### Implementation notes
This is largely self-contained, implemented in a new `LexicallyConstrainedBeamSearch` class in `search.py`. It does require a few minimal hooks from `_generate()` in `sequence_generator.py`, to ensure that constraints are updated at each timestep. (Edit: most changes in that file are documentation clarifications, corrections, and updates). Unconstrained sentences that are intermingled with constrained ones will not incur any time penalty, so long as they do not occur in the same batch.
Addresses https://github.com/pytorch/fairseq/issues/1536.
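For intuition only, here is a toy sketch of how per-hypothesis progress through ordered constraints can be tracked and advanced once per timestep; the names and structure are assumptions for illustration, not the actual `LexicallyConstrainedBeamSearch` implementation:
```python
from typing import List

class OrderedConstraintState:
    """Toy sketch: track how far a hypothesis has progressed through a flat list of
    constraint tokens that must appear on the target side in order, with zero or more
    unconstrained tokens allowed in between."""
    def __init__(self, tokens: List[int], position: int = 0):
        self.tokens = tokens
        self.position = position

    def advance(self, generated_token: int) -> "OrderedConstraintState":
        # Called once per timestep per hypothesis: matching the next constraint
        # token moves the pointer forward, anything else leaves the state unchanged.
        if self.position < len(self.tokens) and generated_token == self.tokens[self.position]:
            return OrderedConstraintState(self.tokens, self.position + 1)
        return self

    def finished(self) -> bool:
        # True once every constraint token has been generated in order.
        return self.position == len(self.tokens)
```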
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding 🙂
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2402
Reviewed By: alexeib
Differential Revision: D23188945
Pulled By: myleott
fbshipit-source-id: 9f5ed855f7a1dcf535b091c0ccf98b07fb9cbdd6
Summary:
Incorporate several fixes, including some from OSS contributors:
- fix model argument in sequence generator in semisupervised_translation.py
- fix aggregate logging in semisupervised_translation.py
- Fix EOS token in multilingual_denoising
- Handle missing eos_idx in data_utils.collate_tokens
- Better OOM handling for single-GPU training
- fix prepend_bos argument in translation_from_pretrained_bart.py …
- Fix eos_idx in multilingual_denoising
- Small logging fixes
- Fix fb_hub on PyTorch 1.6
- Better variable names
- Add support for model parallel to interactive.py
- Use `//` operator to fix Integer division warning
- Set default `--clip-norm=0.0`
- Cleanup some binaries in root directory
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1196
Reviewed By: ngoyal2707
Differential Revision: D22162202
Pulled By: myleott
fbshipit-source-id: 835b0c0ad9246827f9d915fdb4e89d7b5be2475d
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Add code for a published paper from FB.
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding 🙂
*Still WIP*
jmp84
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1707
Reviewed By: jmp84
Differential Revision: D21304498
Pulled By: xutaima
fbshipit-source-id: 073d522e0eeef3e02c83e4617b8e5b697ff6979b
Summary:
FUNCTIONALITY:
This diff provides two core pieces of functionality:
- Adds training with quantization noise from "Training with Quantization Noise for Extreme Model Compression", controlled by the "quant_noise" and "quant_noise_block_size" parameters and added to the embeddings, attention, and FFN for BERT and Transformer LM training (see the sketch after this list).
- Adds product quantization based on code from "And the bit goes down: Revisiting the quantization of neural networks" (Stock et al., 2019). This is applied to a trained fairseq model to quantize it after training.
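A rough sketch of the quantization-noise idea under assumed shapes and a simple 8-bit-style quantizer (not the exact fairseq implementation): during training, a random fraction of contiguous weight blocks is pushed through the quantizer while the rest stay in full precision, so the network learns to be robust to quantization applied later.
```python
import torch

def quant_noise_sketch(weight: torch.Tensor, p: float, block_size: int) -> torch.Tensor:
    """Apply fake quantization to a random fraction p of weight blocks (illustrative only)."""
    out_features, in_features = weight.shape
    assert in_features % block_size == 0

    # One Bernoulli draw per block of `block_size` input features.
    num_blocks = in_features // block_size
    mask = torch.bernoulli(torch.full((out_features, num_blocks), p, device=weight.device))
    mask = mask.repeat_interleave(block_size, dim=1).bool()

    # Simple 8-bit-style scalar quantization of the selected blocks.
    scale = weight.abs().max() / 127.0
    quantized = torch.round(weight / scale) * scale
    return torch.where(mask, quantized, weight)
```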
TODO:
-> Pierre, look at quantization code
-> int4 and int8 quantization will be added soon.
EVALUATED TEST CASES:
0. Training of LM and BERT models starts from scratch with no errors -> yes
1. Retrain LM from scratch with code, no quantization, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_noise
2. Reload previously trained LM from scratch, not trained with quant noise, reproduces Wikitext-103 LM results -> yes
3. Train LM from scratch with code, not trained with quant noise, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_baseline
4. Train BERT model from scratch with code, no quantization, training curve looks the same as before -> yes
5. Check wps during training and wps during inference, no large change from before -> yes
6. Check structured dropout isn't being applied at eval time -> yes
7. Works in combination with LayerDrop -> yes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1896
Reviewed By: myleott
Differential Revision: D20609420
Pulled By: huihuifan
fbshipit-source-id: 94468dd811c4caaaef46a9fab2b8d381f9d2b955
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding 🙂
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1124
Reviewed By: myleott
Differential Revision: D20749898
fbshipit-source-id: 42bca96d8d65158ae858ceaa7386afedf1696ebb
Summary:
Implemented byte-level BPE described in ["Neural Machine Translation with Byte-Level Subwords"](https://arxiv.org/abs/1909.03341)
* Added bytes/characters/byte-level BPE tokenizers to fairseq.data.encoder (a toy sketch of the byte-level encoding follows this list)
* Added detokenization option to generate.py
* Added an example under examples/byte_level_bpe
* Implemented Transformer model with Bi-GRU embedding contextualization: `examples/byte_level_bpe/gru_transformer.py`
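As a toy illustration of the byte-level idea (assumed symbol format, not the actual fairseq encoder): the text is mapped to one printable symbol per UTF-8 byte, giving a base vocabulary of at most 256 units on which BPE can then be learned, so any string remains representable with no unknown tokens.
```python
def bytes_to_symbols(line: str) -> str:
    """Represent a sentence as one printable symbol per UTF-8 byte (toy format)."""
    return " ".join(f"<{b:02x}>" for b in line.encode("utf-8"))

def symbols_to_bytes(encoded: str) -> str:
    """Inverse mapping: recover the original string from the byte symbols."""
    byte_values = bytes(int(tok[1:-1], 16) for tok in encoded.split())
    return byte_values.decode("utf-8", errors="replace")

print(bytes_to_symbols("Übersetzung"))                    # non-ASCII chars become several byte symbols
print(symbols_to_bytes(bytes_to_symbols("Übersetzung")))  # round-trips back to the original text
```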
Reviewed By: myleott
Differential Revision: D20600963
fbshipit-source-id: 3eca4d046056c07f65333123416017a4eac04c8a
Summary:
Hi,
this PR updates the link to mBART documentation in main readme.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1789
Differential Revision: D20322673
Pulled By: myleott
fbshipit-source-id: b59c94f49176ba5bbd664791818b5b8ce7402698
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding 🙂
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1033
Differential Revision: D20122520
Pulled By: yinhanliu
fbshipit-source-id: e2fd93e2fa9b7a8e276acc4316a176ba3ceae4ed
Summary:
Recent releases of apex removed the `fused_adam_cuda` function used in 3f4fc50163/fairseq/optim/adam.py (L220). Users need to use the `--deprecated_fused_adam` option to install `fused_adam_cuda`.
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding 🙂
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1563
Differential Revision: D19260517
Pulled By: myleott
fbshipit-source-id: 69af015f3ef1fa85b98d138c28876ada194c9437
Summary:
Check locally that everything works fine.
Model is uploaded to fbaipublicfiles.
I fixed a few inconsistencies in the bpe encoding along the way, e.g. related to https://github.com/pytorch/fairseq/issues/1306.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/904
Reviewed By: ngoyal2707
Differential Revision: D18418345
Pulled By: louismartin
fbshipit-source-id: 53acb4d021581968d70430ee9babee07d6573c17
Summary:
This is the first version of BART code / model release.
It still requires a lot of cleanup, instructions, and making sure results are reproducible before we can release it.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/902
Differential Revision: D18389535
fbshipit-source-id: 77f16800307ce831bd29538fdd34800793210f46
Summary:
TODO:
1) Need to update bibtex entry
2) Need to upload models, spm_vocab and dict.txt to a public S3 location.
For Future:
1) I will probably add instructions to fine-tune on XNLI, NER, POS, etc., but there is currently no timeline for that.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900
Reviewed By: myleott
Differential Revision: D18333076
Pulled By: myleott
fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877
This PR implements guided alignment training described in "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)".
In summary, it allows training selected heads of the Transformer model with external alignments computed by statistical alignment toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in translation performance due to guided alignment training.
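For clarity, a small sketch of the guided-alignment loss under assumed tensor shapes (an illustration, not the exact code added in this PR): the attention probabilities of a selected cross-attention head are pushed toward an external alignment distribution via cross-entropy.
```python
import torch

def guided_alignment_loss(attn_probs: torch.Tensor, align_targets: torch.Tensor,
                          eps: float = 1e-9) -> torch.Tensor:
    """Supervise one attention head with an external alignment distribution.
    attn_probs:    (batch, tgt_len, src_len) probabilities from the chosen head
    align_targets: (batch, tgt_len, src_len) rows summing to 1 for aligned target words
    """
    # Cross-entropy between the alignment distribution and the head's attention.
    loss = -(align_targets * torch.log(attn_probs + eps)).sum(dim=-1)
    return loss.mean()
```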
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095
Differential Revision: D17170337
Pulled By: myleott
fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
Summary:
Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
* Added Levenshtein Transformer model, task and criterion class
* Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines
* Added an option for prepending BOS to the dictionary class and translation task class
Reviewed By: myleott
Differential Revision: D17297372
fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786
Differential Revision: D16560654
Pulled By: myleott
fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034
Summary:
See #467. Ping myleott to review.
This is a work-related contribution. Ping lark to review.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/794
Differential Revision: D15756816
Pulled By: myleott
fbshipit-source-id: 6dce3ff3a713bf5f60e5782bc260b2ca9d2c0a9b
Summary:
Change the wording to avoid confusion. Mixed precision ensures both higher arithmetic throughput and numerical stability; it is not exactly synonymous with pure half-precision/FP16 training. Also mention tensor cores, since older-generation GPUs without tensor cores don't support true mixed precision training.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/766
Differential Revision: D15559565
Pulled By: myleott
fbshipit-source-id: c71e720772657bb3e8ad330b58bf69e23beb614e
Summary:
Code for the paper: [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](https://arxiv.org/abs/1902.07816).
Pull Request resolved: https://github.com/pytorch/fairseq/pull/521
Differential Revision: D14188021
Pulled By: myleott
fbshipit-source-id: ed5b1ed5ad9a582359bd5215fa2ea26dc76c673e