Commit Graph

101 Commits

Author SHA1 Message Date
Alex Liu
a0ceabc287 include wav2vec-u 2.0 (#2826)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
include wav2vec-u 2.0

!!! TODO !!! update title/link of paper in readme

X-link: https://github.com/fairinternal/fairseq-py/pull/2826

Reviewed By: michaelauli

Differential Revision: D37162174

Pulled By: alexeib

fbshipit-source-id: b985ebb9bb94c25d30b6fc53d8c79088cb9798f9
2022-06-14 21:54:56 -07:00
dianaml0
3a72168bd8 Add CircleCI status badge (#4473)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/facebookresearch/fairseq/pull/4473

Reviewed By: cbalioglu

Differential Revision: D37052250

Pulled By: dianaml0

fbshipit-source-id: e5e4c38a9108c769953ef2202c7adb8aa335771a
2022-06-10 06:55:45 -07:00
dianaml0
51478ad3a1 xformer integration (#2263)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
This PR is a cleaned up version of https://github.com/fairinternal/fairseq-py/issues/2138. It is based on the `main` branch instead of the `gshard` branch. Removed call to xFormers MultiHeadDispatch, only using xFormers Attention.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

X-link: https://github.com/fairinternal/fairseq-py/pull/2263

Reviewed By: blefaudeux

Differential Revision: D33800377

Pulled By: dianaml0

fbshipit-source-id: 658d52214c782212b12881b30c4d908a763b4cf2
2022-05-04 09:15:36 -07:00
Dmitry Vinnik
592c1227f4 docs: add social button in support of Ukraine (#4249)
Summary:
Our mission at Meta Open Source is to empower communities through open source, and we believe that means building a welcoming and safe environment for all. As a part of this work, we are adding this banner in support of Ukraine during this crisis.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4249

Reviewed By: arbabu123

Differential Revision: D34635479

Pulled By: dmitryvinn-fb

fbshipit-source-id: 488d30f0967ae9542ead968c5cb951ecf0e02a64
2022-03-04 16:28:09 -08:00
Ann Lee
5cd7a21cc1 S2ST oss (#2756)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Releasing code, model & recipe for the work "Direct speech-to-speech translation with discrete units".
Main changes:
1. examples/speech_to_speech
2. tasks/speech_to_speech
3. data/audio/speech_to_speech_dataset
4. models/speech_to_speech
5. criterions/speech_to_speech_criterion

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2756

Reviewed By: sravyapopuri388, kahne

Differential Revision: D32923969

Pulled By: an918tw

fbshipit-source-id: 838ba42457f4684e9767d15b5b514681a9572b39
2021-12-28 08:07:55 -08:00
Arun Babu
7105d7f4b1 attempt5 (#2658)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2658

Reviewed By: ngoyal2707

Differential Revision: D32520446

Pulled By: arbabu123

fbshipit-source-id: a4cbc12624c9c8c1b5bc3d64eb47c2fdec01eb87
2021-11-17 20:56:37 -08:00
Sam Shleifer
c5ff181125 NormFormer: flags and docs (#2460)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2460

Reviewed By: myleott

Differential Revision: D31731798

Pulled By: sshleifer

fbshipit-source-id: 938456c17aa004cacffdcdd124aebe390da83d5f
2021-10-19 17:13:04 -07:00
Hu Xu
862efab86f MMPT bug fixes (#2428)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes the argument for `lr_scheduler.total_num_update`, a missing import of `dsprocessor` for COIN, and `vmasks` handling in demo inference; also updates fairseq's README.md for examples/MMPT.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2428

Reviewed By: berniebear

Differential Revision: D31528947

Pulled By: howardhsu

fbshipit-source-id: 1fecf34bdab82cbf6001e3905a532e4e6eb38e01
2021-10-10 02:21:43 -07:00
Qiantong Xu
6639842016 zero-shot model release (#2407)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?

zero-shot model release

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2407

Reviewed By: alexeib

Differential Revision: D31417241

Pulled By: xuqiantong

fbshipit-source-id: 576644694638d3b2606f1751b74feb0531b50eb7
2021-10-05 18:31:47 -07:00
Diana Liskovich
5adfeaccf9 Rename references from master -> main in preparation for branch name change (#2297)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2297

Reviewed By: alexeib

Differential Revision: D30906090

Pulled By: dianaml0

fbshipit-source-id: 941d30db7f766c9077a1b5bb2a04680f57e2e070
2021-09-20 08:29:38 -07:00
dianaml0
f6abcc2a67 update on branch renaming (#3879)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3879

Reviewed By: myleott

Differential Revision: D30969142

Pulled By: dianaml0

fbshipit-source-id: 902154c03fd68ae6645d3e0ac07b7d729dfc7934
2021-09-16 10:03:02 -07:00
alexeib
2513524a16 add finetuned robust w2v models and update readme (#2196)
Summary:
adds finetuned robust w2v models and updates readme

fixes #3721

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2196

Reviewed By: wnhsu

Differential Revision: D30367999

Pulled By: alexeib

fbshipit-source-id: 616b373bf31265c89f694fba7dccce2961d394f3
2021-08-17 06:56:49 -07:00
Ann Lee
7f2fb5caa8 Release code for the paper "Discriminative Reranking for Neural Machine Translation" (#2044)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Release the code for the paper "Discriminative Reranking for Neural Machine Translation"

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2044

Reviewed By: michaelauli

Differential Revision: D29628590

Pulled By: an918tw

fbshipit-source-id: 7a52602d495b736573187cc721829aa545d24770
2021-07-09 17:26:02 -07:00
alexeib
d18e44a289 add robust w2v model (#2046)
Summary:
add robust wav2vec model

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2046

Reviewed By: wnhsu

Differential Revision: D29628639

Pulled By: alexeib

fbshipit-source-id: 296cd2da579a969a71a0f9ffe1062002b73a8d86
2021-07-08 19:26:53 -07:00
Naman Goyal
2fd9d8a972 released xlmr xl and xxl model weights (#1944)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1944

Reviewed By: jingfeidu

Differential Revision: D28944206

fbshipit-source-id: 583837f7dd387341574d27dd9acc145455d640a8
2021-06-07 15:05:53 -07:00
Myle Ott
00d5b7adbe Add README/tutorial for Fully Sharded Data Parallel (#3327)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/3327

Reviewed By: sshleifer

Differential Revision: D26899416

Pulled By: myleott

fbshipit-source-id: bbb493a5c4e0a51f3b26fe8f94e3962b6206d6f6
2021-03-09 06:31:53 -08:00
Myle Ott
cfbf0dddbc Small changes to make tests more reliable (#1572)
Summary:
After this, `python setup.py test` should be more reliable (including when multiple GPUs are present)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1572

Reviewed By: alexeib

Differential Revision: D25984113

Pulled By: myleott

fbshipit-source-id: 7fef27ae90c079c07f592ed9fb350ccf8b56d23d
2021-01-21 07:33:54 -08:00
Myle Ott
336942734c Better support for WandB (#1530)
Summary:
Logs full config

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1530

Reviewed By: sshleifer

Differential Revision: D25723894

Pulled By: myleott

fbshipit-source-id: bb4b168c774bef498e336bbb3ba92eda4b08df3b
2021-01-02 10:24:03 -08:00
Dexter Ju
d740093bac Porting adaptive span to fairseq (#1428)
Summary:
## What does this PR do?
1. We add an enwik8 character-level LM task sweep for Transformer-XL, which lands at 1.05 and roughly matches the reported performance (1.06): https://github.com/kimiyoung/transformer-xl/tree/master/pytorch
Eval with
```
 PYTHONPATH=. python fairseq_cli/eval_lm.py /private/home/daju/data/enwik8/eos-data-bin/ --path /checkpoint/daju/2020-11-19/enwiki8.transformer_xl.fp16.transformer_xl.adam.cl0.25.cosine.lr0.00025.s2.ngpu4/checkpoint_best.pt --user-dir examples/truncated_bptt/ --task truncated_bptt_lm --batch-size 8 --tokens-per-sample 80  --model-overrides '{"mem_len":2100,"clamp_len":820,"same_length":True}'
```
2. Implements adaptive span in fairseq. It reproduces the enwik8 result at 1.03, compared to 1.02 (for the 12-layer model) reported in https://github.com/facebookresearch/adaptive-span, which is a consistent improvement over the Transformer-XL baseline listed above with a smaller model. (A small sketch of the span mask appears after the paper link below.)
You can evaluate the example run with:
```
PYTHONPATH=. python fairseq_cli/eval_lm.py /private/home/daju/data/enwik8/eos-data-bin/ --path  /checkpoint/daju/2020-11-20/enwiki8.adaptivespan.headwise.adaptive_span.adagrad_with_grad_clip.ag_cl0.03.fixed.wu32000.lr0.07.s2.loss5e-07.ngpu4/checkpoint_best.pt --user-dir examples/truncated_bptt/ --task truncated_bptt_lm --batch-size 8 --tokens-per-sample 512 --gen-subset test
```

Paper: https://arxiv.org/abs/1905.07799
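
For reference, the span masking behind adaptive span can be sketched in a few hedged lines (an illustration of the formula in the paper above, not the code in `examples/truncated_bptt`; the span value `z=100` and ramp `R=32` are made-up numbers):

```python
import torch

# Each attention head learns a span z; the attention weight at distance d is scaled
# by the soft mask m_z(d) = clamp((R + z - d) / R, 0, 1), so distant positions fade
# out smoothly and the effective context length can be learned and penalized.
def adaptive_span_mask(z: float, max_len: int, ramp: int = 32) -> torch.Tensor:
    distances = torch.arange(max_len, dtype=torch.float32)
    return torch.clamp((ramp + z - distances) / ramp, min=0.0, max=1.0)

mask = adaptive_span_mask(z=100.0, max_len=512)
print(mask[:3], mask[-3:])  # nearby positions are ~1.0, far positions are 0.0
```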

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1428

Reviewed By: myleott

Differential Revision: D25495754

Pulled By: dexterju

fbshipit-source-id: 15a875a5f82d506a4964dea934a374132ce39f8b
2020-12-14 11:37:18 -08:00
Raphael Scheible
f3d5045a71 add German RoBERTa model (GottBERT) (#2992)
Summary:
# Before submitting

- There is no related issue for this pull request.
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- We did not see any necessity for tests.

## What does this PR do?
Add German RoBERTa model (GottBERT)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2992

Reviewed By: alexeib

Differential Revision: D25494927

Pulled By: myleott

fbshipit-source-id: b6790124d7c3c8dc387c141706cd8a527cc950ab
2020-12-11 19:10:49 -08:00
Myle Ott
0a848245f3 Add Truncated BPTT example + TransformerXL (#1410)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1410

Test Plan:
- reproduced Transformer-XL results (see README)
- added integration test

Reviewed By: jingfeidu

Differential Revision: D24928966

Pulled By: myleott

fbshipit-source-id: 86376c17ab24d37e72e7c097b6dcec71b1a087a7
2020-11-15 19:47:42 -08:00
alexeib
bd2e804b9c add and link hydra docs (#1405)
Summary:
updates hydra integration doc and links to it

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1405

Reviewed By: myleott

Differential Revision: D24808779

Pulled By: alexeib

fbshipit-source-id: a50160e196e469e30e39d6ee47440a569c0154bd
2020-11-07 17:25:18 -08:00
Myle Ott
b4d57c6d49 Move TPU grad reductions out of Trainer into TPUDistributedDataParallel (#1397)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1397

Data parallel command: `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --share-decoder-input-output-embed --lr 0.0001`

Data parallel before:
```
2020-11-04 08:20:13 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:20:13 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:20:13 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:20:13 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:20:14 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:20:14 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:20:14 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:20:19 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=5
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108002, wps=160870, ups=4.91, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.8, wps=517232, ups=15.77, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.4, wps=537322, ups=16.38, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492.1, wps=540488, ups=16.48, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603.2, wps=543411, ups=16.57, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=540359, ups=16.47, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:20 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=533446, ups=16.26, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:20 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.4, wps=539734, ups=16.46, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=6
```

Data parallel after:
```
2020-11-04 08:14:02 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:14:02 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:14:02 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:14:02 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:14:03 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:14:03 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:14:03 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:14:08 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108002, wps=157099, ups=4.79, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.8, wps=560049, ups=17.08, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.4, wps=558507, ups=17.03, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492.1, wps=514194, ups=15.68, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603.2, wps=552676, ups=16.85, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=546402, ups=16.66, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=508472, ups=15.5, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.4, wps=552493, ups=16.84, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=6
```

Data parallel command (no_c10d): `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --share-decoder-input-output-embed --lr 0.0001 --ddp-backend no_c10d`

Data parallel before:
```
2020-11-04 08:19:25 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:19:25 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:19:25 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:19:25 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:19:25 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:19:26 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:19:26 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:19:31 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:19:31 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108001, wps=141659, ups=4.32, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.9, wps=503762, ups=15.36, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.5, wps=488599, ups=14.9, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492, wps=507855, ups=15.48, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603, wps=503270, ups=15.34, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=467778, ups=14.26, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=503800, ups=15.36, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.3, wps=468486, ups=14.28, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=7
```

Data parallel after:
```
2020-11-04 08:14:50 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:14:50 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:14:50 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:14:50 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:14:50 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:14:51 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:14:51 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:14:56 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108001, wps=137677, ups=4.2, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.9, wps=519541, ups=15.84, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.5, wps=517063, ups=15.76, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492, wps=490728, ups=14.95, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603, wps=505262, ups=15.41, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=508874, ups=15.52, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:57 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=518028, ups=15.79, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:57 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.3, wps=515996, ups=15.73, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=7
```

Model parallel command: `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm_megatron --decoder-layers 4 --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --model-parallel-size 2 --share-decoder-input-output-embed --lr 0.0001`

Model parallel before:
```
2020-11-04 08:18:38 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:18:38 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:18:38 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 08:18:38 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:18:38 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:18:39 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:18:39 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:18:44 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:18:45 | INFO | train_inner | epoch 001:      2 / 7173 loss=55.997, ppl=7.19017e+16, wps=0, ups=0, wpb=16384, bsz=32, num_updates=1, lr=0.0001, gnorm=14.03, loss_scale=64, train_wall=1, wall=7
2020-11-04 08:18:45 | INFO | train_inner | epoch 001:      3 / 7173 loss=28.372, ppl=3.47501e+08, wps=48371.7, ups=2.95, wpb=16384, bsz=32, num_updates=2, lr=0.0001, gnorm=15.339, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      4 / 7173 loss=15.855, ppl=59276.8, wps=72422.5, ups=4.42, wpb=16384, bsz=32, num_updates=3, lr=0.0001, gnorm=4.189, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      5 / 7173 loss=14.713, ppl=26858.7, wps=72933.5, ups=4.45, wpb=16384, bsz=32, num_updates=4, lr=0.0001, gnorm=4.751, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      6 / 7173 loss=13.901, ppl=15299.7, wps=71974.8, ups=4.39, wpb=16384, bsz=32, num_updates=5, lr=0.0001, gnorm=4.361, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      7 / 7173 loss=13.312, ppl=10169.5, wps=72897.8, ups=4.45, wpb=16384, bsz=32, num_updates=6, lr=0.0001, gnorm=3.307, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:      8 / 7173 loss=12.914, ppl=7720.21, wps=73044.6, ups=4.46, wpb=16384, bsz=32, num_updates=7, lr=0.0001, gnorm=5.473, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:      9 / 7173 loss=12.56, ppl=6036.72, wps=73453.1, ups=4.48, wpb=16384, bsz=32, num_updates=8, lr=0.0001, gnorm=6.112, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:     10 / 7173 loss=12.116, ppl=4437.77, wps=73442.6, ups=4.48, wpb=16384, bsz=32, num_updates=9, lr=0.0001, gnorm=4.415, loss_scale=64, train_wall=0, wall=9
```

Model parallel after:
```
2020-11-04 08:12:09 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:12:09 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:12:09 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 08:12:09 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:12:09 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:12:10 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:12:10 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:12:16 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      2 / 7173 loss=55.997, ppl=7.19017e+16, wps=0, ups=0, wpb=16384, bsz=32, num_updates=1, lr=0.0001, gnorm=14.03, loss_scale=64, train_wall=1, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      3 / 7173 loss=28.372, ppl=3.47501e+08, wps=53097, ups=3.24, wpb=16384, bsz=32, num_updates=2, lr=0.0001, gnorm=15.339, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      4 / 7173 loss=15.855, ppl=59276.8, wps=72355.5, ups=4.42, wpb=16384, bsz=32, num_updates=3, lr=0.0001, gnorm=4.189, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      5 / 7173 loss=14.713, ppl=26858.7, wps=70526.4, ups=4.3, wpb=16384, bsz=32, num_updates=4, lr=0.0001, gnorm=4.751, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      6 / 7173 loss=13.901, ppl=15299.7, wps=73063.5, ups=4.46, wpb=16384, bsz=32, num_updates=5, lr=0.0001, gnorm=4.361, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      7 / 7173 loss=13.312, ppl=10169.5, wps=73559.4, ups=4.49, wpb=16384, bsz=32, num_updates=6, lr=0.0001, gnorm=3.307, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      8 / 7173 loss=12.914, ppl=7720.21, wps=72693.2, ups=4.44, wpb=16384, bsz=32, num_updates=7, lr=0.0001, gnorm=5.473, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      9 / 7173 loss=12.56, ppl=6036.72, wps=73531.2, ups=4.49, wpb=16384, bsz=32, num_updates=8, lr=0.0001, gnorm=6.112, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:19 | INFO | train_inner | epoch 001:     10 / 7173 loss=12.116, ppl=4437.77, wps=73187.6, ups=4.47, wpb=16384, bsz=32, num_updates=9, lr=0.0001, gnorm=4.415, loss_scale=64, train_wall=0, wall=10
```

Test Plan: Imported from OSS

Reviewed By: ngoyal2707

Differential Revision: D24729295

Pulled By: myleott

fbshipit-source-id: beee8bdece3eaa0419a2e813990420411e507c75
2020-11-05 15:29:33 -08:00
Myle Ott
dd52ed0f38 Small fixes (#1392)
Summary:
- Set default value of clip-norm back to 0.0 (disabled)
- Add a comment explaining that we divide the loss by log(2) to convert the base from nats to bits (see the short example after this list)
- Fix `--zero-optimizer=os` (fixes #2811)
- Update requirements to PyTorch >= 1.5
- Fix bug in fixed LR schedule
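
For context on the log(2) item above, a minimal example of the base conversion (the loss value is made up):

```python
import math

loss_nats = 2.5                      # cross-entropy per token in nats (natural log)
loss_bits = loss_nats / math.log(2)  # dividing by ln(2) converts the base to bits
assert abs(2 ** loss_bits - math.exp(loss_nats)) < 1e-9  # perplexity is unchanged
```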

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1392

Reviewed By: alexeib

Differential Revision: D24714231

Pulled By: myleott

fbshipit-source-id: 63dc8cfc74683bbccbf05b44228014eb12ddbfc7
2020-11-03 20:45:06 -08:00
Armen Aghajanyan
f2fa07106c RXF OS Implementation (#2455)
Summary:
## What does this PR do?
Implements R3F and R4F coming from Facebook Research: https://arxiv.org/abs/2008.03156

This code was used to generate all the results from the paper excluding probing results.
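
A hedged sketch of the R3F idea from the linked paper (a toy illustration, not the fairseq criterion; the linear model, `eps`, and `lam` values are placeholders, and a recent PyTorch is assumed for `log_target`):

```python
import torch
import torch.nn.functional as F

def r3f_loss(model, embeddings, labels, eps=1e-5, lam=1.0):
    # perturb the input embeddings with small uniform noise and penalize the
    # symmetric KL divergence between the clean and noisy output distributions
    logits = model(embeddings)
    noisy_logits = model(embeddings + torch.empty_like(embeddings).uniform_(-eps, eps))
    task_loss = F.cross_entropy(logits, labels)
    p = F.log_softmax(logits, dim=-1)
    q = F.log_softmax(noisy_logits, dim=-1)
    sym_kl = 0.5 * (F.kl_div(p, q, reduction="batchmean", log_target=True)
                    + F.kl_div(q, p, reduction="batchmean", log_target=True))
    return task_loss + lam * sym_kl

# toy usage: a linear classifier over pre-computed embeddings
model = torch.nn.Linear(16, 4)
loss = r3f_loss(model, torch.randn(8, 16), torch.randint(0, 4, (8,)))
loss.backward()
```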

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2455

Reviewed By: myleott

Differential Revision: D23444863

Pulled By: AkshatSh

fbshipit-source-id: b724a6d6cc9cebfdb4bd219828afbb5679f2259b
2020-10-16 14:32:12 -07:00
Xian Li
573c2f4b60 Opensource code for Deep Transformer with Latent Depth (#2703)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Opensource code for Deep Transformer with Latent Depth (https://arxiv.org/pdf/2009.13102.pdf).

New features and design choices made:

- New feature: allow the non-residual block to be weighted by a sample z (generated per batch), i.e. `x = residual + z*x` instead of `x = residual + x`.
- Design choice: move `x = residual + x` in transformer_layer.py into a function which the subclass (with latent depth) can override to `x = residual + z*x` (a toy sketch of this appears after this list).

- New feature: allow TransformerEncoder or TransformerDecoder to have additional logits parameters which will generate the samples z.
- Design choice: added subclass LatentTransformerEncoder and LatentTransformerDecoder, which has additional attributes for the logits parameters, and instantiate the corresponding LatentTransformerEncoderLayer and LatentTransformerDecoderLayer.

- New feature: allow multilingual_translation task to train with latent depth (results in the paper).
- Design choice:
  - added additional arguments in the multilingual_translation task.
  - added option for multilingual_transformer to use LatentTransformerEncoder and LatentTransformerDecoder besides standard TransformerEncoder.
  - added option in multilingual_translation task's `train_step` to generate the samples z and compute the KL (and sparsity) loss per batch.
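
A toy, hedged sketch of the weighted-residual idea described above (not the LatentTransformerEncoder/Decoder classes; the Gaussian relaxation of the Bernoulli sample and the layer contents are simplifications):

```python
import torch
import torch.nn as nn

class LatentResidualLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.logit = nn.Parameter(torch.zeros(1))  # learned layer-selection logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.ffn(x)
        # crude continuous relaxation of a per-batch Bernoulli sample z in [0, 1];
        # the KL/sparsity losses over z mentioned above are omitted here
        z = torch.sigmoid(self.logit + torch.randn_like(self.logit))
        return residual + z * x  # instead of the usual residual + x

out = LatentResidualLayer(16)(torch.randn(2, 16))
```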

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2703

Reviewed By: myleott

Differential Revision: D24155059

Pulled By: xianxl

fbshipit-source-id: f3e41639429f9664ec5565839709aa857a643668
2020-10-15 09:26:05 -07:00
Chau Tran
a2d0be4989 Add CRISS README and code to fairseq (#1344)
Summary:
# Before submitting

- [N] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [Y] Did you make sure to update the docs?
- [N/A] Did you write any new necessary tests?

## What does this PR do?
Add code to reproduce results from Cross-lingual Retrieval for Iterative Self-supervised Training.
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1344

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

See https://github.com/fairinternal/fairseq-py/tree/criss_pr/examples/criss

Reviewed By: myleott

Differential Revision: D24268469

Pulled By: chtran

fbshipit-source-id: d4dd36b22bde3c364ce6e935bd39baf8f96e0735
2020-10-14 10:34:51 -07:00
Myle Ott
a524832d1d Publish Linformer to public fairseq
Summary: Initial open source release for Linformer

Reviewed By: madian9

Differential Revision: D22771263

fbshipit-source-id: bf08c64c5ecb899db9da00b79d09f6308347c915
2020-09-28 15:32:20 -07:00
Myle Ott
703fd48bb1 Fix README and #2496 (#2505)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2505

Reviewed By: shruti-bh

Differential Revision: D23247882

Pulled By: myleott

fbshipit-source-id: 1cfc9e0128e1aa55a1aca31d8dd30f231558e70f
2020-08-20 15:44:17 -07:00
Matt Post
bd1b35d9b7 Added constrained decoding (#1536) (#2402)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

This PR implements constrained decoding ([Hokamp & Liu, 2017](https://www.aclweb.org/anthology/P17-1141/); [Post & Vilar, 2018](https://www.aclweb.org/anthology/N18-1119/)) with vectorization for batching ([Hu et al., 2019](https://www.aclweb.org/anthology/N19-1090/)). In addition, it adds *ordered constraints*, where the constraints are generated on the target side in order, with zero or more unconstrained tokens in between. This variant allows for optimizations that increase speed and BLEU scores (when testing with random scraps from the references).

### Usage and quick start

It works with `fairseq-interactive` via a new command-line option: `fairseq-interactive --constraints [ordered,unordered]`, defaulting to `ordered` if nothing is provided. When active, it will split lines from STDIN on `\t`, with separate constraints each separated by a tab. For example (after downloading the [Fairseq WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md)):

```bash
echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\tinfluence" \
  | [normalize.py](https://gist.github.com/mjpost/4c54446b7030d7c64b57461d27090650) \
  | [tok.py](https://gist.github.com/mjpost/ed7456f6a987c533102fc121678ed302) \
  | PYTHONPATH=$HOME/code/fairseq-constraints fairseq-interactive $modeldir \
  --bpe fastbpe \
  --bpe-codes $modeldir/bpecodes \
  --constraints \
  --constraints-both \
  -s de -t en \
  --path $modeldir/model1.pt \
  --max-tokens 1000 \
  --beam 5
```

Adding the `--constraints-both` option causes it to batch-decode the input sentence both with and without the constraints. When run with the Fairseq WMT19 German--English model, the following results are produced (here run on a CPU, don't be alarmed by the times!)

```text
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
C-0     hard
C-0     influence
H-0     -1.5333266258239746     Mach@@ ine trans@@ lation is hard to influence .
D-0     -1.5333266258239746     Machine translation is hard to influence .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.8031 -0.1701 -11.7727 -0.1815 -0.1511
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
H-0     -0.3731671869754791     Mach@@ ine trans@@ lation is difficult to control .
D-0     -0.3731671869754791     Machine translation is difficult to control .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.1430 -0.1665 -0.8482 -0.1678 -0.1514
2020-07-31 12:17:55 | INFO | fairseq_cli.interactive | Total time: 12.803 seconds; translation time: 3.688
```

Note the new tags present in the output:

* `C-#` records active constraints (after applying preprocessing) for a sentence
* `W-#` reports the sentence-level translation time (a useful unrelated feature I hope you'll accept)

Some unit tests are written (`fairseq/test_constraints.py`) but not yet integrated. Advice here on where to place this is welcome. I also have not run this through lint; if someone can tell me the command to run, I'd appreciate it.

### Implementation notes

This is largely self-contained, implemented in a new `LexicallyConstrainedBeamSearch` class in `search.py`. It does require a few minimal hooks from `_generate()` in `sequence_generator.py`, to ensure that constraints are updated at each timestep. (Edit: most changes in that file are documentation clarifications, corrections, and updates). Unconstrained sentences that are intermingled with constrained ones will not incur any time penalty, so long as they do not occur in the same batch.

Addresses https://github.com/pytorch/fairseq/issues/1536.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2402

Reviewed By: alexeib

Differential Revision: D23188945

Pulled By: myleott

fbshipit-source-id: 9f5ed855f7a1dcf535b091c0ccf98b07fb9cbdd6
2020-08-20 11:59:53 -07:00
Myle Ott
9831634946 Misc fixes (#2448)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2448

Reviewed By: ngoyal2707

Differential Revision: D23011193

Pulled By: myleott

fbshipit-source-id: 1a29481707108e4465aca78ec1581fb79f05efba
2020-08-14 10:24:51 -07:00
alexeib
621e834103 wav2vec 2.0 (#1220)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1220

Test Plan: Please see examples/wav2vec/README.md for instructions

Reviewed By: edunov

Differential Revision: D22707565

Pulled By: alexeib

fbshipit-source-id: 0c0d4ca7acc933ef7c0062f8dce550b94e414680
2020-08-04 14:19:56 -07:00
Myle Ott
f0a61a2774 Miscellaneous fixes (#1196)
Summary:
Incorporate several fixes, incl. from OSS contributors:
- fix model argument in sequence generator in semisupervised_translation.py
- fix aggregate logging in semisupervised_translation.py
- Fix EOS token in multilingual_denoising
- Handle missing eos_idx in data_utils.collate_tokens
- Better OOM handling for single-GPU training
- fix prepend_bos argument in translation_from_pretrained_bart.py …
- Fix eos_idx in multilingual_denoising
- Small logging fixes
- Fix fb_hub on PyTorch 1.6
- Better variable names
- Add support for model parallel to interactive.py
- Use `//` operator to fix Integer division warning
- Set default `--clip-norm=0.0`
- Cleanup some binaries in root directory

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1196

Reviewed By: ngoyal2707

Differential Revision: D22162202

Pulled By: myleott

fbshipit-source-id: 835b0c0ad9246827f9d915fdb4e89d7b5be2475d
2020-06-24 10:08:53 -07:00
Myle Ott
145bc9de12 Several small fixes (incl. set default --data-buffer-size=10) (#2163)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2163

Reviewed By: ngoyal2707

Differential Revision: D21665601

Pulled By: myleott

fbshipit-source-id: 47673ff7f07acf0002c4e28380aa08ff917618ee
2020-05-26 15:59:59 -07:00
Xutai Ma
12d5e6ff16 Monotonic multihead attention (#1707)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Add code for published paper from FB

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

*Still WIP*
jmp84
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1707

Reviewed By: jmp84

Differential Revision: D21304498

Pulled By: xutaima

fbshipit-source-id: 073d522e0eeef3e02c83e4617b8e5b697ff6979b
2020-04-30 13:26:32 -07:00
Angela Fan
1c8ab79ca5 quant noise code, readme, start of adding quantization (#1896)
Summary:
FUNCTIONALITY:
This diff provides two core pieces of functionality
- Adds training with quantization noise from "Training with Quantization Noise for Extreme Model Compression", controlled by the "quant_noise" and "quant_noise_block_size" parameters. Added in embeddings, attention, and FFN for BERT and Transformer LM training (a toy sketch of the idea follows this list).
- Adds quantization with product quantization based on code from "And the bit goes down: Revisiting the quantization of neural networks" (Stock et al., 2019). This is applied to a fairseq-trained model to quantize it after training.
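
A toy, hedged sketch of the quant-noise idea (not the fairseq `quant_noise` wrapper; the int8-style fake-quantizer and block layout are simplifications):

```python
import torch

def quant_noise(weight: torch.Tensor, p: float, block_size: int) -> torch.Tensor:
    """Fake-quantize a random fraction p of weight blocks during the forward pass."""
    out_features, in_features = weight.shape
    assert in_features % block_size == 0
    # one Bernoulli draw per (row, block) pair, expanded over each block
    n_blocks = in_features // block_size
    mask = torch.rand(out_features, n_blocks, device=weight.device) < p
    mask = mask.repeat_interleave(block_size, dim=1).to(weight.dtype)
    # crude int8-style fake quantization of the selected blocks
    scale = weight.abs().max() / 127.0
    quantized = torch.round(weight / scale) * scale
    noisy = mask * quantized + (1.0 - mask) * weight
    # straight-through estimator: forward uses `noisy`, gradients flow to `weight`
    return weight + (noisy - weight).detach()

# toy usage inside a linear layer's forward, applied only at training time
x = torch.randn(4, 64)
w = torch.nn.Parameter(torch.randn(32, 64))
y = torch.nn.functional.linear(x, quant_noise(w, p=0.1, block_size=8))
```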

TODO:
-> Pierre, look at quantization code
-> int4 and int8 quantization will be added soon.

EVALUATED TEST CASES:

0. Training of LM and BERT models starts from scratch with no errors -> yes

1. Retrain LM from scratch with code, no quantization, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_noise

2. Reload previously trained LM from scratch, not trained with quant noise, reproduces Wikitext-103 LM results -> yes

3. Train LM from scratch with code, not trained with quant noise, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_baseline

4. Train BERT model from scratch with code, no quantization, training curve looks the same as before -> yes

5. Check wps during training and wps during inference, no large change from before -> yes

6. Check structured dropout isn't being applied at eval time -> yes

7. Works in combination with LayerDrop -> yes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1896

Reviewed By: myleott

Differential Revision: D20609420

Pulled By: huihuifan

fbshipit-source-id: 94468dd811c4caaaef46a9fab2b8d381f9d2b955
2020-04-21 09:28:56 -07:00
Naman Goyal
78a995db2f adding readme and releasing megatron big model (#1124)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1124

Reviewed By: myleott

Differential Revision: D20749898

fbshipit-source-id: 42bca96d8d65158ae858ceaa7386afedf1696ebb
2020-04-03 12:03:56 -07:00
Myle Ott
5065077dfc Use cross entropy from apex for improved memory efficiency (#1122)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1122

Reviewed By: ngoyal2707

Differential Revision: D20745717

Pulled By: myleott

fbshipit-source-id: 877a1185f17952461ef204d8ad7f05b8d37b1fd9
2020-03-31 08:59:59 -07:00
Changhan Wang
fdac9bbce1 Byte-Level BPE paper code
Summary:
Implemented byte-level BPE described in ["Neural Machine Translation with Byte-Level Subwords"](https://arxiv.org/abs/1909.03341)
* Added bytes/characters/byte-level BPE tokenizers to fairseq.data.encoder
* Added detokenization option to generate.py
* Added an example under examples/byte_level_bpe
* Implemented Transformer model with Bi-GRU embedding contextualization: `examples/byte_level_bpe/gru_transformer.py`
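
The byte-level idea itself fits in a few hedged lines (a toy illustration, not the fairseq tokenizer; the `<xx>` token format is made up):

```python
# encode text as its UTF-8 bytes, so any string maps onto a fixed 256-symbol vocabulary,
# then recover the original string by reassembling and decoding the bytes
text = "Übersetzung"
byte_tokens = [f"<{b:02x}>" for b in text.encode("utf-8")]
print(byte_tokens[:4])  # ['<c3>', '<9c>', '<62>', '<65>']
decoded = bytes(int(tok[1:3], 16) for tok in byte_tokens).decode("utf-8")
assert decoded == text
```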

Reviewed By: myleott

Differential Revision: D20600963

fbshipit-source-id: 3eca4d046056c07f65333123416017a4eac04c8a
2020-03-24 07:59:05 -07:00
Myle Ott
11cc356395 fairseq requires PyTorch >= 1.4 (#1844)
Summary:
Fixes https://github.com/pytorch/fairseq/issues/1843
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1844

Differential Revision: D20468391

Pulled By: myleott

fbshipit-source-id: 0b2e2ba35c94eeb49d0e6bb05a8fefa4b847f46d
2020-03-16 07:45:50 -07:00
Stefan Schweter
3dd221c90f Readme: Fix link to mBART documentation (#1789)
Summary:
Hi,

This PR updates the link to the mBART documentation in the main README.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1789

Differential Revision: D20322673

Pulled By: myleott

fbshipit-source-id: b59c94f49176ba5bbd664791818b5b8ce7402698
2020-03-07 10:04:46 -08:00
Yinhan Liu
5e79322b3a open source mbart (#1033)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1033

Differential Revision: D20122520

Pulled By: yinhanliu

fbshipit-source-id: e2fd93e2fa9b7a8e276acc4316a176ba3ceae4ed
2020-02-27 08:30:43 -08:00
Myle Ott
37bd90845c Update README (#1051)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1051

Differential Revision: D20119560

Pulled By: myleott

fbshipit-source-id: caf089341931990777393b916c846a332b76e9dc
2020-02-26 09:13:47 -08:00
Pengcheng YIN
9090dad8d1 Update README for apex (#1563)
Summary:
Recent releases of apex removed the `fused_adam_cuda` function used in 3f4fc50163/fairseq/optim/adam.py (L220). Users need to use the `--deprecated_fused_adam` option to install `fused_adam_cuda`.

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1563

Differential Revision: D19260517

Pulled By: myleott

fbshipit-source-id: 69af015f3ef1fa85b98d138c28876ada194c9437
2020-01-02 11:37:30 -08:00
Myle Ott
6b4700cea4 Update README.md
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1521

Differential Revision: D19159323

Pulled By: myleott

fbshipit-source-id: 7e3fefc29229a90bcffe8bb1c5cff3507712d94c
2019-12-18 07:04:19 -08:00
Myle Ott
dfde36bc66 Create build.yml
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1515

Differential Revision: D19151562

Pulled By: myleott

fbshipit-source-id: 426eca1e449cac914d49877678323a6487c0adbe
2019-12-17 20:45:11 -08:00
Myle Ott
05514f8a82 Update README to indicate we only support Python >= 3.6 (fixes #1317)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/952

Differential Revision: D19133348

Pulled By: myleott

fbshipit-source-id: 51f96ddb13386143fe0088f19f7cb0674755811f
2019-12-16 19:46:53 -08:00
Changhan Wang
7612eefc6d add VizSeq to README
Summary: add VizSeq to README

Reviewed By: MultiPath

Differential Revision: D18877679

fbshipit-source-id: f1de226e37b19ec967dfcec91216521d4e5b6e22
2019-12-08 11:08:46 -08:00
Myle Ott
cb6c67bcdb Make torch.hub interface automatically apply tokenization and BPE
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/926
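
For reference, a hedged usage sketch of the hub interface with tokenization and BPE applied automatically (model name, checkpoint file, and arguments follow the fairseq README; weights are downloaded on first use):

```python
import torch

# `tokenizer` and `bpe` make the hub wrapper apply Moses tokenization and fastBPE
# around translate(), so raw strings go in and raw strings come out
en2de = torch.hub.load(
    "pytorch/fairseq", "transformer.wmt19.en-de",
    checkpoint_file="model1.pt", tokenizer="moses", bpe="fastbpe",
)
print(en2de.translate("Machine translation is hard to control."))
```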

Differential Revision: D18685772

Pulled By: myleott

fbshipit-source-id: 0f99d79ed6ee72e9d3ced786d75ab9504d0dfcf0
2019-11-26 07:49:37 -08:00
Louis Martin
b31849aa92 Camembert model and code (#904)
Summary:
Checked locally that everything works fine.
Model is uploaded to fbaipublicfiles.

I fixed a few inconsistencies in the bpe encoding along the way, e.g. related to https://github.com/pytorch/fairseq/issues/1306.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/904

Reviewed By: ngoyal2707

Differential Revision: D18418345

Pulled By: louismartin

fbshipit-source-id: 53acb4d021581968d70430ee9babee07d6573c17
2019-11-10 11:29:07 -08:00