Commit Graph

294 Commits

Author SHA1 Message Date
dianaml0
51478ad3a1 xformer integration (#2263)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
This PR is a cleaned-up version of https://github.com/fairinternal/fairseq-py/issues/2138. It is based on the `main` branch instead of the `gshard` branch. It removes the call to xFormers MultiHeadDispatch, only using xFormers Attention.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

X-link: https://github.com/fairinternal/fairseq-py/pull/2263

Reviewed By: blefaudeux

Differential Revision: D33800377

Pulled By: dianaml0

fbshipit-source-id: 658d52214c782212b12881b30c4d908a763b4cf2
2022-05-04 09:15:36 -07:00
Diana Liskovich
0b54d9fb2e fix formatting (#3350)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

X-link: https://github.com/fairinternal/fairseq-py/pull/3350

Reviewed By: shruti-bh

Differential Revision: D36009526

Pulled By: dianaml0

fbshipit-source-id: 9cdc3d53086b8d40a780bcb64cfe28108091ab98
2022-04-28 14:17:09 -07:00
Diana Liskovich
72d3408481 Pull out some code into separate methods (#3068)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Pulling out some changes from https://github.com/fairinternal/fairseq-py/pull/2263 unrelated to xformers to make the PR cleaner

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

X-link: https://github.com/fairinternal/fairseq-py/pull/3068

Reviewed By: blefaudeux

Differential Revision: D34149016

Pulled By: dianaml0

fbshipit-source-id: 6442a5f451d56cc47106227298a624516b19a9ad
2022-04-27 16:54:02 -07:00
Alexander Jipa
355ffbe4e2 add masked_lm test (#4344)
Summary:
# Before submitting

- [X] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [X] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [X] Did you make sure to update the docs?
- [X] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/4300

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Big time!

Note:
I had to update `black` because of [this known issue](https://github.com/psf/black/issues/2964):
```
black....................................................................Failed
- hook id: black
- exit code: 1
Traceback (most recent call last):
  File "/Users/azzhipa/.cache/pre-commit/repoxt83whf2/py_env-python3.8/bin/black", line 8, in <module>
    sys.exit(patched_main())
  File "/Users/azzhipa/.cache/pre-commit/repoxt83whf2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1423, in patched_main
    patch_click()
  File "/Users/azzhipa/.cache/pre-commit/repoxt83whf2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1409, in patch_click
    from click import _unicodefun
ImportError: cannot import name '_unicodefun' from 'click' (/Users/azzhipa/.cache/pre-commit/repoxt83whf2/py_env-python3.8/lib/python3.8/site-packages/click/__init__.py)
```

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4344

Reviewed By: zhengwy888

Differential Revision: D35691648

Pulled By: dianaml0

fbshipit-source-id: 4bdf408bc9d9cca76c9c08e138cf85b1d00d14d4
2022-04-18 14:47:00 -07:00
spopuri
420136acd2 fix failing convtransformer test (#3107)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3107

Reviewed By: cndn

Differential Revision: D34354339

Pulled By: sravyapopuri388

fbshipit-source-id: 50888706123d246c13d2cbb22d0e043740ff6bf5
2022-02-22 11:24:11 -08:00
Sravya Popuri
67eaecd2fc Add regression test for SimulConvTransformerModel (#3031)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3031

Reviewed By: kahne

Differential Revision: D34018108

Pulled By: sravyapopuri388

fbshipit-source-id: 4db96653658a998b15c0cdbc2e588198d951a420
2022-02-16 09:32:21 -08:00
dianaml0
5f2515e676 Fix failing test (#3065)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3065

Reviewed By: Mortimerp9

Differential Revision: D34144674

Pulled By: dianaml0

fbshipit-source-id: 842b0d29c9c85d4b56b640f2823fcb4e3f912f98
2022-02-10 12:17:47 -08:00
Sravya Popuri
8b02f00e8a fix s2s test - disable multitasking by setting multitask_config_yaml to None (#3059)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3059

Reviewed By: kahne

Differential Revision: D34083178

Pulled By: sravyapopuri388

fbshipit-source-id: a33af1696570be4826973b19fe34177bcf851e06
2022-02-09 10:05:22 -08:00
Sravya Popuri
11b2830d29 Refactor speech tests and add missing regression tests (#3001)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3001

Reviewed By: kahne

Differential Revision: D33904550

Pulled By: sravyapopuri388

fbshipit-source-id: f55f8121d83e5abebdfcf7ac90dcba39f65cafaf
2022-02-04 14:35:02 -08:00
Vimal Manohar
6b7a7d6457 Fix EMA GPU test
Summary: The GPU test was broken after D33809223 (1b61bbad32)

Reviewed By: cruvadom

Differential Revision: D33931570

fbshipit-source-id: 37962a437d8e25b1dafc58db0efa55c1afa5f3ee
2022-02-04 09:10:06 -08:00
Pierre Andrews
f591cc94ca upgrade black for lints (#3004)
Summary:
This is the same as https://github.com/fairinternal/fairseq-py/issues/3003 but for main instead of gshard.

The lint test will run the latest version of black, which is 22.1.0 right now and seems to be incompatible with the 21.12b0 version that is set up in pre-commit. This means that some files had valid formatting in the past, but no longer do...

This PR formats these files with 22.1.0 and autoupdates pre-commit config to use that black version too.

(Note: this is the second time this has happened. A solution would be to pin the lint test to the same black version as the one in the pre-commit hook, the version used to format everything clean, so that we have stable formatting.)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3004

Reviewed By: dianaml0

Differential Revision: D33917490

Pulled By: Mortimerp9

fbshipit-source-id: d55e800b976f94545cdab4132daa7c45cbd0e34c
2022-02-02 04:31:33 -08:00
Vimal Manohar
1b61bbad32 Fix broken EMA in fairseq
Summary: EMA has been broken since D33649708 (995c204337) due to an indentation error.

Reviewed By: cruvadom

Differential Revision: D33809223

fbshipit-source-id: c6c4d0d327443bfea787817040e1832eef0f50e4
2022-01-27 13:02:58 -08:00
alexeib
995c204337 Data2vec prelim (#2929)
Summary:
Preliminaries for the data2vec release, including some minor improvements and bug fixes.

The most important change is that we now default to raising an exception when fields in the config do not have a corresponding field in the model dataclass.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2929

Reviewed By: wnhsu

Differential Revision: D33649708

Pulled By: alexeib

fbshipit-source-id: 629bdb4c361550740b451c570c2005bb956c6fcb
2022-01-20 00:02:16 -08:00
Liang Tan
1575f30dd0 Add ffn prune to fairseq
Summary:
Support FFN pruning for Fairseq. For example, a user can apply pruning on top of the Roberta base model by specifying the argument "--ffn-blocks-to-remove 1024". The user also needs to provide a ckpt which is already pruned so that the pruned ckpt can be loaded correctly.
The pruning workflow can be summarized as:
1. Fine-tune the model (e.g. the roberta encoder) on a certain dataset with regularization.
2. After the model is trained, use the _get_fc_rank and _prune_fc_layer functions to get the top X most important blocks in each transformer layer, then use that ranking to prune a new roberta encoder and save the pruned ckpt manually.
3. Fine-tune the new roberta encoder via the ckpt saved above (a rough sketch follows below).
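
To make the workflow concrete, here is a minimal, hypothetical sketch of ranking a Roberta-style FFN's intermediate units by weight magnitude and shrinking the two linear layers to the kept units. The helper names and the L1-norm importance score are illustrative assumptions, not fairseq's actual `_get_fc_rank`/`_prune_fc_layer` implementation.
```
import torch
import torch.nn as nn

def rank_fc_units(fc1: nn.Linear, keep_k: int) -> torch.Tensor:
    # Hypothetical importance score: L1 norm of each intermediate unit's weights.
    scores = fc1.weight.abs().sum(dim=1)              # one score per FFN unit
    return torch.topk(scores, keep_k).indices.sort().values

def prune_fc_layer(fc1: nn.Linear, fc2: nn.Linear, keep: torch.Tensor):
    # Shrink fc1 along its output rows and fc2 along its input columns.
    fc1_new = nn.Linear(fc1.in_features, len(keep))
    fc2_new = nn.Linear(len(keep), fc2.out_features)
    with torch.no_grad():
        fc1_new.weight.copy_(fc1.weight[keep])
        fc1_new.bias.copy_(fc1.bias[keep])
        fc2_new.weight.copy_(fc2.weight[:, keep])
        fc2_new.bias.copy_(fc2.bias)
    return fc1_new, fc2_new

# Removing 1024 of Roberta base's 3072 FFN units keeps the top 2048,
# matching the spirit of "--ffn-blocks-to-remove 1024".
fc1, fc2 = nn.Linear(768, 3072), nn.Linear(3072, 768)
keep = rank_fc_units(fc1, 3072 - 1024)
fc1, fc2 = prune_fc_layer(fc1, fc2, keep)
```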

Reviewed By: dianaml0

Differential Revision: D33525055

fbshipit-source-id: 5087140ee891d6ec9266726e3a477947c233412c
2022-01-14 16:26:59 -08:00
Vimal Manohar
cf8ff8c3c5 Add unittests for jitting EMA model
Summary: As title

Reviewed By: nayansinghal

Differential Revision: D32005717

fbshipit-source-id: ebdf1ed0e4a2b9fccffd841d0fa7be0b50ec6b79
2022-01-13 01:53:42 -08:00
Pierre Andrews
279796224f Preprocess Split (#2738)
Summary:
This is the equivalent of PR https://github.com/fairinternal/fairseq-py/issues/2697, but on top of main instead of gshard (cherry-picked and merged the squash):

* reorganize preprocess.py code a bit
* use Binarizers objects in the multiprocess code
* clean up the make_binary
* multiprocess logic
* learn to count
* format and doc string
* add basic test for vocab binarizer
* generalize to one line
* move multiprocess in binarizer

Testing:
```
python -m fairseq_cli.preprocess --only-source --trainpref ~/fixathon/small_vocab_test/train.in --destdir ~/fixathon/small_vocab_test/data-bin.cherry --workers 20
python -m fairseq_cli.preprocess --only-source --trainpref ~/fixathon/small_vocab_test/train.in --destdir ~/fixathon/small_vocab_test/data-bin.main --workers 20
```

```
 md5sum ~/fixathon/small_vocab_test/data-bin.cherry/train.bin == md5sum ~/fixathon/small_vocab_test/data-bin.main/train.bin
```

```
diff ~/fixathon/small_vocab_test/data-bin.main/dict.txt ~/fixathon/small_vocab_test/data-bin.cherry/dict.txt
```

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2738

Reviewed By: sshleifer, dianaml0

Differential Revision: D32830875

Pulled By: Mortimerp9

fbshipit-source-id: e7463d5cdd96a877691bf39666daa319ebb3dcb8
2022-01-11 11:56:46 -08:00
Liang Tan
b3fa5100c6 Add mha prune to fairseq
Summary:
Support multihead attention pruning for Fairseq. For example, a user can apply pruning on top of the Roberta base model by specifying the argument "--mha-heads-to-keep 8". The user also needs to provide a ckpt which is already pruned so that the pruned ckpt can be loaded correctly.

The pruning workflow can be summarized as:
1. Fine-tune the model (e.g. the roberta encoder) on a certain dataset with regularization.
2. After the model is trained, use the get_reserve_head_index and _adaptive_prune_heads functions to get the top X most important heads, then use that ranking to prune a new roberta encoder and save the pruned ckpt manually.
3. Fine-tune the new roberta encoder via the ckpt saved above (a rough sketch follows below).

To avoid registering different pruned versions of Roberta, the --mha-heads-to-keep argument prunes the Roberta model into a version that matches the pruned ckpt.
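
As a toy illustration of ranking attention heads by importance: the sketch below scores each head by a simple weight-norm proxy, an assumption for illustration only (the real get_reserve_head_index/_adaptive_prune_heads derive importance from the regularized fine-tuning, not from weight norms).
```
import torch
import torch.nn as nn

def rank_heads(attn: nn.MultiheadAttention) -> torch.Tensor:
    # Hypothetical proxy: score each head by the L1 norm of its slice of
    # the packed q/k/v projection weights.
    num_heads = attn.num_heads
    head_dim = attn.embed_dim // num_heads
    w = attn.in_proj_weight.view(3, num_heads, head_dim, attn.embed_dim)
    return w.abs().sum(dim=(0, 2, 3))                 # one score per head

attn = nn.MultiheadAttention(embed_dim=768, num_heads=12)
keep = torch.topk(rank_heads(attn), k=8).indices      # "--mha-heads-to-keep 8"
print(sorted(keep.tolist()))
```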

Reviewed By: dianaml0

Differential Revision: D32449003

fbshipit-source-id: a952fd9ad723a6dbc5c2af574c42f2e9a1fa27dc
2022-01-11 10:09:07 -08:00
Sravya Popuri
40ff55abbe conformer (#2859)
Summary:
**This PR**

- Adds conformer layer based on https://arxiv.org/pdf/2005.08100.pdf.
- The conformer implementation supports multihead attention based on 3 different positional embedding types - absolute positional embedding, relative positional encoding, and rotary positional embedding.
- Adds a conformer encoder with conv1d subsampling and positional embedding, followed by N conformer layers
- Adds S2T_Conformer model based on the conformer encoder and transformer decoder.
- Add conformer support in Wav2Vec2
- Add unit tests for core modules

**Verification**

- Verified the setup on MUST-C En-De S2T, Covost2 Es-En S2T, and Librispeech ASR to ensure the implementation is correct.
- For S2T setups, the performance is either similar to the transformer based models or better.
- Wav2vec2 pretraining and finetuning based on librispeech showed improvements over corresponding transformer baselines.
- [WIP] Experiment log: https://docs.google.com/document/d/1QI-ROWVenUEXPJoHTaKD85Fq7T8ZXNc8bc54MzgwJjA/edit#

**Next steps**
- Add regression tests
- Add README and open source checkpoints
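
As a rough illustration of the conformer layer described above (macaron half-step feed-forwards around self-attention and a depthwise convolution module, per the paper), here is a minimal sketch; the class and module names are illustrative assumptions, not fairseq's actual implementation.
```
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConformerBlockSketch(nn.Module):
    # Layer ordering from https://arxiv.org/pdf/2005.08100.pdf:
    # 1/2 FFN -> self-attention -> conv module -> 1/2 FFN -> LayerNorm.
    def __init__(self, dim=256, heads=4, kernel_size=31):
        super().__init__()
        self.ffn1 = self._ffn(dim)
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(dim)
        self.pointwise_in = nn.Conv1d(dim, 2 * dim, 1)   # expansion for GLU
        self.depthwise = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        self.pointwise_out = nn.Conv1d(dim, dim, 1)
        self.ffn2 = self._ffn(dim)
        self.final_norm = nn.LayerNorm(dim)

    @staticmethod
    def _ffn(dim):
        return nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                             nn.SiLU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                                # x: (batch, time, dim)
        x = x + 0.5 * self.ffn1(x)                       # first half-step FFN
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        c = self.conv_norm(x).transpose(1, 2)            # (batch, dim, time)
        c = F.glu(self.pointwise_in(c), dim=1)           # gated pointwise conv
        c = self.pointwise_out(F.silu(self.depthwise(c)))
        x = x + c.transpose(1, 2)                        # conv module residual
        x = x + 0.5 * self.ffn2(x)                       # second half-step FFN
        return self.final_norm(x)
```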

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2859

Reviewed By: kahne

Differential Revision: D33434092

Pulled By: sravyapopuri388

fbshipit-source-id: 62f22b917a332481370750e04a439e05832a2282
2022-01-10 16:18:38 -08:00
Yun Tang
e69f1fa37f speech integration tests for jointly trained models
Summary: Add test for DualInputS2TTransformerModel at examples/speech_text_joint_to_text/models/s2t_dualinputtransformer.py

Reviewed By: kahne

Differential Revision: D33284188

fbshipit-source-id: c02b697fc7734425661e00bbb606852b5d94a587
2022-01-07 12:45:20 -08:00
Changhan Wang
ee177fc4fa add xm_transformer test; refactor speech tests
Summary: add xm_transformer test; refactor speech tests

Reviewed By: sravyapopuri388

Differential Revision: D33312231

fbshipit-source-id: a2b2695fc3c10d5420abbe23a4a3005777aa2ae1
2021-12-31 12:31:11 -08:00
Liang Tan
2762a1cfef Add regularization for multihead attention module and ffn module
Summary: [Fairseq] Add regularization for multihead attention module and ffn module

Reviewed By: dianaml0

Differential Revision: D32441521

fbshipit-source-id: c648c1f8ec1a3310ba90c4952cdd40a21b959d26
2021-12-30 02:02:05 -08:00
Diana Liskovich
7fddb9d960 lint fixes (#2834)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Applied `black` and `isort` to fix failing CI

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2834

Reviewed By: vedanuj

Differential Revision: D33262876

Pulled By: dianaml0

fbshipit-source-id: 03215c276fcddda9f7c78971bf6ed7c5ac21b2ee
2021-12-29 11:50:55 -08:00
Xian Li
7f3967805f add readme for xglm models (#2808)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Add readme and task for xglm models.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2808

Reviewed By: punitkoura

Differential Revision: D33237928

Pulled By: xianxl

fbshipit-source-id: 7773cf56e896210dab1f4311ae69f0e00c6d9aff
2021-12-20 13:05:17 -08:00
Diana Liskovich
a54021305d formatting fix (#2816)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
fix `black` failures

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2816

Reviewed By: alexeib

Differential Revision: D33172615

Pulled By: dianaml0

fbshipit-source-id: 36b141f42941670f1bfa981041d878042feb0428
2021-12-16 16:11:19 -08:00
Changhan Wang
7b0159a202 add integration test for fastspeech2
Summary: Adding integration test (based on test set scores on pre-trained checkpoints) for fastspeech2

Reviewed By: yuntang

Differential Revision: D33143301

fbshipit-source-id: dca0841b43dd1cb2933ce5c652ed3cdff0fc4a52
2021-12-15 16:15:38 -08:00
Changhan Wang
ee833ed49d speech integration tests (batch 1)
Summary:
Adding the first batch of speech integration tests (based on test set scores on pre-trained checkpoints) for
- S2T transformer
- TTS transformer

Reviewed By: yuntang

Differential Revision: D33050653

fbshipit-source-id: fb5bb9f46e8e17cb705971ca1990c8e1cb99d5f9
2021-12-14 17:42:18 -08:00
Jingfei Du
16ebfa752c Revert prefix beamsearch fix (#2763)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Reverting to fix the issue mentioned [here](https://github.com/pytorch/fairseq/issues/3913). Another PR will fix the original issue later.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2763

Reviewed By: myleott

Differential Revision: D33000411

Pulled By: jingfeidu

fbshipit-source-id: 95a54cbdc612129a0eab4b5e6aa576a5bcf00588
2021-12-14 13:22:09 -08:00
Changhan Wang
8548f1d401 Add loading from HuggingFace Hub
Summary: Add loading from the HuggingFace Hub. Revised from, and intended to replace, D32697723 (accepted).
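
A hedged sketch of the workflow this enables: download a snapshot from the Hub, then load it with fairseq's standard ensemble loader. The repo id and checkpoint name are placeholders, and fairseq's exact hub entry point may differ.
```
from huggingface_hub import snapshot_download
from fairseq import checkpoint_utils

# "username/fairseq-model" and "checkpoint.pt" are hypothetical names,
# not a real published model.
cache_dir = snapshot_download("username/fairseq-model")
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    [f"{cache_dir}/checkpoint.pt"]
)
model = models[0].eval()
```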

Reviewed By: pipibjc, dianaml0

Differential Revision: D32964041

fbshipit-source-id: 39676aa0ecb10454ae76b70968d5abe96ab6da54
2021-12-10 16:55:12 -08:00
dianaml0
88e7d2586b fix flake8 issues (#2570)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
- [x] applies flake8 fixes to main branch (https://github.com/fairinternal/fairseq-py/issues/2546) - still more to be fixed

Fix GPU tests:
- [x] when the torch.ao.quantization import doesn't work, fall back to torch.quantization
- [x] build apex from an earlier commit in circleci so that it's compatible with pytorch 1.8 and 1.9

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2570

Reviewed By: Mortimerp9

Differential Revision: D32955312

Pulled By: dianaml0

fbshipit-source-id: e163cbd4998f171f819e31b0682c1c0f1986f9e1
2021-12-09 02:34:30 -08:00
dianaml0
0dfd6b6240 Add linting with black (#2678)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2678

Reviewed By: Mortimerp9

Differential Revision: D32653381

Pulled By: dianaml0

fbshipit-source-id: 2810d14867cd7d64f4d340740e2b590b82de47fe
2021-11-29 12:32:59 -08:00
Sam Shleifer
fb64e43c67 skip remainder batch (#2464)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2464

Reviewed By: myleott

Differential Revision: D31742871

Pulled By: sshleifer

fbshipit-source-id: e5d29ca9d594abd92212eb24b60c991f2840a4e8
2021-11-24 07:50:50 -08:00
Vinayak Tantia
3a5838c320 Update implementation of SlowMo to its implementation in Fairscale (#3996)
Summary:
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
SlowMo is being moved to [Fairscale](https://fairscale.readthedocs.io/en/latest/). This commit updates the implementation of SlowMo to the Fairscale version. It also adds tests for SlowMo.
Note: This PR is currently for review. It will be merged at a later date, once SlowMo has landed in Fairscale. SlowMo is being merged into Fairscale as part of [a PR](https://github.com/facebookresearch/fairscale/pull/378); once that PR is merged into Fairscale, this Fairseq PR will be ready to merge.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3996

Reviewed By: dianaml0

Differential Revision: D32280163

Pulled By: vtantia

fbshipit-source-id: 70c97b04a7cdc90ada7099375c2a31b0c978ba70
2021-11-09 09:44:45 -08:00
Sam Shleifer
c5ff181125 NormFormer: flags and docs (#2460)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2460

Reviewed By: myleott

Differential Revision: D31731798

Pulled By: sshleifer

fbshipit-source-id: 938456c17aa004cacffdcdd124aebe390da83d5f
2021-10-19 17:13:04 -07:00
Vimal Manohar
1ef3d6a1a2 CPLTask for training with continuous pseudo labeling
Summary:
CPLTaskImpl provides an implementation to augment existing tasks to take an additional ema_model input in their train_step and valid_step for continuous pseudo-labeling (CPL) during training. It passes this ema_model to the criterion.

See Kaizen semi-supervised training paper for more details https://arxiv.org/abs/2106.07759.

This implementation also supports using CPLDataset which enables using unsupervised data only for `cpl_finetune_epoch > epochs >= cpl_start_epoch`. CPLDataset is like MultiCorpusDataset but ignores the unsupervised datasets while sampling.

Another addition in this diff is to skip a dataset in MultiCorpusDataset if its sampling probability is 0.
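
A minimal sketch of the shape this augmentation takes, with illustrative method and argument names (the real CPLTaskImpl is more involved):
```
class CPLTaskImplSketch:
    """Hypothetical sketch: a task whose train_step also receives the EMA
    model and forwards it to the criterion for pseudo-labeling."""

    def train_step(self, sample, model, criterion, optimizer, update_num,
                   ema_model=None, ignore_grad=False):
        # The EMA "teacher" produces pseudo-labels inside the criterion
        # while the student `model` is the one being optimized.
        loss, sample_size, logging_output = criterion(
            model, sample, ema_model=ema_model
        )
        if ignore_grad:
            loss *= 0
        optimizer.backward(loss)
        return loss, sample_size, logging_output
```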

Reviewed By: cruvadom

Differential Revision: D30701536

fbshipit-source-id: 1d840eacfd538ed7aed3baaefc8b254390642b45
2021-10-14 22:09:07 -07:00
Vimal Manohar
8feccf9441 EMA
Summary:
Adds Exponential moving average (EMA) model for Kaizen semi-supervised training https://arxiv.org/abs/2106.07759

1. Add `ema.store_ema` to enable storing EMA. EMA will be written to extra_state in the state dict while saving checkpoint.
2. `ema.ema_start_update` to control when the EMA starts accumulating
3. Tasks can use `uses_ema` property to decide if the EMA should be passed to the task. (Default is False)
4. `load_ema_from_checkpoint` can be used to load the EMA model in place of the model to be used for evaluation. Pyspeech has an eval-ema option for this.

```
This module has the EMA class used to store a copy of the exponentially decayed
model params.

Typical usage of EMA class involves initializing an object using an existing
model (random or from a seed model) and setting the config like ema_decay,
ema_start_update which determine how the EMA model is updated. After every
update of the model i.e. at the end of the train_step, the EMA should be updated
by passing the new model to the EMA.step function. The EMA model state dict
can be stored in the extra state under the key of "ema" and dumped
into a checkpoint and loaded. The EMA object can be passed to tasks
by setting task.uses_ema property.
EMA is a smoothed/ensemble model which might have better performance
when used for inference or further fine-tuning. EMA class has a
reverse function to load the EMA params into a model and use it
like a regular model.
```
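
A minimal sketch of the EMA update itself, following the usage described above (the real class additionally handles the extra_state checkpointing, the ema_start_update gating, and the reverse function):
```
import copy
import torch

class EMASketch:
    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        self.model = copy.deepcopy(model)   # smoothed copy of the params
        for p in self.model.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def step(self, new_model: torch.nn.Module):
        # ema_p <- decay * ema_p + (1 - decay) * p, after every train_step
        for ema_p, p in zip(self.model.parameters(), new_model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1 - self.decay)
```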

Reviewed By: cruvadom

Differential Revision: D24238379

fbshipit-source-id: 879d3ba5070a614b7d365f9503af357001e875b2
2021-09-01 12:29:51 -07:00
Pierre Andrews
68a81202a3 Indexed Huffman Coded dataset (#2029)
Summary:
## What does this PR do?

Currently, binarized datasets are stored as a bin representation of int tensors. At best, each int is coded as uint16 on disk.

When coding a fixed-size vocabulary dataset where we know the frequency of each symbol and where some symbols are more common than others, we can do better. This happens in particular when binarizing a dataset split into subword units, as the most common "tokenizers" like bpe and spm will choose subwords with high frequencies over subwords with low frequencies.

In practice, if we know the frequency of all symbols (or a good estimate), we can use entropy encoding methods to compress the data. The idea is to assign a compressed representation where frequent symbols will have shorter representations than infrequent symbols.

In this PR, we build a Huffman code from a frequency table and use this code to encode a dataset. The PR provides the huffman coder implementation (using the single queue approach as we usually start with a sorted set of symbols) as well as a memory map implementation of a dataset that stores the data compressed with a huffman code and can return indexed tensors from it.

Over a whole dataset, depending on how many symbols we sample to evaluate the frequency, we can save between 25% and 30% of storage space.
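
As a toy illustration of the entropy-coding idea, here is a textbook heap-based Huffman construction; the PR's implementation uses the single-queue approach over pre-sorted symbols, but the resulting code lengths are equivalent.
```
import heapq
from collections import Counter

def build_huffman_code(frequencies: Counter) -> dict:
    # Each heap entry is (total_freq, tiebreaker, symbols_in_subtree).
    heap = [(freq, i, [sym]) for i, (sym, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    codes = {sym: "" for sym in frequencies}
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, i, syms2 = heapq.heappop(heap)
        for s in syms1:                      # prepend a bit on each merge
            codes[s] = "0" + codes[s]
        for s in syms2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (f1 + f2, i, syms1 + syms2))
    return codes

# Frequent symbols end up with shorter bit strings:
print(build_huffman_code(Counter({"the": 1000, "of": 600, "platypus": 3, "axolotl": 1})))
```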

## Follow Ups

currently the binarizer/preprocess script makes too many assumptions about the dataset writers, so the huffman dataset writer cannot be used straight out of the box with it. I will make follow-up PRs to provide easy-to-use scripts to build such datasets. But it's as simple as doing:
```
code_builder = HuffmanCodeBuilder()
with open(sample_file, 'r', encoding="utf-8") as input:
    for line in input:
        code_builder.add(*line.strip().split(" "))

coder = code_builder.build_code()

with HuffmanMMapIndexedDatasetBuilder('/tmp/testing_huffman', coder) as builder:
    with open(dataset_file, 'r', encoding="utf-8") as input:
        for line in input:
            builder.add_item(line.strip().split(' '))
```

A lot of the `HuffmanMMapIndexedDataset` code comes from the normal `MMapIndexedDataset`, and we could probably extract the commonalities into a base class.

The `HuffmanCoder` is also really a special kind of `Dictionary`; again, a common base class could be abstracted out of them.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2029

Reviewed By: dianaml0

Differential Revision: D29557468

Pulled By: Mortimerp9

fbshipit-source-id: a01b6d98f38f937934cadebb3786133e257adefe
2021-08-31 01:12:35 -07:00
Jingfei Du
932a3d4aad fix beam search with prefix tokens (#2227)
Summary:
1. Added a test for generating pad tokens during beam search with prefix tokens.
2. Modified lprobs for the pad token and prefix tokens to avoid generating pad.
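
A sketch of the general prefix-forcing technique (not the exact diff): at steps covered by the prefix, mask every candidate except the forced token, so pad can never be generated there.
```
import torch

def constrain_to_prefix(lprobs: torch.Tensor, prefix_toks: torch.Tensor) -> torch.Tensor:
    # lprobs: (batch, vocab) scores for the current step;
    # prefix_toks: (batch,) forced token for each sentence at this step.
    forced = lprobs.gather(1, prefix_toks.unsqueeze(1))       # keep its score
    masked = torch.full_like(lprobs, float("-inf"))           # block the rest
    return masked.scatter(1, prefix_toks.unsqueeze(1), forced)
```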

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2227

Reviewed By: xianxl

Differential Revision: D30649356

Pulled By: jingfeidu

fbshipit-source-id: d94903a912e767391c8fca61f98f65b5cea3b56e
2021-08-30 18:07:13 -07:00
Pierre Andrews
bc1504d4d7 Hierarchical Configs
Summary:
This is a precursor to D29232595

The current behaviour when converting a dataclass to a namespace is that all the fields from all dataclasses in the field hierarchy are flattened at the top level. This is also the legacy behaviour with `add_args`.

This is kind of cumbersome for building reusable dataclasses, as we need to make sure that each field has a unique name. In the case of Transformer, for instance, we have Decoder and Encoder configs that share a large part of their fields (embed_dim, layers, etc.). We can build a single dataclass for this that can be reused and extended in other implementations. To still end up with a flat namespace, instead of adding all subfields as-is to the root namespace, we prefix each arg in the namespace with the name of its parent field.

So:
`model.decoder.embed_dim` becomes `decoder_embed_dim` and `model.encoder.embed_dim` becomes `encoder_embed_dim`.
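
A small sketch of this flattening behaviour, with illustrative dataclass names (not fairseq's actual helper):
```
from argparse import Namespace
from dataclasses import dataclass, field, fields, is_dataclass

@dataclass
class EncDecConfig:                      # reusable sub-config
    embed_dim: int = 512
    layers: int = 6

@dataclass
class ModelConfig:
    encoder: EncDecConfig = field(default_factory=EncDecConfig)
    decoder: EncDecConfig = field(default_factory=lambda: EncDecConfig(embed_dim=256))

def flatten(cfg, ns=None, prefix=""):
    # Walk nested dataclasses, prefixing each leaf with its field name.
    ns = ns or Namespace()
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        if is_dataclass(value):
            flatten(value, ns, prefix + f.name + "_")
        else:
            setattr(ns, prefix + f.name, value)
    return ns

print(flatten(ModelConfig()))
# Namespace(decoder_embed_dim=256, decoder_layers=6,
#           encoder_embed_dim=512, encoder_layers=6)
```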

Reviewed By: myleott, dianaml0

Differential Revision: D29521386

fbshipit-source-id: f4bef036f0eeb620c6d8709ce97f96ae288848ef
2021-07-16 04:56:12 -07:00
Omry Yadan
dd106d9534 fixes tests/test_train.py to mock checkpoint.save_dir config node (#3675)
Summary:
## What does this PR do?
Some downstream users reported errors when passing a Namespace to load_checkpoint().

A recent change assumed that the passed object is dict-like (a dict or DictConfig) with a get function. This PR changes that, and makes sure the mocked config has checkpoint.save_dir so that the test can run.
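
A sketch of the kind of guard that handles both object types (the helper name is illustrative):
```
from argparse import Namespace

def cfg_get(cfg, key, default=None):
    # dict and omegaconf's DictConfig expose .get; Namespace does not.
    if hasattr(cfg, "get"):
        return cfg.get(key, default)
    return getattr(cfg, key, default)

assert cfg_get({"save_dir": "/tmp/ckpt"}, "save_dir") == "/tmp/ckpt"
assert cfg_get(Namespace(save_dir="/tmp/ckpt"), "save_dir") == "/tmp/ckpt"
```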

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3675

Reviewed By: omry

Differential Revision: D29564805

Pulled By: lematt1991

fbshipit-source-id: 89308811da382667f6c5d3152ee2d6480416ee62
2021-07-06 15:07:31 -07:00
Pierre Andrews
53bf2b1293 Extract File Chunking to its own utils (#1955)
Summary:
## What does this PR do?

There are a few places where we do file chunking to multiprocess a single file. However, the code is partly in Binarizer and partly just duplicated here and there.

This PR extracts the file chunking/reading logic. The multiprocessing logic could probably be extracted too, but I haven't found a good abstraction yet.
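
The core chunking idea, as a small sketch (the extracted utility may differ in detail): split a file into byte ranges of roughly equal size, snapping each boundary forward to the next newline so no line is split across workers.
```
import os

def find_offsets(filename: str, num_chunks: int) -> list:
    size = os.path.getsize(filename)
    offsets = [0]
    with open(filename, "rb") as f:
        for i in range(1, num_chunks):
            f.seek(i * size // num_chunks)
            f.readline()              # advance to the next line boundary
            offsets.append(f.tell())
    offsets.append(size)
    return offsets                    # num_chunks + 1 boundaries
```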

# Testing

Added testing for this reading logic, and possibly fixed a bug where the last part of a file might get dropped (even if it's unclear with the current stopping logic).

Tested by running the preprocessing script as follow:
```
python -m fairseq_cli.preprocess --source-lang de --target-lang en --trainpref ...train.spm.clean.de_en --srcdict ...fairseq.dict --tgtdict .../fairseq.dict --destdir ... --workers 60
```

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1955

Reviewed By: myleott

Differential Revision: D29065473

Pulled By: Mortimerp9

fbshipit-source-id: c60843de8cfd45a63b3dbb8290f57ef3df3bf983
2021-06-28 01:46:32 -07:00
Diana Liskovich
50158da3a7 Migrate DummyMaskedLMTask to FairseqTask (#3593)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3593

Reviewed By: msbaines

Differential Revision: D28992614

Pulled By: dianaml0

fbshipit-source-id: b2dfcab472a65c41536e78600a0e6b3745dc3a08
2021-06-10 09:43:08 -07:00
Mandeep Singh Baines
9497ae3cfb disable raise_if_valid_subsets_unintentionally_ignored check for dummy tasks (#3552)
Summary:
Fixes the following crash:
```python
Traceback (most recent call last):
  File "/private/home/msb/.conda/envs/fairseq-20210102-pt181/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/private/home/msb/code/fairseq/fairseq/distributed/utils.py", line 328, in distributed_main
    main(cfg, **kwargs)
  File "/private/home/msb/code/fairseq/fairseq_cli/train.py", line 117, in main
    data_utils.raise_if_valid_subsets_unintentionally_ignored(cfg)
  File "/private/home/msb/code/fairseq/fairseq/data/data_utils.py", line 584, in raise_if_valid_subsets_unintentionally_ignored
    other_paths = _find_extra_valid_paths(train_cfg.task.data)
AttributeError: 'Namespace' object has no attribute 'data'
```

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3552

Reviewed By: sshleifer

Differential Revision: D28667773

Pulled By: msbaines

fbshipit-source-id: bc9a633184105dbae0cce58756bb1d379b03980a
2021-05-27 12:15:31 -07:00
Gagandeep Singh
237184e522 Add torch.cuda.amp support (#3460)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/3282
Adds support for `torch.cuda.amp`.
AMP can be enabled by `--amp`, as an alternative to the already-present full fp16 support enabled by `--fp16`.
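
For reference, the standard `torch.cuda.amp` training-loop pattern this flag builds on (a generic PyTorch example, not fairseq's trainer code):
```
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(8, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # ops run in mixed precision
        loss = model(x).float().pow(2).mean()
    scaler.scale(loss).backward()          # scale to avoid fp16 underflow
    scaler.step(optimizer)                 # unscales grads, then steps
    scaler.update()
```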

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3460

Reviewed By: sshleifer, msbaines

Differential Revision: D27932253

Pulled By: myleott

fbshipit-source-id: 21637aefb5e788c59bf4f3c5de6c4a80f7319543
2021-05-26 14:39:10 -07:00
Weiyi Zheng
8df9e3a4a5 support FSDP sharded_state checkpoint loading during inference
Summary:
Using the very useful feature added by QuentinDuval https://github.com/facebookresearch/fairscale/pull/683/files , we can consolidate sharded states into a full regular state. This allows inference on sharded state almost transparently.

The main complexity comes from trying to be smart about what kind of checkpoint the user wants to load (not sure if this is over-engineering):
1. if the file checkpoint-shard0.pt exists and `--checkpoint-shard-count` is > 1, then we load a sharded FSDP checkpoint
2. if checkpoint-shard0.pt exists but --checkpoint-shard-count=1, we load a consolidated FSDP checkpoint
3. if checkpoint-shard0.pt does not exist but --checkpoint-shard-count > 1, we load a model parallel checkpoint
4. otherwise we are loading a single, plain checkpoint

In theory we could be even smarter and load shard0.pt to check how many more checkpoints are needed. This is not implemented, though it would save the user from having to specify --checkpoint-shard-count.
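
The four-way decision, expressed as a sketch (file and flag names taken from the description above; the function name is illustrative):
```
import os

def checkpoint_kind(ckpt_dir: str, shard_count: int) -> str:
    has_shard0 = os.path.exists(os.path.join(ckpt_dir, "checkpoint-shard0.pt"))
    if has_shard0 and shard_count > 1:
        return "sharded FSDP checkpoint"
    if has_shard0 and shard_count == 1:
        return "consolidated FSDP checkpoint"
    if not has_shard0 and shard_count > 1:
        return "model parallel checkpoint"
    return "single plain checkpoint"
```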

Reviewed By: sshleifer

Differential Revision: D28563441

fbshipit-source-id: dcafcaa7c9eaf5c9ff94f55c16bb3424c98dfa59
2021-05-25 17:45:51 -07:00
Sam Shleifer
2be2f3c7c1 Plasma tests: ask for less disk (#1893)
Summary:
Old logs:
```
/arrow/cpp/src/plasma/store.cc:1274: Allowing the Plasma store to use up to 107.374GB of memory.
```

New logs:
```
... up to 1e-05GB of memory.
```

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1893

Reviewed By: myleott

Differential Revision: D28641488

Pulled By: sshleifer

fbshipit-source-id: 3373526042cdcbf434c61790be62a09f15e6ad06
2021-05-24 09:00:18 -07:00
Weiyi Zheng
425c36eaff support use_sharded_state on command line
Summary:
We wanted to use sharded_state because:
1. it saves memory
2. it supports sharded state loading, which allows MoE models' weights to live on their respective shards
I just added use_sharded_state as a config option, and added a unit test to make sure it runs fine.

Old revision's comment:
fairseq.FSDP has a flag use_sharded_state, but I had to address a couple of problems before being able to use it.
1. fairscale FSDP (FSDP for short) calls self.state_dict/load_state_dict, which have been overridden by fairseq.FSDP; this is not a desired behavior
2. the optimizer states shouldn't be sharded again when use_sharded_state is True
3. expose this option on the command line

Reviewed By: sshleifer

Differential Revision: D28375035

fbshipit-source-id: c2f59a9c62163405033f34ed595ba78528aea850
2021-05-14 18:53:16 -07:00
Sam Shleifer
97969ac5f5 --combine-valid-sets (#1843)
Summary:
- `--combine-valid-sets` causes valid.bin, valid1.bin, ... to be concatenated. All metrics will be reported together.
- `--valid-subsets` works the same. If you pass `--valid-subsets valid1,valid2` you get valid1_loss and valid2_loss logged separately.
- if the user passes `--valid-subset valid` (the default) and we see files named valid1, valid2, we raise an error. The user must pass `--ignore-unused-valid-sets` to override. This previously led to valid1, valid2 being silently ignored.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1843

Reviewed By: myleott

Differential Revision: D28323815

Pulled By: sshleifer

fbshipit-source-id: dfd46076d3f684e36f8dacfadd38fd0038ce6755
2021-05-10 23:43:24 -07:00
Yun Wang
d6855baec8 Simplify CountingIterator
Summary:
Simplified the implementation of `CountingIterator` and added test cases.

The old implementation could fail on such a test case:
```
ref = list(range(10))
itr = CountingIterator(ref)
first_item = next(itr)  # consume one item
remaining_items = list(itr)  # raises exception because of "length mismatch"
```
This happens because `list(itr)` invokes `itr.__iter__` and reiterates the underlying list from the start, but `itr.n` has already been incremented by `next(itr)`.

The new implementation is simpler and avoids such an error.
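
A sketch of the simplified design (illustrative, not the exact fairseq class): wrap a single underlying iterator and count in `__next__`, so a later `list(itr)` continues where `next(itr)` left off instead of restarting.
```
class CountingIteratorSketch:
    def __init__(self, iterable, start=0):
        self._itr = iter(iterable)   # one shared iterator, never restarted
        self.n = start

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._itr)       # StopIteration propagates naturally
        self.n += 1
        return item

ref = list(range(10))
itr = CountingIteratorSketch(ref)
first = next(itr)                    # consume one item
assert list(itr) == ref[1:] and itr.n == 10
```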

Reviewed By: myleott

Differential Revision: D27802505

fbshipit-source-id: c97fd0a27d865c0ff3b24016fa6aa0afabbf0a73
2021-04-29 16:17:00 -07:00
Sam Shleifer
05b86005bc Fix FSDP optim state loading (#1819)
Summary:
### Problem:
- if we consolidate the optim state dict on rank 0, ranks 1+ save `optimizer.state_dict()`. When they try to load, they call get_shard(last_optim_state), which is wrong since the optim state is already sharded. They should instead find the global consolidated optimizer state dict and load that.

### Possible Solutions:
- if world size is the same, you could just reuse the local OSD.
- [this PR] rank 1+ load optim state from the rank0 file and call get_shard
- separate file for optim_state that every rank loads. (like 'shared.pt' on `gshard-azure`). This will save some CPU Ram.

### Note:
- I don't think it's possible to pass `--use-sharded-state` from the command line. It should be, I think.

### Implementation here
+ if FSDP saves -1 as state['last_optimizer_key'], it means that, on load, rank 0's optim state must be loaded.
+ regression test

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1819

Reviewed By: zhengwy888

Differential Revision: D27910281

Pulled By: sshleifer

fbshipit-source-id: d34987008f77ce7e0cb28b7224dd2aabed38a70c
2021-04-21 15:50:13 -07:00
Michael Anderson
207254bf56 Adding check for filler size (#3495)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/3495

Avoid creating size-0 tensor "filler" in case src_len is the same as key_padding_mask_size or prev_key_padding_mask_size
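
The guard amounts to something like this sketch (variable names and sizes are illustrative, not the exact diff):
```
import torch

batch_size, src_len = 2, 7
key_padding_mask = torch.zeros(batch_size, 7, dtype=torch.bool)

# Only build and concatenate a filler when it would be non-empty, skipping
# the size-0 tensor when src_len already matches the mask width.
if src_len > key_padding_mask.size(1):
    filler = key_padding_mask.new_zeros(batch_size, src_len - key_padding_mask.size(1))
    key_padding_mask = torch.cat([key_padding_mask, filler], dim=1)
```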

Reviewed By: jackm321

Differential Revision: D27897778

fbshipit-source-id: 26fd95852da2cd932717c7abcac3e1fb43deaf77
2021-04-21 09:09:19 -07:00