Commit Graph

19 Commits

Author SHA1 Message Date
Vimal Manohar
6b7a7d6457 Fix EMA GPU test
Summary: The GPU test was broken after D33809223 (1b61bbad32)

Reviewed By: cruvadom

Differential Revision: D33931570

fbshipit-source-id: 37962a437d8e25b1dafc58db0efa55c1afa5f3ee
2022-02-04 09:10:06 -08:00
Diana Liskovich
7fddb9d960 lint fixes (#2834)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Applied `black` and `isort` to fix failing CI

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2834

Reviewed By: vedanuj

Differential Revision: D33262876

Pulled By: dianaml0

fbshipit-source-id: 03215c276fcddda9f7c78971bf6ed7c5ac21b2ee
2021-12-29 11:50:55 -08:00
Xian Li
7f3967805f add readme for xglm models (#2808)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Add readme and task for xglm models.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2808

Reviewed By: punitkoura

Differential Revision: D33237928

Pulled By: xianxl

fbshipit-source-id: 7773cf56e896210dab1f4311ae69f0e00c6d9aff
2021-12-20 13:05:17 -08:00
dianaml0
88e7d2586b fix flake8 issues (#2570)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
- [x] applies flake8 fixes to main branch (https://github.com/fairinternal/fairseq-py/issues/2546) - still more to be fixed

Fix GPU tests:
- [x] when the `torch.ao.quantization` import doesn't work, fall back to `torch.quantization` (see the sketch after this list)
- [x] build apex from an earlier commit in CircleCI so that it's compatible with PyTorch 1.8 and 1.9
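
A minimal sketch of the import fallback (the module paths are PyTorch's real ones; the downstream usage comment is illustrative):

```
# Newer PyTorch moved the quantization namespace to torch.ao.quantization;
# fall back to the legacy torch.quantization module when the new path is missing.
try:
    import torch.ao.quantization as quantization
except ImportError:
    import torch.quantization as quantization

# Downstream code can then use `quantization` uniformly, e.g.:
# qconfig = quantization.get_default_qconfig("fbgemm")
```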

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2570

Reviewed By: Mortimerp9

Differential Revision: D32955312

Pulled By: dianaml0

fbshipit-source-id: e163cbd4998f171f819e31b0682c1c0f1986f9e1
2021-12-09 02:34:30 -08:00
dianaml0
0dfd6b6240 Add linting with black (#2678)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2678

Reviewed By: Mortimerp9

Differential Revision: D32653381

Pulled By: dianaml0

fbshipit-source-id: 2810d14867cd7d64f4d340740e2b590b82de47fe
2021-11-29 12:32:59 -08:00
Vinayak Tantia
3a5838c320 Update implementation of SlowMo to match its Fairscale implementation (#3996)
Summary:
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
SlowMo is being moved to [Fairscale](https://fairscale.readthedocs.io/en/latest/). This commit updates the implementation of SlowMo to the Fairscale version and adds tests for SlowMo.
Note: This PR is currently for review. SlowMo is being merged into Fairscale as part of [a PR](https://github.com/facebookresearch/fairscale/pull/378); once that PR lands in Fairscale, this fairseq PR will be ready to merge.
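
A rough usage sketch, assuming the Fairscale API keeps the `SlowMoDistributedDataParallel` wrapper and its `perform_slowmo` post-step hook; treat the exact names and arguments as assumptions, not fairseq's final integration:

```
import torch
import torch.nn as nn
from fairscale.experimental.nn.data_parallel import SlowMoDistributedDataParallel

# Assumes torch.distributed is already initialized, one process per GPU.
model = SlowMoDistributedDataParallel(nn.Linear(16, 16).cuda(), nprocs_per_node=8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(10):
    x = torch.randn(4, 16).cuda()
    optimizer.zero_grad()
    model(x).sum().backward()
    optimizer.step()                 # regular base-optimizer step
    model.perform_slowmo(optimizer)  # periodic averaging + slow momentum update
```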

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3996

Reviewed By: dianaml0

Differential Revision: D32280163

Pulled By: vtantia

fbshipit-source-id: 70c97b04a7cdc90ada7099375c2a31b0c978ba70
2021-11-09 09:44:45 -08:00
Vimal Manohar
8feccf9441 EMA
Summary:
Adds an exponential moving average (EMA) model for Kaizen semi-supervised training: https://arxiv.org/abs/2106.07759

1. Add `ema.store_ema` to enable storing EMA. The EMA model will be written to extra_state in the state dict when saving a checkpoint.
2. `ema.ema_start_update` controls when the EMA starts accumulating.
3. Tasks can use the `uses_ema` property to decide if the EMA should be passed to the task (default is False).
4. `load_ema_from_checkpoint` can be used to load the EMA model in place of the model for evaluation. Pyspeech has an eval-ema option for this.

```
This module has the EMA class used to store a copy of the exponentially decayed
model params.

Typical usage of EMA class involves initializing an object using an existing
model (random or from a seed model) and setting the config like ema_decay,
ema_start_update which determine how the EMA model is updated. After every
update of the model i.e. at the end of the train_step, the EMA should be updated
by passing the new model to the EMA.step function. The EMA model state dict
can be stored in the extra state under the key of "ema" and dumped
into a checkpoint and loaded. The EMA object can be passed to tasks
by setting task.uses_ema property.
EMA is a smoothed/ensemble model which might have better performance
when used for inference or further fine-tuning. EMA class has a
reverse function to load the EMA params into a model and use it
like a regular model.
```
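
A minimal sketch of that contract (the `step` and `reverse` names come from the docstring above; the decay handling is simplified and ignores buffers, fp32 copies, and ema_start_update):

```
import copy
import torch

class EMA:
    """Keeps an exponentially decayed copy of a model's parameters."""

    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.model = copy.deepcopy(model)  # the smoothed copy
        for p in self.model.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def step(self, new_model):
        # Call at the end of every train_step with the freshly updated model.
        for ema_p, p in zip(self.model.parameters(), new_model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

    def reverse(self, model):
        # Load the EMA params into a regular model for inference or fine-tuning.
        model.load_state_dict(self.model.state_dict())
        return model
```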

Reviewed By: cruvadom

Differential Revision: D24238379

fbshipit-source-id: 879d3ba5070a614b7d365f9503af357001e875b2
2021-09-01 12:29:51 -07:00
Gagandeep Singh
237184e522 Add torch.cuda.amp support (#3460)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/3282
Adds support for `torch.cuda.amp`.
AMP can be enabled with `--amp`, as an alternative to the already-present full-fp16 support enabled with `--fp16`.
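
For reference, this is the standard `torch.cuda.amp` pattern that `--amp` turns on (a generic training-loop sketch, not fairseq's actual trainer code):

```
import torch

model = torch.nn.Linear(16, 16).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

for _ in range(10):
    x = torch.randn(4, 16).cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # runs eligible ops in reduced precision
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales grads; skips the step on inf/nan
    scaler.update()
```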

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3460

Reviewed By: sshleifer, msbaines

Differential Revision: D27932253

Pulled By: myleott

fbshipit-source-id: 21637aefb5e788c59bf4f3c5de6c4a80f7319543
2021-05-26 14:39:10 -07:00
Weiyi Zheng
8df9e3a4a5 support FSDP sharded_state checkpoint loading during inference
Summary:
Using the very useful feature added by QuentinDuval in https://github.com/facebookresearch/fairscale/pull/683/files, we can consolidate sharded states into a full, regular state. This allows inference on sharded states almost transparently.

The main complexity comes from trying to be smart about what kind of checkpoint the user wants to load; not sure if this is over-engineering:
1. if the file checkpoint-shard0.pt exists and `--checkpoint-shard-count` is > 1, we load a sharded FSDP checkpoint
2. if checkpoint-shard0.pt exists but `--checkpoint-shard-count` is 1, we load a consolidated FSDP checkpoint
3. if checkpoint-shard0.pt does not exist but `--checkpoint-shard-count` is > 1, we load a model-parallel checkpoint
4. otherwise we are loading a single, plain checkpoint

In theory we could be even smarter and load shard0.pt to check how many more checkpoints are needed. This is not implemented, though it would save the user from having to specify `--checkpoint-shard-count`.
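
The four cases as a sketch (the function and argument names are illustrative, not fairseq's actual loader):

```
import os

def checkpoint_kind(path_prefix, checkpoint_shard_count):
    # Mirrors the four cases above; file naming follows the summary.
    shard0 = f"{path_prefix}-shard0.pt"
    if os.path.exists(shard0):
        if checkpoint_shard_count > 1:
            return "fsdp_sharded"       # case 1: sharded FSDP checkpoint
        return "fsdp_consolidated"      # case 2: consolidated FSDP checkpoint
    if checkpoint_shard_count > 1:
        return "model_parallel"         # case 3: model-parallel checkpoint
    return "plain"                      # case 4: single, plain checkpoint
```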

Reviewed By: sshleifer

Differential Revision: D28563441

fbshipit-source-id: dcafcaa7c9eaf5c9ff94f55c16bb3424c98dfa59
2021-05-25 17:45:51 -07:00
Weiyi Zheng
425c36eaff support use_sharded_state on command line
Summary:
We wanted to use sharded_state because:
1. it saves memory
2. it supports sharded state loading, which allows MoE models' weights to live on their respective shards

I added use_sharded_state as a config option and added a unit test to make sure it runs fine.

Comment from an older revision:
fairseq.FSDP has a flag use_sharded_state, but I had to address a couple of problems before being able to use it:
1. fairscale FSDP (FSDP for short) calls self.state_dict/load_state_dict, which have been overridden by fairseq.FSDP; this is not desired behavior
2. the optimizer states shouldn't be sharded again when use_sharded_state is True
3. the option needed to be exposed on the command line
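
A sketch of exposing such a flag through a fairseq-style dataclass config (the field name comes from the summary; the surrounding dataclass is illustrative):

```
from dataclasses import dataclass, field

@dataclass
class DistributedTrainingConfig:
    # Illustrative subset of a fairseq-style config dataclass.
    use_sharded_state: bool = field(
        default=False,
        metadata={"help": "save/load each FSDP worker's shard of the state dict"},
    )
```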

Reviewed By: sshleifer

Differential Revision: D28375035

fbshipit-source-id: c2f59a9c62163405033f34ed595ba78528aea850
2021-05-14 18:53:16 -07:00
Sam Shleifer
05b86005bc Fix FSDP optim state loading (#1819)
Summary:
### Problem:
- if we consolidate the optimizer state dict on rank 0, ranks 1+ save `optimizer.state_dict()`. When they try to load, they call get_shard(last_optim_state), which is wrong since the optim state is already sharded. They should instead find the global consolidated optimizer state dict and load that.

### Possible Solutions:
- if the world size is the same, you could just reuse the local optimizer state dict (OSD).
- [this PR] ranks 1+ load the optimizer state from the rank 0 file and call get_shard
- a separate file for optim_state that every rank loads (like 'shared.pt' on `gshard-azure`). This would save some CPU RAM.

### Note:
- I don't think it's possible to pass `--use-sharded-state` from the command line. It should be, I think.

### Implementation here
+ if FSDP saves -1 as state['last_optimizer_key'], it means that, on load, rank 0's optim state must be loaded.
+ regression test
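
A sketch of the loading fix (helper names are illustrative; `get_shard` here is a stand-in for FSDP's real re-sharding logic):

```
import torch

def load_optimizer_state(rank, rank0_path, local_path, use_sharded_state=False):
    """Mirrors the fix: with consolidated optimizer state, every rank reads
    the global state saved by rank 0 and then takes its shard, instead of
    treating its own (already full) copy as a shard."""
    if use_sharded_state:
        # Each rank saved only its own shard; load it directly.
        return torch.load(local_path)["last_optimizer_state"]
    full_state = torch.load(rank0_path)["last_optimizer_state"]
    return get_shard(full_state, rank)

def get_shard(full_state, rank):
    # Stand-in for FSDP's sharding utility; real re-sharding is more involved.
    raise NotImplementedError("use the FSDP wrapper's sharding helpers here")
```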

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1819

Reviewed By: zhengwy888

Differential Revision: D27910281

Pulled By: sshleifer

fbshipit-source-id: d34987008f77ce7e0cb28b7224dd2aabed38a70c
2021-04-21 15:50:13 -07:00
Sam Shleifer
da0432a3cd MultiGPU test and --log-file workaround (#1793)
Summary:
The initial problem I set out to solve was that it's not easy to add a multi-GPU test. I solved that problem, but it ruined log capturing, both with `self.assertLogs` and with `contextlib.redirect_stdout(StringIO())`.

After some brief digging, I gave up on trying to get those to work and added support for `--log-file AGI_v0.log`, which writes the `progress_bar.log()` statements to the log file as well as to `stdout`. This functionality is used by the resumption test.
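
Under the hood this is plain Python logging with two handlers; a sketch of the pattern (not the actual progress_bar code):

```
import logging
import sys

logger = logging.getLogger("fairseq_train")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))  # normal stdout output
logger.addHandler(logging.FileHandler("AGI_v0.log"))  # the --log-file target

logger.info("epoch 001 | loss 4.200")  # lands in both stdout and the file
```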

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1793

Reviewed By: myleott

Differential Revision: D27671192

Pulled By: sshleifer

fbshipit-source-id: bcba5f9df7a965889a4cd6993f7eeb0f14b770c6
2021-04-21 06:39:00 -07:00
Myle Ott
d464af2feb Fix NAT code (#1454)
Summary:
D23752010 (add65adcc5) broke some GPU-only tests for NAT (non-autoregressive translation).

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1454

Test Plan: Imported from OSS

Reviewed By: jmp84

Differential Revision: D25108461

Pulled By: myleott

fbshipit-source-id: f32b890221578c421944d6f9a49f06ef1dc075c6
2020-11-20 12:42:33 -08:00
Myle Ott
2d900bf308 Fix tests (#1352)
Summary:
We need to keep `--num-workers=0` during tests.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1352

Reviewed By: alexeib

Differential Revision: D24375411

Pulled By: myleott

fbshipit-source-id: 9975ed5405f3b19b4dd0877ca15ee3081b185942
2020-10-16 17:36:13 -07:00
Myle Ott
7c292af66f Fix hub (#2687)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2687

Reviewed By: alexeib

Differential Revision: D24095130

Pulled By: myleott

fbshipit-source-id: 7d371bccb550ec68b2b9b39dfa4c0718356508d6
2020-10-02 19:02:01 -07:00
Stanislau Hlebik
698e3b91ff remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:21:51 -07:00
Stanislau Hlebik
7ea5e3b341 remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:21:45 -07:00
Myle Ott
5abc774eea Re-enable test_transformer_fp16 GPU test
Reviewed By: theweiho

Differential Revision: D21890628

fbshipit-source-id: 4088884dd2a82a831f1c129e675eb233c469242a
2020-06-05 06:06:20 -07:00
Wei Ho
ea092c2aa6 Split out fairseq GPU tests & add new deeplearning_fairseq_gpu contbuild using remote execution
Reviewed By: myleott

Differential Revision: D21472387

fbshipit-source-id: efde278baf6a05e8a81a9630b44c7e7e7c7fe7fc
2020-06-03 18:53:35 -07:00