fairseq

mirror of https://github.com/facebookresearch/fairseq.git synced 2024-09-11 17:25:31 +03:00

History

Myle Ott 656d7e5779 Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded) (#1667)	2021-03-04 13:32:46 -08:00

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1667

Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded)

This enables full parameter and optimizer state sharding by using FullyShardedDataParallel (FSDP) from fairscale. The user only needs to pass `--ddp-backend=fully_sharded` to enable it. Other common options work out of the box (e.g., `--fp16`, `--memory-efficient-fp16`, `--update-freq`). This should be a drop-in replacement for the "c10d" backend.

This yields large speedups for small models. It also enables training ~13B parameter models on 8 GPUs and 175B parameter models on 128 GPUs, without model parallelism.

This also adds a new option, `--cpu-offload`, which offloads the optimizer state and the FP32 model copy to CPU. It is particularly useful when combined with `--optimizer=cpu_adam`.

Note: after enabling this, each GPU saves its own checkpoint file, since the optimizer state is sharded. Each checkpoint contains a single shard of the optimizer state, and the rank 0 checkpoint contains the full model weights.

Note: a known limitation of the current implementation is that you cannot resume training with a different world_size. This constraint will be relaxed in future iterations.

Test Plan: Imported from OSS

Reviewed By: sshleifer

Differential Revision: D26771144

Pulled By: myleott

fbshipit-source-id: 74c2f46f57719e24e2dcfc9d9ee7c2fc0aeedb46
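The commit message above describes FSDP training in terms of command-line flags. The following is a rough Python sketch of how those flags might be assembled, not fairseq's own code. Only `--ddp-backend=fully_sharded`, `--fp16`, `--update-freq`, `--cpu-offload` and `--optimizer=cpu_adam` come from the commit. The data directory (`data-bin/example`) and the architecture name are placeholders. `check_resume_world_size` is a hypothetical guard that restates the documented limitation: one optimizer shard is saved per GPU, and resuming on a different world_size is unsupported.

```python
import shlex
from typing import List


def fsdp_train_args(data_dir: str, arch: str, update_freq: int = 1,
                    fp16: bool = True, cpu_offload: bool = False) -> List[str]:
    """Build a fairseq-train argv that enables the fully_sharded backend.

    data_dir and arch are placeholders; the other flags are the ones named
    in the commit message.
    """
    args = [
        "fairseq-train", data_dir,
        "--arch", arch,
        "--ddp-backend=fully_sharded",   # FSDP: shard parameters + optimizer state
        f"--update-freq={update_freq}",  # gradient accumulation still applies
    ]
    if fp16:
        args.append("--fp16")
    if cpu_offload:
        # Offload optimizer state and the FP32 model copy to CPU; the commit
        # says this is most useful together with the cpu_adam optimizer.
        args += ["--cpu-offload", "--optimizer=cpu_adam"]
    return args


def check_resume_world_size(num_saved_shards: int, world_size: int) -> None:
    """Hypothetical guard: each rank saved exactly one optimizer-state shard,
    and resuming with a different world_size is not supported yet."""
    if num_saved_shards != world_size:
        raise ValueError(
            f"checkpoint has {num_saved_shards} shards but world_size is "
            f"{world_size}; resume with the original number of GPUs"
        )


if __name__ == "__main__":
    argv = fsdp_train_args("data-bin/example", "transformer_lm",
                           update_freq=2, cpu_offload=True)
    print(shlex.join(argv))  # shell-ready command line (Python 3.8+)
    check_resume_world_size(num_saved_shards=8, world_size=8)  # same size: OK
```

The argv is kept as a list rather than a single string so that it can be passed directly to `subprocess.run`. `shlex.join` is used only for display.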
..
distributed	Add tests for fairseq.distributed.utils.all_gather_list (#1548)	2021-01-28 14:21:10 -08:00
gpu	Fix NAT code (#1454)	2020-11-20 12:42:33 -08:00
speech_recognition	Enable Hydra configs in fairseq (#1343) (#1510)	2020-10-20 00:32:26 -07:00
__init__.py	remediation of S205607	2020-07-17 17:21:51 -07:00
test_activation_checkpointing.py	Make checkpoint wrapper pickleable (#1603)	2021-02-06 08:07:32 -08:00
test_average_checkpoints.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_backtranslation_dataset.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_binaries.py	Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded) (#1667)	2021-03-04 13:32:46 -08:00
test_character_token_embedder.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_checkpoint_utils.py	Move checkpoint state_dict creation into Trainer (#1666)	2021-03-04 13:32:44 -08:00
test_concat_dataset.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_constraints.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_convtbc.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_data_utils.py	batch_by_size refactoring: 100x speedup and optimization of memory footprint	2020-12-28 21:05:51 -08:00
test_dataset.py	Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded) (#1667)	2021-03-04 13:32:46 -08:00
test_dictionary.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_export.py	Improve torchscript compatibility of transformer and transformer pg (#3247)	2021-02-22 14:22:54 -08:00
test_file_io.py	ioPath async - Fairseq unittests (#1669)	2021-03-03 10:50:39 -08:00
test_fp16_optimizer.py	end to end hydra configs (#1393)	2020-11-04 18:20:12 -08:00
test_inference_dropout.py	Enable Hydra configs in fairseq (#1343) (#1510)	2020-10-20 00:32:26 -07:00
test_iopath.py	Support atomic saves for checkpoints (#1520)	2020-12-18 07:40:49 -08:00
test_iterators.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_label_smoothing.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_lm_context_window.py	Fix --context-window and add test (#1526)	2020-12-23 18:35:54 -08:00
test_lstm_jitable.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_memory_efficient_fp16.py	Enable Hydra configs in fairseq (#1343) (#1510)	2020-10-20 00:32:26 -07:00
test_metrics.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_multi_corpus_dataset.py	optimize sampling process of multi_corpus_dataset	2021-03-03 19:31:40 -08:00
test_multi_corpus_sampled_dataset.py	Relicense fairseq under MIT license (#786)	2019-07-30 07:48:23 -07:00
test_multihead_attention.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_noising.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_reproducibility.py	Support atomic saves for checkpoints (#1520)	2020-12-18 07:40:49 -08:00
test_resampling_dataset.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_sequence_generator.py	fastseq ngram blocking (#1509)	2020-12-30 12:58:09 -08:00
test_sequence_scorer.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_sparse_multihead_attention.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
test_token_block_dataset.py	TokenBlockDataset np type promotion issue (#1658)	2021-02-26 21:00:38 -08:00
test_train.py	Move checkpoint state_dict creation into Trainer (#1666)	2021-03-04 13:32:44 -08:00
test_utils.py	Apply black+isort (#1357)	2020-10-18 18:14:51 -07:00
utils.py	LASER training code (#1207)	2021-02-18 03:10:55 -08:00