fairseq/tests
Vimal Manohar 8feccf9441 EMA
Summary:
Adds an exponential moving average (EMA) model for Kaizen semi-supervised training (https://arxiv.org/abs/2106.07759).

1. Add `ema.store_ema` to enable storing the EMA. The EMA state is written to `extra_state` in the state dict when saving a checkpoint.
2. `ema.ema_start_update` controls when the EMA starts accumulating (see the config sketch below).
3. Tasks can use the `uses_ema` property to decide whether the EMA should be passed to the task (default is False).
4. `load_ema_from_checkpoint` can be used to load the EMA model in place of the regular model for evaluation. Pyspeech has an eval-ema option for this.
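
For illustration, here is a minimal sketch of how these options might be grouped as a config dataclass. The field names mirror the `ema.*` options above, but the grouping, the defaults, and the `EMAConfig` name itself are assumptions rather than fairseq's actual config class:

```
from dataclasses import dataclass


@dataclass
class EMAConfig:  # hypothetical mirror of the ema.* options listed above
    store_ema: bool = False    # if True, write the EMA state to extra_state["ema"] when checkpointing
    ema_decay: float = 0.999   # decay factor for the moving average (default value assumed)
    ema_start_update: int = 0  # update step at which the EMA starts accumulating
```

The module docstring describes the intended usage: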

```
This module has the EMA class used to store a copy of the exponentially decayed
model params.

Typical usage of the EMA class involves initializing an object using an existing
model (random or from a seed model) and setting config options such as ema_decay
and ema_start_update, which determine how the EMA model is updated. After every
update of the model, i.e. at the end of train_step, the EMA should be updated by
passing the new model to the EMA.step function. The EMA model state dict can be
stored in the extra state under the key "ema", dumped into a checkpoint, and
loaded back. The EMA object can be passed to tasks by setting the task.uses_ema
property.

EMA is a smoothed/ensemble model which may perform better when used for
inference or further fine-tuning. The EMA class has a reverse function to load
the EMA params into a model so it can be used like a regular model.
```
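
Below is a minimal sketch of the loop this docstring describes, assuming the `EMA.step` and `EMA.reverse` names mentioned above; the constructor signature and the decay arithmetic are assumptions, not the actual fairseq implementation:

```
import copy

import torch


class EMA:
    """Keep an exponentially decayed copy of a model's parameters (sketch)."""

    def __init__(self, model, decay=0.999, start_update=0):
        self.model = copy.deepcopy(model)  # holds the decayed params
        self.decay = decay
        self.start_update = start_update
        self.num_updates = 0

    @torch.no_grad()
    def step(self, new_model):
        """Call at the end of every train_step with the updated model."""
        self.num_updates += 1
        # Before ema_start_update, track the raw params (decay of 0) so the
        # average only starts accumulating once training has warmed up.
        decay = self.decay if self.num_updates >= self.start_update else 0.0
        for ema_p, p in zip(self.model.parameters(), new_model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

    def reverse(self, model):
        """Load the smoothed EMA params into `model` so it can be used
        like a regular model for inference or fine-tuning."""
        model.load_state_dict(self.model.state_dict())
        return model
```

At checkpoint time the EMA state dict would be stored under `extra_state["ema"]` and restored from there on load; buffers and FP16 handling are omitted here for brevity.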

Reviewed By: cruvadom

Differential Revision: D24238379

fbshipit-source-id: 879d3ba5070a614b7d365f9503af357001e875b2
2021-09-01 12:29:51 -07:00
distributed Add tests for fairseq.distributed.utils.all_gather_list (#1548) 2021-01-28 14:21:10 -08:00
gpu EMA 2021-09-01 12:29:51 -07:00
speech_recognition Enable Hydra configs in fairseq (#1343) (#1510) 2020-10-20 00:32:26 -07:00
__init__.py remediation of S205607 2020-07-17 17:21:51 -07:00
test_activation_checkpointing.py Make checkpoint wrapper pickleable (#1603) 2021-02-06 08:07:32 -08:00
test_amp_optimizer.py Add torch.cuda.amp support (#3460) 2021-05-26 14:39:10 -07:00
test_average_checkpoints.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_backtranslation_dataset.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_binaries.py MultiGPU test and --log-file workaround (#1793) 2021-04-21 06:39:00 -07:00
test_character_token_embedder.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_checkpoint_utils.py Move checkpoint state_dict creation into Trainer (#1666) 2021-03-04 13:32:44 -08:00
test_concat_dataset.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_constraints.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_convtbc.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_data_utils.py batch_by_size refactoring: 100x speedup and optimization of memory footprint 2020-12-28 21:05:51 -08:00
test_dataclass_utils.py Hierarchical Configs 2021-07-16 04:56:12 -07:00
test_dataset.py Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded) (#1667) 2021-03-04 13:32:46 -08:00
test_dictionary.py Extract File Chunking to its own utils (#1955) 2021-06-28 01:46:32 -07:00
test_ema.py EMA 2021-09-01 12:29:51 -07:00
test_export.py Improve torchscript compatibility of transformer and transformer pg (#3247) 2021-02-22 14:22:54 -08:00
test_file_chunker_utils.py Extract File Chunking to its own utils (#1955) 2021-06-28 01:46:32 -07:00
test_file_io.py Delete line that breaks gh ci (#1814) 2021-04-19 16:31:11 -07:00
test_fp16_optimizer.py end to end hydra configs (#1393) 2020-11-04 18:20:12 -08:00
test_huffman.py Indexed Huffman Coded dataset (#2029) 2021-08-31 01:12:35 -07:00
test_inference_dropout.py Enable Hydra configs in fairseq (#1343) (#1510) 2020-10-20 00:32:26 -07:00
test_iopath.py Support atomic saves for checkpoints (#1520) 2020-12-18 07:40:49 -08:00
test_iterators.py Simplify CountingIterator 2021-04-29 16:17:00 -07:00
test_label_smoothing.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_lm_context_window.py Fix --context-window and add test (#1526) 2020-12-23 18:35:54 -08:00
test_lstm_jitable.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_memory_efficient_fp16.py Enable Hydra configs in fairseq (#1343) (#1510) 2020-10-20 00:32:26 -07:00
test_metrics.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_multi_corpus_dataset.py optimize sampling process of multi_corpus_dataset 2021-03-03 19:31:40 -08:00
test_multi_corpus_sampled_dataset.py Relicense fairseq under MIT license (#786) 2019-07-30 07:48:23 -07:00
test_multihead_attention.py Adding check for filler size (#3495) 2021-04-21 09:09:19 -07:00
test_noising.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_online_backtranslation.py Obt 2 (#1614) 2021-03-30 09:56:03 -07:00
test_plasma_utils.py Plasma tests: ask for less disk (#1893) 2021-05-24 09:00:18 -07:00
test_reproducibility.py Add torch.cuda.amp support (#3460) 2021-05-26 14:39:10 -07:00
test_resampling_dataset.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_roberta.py Obt 2 (#1614) 2021-03-30 09:56:03 -07:00
test_sequence_generator.py fix beam search with prefix tokens (#2227) 2021-08-30 18:07:13 -07:00
test_sequence_scorer.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_sparse_multihead_attention.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_token_block_dataset.py TokenBlockDataset np type promotion issue (#1658) 2021-02-26 21:00:38 -08:00
test_train.py fixes tests/test_train.py to mock checkpoint.save_dir config node (#3675) 2021-07-06 15:07:31 -07:00
test_transformer.py fix MultiHeadAttention assert (#1798) 2021-04-14 04:59:59 -07:00
test_utils.py Apply black+isort (#1357) 2020-10-18 18:14:51 -07:00
test_valid_subset_checks.py Migrate DummyMaskedLMTask to FairseqTask (#3593) 2021-06-10 09:43:08 -07:00
utils.py MultiGPU test and --log-file workaround (#1793) 2021-04-21 06:39:00 -07:00