Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1268
We previously had a memory leak when using sharded datasets. In particular,
each sharded dataset is a new FairseqDataset instance, and the cache is keyed
by the `dataset` instance. Since we never clear the cache, this would
eventually cause the system to run out of CPU RAM.
This diff disables caching when using sharded datasets.
Note that we also change the signature of `get_batch_iterator`, which needs to
propagate to many places. We previously avoided this update when adding
`data_buffer_size`, so I'm also adding that everywhere.
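A minimal sketch of the failure mode, assuming a cache keyed by the dataset instance; the `disable_iterator_cache` keyword below is illustrative and may not match the exact new signature:
```python
class FakeShardedDataset:
    """Stand-in for one sharded FairseqDataset instance."""
    pass

_iterator_cache = {}  # keyed by dataset instance, as described above

def get_batch_iterator(dataset, disable_iterator_cache=False):
    # hypothetical simplified version; the real change touches many callers
    if not disable_iterator_cache and dataset in _iterator_cache:
        return _iterator_cache[dataset]
    iterator = iter(range(10))  # placeholder for the real epoch batch iterator
    if not disable_iterator_cache:
        _iterator_cache[dataset] = iterator
    return iterator

# Without disabling the cache, every new shard leaves behind a stale entry:
for _ in range(3):
    get_batch_iterator(FakeShardedDataset())
assert len(_iterator_cache) == 3  # grows unboundedly over a long training run
```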
Reviewed By: ngoyal2707
Differential Revision: D23319135
fbshipit-source-id: 6bcd6aee141ad9cc234448c49106a8dbf8ea1800
Summary:
Incorporate several fixes, including some from OSS contributors:
- fix model argument in sequence generator in semisupervised_translation.py
- fix aggregate logging in semisupervised_translation.py
- Fix EOS token in multilingual_denoising
- Handle missing eos_idx in data_utils.collate_tokens (see the sketch after this list)
- Better OOM handling for single-GPU training
- fix prepend_bos argument in translation_from_pretrained_bart.py …
- Fix eos_idx in multilingual_denoising
- Small logging fixes
- Fix fb_hub on PyTorch 1.6
- Better variable names
- Add support for model parallel to interactive.py
- Use `//` operator to fix Integer division warning
- Set default `--clip-norm=0.0`
- Cleanup some binaries in root directory
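To illustrate the `eos_idx` item above, here is a hedged, simplified sketch of what handling a missing `eos_idx` in `data_utils.collate_tokens` can look like (the real function supports more options, e.g. left-padding):
```python
import torch

def collate_tokens(values, pad_idx, eos_idx=None, move_eos_to_beginning=False):
    # simplified stand-in for fairseq.data.data_utils.collate_tokens
    size = max(v.size(0) for v in values)
    res = values[0].new_full((len(values), size), pad_idx)
    for i, v in enumerate(values):
        if move_eos_to_beginning:
            # when no explicit eos_idx is given, fall back to the sequence's
            # own final token instead of failing on None
            res[i, 0] = v[-1] if eos_idx is None else eos_idx
            res[i, 1:v.size(0)] = v[:-1]
        else:
            res[i, :v.size(0)] = v
    return res

batch = collate_tokens([torch.tensor([5, 6, 2]), torch.tensor([7, 2])],
                       pad_idx=1, move_eos_to_beginning=True)
print(batch)  # tensor([[2, 5, 6], [2, 7, 1]])
```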
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1196
Reviewed By: ngoyal2707
Differential Revision: D22162202
Pulled By: myleott
fbshipit-source-id: 835b0c0ad9246827f9d915fdb4e89d7b5be2475d
Summary:
Sanitized vq-wav2vec implementation. I will also add docs for this. I have a fixed-up checkpoint that this code can load, and I verified that it produces the same results as what we used in the paper.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1029
Differential Revision: D20129246
Pulled By: alexeib
fbshipit-source-id: f72f455e0c309168e644ab86ec18c768c308da98
Summary:
Hi,
I think there is a minor mistake in the doc. The `--distributed-no-spawn` argument is needed for distributed training on multiple machines without `slurm`. Otherwise, the program will start 8 jobs on each GPU when `nproc_per_node=8`.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1188
Differential Revision: D17627778
Pulled By: myleott
fbshipit-source-id: 35ab6b650dc1132d7cb2d150e80d2ebf0caf3e69
Summary:
No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/891
Differential Revision: D16377132
Pulled By: myleott
fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7
Summary:
Notable (possibly breaking) changes:
- d45db80: Move checkpoint utility functions from utils.py into checkpoint_utils.py
- f2563c2: Move LM definitions into separate files
- dffb167: Updates to model API:
- `FairseqModel` -> `FairseqEncoderDecoderModel`
- add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer` (see the sketch after this list)
- `encoder_out_dict` -> `encoder_out`
- rm unused `remove_head` functions
- 34726d5: Move `distributed_init` into `DistributedFairseqModel`
- cf17068: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
- d45db80: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
- 96ac28d: Rename `--sampling-temperature` -> `--temperature`
- fc1a19a: Deprecate dummy batches
- a1c997b: Add memory mapped datasets
- 0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"
Plus many additional features and bugfixes
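To make the `extract_features`/`output_layer` split above concrete, here is a hedged toy sketch; only the method split mirrors the API change, the module internals are placeholders:
```python
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    def __init__(self, embed_dim=16, vocab_size=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, vocab_size)

    def extract_features(self, prev_output_tokens, encoder_out=None):
        # return hidden features before the output projection
        return self.embed(prev_output_tokens)

    def output_layer(self, features):
        # project features to vocabulary logits
        return self.proj(features)

    def forward(self, prev_output_tokens, encoder_out=None):
        # forward composes the two new methods
        return self.output_layer(self.extract_features(prev_output_tokens, encoder_out))

logits = ToyDecoder()(torch.tensor([[3, 4, 5]]))
print(logits.shape)  # torch.Size([1, 3, 100])
```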
Pull Request resolved: https://github.com/pytorch/fairseq/pull/817
Differential Revision: D15913844
Pulled By: myleott
fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/541
Just a combo of the stacked pair D14057943 & D14176011.
Made this a separate diff because there seems to be some issue with porting a stacked change into the GitHub repo.
Differential Revision: D14251048
fbshipit-source-id: 0a47f534a69d6ab2ebe035fba40fd51748cccfb8
- no more FP16Trainer; we just have an FP16Optimizer wrapper
- most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0 (see the sketch below)
- Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq
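A small sketch of the dummy-batch mechanism in the third bullet, under the assumption that multiplying the loss by 0 is enough to keep the backward pass (and any gradient all-reduce) in lockstep while contributing no gradient signal:
```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
dummy_batch = torch.zeros(2, 8)  # stand-in for the cached dummy sample

def train_step(batch, is_dummy):
    loss = model(batch).sum()
    if is_dummy:
        loss = loss * 0  # keeps the graph and collective-op shapes, zeroes gradients
    loss.backward()
    return loss

train_step(dummy_batch, is_dummy=True)
print(model.weight.grad.abs().sum())  # tensor(0.) -- no real gradient contribution
```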