Commit Graph

1376 Commits

Author SHA1 Message Date
Stanislau Hlebik
698e3b91ff remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:21:51 -07:00
Stanislau Hlebik
7ea5e3b341 remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:21:45 -07:00
James Cross
3655cf266e optional limit on total training time (#2333)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2333

This change adds a new option (`--stop-time-hours`) which, if specified, limits the total training time to that number of hours. In order to stop training within the inner training loop (after the first update exceeding the time limit), the starting time is stored on the trainer.

In addition, in order to persist the training time when restoring from checkpoints (important because training runs are sometimes killed due to resource constraints), the training time already completed is stored as extra state in the checkpoints (this change remains backward compatible with existing checkpoints).
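
A minimal sketch of how such a wall-clock limit could be enforced; the names (`stop_time_hours`, `previous_training_time`) are illustrative assumptions, not the exact fairseq implementation:

```
import time

class TimeLimitedTrainer:
    """Toy trainer illustrating a wall-clock training-time limit (hypothetical names)."""

    def __init__(self, stop_time_hours=0.0, previous_training_time=0.0):
        self.stop_time_hours = stop_time_hours                   # 0 means "no limit"
        self._previous_training_time = previous_training_time    # seconds restored from checkpoint
        self._start_time = time.time()                           # starting time stored on the trainer

    def cumulative_training_time(self):
        # time completed in earlier runs + time spent in the current run
        return self._previous_training_time + (time.time() - self._start_time)

    def checkpoint_extra_state(self):
        # persisted in the checkpoint so the limit survives restarts
        return {"train_time": self.cumulative_training_time()}

    def should_stop(self):
        if self.stop_time_hours <= 0:
            return False
        return self.cumulative_training_time() / 3600.0 > self.stop_time_hours
```

In the inner training loop, `should_stop()` would be checked after each update so training breaks out as soon as the limit is exceeded.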

Reviewed By: myleott

Differential Revision: D22573166

fbshipit-source-id: 01c59274a1c196acc8a3a0243814167e1d368b1a
2020-07-16 17:45:07 -07:00
Duc Le
75d354c92b NNLM training in PySpeech
Summary:
Enable support for NNLM training in PySpeech. This implementation slightly modifies Fairseq's `LanguageModelingTask` in a few ways:

1. `source` and `input` used during training are slightly different (see `_maybe_add_bos` under `PySpeechLMDataset`).
2. The underlying model is `PySpeechEncoderModel` instead of `FairseqDecoder`. This lets us interface more easily with PySpeech, and the jitted model can easily be used in C++.

Reviewed By: jay-mahadeokar

Differential Revision: D22077479

fbshipit-source-id: 4918b26ba78de8786870060ada0bc3d3a28d64b0
2020-07-16 10:57:37 -07:00
Myle Ott
77df83ab6e Consolidate distributed init code into distributed_utils.call_main (#1218)
Summary:
We use `distributed_utils.call_main` in most of the other CLI tools (e.g., generate.py, eval_lm.py), but not train.py.

The only place where they're different is that train.py supports TPUs and the `after_distributed_init_fn` hook. We can add that support to `distributed_utils.call_main` and merge them.
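
A hedged sketch of the kind of single entry point being described; the signature and the `after_distributed_init_fn` hook below are assumptions for illustration, not the exact `distributed_utils.call_main` API:

```
import torch.distributed as dist

def call_main(args, main, after_distributed_init_fn=None, **kwargs):
    """Hypothetical unified dispatcher: set up distributed state, then run main()."""
    if getattr(args, "distributed_world_size", 1) > 1:
        dist.init_process_group(
            backend="nccl",
            init_method=args.distributed_init_method,
            world_size=args.distributed_world_size,
            rank=args.distributed_rank,
        )
        if after_distributed_init_fn is not None:
            # hook point (e.g., TPU- or backend-specific setup) that runs
            # once the process group exists
            args = after_distributed_init_fn(args)
    return main(args, **kwargs)
```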

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1218

Reviewed By: jhcross

Differential Revision: D22556771

Pulled By: myleott

fbshipit-source-id: 4f7110155f5f5d96905ef0bd17a4aa243ec8c443
2020-07-16 10:00:13 -07:00
Yuqing Tang
e52d071ee8 Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: new multilingual task
Summary:
A first version of XLNMT multilingual project code release: Multilingual Training with multiple bitext

- A new task to glue all things together: fairseq/tasks/translation_multi_simple_epoch.py
- Minor changes to
    - fairseq/data/iterators.py to allow dynamic batch sampler
    - fairseq/checkpoint_utils.py to add a finetuning option instead of using restore_file, which would restore from the original model when the job is requeued.

Reviewed By: pipibjc

Differential Revision: D22483484

fbshipit-source-id: 283b67e538508f330b0968609b7dae64d26bea05
2020-07-16 09:34:29 -07:00
Yuqing Tang
033daef0fc Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: multilingual dataset manager
Summary:
A first version of XLNMT multilingual project code release: Multilingual Training with multiple bitext

- Major work is in fairseq/data/multilingual
    -  fairseq/data/multilingual/multilingual_data_manager.py to support a few sophisticated multilingual data combinations
    -  fairseq/data/multilingual/sampling_method.py to support basic sampling functions
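
As an illustration of the kind of basic sampling function mentioned in the last item, here is a minimal temperature-based sampler over dataset sizes (a common choice in multilingual training); the function name and its use here are assumptions, not the actual contents of sampling_method.py:

```
def temperature_sampling_probs(dataset_sizes, temperature=1.0):
    """Return sampling probabilities proportional to size**(1/T).

    T=1 reproduces size-proportional sampling; larger T flattens the
    distribution toward uniform, upsampling low-resource language pairs.
    """
    weights = [size ** (1.0 / temperature) for size in dataset_sizes]
    total = sum(weights)
    return [w / total for w in weights]

# Example: three language pairs with very different amounts of bitext.
print(temperature_sampling_probs([1_000_000, 100_000, 10_000], temperature=5.0))
```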

Reviewed By: pipibjc

Differential Revision: D22483471

fbshipit-source-id: 3d9d2643877a29333915975020e419508887b3ae
2020-07-16 09:34:28 -07:00
Yuqing Tang
c0b5226853 Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: new datasets (#1205)
Summary:
A first version of XLNMT multilingual project code release: Multilingual Training with multiple bitext

- Major work is in fairseq/data/multilingual
   - fairseq/data/multilingual/sampled_multi_dataset.py to enable sampling and virtual data sizes
   - fairseq/data/multilingual/sampled_multi_epoch_dataset.py to enable a virtual epoch data size so training can start without going through the whole dataset (which reduces loading from 1.5 hours to <30 seconds)
    - [next diff] fairseq/data/multilingual/multilingual_data_manager.py to support a few sophisticated multilingual data combinations
    - [next diff] fairseq/data/multilingual/sampling_method.py to support basic sampling functions
- [next diff] A new task to glue all things together: fairseq/tasks/translation_multi_simple_epoch.py
- Minor changes to
    - fairseq/data/language_pair_dataset.py to (1) have language IDs in the batch if they are set, and (2) allow a preset max_size of batch; plus corresponding changes to fairseq/data/data_utils.py
    - [next diff] fairseq/data/denoising_dataset.py to (1) allow additional transformation; (2) allow a preset max_size of batch;
    - [next diff] fairseq/data/iterators.py to allow dynamic batch sampler
    - [next diff] fairseq/checkpoint_utils.py to add a finetuning option instead of using restore_file, which would restore from the original model when the job is requeued.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1205

Test Plan:
buck test mode/dev //deeplearning/projects/fairseq-py:test_cpu -- 'test_translation_multi_simple_epoch \(tests\.test_binaries\.TestTranslation\)'

https://our.intern.facebook.com/intern/testinfra/testrun/3659174727046259

Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3659174727046259
      ✓ deeplearning/projects/fairseq-py:test_cpu - test_translation_multi_simple_epoch (tests.test_binaries.TestTranslation) 331.967 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3659174727046259
Summary (total time 352.88s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Reviewed By: myleott

Differential Revision: D22463947

Pulled By: tangyuq

fbshipit-source-id: e430c040231035af73141dc736960bd972bd4b6e
2020-07-16 09:34:23 -07:00
Mandeep Baines
84896af72c Fix memory-efficient-fp16 when using update_freq other than 1 (#1219)
Summary:
Tested the following model and verified that gnorms and losses match
the following commit:

commit 3b7cf75584
Author: m_fomicheva <mari.fomicheva@gmail.com>
Date:   Wed Jul 8 13:04:55 2020 -0700

The loss and gnorm are identical to the number of digits reported in the logs, and the ppl matches to many significant digits.

Thanks again to Jun Ru for reporting.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1219

Test Plan:
CUDA_VISIBLE_DEVICES=0 fairseq-train --task language_modeling   data-bin/wikitext-103   --save-dir checkpoints/transformer_wikitext-103   --arch transformer_lm --share-decoder-input-output-embed   --dropout 0.1   --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0   --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07   --tokens-per-sample 512 --sample-break-mode none   --max-tokens 2048 --update-freq 16   --max-update 50000  --memory-efficient-fp16 --no-progress-bar --log-interval 1 --seed 4

Before (commit 3b7cf755):

2020-07-15 12:17:28 | INFO | train_inner | epoch 001:     45 / 3151 loss=19.083, ppl=555252, wps=7165.8, ups=0.22, wpb=32768, bsz=64, num_updates=41, lr=5.22398e-06, gnorm=6.895, loss_scale=8, train_wall=5, wall=208
2020-07-15 12:17:33 | INFO | train_inner | epoch 001:     46 / 3151 loss=19.042, ppl=539620, wps=7176.6, ups=0.22, wpb=32768, bsz=64, num_updates=42, lr=5.34895e-06, gnorm=6.662, loss_scale=8, train_wall=5, wall=213
2020-07-15 12:17:37 | INFO | train_inner | epoch 001:     47 / 3151 loss=18.908, ppl=492042, wps=7188.8, ups=0.22, wpb=32768, bsz=64, num_updates=43, lr=5.47393e-06, gnorm=6.231, loss_scale=8, train_wall=5, wall=217
2020-07-15 12:17:42 | INFO | train_inner | epoch 001:     48 / 3151 loss=18.894, ppl=487224, wps=7192, ups=0.22, wpb=32768, bsz=64, num_updates=44, lr=5.5989e-06, gnorm=6.078, loss_scale=8, train_wall=5, wall=222
2020-07-15 12:17:47 | INFO | train_inner | epoch 001:     49 / 3151 loss=18.829, ppl=465781, wps=7182.5, ups=0.22, wpb=32768, bsz=64, num_updates=45, lr=5.72388e-06, gnorm=5.819, loss_scale=8, train_wall=5, wall=226
2020-07-15 12:17:51 | INFO | train_inner | epoch 001:     50 / 3151 loss=18.752, ppl=441564, wps=7185.4, ups=0.22, wpb=32768, bsz=64, num_updates=46, lr=5.84885e-06, gnorm=5.521, loss_scale=8, train_wall=5, wall=231

After:

2020-07-15 15:13:10 | INFO | train_inner | epoch 001:     45 / 3151 loss=19.083, ppl=555249, wps=7220.5, ups=0.22, wpb=32768, bsz=64, num_updates=41, lr=5.22398e-06, gnorm=6.895, loss_scale=8, train_wall=5, wall=207
2020-07-15 15:13:14 | INFO | train_inner | epoch 001:     46 / 3151 loss=19.042, ppl=539617, wps=7216.3, ups=0.22, wpb=32768, bsz=64, num_updates=42, lr=5.34895e-06, gnorm=6.662, loss_scale=8, train_wall=5, wall=212
2020-07-15 15:13:19 | INFO | train_inner | epoch 001:     47 / 3151 loss=18.908, ppl=492041, wps=7220.8, ups=0.22, wpb=32768, bsz=64, num_updates=43, lr=5.47393e-06, gnorm=6.231, loss_scale=8, train_wall=5, wall=216
2020-07-15 15:13:24 | INFO | train_inner | epoch 001:     48 / 3151 loss=18.894, ppl=487228, wps=7229.4, ups=0.22, wpb=32768, bsz=64, num_updates=44, lr=5.5989e-06, gnorm=6.078, loss_scale=8, train_wall=5, wall=221
2020-07-15 15:13:28 | INFO | train_inner | epoch 001:     49 / 3151 loss=18.829, ppl=465783, wps=7231.2, ups=0.22, wpb=32768, bsz=64, num_updates=45, lr=5.72388e-06, gnorm=5.819, loss_scale=8, train_wall=5, wall=225
2020-07-15 15:13:33 | INFO | train_inner | epoch 001:     50 / 3151 loss=18.752, ppl=441559, wps=7224.5, ups=0.22, wpb=32768, bsz=64, num_updates=46, lr=5.84885e-06, gnorm=5.521, loss_scale=8, train_wall=5, wall=230

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌

Reviewed By: myleott

Differential Revision: D22560914

Pulled By: msbaines

fbshipit-source-id: f2fdc3daa46de0b75f26cb4d5712e92d1a820d60
2020-07-16 08:10:10 -07:00
Mandeep Baines
9c21a715d6 Fix regression in memory-efficient-fp16 (#1216)
Summary:
The fused_adam optimizer divides by the scale while our logic
multiplies by the scale. I'm surprised this even worked: the
first few iterations had nearly identical loss with the old
code and even converged.

However, Jun Ru noticed that the losses are very different after
more iterations.
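
A small sketch of the sign convention at issue, assuming nothing about the actual fused_adam internals: gradients computed from `loss * scale` must be divided by `scale` (multiplied by its inverse) before the update, and applying the wrong factor silently changes the effective gradients.

```
import torch

scale = 128.0
true_grad = torch.tensor([0.02, -0.01])
scaled_grad = true_grad * scale          # what backward() produces under loss scaling

# Correct unscaling: divide by the scale (or multiply by its inverse).
recovered = scaled_grad / scale
assert torch.allclose(recovered, true_grad)

# Bug pattern: multiplying by the scale instead of dividing blows the grads up by scale**2.
wrong = scaled_grad * scale
print(wrong / true_grad)                 # tensor([16384., 16384.])  == scale**2
```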

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1216

Reviewed By: myleott, shruti-bh

Differential Revision: D22536377

Pulled By: msbaines

fbshipit-source-id: 9328a1764a1895572c18567f99bee3330f25179e
2020-07-15 17:13:49 -07:00
Mandeep Baines
a541b19d85 Add dummy task for translation benchmarking (#1212)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1212

Test Plan:
python train.py \
    -a transformer \
    --clip-norm 0.4 --optimizer adam --lr 0.001 \
    --dropout 0.0 \
    --decoder-layers 7 \
    --encoder-layers 7 \
    --encoder-ffn-embed-dim 2048 \
    --decoder-ffn-embed-dim 2048 \
    --encoder-embed-dim 1024 \
    --decoder-embed-dim 1024 \
    --max-tokens 8192 \
    --criterion cross_entropy --max-update 50 \
    --attention-dropout 0.0 \
    --adam-betas '(0.9, 0.98)' \
    --disable-validation --no-save \
    --task dummy_mt

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌

Reviewed By: myleott

Differential Revision: D22484873

Pulled By: msbaines

fbshipit-source-id: bc61165ab91290d0b6aa2077c968ab537bce8a6a
2020-07-15 16:09:13 -07:00
Myle Ott
ffecb4e349 Small fixes (#1215)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1215

Reviewed By: ngoyal2707, msbaines

Differential Revision: D22514719

Pulled By: myleott

fbshipit-source-id: 5f15ba501fd66af1eb49b5702aff940f06c3d91f
2020-07-14 14:17:13 -07:00
Aditya Pillai
5d88d379ca bug fix: use cls.load_dictionary for multilingual translation
Summary:
Currently, multilingual translation imports Dictionary and calls its load function.

However, this does not permit extending the class with a different load_dictionary function to modify its behavior.
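
A toy sketch of the pattern the fix enables: routing dictionary loading through the task's own classmethod so a subclass can override it. The class and method names below are illustrative, not the actual fairseq task code.

```
class BaseMultilingualTask:
    """Toy illustration of why dictionary loading should go through a classmethod."""

    def __init__(self, dicts):
        self.dicts = dicts

    @classmethod
    def load_dictionary(cls, filename):
        # stand-in for Dictionary.load(filename)
        with open(filename) as f:
            return [line.split()[0] for line in f]

    @classmethod
    def setup_task(cls, dict_paths):
        # Calling cls.load_dictionary (rather than a hard-coded Dictionary.load)
        # lets subclasses change loading behavior without copying setup_task.
        return cls([cls.load_dictionary(p) for p in dict_paths])


class LowercasedDictTask(BaseMultilingualTask):
    @classmethod
    def load_dictionary(cls, filename):
        return [w.lower() for w in super().load_dictionary(filename)]
```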

Reviewed By: myleott, chtran

Differential Revision: D22441356

fbshipit-source-id: b0ef159182b15adb479b117581ddcd2f65724980
2020-07-09 13:22:26 -07:00
Mandeep Baines
16e9661bd9 avoid fp16 unscales and multiply_grads (#1201)
Summary:
The goal of this patch is to avoid fp16 unscale calls, which can
potentially under/over-flow, by 1) scaling grad_norm instead of
unscaling grads before calculating grad_norm, and 2) using the scale
argument to step (if supported by the optimizer). By letting the
optimizer scale, we avoid multiply_grads (saving on GPU compute/memory).
We also get better precision, since the unscale occurs in the kernel,
resulting in an FP32 unscaled grad instead of an FP16 unscaled grad.

A side-effect of this patch is a noticeable WPS win due to a
multi-tensor kernel being used for grad_norm and because we
avoid multiply_grads.
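
A minimal sketch of the first idea, assuming a plain list of fp16 gradients: take the norm of the still-scaled gradients and unscale the single scalar result, instead of first multiplying every gradient tensor by `1/scale`.

```
import torch

def scaled_grad_norm(params, loss_scale):
    """Gradient norm without unscaling each grad tensor first.

    norm(g / s) == norm(g) / s, so we can take the norm of the scaled fp16
    grads and unscale the one scalar result in fp32, avoiding a
    multiply_grads pass and the fp16 under/overflow it risks.
    """
    grads = [p.grad for p in params if p.grad is not None]
    total = torch.norm(torch.stack([torch.norm(g.float()) for g in grads]))
    return total / loss_scale
```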

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1201

Test Plan:
Verified grad_norm and loss before and after.

Before:

epoch 001 | loss 19.506 | ppl 744403 | wps 13966.7 | ups 0.21 | wpb 65536 | bsz 128 | num_updates 50 | lr 6.34875e-06 | gnorm 8.173 | loss_scale 10 | train_wall 250 | wall 259

After:

epoch 001 | loss 19.506 | ppl 744363 | wps 14003 | ups 0.21 | wpb 65536 | bsz 128 | num_updates 50 | lr 6.34875e-06 | gnorm 8.173 | loss_scale 10 | train_wall 250 | wall 258

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌

Reviewed By: myleott

Differential Revision: D22251842

Pulled By: msbaines

fbshipit-source-id: e6d82cdd3c95e7770835abe054db4b50e6ad569e
2020-07-08 14:47:36 -07:00
m_fomicheva
2887663811 Implemented applying dropout at inference time (#2308)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2308

Implemented Monte Carlo dropout. Added README to reproduce the results from our paper
that applies this idea for unsupervised quality estimation of NMT (joint work of Facebook AI and the University of Sheffield):

Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia. Unsupervised Quality Estimation for Neural Machine Translation. Accepted to TACL

Retaining dropout at test time is not possible in the current code base. The statement
```
if not self.retain_dropout:
  model.eval()
```
in `SequenceGenerator` does not have any effect, since the model's `training` attribute is already set to False by the method `make_generate_fast_`, which is applied before initializing `SequenceGenerator` in `generate.py`. `make_generate_fast_` throws an exception when trying to set `training` to True after its application. Also, if I am not mistaken, `self.training=True` can have other effects, so setting it to True only for the purpose of retaining dropout at test time might be confusing. I propose an alternative implementation where `retain_dropout` is an attribute of the FairseqModel class.
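
A hedged sketch of retaining dropout at inference (Monte Carlo dropout) in plain PyTorch; the `retain_dropout` attribute described above is fairseq-specific, and the helper below is just one generic way to get the same effect.

```
import torch
import torch.nn as nn

def enable_mc_dropout(model: nn.Module):
    """Put the model in eval mode but keep Dropout modules stochastic."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()  # dropout stays active; other eval-time behavior is untouched

@torch.no_grad()
def mc_predict(model, x, n_samples=10):
    enable_mc_dropout(model)
    outs = torch.stack([model(x) for _ in range(n_samples)])
    return outs.mean(0), outs.std(0)   # predictive mean and a simple uncertainty estimate
```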

# Before submitting

- [N] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [Y] Did you make sure to update the docs?
- [Y] Did you write any new necessary tests?

## What does this PR do?
New feature.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2151

Reviewed By: ngoyal2707

Differential Revision: D22048889

Pulled By: myleott

fbshipit-source-id: 0d0d4784a7314fc7a45b76341fd3b8232b3e2cf0
2020-07-08 13:06:13 -07:00
Myle Ott
d73e543e38 Update LinformerSentenceEncoder to inherit from TransformerSentenceEncoder
Summary:
It seems we can make this work by setting `compress_layer` in `build_transformer_sentence_encoder_layer` and adding an "init_fn" callback.

Doing this refactoring now since the stacked diff (D22048889) broke Linformer training, so safer to inherit from TransformerSentenceEncoder directly.

Reviewed By: ngoyal2707

Differential Revision: D22411012

fbshipit-source-id: d4ecb71eedd6ddf49abbb1e700d0f2af24e39e5a
2020-07-08 13:06:10 -07:00
Wei Ho
9f92b05e2a TorchElastic for fairseq FBTranslate
Summary: Use TorchElastic for multi-node, multi-GPU training

Reviewed By: cndn

Differential Revision: D22083634

fbshipit-source-id: 3673308671b0bc985b6012ee5327d604d995409f
2020-07-08 00:26:52 -07:00
Gil Keren
7816946ff9 Fix memory leak with small data-buffer-size
Summary:
As part of zhengwy888's debugging of a memory leak, he suggested that trimming the number of batches in pyspeech's train.py may cause the BufferedIterator to leave some batches in the queue, causing a memory leak.

Therefore, we propagate `take` to the buffered iterator, which should prevent the consumer thread from hanging on `queue.put`.
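
A toy sketch of the fix being described, assuming a producer thread that fills a bounded queue: if only the first `n` items will ever be read, the producer must also stop after `n`, otherwise it eventually blocks forever on `queue.put` with items nobody consumes.

```
import itertools
import queue
import threading

class BufferedIterator:
    """Minimal buffered iterator; take(n) is propagated to the producing thread."""

    def __init__(self, iterable, buffer_size=8):
        self._iterable = iterable
        self._queue = queue.Queue(maxsize=buffer_size)
        self._total = None

    def take(self, n):
        # Without this, the producer keeps filling the queue past n
        # and blocks on put() once the main thread stops reading.
        self._total = n

    def _produce(self):
        it = self._iterable if self._total is None else itertools.islice(self._iterable, self._total)
        for item in it:
            self._queue.put(item)
        self._queue.put(StopIteration)  # sentinel marking the end of the stream

    def __iter__(self):
        threading.Thread(target=self._produce, daemon=True).start()
        while True:
            item = self._queue.get()
            if item is StopIteration:
                return
            yield item
```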

Reviewed By: myleott

Differential Revision: D22405263

fbshipit-source-id: 80f40a355652016af4ba8c386b623cb0552b1928
2020-07-07 16:42:12 -07:00
Siddharth Shah
578164a0ef 0 warmup in tri stage lr scheduler
Summary:
The current code fails due to division by zero. This diff allows for zero warmup in
the tri-stage scheduler.
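
A minimal sketch of the kind of guard implied here, with assumed variable names: avoid dividing by `warmup_steps` when it is zero.

```
def warmup_factor(step, warmup_steps, init_lr_scale=0.01):
    """Linear warmup factor that tolerates warmup_steps == 0."""
    if warmup_steps <= 0:
        return 1.0  # no warmup: start directly at the peak learning rate
    # linear interpolation from init_lr_scale to 1.0 over warmup_steps updates
    frac = min(step, warmup_steps) / warmup_steps
    return init_lr_scale + (1.0 - init_lr_scale) * frac
```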

Reviewed By: myleott

Differential Revision: D22416482

fbshipit-source-id: dedb41ac141528314dc86cd73b8b67e699bf457b
2020-07-07 16:15:50 -07:00
Myle Ott
97ca0c022c Fix data hang with buffered iterator (#1206)
Summary:
According to Tom Birch: "I think there's an issue with torch.utils.data.dataloader._MultiProcessingDataLoaderIter when next(...) is supposed to raise StopIteration it just blocks indefinitely instead." This PR is a workaround that fixes the issue.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1206

Reviewed By: froody

Differential Revision: D22411150

Pulled By: myleott

fbshipit-source-id: 7cdfa67cf55e9cff81cf7d4904f1d38bfa36a0d0
2020-07-07 10:23:09 -07:00
Myle Ott
fc29aab203 Fix model parallel training after quantization/interactive.py changes (#1202)
Summary:
- fix model parallel training after output_projection changes
- fix training with non-vocab parallel criterions
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1202

Reviewed By: ngoyal2707

Differential Revision: D22266462

Pulled By: myleott

fbshipit-source-id: c7bb9a95c01f5fdaf415a709a93bacb15336271c
2020-07-06 08:27:27 -07:00
Daniel Adkins
a87cafda71 update fairseq binarizer to use PathManager
Summary:
Currently, fairseq binarizer does not work with Manifold files, making it incompatible with some internal procedures. This change preserves the old functionality while allowing Manifold files to be passed into binarizer functions.

motivated by theweiho: "I think we should change Binarizer to use PathManager so that it can handle either Manifold path or POSIX path" (D22241626)
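
A hedged sketch of the substitution being described: opening files through `PathManager` so both POSIX and Manifold-style paths work. The import path shown is my reading of fairseq's `file_io` module; treat it as an assumption.

```
# Before (POSIX-only):
#     with open(filename, "r") as f:
#         ...
#
# After (works for POSIX paths and, with the right handler registered,
# Manifold-style paths as well):
from fairseq.file_io import PathManager  # assumed import path

def count_lines(filename):
    with PathManager.open(filename, "r") as f:
        return sum(1 for _ in f)
```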

Reviewed By: akinh

Differential Revision: D22293525

fbshipit-source-id: d1bf4f8b50dda6a9214ee2fbe45e112ca9628f60
2020-06-30 12:53:33 -07:00
Belinda Li
894ae64858 Add Linformer to internal fairseq
Summary: Adding linformer

Reviewed By: myleott

Differential Revision: D22253918

fbshipit-source-id: 0bb86dddae1be09450544cb25530400e914c640f
2020-06-27 16:11:26 -07:00
Ning Dong
d2b5265a60 Merge FBSequenceGenerator & SequenceGenerator
Summary: See discussion in D20995796 (4725487bbc). Will merge 2 diffs if this looks good to you myleott jhcross

Reviewed By: myleott

Differential Revision: D21214974

fbshipit-source-id: ebb59b0491a8c209bed2420a0cd94e9c41d05f2e
2020-06-24 22:30:46 -07:00
Myle Ott
f0a61a2774 Miscellaneous fixes (#1196)
Summary:
Incorporate several fixes, incl. from OSS contributors:
- fix model argument in sequence generator in semisupervised_translation.py
- fix aggregate logging in semisupervised_translation.py
- Fix EOS token in multilingual_denoising
- Handle missing eos_idx in data_utils.collate_tokens
- Better OOM handling for single-GPU training
- fix prepend_bos argument in translation_from_pretrained_bart.py …
- Fix eos_idx in multilingual_denoising
- Small logging fixes
- Fix fb_hub on PyTorch 1.6
- Better variable names
- Add support for model parallel to interactive.py
- Use `//` operator to fix Integer division warning
- Set default `--clip-norm=0.0`
- Cleanup some binaries in root directory

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1196

Reviewed By: ngoyal2707

Differential Revision: D22162202

Pulled By: myleott

fbshipit-source-id: 835b0c0ad9246827f9d915fdb4e89d7b5be2475d
2020-06-24 10:08:53 -07:00
Myle Ott
da94e58c70 TPU support for Translation (#2245)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2245

Reviewed By: ngoyal2707

Differential Revision: D22070745

Pulled By: myleott

fbshipit-source-id: e43a96a585366b10d997a12522e8cd6496294ad2
2020-06-24 09:56:42 -07:00
Marco Gaido
a12c5c5de8 Add max position params to speech recognition (#1783)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1782.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1783

Reviewed By: okhonko

Differential Revision: D21663633

Pulled By: myleott

fbshipit-source-id: 5f3b4b7df83e27d866efb489daeffb3b38a66f38
2020-06-23 06:48:47 -07:00
Myle Ott
d0ccc3e02e Add FairseqDecoder.reorder_incremental_state_scripting for TorchScript (#1190)
Summary:
The main changes are in fairseq_incremental_decoder.py. I made the base `reorder_incremental_state` implementation a no-op and instead we expect callers (e.g., SequenceGenerator) to call `reorder_incremental_state_scripting`.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1190

Test Plan:
I ran unit tests both in PyTorch 1.5 and nightly (1.6).

I also tested some of the pretrained translation models, but it'd be good to test with some prod runs.

Reviewed By: jhcross

Differential Revision: D22095614

Pulled By: myleott

fbshipit-source-id: 484b8d47b4feda4efe52233a3d46a207d0816766
2020-06-22 18:54:28 -07:00
Ronan Riochet
d5d2cf3cd5 Add timeout kwarg to EpochBatchIterator (#2261)
Summary:
Add an optional `timeout` argument to `EpochBatchIterator`.

I need it to fix this issue: https://github.com/pytorch/pytorch/issues/2474

I could do something more general, allowing one to pass `**dataloader_kwargs` to `torch.utils.data.DataLoader`, if you think it's worth it.
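
For reference, `torch.utils.data.DataLoader` already accepts a `timeout` argument (seconds to wait for a batch from workers; 0 disables it), so the change is essentially threading that value through. A minimal standalone example:

```
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    dataset = TensorDataset(torch.arange(100).float())
    loader = DataLoader(
        dataset,
        batch_size=10,
        num_workers=2,
        timeout=60,  # error out if a worker takes more than 60s to deliver a batch
    )
    for (batch,) in loader:
        pass
```
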
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2261

Reviewed By: huihuifan

Differential Revision: D22162936

Pulled By: myleott

fbshipit-source-id: 959b408a53356c19c04fc5ae94aad5f164a32dcd
2020-06-22 18:43:25 -07:00
gvskalyan
88c58b6718 Preprocess dict number (#2228)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [X] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2227 .

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2228

Reviewed By: huihuifan

Differential Revision: D22163032

Pulled By: myleott

fbshipit-source-id: a5afbfca2d9a11563026f47cd246654e131d92fb
2020-06-22 18:40:35 -07:00
Yi-Hsiu Liao
e187f6e116 add maybe_no_sync for multilingual_translation task (#2238)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ x ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ x ] Did you write any new necessary tests?

## What does this PR do?
This PR reduces unnecessary communication overhead between GPUs, since we only need to sync up once for all lang-pairs. We see a significant training speedup, especially with a large number of lang-pairs.
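
A hedged sketch of the pattern: with DistributedDataParallel, wrap all but the last lang-pair's backward pass in `no_sync()` so gradients are all-reduced only once per update. The loop structure is illustrative; fairseq's actual `maybe_no_sync` helper may differ.

```
import contextlib

def train_step_all_lang_pairs(ddp_model, criterion, samples_per_lang_pair):
    """Accumulate grads over lang-pairs, syncing across GPUs only on the last one."""
    lang_pairs = list(samples_per_lang_pair.items())
    for i, (lang_pair, sample) in enumerate(lang_pairs):
        is_last = i == len(lang_pairs) - 1
        # DistributedDataParallel.no_sync() disables gradient all-reduce inside the block.
        use_no_sync = not is_last and hasattr(ddp_model, "no_sync")
        ctx = ddp_model.no_sync() if use_no_sync else contextlib.nullcontext()
        with ctx:
            loss = criterion(ddp_model, sample)  # assumed callable returning a scalar loss
            loss.backward()
```
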
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2238

Reviewed By: pipibjc

Differential Revision: D22149086

Pulled By: myleott

fbshipit-source-id: 6fff09e5a51b49bdcf5bc3986c0719b19d31c0a9
2020-06-22 18:25:59 -07:00
Tony Lekhtman
a9cb84df68 Update hub_utils.py (#2253)
Summary:
fix bug for print_alignment

# Before submitting

- [ V] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ V] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?   not relevant
- [ ] Did you write any new necessary tests?  not relevant

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1880 .

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2253

Reviewed By: huihuifan

Differential Revision: D22162948

Pulled By: myleott

fbshipit-source-id: 3ec5508506184a9effa330fbcd43ffe917b533c6
2020-06-22 15:28:57 -07:00
Mike Ruberry
320bf8cf96 Updates full to no longer use deprecated integer fill_value type inference
Summary:
In PyTorch 1.5, using an integer fill_value with torch.full without setting the dtype or out kwarg was deprecated, and will soon throw a runtime error. In the future, torch.full will infer its dtype from the fill_value, and these calls would produce integer, not float, tensors. This update maintains the current behavior.
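
A small example of the behavior-preserving change being described: keeping the float result requires an explicit dtype (or a float fill_value) once integer-fill inference changes.

```
import torch

# Old, deprecated reliance on inference: torch.full((2, 3), 1) used to give a float tensor.
# Explicitly requesting the dtype keeps the current (float) behavior:
t = torch.full((2, 3), 1, dtype=torch.float32)
# Equivalent alternative: use a float fill_value.
t2 = torch.full((2, 3), 1.0)
assert t.dtype == t2.dtype == torch.float32
```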

Created from Diffusion's 'Open in Editor' feature.

Reviewed By: myleott

Differential Revision: D22161456

fbshipit-source-id: b5d687e4de83dba6e76cae6e61b5106bf5b320db
2020-06-22 11:56:58 -07:00
Joshua Meier
8eb9123f56 Patch masked_lm memory leak on GPUs (#1195)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes memory leak in masked_lm criterion.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1195

Reviewed By: myleott

Differential Revision: D22155285

Pulled By: joshim5

fbshipit-source-id: 9414e307e1e2d2a9225884dc94aae964a1627682
2020-06-22 10:02:21 -07:00
Myle Ott
3ea511d899 Revert Dataloader changes
Summary:
D22052683 may have introduced a memory leak; revert those parts for now.

The original motivation is described here: https://github.com/pytorch/fairseq/issues/2168.
Previously I/O was bursty when training with large update frequency.
This change was meant to even it out, but possibly introduced a memory leak.

More context on the change can be found here: https://github.com/pytorch/fairseq/issues/2168

Reviewed By: yqwangustc

Differential Revision: D22156157

fbshipit-source-id: 390ff39bc3e268d6312971768c34fe44d4bd84b7
2020-06-22 06:48:37 -07:00
Mandeep Baines
6f6461b81a Add tracepoints (#1192)
Summary:
There is no overhead when the profiling is not enabled.
When running using profile.py, I measure an overhead of 3%.

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1192

Reviewed By: sidgoyal78

Differential Revision: D22102341

Pulled By: msbaines

fbshipit-source-id: ffddb9cceb853df88db34195be18bae7723d4c98
2020-06-19 16:23:22 -07:00
Rohit Kopparthy
8d8d773c57 Set EncoderOut Attributes to None instead of torch.empty(0)
Summary: The ConvTransformer model throws an error during training because certain attributes were changed from None to torch.empty(0) to meet TorchScript type requirements. Existing assertion checks only verify that these attributes are not None, rather than not torch.empty(0). To fix this, the types have been changed to Optional types and the attributes are allowed to stay as None, as before.

Reviewed By: zhengwy888

Differential Revision: D22115126

fbshipit-source-id: de3c7b64c5e7142c860a354f778b8b818a7b0bb8
2020-06-19 10:51:50 -07:00
Wei Ho
d617c292f8 Apply black formatter to fairseq_cli/train.py
Differential Revision: D22125634

fbshipit-source-id: a05f483ac4b564f5d7a21f5ae3605615e7fcd263
2020-06-18 18:38:11 -07:00
Gil Keren
82f99df8e4 Gradually releasing the restrictions on data-buffer-size
Summary: The buffer was a suspect in creating some Everstore overload, and therefore was restricted in D21804332. But since its part in those problems was inconclusive, and the Everstore read limit was increased for the speech group, we are gradually increasing it back.

Differential Revision: D22076534

fbshipit-source-id: cb01d50d4df5843b86f7d730e1805a88ea3f41d8
2020-06-17 18:50:23 -07:00
Yongqiang Wang
c294e2fcfb print out all the CUDA environment information (including name, memory size,
Summary:
Recently, we found it increasingly likely that GPUs of different
generations (V100 vs. P100) or memory sizes (16GB, 32GB) are mixed in
training, while the users do not even know about it. Printing out this information
can be helpful for debugging.
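
A minimal sketch of the kind of diagnostic being added, using only public `torch.cuda` APIs (the exact fields fairseq logs may differ):

```
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(
            f"device {i}: {props.name}, "
            f"{props.total_memory / 1024 ** 3:.1f} GiB, "
            f"compute capability {props.major}.{props.minor}"
        )
```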

Reviewed By: myleott

Differential Revision: D21782630

fbshipit-source-id: 7e1075e1b928d969594bbee92275a819cf1a0877
2020-06-17 13:47:32 -07:00
Rohit Kopparthy
3c16b002b9 Scripting ConvTransformer
Summary: This diff builds on D21986239 to script the ConvTransformer model instead of the VggTransformer. The changes made in data_utils.py were copied over from D20443519. A new file called test_convtransformer.py was added to test scripting the model. The scripted model compiles and also produces the same output as before scripting.

Reviewed By: myleott

Differential Revision: D22022654

fbshipit-source-id: 8f5a36a9af391142b468818650be3af218235fc2
2020-06-16 12:16:03 -07:00
Myle Ott
14ee059a36 Dataloading fixes (#1189)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1189

Reviewed By: ngoyal2707

Differential Revision: D22052683

Pulled By: myleott

fbshipit-source-id: afdfda291907ad4441af51cfc9e44f1bd01ea696
2020-06-16 11:49:39 -07:00
Alex Xiao
8570277f91 dataset sampling for minibatch training
Summary:
**Motivation:**

We have 3 datasets: Portal, Video, and Messenger Voice Clips. We want to specify a distribution [p1, p2, p3] such that we sample utterances from Portal with prob p1, etc.

Previously, D21675421 sampled from datasets by **batches**. This is not acceptable for minibatch training, as we need to maintain LSTM states across consecutive batches. As a result, we need utterance-level sampling, not batch-level sampling.

**Design**

1. Created a new MultiCorpusDataset, similar to MultiCorpusSampledDataset, except it samples at the utterance level. Specifically, every time `ordered_indices` is called, a new sample of the multiple datasets is generated based on an input distribution (a toy sketch of this idea follows after this list). We ensure that the randomness of this is seeded by the input seed and the epoch, to enable reproducibility when loading from checkpoints.
2. Created MiniBatchMultiCorpusDataset, which adds minibatch specific logic to MultiCorpusDataset, mainly for handling things like the start frame and deleting cache.
3. Refactored different sampling strategies into a single `build_sampled_dataset` for easy re-use.
4. Added the flag --reset-iterator, enabling us to reset the batch iterator every epoch so that a new `ordered_indices` is generated each epoch
5. some minor refactoring of existing code
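
A toy sketch of the utterance-level sampling idea from item 1, seeded by (seed, epoch) for reproducibility; the names and structure are illustrative, not the actual MultiCorpusDataset.

```
import numpy as np

def ordered_indices(dataset_sizes, distribution, seed, epoch, num_samples):
    """Sample per-utterance dataset choices, reproducibly for a given (seed, epoch).

    dataset_sizes: e.g. {"portal": 100_000, "video": 50_000, "voice_clips": 20_000}
    distribution:  matching sampling probabilities, e.g. [0.5, 0.3, 0.2]
    Returns a list of (dataset_name, index_within_dataset) pairs.
    """
    # same (seed, epoch) -> same order, so a restarted job resumes identically
    rng = np.random.RandomState(hash((seed, epoch)) % (2 ** 32))
    names = list(dataset_sizes.keys())
    choices = rng.choice(len(names), size=num_samples, p=distribution)
    return [(names[c], int(rng.randint(dataset_sizes[names[c]]))) for c in choices]
```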

**Usage**

1. In your data.json, include extra splits in addition to "train" (i.e. "portal", "video"), with whatever transforms/handle file you want.
2. In your flow, provide "--extra-splits portal 0.2 video 0.3" and "--reset-iterator" as flags
3. Enjoy WER improvements

Differential Revision: D21887303

fbshipit-source-id: 6b377bed8a68a8e72e2528f8a5a28b675eebaadf
2020-06-15 15:28:01 -07:00
Yongqiang Wang
86edf989dd cast grad_norm to float in case fp16 training
Summary:
We found that grad_norm could become inf because it is accumulated in the
meter many times, and in fp16 it easily overflows. Using fp32 for each `grad_norm`
costs minimal memory.
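
A small sketch of the idea in plain PyTorch (stand-in tensors, not the fairseq trainer code):

```
import torch

grads = [torch.randn(1024).half() for _ in range(4)]   # stand-ins for fp16 gradients
# Take each norm in fp32 and keep the result in fp32 before it is accumulated
# in a meter, so repeated accumulation cannot overflow the fp16 range (max ~65504).
grad_norm = torch.norm(torch.stack([torch.norm(g.float()) for g in grads]))
assert grad_norm.dtype == torch.float32
```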

Reviewed By: myleott

Differential Revision: D22015643

fbshipit-source-id: 429d24bbb9c9a785edf0bfb06480497022f80418
2020-06-12 22:12:52 -07:00
Joshua Meier
242269d439 Fix truncation in sentence_ranking (#1185)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes current breaking change on master.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1185

Reviewed By: myleott

Differential Revision: D21924644

Pulled By: joshim5

fbshipit-source-id: 0eabd2393c76060dcf1568eba308878a90af7a87
2020-06-10 11:59:58 -07:00
Mike Ruberry
2e1da09a9c Updates argument to np.arange to avoid performing floor division using torch.div
Summary:
Performing floor division with torch.div is deprecated and will soon throw a runtime error. Perhaps surprisingly, calling np.arange on a torch tensor can use torch.div to perform floor division. Taking the number from the tensor using .item() should prevent this issue and keep this code working.
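
A small illustration of the change, assuming a scalar length stored in a tensor:

```
import numpy as np
import torch

length = torch.tensor(7)

# Passing the tensor itself lets NumPy drive arithmetic through torch ops
# (including the deprecated floor-division path); extracting a plain Python
# int first sidesteps that entirely.
indices = np.arange(length.item())
print(indices)  # [0 1 2 3 4 5 6]
```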

Created from Diffusion's 'Open in Editor' feature.

Reviewed By: lematt1991

Differential Revision: D21941120

fbshipit-source-id: 4d76451d4b33d487946af1c2f9ed21eca858cb06
2020-06-09 13:04:47 -07:00
Gil Keren
5c4f0f8903 Better exception handling for the data buffer thread
Summary: When data-buffer-size != 0 was used, an exception happening in the data preparation (therefore in the buffer thread) was not raised properly, and the main thread hung on `queue.get`. This fixes it by raising the error to the main thread.
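
A toy sketch of the pattern described (assumed structure, not the exact fairseq code): the producer thread catches its exception, puts it on the queue, and the main thread re-raises it instead of blocking on `queue.get` forever.

```
import queue
import threading

def _produce(source, q):
    try:
        for item in source():
            q.put(("item", item))
        q.put(("done", None))
    except Exception as e:          # forward any failure to the main thread
        q.put(("error", e))

def buffered(source, buffer_size=4):
    q = queue.Queue(maxsize=buffer_size)
    threading.Thread(target=_produce, args=(source, q), daemon=True).start()
    while True:
        kind, payload = q.get()
        if kind == "error":
            raise payload           # surface the worker's exception in the main thread
        if kind == "done":
            return
        yield payload
```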

Reviewed By: myleott

Differential Revision: D21917739

fbshipit-source-id: 8d3f875b663b37625f44a943fb3904e25216db06
2020-06-08 10:52:41 -07:00
Xilun Chen
e03bfd9bf4 Check if the checkpoint is from the latest version before updating the state_dict in TransformerDecoder.upgrade_state_dict_named() (#2222)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2222

When share_input_output_embed is set to True, the existing code always overrides output_projection.weight with embed_tokens.weight.

This is unnecessary, and caused a very obscure bug in our custom BART model.

Added a check to skip the update to state_dict if f"{name}.output_projection.weight" is already in the checkpoint.
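
A minimal sketch of the guard, with the dict manipulation spelled out; the surrounding structure and argument names are assumed for illustration.

```
def upgrade_tied_output_projection(state_dict, name, embed_tokens_weight):
    """Only backfill the tied output projection for old checkpoints that lack it."""
    key = f"{name}.output_projection.weight" if name else "output_projection.weight"
    if key not in state_dict:
        # old checkpoint: tied weights were stored only once, so copy them over
        state_dict[key] = embed_tokens_weight
    # newer checkpoints already contain the (possibly customized) projection; leave it alone
    return state_dict
```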

Reviewed By: myleott

Differential Revision: D21915833

fbshipit-source-id: d298e24394be2ee85c8f686ba459b7e4cbd4298a
2020-06-08 10:16:59 -07:00
Joshua Meier
2699f4a28b add random sequence truncation (#1173)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Allows taking random crops in the language_modeling task.
Discussed with myleott, robert-verkuil, and tomsercu in a meeting yesterday. Ultimately took a different, more general approach to implementing this.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1173

Reviewed By: myleott

Differential Revision: D21904561

Pulled By: joshim5

fbshipit-source-id: 66e8dfb10a0d36b76acd2eb181e00db6fc2433fc
2020-06-05 12:58:07 -07:00
Joshua Meier
152a3fe143 Support residual connections in LSTM models (#1103)
Summary:
Adds support for residual connections in LSTM models.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1103

Reviewed By: myleott

Differential Revision: D21639942

Pulled By: joshim5

fbshipit-source-id: a02ddfe080a847fd91a9c6a5074cb6dc782f7727
2020-06-05 12:15:10 -07:00