Commit Graph

1376 Commits

Author SHA1 Message Date
Stanislau Hlebik
698e3b91ff remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:21:51 -07:00
Stanislau Hlebik
7ea5e3b341 remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:21:45 -07:00
James Cross
3655cf266e optional limit on total training time (#2333)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2333

This change adds a new option (`--stop-time-hours`) which, if specified, limits the total training time to that number of hours. In order to stop training within the inner training loop (after the first update exceeding the time limit), the starting time is stored on the trainer.

In addition, in order to persist the training time when restoring from checkpoints (important because training runs are sometimes killed due to resource constraints), the training time already completed is stored as extra state in the checkpoints (this change remains backward compatible with existing checkpoints).
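
A minimal sketch of how such a wall-clock limit could be enforced; the names (`stop_time_hours`, `previous_training_time`) are illustrative assumptions, not the exact fairseq implementation:

```
import time

class TimeLimitedTrainer:
    """Toy trainer illustrating a wall-clock training-time limit (hypothetical names)."""

    def __init__(self, stop_time_hours=0.0, previous_training_time=0.0):
        self.stop_time_hours = stop_time_hours                   # 0 means "no limit"
        self._previous_training_time = previous_training_time    # seconds restored from checkpoint
        self._start_time = time.time()                           # starting time stored on the trainer

    def cumulative_training_time(self):
        # time completed in earlier runs + time spent in the current run
        return self._previous_training_time + (time.time() - self._start_time)

    def checkpoint_extra_state(self):
        # persisted in the checkpoint so the limit survives restarts
        return {"train_time": self.cumulative_training_time()}

    def should_stop(self):
        if self.stop_time_hours <= 0:
            return False
        return self.cumulative_training_time() / 3600.0 > self.stop_time_hours
```

In the inner training loop, `should_stop()` would be checked after each update so training breaks out as soon as the limit is exceeded.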

Reviewed By: myleott

Differential Revision: D22573166

fbshipit-source-id: 01c59274a1c196acc8a3a0243814167e1d368b1a
2020-07-16 17:45:07 -07:00
Duc Le
75d354c92b NNLM training in PySpeech
Summary:
Enable support for NNLM training in PySpeech. This implementation slightly modifies Fairseq's `LanguageModelingTask` in a few ways:

1. `source` and `input` used during training are slightly different (see `_maybe_add_bos` under `PySpeechLMDataset`).
2. The underlying model is `PySpeechEncoderModel` instead of `FairseqDecoder`. This lets us interface more easily with PySpeech, and the jitted model can easily be used in C++.

Reviewed By: jay-mahadeokar

Differential Revision: D22077479

fbshipit-source-id: 4918b26ba78de8786870060ada0bc3d3a28d64b0
2020-07-16 10:57:37 -07:00
Myle Ott
77df83ab6e Consolidate distributed init code into distributed_utils.call_main (#1218)
Summary:
We use `distributed_utils.call_main` in most of the other CLI tools (e.g., generate.py, eval_lm.py), but not train.py.

The only place where they're different is that train.py supports TPUs and the `after_distributed_init_fn` hook. We can add that support to `distributed_utils.call_main` and merge them.
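
A hedged sketch of the kind of single entry point being described; the signature and the `after_distributed_init_fn` hook below are assumptions for illustration, not the exact `distributed_utils.call_main` API:

```
import torch.distributed as dist

def call_main(args, main, after_distributed_init_fn=None, **kwargs):
    """Hypothetical unified dispatcher: set up distributed state, then run main()."""
    if getattr(args, "distributed_world_size", 1) > 1:
        dist.init_process_group(
            backend="nccl",
            init_method=args.distributed_init_method,
            world_size=args.distributed_world_size,
            rank=args.distributed_rank,
        )
        if after_distributed_init_fn is not None:
            # hook point (e.g., TPU- or backend-specific setup) that runs
            # once the process group exists
            args = after_distributed_init_fn(args)
    return main(args, **kwargs)
```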

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1218

Reviewed By: jhcross

Differential Revision: D22556771

Pulled By: myleott

fbshipit-source-id: 4f7110155f5f5d96905ef0bd17a4aa243ec8c443
2020-07-16 10:00:13 -07:00
Yuqing Tang
e52d071ee8 Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: new multilingual task
Summary:
A first version of XLNMT multilingual project code release: Multilingual Training with multiple bitext

- A new task to glue all things together: fairseq/tasks/translation_multi_simple_epoch.py
- Minor changes to
    - fairseq/data/iterators.py to allow dynamic batch sampler
    - fairseq/checkpoint_utils.py to add a finetuning option instead of using restore_file, which would restore from the original model when the job is requeued.

Reviewed By: pipibjc

Differential Revision: D22483484

fbshipit-source-id: 283b67e538508f330b0968609b7dae64d26bea05
2020-07-16 09:34:29 -07:00
Yuqing Tang
033daef0fc Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: multilingual dataset manager
Summary:
A first version of XLNMT multilingual project code release: Multilingual Training with multiple bitext

- Major work is in fairseq/data/multilingual
    -  fairseq/data/multilingual/multilingual_data_manager.py to support a few sophisticated multilingual data combinations
    -  fairseq/data/multilingual/sampling_method.py to support basic sampling functions
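
As an illustration of the kind of basic sampling function mentioned in the last item, here is a minimal temperature-based sampler over dataset sizes (a common choice in multilingual training); the function name and its use here are assumptions, not the actual contents of sampling_method.py:

```
def temperature_sampling_probs(dataset_sizes, temperature=1.0):
    """Return sampling probabilities proportional to size**(1/T).

    T=1 reproduces size-proportional sampling; larger T flattens the
    distribution toward uniform, upsampling low-resource language pairs.
    """
    weights = [size ** (1.0 / temperature) for size in dataset_sizes]
    total = sum(weights)
    return [w / total for w in weights]

# Example: three language pairs with very different amounts of bitext.
print(temperature_sampling_probs([1_000_000, 100_000, 10_000], temperature=5.0))
```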

Reviewed By: pipibjc

Differential Revision: D22483471

fbshipit-source-id: 3d9d2643877a29333915975020e419508887b3ae
2020-07-16 09:34:28 -07:00
Yuqing Tang
c0b5226853 Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: new datasets (#1205)
Summary:
A first version of XLNMT multilingual project code release: Multilingual Training with multiple bitext

- Major work is in fairseq/data/multilingual
   - fairseq/data/multilingual/sampled_multi_dataset.py to enable sampling and virtual data sizes
   - fairseq/data/multilingual/sampled_multi_epoch_dataset.py to enable a virtual epoch data size so training can start without going through the whole dataset (which reduces loading from 1.5 hours to <30 seconds)
    - [next diff] fairseq/data/multilingual/multilingual_data_manager.py to support a few sophisticated multilingual data combinations
    - [next diff] fairseq/data/multilingual/sampling_method.py to support basic sampling functions
- [next diff] A new task to glue all things together: fairseq/tasks/translation_multi_simple_epoch.py
- Minor changes to
    - fairseq/data/language_pair_dataset.py to (1) have language IDs in the batch if they are set, and (2) allow a preset max_size of batch; plus corresponding changes to fairseq/data/data_utils.py
    - [next diff] fairseq/data/denoising_dataset.py to (1) allow additional transformation; (2) allow a preset max_size of batch;
    - [next diff] fairseq/data/iterators.py to allow dynamic batch sampler
    - [next diff] fairseq/checkpoint_utils.py to add a finetuning option instead of using restore_file, which would restore from the original model when the job is requeued.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1205

Test Plan:
buck test mode/dev //deeplearning/projects/fairseq-py:test_cpu -- 'test_translation_multi_simple_epoch \(tests\.test_binaries\.TestTranslation\)'

https://our.intern.facebook.com/intern/testinfra/testrun/3659174727046259

Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3659174727046259
      ✓ deeplearning/projects/fairseq-py:test_cpu - test_translation_multi_simple_epoch (tests.test_binaries.TestTranslation) 331.967 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3659174727046259
Summary (total time 352.88s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Reviewed By: myleott

Differential Revision: D22463947

Pulled By: tangyuq

fbshipit-source-id: e430c040231035af73141dc736960bd972bd4b6e
2020-07-16 09:34:23 -07:00
Mandeep Baines
84896af72c Fix memory-efficient-fp16 when using update_freq other than 1 (#1219)
Summary:
Tested the following model and verified that gnorms and losses match
the following commit:

commit 3b7cf75584
Author: m_fomicheva <mari.fomicheva@gmail.com>
Date:   Wed Jul 8 13:04:55 2020 -0700

The loss and gnorm are identical to the number of digits reported in the logs, and the ppl matches to many significant digits.

Thanks again to Jun Ru for reporting.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1219

Test Plan:
CUDA_VISIBLE_DEVICES=0 fairseq-train --task language_modeling   data-bin/wikitext-103   --save-dir checkpoints/transformer_wikitext-103   --arch transformer_lm --share-decoder-input-output-embed   --dropout 0.1   --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0   --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07   --tokens-per-sample 512 --sample-break-mode none   --max-tokens 2048 --update-freq 16   --max-update 50000  --memory-efficient-fp16 --no-progress-bar --log-interval 1 --seed 4

Before (commit 3b7cf755):

2020-07-15 12:17:28 | INFO | train_inner | epoch 001:     45 / 3151 loss=19.083, ppl=555252, wps=7165.8, ups=0.22, wpb=32768, bsz=64, num_updates=41, lr=5.22398e-06, gnorm=6.895, loss_scale=8, train_wall=5, wall=208
2020-07-15 12:17:33 | INFO | train_inner | epoch 001:     46 / 3151 loss=19.042, ppl=539620, wps=7176.6, ups=0.22, wpb=32768, bsz=64, num_updates=42, lr=5.34895e-06, gnorm=6.662, loss_scale=8, train_wall=5, wall=213
2020-07-15 12:17:37 | INFO | train_inner | epoch 001:     47 / 3151 loss=18.908, ppl=492042, wps=7188.8, ups=0.22, wpb=32768, bsz=64, num_updates=43, lr=5.47393e-06, gnorm=6.231, loss_scale=8, train_wall=5, wall=217
2020-07-15 12:17:42 | INFO | train_inner | epoch 001:     48 / 3151 loss=18.894, ppl=487224, wps=7192, ups=0.22, wpb=32768, bsz=64, num_updates=44, lr=5.5989e-06, gnorm=6.078, loss_scale=8, train_wall=5, wall=222
2020-07-15 12:17:47 | INFO | train_inner | epoch 001:     49 / 3151 loss=18.829, ppl=465781, wps=7182.5, ups=0.22, wpb=32768, bsz=64, num_updates=45, lr=5.72388e-06, gnorm=5.819, loss_scale=8, train_wall=5, wall=226
2020-07-15 12:17:51 | INFO | train_inner | epoch 001:     50 / 3151 loss=18.752, ppl=441564, wps=7185.4, ups=0.22, wpb=32768, bsz=64, num_updates=46, lr=5.84885e-06, gnorm=5.521, loss_scale=8, train_wall=5, wall=231

After:

2020-07-15 15:13:10 | INFO | train_inner | epoch 001:     45 / 3151 loss=19.083, ppl=555249, wps=7220.5, ups=0.22, wpb=32768, bsz=64, num_updates=41, lr=5.22398e-06, gnorm=6.895, loss_scale=8, train_wall=5, wall=207
2020-07-15 15:13:14 | INFO | train_inner | epoch 001:     46 / 3151 loss=19.042, ppl=539617, wps=7216.3, ups=0.22, wpb=32768, bsz=64, num_updates=42, lr=5.34895e-06, gnorm=6.662, loss_scale=8, train_wall=5, wall=212
2020-07-15 15:13:19 | INFO | train_inner | epoch 001:     47 / 3151 loss=18.908, ppl=492041, wps=7220.8, ups=0.22, wpb=32768, bsz=64, num_updates=43, lr=5.47393e-06, gnorm=6.231, loss_scale=8, train_wall=5, wall=216
2020-07-15 15:13:24 | INFO | train_inner | epoch 001:     48 / 3151 loss=18.894, ppl=487228, wps=7229.4, ups=0.22, wpb=32768, bsz=64, num_updates=44, lr=5.5989e-06, gnorm=6.078, loss_scale=8, train_wall=5, wall=221
2020-07-15 15:13:28 | INFO | train_inner | epoch 001:     49 / 3151 loss=18.829, ppl=465783, wps=7231.2, ups=0.22, wpb=32768, bsz=64, num_updates=45, lr=5.72388e-06, gnorm=5.819, loss_scale=8, train_wall=5, wall=225
2020-07-15 15:13:33 | INFO | train_inner | epoch 001:     50 / 3151 loss=18.752, ppl=441559, wps=7224.5, ups=0.22, wpb=32768, bsz=64, num_updates=46, lr=5.84885e-06, gnorm=5.521, loss_scale=8, train_wall=5, wall=230

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌

Reviewed By: myleott

Differential Revision: D22560914

Pulled By: msbaines

fbshipit-source-id: f2fdc3daa46de0b75f26cb4d5712e92d1a820d60
2020-07-16 08:10:10 -07:00
Mandeep Baines
9c21a715d6 Fix regression in memory-efficient-fp16 (#1216)
Summary:
The fused_adam optimizer divides by the scale while our logic
multiplies by the scale. I'm surprised this even worked: the
first few iterations had nearly identical loss with the old
code and even converged.

However, Jun Ru noticed that the losses are very different after
more iterations.
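
A small sketch of the sign convention at issue, assuming nothing about the actual fused_adam internals: gradients computed from `loss * scale` must be divided by `scale` (multiplied by its inverse) before the update, and applying the wrong factor silently changes the effective gradients.

```
import torch

scale = 128.0
true_grad = torch.tensor([0.02, -0.01])
scaled_grad = true_grad * scale          # what backward() produces under loss scaling

# Correct unscaling: divide by the scale (or multiply by its inverse).
recovered = scaled_grad / scale
assert torch.allclose(recovered, true_grad)

# Bug pattern: multiplying by the scale instead of dividing blows the grads up by scale**2.
wrong = scaled_grad * scale
print(wrong / true_grad)                 # tensor([16384., 16384.])  == scale**2
```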

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1216

Reviewed By: myleott, shruti-bh

Differential Revision: D22536377

Pulled By: msbaines

fbshipit-source-id: 9328a1764a1895572c18567f99bee3330f25179e
2020-07-15 17:13:49 -07:00
Mandeep Baines
a541b19d85 Add dummy task for translation benchmarking (#1212)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1212

Test Plan:
python train.py \
    -a transformer \
    --clip-norm 0.4 --optimizer adam --lr 0.001 \
    --dropout 0.0 \
    --decoder-layers 7 \
    --encoder-layers 7 \
    --encoder-ffn-embed-dim 2048 \
    --decoder-ffn-embed-dim 2048 \
    --encoder-embed-dim 1024 \
    --decoder-embed-dim 1024 \
    --max-tokens 8192 \
    --criterion cross_entropy --max-update 50 \
    --attention-dropout 0.0 \
    --adam-betas '(0.9, 0.98)' \
    --disable-validation --no-save \
    --task dummy_mt

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌

Reviewed By: myleott

Differential Revision: D22484873

Pulled By: msbaines

fbshipit-source-id: bc61165ab91290d0b6aa2077c968ab537bce8a6a
2020-07-15 16:09:13 -07:00
Myle Ott
ffecb4e349 Small fixes (#1215)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1215

Reviewed By: ngoyal2707, msbaines

Differential Revision: D22514719

Pulled By: myleott

fbshipit-source-id: 5f15ba501fd66af1eb49b5702aff940f06c3d91f
2020-07-14 14:17:13 -07:00
Aditya Pillai
5d88d379ca bug fix: use cls.load_dictionary for multilingual translation
Summary:
Currently, multilingual translation imports Dictionary and calls its load function.

However, this does not permit extending the class with a different load_dictionary function to modify its behavior.
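
A toy sketch of the pattern the fix enables: routing dictionary loading through the task's own classmethod so a subclass can override it. The class and method names below are illustrative, not the actual fairseq task code.

```
class BaseMultilingualTask:
    """Toy illustration of why dictionary loading should go through a classmethod."""

    def __init__(self, dicts):
        self.dicts = dicts

    @classmethod
    def load_dictionary(cls, filename):
        # stand-in for Dictionary.load(filename)
        with open(filename) as f:
            return [line.split()[0] for line in f]

    @classmethod
    def setup_task(cls, dict_paths):
        # Calling cls.load_dictionary (rather than a hard-coded Dictionary.load)
        # lets subclasses change loading behavior without copying setup_task.
        return cls([cls.load_dictionary(p) for p in dict_paths])


class LowercasedDictTask(BaseMultilingualTask):
    @classmethod
    def load_dictionary(cls, filename):
        return [w.lower() for w in super().load_dictionary(filename)]
```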

Reviewed By: myleott, chtran

Differential Revision: D22441356

fbshipit-source-id: b0ef159182b15adb479b117581ddcd2f65724980
2020-07-09 13:22:26 -07:00
Mandeep Baines
16e9661bd9 avoid fp16 unscales and multiply_grads (#1201)
Summary:
The goal of this patch is to avoid fp16 unscale calls, which can
potentially under/over-flow, by 1) scaling grad_norm instead of
unscaling grads before calculating grad_norm, and 2) using the scale
argument to step (if supported by the optimizer). By letting the
optimizer scale, we avoid multiply_grads (saving on GPU compute/memory).
We also get better precision, since the unscale occurs in the kernel,
resulting in an FP32 unscaled grad instead of an FP16 unscaled grad.

A side-effect of this patch is a noticeable WPS win due to a
multi-tensor kernel being used for grad_norm and because we
avoid multiply_grads.
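
A minimal sketch of the first idea, assuming a plain list of fp16 gradients: take the norm of the still-scaled gradients and unscale the single scalar result, instead of first multiplying every gradient tensor by `1/scale`.

```
import torch

def scaled_grad_norm(params, loss_scale):
    """Gradient norm without unscaling each grad tensor first.

    norm(g / s) == norm(g) / s, so we can take the norm of the scaled fp16
    grads and unscale the one scalar result in fp32, avoiding a
    multiply_grads pass and the fp16 under/overflow it risks.
    """
    grads = [p.grad for p in params if p.grad is not None]
    total = torch.norm(torch.stack([torch.norm(g.float()) for g in grads]))
    return total / loss_scale
```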

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1201

Test Plan:
Verified grad_norm and loss before and after.

Before:

epoch 001 | loss 19.506 | ppl 744403 | wps 13966.7 | ups 0.21 | wpb 65536 | bsz 128 | num_updates 50 | lr 6.34875e-06 | gnorm 8.173 | loss_scale 10 | train_wall 250 | wall 259

After:

epoch 001 | loss 19.506 | ppl 744363 | wps 14003 | ups 0.21 | wpb 65536 | bsz 128 | num_updates 50 | lr 6.34875e-06 | gnorm 8.173 | loss_scale 10 | train_wall 250 | wall 258

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌

Reviewed By: myleott

Differential Revision: D22251842

Pulled By: msbaines

fbshipit-source-id: e6d82cdd3c95e7770835abe054db4b50e6ad569e
2020-07-08 14:47:36 -07:00
m_fomicheva
2887663811 Implemented applying dropout at inference time (#2308)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2308

Implemented Monte Carlo dropout. Added README to reproduce the results from our paper
that applies this idea for unsupervised quality estimation of NMT (joint work of Facebook AI and the University of Sheffield):

Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia. Unsupervised Quality Estimation for Neural Machine Translation. Accepted to TACL

Retaining dropout at test time is not possible in the current code base. The statement
```
if not self.retain_dropout:
  model.eval()
```
in `SequenceGenerator` does not have any effect, since the model's `training` attribute is already set to False by the method `make_generate_fast_`, which is applied before initializing `SequenceGenerator` in `generate.py`. `make_generate_fast_` throws an exception when trying to set `training` to True after its application. Also, if I am not mistaken, `self.training=True` can have other effects, so setting it to True only for the purpose of retaining dropout at test time might be confusing. I propose an alternative implementation where `retain_dropout` is an attribute of the FairseqModel class.
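
A hedged sketch of retaining dropout at inference (Monte Carlo dropout) in plain PyTorch; the `retain_dropout` attribute described above is fairseq-specific, and the helper below is just one generic way to get the same effect.

```
import torch
import torch.nn as nn

def enable_mc_dropout(model: nn.Module):
    """Put the model in eval mode but keep Dropout modules stochastic."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()  # dropout stays active; other eval-time behavior is untouched

@torch.no_grad()
def mc_predict(model, x, n_samples=10):
    enable_mc_dropout(model)
    outs = torch.stack([model(x) for _ in range(n_samples)])
    return outs.mean(0), outs.std(0)   # predictive mean and a simple uncertainty estimate
```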

# Before submitting

- [N] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [Y] Did you make sure to update the docs?
- [Y] Did you write any new necessary tests?

## What does this PR do?
New feature.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2151

Reviewed By: ngoyal2707

Differential Revision: D22048889

Pulled By: myleott

fbshipit-source-id: 0d0d4784a7314fc7a45b76341fd3b8232b3e2cf0
2020-07-08 13:06:13 -07:00
Myle Ott
d73e543e38 Update LinformerSentenceEncoder to inherit from TransformerSentenceEncoder
Summary:
It seems we can make this work by setting `compress_layer` in `build_transformer_sentence_encoder_layer` and adding an "init_fn" callback.

Doing this refactoring now since the stacked diff (D22048889) broke Linformer training, so safer to inherit from TransformerSentenceEncoder directly.

Reviewed By: ngoyal2707

Differential Revision: D22411012

fbshipit-source-id: d4ecb71eedd6ddf49abbb1e700d0f2af24e39e5a
2020-07-08 13:06:10 -07:00
Wei Ho
9f92b05e2a TorchElastic for fairseq FBTranslate
Summary: Use TorchElastic for multi-node, multi-GPU training

Reviewed By: cndn

Differential Revision: D22083634

fbshipit-source-id: 3673308671b0bc985b6012ee5327d604d995409f
2020-07-08 00:26:52 -07:00
Gil Keren
7816946ff9 Fix memory leak with small data-buffer-size
Summary:
As part of zhengwy888's debugging of a memory leak, he suggested that trimming the number of batches in pyspeech's train.py may cause the BufferedIterator to leave some batches in the queue, causing a memory leak.

Therefore, we propagate `take` to the buffered iterator, which should prevent the consumer thread from hanging on `queue.put`.
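
A toy sketch of the fix being described, assuming a producer thread that fills a bounded queue: if only the first `n` items will ever be read, the producer must also stop after `n`, otherwise it eventually blocks forever on `queue.put` with items nobody consumes.

```
import itertools
import queue
import threading

class BufferedIterator:
    """Minimal buffered iterator; take(n) is propagated to the producing thread."""

    def __init__(self, iterable, buffer_size=8):
        self._iterable = iterable
        self._queue = queue.Queue(maxsize=buffer_size)
        self._total = None

    def take(self, n):
        # Without this, the producer keeps filling the queue past n
        # and blocks on put() once the main thread stops reading.
        self._total = n

    def _produce(self):
        it = self._iterable if self._total is None else itertools.islice(self._iterable, self._total)
        for item in it:
            self._queue.put(item)
        self._queue.put(StopIteration)  # sentinel marking the end of the stream

    def __iter__(self):
        threading.Thread(target=self._produce, daemon=True).start()
        while True:
            item = self._queue.get()
            if item is StopIteration:
                return
            yield item
```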

Reviewed By: myleott

Differential Revision: D22405263

fbshipit-source-id: 80f40a355652016af4ba8c386b623cb0552b1928
2020-07-07 16:42:12 -07:00
Siddharth Shah
578164a0ef 0 warmup in tri stage lr scheduler
Summary:
The current code fails due to division by zero. This diff allows for zero warmup in
the tri-stage scheduler.
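
A minimal sketch of the kind of guard implied here, with assumed variable names: avoid dividing by `warmup_steps` when it is zero.

```
def warmup_factor(step, warmup_steps, init_lr_scale=0.01):
    """Linear warmup factor that tolerates warmup_steps == 0."""
    if warmup_steps <= 0:
        return 1.0  # no warmup: start directly at the peak learning rate
    # linear interpolation from init_lr_scale to 1.0 over warmup_steps updates
    frac = min(step, warmup_steps) / warmup_steps
    return init_lr_scale + (1.0 - init_lr_scale) * frac
```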

Reviewed By: myleott

Differential Revision: D22416482

fbshipit-source-id: dedb41ac141528314dc86cd73b8b67e699bf457b
2020-07-07 16:15:50 -07:00
Myle Ott
97ca0c022c Fix data hang with buffered iterator (#1206)
Summary:
According to Tom Birch: "I think there's an issue with torch.utils.data.dataloader._MultiProcessingDataLoaderIter when next(...) is supposed to raise StopIteration it just blocks indefinitely instead." This PR is a workaround that fixes the issue.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1206

Reviewed By: froody

Differential Revision: D22411150

Pulled By: myleott

fbshipit-source-id: 7cdfa67cf55e9cff81cf7d4904f1d38bfa36a0d0
2020-07-07 10:23:09 -07:00
Myle Ott
fc29aab203 Fix model parallel training after quantization/interactive.py changes (#1202)
Summary:
- fix model parallel training after output_projection changes
- fix training with non-vocab parallel criterions
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1202

Reviewed By: ngoyal2707

Differential Revision: D22266462

Pulled By: myleott

fbshipit-source-id: c7bb9a95c01f5fdaf415a709a93bacb15336271c
2020-07-06 08:27:27 -07:00
Daniel Adkins
a87cafda71 update fairseq binarizer to use PathManager
Summary:
Currently, fairseq binarizer does not work with Manifold files, making it incompatible with some internal procedures. This change preserves the old functionality while allowing Manifold files to be passed into binarizer functions.

motivated by theweiho: "I think we should change Binarizer to use PathManager so that it can handle either Manifold path or POSIX path" (D22241626)
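
A hedged sketch of the substitution being described: opening files through `PathManager` so both POSIX and Manifold-style paths work. The import path shown is my reading of fairseq's `file_io` module; treat it as an assumption.

```
# Before (POSIX-only):
#     with open(filename, "r") as f:
#         ...
#
# After (works for POSIX paths and, with the right handler registered,
# Manifold-style paths as well):
from fairseq.file_io import PathManager  # assumed import path

def count_lines(filename):
    with PathManager.open(filename, "r") as f:
        return sum(1 for _ in f)
```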

Reviewed By: akinh

Differential Revision: D22293525

fbshipit-source-id: d1bf4f8b50dda6a9214ee2fbe45e112ca9628f60
2020-06-30 12:53:33 -07:00
Belinda Li
894ae64858 Add Linformer to internal fairseq
Summary: Adding linformer

Reviewed By: myleott

Differential Revision: D22253918

fbshipit-source-id: 0bb86dddae1be09450544cb25530400e914c640f
2020-06-27 16:11:26 -07:00
Ning Dong
d2b5265a60 Merge FBSequenceGenerator & SequenceGenerator
Summary: See discussion in D20995796 (4725487bbc). Will merge 2 diffs if this looks good to you myleott jhcross

Reviewed By: myleott

Differential Revision: D21214974

fbshipit-source-id: ebb59b0491a8c209bed2420a0cd94e9c41d05f2e
2020-06-24 22:30:46 -07:00
Myle Ott
f0a61a2774 Miscellaneous fixes (#1196)
Summary:
Incorporate several fixes, incl. from OSS contributors:
- fix model argument in sequence generator in semisupervised_translation.py
- fix aggregate logging in semisupervised_translation.py
- Fix EOS token in multilingual_denoising
- Handle missing eos_idx in data_utils.collate_tokens
- Better OOM handling for single-GPU training
- fix prepend_bos argument in translation_from_pretrained_bart.py …
- Fix eos_idx in multilingual_denoising
- Small logging fixes
- Fix fb_hub on PyTorch 1.6
- Better variable names
- Add support for model parallel to interactive.py
- Use `//` operator to fix Integer division warning
- Set default `--clip-norm=0.0`
- Cleanup some binaries in root directory

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1196

Reviewed By: ngoyal2707

Differential Revision: D22162202

Pulled By: myleott

fbshipit-source-id: 835b0c0ad9246827f9d915fdb4e89d7b5be2475d
2020-06-24 10:08:53 -07:00
Myle Ott
da94e58c70 TPU support for Translation (#2245)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2245

Reviewed By: ngoyal2707

Differential Revision: D22070745

Pulled By: myleott

fbshipit-source-id: e43a96a585366b10d997a12522e8cd6496294ad2
2020-06-24 09:56:42 -07:00
Marco Gaido
a12c5c5de8 Add max position params to speech recognition (#1783)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1782.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1783

Reviewed By: okhonko

Differential Revision: D21663633

Pulled By: myleott

fbshipit-source-id: 5f3b4b7df83e27d866efb489daeffb3b38a66f38
2020-06-23 06:48:47 -07:00
Myle Ott
d0ccc3e02e Add FairseqDecoder.reorder_incremental_state_scripting for TorchScript (#1190)
Summary:
The main changes are in fairseq_incremental_decoder.py. I made the base `reorder_incremental_state` implementation a no-op and instead we expect callers (e.g., SequenceGenerator) to call `reorder_incremental_state_scripting`.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1190

Test Plan:
I ran unit tests both in PyTorch 1.5 and nightly (1.6).

I also tested some of the pretrained translation models, but it'd be good to test with some prod runs.

Reviewed By: jhcross

Differential Revision: D22095614

Pulled By: myleott

fbshipit-source-id: 484b8d47b4feda4efe52233a3d46a207d0816766
2020-06-22 18:54:28 -07:00
Ronan Riochet
d5d2cf3cd5 Add timeout kwarg to EpochBatchIterator (#2261)
Summary:
Add an optional `timeout` argument to `EpochBatchIterator`.

I need it to fix this issue: https://github.com/pytorch/pytorch/issues/2474

I could do something more general, allowing one to pass `**dataloader_kwargs` to `torch.utils.data.DataLoader`, if you think it's worth it.
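
For reference, `torch.utils.data.DataLoader` already accepts a `timeout` argument (seconds to wait for a batch from workers; 0 disables it), so the change is essentially threading that value through. A minimal standalone example:

```
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    dataset = TensorDataset(torch.arange(100).float())
    loader = DataLoader(
        dataset,
        batch_size=10,
        num_workers=2,
        timeout=60,  # error out if a worker takes more than 60s to deliver a batch
    )
    for (batch,) in loader:
        pass
```
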
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2261

Reviewed By: huihuifan

Differential Revision: D22162936

Pulled By: myleott

fbshipit-source-id: 959b408a53356c19c04fc5ae94aad5f164a32dcd
2020-06-22 18:43:25 -07:00
gvskalyan
88c58b6718 Preprocess dict number (#2228)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [X] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2227 .

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2228

Reviewed By: huihuifan

Differential Revision: D22163032

Pulled By: myleott

fbshipit-source-id: a5afbfca2d9a11563026f47cd246654e131d92fb
2020-06-22 18:40:35 -07:00
Yi-Hsiu Liao
e187f6e116 add maybe_no_sync for multilingual_translation task (#2238)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ x ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ x ] Did you write any new necessary tests?

## What does this PR do?
This PR reduces unnecessary communication overhead between GPUs, since we only need to sync up once for all lang-pairs. We see a significant training speedup, especially with a large number of lang-pairs.
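
A hedged sketch of the pattern: with DistributedDataParallel, wrap all but the last lang-pair's backward pass in `no_sync()` so gradients are all-reduced only once per update. The loop structure is illustrative; fairseq's actual `maybe_no_sync` helper may differ.

```
import contextlib

def train_step_all_lang_pairs(ddp_model, criterion, samples_per_lang_pair):
    """Accumulate grads over lang-pairs, syncing across GPUs only on the last one."""
    lang_pairs = list(samples_per_lang_pair.items())
    for i, (lang_pair, sample) in enumerate(lang_pairs):
        is_last = i == len(lang_pairs) - 1
        # DistributedDataParallel.no_sync() disables gradient all-reduce inside the block.
        use_no_sync = not is_last and hasattr(ddp_model, "no_sync")
        ctx = ddp_model.no_sync() if use_no_sync else contextlib.nullcontext()
        with ctx:
            loss = criterion(ddp_model, sample)  # assumed callable returning a scalar loss
            loss.backward()
```
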
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2238

Reviewed By: pipibjc

Differential Revision: D22149086

Pulled By: myleott

fbshipit-source-id: 6fff09e5a51b49bdcf5bc3986c0719b19d31c0a9
2020-06-22 18:25:59 -07:00
Tony Lekhtman
a9cb84df68 Update hub_utils.py (#2253)
Summary:
fix bug for print_alignment

# Before submitting

- [ V] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ V] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?   not relevant
- [ ] Did you write any new necessary tests?  not relevant

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1880 .

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2253

Reviewed By: huihuifan

Differential Revision: D22162948

Pulled By: myleott

fbshipit-source-id: 3ec5508506184a9effa330fbcd43ffe917b533c6
2020-06-22 15:28:57 -07:00
Mike Ruberry
320bf8cf96 Updates full to no longer use deprecated integer fill_value type inference
Summary:
In PyTorch 1.5, using an integer fill_value with torch.full without setting the dtype or out kwarg was deprecated, and will soon throw a runtime error. In the future, torch.full will infer its dtype from the fill_value, and these calls would produce integer, not float, tensors. This update maintains the current behavior.
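
A small example of the behavior-preserving change being described: keeping the float result requires an explicit dtype (or a float fill_value) once integer-fill inference changes.

```
import torch

# Old, deprecated reliance on inference: torch.full((2, 3), 1) used to give a float tensor.
# Explicitly requesting the dtype keeps the current (float) behavior:
t = torch.full((2, 3), 1, dtype=torch.float32)
# Equivalent alternative: use a float fill_value.
t2 = torch.full((2, 3), 1.0)
assert t.dtype == t2.dtype == torch.float32
```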

Created from Diffusion's 'Open in Editor' feature.

Reviewed By: myleott

Differential Revision: D22161456

fbshipit-source-id: b5d687e4de83dba6e76cae6e61b5106bf5b320db
2020-06-22 11:56:58 -07:00
Joshua Meier
8eb9123f56 Patch masked_lm memory leak on GPUs (#1195)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes memory leak in masked_lm criterion.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1195

Reviewed By: myleott

Differential Revision: D22155285

Pulled By: joshim5

fbshipit-source-id: 9414e307e1e2d2a9225884dc94aae964a1627682
2020-06-22 10:02:21 -07:00
Myle Ott
3ea511d899 Revert Dataloader changes
Summary:
D22052683 may have introduced a memory leak; revert those parts for now.

The original motivation is described here: https://github.com/pytorch/fairseq/issues/2168.
Previously I/O was bursty when training with large update frequency.
This change was meant to even it out, but possibly introduced a memory leak.

More context on the change can be found here: https://github.com/pytorch/fairseq/issues/2168

Reviewed By: yqwangustc

Differential Revision: D22156157

fbshipit-source-id: 390ff39bc3e268d6312971768c34fe44d4bd84b7
2020-06-22 06:48:37 -07:00
Mandeep Baines
6f6461b81a Add tracepoints (#1192)
Summary:
There is no overhead when the profiling is not enabled.
When running using profile.py, I measure an overhead of 3%.

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1192

Reviewed By: sidgoyal78

Differential Revision: D22102341

Pulled By: msbaines

fbshipit-source-id: ffddb9cceb853df88db34195be18bae7723d4c98
2020-06-19 16:23:22 -07:00
Rohit Kopparthy
8d8d773c57 Set EncoderOut Attributes to None instead of torch.empty(0)
Summary: The ConvTransformer model throws an error during training because certain attributes were changed from None to torch.empty(0) to meet TorchScript type requirements. Existing assertion checks only verify that these attributes are not None, rather than not torch.empty(0). To fix this, the types have been changed to Optional types and the attributes are allowed to stay as None, as before.

Reviewed By: zhengwy888

Differential Revision: D22115126

fbshipit-source-id: de3c7b64c5e7142c860a354f778b8b818a7b0bb8
2020-06-19 10:51:50 -07:00
Wei Ho
d617c292f8 Apply black formatter to fairseq_cli/train.py
Differential Revision: D22125634

fbshipit-source-id: a05f483ac4b564f5d7a21f5ae3605615e7fcd263
2020-06-18 18:38:11 -07:00
Gil Keren
82f99df8e4 Gradually releasing the restrictions on data-buffer-size
Summary: The buffer was a suspect in creating some Everstore overload, and therefore was restricted in D21804332. But since its part in those problems was inconclusive, and the Everstore read limit was increased for the speech group, we are gradually increasing it back.

Differential Revision: D22076534

fbshipit-source-id: cb01d50d4df5843b86f7d730e1805a88ea3f41d8
2020-06-17 18:50:23 -07:00
Yongqiang Wang
c294e2fcfb print out all the CUDA environment information (including name, memory size,
Summary:
Recently, we found it increasingly likely that GPUs of different
generations (V100 vs. P100) or memory sizes (16GB, 32GB) are mixed in
training, while the users do not even know about it. Printing out this information
can be helpful for debugging.
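
A minimal sketch of the kind of diagnostic being added, using only public `torch.cuda` APIs (the exact fields fairseq logs may differ):

```
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(
            f"device {i}: {props.name}, "
            f"{props.total_memory / 1024 ** 3:.1f} GiB, "
            f"compute capability {props.major}.{props.minor}"
        )
```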

Reviewed By: myleott

Differential Revision: D21782630

fbshipit-source-id: 7e1075e1b928d969594bbee92275a819cf1a0877
2020-06-17 13:47:32 -07:00
Rohit Kopparthy
3c16b002b9 Scripting ConvTransformer
Summary: This diff builds on D21986239 to script the ConvTransformer model instead of the VggTransformer. The changes made in data_utils.py were copied over from D20443519. A new file called test_convtransformer.py was added to test scripting the model. The scripted model compiles and also produces the same output as before scripting.

Reviewed By: myleott

Differential Revision: D22022654

fbshipit-source-id: 8f5a36a9af391142b468818650be3af218235fc2
2020-06-16 12:16:03 -07:00
Myle Ott
14ee059a36 Dataloading fixes (#1189)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1189

Reviewed By: ngoyal2707

Differential Revision: D22052683

Pulled By: myleott

fbshipit-source-id: afdfda291907ad4441af51cfc9e44f1bd01ea696
2020-06-16 11:49:39 -07:00
Alex Xiao
8570277f91 dataset sampling for minibatch training
Summary:
**Motivation:**

We have 3 datasets: Portal, Video, and Messenger Voice Clips. We want to specify a distribution [p1, p2, p3] such that we sample utterances from Portal with prob p1, etc.

Previously, D21675421 sampled from datasets by **batches**. This is not acceptable for minibatch training, as we need to maintain LSTM states across consecutive batches. As a result, we need utterance-level sampling, not batch-level sampling.

**Design**

1. Created a new MultiCorpusDataset, similar to MultiCorpusSampledDataset, except it samples at the utterance level. Specifically, every time `ordered_indices` is called, a new sample of the multiple datasets is generated based on an input distribution (a toy sketch of this idea follows after this list). We ensure that the randomness of this is seeded by the input seed and the epoch, to enable reproducibility when loading from checkpoints.
2. Created MiniBatchMultiCorpusDataset, which adds minibatch specific logic to MultiCorpusDataset, mainly for handling things like the start frame and deleting cache.
3. Refactored different sampling strategies into a single `build_sampled_dataset` for easy re-use.
4. Added the flag --reset-iterator, enabling us to reset the batch iterator every epoch so that a new `ordered_indices` is generated each epoch
5. some minor refactoring of existing code
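
A toy sketch of the utterance-level sampling idea from item 1, seeded by (seed, epoch) for reproducibility; the names and structure are illustrative, not the actual MultiCorpusDataset.

```
import numpy as np

def ordered_indices(dataset_sizes, distribution, seed, epoch, num_samples):
    """Sample per-utterance dataset choices, reproducibly for a given (seed, epoch).

    dataset_sizes: e.g. {"portal": 100_000, "video": 50_000, "voice_clips": 20_000}
    distribution:  matching sampling probabilities, e.g. [0.5, 0.3, 0.2]
    Returns a list of (dataset_name, index_within_dataset) pairs.
    """
    # same (seed, epoch) -> same order, so a restarted job resumes identically
    rng = np.random.RandomState(hash((seed, epoch)) % (2 ** 32))
    names = list(dataset_sizes.keys())
    choices = rng.choice(len(names), size=num_samples, p=distribution)
    return [(names[c], int(rng.randint(dataset_sizes[names[c]]))) for c in choices]
```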

**Usage**

1. In your data.json, include extra splits in addition to "train" (i.e. "portal", "video"), with whatever transforms/handle file you want.
2. In your flow, provide "--extra-splits portal 0.2 video 0.3" and "--reset-iterator" as flags
3. Enjoy WER improvements

Differential Revision: D21887303

fbshipit-source-id: 6b377bed8a68a8e72e2528f8a5a28b675eebaadf
2020-06-15 15:28:01 -07:00
Yongqiang Wang
86edf989dd cast grad_norm to float in case fp16 training
Summary:
We found that grad_norm could become inf because it is accumulated in the
meter many times, and in fp16 it easily overflows. Using fp32 for each `grad_norm`
costs minimal memory.
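
A small sketch of the idea in plain PyTorch (stand-in tensors, not the fairseq trainer code):

```
import torch

grads = [torch.randn(1024).half() for _ in range(4)]   # stand-ins for fp16 gradients
# Take each norm in fp32 and keep the result in fp32 before it is accumulated
# in a meter, so repeated accumulation cannot overflow the fp16 range (max ~65504).
grad_norm = torch.norm(torch.stack([torch.norm(g.float()) for g in grads]))
assert grad_norm.dtype == torch.float32
```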

Reviewed By: myleott

Differential Revision: D22015643

fbshipit-source-id: 429d24bbb9c9a785edf0bfb06480497022f80418
2020-06-12 22:12:52 -07:00
Joshua Meier
242269d439 Fix truncation in sentence_ranking (#1185)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes current breaking change on master.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1185

Reviewed By: myleott

Differential Revision: D21924644

Pulled By: joshim5

fbshipit-source-id: 0eabd2393c76060dcf1568eba308878a90af7a87
2020-06-10 11:59:58 -07:00
Mike Ruberry
2e1da09a9c Updates argument to np.arange to avoid performing floor division using torch.div
Summary:
Performing floor division with torch.div is deprecated and will soon throw a runtime error. Perhaps surprisingly, calling np.arange on a torch tensor can use torch.div to perform floor division. Taking the number from the tensor using .item() should prevent this issue and keep this code working.
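
A small illustration of the change, assuming a scalar length stored in a tensor:

```
import numpy as np
import torch

length = torch.tensor(7)

# Passing the tensor itself lets NumPy drive arithmetic through torch ops
# (including the deprecated floor-division path); extracting a plain Python
# int first sidesteps that entirely.
indices = np.arange(length.item())
print(indices)  # [0 1 2 3 4 5 6]
```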

Created from Diffusion's 'Open in Editor' feature.

Reviewed By: lematt1991

Differential Revision: D21941120

fbshipit-source-id: 4d76451d4b33d487946af1c2f9ed21eca858cb06
2020-06-09 13:04:47 -07:00
Gil Keren
5c4f0f8903 Better exception handling for the data buffer thread
Summary: When data-buffer-size != 0 was used, an exception happening in the data preparation (therefore in the buffer thread) was not raised properly, and the main thread hung on `queue.get`. This fixes it by raising the error to the main thread.
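
A toy sketch of the pattern described (assumed structure, not the exact fairseq code): the producer thread catches its exception, puts it on the queue, and the main thread re-raises it instead of blocking on `queue.get` forever.

```
import queue
import threading

def _produce(source, q):
    try:
        for item in source():
            q.put(("item", item))
        q.put(("done", None))
    except Exception as e:          # forward any failure to the main thread
        q.put(("error", e))

def buffered(source, buffer_size=4):
    q = queue.Queue(maxsize=buffer_size)
    threading.Thread(target=_produce, args=(source, q), daemon=True).start()
    while True:
        kind, payload = q.get()
        if kind == "error":
            raise payload           # surface the worker's exception in the main thread
        if kind == "done":
            return
        yield payload
```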

Reviewed By: myleott

Differential Revision: D21917739

fbshipit-source-id: 8d3f875b663b37625f44a943fb3904e25216db06
2020-06-08 10:52:41 -07:00
Xilun Chen
e03bfd9bf4 Check if the checkpoint is from the latest version before updating the state_dict in TransformerDecoder.upgrade_state_dict_named() (#2222)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2222

When share_input_output_embed is set to True, the existing code always overrides output_projection.weight with embed_tokens.weight.

This is unnecessary, and caused a very obscure bug in our custom BART model.

Added a check to skip the update to state_dict if f"{name}.output_projection.weight" is already in the checkpoint.
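
A minimal sketch of the guard, with the dict manipulation spelled out; the surrounding structure and argument names are assumed for illustration.

```
def upgrade_tied_output_projection(state_dict, name, embed_tokens_weight):
    """Only backfill the tied output projection for old checkpoints that lack it."""
    key = f"{name}.output_projection.weight" if name else "output_projection.weight"
    if key not in state_dict:
        # old checkpoint: tied weights were stored only once, so copy them over
        state_dict[key] = embed_tokens_weight
    # newer checkpoints already contain the (possibly customized) projection; leave it alone
    return state_dict
```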

Reviewed By: myleott

Differential Revision: D21915833

fbshipit-source-id: d298e24394be2ee85c8f686ba459b7e4cbd4298a
2020-06-08 10:16:59 -07:00
Joshua Meier
2699f4a28b add random sequence truncation (#1173)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Allows taking random crops in the language_modeling task.
Discussed with myleott, robert-verkuil, and tomsercu in a meeting yesterday. Ultimately took a different, more general approach to implementing this.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙌
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1173

Reviewed By: myleott

Differential Revision: D21904561

Pulled By: joshim5

fbshipit-source-id: 66e8dfb10a0d36b76acd2eb181e00db6fc2433fc
2020-06-05 12:58:07 -07:00
Joshua Meier
152a3fe143 Support residual connections in LSTM models (#1103)
Summary:
Adds support for residual connections in LSTM models.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1103

Reviewed By: myleott

Differential Revision: D21639942

Pulled By: joshim5

fbshipit-source-id: a02ddfe080a847fd91a9c6a5074cb6dc782f7727
2020-06-05 12:15:10 -07:00