Commit Graph

275 Commits

Author SHA1 Message Date
Changhan Wang
ee177fc4fa add xm_transformer test; refactor speech tests
Summary: add xm_transformer test; refactor speech tests

Reviewed By: sravyapopuri388

Differential Revision: D33312231

fbshipit-source-id: a2b2695fc3c10d5420abbe23a4a3005777aa2ae1
2021-12-31 12:31:11 -08:00
Liang Tan
2762a1cfef Add regularization for multihead attention module and ffn module
Summary: [Fairseq] Add regularization for multihead attention module and ffn module

Reviewed By: dianaml0

Differential Revision: D32441521

fbshipit-source-id: c648c1f8ec1a3310ba90c4952cdd40a21b959d26
2021-12-30 02:02:05 -08:00
Diana Liskovich
7fddb9d960 lint fixes (#2834)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Applied `black` and `isort` to fix failing CI

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2834

Reviewed By: vedanuj

Differential Revision: D33262876

Pulled By: dianaml0

fbshipit-source-id: 03215c276fcddda9f7c78971bf6ed7c5ac21b2ee
2021-12-29 11:50:55 -08:00
Xian Li
7f3967805f add readme for xglm models (#2808)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Add readme and task for xglm models.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2808

Reviewed By: punitkoura

Differential Revision: D33237928

Pulled By: xianxl

fbshipit-source-id: 7773cf56e896210dab1f4311ae69f0e00c6d9aff
2021-12-20 13:05:17 -08:00
Diana Liskovich
a54021305d formatting fix (#2816)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
fix `black` failures

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2816

Reviewed By: alexeib

Differential Revision: D33172615

Pulled By: dianaml0

fbshipit-source-id: 36b141f42941670f1bfa981041d878042feb0428
2021-12-16 16:11:19 -08:00
Changhan Wang
7b0159a202 add integration test for fastspeech2
Summary: Adding integration test (based on test set scores on pre-trained checkpoints) for fastspeech2

Reviewed By: yuntang

Differential Revision: D33143301

fbshipit-source-id: dca0841b43dd1cb2933ce5c652ed3cdff0fc4a52
2021-12-15 16:15:38 -08:00
Changhan Wang
ee833ed49d speech integration tests (batch 1)
Summary:
Adding the first batch of speech integration tests (based on test set scores on pre-trained checkpoints) for
- S2T transformer
- TTS transformer

Reviewed By: yuntang

Differential Revision: D33050653

fbshipit-source-id: fb5bb9f46e8e17cb705971ca1990c8e1cb99d5f9
2021-12-14 17:42:18 -08:00
Jingfei Du
16ebfa752c Revert prefix beamsearch fix (#2763)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Reverting to fix the issue mentioned [here](https://github.com/pytorch/fairseq/issues/3913). A follow-up PR will fix the original issue later.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2763

Reviewed By: myleott

Differential Revision: D33000411

Pulled By: jingfeidu

fbshipit-source-id: 95a54cbdc612129a0eab4b5e6aa576a5bcf00588
2021-12-14 13:22:09 -08:00
Changhan Wang
8548f1d401 Add loading from HuggingFace Hub
Summary: Add loading from HuggingFace Hub. Revised from, and intended to replace, D32697723 (accepted).

Reviewed By: pipibjc, dianaml0

Differential Revision: D32964041

fbshipit-source-id: 39676aa0ecb10454ae76b70968d5abe96ab6da54
2021-12-10 16:55:12 -08:00
dianaml0
88e7d2586b fix flake8 issues (#2570)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
- [x] applies flake8 fixes to main branch (https://github.com/fairinternal/fairseq-py/issues/2546) - still more to be fixed

Fix GPU tests:
- [x] when the torch.ao.quantization import doesn't work, use torch.quantization
- [x] build apex from an earlier commit in CircleCI so that it's compatible with PyTorch 1.8 and 1.9

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2570

Reviewed By: Mortimerp9

Differential Revision: D32955312

Pulled By: dianaml0

fbshipit-source-id: e163cbd4998f171f819e31b0682c1c0f1986f9e1
2021-12-09 02:34:30 -08:00
dianaml0
0dfd6b6240 Add linting with black (#2678)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2678

Reviewed By: Mortimerp9

Differential Revision: D32653381

Pulled By: dianaml0

fbshipit-source-id: 2810d14867cd7d64f4d340740e2b590b82de47fe
2021-11-29 12:32:59 -08:00
Sam Shleifer
fb64e43c67 skip remainder batch (#2464)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2464

Reviewed By: myleott

Differential Revision: D31742871

Pulled By: sshleifer

fbshipit-source-id: e5d29ca9d594abd92212eb24b60c991f2840a4e8
2021-11-24 07:50:50 -08:00
Vinayak Tantia
3a5838c320 Update implementation of SlowMo to its implementation in Fairscale (#3996)
Summary:
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
SlowMo is being moved to [Fairscale](https://fairscale.readthedocs.io/en/latest/). This commit updates the implementation of SlowMo to the Fairscale version. It also adds tests for SlowMo.
Note: This PR is currently for review. It will be merged at a later date, once SlowMo has been updated in Fairscale. SlowMo is being merged into Fairscale as part of [a PR](https://github.com/facebookresearch/fairscale/pull/378). So, once that PR is merged into Fairscale, this PR on Fairseq will be ready to merge.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3996

Reviewed By: dianaml0

Differential Revision: D32280163

Pulled By: vtantia

fbshipit-source-id: 70c97b04a7cdc90ada7099375c2a31b0c978ba70
2021-11-09 09:44:45 -08:00
Sam Shleifer
c5ff181125 NormFormer: flags and docs (#2460)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2460

Reviewed By: myleott

Differential Revision: D31731798

Pulled By: sshleifer

fbshipit-source-id: 938456c17aa004cacffdcdd124aebe390da83d5f
2021-10-19 17:13:04 -07:00
Vimal Manohar
1ef3d6a1a2 CPLTask for training with continuous pseudo labeling
Summary:
CPLTaskImpl provides an implementation that augments existing tasks to take an additional ema_model input in their train_step and valid_step, for continuous pseudo-labeling (CPL) during training. It passes this ema_model to the criterion.

See Kaizen semi-supervised training paper for more details https://arxiv.org/abs/2106.07759.

This implementation also supports using CPLDataset which enables using unsupervised data only for `cpl_finetune_epoch > epochs >= cpl_start_epoch`. CPLDataset is like MultiCorpusDataset but ignores the unsupervised datasets while sampling.

Another addition in this diff is to skip a dataset in MultiCorpusDataset if its sampling probability is 0.
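
A minimal sketch of the augmented train_step described above, assuming hypothetical names (the `ema_model` keyword and the criterion signature follow this summary, not the actual diff):

```python
# Hypothetical sketch: names and signatures are assumptions based on the
# summary above, not the actual diff.
class CPLTaskImpl:
    def train_step(
        self, sample, model, criterion, optimizer, update_num,
        ignore_grad=False, ema_model=None,
    ):
        model.train()
        # The criterion receives the EMA model and can use it to produce
        # pseudo-labels for the unsupervised portion of the batch.
        loss, sample_size, logging_output = criterion(
            model, sample, ema_model=ema_model
        )
        if ignore_grad:
            loss *= 0
        optimizer.backward(loss)
        return loss, sample_size, logging_output
```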

Reviewed By: cruvadom

Differential Revision: D30701536

fbshipit-source-id: 1d840eacfd538ed7aed3baaefc8b254390642b45
2021-10-14 22:09:07 -07:00
Vimal Manohar
8feccf9441 EMA
Summary:
Adds an exponential moving average (EMA) model for Kaizen semi-supervised training https://arxiv.org/abs/2106.07759

1. Add `ema.store_ema` to enable storing EMA. The EMA will be written to extra_state in the state dict while saving a checkpoint.
2. `ema.ema_start_update` controls when the EMA starts accumulating.
3. Tasks can use the `uses_ema` property to decide if the EMA should be passed to the task. (Default is False.)
4. `load_ema_from_checkpoint` can be used to load the EMA model in place of the model to be used for evaluation. Pyspeech has an eval-ema option for this.

```
This module has the EMA class used to store a copy of the exponentially decayed
model params.

Typical usage of EMA class involves initializing an object using an existing
model (random or from a seed model) and setting the config like ema_decay,
ema_start_update which determine how the EMA model is updated. After every
update of the model i.e. at the end of the train_step, the EMA should be updated
by passing the new model to the EMA.step function. The EMA model state dict
can be stored in the extra state under the key of "ema" and dumped
into a checkpoint and loaded. The EMA object can be passed to tasks
by setting task.uses_ema property.
EMA is a smoothed/ensemble model which might have better performance
when used for inference or further fine-tuning. EMA class has a
reverse function to load the EMA params into a model and use it
like a regular model.
```
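
As a concrete illustration of the update rule implied above, a minimal EMA sketch might look like this (the decay constant and method names are illustrative; the real class also handles config, FP32 copies, and checkpointing):

```python
import copy

import torch


class SimpleEMA:
    """Minimal sketch of the EMA update described above (illustrative only)."""

    def __init__(self, model, decay=0.999):
        self.model = copy.deepcopy(model)  # smoothed copy of the params
        self.decay = decay

    @torch.no_grad()
    def step(self, new_model):
        # ema_param <- decay * ema_param + (1 - decay) * new_param
        for ema_p, p in zip(self.model.parameters(), new_model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1 - self.decay)
```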

Reviewed By: cruvadom

Differential Revision: D24238379

fbshipit-source-id: 879d3ba5070a614b7d365f9503af357001e875b2
2021-09-01 12:29:51 -07:00
Pierre Andrews
68a81202a3 Indexed Huffman Coded dataset (#2029)
Summary:
## What does this PR do?

Currently, binarized datasets are stored as a binary representation of int tensors. At best, each int is coded as uint16 on disk.

When coding a fixed-size-vocabulary dataset where we know the frequency of each symbol, and where some symbols are more common than others, we can do better. This happens in particular when binarizing a dataset split into subword units, as the most common "tokenizers" like bpe and spm will choose subwords with high frequencies over subwords with low frequencies.

In practice, if we know the frequency of all symbols (or a good estimate), we can use entropy encoding methods to compress the data. The idea is to assign a compressed representation where frequent symbols have shorter representations than infrequent symbols.

In this PR, we build a Huffman code from a frequency table and use this code to encode a dataset. The PR provides the Huffman coder implementation (using the single-queue approach, as we usually start with a sorted set of symbols) as well as a memory-mapped implementation of a dataset that stores the data compressed with a Huffman code and can return indexed tensors from it.

Over a whole dataset, depending on how many symbols we sample to evaluate the frequency, we can save between 25% and 30% of storage space.
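
For a concrete picture of the idea, a minimal Huffman construction from a frequency table might look like the sketch below (the PR uses the single-queue approach over pre-sorted symbols; this heap-based sketch builds an equivalent code):

```python
import heapq
from collections import Counter


def build_huffman_code(frequencies):
    """Illustrative Huffman construction from a symbol -> count table."""
    heap = [
        (count, i, {symbol: ""})
        for i, (symbol, count) in enumerate(frequencies.items())
    ]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]  # symbol -> bit string; frequent symbols get short codes


print(build_huffman_code(Counter("abracadabra")))
# 'a' (5 occurrences) gets a shorter bit string than 'c' or 'd' (1 each)
```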

## Follow Ups

Currently the binarizer/preprocess script makes too many assumptions about the dataset writers, so the Huffman dataset writer cannot be used straight out of the box with it. I will make follow-up PRs to provide easy-to-use scripts to build such datasets. But it's as simple as doing:
```
code_builder = HuffmanCodeBuilder()
with open(sample_file, 'r', encoding="utf-8") as input:
    for line in input:
        code_builder.add(*line.strip().split(" "))

coder = code_builder.build_code()

with HuffmanMMapIndexedDatasetBuilder('/tmp/testing_huffman', coder) as builder:
    with open(dataset_file, 'r', encoding="utf-8") as input:
        for line in input:
            builder.add_item(line.strip().split(' '))
```

A lot of the `HuffmanMMapIndexedDataset` code comes from the normal `MMapIndexedDataset`, and we could probably extract the commonalities into a base class.

The `HuffmanCoder` is also really a special kind of `Dictionary`, and again, a common base class could be abstracted out of them.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2029

Reviewed By: dianaml0

Differential Revision: D29557468

Pulled By: Mortimerp9

fbshipit-source-id: a01b6d98f38f937934cadebb3786133e257adefe
2021-08-31 01:12:35 -07:00
Jingfei Du
932a3d4aad fix beam search with prefix tokens (#2227)
Summary:
1. Added a test for generating pad tokens during beam search with prefix tokens.
2. Modified lprobs for the pad token and prefix tokens to avoid generating pad (see the sketch below).
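
A hedged sketch of the second point, with hypothetical names (assuming the lprobs rows already match the flattened batch of hypotheses):

```python
import torch


def constrain_lprobs(lprobs, step, prefix_tokens, pad_idx):
    """Hypothetical sketch: while still inside the prefix, allow only the
    prefix token (so pad can never be generated there); past the prefix,
    simply forbid pad."""
    if prefix_tokens is not None and step < prefix_tokens.size(1):
        forced = prefix_tokens[:, step].unsqueeze(1)  # (rows, 1)
        mask = torch.full_like(lprobs, float("-inf"))
        mask.scatter_(1, forced, 0.0)  # keep only the forced token's score
        lprobs = lprobs + mask
    else:
        lprobs[:, pad_idx] = float("-inf")
    return lprobs
```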

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2227

Reviewed By: xianxl

Differential Revision: D30649356

Pulled By: jingfeidu

fbshipit-source-id: d94903a912e767391c8fca61f98f65b5cea3b56e
2021-08-30 18:07:13 -07:00
Pierre Andrews
bc1504d4d7 Hierarchical Configs
Summary:
This is a precursor to D29232595

The current behaviour when converting a dataclass to a namespace is that all the fields from all DCs in the field hierarchy are flattened at the top. This is also the legacy behaviour with `add_args`.

This makes it cumbersome to build reusable dataclasses, as we need to make sure that each field has a unique name. In the case of Transformer, for instance, we have Decoder and Encoder configs that share a large part of their fields (embed_dim, layers, etc.). We can build a single dataclass for this that can be reused and extended in other implementations. To then still have a flat namespace, instead of adding all subfields as-is to the root namespace, we introduce the name of the field as a prefix to the arg in the namespace.

So:
`model.decoder.embed_dim` becomes `decoder_embed_dim` and `model.encoder.embed_dim` becomes `encoder_embed_dim`.
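
A small sketch of this flattening behaviour (the config classes and helper are illustrative, not the fairseq implementation):

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class EncDecConfig:  # reusable: shared by encoder and decoder
    embed_dim: int = 512
    layers: int = 6


@dataclass
class ModelConfig:
    encoder: EncDecConfig = field(default_factory=EncDecConfig)
    decoder: EncDecConfig = field(default_factory=EncDecConfig)


def flatten(cfg, prefix=""):
    """Nested fields become '<field>_<subfield>' entries in a flat namespace."""
    out = {}
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        if is_dataclass(value):
            out.update(flatten(value, prefix=f"{prefix}{f.name}_"))
        else:
            out[f"{prefix}{f.name}"] = value
    return out


print(flatten(ModelConfig()))
# {'encoder_embed_dim': 512, 'encoder_layers': 6,
#  'decoder_embed_dim': 512, 'decoder_layers': 6}
```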

Reviewed By: myleott, dianaml0

Differential Revision: D29521386

fbshipit-source-id: f4bef036f0eeb620c6d8709ce97f96ae288848ef
2021-07-16 04:56:12 -07:00
Omry Yadan
dd106d9534 fixes tests/test_train.py to mock checkpoint.save_dir config node (#3675)
Summary:
## What does this PR do?
Some downstream users reported errors when passing a Namespace to load_checkpoint().

A recent change assumed that the passed object is dict-like (dict or DictConfig) and has a get function.
This changes that, and makes sure the mocked config has checkpoint.save_dir so that the test can run.
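
The defensive-access pattern implied here might look like this sketch (the helper name is hypothetical):

```python
def get_cfg_value(cfg, key, default=None):
    # dict and omegaconf.DictConfig both expose .get(); argparse.Namespace
    # does not, so fall back to attribute access.
    if hasattr(cfg, "get"):
        return cfg.get(key, default)
    return getattr(cfg, key, default)
```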

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3675

Reviewed By: omry

Differential Revision: D29564805

Pulled By: lematt1991

fbshipit-source-id: 89308811da382667f6c5d3152ee2d6480416ee62
2021-07-06 15:07:31 -07:00
Pierre Andrews
53bf2b1293 Extract File Chunking to its own utils (#1955)
Summary:
## What does this PR do?

There are a few places where we do file chunking to multiprocess a single file. However, the code is partly in Binarizer and partly just duplicated here and there.

This PR extracts the file chunking/reading logic. The multiprocessing logic could probably be extracted too, but I haven't found a good abstraction yet.
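
The extracted chunking logic presumably follows the classic byte-offset pattern; a sketch under that assumption (the function name is illustrative):

```python
import os


def find_offsets(filename, num_chunks):
    """Split a file into byte ranges that always end on a line boundary."""
    size = os.path.getsize(filename)
    offsets = [0]
    with open(filename, "rb") as f:
        for i in range(1, num_chunks):
            f.seek(size * i // num_chunks)
            f.readline()  # advance to the next full line
            offsets.append(f.tell())
    offsets.append(size)
    return offsets  # worker k reads bytes [offsets[k], offsets[k+1])
```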

# Testing

Added tests for this reading logic, and maybe fixed a bug where the last part of a file might get dropped (even if it's unclear with the current stopping logic).

Tested by running the preprocessing script as follows:
```
python -m fairseq_cli.preprocess --source-lang de --target-lang en --trainpref ...train.spm.clean.de_en --srcdict ...fairseq.dict --tgtdict .../fairseq.dict --destdir ... --workers 60
```

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1955

Reviewed By: myleott

Differential Revision: D29065473

Pulled By: Mortimerp9

fbshipit-source-id: c60843de8cfd45a63b3dbb8290f57ef3df3bf983
2021-06-28 01:46:32 -07:00
Diana Liskovich
50158da3a7 Migrate DummyMaskedLMTask to FairseqTask (#3593)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3593

Reviewed By: msbaines

Differential Revision: D28992614

Pulled By: dianaml0

fbshipit-source-id: b2dfcab472a65c41536e78600a0e6b3745dc3a08
2021-06-10 09:43:08 -07:00
Mandeep Singh Baines
9497ae3cfb disable raise_if_valid_subsets_unintentionally_ignored check for dummy tasks (#3552)
Summary:
Fixes the following crash:
```python
Traceback (most recent call last):
  File "/private/home/msb/.conda/envs/fairseq-20210102-pt181/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/private/home/msb/code/fairseq/fairseq/distributed/utils.py", line 328, in distributed_main
    main(cfg, **kwargs)
  File "/private/home/msb/code/fairseq/fairseq_cli/train.py", line 117, in main
    data_utils.raise_if_valid_subsets_unintentionally_ignored(cfg)
  File "/private/home/msb/code/fairseq/fairseq/data/data_utils.py", line 584, in raise_if_valid_subsets_unintentionally_ignored
    other_paths = _find_extra_valid_paths(train_cfg.task.data)
AttributeError: 'Namespace' object has no attribute 'data'
```

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3552

Reviewed By: sshleifer

Differential Revision: D28667773

Pulled By: msbaines

fbshipit-source-id: bc9a633184105dbae0cce58756bb1d379b03980a
2021-05-27 12:15:31 -07:00
Gagandeep Singh
237184e522 Add torch.cuda.amp support (#3460)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/3282
Add support for `torch.cuda.amp`
AMP can be enabled with `--amp`, instead of using `--fp16` for the already-present full fp16 support.
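
For reference, the standard `torch.cuda.amp` training-loop pattern that `--amp` turns on internally might look like this (simplified sketch; fairseq actually wires AMP through its trainer and optimizer classes):

```python
import torch

scaler = torch.cuda.amp.GradScaler()


def amp_train_step(model, batch, loss_fn, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass in mixed precision
        loss = loss_fn(model(batch["input"]), batch["target"])
    scaler.scale(loss).backward()      # scale to avoid fp16 grad underflow
    scaler.step(optimizer)             # unscales grads, skips step on inf/nan
    scaler.update()
    return loss.detach()
```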

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3460

Reviewed By: sshleifer, msbaines

Differential Revision: D27932253

Pulled By: myleott

fbshipit-source-id: 21637aefb5e788c59bf4f3c5de6c4a80f7319543
2021-05-26 14:39:10 -07:00
Weiyi Zheng
8df9e3a4a5 support FSDP sharded_state checkpoint loading during inference
Summary:
Using the very useful feature added by QuentinDuval (https://github.com/facebookresearch/fairscale/pull/683/files), we can consolidate sharded states into a full regular state. This allows inference on sharded states almost transparently.

The main complexity comes from trying to be smart about what kind of checkpoint the user wants to load; not sure if this is over-engineering:
1. if the file checkpoint-shard0.pt exists and `--checkpoint-shard-count` is > 1, then we load a sharded FSDP checkpoint
2. if checkpoint-shard0.pt exists but --checkpoint-shard-count=1, we load a consolidated FSDP checkpoint
3. if checkpoint-shard0.pt does not exist, but --checkpoint-shard-count > 1, we load a model-parallel checkpoint
4. otherwise, we are loading a single, plain checkpoint (the four cases are sketched below)

In theory we could be even smarter and load shard0.pt to check how many more checkpoints are needed. This is not implemented, though it would save the user from having to specify --checkpoint-shard-count.
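
The four cases above, as a sketch (path and flag plumbing are illustrative):

```python
import os


def checkpoint_kind(save_dir, shard_count):
    """Sketch of the four cases above (names are illustrative)."""
    if os.path.exists(os.path.join(save_dir, "checkpoint-shard0.pt")):
        if shard_count > 1:
            return "fsdp_sharded"      # case 1: per-rank FSDP shards
        return "fsdp_consolidated"     # case 2: consolidated FSDP state
    if shard_count > 1:
        return "model_parallel"        # case 3: model-parallel shards
    return "plain"                     # case 4: single regular checkpoint
```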

Reviewed By: sshleifer

Differential Revision: D28563441

fbshipit-source-id: dcafcaa7c9eaf5c9ff94f55c16bb3424c98dfa59
2021-05-25 17:45:51 -07:00
Sam Shleifer
2be2f3c7c1 Plasma tests: ask for less disk (#1893)
Summary:
Old logs:
```
/arrow/cpp/src/plasma/store.cc:1274: Allowing the Plasma store to use up to 107.374GB of memory.
```

New logs:
```
... up to 1e-05GB of memory.
```

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1893

Reviewed By: myleott

Differential Revision: D28641488

Pulled By: sshleifer

fbshipit-source-id: 3373526042cdcbf434c61790be62a09f15e6ad06
2021-05-24 09:00:18 -07:00
Weiyi Zheng
425c36eaff support use_sharded_state on command line
Summary:
We wanted to use sharded_state because:
1. it saves memory
2. it supports sharded state loading, which allows MoE models' weights to live on their respective shards

I just added use_sharded_state as a config option, and added a unit test to make sure it runs fine.

Old revision's comment:
fairseq.FSDP has a flag use_sharded_state, but I had to address a couple of problems before being able to use it:
1. fairscale FSDP (FSDP for short) calls self.state_dict/load_state_dict, which have been overwritten by fairseq.FSDP; this is not desired behavior
2. the optimizer states shouldn't be sharded again when use_sharded_state is True
3. expose this option on the command line

Reviewed By: sshleifer

Differential Revision: D28375035

fbshipit-source-id: c2f59a9c62163405033f34ed595ba78528aea850
2021-05-14 18:53:16 -07:00
Sam Shleifer
97969ac5f5 --combine-valid-sets (#1843)
Summary:
- `--combine-valid-sets` causes valid.bin, valid1.bin, ... to be concatenated. All metrics will be reported together.
- `--valid-subsets` works the same. If you pass `--valid-subsets valid1,valid2` you get valid1_loss and valid2_loss logged separately.
- if the user passes `--valid-subset valid` (the default) and we see files named valid1, valid2, we raise an error. The user must pass `--ignore-unused-valid-sets` to override. Previously, valid1 and valid2 were silently ignored.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1843

Reviewed By: myleott

Differential Revision: D28323815

Pulled By: sshleifer

fbshipit-source-id: dfd46076d3f684e36f8dacfadd38fd0038ce6755
2021-05-10 23:43:24 -07:00
Yun Wang
d6855baec8 Simplify CountingIterator
Summary:
Simplify the implementation of `CountingIterator` and add test cases.

The old implementation could fail on such a test case:
```
ref = list(range(10))
itr = CountingIterator(ref)
first_item = next(itr)  # consume one item
remaining_items = list(itr)  # raises exception because of "length mismatch"
```
This happens because `list(itr)` invokes `itr.__iter__` and reiterates the underlying list from the start, but `itr.n` has already been incremented by `next(itr)`.

The new implementation is simpler and avoids such an error.
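
A minimal sketch of the simplified design, assuming a sized iterable (the real class supports more options):

```python
class CountingIterator:
    """Keep a single underlying iterator so __iter__ never restarts."""

    def __init__(self, iterable, start=0):
        self._itr = iter(iterable)
        self.n = start  # number of items consumed so far
        self.total = start + len(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        if self.n >= self.total:
            raise StopIteration
        self.n += 1
        return next(self._itr)


ref = list(range(10))
itr = CountingIterator(ref)
first_item = next(itr)       # consume one item
remaining_items = list(itr)  # yields the remaining 9 items, no mismatch
```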

Reviewed By: myleott

Differential Revision: D27802505

fbshipit-source-id: c97fd0a27d865c0ff3b24016fa6aa0afabbf0a73
2021-04-29 16:17:00 -07:00
Sam Shleifer
05b86005bc Fix FSDP optim state loading (#1819)
Summary:
### Problem:
- if we consolidate the optim state dict on rank 0, ranks 1+ save `optimizer.state_dict()`. When they try to load, they call get_shard(last_optim_state), which is wrong since the optim state is already sharded. They should instead find the global consolidated optimizer state dict and load that.

### Possible Solutions:
- if the world size is the same, you could just reuse the local OSD.
- [this PR] ranks 1+ load the optim state from the rank 0 file and call get_shard
- a separate file for optim_state that every rank loads (like 'shared.pt' on `gshard-azure`). This would save some CPU RAM.

### Note:
- I don't think it's possible to pass `--use-sharded-state` from the command line. I think it should be.

### Implementation here
+ if FSDP saves -1 as state['last_optimizer_key'], it means that, on load, rank 0's optim state must be loaded.
+ regression test

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1819

Reviewed By: zhengwy888

Differential Revision: D27910281

Pulled By: sshleifer

fbshipit-source-id: d34987008f77ce7e0cb28b7224dd2aabed38a70c
2021-04-21 15:50:13 -07:00
Michael Anderson
207254bf56 Adding check for filler size (#3495)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/3495

Avoid creating a size-0 "filler" tensor when src_len equals key_padding_mask_size or prev_key_padding_mask_size

Reviewed By: jackm321

Differential Revision: D27897778

fbshipit-source-id: 26fd95852da2cd932717c7abcac3e1fb43deaf77
2021-04-21 09:09:19 -07:00
Sam Shleifer
da0432a3cd MultiGPU test and --log-file workaround (#1793)
Summary:
The initial problem I set out to solve was that it's not easy to add a multigpu test. I solved that problem but it ruined log capturing, both with `self.assertLogs` and `with contextlib.redirect_stdout(StringIO())`.

After some brief digging, I gave up on trying to get those to work, and added support for `--log-file AGI_v0.log` which will write the `progress_bar.log()` statements to `log-file` as well as `stdout`. This functionality is used by the resumption test.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1793

Reviewed By: myleott

Differential Revision: D27671192

Pulled By: sshleifer

fbshipit-source-id: bcba5f9df7a965889a4cd6993f7eeb0f14b770c6
2021-04-21 06:39:00 -07:00
Sam Shleifer
f6f220e917 Delete line that breaks gh ci (#1814)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1814

Reviewed By: myleott

Differential Revision: D27867552

Pulled By: sshleifer

fbshipit-source-id: ed30e02c962b31797e003cb810c085934a53202c
2021-04-19 16:31:11 -07:00
Sujit Verma
fc90910314 Migrating fairseq-py from fvcore to iopath.
Summary: Migrating fairseq-py from fvcore to iopath.

Reviewed By: myleott

Differential Revision: D27109864

fbshipit-source-id: 041177c1bc9b5793b2ce0ecab87692097f3f353b
2021-04-14 21:54:08 -07:00
Guillaume Wenzek
436166a00c fix MultiHeadAttention assert (#1798)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/fairinternal/fairseq-py/issues/1538.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1798

Reviewed By: myleott

Differential Revision: D27710902

Pulled By: gwenzek

fbshipit-source-id: 2efdf645bb30e4cf6653c48371bfca8df6f94eaf
2021-04-14 04:59:59 -07:00
Guillaume Wenzek
c2e8904b60 Obt 2 (#1614)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?
  too many of them actually ^^

## What does this PR do?
This is a rewrite of https://github.com/fairinternal/fairseq-py/issues/1538 following the discussion there, and taking into account the proposed https://github.com/fairinternal/fairseq-py/issues/1560 from Myle.
It brings online backtranslation to fairseq.
It adds a RobertaEncDec to fairseq. RobertaEncDec can be built from a pretrained Roberta model, allowing transfer learning. This is crucial for backtranslation.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1614

Reviewed By: myleott

Differential Revision: D27157296

Pulled By: gwenzek

fbshipit-source-id: 43020bc27743419bd4b138716165bf5764117c21
2021-03-30 09:56:03 -07:00
Sam Shleifer
8c14a8f7df --nval only validate for a few steps (#1735)
Summary:
Afaict, it's easy to set --max-update 4 to run training quickly, but it's hard to control validation without changing the data.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1735

Reviewed By: myleott

Differential Revision: D27246062

Pulled By: sshleifer

fbshipit-source-id: 30a210cbbb45791647a050f49e6f38fbacd0d988
2021-03-22 20:52:51 -07:00
Sam Shleifer
2235f86b40 PlasmaView: don't materialize array in memory (#1645)
Summary:
### Changes:
- `PlasmaArray` saves the underlying data to `self.array`; `PlasmaView` never does that: instead, it fetches the data from `plasma_store` shared memory when it is needed.
- `PlasmaArray` starts a new, ephemeral plasma_store and puts a new array in it when it is pickled. With `--use-plasma-view`, there is one server started before `spawn`, and arrays are only put into it once, in `PlasmaArray.__init__`, to accommodate this.
- the user can now pass `--plasma-path` to explicitly control where the server is started.
- We now make plasma keys based on `(split_path, (block_size, document_sep_len, str(break_mode), len(dataset)))`, so two jobs sharing a plasma server but with different datasets, or the same dataset but different clargs, will not read each other's arrays (sketched below).
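
A sketch of how such a key might be derived (the hash choice and helper name are assumptions; Plasma object IDs are 20 bytes):

```python
import hashlib


def plasma_key(split_path, block_size, document_sep_len, break_mode, length):
    """Derive a 20-byte object ID from the dataset path and its clargs,
    so distinct configurations map to distinct plasma objects."""
    ident = str((split_path, (block_size, document_sep_len, str(break_mode), length)))
    return hashlib.blake2b(ident.encode("utf-8"), digest_size=20).digest()
```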

### Results [pre March 1]
This saves some CPU memory (5-15%), according to both `psutil` and `psrecord`:
Here we run base_cmd (below) with num_workers=0,2,8 on 2 GPUs and collect the logs. `branch` refers to `--use-plasma-view`; `master` uses `PlasmaArray`.

```
+-------------------------+----------------+---------+-------+
| setting                 |   cpu_mem_used |     wps |   ppl |
+=========================+================+=========+=======+
| branch_nw0_gpu2_ddm.log |          12    | 55143.2 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw2_gpu2_ddm.log |          13.67 | 43377.6 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw8_gpu2_ddm.log |          18.36 | 53019.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw0_gpu2_ddm.log |          12.26 | 56733   | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw2_gpu2_ddm.log |          14.58 | 53337.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw8_gpu2_ddm.log |          21.1  | 53217.2 | 429.1 |
+-------------------------+----------------+---------+-------+
```

### Replication

1) get this branch
```bash
git fetch && git checkout share-plasma-server
```

2) Train tiny model and save logs

```bash

base_cmd () {
  fairseq-train --fp16 /private/home/sshleifer/data-bin/stories_mmap \
            --task language_modeling \
            --arch transformer_lm_gpt2_tiny \
            --sample-break-mode complete --tokens-per-sample 512 \
            --optimizer adam --clip-norm 0.0 --lr 0.0005 \
            --batch-size 1 \
            --max-update 200 --max-epoch 1 \
            --log-format simple --log-interval 100 \
            --restore-file x.pt --no-save \
            --skip-invalid-size-inputs-valid-test --disable-validation $@
}

USE_LOCK=1 CUDA_VISIBLE_DEVICES=0,1 base_cmd --num-workers 0 --use-plasma-view | tee branch_nw0_gpu2_ddm.log
```

### TODO:

- [x] test larger dataset
- [x] make it optional, cleanup
- [x] 1 GPU
- [x] unit-tests
- [x] ask hashing Q on stackoverflow https://stackoverflow.com/questions/66354598/deterministic-method-to-hash-np-array-int
- [ ] measure whether `PlasmaArray` disable for small array's logic helps
- [x] test with fb_sweep
- [x] measure 4 GPU savings

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1645

Test Plan: Read github PR description: https://github.com/fairinternal/fairseq-py/pull/1645

Reviewed By: myleott

Differential Revision: D26630365

Pulled By: sshleifer

fbshipit-source-id: b0c4163fbc97a7aefb116de70265fba11f6d7b42
2021-03-12 12:31:12 -08:00
Myle Ott
656d7e5779 Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded) (#1667)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1667

Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded)

This enables full parameter + optimizer state sharding by using
FullyShardedDataParallel (FSDP) from fairscale. The user just needs to provide
`--ddp-backend=fully_sharded` to enable it. Other common options work
out-of-the-box (e.g., `--fp16`, `--memory-efficient-fp16`, `--update-freq`,
etc.). This should be a drop-in replacement for the "c10d" backend.

This yields pretty big speedups for small models and enables training ~13B
parameter models on 8 GPUs and 175B parameter models on 128 GPUs, without model
parallelism.

This also adds a new option `--cpu-offload` that offloads the optimizer state
and FP32 model copy to CPU, which is particularly useful when combined with
`--optimizer=cpu_adam`.

Note: after enabling this, each GPU will save a checkpoint file, since the
optimizer state is sharded. Each checkpoint will contain a single shard of the
optimizer state and the rank 0 checkpoint will contain the full model weights.

Note: a known limitation of the current implementation is that you cannot
resume training on a different world_size. This constraint will be relaxed in
future iterations.

Test Plan: Imported from OSS

Reviewed By: sshleifer

Differential Revision: D26771144

Pulled By: myleott

fbshipit-source-id: 74c2f46f57719e24e2dcfc9d9ee7c2fc0aeedb46
2021-03-04 13:32:46 -08:00
Myle Ott
6d23cc7e7c Move checkpoint state_dict creation into Trainer (#1666)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1666

Context: the checkpoint saving call stack has become a bit convoluted:
```
train.py
+ checkpoint_utils.save_checkpoint
 + trainer.save_checkpoint
  + checkpoint_utils.save_state
   + checkpoint_utils.torch_persistent_save
```

This diff slightly simplifies the checkpoint saving logic by exposing a `state_dict` method inside the Trainer. This simplifies the call stack to:
```
train.py
+ checkpoint_utils.save_checkpoint
 + trainer.save_checkpoint
  + checkpoint_utils.torch_persistent_save
```

This new structure is important for the FullyShardedDataParallel diff (next diff in the stack), since it enables the Trainer to save multiple checkpoints for the different optimizer state shards.

Test Plan:
- unit tests
- trained WMT En-De models; confirmed checkpoints save/load properly, resuming from a checkpoint gives identical results
- `buck test fblearner/flow/projects/langtech/translation:tests` (2 failures are in trunk too): https://www.internalfb.com/intern/testinfra/testconsole/testrun/2533274840914654/

Reviewed By: zhengwy888

Differential Revision: D26771146

Pulled By: myleott

fbshipit-source-id: 10f91979cd42205c1d8abcaa9ab56f63eba31e93
2021-03-04 13:32:44 -08:00
Alex Xiao
fc2840de58 optimize sampling process of multi_corpus_dataset
Summary:
The sampling process in multi_corpus_dataset is very inefficient. It turns out we can significantly optimize it by sampling in batches rather than one by one (see the sketch below). This allows:

1. fast local development and iteration with corpus sampling, as the turnaround time was long before
2. our jobs to start training sooner, enabling earlier signal if, for example, there is a configuration issue
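
An illustration of the batched-sampling idea (not the actual fairseq code):

```python
import numpy as np

rng = np.random.RandomState(0)
probs = np.array([0.7, 0.3, 0.0])  # a 0-probability dataset is never drawn

# Before (slow): one multinomial draw per sample.
slow = [rng.choice(len(probs), p=probs) for _ in range(100_000)]

# After (fast): draw all dataset indices for the batch in one vectorized call.
fast = rng.choice(len(probs), size=100_000, p=probs)
```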

Reviewed By: zhengwy888

Differential Revision: D26187821

fbshipit-source-id: b4f7f6b7c187b3785499308226e2af671a6c354f
2021-03-03 19:31:40 -08:00
Alex Xiao
1fed7a8426 add unit test for multi_corpus_dataset
Reviewed By: vimalmanohar

Differential Revision: D26220694

fbshipit-source-id: ed13f8527a1b203e1a9d004fa8a86e1ad6423d60
2021-03-03 19:31:39 -08:00
Eric Lou
7d2394b56f ioPath async - Fairseq unittests (#1669)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1669

Unit tests for async writes integration done in D26467815 (3100d0b8e5).

Ongoing performance tests: https://fb.quip.com/kjM7Atb1kKbO

Reviewed By: myleott

Differential Revision: D26732660

fbshipit-source-id: faf8cac67b9167af4195358c1a2592804c13562c
2021-03-03 10:50:39 -08:00
Sam Shleifer
4f881a760e TokenBlockDataset np type promotion issue (#1658)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1658

Reviewed By: jxmsML

Differential Revision: D26701840

Pulled By: sshleifer

fbshipit-source-id: 90d631c3cd775ab847366fe7a05136c29d90cd63
2021-02-26 21:00:38 -08:00
Miguel Del-Agua
808b751597 Improve torchscript compatibility of transformer and transformer pg (#3247)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

Fixes https://github.com/pytorch/fairseq/issues/3246
Fixes https://github.com/pytorch/fairseq/issues/3248

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3247

Reviewed By: myleott

Differential Revision: D26513267

Pulled By: lematt1991

fbshipit-source-id: 958de0b3a58a0dd2a56bd6c6d7fb2644a89f6746
2021-02-22 14:22:54 -08:00
Onur Çelebi
7040ce71f3 LASER training code (#1207)
Summary:
Integrating LASER (Language-Agnostic SEntence Representations) training code

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [N/A] Did you make sure to update the docs?
- [Y] Did you write any new necessary tests? => an additional test in `test_iterators.py`

## What does this PR do?

This diff introduces the training code for LASER.
It includes a specific `laser` task in `laser_task.py` which reads a
json configuration file describing the binarized datasets of language
pairs.

`multitask_data_utils.py` defines dataset wrappers and iterators used by
`laser` task.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Yes.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1207

Reviewed By: myleott

Differential Revision: D26454296

Pulled By: Celebio

fbshipit-source-id: c987672aa66abf31b039ee11867b06912d3486e5
2021-02-18 03:10:55 -08:00
Myle Ott
5a170841f2 Make checkpoint wrapper pickleable (#1603)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1603

Test Plan: Imported from OSS

Reviewed By: sshleifer

Differential Revision: D26237760

Pulled By: myleott

fbshipit-source-id: 73c67bdea4b5b16e3159a5d4f0151e514e853357
2021-02-06 08:07:32 -08:00
Guillaume Wenzek
da83e2f356 add fast filter_indices_by_size for RoundRobinZipDatasets (#1555)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue?
    this has been extracted from https://github.com/fairinternal/fairseq-py/issues/1538
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

Implements a fast RoundRobinZipDataset.filter_indices_by_size.
Instead of filtering the dataset sample by sample, the different datasets that are part of the RoundRobinZipDataset are now filtered before being zipped together.
This might generate slightly different datasets (see the sketch below).
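
A sketch of the per-dataset approach (the method names follow the usual FairseqDataset API, but treat this as an assumption):

```python
def filter_indices_by_size_fast(datasets, max_positions):
    """Filter each dataset independently, then zip, instead of checking
    every zipped sample one by one."""
    filtered = {}
    for key, dataset in datasets.items():
        indices = dataset.ordered_indices()
        indices, _ = dataset.filter_indices_by_size(indices, max_positions[key])
        filtered[key] = indices
    # zip to the shortest dataset; this may differ slightly from per-sample
    # filtering, as the summary notes
    shortest = min(len(v) for v in filtered.values())
    return {key: v[:shortest] for key, v in filtered.items()}
```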

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1555

Reviewed By: myleott

Differential Revision: D25924464

Pulled By: gwenzek

fbshipit-source-id: bc64d9dc35eee62da7e3e17fd75a7f9facb60452
2021-02-02 09:26:16 -08:00
Myle Ott
148327d8c1 Add tests for fairseq.distributed.utils.all_gather_list (#1548)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1548

Test Plan: Imported from OSS

Reviewed By: girifb

Differential Revision: D25836857

Pulled By: myleott

fbshipit-source-id: 3fb844fa21640cbda989dafa6592ef3e5c59bfa7
2021-01-28 14:21:10 -08:00
Myle Ott
27b96eb698 Move fairseq.distributed_utils -> fairseq.distributed.utils (#1547)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1547

Test Plan: Imported from OSS

Reviewed By: girifb

Differential Revision: D25836855

Pulled By: myleott

fbshipit-source-id: addd8a7fe8dac43252b100d7331e04e95f555781
2021-01-28 14:21:09 -08:00