Commit Graph

309 Commits

Author SHA1 Message Date
Pierre Andrews
68a81202a3 Indexed Huffman Coded dataset (#2029)
Summary:
## What does this PR do?

Currently, binarized datasets are stored as a binary representation of int tensors. At best, each int is coded as a uint16 on disk.

When coding a fixed-size-vocabulary dataset where we know the frequency of each symbol and where some symbols are more common than others, we can do better. This happens in particular when binarizing a dataset split into subword units, as the most common "tokenizers" like BPE and SPM will choose subwords with high frequencies over subwords with low frequencies.

In practice, if we know the frequency of all symbols (or a good estimate), we can use entropy encoding methods to compress the data. The idea is to assign a compressed representation where frequent symbols get shorter representations than infrequent symbols.

In this PR, we build a Huffman code from a frequency table and use this code to encode a dataset. The PR provides the Huffman coder implementation (using the single-queue approach, as we usually start with a sorted set of symbols) as well as a memory-mapped dataset implementation that stores the data compressed with a Huffman code and can return indexed tensors from it.
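
For illustration, here is a minimal sketch of the queue-based linear-time construction over pre-sorted frequencies (shown with two queues; the single-queue variant exploits the same sortedness; names are illustrative, not the PR's API):

```python
import collections

def build_huffman_code(frequencies):
    """Linear-time Huffman construction over (symbol, freq) pairs sorted by
    ascending frequency: leaves are consumed from one queue, merged nodes are
    appended to a second queue, and no heap is needed."""
    leaves = collections.deque((freq, symbol) for symbol, freq in frequencies)
    internal = collections.deque()

    def pop_smallest():
        if not internal or (leaves and leaves[0][0] <= internal[0][0]):
            return leaves.popleft()
        return internal.popleft()

    while len(leaves) + len(internal) > 1:
        left, right = pop_smallest(), pop_smallest()
        internal.append((left[0] + right[0], (left, right)))

    codes = {}

    def assign(node, prefix):
        _weight, payload = node
        if isinstance(payload, str):  # leaf: payload is the symbol
            codes[payload] = prefix or "0"
        else:  # internal node: payload is a (left, right) pair
            assign(payload[0], prefix + "0")
            assign(payload[1], prefix + "1")

    assign((leaves or internal)[0], "")
    return codes

# frequent symbols get shorter codes
print(build_huffman_code([("z", 1), ("q", 2), ("e", 5), ("t", 7)]))
```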

Over a whole dataset, depending on how many symbols we sample to evaluate the frequency, we can save between 25% and 30% of storage space.

## Follow Ups

Currently the binarizer/preprocess script makes too many assumptions about the dataset writers, so the Huffman dataset writer cannot be used straight out of the box with it. I will make follow-up PRs to provide easy-to-use scripts to build such datasets. But it's as simple as doing:
```
# imports assume the module layout introduced by this PR
from fairseq.data.huffman import (
    HuffmanCodeBuilder,
    HuffmanMMapIndexedDatasetBuilder,
)

code_builder = HuffmanCodeBuilder()
with open(sample_file, 'r', encoding="utf-8") as sample_input:
    for line in sample_input:
        code_builder.add(*line.strip().split(" "))

coder = code_builder.build_code()

with HuffmanMMapIndexedDatasetBuilder('/tmp/testing_huffman', coder) as builder:
    with open(dataset_file, 'r', encoding="utf-8") as dataset_input:
        for line in dataset_input:
            builder.add_item(line.strip().split(' '))
```

A lot of the `HuffmanMMapIndexedDataset` code comes from the normal `MMapIndexedDataset`, and we could probably extract the commonalities into a base class.

The `HuffmanCoder` is also really a special kind of `Dictionary`; again, a common base class could be abstracted out of them.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2029

Reviewed By: dianaml0

Differential Revision: D29557468

Pulled By: Mortimerp9

fbshipit-source-id: a01b6d98f38f937934cadebb3786133e257adefe
2021-08-31 01:12:35 -07:00
Jingfei Du
932a3d4aad fix beam search with prefix tokens (#2227)
Summary:
1. added a test for generating pad tokens during beam search with prefix tokens
2. modified lprobs for the pad token and prefix tokens to avoid generating pad (see the sketch below)
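
A hedged sketch of the lprobs manipulation (the function name and shapes are illustrative, not the actual sequence_generator code):

```python
import torch

def constrain_to_prefix(lprobs: torch.Tensor, prefix_tokens: torch.Tensor) -> torch.Tensor:
    """Keep only the forced prefix token's score per hypothesis; every other
    token (pad included) gets -inf, so the beam cannot emit pad while the
    prefix is being forced."""
    idx = prefix_tokens.unsqueeze(1)                 # (beam, 1)
    out = torch.full_like(lprobs, float("-inf"))
    out.scatter_(1, idx, lprobs.gather(1, idx))
    return out

lprobs = torch.log_softmax(torch.randn(2, 5), dim=-1)
print(constrain_to_prefix(lprobs, torch.tensor([3, 1])))
```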

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2227

Reviewed By: xianxl

Differential Revision: D30649356

Pulled By: jingfeidu

fbshipit-source-id: d94903a912e767391c8fca61f98f65b5cea3b56e
2021-08-30 18:07:13 -07:00
Pierre Andrews
bc1504d4d7 Hierarchical Configs
Summary:
This is a precursor to D29232595

The current behaviour when converting a dataclass to a namespace is that all the fields from all DCs in the field hierarchy are flattened at the top. This is also the legacy behaviour with `add_args`.

This makes it cumbersome to build reusable dataclasses, as we need to make sure that each field has a unique name. In the case of Transformer, for instance, we have Decoder and Encoder configs that share a large part of their fields (embed_dim, layers, etc.). We can build a single dataclass for this that can be reused and extended in other implementations. To still end up with a flat namespace, instead of adding all subfields as-is to the root namespace, we introduce the name of the field as a prefix to the arg in the namespace.

So:
`model.decoder.embed_dim` becomes `decoder_embed_dim` and `model.encoder.embed_dim` becomes `encoder_embed_dim`.
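
A minimal sketch of this flattening idea (the dataclass names and the `flatten` helper are illustrative, not fairseq's actual implementation):

```python
from argparse import Namespace
from dataclasses import dataclass, field, fields, is_dataclass

@dataclass
class EncDecConfig:
    embed_dim: int = 512
    layers: int = 6

@dataclass
class ModelConfig:
    encoder: EncDecConfig = field(default_factory=EncDecConfig)
    decoder: EncDecConfig = field(default_factory=lambda: EncDecConfig(layers=12))

def flatten(cfg, prefix=""):
    """Recursively flatten nested dataclass fields, prefixing each sub-field
    with its parent field name to keep the namespace flat but unambiguous."""
    out = {}
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        if is_dataclass(value):
            out.update(flatten(value, prefix + f.name + "_"))
        else:
            out[prefix + f.name] = value
    return out

ns = Namespace(**flatten(ModelConfig()))
print(ns.encoder_embed_dim, ns.decoder_layers)  # 512 12
```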

Reviewed By: myleott, dianaml0

Differential Revision: D29521386

fbshipit-source-id: f4bef036f0eeb620c6d8709ce97f96ae288848ef
2021-07-16 04:56:12 -07:00
Omry Yadan
dd106d9534 fixes tests/test_train.py to mock checkpoint.save_dir config node (#3675)
Summary:
## What does this PR do?
Some downstream users reported errors when passing a Namespace to load_checkpoint().

A recent change assumed that the passed object is dict-like (a dict or DictConfig) with a `get` function.
This diff changes that, and makes sure the mocked config has `checkpoint.save_dir` to allow the test to run.
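
A hedged sketch of the defensive pattern described here (the helper name is illustrative):

```python
from argparse import Namespace

def get_save_dir(cfg):
    """Read checkpoint.save_dir whether cfg is an argparse Namespace or a
    dict-like object (dict / omegaconf DictConfig)."""
    if isinstance(cfg, Namespace):
        return getattr(cfg, "save_dir", None)
    return cfg.get("save_dir", None)

print(get_save_dir(Namespace(save_dir="/tmp/ckpt")))
print(get_save_dir({"save_dir": "/tmp/ckpt"}))
```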

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3675

Reviewed By: omry

Differential Revision: D29564805

Pulled By: lematt1991

fbshipit-source-id: 89308811da382667f6c5d3152ee2d6480416ee62
2021-07-06 15:07:31 -07:00
Pierre Andrews
53bf2b1293 Extract File Chunking to its own utils (#1955)
Summary:
## What does this PR do?

There are a few places where we do file chunking for multiprocessing a single file. However, the code is partly in Binarizer and partly just duplicated here and there.

This PR extracts the file chunking/reading logic. The multiprocessing logic could probably be extracted too, but I haven't found a good abstraction yet.
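
For illustration, a minimal sketch of the chunking logic (byte ranges aligned to line boundaries; names are illustrative, not necessarily the extracted API):

```python
import os

def find_offsets(filename: str, num_chunks: int):
    """Split a file into num_chunks byte ranges aligned to line boundaries,
    so each worker can read and binarize its chunk independently."""
    with open(filename, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offsets = [0]
        for i in range(1, num_chunks):
            f.seek(size * i // num_chunks)
            f.readline()  # skip forward to the next full line
            offsets.append(f.tell())
        offsets.append(size)
    return offsets  # worker k reads bytes [offsets[k], offsets[k+1])
```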

# Testing

Added tests for this reading logic, and possibly fixed a bug where the last part of a file could get dropped (though it's unclear given the current stopping logic)

Tested by running the preprocessing script as follows:
```
python -m fairseq_cli.preprocess --source-lang de --target-lang en --trainpref ...train.spm.clean.de_en --srcdict ...fairseq.dict --tgtdict .../fairseq.dict --destdir ... --workers 60
```

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1955

Reviewed By: myleott

Differential Revision: D29065473

Pulled By: Mortimerp9

fbshipit-source-id: c60843de8cfd45a63b3dbb8290f57ef3df3bf983
2021-06-28 01:46:32 -07:00
Diana Liskovich
50158da3a7 Migrate DummyMaskedLMTask to FairseqTask (#3593)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3593

Reviewed By: msbaines

Differential Revision: D28992614

Pulled By: dianaml0

fbshipit-source-id: b2dfcab472a65c41536e78600a0e6b3745dc3a08
2021-06-10 09:43:08 -07:00
Mandeep Singh Baines
9497ae3cfb disable raise_if_valid_subsets_unintentionally_ignored check for dummy tasks (#3552)
Summary:
Fixes the following crash:
```python
Traceback (most recent call last):
  File "/private/home/msb/.conda/envs/fairseq-20210102-pt181/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/private/home/msb/code/fairseq/fairseq/distributed/utils.py", line 328, in distributed_main
    main(cfg, **kwargs)
  File "/private/home/msb/code/fairseq/fairseq_cli/train.py", line 117, in main
    data_utils.raise_if_valid_subsets_unintentionally_ignored(cfg)
  File "/private/home/msb/code/fairseq/fairseq/data/data_utils.py", line 584, in raise_if_valid_subsets_unintentionally_ignored
    other_paths = _find_extra_valid_paths(train_cfg.task.data)
AttributeError: 'Namespace' object has no attribute 'data'
```

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3552

Reviewed By: sshleifer

Differential Revision: D28667773

Pulled By: msbaines

fbshipit-source-id: bc9a633184105dbae0cce58756bb1d379b03980a
2021-05-27 12:15:31 -07:00
Gagandeep Singh
237184e522 Add torch.cuda.amp support (#3460)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/3282
Adds support for `torch.cuda.amp`.
AMP can be enabled with `--amp`, as an alternative to the already-present full fp16 support enabled by `--fp16`.
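
For reference, a minimal `torch.cuda.amp` training step (requires a CUDA device) showing what `--amp` enables, as opposed to the full-fp16 path behind `--fp16`:

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

for step in range(3):
    x = torch.randn(4, 8, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # mixed-precision forward
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()        # scale loss to avoid fp16 underflow
    scaler.step(optimizer)               # unscales grads, skips step on inf/nan
    scaler.update()
```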

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3460

Reviewed By: sshleifer, msbaines

Differential Revision: D27932253

Pulled By: myleott

fbshipit-source-id: 21637aefb5e788c59bf4f3c5de6c4a80f7319543
2021-05-26 14:39:10 -07:00
Weiyi Zheng
8df9e3a4a5 support FSDP sharded_state checkpoint loading during inference
Summary:
Using the very useful feature added by QuentinDuval in https://github.com/facebookresearch/fairscale/pull/683/files, we can consolidate sharded states into a full regular state. This allows inference on sharded state almost transparently.

The main complexity comes from trying to be smart about what kind of checkpoint the user wants to load (not sure if this is over-engineering):
1. if the file checkpoint-shard0.pt exists and `--checkpoint-shard-count` is > 1, we load a sharded FSDP checkpoint
2. if checkpoint-shard0.pt exists but --checkpoint-shard-count=1, we load a consolidated FSDP checkpoint
3. if checkpoint-shard0.pt does not exist but --checkpoint-shard-count > 1, we load a model-parallel checkpoint
4. otherwise we are loading a single, plain checkpoint.

In theory we could be even smarter and load shard0.pt to check how many more checkpoints are needed. This is not implemented, though it would save the user from having to specify --checkpoint-shard-count.
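
A hedged sketch of that dispatch logic (the `-shard0.pt` naming and the function are illustrative, not the actual implementation):

```python
import os

def detect_checkpoint_kind(path: str, shard_count: int) -> str:
    """Mirror the four cases above."""
    has_shard0 = os.path.exists(path.replace(".pt", "-shard0.pt"))
    if has_shard0 and shard_count > 1:
        return "sharded FSDP checkpoint"        # case 1
    if has_shard0 and shard_count == 1:
        return "consolidated FSDP checkpoint"   # case 2
    if shard_count > 1:
        return "model parallel checkpoint"      # case 3
    return "single, plain checkpoint"           # case 4
```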

Reviewed By: sshleifer

Differential Revision: D28563441

fbshipit-source-id: dcafcaa7c9eaf5c9ff94f55c16bb3424c98dfa59
2021-05-25 17:45:51 -07:00
Sam Shleifer
2be2f3c7c1 Plasma tests: ask for less disk (#1893)
Summary:
Old logs:
```
/arrow/cpp/src/plasma/store.cc:1274: Allowing the Plasma store to use up to 107.374GB of memory.
```

New logs:
```
... up to 1e-05GB of memory.
```

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1893

Reviewed By: myleott

Differential Revision: D28641488

Pulled By: sshleifer

fbshipit-source-id: 3373526042cdcbf434c61790be62a09f15e6ad06
2021-05-24 09:00:18 -07:00
Weiyi Zheng
425c36eaff support use_sharded_state on command line
Summary:
We wanted to use sharded_state because:
1. it saves memory
2. it supports sharded state loading, which allows an MoE model's weights to live on their respective shards
I just added use_sharded_state as a config option, and added a unit test to make sure it runs fine.

Old revision's comment:
fairseq.FSDP has a flag use_sharded_state, but I had to address a couple of problems before being able to use it:
1. fairscale FSDP (FSDP for short) calls self.state_dict/load_state_dict, which have been overwritten by fairseq.FSDP; this is not a desired behavior
2. the optimizer state shouldn't be sharded again when use_sharded_state is True
3. expose this option on the command line.

Reviewed By: sshleifer

Differential Revision: D28375035

fbshipit-source-id: c2f59a9c62163405033f34ed595ba78528aea850
2021-05-14 18:53:16 -07:00
Sam Shleifer
97969ac5f5 --combine-valid-sets (#1843)
Summary:
- `--combine-valid-sets` causes valid.bin, valid1.bin, ... to be concatenated. All metrics will be reported together.
- `--valid-subsets` works the same. If you pass `--valid-subsets valid1,valid2` you get valid1_loss and valid2_loss logged separately.
- if the user passes `--valid-subset valid` (the default) and we see files named valid1, valid2, we raise an error. The user must pass `--ignore-unused-valid-sets` to override. Previously, valid1 and valid2 were silently ignored.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1843

Reviewed By: myleott

Differential Revision: D28323815

Pulled By: sshleifer

fbshipit-source-id: dfd46076d3f684e36f8dacfadd38fd0038ce6755
2021-05-10 23:43:24 -07:00
Yun Wang
d6855baec8 Simplify CountingIterator
Summary:
Simplify the implementation of `CountingIterator` and add test cases.

The old implementation could fail on such a test case:
```
ref = list(range(10))
itr = CountingIterator(ref)
first_item = next(itr)  # consume one item
remaining_items = list(itr)  # raises exception because of "length mismatch"
```
This happens because `list(itr)` invokes `itr.__iter__`, which re-iterates the underlying list from the start, but `itr.n` has already been incremented by `next(itr)`.

The new implementation is simpler and avoids such an error.
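
A minimal sketch of the simplified behaviour (not the exact fairseq implementation): the wrapper counts consumed items and never restarts the underlying iterator.

```python
class CountingIterator:
    def __init__(self, iterable, start=0):
        self.n = start
        self.total = start + len(iterable)
        self._itr = iter(iterable)

    def __len__(self):
        return self.total

    def __iter__(self):
        return self  # never re-wraps the underlying iterable

    def __next__(self):
        if self.n >= self.total:
            raise StopIteration
        self.n += 1
        return next(self._itr)

    def has_next(self):
        return self.n < self.total

ref = list(range(10))
itr = CountingIterator(ref)
first_item = next(itr)
remaining_items = list(itr)  # no "length mismatch" error anymore
assert [first_item] + remaining_items == ref
```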

Reviewed By: myleott

Differential Revision: D27802505

fbshipit-source-id: c97fd0a27d865c0ff3b24016fa6aa0afabbf0a73
2021-04-29 16:17:00 -07:00
Sam Shleifer
05b86005bc Fix FSDP optim state loading (#1819)
Summary:
### Problem:
- if we consolidate the optim state dict on rank 0, ranks 1+ save `optimizer.state_dict()`. When they try to load, they call get_shard(last_optim_state), which is wrong since the optim state is already sharded. They should find the global consolidated optimizer state dict and load that.

### Possible Solutions:
- if the world size is the same, you could just reuse the local OSD.
- [this PR] ranks 1+ load the optim state from the rank 0 file and call get_shard
- a separate file for optim_state that every rank loads (like 'shared.pt' on `gshard-azure`). This would save some CPU RAM.

### Note:
- I don't think it's possible to pass `--use-sharded-state` from the command line. It should be, I think.

### Implementation here
+ if FSDP saves -1 as state['last_optimizer_key'], it means that, on load, rank 0's optim state must be loaded.
+ regression test

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1819

Reviewed By: zhengwy888

Differential Revision: D27910281

Pulled By: sshleifer

fbshipit-source-id: d34987008f77ce7e0cb28b7224dd2aabed38a70c
2021-04-21 15:50:13 -07:00
Michael Anderson
207254bf56 Adding check for filler size (#3495)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/3495

Avoid creating a size-0 "filler" tensor when src_len is the same as key_padding_mask_size or prev_key_padding_mask_size

Reviewed By: jackm321

Differential Revision: D27897778

fbshipit-source-id: 26fd95852da2cd932717c7abcac3e1fb43deaf77
2021-04-21 09:09:19 -07:00
Sam Shleifer
da0432a3cd MultiGPU test and --log-file workaround (#1793)
Summary:
The initial problem I set out to solve was that it's not easy to add a multi-GPU test. I solved that problem, but it ruined log capturing, both with `self.assertLogs` and with `contextlib.redirect_stdout(StringIO())`.

After some brief digging, I gave up on trying to get those to work, and added support for `--log-file AGI_v0.log`, which will write the `progress_bar.log()` statements to the log file as well as stdout. This functionality is used by the resumption test.
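
A minimal sketch of this tee behaviour with the standard `logging` module (handler setup is illustrative, not fairseq's actual progress_bar wiring):

```python
import logging
import sys

# send training progress both to stdout and to the file passed via --log-file
logger = logging.getLogger("fairseq_cli.train")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.addHandler(logging.FileHandler("AGI_v0.log"))
logger.info("epoch 001 | loss 4.2 | wps 12345")
```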

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1793

Reviewed By: myleott

Differential Revision: D27671192

Pulled By: sshleifer

fbshipit-source-id: bcba5f9df7a965889a4cd6993f7eeb0f14b770c6
2021-04-21 06:39:00 -07:00
Sam Shleifer
f6f220e917 Delete line that breaks gh ci (#1814)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1814

Reviewed By: myleott

Differential Revision: D27867552

Pulled By: sshleifer

fbshipit-source-id: ed30e02c962b31797e003cb810c085934a53202c
2021-04-19 16:31:11 -07:00
Sujit Verma
fc90910314 Migrating fairseq-py from fvcore to iopath.
Summary: Migrating fairseq-py from fvcore to iopath.

Reviewed By: myleott

Differential Revision: D27109864

fbshipit-source-id: 041177c1bc9b5793b2ce0ecab87692097f3f353b
2021-04-14 21:54:08 -07:00
Guillaume Wenzek
436166a00c fix MultiHeadAttention assert (#1798)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/fairinternal/fairseq-py/issues/1538.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1798

Reviewed By: myleott

Differential Revision: D27710902

Pulled By: gwenzek

fbshipit-source-id: 2efdf645bb30e4cf6653c48371bfca8df6f94eaf
2021-04-14 04:59:59 -07:00
Guillaume Wenzek
c2e8904b60 Obt 2 (#1614)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?
  too many of them actually ^^

## What does this PR do?
This is a rewrite of https://github.com/fairinternal/fairseq-py/issues/1538 following the discussion there, and taking into account https://github.com/fairinternal/fairseq-py/issues/1560 proposed by Myle.
It brings online backtranslation to fairseq.
It adds a RobertaEncDec model to fairseq. RobertaEncDec can be built from a pretrained RoBERTa model, allowing transfer learning. This is crucial for backtranslation.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1614

Reviewed By: myleott

Differential Revision: D27157296

Pulled By: gwenzek

fbshipit-source-id: 43020bc27743419bd4b138716165bf5764117c21
2021-03-30 09:56:03 -07:00
Sam Shleifer
8c14a8f7df --nval only validate for a few steps (#1735)
Summary:
AFAICT, it's easy to set --max-update 4 to run training quickly, but it's hard to control the amount of validation without changing the data.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1735

Reviewed By: myleott

Differential Revision: D27246062

Pulled By: sshleifer

fbshipit-source-id: 30a210cbbb45791647a050f49e6f38fbacd0d988
2021-03-22 20:52:51 -07:00
Sam Shleifer
2235f86b40 PlasmaView: don't materialize array in memory (#1645)
Summary:
### Changes:
- `PlasmaArray` saves the underlying data to `self.array`; `PlasmaView` never does that, instead fetching the data from `plasma_store` shared memory when it is needed.
- `PlasmaArray` starts a new, ephemeral plasma_store and puts a new array into it when it is pickled. To accommodate this, with `--use-plasma-view` there is one server started before `spawn`, and arrays are only put into it once, in `PlasmaArray.__init__`.
- the user can now pass `--plasma-path` to explicitly control where the server is started.
- We now make plasma keys based on `(split_path, (block_size, document_sep_len, str(break_mode), len(dataset)))`, so two jobs sharing a plasma server but with different datasets, or the same dataset but different clargs, won't read each other's arrays (see the hashing sketch below).
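
A hedged sketch of deriving a deterministic plasma object ID from that tuple (`hashlib`-based; the actual hashing scheme in the PR may differ):

```python
import hashlib

def plasma_key(split_path, block_size, document_sep_len, break_mode, dataset_len):
    """Deterministically derive a 20-byte plasma object ID from the dataset
    identity, so identical (path, clargs) map to the same shared array."""
    payload = repr((split_path, block_size, document_sep_len, break_mode, dataset_len))
    return hashlib.sha1(payload.encode("utf-8")).digest()  # plasma IDs are 20 bytes

print(plasma_key("/data/train", 512, 1, "complete", 100000).hex())
```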

### Results [pre March 1]
This saves some CPU memory (5-15%), according to both `psutil` and `psrecord`:
here we run base_cmd (below) with num_workers=0,2,8 and 2 GPUs and collect the logs. `branch` refers to `--use-plasma-view`; `master` uses `PlasmaArray`.

```
+-------------------------+----------------+---------+-------+
| setting                 |   cpu_mem_used |     wps |   ppl |
+=========================+================+=========+=======+
| branch_nw0_gpu2_ddm.log |          12    | 55143.2 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw2_gpu2_ddm.log |          13.67 | 43377.6 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw8_gpu2_ddm.log |          18.36 | 53019.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw0_gpu2_ddm.log |          12.26 | 56733   | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw2_gpu2_ddm.log |          14.58 | 53337.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw8_gpu2_ddm.log |          21.1  | 53217.2 | 429.1 |
+-------------------------+----------------+---------+-------+
```

### Replication

1) get this branch
```bash
git fetch && git checkout share-plasma-server
```

2) Train tiny model and save logs

```bash

base_cmd () {
  fairseq-train --fp16 /private/home/sshleifer/data-bin/stories_mmap \
            --task language_modeling \
            --arch transformer_lm_gpt2_tiny \
            --sample-break-mode complete --tokens-per-sample 512 \
            --optimizer adam --clip-norm 0.0 --lr 0.0005 \
            --batch-size 1 \
            --max-update 200 --max-epoch 1 \
            --log-format simple --log-interval 100 \
            --restore-file x.pt --no-save \
            --skip-invalid-size-inputs-valid-test --disable-validation $@
}

USE_LOCK=1 CUDA_VISIBLE_DEVICES=0,1 base_cmd --num-workers 0 --use-plasma-view | tee branch_nw0_gpu2_ddm.log
```

### TODO:

- [x] test larger dataset
- [x] make it optional, cleanup
- [x] 1 GPU
- [x] unit-tests
- [x] ask hashing Q on stackoverflow https://stackoverflow.com/questions/66354598/deterministic-method-to-hash-np-array-int
- [ ] measure whether `PlasmaArray` disable for small array's logic helps
- [x] test with fb_sweep
- [x] measure 4 GPU savings

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1645

Test Plan: Read github PR description: https://github.com/fairinternal/fairseq-py/pull/1645

Reviewed By: myleott

Differential Revision: D26630365

Pulled By: sshleifer

fbshipit-source-id: b0c4163fbc97a7aefb116de70265fba11f6d7b42
2021-03-12 12:31:12 -08:00
Myle Ott
656d7e5779 Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded) (#1667)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1667

Add support for FullyShardedDataParallel (--ddp-backend=fully_sharded)

This enables fully parameter + optimizer state sharding by using
FullyShardedDataParallel (FSDP) from fairscale. The user just needs to provide
`--ddp-backend=fully_sharded` to enable. Other common options work
out-of-the-box (e.g., `--fp16`, `--memory-efficient-fp16`, `--update-freq`,
etc.). This should be a drop-in replacement for the "c10d" backend.

This yields pretty big speedups for small models and enables training ~13B
parameter models on 8 GPUs and 175B parameter models on 128 GPUs, without model
parallelism.

This also adds a new option `--cpu-offload` that offloads the optimizer state
and FP32 model copy to CPU, which is particularly useful when combined with
`--optimizer=cpu_adam`.

Note: after enabling this, each GPU will save a checkpoint file, since the
optimizer state is sharded. Each checkpoint will contain a single shard of the
optimizer state and the rank 0 checkpoint will contain the full model weights.

Note: a known limitation of the current implementation is that you cannot
resume training on a different world_size. This constraint will be relaxed in
future iterations.

Test Plan: Imported from OSS

Reviewed By: sshleifer

Differential Revision: D26771144

Pulled By: myleott

fbshipit-source-id: 74c2f46f57719e24e2dcfc9d9ee7c2fc0aeedb46
2021-03-04 13:32:46 -08:00
Myle Ott
6d23cc7e7c Move checkpoint state_dict creation into Trainer (#1666)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1666

Context: the checkpoint saving call stack has become a bit convoluted:
```
train.py
+ checkpoint_utils.save_checkpoint
 + trainer.save_checkpoint
  + checkpoint_utils.save_state
   + checkpoint_utils.torch_persistent_save
```

This diff slightly simplifies the checkpoint saving logic by exposing a `state_dict` method inside the Trainer. This simplifies the call stack to:
```
train.py
+ checkpoint_utils.save_checkpoint
 + trainer.save_checkpoint
  + checkpoint_utils.torch_persistent_save
```

This new structure is important for the FullyShardedDataParallel diff (next diff in the stack), since it enables the Trainer to save multiple checkpoints for the different optimizer state shards.

Test Plan:
- unit tests
- trained WMT En-De models; confirmed checkpoints save/load properly, resuming from a checkpoint gives identical results
- `buck test fblearner/flow/projects/langtech/translation:tests` (2 failures are in trunk too): https://www.internalfb.com/intern/testinfra/testconsole/testrun/2533274840914654/

Reviewed By: zhengwy888

Differential Revision: D26771146

Pulled By: myleott

fbshipit-source-id: 10f91979cd42205c1d8abcaa9ab56f63eba31e93
2021-03-04 13:32:44 -08:00
Alex Xiao
fc2840de58 optimize sampling process of multi_corpus_dataset
Summary:
The sampling process in multi_corpus_dataset is very inefficient. It turns out we can significantly optimize it by sampling in batches rather than one by one. This allows:

1. fast local development and iteration with corpus sampling, as the turnaround time was long before
2. less time before our jobs can start training, enabling earlier signal if, for example, there is a configuration issue (see the sketch below)
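
A minimal sketch of the batched-draw idea with NumPy (names are illustrative):

```python
import numpy as np

def sample_corpora(num_samples, weights, seed=0):
    """Draw corpus assignments for a whole batch of indices in one vectorized
    call, replacing num_samples Python-level one-by-one draws."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(weights), size=num_samples, p=weights)

print(sample_corpora(8, [0.7, 0.3]))
```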

Reviewed By: zhengwy888

Differential Revision: D26187821

fbshipit-source-id: b4f7f6b7c187b3785499308226e2af671a6c354f
2021-03-03 19:31:40 -08:00
Alex Xiao
1fed7a8426 add unit test for multi_corpus_dataset
Reviewed By: vimalmanohar

Differential Revision: D26220694

fbshipit-source-id: ed13f8527a1b203e1a9d004fa8a86e1ad6423d60
2021-03-03 19:31:39 -08:00
Eric Lou
7d2394b56f ioPath async - Fairseq unittests (#1669)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1669

Unit tests for async writes integration done in D26467815 (3100d0b8e5).

Ongoing performance tests: https://fb.quip.com/kjM7Atb1kKbO

Reviewed By: myleott

Differential Revision: D26732660

fbshipit-source-id: faf8cac67b9167af4195358c1a2592804c13562c
2021-03-03 10:50:39 -08:00
Sam Shleifer
4f881a760e TokenBlockDataset np type promotion issue (#1658)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1658

Reviewed By: jxmsML

Differential Revision: D26701840

Pulled By: sshleifer

fbshipit-source-id: 90d631c3cd775ab847366fe7a05136c29d90cd63
2021-02-26 21:00:38 -08:00
Miguel Del-Agua
808b751597 Improve torchscript compatibility of transfomer and transformer pg (#3247)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

Fixes https://github.com/pytorch/fairseq/issues/3246
Fixes https://github.com/pytorch/fairseq/issues/3248

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3247

Reviewed By: myleott

Differential Revision: D26513267

Pulled By: lematt1991

fbshipit-source-id: 958de0b3a58a0dd2a56bd6c6d7fb2644a89f6746
2021-02-22 14:22:54 -08:00
Onur Çelebi
7040ce71f3 LASER training code (#1207)
Summary:
Integrating LASER (Language-Agnostic SEntence Representations) training code

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ N/A] Did you make sure to update the docs?
- [ Y] Did you write any new necessary tests?  => an additional test in `test_iterators.py`

## What does this PR do?

This diff introduces the training code for LASER.
It includes a specific `laser` task in `laser_task.py` which reads a
json configuration file describing the binarized datasets of language
pairs.

`multitask_data_utils.py` defines dataset wrappers and iterators used by
`laser` task.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Yes. 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1207

Reviewed By: myleott

Differential Revision: D26454296

Pulled By: Celebio

fbshipit-source-id: c987672aa66abf31b039ee11867b06912d3486e5
2021-02-18 03:10:55 -08:00
Myle Ott
5a170841f2 Make checkpoint wrapper pickleable (#1603)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1603

Test Plan: Imported from OSS

Reviewed By: sshleifer

Differential Revision: D26237760

Pulled By: myleott

fbshipit-source-id: 73c67bdea4b5b16e3159a5d4f0151e514e853357
2021-02-06 08:07:32 -08:00
Guillaume Wenzek
da83e2f356 add fast filter_indices_by_size for RoundRobinZipDatasets (#1555)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue?
    this has been extracted from https://github.com/fairinternal/fairseq-py/issues/1538
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

Implements a fast RoundRobinZipDataset.filter_indices_by_size.
Instead of filtering the dataset sample by sample, the different datasets that are part of the RoundRobinZipDataset are now filtered before being zipped together.
This might generate slightly different datasets.
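
A hedged sketch of the filter-then-zip idea, operating on per-dataset size lists (the names and the truncate-to-shortest alignment rule are assumptions about the round-robin behaviour, not the PR's actual code):

```python
def filter_then_zip(sizes_by_dataset, max_size_by_dataset):
    """Filter each constituent dataset by length first, then align the kept
    index lists for round-robin zipping (truncating to the shortest)."""
    kept = {
        name: [i for i, size in enumerate(sizes) if size <= max_size_by_dataset[name]]
        for name, sizes in sizes_by_dataset.items()
    }
    n = min(len(indices) for indices in kept.values())
    return {name: indices[:n] for name, indices in kept.items()}

print(filter_then_zip(
    {"en-fr": [3, 9, 2, 5], "en-de": [4, 2, 8, 1]},
    {"en-fr": 5, "en-de": 4},
))
```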

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1555

Reviewed By: myleott

Differential Revision: D25924464

Pulled By: gwenzek

fbshipit-source-id: bc64d9dc35eee62da7e3e17fd75a7f9facb60452
2021-02-02 09:26:16 -08:00
Myle Ott
148327d8c1 Add tests for fairseq.distributed.utils.all_gather_list (#1548)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1548

Test Plan: Imported from OSS

Reviewed By: girifb

Differential Revision: D25836857

Pulled By: myleott

fbshipit-source-id: 3fb844fa21640cbda989dafa6592ef3e5c59bfa7
2021-01-28 14:21:10 -08:00
Myle Ott
27b96eb698 Move fairseq.distributed_utils -> fairseq.distributed.utils (#1547)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1547

Test Plan: Imported from OSS

Reviewed By: girifb

Differential Revision: D25836855

Pulled By: myleott

fbshipit-source-id: addd8a7fe8dac43252b100d7331e04e95f555781
2021-01-28 14:21:09 -08:00
Myle Ott
d68a3530dd Refactor distributed code under fairseq.distributed (#1546)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1546

Test Plan: Imported from OSS

Reviewed By: girifb

Differential Revision: D25836853

Pulled By: myleott

fbshipit-source-id: c5076615d49774633ecfaf0aa68b68e8b2331bd9
2021-01-28 14:21:09 -08:00
Sam Shleifer
1e6323e934 Offload inputs to CPU (V2)
Reviewed By: myleott

Differential Revision: D26035523

fbshipit-source-id: 7dc08a38c10d1f26a871106f143f92fd11f6073c
2021-01-25 09:29:01 -08:00
Myle Ott
cfbf0dddbc Small changes to make tests more reliable (#1572)
Summary:
After this, `python setup.py test` should be more reliable (including when multiple GPUs are present)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1572

Reviewed By: alexeib

Differential Revision: D25984113

Pulled By: myleott

fbshipit-source-id: 7fef27ae90c079c07f592ed9fb350ccf8b56d23d
2021-01-21 07:33:54 -08:00
alexeib
15867e1284 migrate translation task (#1569)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1569

Test Plan:
Imported from OSS

tests + ran

```
python fairseq_cli/train.py \
    ~/data/iwslt14.de-en \
    --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric
```

Reviewed By: myleott

Differential Revision: D25967217

Pulled By: alexeib

fbshipit-source-id: 808f3cb0939fa13e1e05f39bfa02a7fb0b152940
2021-01-20 18:01:18 -08:00
Myle Ott
338aa57966 Add test for activation checkpointing (#1563)
Summary:
Forgot to merge this with the original code

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1563

Reviewed By: sshleifer

Differential Revision: D25948393

Pulled By: myleott

fbshipit-source-id: b083001015e97f7e21cfa02d4126eba79cc34bfa
2021-01-20 07:39:52 -08:00
Yuqing Tang
8c7793b9d9 Enable translation_multi_simple_epoch to load only two dictionaries for source and target only
Summary: In the default settings, the translation_multi_simple_epoch task loads a dictionary per language, which can result in a huge amount of memory consumption if all languages share the same dictionary.

Reviewed By: shruti-bh

Differential Revision: D25265741

fbshipit-source-id: c5bc3664efd800b120f015b2525c9fba2b1be3c5
2021-01-12 21:38:27 -08:00
Sam Shleifer
bff7f85206 fastseq ngram blocking (#1509)
Summary:
Command:
```bash
fairseq-generate \
    ~myleott/data/data-bin/wmt16_en_de_bpe32k/ \
    --path /checkpoint/myleott/s3/models/wmt16.en-de.joined-dict.transformer/model.pt \
    --beam 4 --remove-bpe --lenpen 0.6 --batch-size 256 --no-repeat-ngram-size 3 \
    --gen-subset test --fp16
```

master/devfair: 297.8s (10.08 sentences/s, 286.47 tokens/s)
branch/devfair: 31.9s (94.27 sentences/s, 2678.66 tokens/s)

master/v100: 227.4s (13.21 sentences/s, 375.24 tokens/s)
branch/v100: 13.1s (228.68 sentences/s, 6497.99 tokens/s)
(all BLEU4=29.17)
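
For reference, a pure-Python sketch of the `--no-repeat-ngram-size` semantics being accelerated here (the fastseq speedup comes from vectorizing this check, not from changing it):

```python
def banned_tokens(hyp, n):
    """Tokens that would complete an n-gram already present in hyp."""
    if len(hyp) < n - 1:
        return set()
    prefix = tuple(hyp[-(n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(hyp) - n + 1):
        if tuple(hyp[i:i + n - 1]) == prefix:
            banned.add(hyp[i + n - 1])
    return banned

print(banned_tokens([1, 2, 3, 1, 2], n=3))  # {3}: "1 2 3" already occurred
```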

### ToDo:
- tests

### Future Work
- test other fastseq proposed improvements.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1509

Reviewed By: myleott

Differential Revision: D25587857

Pulled By: sshleifer

fbshipit-source-id: d42af5c50e3f94c90e878f92da5ce5ef3fc8b988
2020-12-30 12:58:09 -08:00
Ruslan Mavlyutov
4e3895be1c batch_by_size refactoring: 100x speedup and optimization of memory footprint
Summary: Refactoring batch_by_size. You may be required to rebuild Cython components with: `python setup.py build_ext --inplace`.

Reviewed By: myleott

Differential Revision: D25705733

fbshipit-source-id: a263505276e3d820a9e44b93354ee5ace70d7fc5
2020-12-28 21:05:51 -08:00
Myle Ott
b8ea8a9b72 Fix --context-window and add test (#1526)
Summary:
This was broken in the recent refactoring: 36c63c826d

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1526

Reviewed By: sshleifer

Differential Revision: D25697706

Pulled By: myleott

fbshipit-source-id: 4d9a735c0071a0d71a4ae46e1c3fc3aba572117b
2020-12-23 18:35:54 -08:00
Myle Ott
36c63c826d Refactor eval_lm to support library usage (#1513)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1513

Test Plan: Imported from OSS

Reviewed By: alexeib

Differential Revision: D25570467

Pulled By: myleott

fbshipit-source-id: 062f748e287797f4f01c605e0b544ef3e698851f
2020-12-18 11:45:08 -08:00
Myle Ott
edc321e767 Support atomic saves for checkpoints (#1520)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1520

Test Plan: Imported from OSS

Reviewed By: stephenroller

Differential Revision: D25632782

Pulled By: myleott

fbshipit-source-id: bdbe2aed6254d0b023b33f8027dfbd939f1fd271
2020-12-18 07:40:49 -08:00
Sam Shleifer
c8a0659be5 Stronger --checkpoint-activations test (#1505)
Summary:
- captures and inspects train and valid logs using unittest's `assert_logs_equal`
- asserts that `--checkpoint-activations` does not change `train_loss` or `valid_loss`.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1505

Reviewed By: myleott

Differential Revision: D25544991

Pulled By: sshleifer

fbshipit-source-id: 2762095ab4e7c819a803b3556f5774db8c6b6f39
2020-12-16 19:07:49 -08:00
Myle Ott
72a25a4e52 Rename optimization.min_lr -> optimization.stop_min_lr (#1486)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1486

Test Plan: Imported from OSS

Reviewed By: alexeib

Differential Revision: D25342181

Pulled By: myleott

fbshipit-source-id: 7d1cfb26334fff26d688648724ab073e5fb956f5
2020-12-05 07:37:51 -08:00
Myle Ott
6f47704d4d Add distributed tests (#1481)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1481

Test Plan: Imported from OSS

Reviewed By: theweiho

Differential Revision: D25313776

Pulled By: myleott

fbshipit-source-id: 755bf4b77b2a7a3aee56e2344246ff2087a3af77
2020-12-04 10:58:52 -08:00
Myle Ott
9cf0bd96d6 Add/fix tests (#1468)
Summary:
- add test for loading ensemble checkpoints (and confirmed it fails if I revert: 265791b727)
- add test for LayerDrop (and fix it)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1468

Reviewed By: alexeib

Differential Revision: D25223272

Pulled By: myleott

fbshipit-source-id: 3f06f753605af251567c70d2961f5506ea423499
2020-11-30 14:20:36 -08:00
Myle Ott
d464af2feb Fix NAT code (#1454)
Summary:
D23752010 (add65adcc5) broke some GPU-only tests for NAT.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1454

Test Plan: Imported from OSS

Reviewed By: jmp84

Differential Revision: D25108461

Pulled By: myleott

fbshipit-source-id: f32b890221578c421944d6f9a49f06ef1dc075c6
2020-11-20 12:42:33 -08:00
Myle Ott
fa113ff1de Add test for activation checkpointing (#1453)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1453

Test Plan: Imported from OSS

Reviewed By: sshleifer

Differential Revision: D25108463

Pulled By: myleott

fbshipit-source-id: 3cebce9be7fe503401eabba3f483c26847e7a3c0
2020-11-20 12:42:33 -08:00
Myle Ott
94f59bb67b Remove unused train_masked_language_model helper (#1452)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1452

Test Plan: Imported from OSS

Reviewed By: lematt1991

Differential Revision: D25108462

Pulled By: myleott

fbshipit-source-id: 3c17a9937a4c3edb69f64130dfd866c5f42a4aaf
2020-11-20 12:42:29 -08:00
Yuqing Tang
d7dd683b3b Add option to skip virtual epoch
Summary:
The current translation_multi_simple_epoch adds an extra layer of virtual-epoch abstraction in order to load part of the data and start training earlier. However, for smaller datasets this is not necessary.

This diff makes it skip the virtual epoch layer if --virtual-epoch-size is not specified.

Reviewed By: pipibjc

Differential Revision: D24962835

fbshipit-source-id: 7de4293a6996ed075a1ed0c1ff2de94c8ae3df14
2020-11-16 14:39:57 -08:00
Myle Ott
0a848245f3 Add Truncated BPTT example + TransformerXL (#1410)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1410

Test Plan:
- reproduced Transformer-XL results (see README)
- added integration test

Reviewed By: jingfeidu

Differential Revision: D24928966

Pulled By: myleott

fbshipit-source-id: 86376c17ab24d37e72e7c097b6dcec71b1a087a7
2020-11-15 19:47:42 -08:00
alexeib
b58f4f017e end to end hydra configs (#1393)
Summary:
this adds a hydra_train binary that uses hydra configs/command line overrides instead of argparse

use case 1: built in configs + overrides from command line

```
python fairseq_cli/hydra_train.py distributed_training.distributed_world_size=1 dataset.batch_size=2 task.data=/private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/ model=transformer_lm/transformer_lm_gpt task=language_modeling optimization.max_update=5000
```

use case 2: use an external config that is used instead of bundled configs (but dataclass defaults still work)

```
python fairseq_cli/hydra_train.py --config-path ~/fairseq-py-dev/lm --config-name wiki103
```

the config file contains this:

```
# @package _group_

model:
  _name: transformer_lm
distributed_training:
  distributed_world_size: 1
dataset:
  batch_size: 2
task:
  _name: language_modeling
  data: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/
  add_bos_token: false
  max_target_positions: 1024
optimization:
  max_update: 50000
  lr: [ 0.25 ]
criterion: cross_entropy
optimizer: adam
lr_scheduler:
  _name: cosine
```

use case 3: use an external config directory that provides additional configs for e.g. models

```
python fairseq_cli/hydra_train.py distributed_training.distributed_world_size=1 dataset.batch_size=2 task.data=/private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/ model=transformer_lm/2_layers task=language_modeling optimization.max_update=5000 --config-dir ~/fairseq-py-dev/lm/hydra
```

where ~/fairseq-py-dev/lm/hydra has the following structure:

```
model/
└── transformer_lm/
    └── 2_layers.yaml
```

and inside 2_layers.yaml is a copy of transformer_lm_gpt.yaml but with decoder_layers set to 2

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1393

Reviewed By: myleott

Differential Revision: D24722252

Pulled By: alexeib

fbshipit-source-id: 758ea431fa099cd7c0e4daf41eff680df1d3b841
2020-11-04 18:20:12 -08:00
Yuqing Tang
de859692ff Enable translation_multi_simple_epoch to have different source and target dictionaries
Summary: In the past, we always used a shared dictionary for multilingual experiments. This diff re-enables different dictionaries for source and target languages by changing the assertion criteria, and reverts to using the specific languages to return source_dict and target_dict.

Reviewed By: chtran

Differential Revision: D24637682

fbshipit-source-id: a982e4f1e48395cc5bf10dc03b98fbe970062f8d
2020-10-30 18:25:25 -07:00
Myle Ott
a4356b1da2 Simplify --user-dir and require user-dir module name to be globally unique (#2815)
Summary:
This PR reverts recent changes that attempted to make `--user-dir` work with non-unique module names. But that new approach introduced other issues (e.g., poor compatibility with multiprocessing and Windows), so let's revert to the previous simpler implementation.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2815

Reviewed By: alexeib

Differential Revision: D24611571

Pulled By: myleott

fbshipit-source-id: cecfe28395585ca0401f844f10bd0d49d014c4d8
2020-10-29 17:08:20 -07:00
Myle Ott
1bc83c703a Misc fixes (#2786)
Summary:
- Rename type -> key in fairseq/tasks/sentence_prediction.py (fixes https://github.com/pytorch/fairseq/issues/2746)
- Update preprocessing docs (fixes https://github.com/pytorch/fairseq/issues/2565)
- Turn off logging in test_fp16_optimizer.TestGradientScaling
- Documentation updates
- Remove some unused code
- Fix noisychannel example (fixes https://github.com/pytorch/fairseq/issues/2213)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2786

Reviewed By: shruti-bh

Differential Revision: D24515146

Pulled By: myleott

fbshipit-source-id: 86b0f5516c57610fdca801c60e58158ef052fc3a
2020-10-27 11:26:07 -07:00
alexeib
3b27ed7996 Enable Hydra configs in fairseq (#1343) (#1510)
Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1510

this is the main pr that switches on hydra functionality in fairseq

we migrate "args" object into omegaconf "DictConfig" at all legacy entry points

in addition this migrates various components from secondary registries (like bpe encoders and tokenizers) to make the migration smoother

i am going through code that references migrated fairseq components and changing it to inherit from "Legacy*" components instead. hopefully tests will catch most of this

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1343

Reviewed By: myleott

Differential Revision: D23973928

Pulled By: alexeib

fbshipit-source-id: dd9554981fff51ea75c1ff343874d1d6e61793c9
2020-10-20 00:32:26 -07:00
Myle Ott
9b8b464070 Package config and examples with fairseq (#1356)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1356

Reviewed By: alexeib

Differential Revision: D24385688

Pulled By: myleott

fbshipit-source-id: 72c4a702d93d2854a6409d42913d7413207cb61e
2020-10-19 09:24:04 -07:00
Myle Ott
a48f235636 Apply black+isort (#1357)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1357

Reviewed By: alexeib

Differential Revision: D24377772

fbshipit-source-id: 51581af041d42d62166b33a35a1a4228b1a76f0c
2020-10-18 18:14:51 -07:00
Myle Ott
2d900bf308 Fix tests (#1352)
Summary:
We need to keep `--num-workers=0` during tests

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1352

Reviewed By: alexeib

Differential Revision: D24375411

Pulled By: myleott

fbshipit-source-id: 9975ed5405f3b19b4dd0877ca15ee3081b185942
2020-10-16 17:36:13 -07:00
Armen Aghajanyan
f2fa07106c RXF OS Implementation (#2455)
Summary:
## What does this PR do?
Implements R3F and R4F coming from Facebook Research: https://arxiv.org/abs/2008.03156

This code was used to generate all the results from the paper excluding probing results.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2455

Reviewed By: myleott

Differential Revision: D23444863

Pulled By: AkshatSh

fbshipit-source-id: b724a6d6cc9cebfdb4bd219828afbb5679f2259b
2020-10-16 14:32:12 -07:00
Xian Li
573c2f4b60 Opensource code for Deep Transformer with Latent Depth (#2703)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Opensource code for Deep Transformer with Latent Depth (https://arxiv.org/pdf/2009.13102.pdf).

New features and design choices made:

- New feature: allow the non-residual block to be weighted by a sample z (generated per batch) instead of `x = residual + x`.
- Design choice: move `x = residual + x` in transformer_layer.py into a function which the subclass (with latent depth) can overwrite to `x = residual + z*x` (see the sketch after this list).

- New feature: allow TransformerEncoder or TransformerDecoder to have additional logits parameters which will generate the samples z.
- Design choice: added subclass LatentTransformerEncoder and LatentTransformerDecoder, which has additional attributes for the logits parameters, and instantiate the corresponding LatentTransformerEncoderLayer and LatentTransformerDecoderLayer.

- New feature: allow multilingual_translation task to train with latent depth (results in the paper).
- Design choice:
  - added additional arguments in the multilingual_translation task.
  - added option for multilingual_transformer to use LatentTransformerEncoder and LatentTransformerDecoder besides standard TransformerEncoder.
  - added option in multilingual_translation task's `train_step` to generate the samples z and compute the KL (and sparsity) loss per batch.
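
A minimal sketch of the weighted residual connection described above:

```python
import torch

def residual_connection(x: torch.Tensor, residual: torch.Tensor, z=None):
    # standard residual when z is None; latent-depth layers pass a per-batch
    # sample z that gates the non-residual branch
    return residual + x if z is None else residual + z * x

h = torch.randn(2, 4)
print(residual_connection(h, h, z=torch.tensor(0.5)))
```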

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2703

Reviewed By: myleott

Differential Revision: D24155059

Pulled By: xianxl

fbshipit-source-id: f3e41639429f9664ec5565839709aa857a643668
2020-10-15 09:26:05 -07:00
Changhan Wang
1d1c145387 speech-to-text OSS
Summary:
Imported from https://github.com/fairinternal/fairseq-py/pull/1284. Updated according to PR comments.

Main changes:
* New task: `fairseq.tasks.speech_to_text`
  * Multilingual support: multiple train sub-splits, temperature-based sampling, language ID tokens
* New dataset: `fairseq.data.audio.speech_to_text_dataset`
* Added accuracy metrics and BOS prefix removal to label smoothed cross entropy
* New models: Transformer (`fairseq.models.speech_to_text.s2t_transformer`) and BLSTM (`fairseq.models.speech_to_text.berard`)
* Extended scorers:
  * Added a base scorer class: `fairseq.scorers.BaseScorer` (the parent class for all scorers except the BLEU scorer in CPP)
  * Added an evaluation tokenizer: `fairseq.scorers.eval_tokenizer` which leverages sacreBLEU's built-in tokenizers and allows character-level tokenization as well as punctuation removal (for WER scoring).
  * Added chrF scorer: `fairseq.scorers.chrf`
* Online Mel-filter bank speech feature extraction (via CPP-based pyKaldi or Python-based TorchAudio): `fairseq.data.audio.audio_utils`
* Online speech feature transforms: `fairseq.data.audio.feature_transforms.*`
* Fixed the subsampled sequence lengths in VGGTransformer (`examples.speech_recognition.models.vggtransformer`)
* Examples under `examples/speech_to_text`:
  * LibriSpeech (ASR): better results than VGGTransformer with smaller Transformer-based models
  * MuST-C (ST): comparable to [SOTA results](https://arxiv.org/pdf/2004.10234.pdf) but with less tricks

Reviewed By: jmp84

Differential Revision: D24065273

fbshipit-source-id: 5f842ca9c826f92d4af660705611885fe440a9ab
2020-10-14 12:30:05 -07:00
alexeib
e3c4282551 remove max_sentences from args, use batch_size instead (#1333)
Summary:
Now that we are moving to using dataclasses to define the fairseq configuration, having aliases for options is no longer practical. This PR removes the "max-sentences" argument while keeping its alias "batch-size", which is more appropriate.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1333

Reviewed By: shruti-bh

Differential Revision: D24121305

Pulled By: alexeib

fbshipit-source-id: 34343cea54c8f2c8b059c38ef9f29b66e76df9fb
2020-10-05 19:09:01 -07:00
Myle Ott
7c292af66f Fix hub (#2687)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2687

Reviewed By: alexeib

Differential Revision: D24095130

Pulled By: myleott

fbshipit-source-id: 7d371bccb550ec68b2b9b39dfa4c0718356508d6
2020-10-02 19:02:01 -07:00
Seppo Enarvi
c049749c7a Fix full-context alignment with transformer_align model (#2675)
Summary:
Fixes https://github.com/pytorch/fairseq/issues/2673.

# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2673.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2675

Reviewed By: ngoyal2707

Differential Revision: D24001793

Pulled By: myleott

fbshipit-source-id: 6b4e9270e5f5a31ba1b65ae2ae717019108af913
2020-10-01 12:37:16 -07:00
Myle Ott
caea771afa Fix tests (#2670)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2670

Reviewed By: ngoyal2707

Differential Revision: D23982491

Pulled By: myleott

fbshipit-source-id: 629b791d6c05dd67b63dcc2da0313c6799f777f8
2020-09-29 07:27:56 -07:00
Myle Ott
a524832d1d Publish Linformer to public fairseq
Summary: Initial open source release for Linformer

Reviewed By: madian9

Differential Revision: D22771263

fbshipit-source-id: bf08c64c5ecb899db9da00b79d09f6308347c915
2020-09-28 15:32:20 -07:00
Seppo Enarvi
3b7d85c91f Transformer with integrated pointer-generator network (#2529)
Summary:
This pull request implements a variant of the Transformer model that uses an attention distribution for pointing to input words. The attention distribution over the input words is interpolated with the normal output distribution over the vocabulary words, as in [See et al. (2017)](https://arxiv.org/abs/1704.04368). This allows the model to generate words that appear in the input, even if they don't appear in the vocabulary, helping especially with small vocabularies.

The mechanism for copying out-of-vocabulary words from the input has been implemented differently from See et al. In their [implementation](https://github.com/abisee/pointer-generator), they convey the word identities through the model in order to be able to produce out-of-vocabulary words. We wanted to minimize changes to the Fairseq code base and took a different approach, which I'll describe below. The entire implementation is contained in one file (plus there's one new test).

Copying out-of-vocabulary words is possible by pre-processing the input and post-processing the output. The user may add special words to the end of the vocabulary that can be used in place of `<unk>` tokens to identify different input positions (e.g. `<unk-0>`, `<unk-1>`, `<unk-2>`, ...). The number of these special words is given to the model with the `--source-position-markers` argument—the model simply maps all of these to the same word embedding as `<unk>`. With simple post-processing, the user may retrieve the word at position N in the original text and use it in place of `<unk-N>`.
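
A hedged sketch of that pre/post-processing round trip (the helper names are illustrative, not this PR's API):

```python
import re

def preprocess(src_tokens, vocab):
    """Replace out-of-vocabulary source words with positional markers."""
    return [tok if tok in vocab else f"<unk-{i}>" for i, tok in enumerate(src_tokens)]

def postprocess(hyp_tokens, src_tokens):
    """Map generated <unk-N> markers back to the original source words."""
    out = []
    for tok in hyp_tokens:
        m = re.fullmatch(r"<unk-(\d+)>", tok)
        out.append(src_tokens[int(m.group(1))] if m else tok)
    return out

src = "the quick xylophage runs".split()
vocab = {"the", "quick", "runs", "jumps"}
print(preprocess(src, vocab))                          # ['the', 'quick', '<unk-2>', 'runs']
print(postprocess(["the", "<unk-2>", "jumps"], src))   # ['the', 'xylophage', 'jumps']
```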

I didn't find a good place to document this usage of the model, so let me know if you think I should improve the documentation somewhere.

This feature has not yet been discussed via a GitHub issue, but I'll open a new issue for discussion.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2529

Reviewed By: ngoyal2707

Differential Revision: D23398430

Pulled By: myleott

fbshipit-source-id: f2f26c8ce8802ae6cf95515637660348ff3fc457
2020-09-25 08:29:10 -07:00
Mu Tian
42c5dcbd18 hydra fairseq 3 - inherit from legacy for fairseq classes
Summary: hydra fairseq 3 - inherit from legacy for fairseq classes

Reviewed By: alexeib

Differential Revision: D23375457

fbshipit-source-id: ef9d19f2d02f2326eea44a70f1f6e1668b420840
2020-09-09 17:02:13 -07:00
Alex Xiao
e171c8d86a Account for checkpoint updates when calling take on CountingIterator
Summary:
Recently some of our runs are getting:

"RuntimeError: Mismatch between actual and expected iterable length. Please report this to the fairseq developers."

f214567466

We never ran into this before because this is a new check by fairseq to be more strict with iterators.

Fix is to:

1. Account for the offset (i.e. loading from a checkpoint mid-epoch) when propagating `take`. This fixes the issue of `next` returning too many elements, which is what causes the error.

2. Update the underlying iterator when calling `take` on `BufferedIterator`, as well as the length of the `BufferedIterator`. Although this doesn't cause the error, it is necessary to maintain consistency.

Reviewed By: myleott

Differential Revision: D23443012

fbshipit-source-id: 73c26db8392e5508a61acfda7ca40a24df89fabb
2020-09-04 14:26:53 -07:00
Yuqing Tang
0cde6b4e50 Added shared dictionary check for translation_multi_simple_epoch task.
Summary: translation_multi_simple_epoch task only supports shared dictionary across all languages, so add the check in the task setup.

Reviewed By: pipibjc

Differential Revision: D23288388

fbshipit-source-id: 4236a096bcb75429b486ef8a9244e3ef0d5095f0
2020-08-28 10:11:30 -07:00
Alex Xiao
49940c8d25 fix mismatch length of counting iterator when truncated
Summary:
PySpeech integration training tests have recently been getting stuck at the end of an epoch.

Digging into it, it looks like this is because the end-of-epoch check relies on this (https://fburl.com/diffusion/xt09z6n9):

```
def end_of_epoch(self) -> bool:
    """Returns whether the most recent epoch iterator has been exhausted"""
    return not self._cur_epoch_itr.has_next()
```

which is implemented like this in `CountingIterator`:

```
def has_next(self):
    """Whether the iterator has been exhausted."""
    return self.n < len(self)
```

It seems like D23172408 (110f9f0cc7) modified CountingIterator such that `len(self) > len(iter(self))` when `take()` is used. This mismatch causes `has_next` to return `True` for some PySpeech processes even when all elements of `iter(self)` have been consumed, causing training to get stuck.

My proposed fix is to remove the `self.early_stop` variable and instead directly modify `self.total` and `self.iterable`, ensuring `len(self) == len(iter(self))`.
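
A minimal sketch of the fixed `take`, assuming the surrounding `CountingIterator` fields (`self.n` = elements consumed so far, `self.total` backs `len(self)`, `self.iterable` = the underlying iterator):

```python
import itertools

def take(self, n):
    """Method sketch: truncate the iterator to n total elements."""
    # truncate what len(self) reports...
    self.total = min(self.total, n)
    # ...and truncate the underlying iterator to match, accounting for
    # elements already consumed, so the two lengths cannot diverge
    remaining = max(self.total - self.n, 0)
    self.iterable = itertools.islice(self.iterable, remaining)
```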

Reviewed By: myleott

Differential Revision: D23250734

fbshipit-source-id: efb5a38216783bded67f501135b2f68b9246b9dd
2020-08-20 20:08:38 -07:00
Matt Post
bd1b35d9b7 Added constrained decoding (#1536) (#2402)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

This PR implements constrained decoding ([Hokamp & Liu, 2017](https://www.aclweb.org/anthology/P17-1141/); [Post & Vilar, 2018](https://www.aclweb.org/anthology/N18-1119/)) with vectorization for batching ([Hu et al., 2019](https://www.aclweb.org/anthology/N19-1090/)). In addition, it adds *ordered constraints*, where the constraints are generated on the target side in order, with zero or more unconstrained tokens in between. This variant allows for optimizations that increase speed and BLEU scores (when testing with random scraps from the references).

### Usage and quick start

It works with `fairseq-interactive` via a new command-line option: `fairseq-interactive --constraints [ordered,unordered]`, defaulting to `ordered` if nothing is provided. When active, it will split lines from STDIN on `\t`, treating the first field as the source sentence and each remaining tab-separated field as a constraint. For example (after downloading the [Fairseq WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md)):

```bash
# normalize.py and tok.py are small helper scripts:
#   normalize.py: https://gist.github.com/mjpost/4c54446b7030d7c64b57461d27090650
#   tok.py:       https://gist.github.com/mjpost/ed7456f6a987c533102fc121678ed302
echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\tinfluence" \
  | ./normalize.py \
  | ./tok.py \
  | PYTHONPATH=$HOME/code/fairseq-constraints fairseq-interactive $modeldir \
      --bpe fastbpe \
      --bpe-codes $modeldir/bpecodes \
      --constraints \
      --constraints-both \
      -s de -t en \
      --path $modeldir/model1.pt \
      --max-tokens 1000 \
      --beam 5
```

Adding the `--constraints-both` option causes it to batch-decode the input sentence both with and without the constraints. When run with the Fairseq WMT19 German--English model, the following results are produced (run here on a CPU, so don't be alarmed by the times):

```text
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
C-0     hard
C-0     influence
H-0     -1.5333266258239746     Mach@@ ine trans@@ lation is hard to influence .
D-0     -1.5333266258239746     Machine translation is hard to influence .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.8031 -0.1701 -11.7727 -0.1815 -0.1511
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
H-0     -0.3731671869754791     Mach@@ ine trans@@ lation is difficult to control .
D-0     -0.3731671869754791     Machine translation is difficult to control .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.1430 -0.1665 -0.8482 -0.1678 -0.1514
2020-07-31 12:17:55 | INFO | fairseq_cli.interactive | Total time: 12.803 seconds; translation time: 3.688
```

Note the new tags present in the output:

* `C-#` records active constraints (after applying preprocessing) for a sentence
* `W-#` reports the sentence-level translation time (a useful unrelated feature I hope you'll accept)

Some unit tests are written (`fairseq/test_constraints.py`) but not yet integrated; advice on where to place them is welcome. I also have not run this through lint; if someone can tell me the command to run, I'd appreciate it.

### Implementation notes

This is largely self-contained, implemented in a new `LexicallyConstrainedBeamSearch` class in `search.py`. It does require a few minimal hooks from `_generate()` in `sequence_generator.py`, to ensure that constraints are updated at each timestep. (Edit: most changes in that file are documentation clarifications, corrections, and updates). Unconstrained sentences that are intermingled with constrained ones will not incur any time penalty, so long as they do not occur in the same batch.
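
To illustrate the ordered variant, here is a minimal sketch of per-hypothesis constraint state (`OrderedConstraintState` is a made-up name here; the real `LexicallyConstrainedBeamSearch` is considerably more involved, tracking tries of multiple constraints with vectorized batch updates):

```python
class OrderedConstraintState:
    """Sketch: a cursor over the constraint tokens that advances only
    when the hypothesis emits the next required token."""

    def __init__(self, tokens):
        self.tokens = tokens  # constraint token ids, in required order
        self.pos = 0          # number of constraint tokens satisfied

    def advance(self, token):
        if self.pos < len(self.tokens) and token == self.tokens[self.pos]:
            self.pos += 1

    def num_unmet(self):
        # used when ranking candidates / allocating beam slots
        return len(self.tokens) - self.pos
```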

Addresses https://github.com/pytorch/fairseq/issues/1536.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2402

Reviewed By: alexeib

Differential Revision: D23188945

Pulled By: myleott

fbshipit-source-id: 9f5ed855f7a1dcf535b091c0ccf98b07fb9cbdd6
2020-08-20 11:59:53 -07:00
Myle Ott
adbd89fd4b Misc fixes (#2492)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2492

Reviewed By: ngoyal2707

Differential Revision: D23177728

Pulled By: myleott

fbshipit-source-id: 32424f61cab57f759f87e16e8d5144d3eed5ae36
2020-08-20 06:42:10 -07:00
Jun Ru Anderson
68c87f0abf optimize mixed precision (#1248)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Brings the multiply_factor optimization used in memory-efficient fp16 training to mixed precision training. The methods multiply_grads and clip_grad_norm do not touch each gradient; instead they update a "multiply factor" that is then factored in when unscaling gradients.
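
A minimal sketch of the idea, with hypothetical class and method names (the real change lives in fairseq's mixed precision optimizer wrapper):

```python
import torch

class MultiplyFactorOptimizer:
    """Sketch: scale gradients via one scalar. multiply_grads and
    clip_grad_norm do O(1) work; the factor is applied in a single
    pass when the gradients are finally unscaled."""

    def __init__(self, params, loss_scale):
        self.params = list(params)       # assumed: grads already computed
        self.loss_scale = loss_scale
        self._multiply_factor = 1.0 / loss_scale  # folds unscaling in

    def multiply_grads(self, c):
        self._multiply_factor *= c       # no per-tensor work

    def clip_grad_norm(self, max_norm):
        # norm of the *unscaled* grads, without materializing them
        grad_norm = self._multiply_factor * torch.norm(
            torch.stack([torch.norm(p.grad.detach()) for p in self.params])
        )
        if max_norm > 0 and grad_norm > max_norm:
            self._multiply_factor *= max_norm / float(grad_norm)
        return grad_norm

    def unscale_grads(self):
        # the only pass that touches every gradient tensor
        for p in self.params:
            p.grad.detach().mul_(self._multiply_factor)
        self._multiply_factor = 1.0 / self.loss_scale
```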

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1248

Reviewed By: myleott

Differential Revision: D23201396

Pulled By: andersonic

fbshipit-source-id: 6c6f64542893e0ecac72e132464bb334dcb9874d
2020-08-19 16:04:40 -07:00
Myle Ott
9831634946 Misc fixes (#2448)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2448

Reviewed By: ngoyal2707

Differential Revision: D23011193

Pulled By: myleott

fbshipit-source-id: 1a29481707108e4465aca78ec1581fb79f05efba
2020-08-14 10:24:51 -07:00
Yuqing Tang
0bb7bc3777 Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: add finetuning options
Summary:
A first version of XLNMT multilingual project code release: Multilingual Training with multiple bitext

- Minor changes to
    - fairseq/checkpoint_utils.py to add a finetuning option instead of using restore_file, which would restore from the original model when the job is requeued.

Reviewed By: myleott

Differential Revision: D22483494

fbshipit-source-id: 733300fd6a4d185e561c793ea668047c96f616c6
2020-08-06 10:20:39 -07:00
Rakesh Chada
b040dae714 Fixes checkpoint_path while loading a model-parallel checkpoint (#2365)
Summary:
Fixes https://github.com/pytorch/fairseq/issues/2351

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2365

Reviewed By: pipibjc

Differential Revision: D22727384

Pulled By: myleott

fbshipit-source-id: e2ff703181a6b8f10df9b4ee7aa3f9e128c04b4e
2020-08-04 08:25:50 -07:00
Stanislau Hlebik
698e3b91ff remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:21:51 -07:00
Stanislau Hlebik
7ea5e3b341 remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:21:45 -07:00
Yuqing Tang
e52d071ee8 Multilingual v1: Multilingual Training with multiple bitext and monolingual datasets: new multilingual task
Summary:
A first version of XLNMT multilingual project code release: Multilingual Training with multiple bitext

- A new task to glue all things together: fairseq/tasks/translation_multi_simple_epoch.py
- Minor changes to
    - fairseq/data/iterators.py to allow dynamic batch sampler
    - fairseq/checkpoint_utils.py to add a finetuning option instead of using restore_file, which would restore from the original model when the job is requeued.

Reviewed By: pipibjc

Differential Revision: D22483484

fbshipit-source-id: 283b67e538508f330b0968609b7dae64d26bea05
2020-07-16 09:34:29 -07:00
m_fomicheva
2887663811 Implemented applying dropout at inference time (#2308)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2308

Implemented Monte Carlo dropout. Added README to reproduce the results from our paper
that applies this idea for unsupervised quality estimation of NMT (joint work of Facebook AI and the University of Sheffield):

Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia. Unsupervised Quality Estimation for Neural Machine Translation. Accepted to TACL

Retaining dropout at test time is not possible in the current code base. The statement
```
if not self.retain_dropout:
  model.eval()
```
in `SequenceGenerator` does not have any effect, since the model's `training` attribute is already set to False by the method `make_generate_fast_`, which is applied before `SequenceGenerator` is initialized in `generate.py`. `make_generate_fast_` throws an exception when trying to set `training` to True after it has been applied. Also, if I am not mistaken, `self.training=True` can have other side effects, so setting it to True only for the purpose of retaining dropout at test time might be confusing. I propose an alternative implementation where `retain_dropout` is an attribute of the FairseqModel class.
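
For reference, a common way to get this effect with plain PyTorch modules is to flip only the dropout modules back to train mode; this is a sketch of the general Monte Carlo dropout technique, not the `retain_dropout` attribute this PR adds:

```python
import torch.nn as nn

def enable_mc_dropout(model: nn.Module) -> None:
    """Eval mode everywhere, but keep Dropout modules sampling masks."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()

# usage: N stochastic forward passes give an uncertainty estimate
# outputs = torch.stack([model(x) for _ in range(30)])
# mean, var = outputs.mean(0), outputs.var(0)
```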

# Before submitting

- [N] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [Y] Did you make sure to update the docs?
- [Y] Did you write any new necessary tests?

## What does this PR do?
New feature.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2151

Reviewed By: ngoyal2707

Differential Revision: D22048889

Pulled By: myleott

fbshipit-source-id: 0d0d4784a7314fc7a45b76341fd3b8232b3e2cf0
2020-07-08 13:06:13 -07:00
Mike Ruberry
320bf8cf96 Updates full to no longer use deprecated integer fill_value type inference
Summary:
In PyTorch 1.5, using an integer fill_value with torch.full and not setting the dtype or out kwarg was deprecated, and will soon throw a runtime error. In the future, torch.full will infer its dtype from the fill_value, and these calls would then produce integer, not float, tensors. This update maintains the current behavior.
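
A minimal before/after sketch of the deprecation:

```python
import torch

# Deprecated in PyTorch 1.5: integer fill_value with no dtype/out kwarg.
# Today this returns a float tensor; once dtype inference from fill_value
# lands, the same call would return an integer tensor instead.
# t = torch.full((2, 3), 7)

# Keeping the current behavior: make the float dtype explicit.
t = torch.full((2, 3), 7, dtype=torch.float)
assert t.dtype == torch.float32
```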

Created from Diffusion's 'Open in Editor' feature.

Reviewed By: myleott

Differential Revision: D22161456

fbshipit-source-id: b5d687e4de83dba6e76cae6e61b5106bf5b320db
2020-06-22 11:56:58 -07:00
Joshua Meier
152a3fe143 Support residual connections in LSTM models (#1103)
Summary:
Adds support for residual connections in LSTM models.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1103

Reviewed By: myleott

Differential Revision: D21639942

Pulled By: joshim5

fbshipit-source-id: a02ddfe080a847fd91a9c6a5074cb6dc782f7727
2020-06-05 12:15:10 -07:00
Myle Ott
5abc774eea Re-enable test_transformer_fp16 GPU test
Reviewed By: theweiho

Differential Revision: D21890628

fbshipit-source-id: 4088884dd2a82a831f1c129e675eb233c469242a
2020-06-05 06:06:20 -07:00
Wei Ho
ea092c2aa6 Split out fairseq GPU tests & add new deeplearning_fairseq_gpu contbuild using remote execution
Reviewed By: myleott

Differential Revision: D21472387

fbshipit-source-id: efde278baf6a05e8a81a9630b44c7e7e7c7fe7fc
2020-06-03 18:53:35 -07:00
Marco Gaido
5453e4355b Avoid NaN in speech_recognition with input having only 1 spec… (#1864)
Summary:
…trogram

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1863.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1864

Reviewed By: yqwangustc

Differential Revision: D21663642

Pulled By: myleott

fbshipit-source-id: f411c5c01c7505375bec6d47554e85fb70877e9c
2020-05-27 07:50:34 -07:00
Myle Ott
803c0a6d11 Update iterators to support counting, rename CountingIterator.count -> n and add tests (#1166)
Summary:
A few changes here:
- update GroupedIterator and ShardedIterator to support counting. This will be useful on TPUs, since the TPU dataloading threads may advance faster than we can process them.
- add tests for the above
- in CountingIterator, rename `count` -> `n`. This is needed because `count` is overloaded for iterables (e.g., `list` defines a different `count` method, which is actually a search function).
- in CountingIterator, rename `override_len` -> `total` to be more consistent with other iterators (e.g., tqdm). This functionality was unused previously (it's only needed for TPUs), so the rename is easy. A small usage sketch follows this list.
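
A small usage sketch of the renamed API (semantics as described above):

```python
from fairseq.data.iterators import CountingIterator

itr = CountingIterator(range(10))
next(itr); next(itr)
assert itr.n == 2        # renamed from `count`
assert len(itr) == 10    # backed by `total` (renamed from `override_len`)
assert itr.has_next()    # 8 elements still to come
```
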
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1166

Reviewed By: ngoyal2707

Differential Revision: D21373525

Pulled By: myleott

fbshipit-source-id: 102f3d50ed1a5163a7d1216ca5a179564a05dfe4
2020-05-14 13:57:04 -07:00
Myle Ott
9a718e2985 Various fixes (#2127)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2127

Reviewed By: ngoyal2707

Differential Revision: D21550962

Pulled By: myleott

fbshipit-source-id: ddbe3f287f170862378e0702fc378a4fe400793a
2020-05-14 10:23:34 -07:00
Myle Ott
6209d7d6b2 Fix eval_lm (fixes #2083) and a few other small things (#2100)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2100

Reviewed By: ngoyal2707

Differential Revision: D21456309

Pulled By: myleott

fbshipit-source-id: 291711589fca9f158e0fdbf01194da3e66fbd0aa
2020-05-11 12:43:14 -07:00
Marco Gaido
11345a7608 Pass all net_inputs in SequenceGenerator (#2090)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2022.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2090

Reviewed By: cndn

Differential Revision: D21385984

Pulled By: myleott

fbshipit-source-id: 1428e02e625b8625df71a83c05dcf933c3f899df
2020-05-10 06:13:06 -07:00
Myle Ott
89d18af127 Cleanup transformer (#1160)
Summary:
This also fixes https://github.com/pytorch/fairseq/issues/2079
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1160

Reviewed By: ngoyal2707

Differential Revision: D21338290

Pulled By: myleott

fbshipit-source-id: 266bda0921a42b218127f83ab7aa8cc8282582cd
2020-05-04 07:16:30 -07:00
Myle Ott
7a6519f84f Bugfixes (#1159)
Summary:
Several bugfixes to get tests passing on OSS master
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1159

Reviewed By: ngoyal2707

Differential Revision: D21331993

Pulled By: myleott

fbshipit-source-id: 327ae19f6797f92b8c6083a49d5f5edb0872223e
2020-05-01 04:09:37 -07:00
Halil Akin
c748571417 Add a test for fp16 to fairseq
Reviewed By: myleott

Differential Revision: D21315518

fbshipit-source-id: df17efeec6fb2b576371b124d78e9294cef3e74c
2020-04-30 10:35:15 -07:00
Ning Dong
b1af3e33d5 Modify gated unit tests to fix Fairseq OSS (#2059)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2059

test_ensemble_sequence_generator and test_export_ensemble_model are green on fbcode master, but the PyTorch 1.5 release cut happened before the TorchScript fix, so this updates the version gate to 1.6.
Remove the quantization test from fairseq, as FBGEMM is bound on the OSS side. The test will be added back in fbtranslate, but this lands first to fix the OSS-side failures.

Reviewed By: myleott

Differential Revision: D21231873

fbshipit-source-id: 8a2ad7dbed118ca8e3f4c351c399a82fd9740445
2020-04-24 13:29:50 -07:00
Myle Ott
d502958b4d Fix LSTM LM unit tests (#2021)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2021

Reviewed By: cndn

Differential Revision: D21092383

Pulled By: myleott

fbshipit-source-id: c6074fe14cc977b3674d77c1c1bc8fb726108934
2020-04-21 10:48:37 -07:00
Angela Fan
1c8ab79ca5 quant noise code, readme, start of adding quantization (#1896)
Summary:
FUNCTIONALITY:
This diff provides two core pieces of functionality
- Adds training with quantization noise from "Training with Quantization Noise for Extreme Model Compression", controlled by the `quant_noise` and `quant_noise_block_size` parameters. Added in embeddings, attention, and FFN for BERT and Transformer LM training (see the sketch after this list).
- Adds quantization with product quantization based on code from "And the bit goes down: Revisiting the quantization of neural networks" (Stock et al., 2019). This is applied to a trained fairseq model to quantize it after training.
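
To make the first item concrete, here is a minimal sketch of quantization noise in its simplest, mask-and-rescale form (`quant_noise_weight` is a made-up name; it assumes a 2D weight whose input dimension is divisible by `block_size`):

```python
import torch

def quant_noise_weight(weight, p, block_size):
    """Sketch: during training, zero out random contiguous blocks of
    weights, simulating the noise quantizing those blocks would add,
    and rescale the survivors by 1 / (1 - p)."""
    out_features, in_features = weight.shape
    assert in_features % block_size == 0
    # one Bernoulli draw per block of `block_size` consecutive weights
    block_mask = torch.bernoulli(
        torch.full((out_features, in_features // block_size), p)
    )
    mask = block_mask.repeat_interleave(block_size, dim=1).bool()
    return weight.masked_fill(mask, 0.0) / (1 - p)
```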

TODO:
-> Pierre, look at quantization code
-> int4 and int8 quantization will be added soon.

EVALUATED TEST CASES:

0. Training of LM and BERT models starts from scratch with no errors -> yes

1. Retrain LM from scratch with code, no quantization, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_noise

2. Reload previously trained LM from scratch, not trained with quant noise, reproduces Wikitext-103 LM results -> yes

3. Train LM from scratch with code, not trained with quant noise, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_baseline

4. Train BERT model from scratch with code, no quantization, training curve looks the same as before -> yes

5. Check wps during training and wps during inference, no large change from before -> yes

6. Check structured dropout isn't being applied at eval time -> yes

7. Works in combination with LayerDrop -> yes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1896

Reviewed By: myleott

Differential Revision: D20609420

Pulled By: huihuifan

fbshipit-source-id: 94468dd811c4caaaef46a9fab2b8d381f9d2b955
2020-04-21 09:28:56 -07:00