Commit Graph

309 Commits

Author SHA1 Message Date
Nayan Singhal
b5f41f828b Add Unit test cases for BMUF
Summary:
This unit test guards the BMUF code.

Change:
1. distributed_init assumes we are always using a CUDA device, which is not the case if you are using the "gloo" backend on a CPU-only machine.
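A minimal sketch of the backend-aware device setup this fix implies (function and argument names here are illustrative, not the actual fairseq code):

    import torch
    import torch.distributed as dist

    def distributed_init_sketch(backend, init_method, world_size, rank):
        # "gloo" runs on CPU-only machines; only bind a CUDA device for "nccl"
        dist.init_process_group(backend=backend, init_method=init_method,
                                world_size=world_size, rank=rank)
        if backend == "nccl" and torch.cuda.is_available():
            torch.cuda.set_device(rank % torch.cuda.device_count())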

Reviewed By: jay-mahadeokar

Differential Revision: D17821391

fbshipit-source-id: 28e1bb39f7a4889b1dc6bd636b7c499e55bfc69a
2019-10-15 09:59:36 -07:00
Sarthak Garg
1c66792948 Implementation of the paper "Jointly Learning to Align and Translate with Transformer Models" (#877)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877

This PR implements guided alignment training described in  "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)".

In summary, it allows for training selected heads of the Transformer model with external alignments computed by statistical alignment toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regression in translation performance due to guided alignment training.
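One way to picture the guided-alignment objective is a cross-entropy between the supervised heads' attention and the external alignments; a sketch under that assumption (not the exact fairseq criterion):

    import torch

    def guided_alignment_loss(attn, align_targets, eps=1e-8):
        # attn: (batch, tgt_len, src_len) attention probabilities from the
        # supervised head(s); align_targets: same shape, rows normalized over
        # the externally aligned source positions
        return -(align_targets * torch.log(attn + eps)).sum(dim=-1).mean()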
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095

Differential Revision: D17170337

Pulled By: myleott

fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
2019-09-30 06:57:32 -07:00
Stephan Peitz
4ac2c5f2cc Implementation of the WeCNLP abstract "Cross+Self-Attention for Transformer Models" (#1097)
Summary:
This PR implements a new attention module which combines cross-attention (encoder-decoder attention) and the decoder self-attention. This work was accepted as an abstract at WeCNLP 2019 (https://www.wecnlp.ai/wecnlp-2019).

Cross+Self-Attention reduces the number of parameters and increases inference speed without any degradation in translation quality.
More details can be found in the attached [abstract](https://github.com/pytorch/fairseq/files/3561282/paper.pdf)
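A rough sketch of the combined attention, assuming keys/values are simply the concatenation of encoder outputs and previous decoder states (an illustration, not the module's actual code):

    import torch
    import torch.nn.functional as F

    def cross_self_attention(q, dec_k, dec_v, enc_k, enc_v, causal_mask):
        # attend jointly over encoder states and prior decoder states
        k = torch.cat([enc_k, dec_k], dim=1)  # (batch, src_len + tgt_len, dim)
        v = torch.cat([enc_v, dec_v], dim=1)
        scores = q @ k.transpose(1, 2) / q.size(-1) ** 0.5
        # the causal mask applies only to the decoder part;
        # the encoder part stays fully visible
        scores[:, :, enc_k.size(1):] += causal_mask
        return F.softmax(scores, dim=-1) @ v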
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1097

Differential Revision: D17653168

Pulled By: myleott

fbshipit-source-id: deb834c2c78a229d7418ffbfea20ba3ce252991c
2019-09-29 05:09:42 -07:00
Changhan Wang
86857a58bf Levenshtein Transformer paper code
Summary:
Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
* Added Levenshtein Transformer model, task and criterion classes
* Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model classes for baselines
* Added an option for prepending BOS to the dictionary class and translation task class

Reviewed By: myleott

Differential Revision: D17297372

fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
2019-09-27 13:58:45 -07:00
Jerry Ma
a8a85c2676 Add dataset class for weighted sampling with replacement. (#861)
Summary:
As discussed with Naman earlier today. Weighted sampling with
replacement can be done on a per-epoch basis using `set_epoch()`
functionality, which generates the samples as a function of random seed
and epoch.

Additionally, `FairseqTask` needs to set the starting epoch for the
dataset at the very beginning of iterator construction.

Not yet implemented is the per-epoch iterator construction, which
is necessary to actually regenerate the batches for each epoch.
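A minimal sketch of epoch-seeded weighted sampling with replacement (class and argument names are hypothetical):

    import numpy as np
    from torch.utils.data import Dataset

    class EpochWeightedDataset(Dataset):
        # resamples a weighted subset each epoch; samples are a pure
        # function of (seed, epoch), so runs stay reproducible
        def __init__(self, dataset, weights, num_samples, seed=0):
            self.dataset = dataset
            self.weights = np.asarray(weights, dtype=float)
            self.num_samples, self.seed = num_samples, seed
            self.set_epoch(0)

        def set_epoch(self, epoch):
            rng = np.random.RandomState(self.seed + epoch)
            p = self.weights / self.weights.sum()
            self.indices = rng.choice(len(self.dataset), self.num_samples, p=p)

        def __len__(self):
            return self.num_samples

        def __getitem__(self, i):
            return self.dataset[self.indices[i]]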
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861

Differential Revision: D17460687

Pulled By: jma127

fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658
2019-09-19 10:36:00 -07:00
Myle Ott
6ce55e4b01 Small fixes
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/835

Differential Revision: D16904038

Pulled By: myleott

fbshipit-source-id: 2c9d0b913f8d688297ac80fcabd905bd1397f66a
2019-08-19 15:08:25 -07:00
Myle Ott
7c89e13f64 Fix tests
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/822

Differential Revision: D16800078

Pulled By: myleott

fbshipit-source-id: b86e08e01f2fe13c64b77f1d23a5f6800f252bf7
2019-08-13 20:36:00 -07:00
Myle Ott
d015d23a1f Add fairseq-validate
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/765

Differential Revision: D16763357

Pulled By: myleott

fbshipit-source-id: 758b03158e486ee82786e2d5bf4e46073b50c503
2019-08-13 13:07:04 -07:00
Dmytro Okhonko
72f9364cc6 Asr initial push (#810)
Summary:
Initial code for the speech recognition task.
Right now only one ASR model is added - https://arxiv.org/abs/1904.11660

Unit testing:
python -m unittest discover tests

Also ran model training with this code and obtained
5.0 test_clean | 13.4 test_other (WER)
on LibriSpeech with pytorch/audio features
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/810

Reviewed By: cpuhrsch

Differential Revision: D16706659

Pulled By: okhonko

fbshipit-source-id: 89a5f9883e50bc0e548234287aa0ea73f7402514
2019-08-08 02:46:12 -07:00
Myle Ott
4abadbdf77 Fix sampling with beam>1
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/792

Differential Revision: D16591987

Pulled By: myleott

fbshipit-source-id: d27c490ae75f80ded19226b8384f4776485dd694
2019-08-01 07:34:06 -07:00
Myle Ott
e75cff5f2c Relicense fairseq under MIT license (#786)
Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786

Differential Revision: D16560654

Pulled By: myleott

fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034
2019-07-30 07:48:23 -07:00
Sara Hanson
a03fe6faf3 Implement sparse transformer fixed attention pattern (#804)
Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746

Pull Request resolved: https://github.com/pytorch/fairseq/pull/894

Adding an implementation of the sparse transformer to multi-head attention using the fixed attention pattern specified in https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, a mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.

Four inputs are added to the config: sparse, is_bidirectional, stride, expressivity. If we are using the sparse transformer, is_bidirectional, stride, and expressivity should be specified (defaults are provided). If is_bidirectional is False, the model masks values using the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window and a summary from every stride window--all other values are masked. Stride (L in the paper) controls the window size and expressivity (c in the paper) controls the size of the summary.
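A rough sketch of building the unidirectional (is_bidirectional=False) fixed mask, with stride = L and expressivity = c from the paper; this illustrates the pattern rather than reproducing the repo's implementation:

    import torch

    def fixed_sparse_mask(seq_len, stride, expressivity):
        # -inf entries are masked out and become 0 after softmax
        mask = torch.full((seq_len, seq_len), float('-inf'))
        for i in range(seq_len):
            window_start = (i // stride) * stride
            mask[i, window_start:i + 1] = 0.0  # causal local window
            for ws in range(0, window_start, stride):
                # "summary": last `expressivity` positions of each past window
                mask[i, ws + stride - expressivity:ws + stride] = 0.0
        return mask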

Reviewed By: borguz

Differential Revision: D16042988

fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5
2019-07-22 16:42:55 -07:00
Myle Ott
47fd985269 Move Masked LM components to legacy/ -- new ones are coming
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740

Differential Revision: D16377797

Pulled By: myleott

fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f
2019-07-21 19:38:00 -07:00
Xing Zhou
e46b924dea Nucleus (top-P) sampling (#710)
Summary:
Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p.

To test it:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
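A compact sketch of the top-p selection itself (for a single 1-D logits vector; the batched implementation in the repo will differ):

    import torch

    def sample_top_p(logits, p=0.3):
        probs = torch.softmax(logits, dim=-1)
        sorted_probs, sorted_idx = probs.sort(descending=True)
        cumulative = sorted_probs.cumsum(dim=-1)
        # keep the smallest prefix whose cumulative mass exceeds p
        # (a token is kept if the mass *before* it is still below p)
        keep = cumulative - sorted_probs < p
        sorted_probs[~keep] = 0.0
        sorted_probs /= sorted_probs.sum()
        return sorted_idx[torch.multinomial(sorted_probs, 1)]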
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710

Test Plan:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3

python tests/test_sequence_generator.py

python tests/test_binaries.py

Reviewed By: myleott

Differential Revision: D16286688

Pulled By: xingz9

fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f
2019-07-17 06:21:33 -07:00
Myle Ott
efb4345042 Fix resuming training when using --memory-efficient-fp16
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/678

Differential Revision: D15956712

Pulled By: myleott

fbshipit-source-id: 5048d06ddfbec0045558a22c777a966cca1ec396
2019-06-23 14:19:16 -07:00
Bairen Yi
a8f28ecb63 Python3.5 compat (#794)
Summary:
See #467. Ping myleott to review.

This is a work-related contribution. Ping lark to review.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/794

Differential Revision: D15756816

Pulled By: myleott

fbshipit-source-id: 6dce3ff3a713bf5f60e5782bc260b2ca9d2c0a9b
2019-06-11 04:10:08 -07:00
Matt Le
fa7791df9a Change encoder_learned_pos default back to True for xlm_base
Reviewed By: pipibjc

Differential Revision: D15635402

fbshipit-source-id: e92fab914de40775d7bad851420355240d822bde
2019-06-06 07:38:17 -07:00
Matt Le
5408bc0821 Fix loading XLM pretraining
Summary: We never actually load the model parameters from an XLM model when using transformer_from_pretrained_xlm. Also, changed encoder_learned_pos from True -> False

Reviewed By: liezl200

Differential Revision: D15629061

fbshipit-source-id: 759eadc88041eae94505477960de57dd78a99dcb
2019-06-04 15:36:55 -07:00
Myle Ott
ffc3bb5806 Add --reset-dataloader
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/613

Differential Revision: D15541384

Pulled By: myleott

fbshipit-source-id: ef2c0b0a51cdf37af2ccff0546f524d49f87e65d
2019-05-30 11:41:40 -07:00
Yongqiang Wang
8ce2c35d8e Implement reducing footprint of average checkpoint correctly (#747)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/747

In https://github.com/pytorch/fairseq/pull/647, checkpoint averaging
is not implemented correctly when it comes to shared parameters. This diff
has the right implementation and a test case to guard against future changes.

Reviewed By: myleott

Differential Revision: D15402943

fbshipit-source-id: 8004836d5c2571814ea54844650618008a9ee522
2019-05-24 12:12:24 -07:00
Ning Dong
ee28411f76 Make ConcatDataset work in PytorchTranslateTask multi-path dataset loading (#730)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/730

Pull Request resolved: https://github.com/pytorch/translate/pull/528

Add/modify the necessary functions for ConcatDataset to work in PytorchTranslateTask, and replace MultiCorpusSampledDataset, which doesn't support mixed batches.

Any ideas on how to implement the collater for mixed batches? For now I'm just using the collater of the first dataset.

Reviewed By: liezl200

Differential Revision: D15260872

fbshipit-source-id: 14b148c506e9f8ebf4fe60a49f95444d4123d76f
2019-05-20 11:31:53 -07:00
Myle Ott
3bfbb49ba5 Clean up sharded train iterator
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/586

Differential Revision: D15372949

Pulled By: myleott

fbshipit-source-id: c1cf1c645e8d55fc8568f23a47c45677ac9ab1da
2019-05-16 21:03:08 -07:00
Myle Ott
dffb167449 Updates to model API (#561)
Summary:
- `FairseqModel` -> `FairseqEncoderDecoderModel`
- add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
- `encoder_out_dict` -> `encoder_out`
- rm unused `remove_head` functions
- update docs
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561

Differential Revision: D15271142

Pulled By: myleott

fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264
2019-05-15 07:12:41 -07:00
Myle Ott
7432130eb0 rm default_key from MultiCorpusSampledDataset
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/575

Differential Revision: D15318004

Pulled By: myleott

fbshipit-source-id: ad918d71b1bd8074decf5ec3463dd9bc9487bbe9
2019-05-14 16:45:21 -07:00
Dmytro Okhonko
cd1e5c09fa Move save/load checkpoint functions to utils
Summary:
Move `load_checkpoint`, `save_checkpoint` and `reload_train` from train.py to checkpoint_utils.py
Move `get_perplexity` from train.py to utils.py.
This will make train.py lighter and allow us to reuse all of this utility functionality when fairseq is used as an external library.

Reviewed By: myleott

Differential Revision: D15289607

fbshipit-source-id: 4b7c95225ac22e402bcda3497811361809110df1
2019-05-14 12:57:12 -07:00
Jingfei Du
93ec8d0bc6 expose arguments for bias_kv and zero_attn for masked_lm
Summary: The old no_bias_kv argument for masked_lm models is not used. Split it into 2 arguments (bias_kv and zero_attn) and exposed them.

Reviewed By: myleott

Differential Revision: D15266154

fbshipit-source-id: 60b041f8370ca1d8869ed3402fb9a67d1cd8e0e8
2019-05-08 17:48:29 -07:00
Davide Caroselli
a1c997bd9a Memory-Mapped IndexedDataset implementation (#589)
Summary:
Following discussion in https://github.com/pytorch/fairseq/issues/574:

- Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder (see the sketch below)
- Updated scripts/read_binarized.py to support the new MMapIndexedDataset
- Replaced the '--raw-text' and '--lazy-load' options with '--dataset-impl', and moved the option definition from custom task args to the more high-level options.add_dataset_args() (more appropriate)
- Also implemented utility functions in indexed_dataset: make_dataset(), dataset_exists()
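A minimal sketch of the memory-mapped idea (a hypothetical reader; the real MMapIndexedDataset stores sizes and dtype in a separate .idx header):

    import numpy as np
    import torch

    class MMapDatasetSketch:
        def __init__(self, bin_path, sizes, dtype=np.int32):
            self.sizes = np.asarray(sizes)
            self.offsets = np.concatenate([[0], np.cumsum(self.sizes)[:-1]])
            # np.memmap reads lazily: only the pages actually touched by a
            # slice are pulled from disk, so init cost is near zero
            self.data = np.memmap(bin_path, dtype=dtype, mode='r')

        def __getitem__(self, i):
            start, length = self.offsets[i], self.sizes[i]
            return torch.from_numpy(np.array(self.data[start:start + length]))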
Pull Request resolved: https://github.com/pytorch/fairseq/pull/589

Differential Revision: D14597128

Pulled By: myleott

fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
2019-05-07 07:13:52 -07:00
Myle Ott
e4edf27a97 Improve init speed of TokenBlockDataset and EpochBatchIterator
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/704

Differential Revision: D15221549

Pulled By: myleott

fbshipit-source-id: b0021acdc2d7792ce51421f1432e1f2bd8218f7b
2019-05-07 07:08:53 -07:00
Naman Goyal
0add50c2e0 allowing sharded dataset (#696)
Summary:
Co-authored-by: myleott <myleott@fb.com>

Changing `data` to be a `str` with a colon-separated list for loading sharded datasets. This change is useful for loading large datasets that cannot fit into memory. The large dataset can be sharded, and then each shard is loaded in one epoch in a round-robin manner.

For example, if there are `5` shards of data and `10` epochs, then the shards will be iterated upon as `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]`.
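The epoch-to-shard mapping is simple modular arithmetic; a toy sketch (paths are made up):

    data = "/data/shard0:/data/shard1:/data/shard2:/data/shard3:/data/shard4"
    paths = data.split(":")

    def shard_for_epoch(epoch):           # epochs numbered from 0 here
        return paths[epoch % len(paths)]

    assert [shard_for_epoch(e) for e in range(10)] == paths * 2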

myleott We need to look into `translation.py` as it currently already expects a list and then concats the datasets.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/696

Differential Revision: D15214049

fbshipit-source-id: 03e43a7b69c7aefada2ca668abf1eac1969fe013
2019-05-06 15:27:17 -07:00
Myle Ott
96ac28d33d Fix and generalize --temperature option (#508)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/508

The previous version applied the temperature after the softmax. Fix that, and
also generalize so it works with other search approaches.
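The gist of the fix, as a sketch: divide the logits by the temperature before the softmax, rather than scaling the probabilities afterwards (which does not yield a properly temperature-scaled distribution):

    import torch

    def sample_with_temperature(logits, temperature=1.0):
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, 1)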
Pull Request resolved: https://github.com/pytorch/fairseq/pull/694

Differential Revision: D15175160

Pulled By: myleott

fbshipit-source-id: cc87ff0e97a8a1dd37f9983163f58a8641155ab0
2019-05-04 16:39:32 -07:00
Myle Ott
d45db80431 Merge internal changes (#654)
Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/654

Differential Revision: D15041794

Pulled By: myleott

fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
2019-04-29 19:50:58 -07:00
Liezl Puzon
57b6a6dbfb Fix fairseq unittest timeouts (#667)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/667

Use smaller models so that unittests won't time out

Reviewed By: pipibjc

Differential Revision: D15056894

fbshipit-source-id: af9fbda6ea6e56cf82d52555620121b189e2f013
2019-04-25 08:39:36 -07:00
Liezl Puzon
5008fd4e5a XLM for NMT: option to only load encoder or decoder (#666)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/666

Option to load the XLM weights into only the encoder or the decoder

Reviewed By: pipibjc

Differential Revision: D14881004

fbshipit-source-id: 6d0d598ea9c445ec468f71b8e855712de89a5dac
2019-04-25 05:57:02 -07:00
Liezl Puzon
8da9b1c530 Load a XLM model into transformer encoder / decoder for MT training (#629)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/629

Use GeLU as an alternative activation function to ReLU.

Reviewed By: lematt1991

Differential Revision: D14689851

fbshipit-source-id: 7ec81fa34bc7bd0e1e43b337847ae932dcbf8b15
2019-04-25 05:57:02 -07:00
Ning Dong
90d6eac2b3 Enable custom sampling strategy in MultiCorpusSampledDataset (#639)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/639

Add a sampling_func argument to the constructor to enable custom sampling over a list of dataset keys. The default strategy is to sample uniformly, as before.
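A sketch of what a custom sampler could look like (the exact sampling_func signature is assumed here, not taken from the diff):

    import numpy as np

    def make_weighted_sampler(weights):
        keys = list(weights)
        p = np.array([weights[k] for k in keys], dtype=float)
        p /= p.sum()
        # called by the dataset to pick which corpus the next sample comes from
        return lambda: np.random.choice(keys, p=p)

    sampler = make_weighted_sampler({'europarl': 0.8, 'opensubtitles': 0.2})
    # dataset = MultiCorpusSampledDataset(datasets, sampling_func=sampler)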

Reviewed By: liezl200

Differential Revision: D14965774

fbshipit-source-id: f3285688a9ae3729c0ba12c22254c1144d0eea9e
2019-04-16 23:29:02 -07:00
Myle Ott
e12e1d254c Simplify and generalize utils.make_positions
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/625

Differential Revision: D14822123

Pulled By: myleott

fbshipit-source-id: 8a263d30020588577ee02fb8c6959ff918705103
2019-04-15 07:32:11 -07:00
Peng-Jen Chen
d7e19573fa Back translation + denoising in MultilingualTranslation task (#620)
Summary:
- Add language token to MultilingualTranslation task
- Add back translation and denoising loss to MultilingualTranslation task
Pull Request resolved: https://github.com/pytorch/fairseq/pull/620

Reviewed By: liezl200

Differential Revision: D14756873

Pulled By: pipibjc

fbshipit-source-id: 89d668db26848fd95f446edf5923bab2113636f7
2019-04-10 10:56:51 -07:00
Dmytro Okhonko
860010e907 Handle 3+ dimensional input in sequence_generator + nits
Summary: sequence_generator assumes that the model input is a 2D tensor of longs, but it can be something like a 3D tensor of floats, and we should be able to handle this as long as the first dimension is the batch size, followed by source lengths.

Reviewed By: myleott

Differential Revision: D14420044

fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
2019-03-12 15:12:21 -07:00
Dmytro Okhonko
d17fa85135 Adadelta optimizer
Summary: Adding an Adadelta optimizer to fairseq as a wrapper around torch.optim.Adadelta
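A sketch of the fairseq-style wrapper pattern (the hyperparameter arg names below are assumptions, not the diff's exact code):

    import torch.optim
    from fairseq.optim import FairseqOptimizer, register_optimizer

    @register_optimizer('adadelta')
    class Adadelta(FairseqOptimizer):
        def __init__(self, args, params):
            super().__init__(args, params)
            self._optimizer = torch.optim.Adadelta(params, **self.optimizer_config)

        @property
        def optimizer_config(self):
            # maps command-line args onto torch.optim.Adadelta's kwargs
            return {'lr': self.args.lr[0], 'rho': self.args.adadelta_rho,
                    'eps': self.args.adadelta_eps}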

Reviewed By: myleott

Differential Revision: D14418635

fbshipit-source-id: 6bf5ec008e905a4a2cbf7415e9492f5eea3ff07f
2019-03-12 15:12:21 -07:00
Vladimir Karpukhin
f296824f40 Move string line encoding logic from tokenizer to Dictionary (unified diff). (#541)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/541

Just a combo of the stacked pair D14057943 & D14176011.
Made this a separate diff because there seems to be some issue with porting a stacked change into the GitHub repo

Differential Revision: D14251048

fbshipit-source-id: 0a47f534a69d6ab2ebe035fba40fd51748cccfb8
2019-02-28 09:19:12 -08:00
Myle Ott
bc919276a1 Add test for mixture of experts
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/543

Differential Revision: D14259481

Pulled By: myleott

fbshipit-source-id: fcb0a150b8e851cf86ea5ed1f083f56e1600588e
2019-02-28 08:56:24 -08:00
Myle Ott
44d27e645b Add Tensorboard support (#530)
Summary:
Enable with the `--tensorboard-logdir` option.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/530

Differential Revision: D14218430

Pulled By: myleott

fbshipit-source-id: e7a54f66f928e3bb02ae03fda09b22fa4fa7d053
2019-02-25 18:40:18 -08:00
Myle Ott
b65c579bed Modularize generate.py (#351)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/351

This makes it easier for tasks to plugin to generate.py/interactive.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/520

Differential Revision: D14183881

Pulled By: myleott

fbshipit-source-id: ede5e53ddc1215ed3b12b8f1eba048c946913c33
2019-02-22 10:08:52 -08:00
Davide Caroselli
bbb4120b00 Support custom Dictionary implementations in 'preprocess.py' (#448)
Summary:
The `preprocess.py` script has been refactored in order to:

1. Use the `options` module for command-line argument parsing. This gives `preprocess.py` the ability to load custom modules with the `--user-dir` flag (already implemented for all other binaries)
2. Move dictionary loading and building code to the Task implementation. This allows custom Dictionary classes to be used during the data generation step.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/448

Differential Revision: D13674819

Pulled By: myleott

fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
2019-02-01 09:45:59 -08:00
Myle Ott
3dce7c9fc0 Add --input option to interactive.py to support reading from file
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/484

Differential Revision: D13880636

Pulled By: myleott

fbshipit-source-id: 984b2e1c3b281c28243102eb971ea45ec891d94e
2019-01-30 09:46:05 -08:00
Myle Ott
42be3ebd41 Merge internal changes (#483)
Summary:
Changelog:
- `4889802`: can now detokenize sentencepiece output with `--remove-bpe=sentencepiece` (fixes #331). Also added `--sacrebleu` for computing detokenized BLEU.
- `0d76427`: fix assertion error when training language model with dataset containing empty sentences
- minor bug and style fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/483

Differential Revision: D13867899

Pulled By: myleott

fbshipit-source-id: 25c940b847fe270262ac8f5ac838407b3977fdda
2019-01-30 09:01:10 -08:00
Myle Ott
b41c74dc5b Add code for "Pay Less Attention with Lightweight and Dynamic Convolutions" (#473)
Summary:
Changelog:
- `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper
- `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu
- update READMEs
- misc fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/473

Differential Revision: D13819717

Pulled By: myleott

fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9
2019-01-25 15:40:26 -08:00
Myle Ott
7633129ba8 Merge internal changes (#283)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/283

Pull Request resolved: https://github.com/pytorch/fairseq/pull/428

Differential Revision: D13564190

Pulled By: myleott

fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
2019-01-04 20:03:19 -08:00
Myle Ott
3c19878f71 Refactor BacktranslationDataset to be more reusable (#354)
Summary:
- generalize AppendEosDataset -> TransformEosDataset
- remove EOS logic from BacktranslationDataset (use TransformEosDataset instead)
- BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself
Pull Request resolved: https://github.com/pytorch/fairseq/pull/354

Reviewed By: liezl200

Differential Revision: D12970233

Pulled By: myleott

fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
2018-11-25 21:26:03 -08:00
Myle Ott
0864a9c49d Fix build for docs
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/372

Differential Revision: D13114426

Pulled By: myleott

fbshipit-source-id: 6c24b96a3556a0ecd3d1f350642a884254a40bd3
2018-11-18 08:32:59 -08:00
Liezl Puzon
2b13f3c036 Support BPE end of word marker suffix in fairseq noising module
Summary:
There are 2 ways to implement BPE:
1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word
2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word

This adds some logic to account for either kind of BPE marker suffix. This diff adds a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data.
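The two conventions, as a toy sketch (the suffix strings are just common examples):

    # 1) continuation marker ("wonder@@ ful"): suffix means the word continues
    def word_ends_with_cont_marker(token, suffix="@@"):
        return not token.endswith(suffix)

    # 2) end-of-word marker ("wonder ful</w>"): suffix means the word ends
    def word_ends_with_eow_marker(token, suffix="</w>"):
        return token.endswith(suffix)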

Reviewed By: xianxl

Differential Revision: D12919428

fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
2018-11-06 20:40:36 -08:00
Liezl Puzon
b1521f962e Refactor fairseq/test_noising with a word shuffle helper function (#340)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/340

This allows us to avoid a lot of copy-pasting when adding new word shuffle function tests

Reviewed By: xianxl

Differential Revision: D12810304

fbshipit-source-id: a56b5df093d17be2b73837897c526978cab92b70
2018-11-01 17:13:05 -07:00
Liezl Puzon
0b05467dd8 Black formatting in fairseq/test_noising (#341)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/341

Use black formatting in test_noising.py

Reviewed By: xianxl

Differential Revision: D12810285

fbshipit-source-id: 5517dd5d2f086831f487d88acf6bc2fa18820297
2018-11-01 17:13:05 -07:00
Myle Ott
5bbd148e6e Fix tests + style nits + Python 3.5 compat
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/336

Differential Revision: D12876709

Pulled By: myleott

fbshipit-source-id: a31536e2eb93f752600b9940c28e9b9fcefc8b86
2018-11-01 01:28:30 -07:00
Xian Li
90c01b3a0b Extend WordShuffle noising function to apply to non-bpe tokens
Summary:
We'd like to reuse the noising functions and DenoisingDataset in
adversarial training. However, the current noising functions assume the inputs are
subword tokens. The goal of this diff is to extend them so the noising can be
applied to word tokens. Since we're mostly interested in word shuffle
noising, I only modified the WordShuffle class.

Reviewed By: liezl200

Differential Revision: D10523177

fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab
2018-10-26 18:20:11 -07:00
Deepak Gopinath
613ffeea9c Add size method to BacktranslationDataset + misc fixes (#325)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/325

RoundRobinZipDataset requires a size(index) method to be implemented in every dataset used. Also added missing return statements in a few methods.

Reviewed By: liezl200

Differential Revision: D10457159

fbshipit-source-id: 01856eb455f2f3a21e7fb723129ff35fbe29e0ae
2018-10-22 22:27:59 -07:00
Liezl Puzon
e286243c68 Add denoising dataset for denoising autoencoder (#306)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/306

This uses a source dataset to generate a batch of {source: noisy source, target: original clean source}, which allows us to train a denoising autoencoding component as part of a seq2seq model.

Reviewed By: xianxl

Differential Revision: D10078981

fbshipit-source-id: 026225984d4a97062ac05dc3a36e79b5c841fe9c
2018-10-05 18:21:27 -07:00
Liezl Puzon
8798a24031 Have noising account for sentences with and without EOS (#305)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/305

Previously, the noising code assumed that every sentence had an EOS, which had to be excluded from noising operations (since we shouldn't drop, blank, or shuffle EOS). This logic allows the noising module to handle sentences both with and without EOS

Reviewed By: xianxl

Differential Revision: D10114425

fbshipit-source-id: 04ec8547343eb94266bda1ac7fca3d8a1991c9f4
2018-10-05 18:21:26 -07:00
Liezl Puzon
b9e29a4711 Option to remove EOS at source in backtranslation dataset
Summary:
If we want our parallel data to have EOS at the end of the source, we keep the EOS at the end of the generated source-dialect backtranslation.
If we don't want our parallel data to have EOS at the end of the source, we **remove** the EOS at the end of the generated source-dialect backtranslation.

Note: we always want EOS at the end of our target / reference in parallel data so our model can learn to generate a sentence of any arbitrary length. So we make sure that the original target has an EOS before returning a batch of {generated src, original target}. If the original targets in the tgt dataset don't have an EOS, we append EOS to each tgt sample before collating.
We only do this for the purpose of collating a {generated src, original tgt} batch AFTER generating the backtranslations. We don't enforce any EOS before passing tgt to the tgt->src model for generating the backtranslation. Users of this dataset are expected to format the tgt dataset examples in the correct format that the tgt->src model expects.

Reviewed By: jmp84

Differential Revision: D10157725

fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb
2018-10-03 18:23:32 -07:00
Myle Ott
fc677c945e Fix proxying in DistributedFairseqModel
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/302

Differential Revision: D10174608

Pulled By: myleott

fbshipit-source-id: 4e2dfc76eae97afc5488f29b47e74f9897a643ff
2018-10-03 16:22:35 -07:00
Liezl Puzon
f766c9a0d5 Pass in kwargs and SequenceGenerator class to init BacktranslationDataset
Summary: This generalizes BacktranslationDataset to allow us to use any SequenceGenerator class. For example, if we want to use this model in PyTorch Translate, we can pass the following to BacktranslationDataset init: (1) a PyTorch Translate SequenceGenerator class as generator_class and (2) the appropriate args for initializing that class as kwargs.

Reviewed By: xianxl

Differential Revision: D10156552

fbshipit-source-id: 0495d825bf4727da96d0d9a40dc434135ff3486c
2018-10-02 18:22:27 -07:00
Liezl Puzon
86e93f2bcf Explicitly list out generation args for backtranslation dataset
Summary:
Using an argparse Namespace hides the actual args that are expected and makes the code harder to read.

Note the difference in style for the args list

    def __init__(
        self,
        tgt_dataset,
        tgt_dict,
        backtranslation_model,
        unkpen,
        sampling,
        beam,
        max_len_a,
        max_len_b,
    ):

instead of

    def __init__(
        self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling,
        beam,  max_len_a, max_len_b,
    ):

Reviewed By: dpacgopinath

Differential Revision: D10152331

fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6
2018-10-02 15:45:38 -07:00
myleott
f8377a704c fbshipit-source-id: 6a835d32f9dc5e0de118f1b46d365d0e0cc85e11 2018-09-30 12:28:20 -07:00
Myle Ott
864b89d044 Online backtranslation module
Co-authored-by: liezl200 <lie@fb.com>
2018-09-25 17:36:43 -04:00
Alexei Baevski
cfd2a3a048 core changes to support latte collab 2018-09-25 17:36:43 -04:00
Myle Ott
fbe8ce65d3 Better support for various c10d API changes 2018-09-25 17:36:43 -04:00
Myle Ott
e775877f68 Add unit test to verify reproducibility after reloading checkpoints 2018-09-25 17:36:43 -04:00
Stephen Roller
bfeb773214 Pass encoder_input to generator, rather than src_tokens/src_lengths. 2018-09-25 17:36:43 -04:00
Myle Ott
8bd8ec8fa8 Update LM test with --no-c10d 2018-09-25 17:36:43 -04:00
Myle Ott
311d2c6ca9 Revert sequence generator changes 2018-09-25 17:36:43 -04:00
Stephen Roller
e6d45d5cd7 Generator: net_input instead of manual src_tokens. 2018-09-25 17:36:43 -04:00
Myle Ott
d473620e39 Test max_positions 2018-09-03 19:15:23 -04:00
Myle Ott
0a7f9e64bb Further generalize EpochBatchIterator and move iterators into new file 2018-09-03 19:15:23 -04:00
Myle Ott
2e507d3cb4 Clean up FairseqTask so that it's easier to extend/add new tasks 2018-09-03 19:15:23 -04:00
Myle Ott
8c0ca1a0c1 Diverse Beam Search 2018-09-03 19:15:23 -04:00
alexeib
f1d81db8b7 fix tests 2018-09-03 19:15:23 -04:00
Myle Ott
ef43da72d3 Factor out search logic in SequenceGenerator 2018-09-03 19:15:23 -04:00
alexeib
0b5166db2e fix tests 2018-09-03 19:15:23 -04:00
alexeib
2dc074d8f2 add flag that allows keeping optimizer config
adds --reset-optimizer, --reset-lr-scheduler, and --optimizer-overrides flags
2018-09-03 19:15:23 -04:00
Alexei Baevski
885e7ec9ec character token embeddings for word level predictions 2018-09-03 19:15:23 -04:00
Myle Ott
bb5f15d137 Iterate on need_attn and fix tests 2018-07-25 07:26:08 -07:00
Myle Ott
6edf81ddfe
Remove more Variable() calls (#198) 2018-06-25 12:23:04 -04:00
Myle Ott
74efc21403
Fix attention order in unit tests (fixes #195) (#197) 2018-06-25 12:16:10 -04:00
Myle Ott
c6fe9fc5e0 Fix for Dictionary.finalize 2018-06-24 13:19:07 -04:00
Myle Ott
6ec5022e57 Move reorder_encoder_out to FairseqEncoder and fix non-incremental decoding 2018-06-21 14:58:50 -04:00
Myle Ott
572a1d55df
Fix --output-format raw option to preprocess.py (Fixes #188) (#190) 2018-06-21 08:19:16 -04:00
Myle Ott
bfcc6ec739 Fix bidirectional lstm 2018-06-15 13:05:23 -06:00
Myle Ott
e89329d665 Updates for latest PyTorch 2018-06-15 13:05:22 -06:00
Myle Ott
ff68a9ef50 Add FairseqTask
A Task defines the data format, stores shared state (e.g., dictionaries) and provides helpers for building the model/criterion and calculating the loss.

Changes:
- Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator (see the sketch below).
- Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position
- Remove LEFT_PAD_* constants and make them configurable per task
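A minimal registration sketch (the task name and body are made up; the real interface has more hooks):

    from fairseq.tasks import FairseqTask, register_task

    @register_task('my_translation')
    class MyTranslationTask(FairseqTask):
        @staticmethod
        def add_args(parser):
            parser.add_argument('data', help='path to data directory')

        def load_dataset(self, split, **kwargs):
            # load dictionaries and datasets for this split
            ...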
2018-06-15 13:05:22 -06:00
Myle Ott
16a72b4dd1 Add more integration tests (LM, stories, transformer, lstm) 2018-06-15 13:05:20 -06:00
Myle Ott
736fbee2a1 Suppress stdout in test_train 2018-06-15 13:05:20 -06:00
Myle Ott
cf1c64a5f7 Nits 2018-06-15 13:05:19 -06:00
alexeib
7d5604024b record end_of_epoch in checkpoint 2018-06-15 13:05:17 -06:00
alexeib
978c125aee fix restoring from middle of epoch; fix defaulting transformer dropout params 2018-06-15 13:05:17 -06:00
alexeib
4c2ef2de74 Conv lm implementation
This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf

There are 3 modes for constructing batches (see the sketch after this list):

- token block: fill each sample with a specified number of tokens without regard for sentence delimiters - this is what was used for training in the paper
- complete: fill each sample with a specified number of tokens but make sure it contains only complete sentences (i.e. if the next sentence goes over the token block limit, move it to the next sample) - this was used for evaluation in the paper
- eos: one sentence per sample (skip blank lines)
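To make the "complete" mode concrete, a toy sketch over sentence lengths (in tokens):

    def complete_blocks(sent_lens, block_size):
        # whole sentences only; a sentence that would overflow the current
        # sample starts a new one
        blocks, cur = [], []
        for n in sent_lens:
            if cur and sum(cur) + n > block_size:
                blocks.append(cur)
                cur = []
            cur.append(n)
        if cur:
            blocks.append(cur)
        return blocks

    print(complete_blocks([5, 4, 7, 2], 12))  # [[5, 4], [7, 2]]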

Some results (perplexity):

GCNN-13 - GBW - 37.46
GCNN-14B - GBW - 33.88
GCNN-8 - Wiki103 - 43.76
GCNN-14 - Wiki103 - 35.66

train:

python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500

eval:

python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'
2018-06-15 13:05:16 -06:00
Myle Ott
ae2585d9fd Fix tests 2018-06-15 13:05:15 -06:00
Myle Ott
8afb77612c Fix tests 2018-06-15 13:05:11 -06:00
Myle Ott
ec0031df7b
Merge internal changes (#163) 2018-05-24 13:38:12 -04:00
Myle Ott
d3795d6cd1
Merge internal changes (#136)
Changes:
- 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
- c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
- 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
- small bugfixes for distributed training, LSTM, inverse square root LR scheduler
2018-04-02 10:13:07 -04:00
Myle Ott
e73fddf453 Filter padding properly in LabelSmoothedCrossEntropyCriterion (#229) 2018-03-05 14:20:29 -08:00
Myle Ott
6e4d370af9
More updates for PyTorch (#114) 2018-03-01 14:04:08 -05:00
Myle Ott
9438019ff0 Refactor incremental generation to be more explicit and less magical (#222) 2018-02-27 14:28:24 -08:00
Myle Ott
0d90e35f3b More unit test fixes 2018-02-27 14:28:24 -08:00
Myle Ott
29c8274128 Fix tests and flake8 2018-02-27 14:28:24 -08:00
Myle Ott
6641520612
fairseq-py goes distributed (#106)
This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes.

Changes:
- c7033ef: add support for distributed training! See updated README for usage.
- e016299: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc.
- 154e440: update LSTM implementation to use PackedSequence objects in the encoder, better following best practices and improving perf
- 90c2973 and 1da6265: improve unit test coverage
2018-02-27 17:09:42 -05:00
Myle Ott
e1f49695ee Rename LabelSmoothedCrossEntropy to LabelSmoothedNLLLoss 2017-11-08 08:01:31 -07:00
Myle Ott
6e4b7e22ee Refactor model definitions
* Move some functionality out of FConvModel into FairseqModel base class
* Move incremental decoding functionality into FairseqIncrementalDecoder module
* Refactor positional embeddings to be more specific to FConvModel
2017-11-08 07:59:22 -07:00
Sam Gross
ae0c05d920 Fix call ordering to ATen addmm and sum (#22) 2017-10-11 10:14:19 -04:00
Sergey Edunov
e734b0fa58 Initial commit 2017-09-14 17:22:43 -07:00