Commit Graph

309 Commits

Author SHA1 Message Date
Liezl Puzon
2b13f3c036 Support BPE end of word marker suffix in fairseq noising module
Summary:
There are 2 ways to implement BPE:
1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word
2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word

This diff adds logic to account for either kind of BPE marker suffix, plus a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data.
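For illustration only (not the exact fairseq noising code; the "@@" and "</w>" suffixes are assumed markers), a word-boundary check under the two conventions might look like:

    def is_word_end(subtoken, use_end_of_word_marker):
        # True if this subtoken is the last subtoken of a word.
        if use_end_of_word_marker:
            # Convention 2: an end-of-word suffix (e.g. "</w>") marks the final subtoken.
            return subtoken.endswith("</w>")
        # Convention 1: a continuation suffix (e.g. "@@") means more subtokens follow,
        # so a subtoken WITHOUT the suffix ends the word.
        return not subtoken.endswith("@@")

    # "hello" split as ["he@@", "llo"] (continuation) or ["he", "llo</w>"] (end-of-word):
    assert [is_word_end(t, False) for t in ["he@@", "llo"]] == [False, True]
    assert [is_word_end(t, True) for t in ["he", "llo</w>"]] == [False, True]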

Reviewed By: xianxl

Differential Revision: D12919428

fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
2018-11-06 20:40:36 -08:00
Liezl Puzon
b1521f962e Refactor fairseq/test_noising with a word shuffle helper function (#340)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/340

This lets us avoid a lot of copy-paste when adding new word shuffle function tests.

Reviewed By: xianxl

Differential Revision: D12810304

fbshipit-source-id: a56b5df093d17be2b73837897c526978cab92b70
2018-11-01 17:13:05 -07:00
Liezl Puzon
0b05467dd8 Black formatting in fairseq/test_noising (#341)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/341

Use black formatting in test_noising.py

Reviewed By: xianxl

Differential Revision: D12810285

fbshipit-source-id: 5517dd5d2f086831f487d88acf6bc2fa18820297
2018-11-01 17:13:05 -07:00
Myle Ott
5bbd148e6e Fix tests + style nits + Python 3.5 compat
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/336

Differential Revision: D12876709

Pulled By: myleott

fbshipit-source-id: a31536e2eb93f752600b9940c28e9b9fcefc8b86
2018-11-01 01:28:30 -07:00
Xian Li
90c01b3a0b Extend WordShuffle noising function to apply to non-bpe tokens
Summary:
We'd like to reuse the noising functions and DenoisingDataset in
adversarial training. However, the current noising functions assume the input is
subword tokens. The goal of this diff is to extend them so the noising can be
applied to word tokens. Since we're mostly interested in word shuffle
noising, I only modified the WordShuffle class.
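As a rough, standalone sketch of the word-shuffle idea (a noise scheme in the spirit of the unsupervised-MT literature; the function and parameter names here are assumptions, not the WordShuffle API), each token is allowed to move only a few positions:

    import numpy as np

    def word_shuffle(tokens, max_shuffle_distance=3, seed=None):
        # Perturb each position index with uniform noise in [0, max_shuffle_distance)
        # and stably sort, so no token moves more than max_shuffle_distance slots.
        rng = np.random.RandomState(seed)
        noise = rng.uniform(0, max_shuffle_distance, size=len(tokens))
        order = np.argsort(np.arange(len(tokens)) + noise, kind="stable")
        return [tokens[i] for i in order]

    # Works the same whether `tokens` are words or subword units.
    print(word_shuffle("the quick brown fox jumps".split(), seed=0))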

Reviewed By: liezl200

Differential Revision: D10523177

fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab
2018-10-26 18:20:11 -07:00
Deepak Gopinath
613ffeea9c Add size method to BacktranslationDataset + misc fixes (#325)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/325

RoundRobinZipDataset requires a size(index) method to be implemented in every dataset it wraps. Also added missing return statements in a few methods.
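A minimal sketch of what such a method looks like (illustrative only; the exact return convention is an assumption):

    class ToyDataset:
        def __init__(self, src_sizes, tgt_sizes):
            self.src_sizes = src_sizes  # per-example source lengths
            self.tgt_sizes = tgt_sizes  # per-example target lengths

        def __len__(self):
            return len(self.src_sizes)

        def size(self, index):
            # Used by wrappers for length-based filtering and batching.
            return (self.src_sizes[index], self.tgt_sizes[index])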

Reviewed By: liezl200

Differential Revision: D10457159

fbshipit-source-id: 01856eb455f2f3a21e7fb723129ff35fbe29e0ae
2018-10-22 22:27:59 -07:00
Liezl Puzon
e286243c68 Add denoising dataset for denoising autoencoder (#306)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/306

This uses a source dataset to generate a batch of {source: noisy source, target: original clean source}, which allows us to train a denoising autoencoder component as part of a seq2seq model.
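A minimal, self-contained sketch of the idea (not the DenoisingDataset API; word dropout stands in for whatever noising is configured):

    import random

    def word_dropout(tokens, dropout=0.1, rng=random):
        kept = [t for t in tokens if rng.random() >= dropout]
        return kept if kept else tokens[:1]  # never emit an empty source

    def make_denoising_example(sentence):
        tokens = sentence.split()
        # Noisy copy as the source, clean original as the target.
        return {"source": word_dropout(tokens), "target": tokens}

    print(make_denoising_example("the quick brown fox jumps over the lazy dog"))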

Reviewed By: xianxl

Differential Revision: D10078981

fbshipit-source-id: 026225984d4a97062ac05dc3a36e79b5c841fe9c
2018-10-05 18:21:27 -07:00
Liezl Puzon
8798a24031 Have noising account for sentences with and without EOS (#305)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/305

Previously, the noising code assumed that every sentence had an EOS, which had to be excluded from noising operations (since we shouldn't drop, blank, or shuffle EOS). This logic allows the noising module to handle sentences both with and without EOS.
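A hedged sketch of that handling (the EOS symbol and function names are assumptions):

    EOS = "</s>"  # assumed EOS symbol

    def noise_with_optional_eos(tokens, noise_fn):
        # Apply noising only to the non-EOS portion, then reattach EOS if present.
        has_eos = bool(tokens) and tokens[-1] == EOS
        body = tokens[:-1] if has_eos else tokens
        noised = noise_fn(body)
        return noised + [EOS] if has_eos else noised

    # noise_with_optional_eos(["a", "b", "c", "</s>"], lambda t: t[::-1])
    # reverses only ["a", "b", "c"] and keeps "</s>" in place.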

Reviewed By: xianxl

Differential Revision: D10114425

fbshipit-source-id: 04ec8547343eb94266bda1ac7fca3d8a1991c9f4
2018-10-05 18:21:26 -07:00
Liezl Puzon
b9e29a4711 Option to remove EOS at source in backtranslation dataset
Summary:
If we want our parallel data to have EOS at the end of source, we keep the EOS at the end of the generated source dialect backtranslation.
If we don't want our parallel data to have EOS at the end of source, we **remove** the EOS at the end of the generated source dialect backtranslation.

Note: we always want EOS at the end of our target / reference in parallel data so our model can learn to generate a sentence of arbitrary length. So we make sure that the original target has an EOS before returning a batch of {generated src, original target}. If our original targets in the tgt dataset don't have an EOS, we append EOS to each tgt sample before collating.
We only do this for the purpose of collating a {generated src, original tgt} batch AFTER generating the backtranslations. We don't enforce any EOS before passing tgt to the tgt->src model for generating the backtranslation. Users of this dataset are expected to format tgt dataset examples in the format that the tgt->src model expects.
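A hedged sketch of this bookkeeping (symbol and option names are assumptions, not the BacktranslationDataset API):

    EOS = 2  # assumed EOS index

    def prepare_pair(generated_src, original_tgt, remove_eos_from_src=False):
        # Optionally strip EOS from the generated (backtranslated) source.
        if remove_eos_from_src and generated_src and generated_src[-1] == EOS:
            generated_src = generated_src[:-1]
        # Always ensure the original target ends with EOS before collating.
        if not original_tgt or original_tgt[-1] != EOS:
            original_tgt = original_tgt + [EOS]
        return {"source": generated_src, "target": original_tgt}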

Reviewed By: jmp84

Differential Revision: D10157725

fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb
2018-10-03 18:23:32 -07:00
Myle Ott
fc677c945e Fix proxying in DistributedFairseqModel
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/302

Differential Revision: D10174608

Pulled By: myleott

fbshipit-source-id: 4e2dfc76eae97afc5488f29b47e74f9897a643ff
2018-10-03 16:22:35 -07:00
Liezl Puzon
f766c9a0d5 Pass in kwargs and SequenceGenerator class to init BacktranslationDataset
Summary: This generalizes BacktranslationDataset to allow us to use any SequenceGenerator class. For example, to use it in PyTorch Translate, we can pass the following to BacktranslationDataset init: (1) a PyTorch Translate SequenceGenerator class as generator_class and (2) the appropriate args for initializing that class as kwargs.
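The general pattern looks roughly like this (names are illustrative, not the exact signature):

    class BacktranslationDatasetSketch:
        def __init__(self, tgt_dataset, generator_class, **generator_kwargs):
            self.tgt_dataset = tgt_dataset
            # Any SequenceGenerator-like class can be plugged in, as long as
            # the provided kwargs match its __init__ signature.
            self.generator = generator_class(**generator_kwargs)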

Reviewed By: xianxl

Differential Revision: D10156552

fbshipit-source-id: 0495d825bf4727da96d0d9a40dc434135ff3486c
2018-10-02 18:22:27 -07:00
Liezl Puzon
86e93f2bcf Explicitly list out generation args for backtranslation dataset
Summary:
Using an argparse Namespace hides the actual args that are expected and makes the code harder to read.

Note the difference in style for the args list:

    def __init__(
        self,
        tgt_dataset,
        tgt_dict,
        backtranslation_model,
        unkpen,
        sampling,
        beam,
        max_len_a,
        max_len_b,
    ):

instead of

    def __init__(
        self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling,
        beam,  max_len_a, max_len_b,
    ):

Reviewed By: dpacgopinath

Differential Revision: D10152331

fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6
2018-10-02 15:45:38 -07:00
myleott
f8377a704c fbshipit-source-id: 6a835d32f9dc5e0de118f1b46d365d0e0cc85e11 2018-09-30 12:28:20 -07:00
Myle Ott
864b89d044 Online backtranslation module
Co-authored-by: liezl200 <lie@fb.com>
2018-09-25 17:36:43 -04:00
Alexei Baevski
cfd2a3a048 core changes to support latte collab 2018-09-25 17:36:43 -04:00
Myle Ott
fbe8ce65d3 Better support for various c10d API changes 2018-09-25 17:36:43 -04:00
Myle Ott
e775877f68 Add unit test to verify reproducibility after reloading checkpoints 2018-09-25 17:36:43 -04:00
Stephen Roller
bfeb773214 Pass encoder_input to generator, rather than src_tokens/src_lengths. 2018-09-25 17:36:43 -04:00
Myle Ott
8bd8ec8fa8 Update LM test with --no-c10d 2018-09-25 17:36:43 -04:00
Myle Ott
311d2c6ca9 Revert sequence generator changes 2018-09-25 17:36:43 -04:00
Stephen Roller
e6d45d5cd7 Generator: net_input instead of manual src_tokens. 2018-09-25 17:36:43 -04:00
Myle Ott
d473620e39 Test max_positions 2018-09-03 19:15:23 -04:00
Myle Ott
0a7f9e64bb Further generalize EpochBatchIterator and move iterators into new file 2018-09-03 19:15:23 -04:00
Myle Ott
2e507d3cb4 Clean up FairseqTask so that it's easier to extend/add new tasks 2018-09-03 19:15:23 -04:00
Myle Ott
8c0ca1a0c1 Diverse Beam Search 2018-09-03 19:15:23 -04:00
alexeib
f1d81db8b7 fix tests 2018-09-03 19:15:23 -04:00
Myle Ott
ef43da72d3 Factor out search logic in SequenceGenerator 2018-09-03 19:15:23 -04:00
alexeib
0b5166db2e fix tests 2018-09-03 19:15:23 -04:00
alexeib
2dc074d8f2 add flag that allows keeping optimizer config
adds --reset-optimizer, --reset-lr-scheduler, and --optimizer-overrides flags
2018-09-03 19:15:23 -04:00
Alexei Baevski
885e7ec9ec character token embeddings for word level predictions 2018-09-03 19:15:23 -04:00
Myle Ott
bb5f15d137 Iterate on need_attn and fix tests 2018-07-25 07:26:08 -07:00
Myle Ott
6edf81ddfe
Remove more Variable() calls (#198) 2018-06-25 12:23:04 -04:00
Myle Ott
74efc21403
Fix attention order in unit tests (fixes #195) (#197) 2018-06-25 12:16:10 -04:00
Myle Ott
c6fe9fc5e0 Fix for Dictionary.finalize 2018-06-24 13:19:07 -04:00
Myle Ott
6ec5022e57 Move reorder_encoder_out to FairseqEncoder and fix non-incremental decoding 2018-06-21 14:58:50 -04:00
Myle Ott
572a1d55df
Fix --output-format raw option to preprocess.py (Fixes #188) (#190) 2018-06-21 08:19:16 -04:00
Myle Ott
bfcc6ec739 Fix bidirectional lstm 2018-06-15 13:05:23 -06:00
Myle Ott
e89329d665 Updates for latest PyTorch 2018-06-15 13:05:22 -06:00
Myle Ott
ff68a9ef50 Add FairseqTask
A Task defines the data format, stores shared state (e.g., dictionaries) and provides helpers for building the model/criterion and calculating the loss.

Changes:
- Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator (see the sketch after this list).
- Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position
- Remove LEFT_PAD_* constants and make them configurable per task
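
A hedged sketch of the registration pattern (the hooks shown follow fairseq's public task API of this era; treat the details as assumptions that vary by version):

    from fairseq.tasks import FairseqTask, register_task

    @register_task("toy_translation")
    class ToyTranslationTask(FairseqTask):
        @staticmethod
        def add_args(parser):
            # Task-specific command line options.
            parser.add_argument("--toy-option", type=int, default=1)

        @classmethod
        def setup_task(cls, args, **kwargs):
            # Load dictionaries / other shared state here.
            return cls(args)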
2018-06-15 13:05:22 -06:00
Myle Ott
16a72b4dd1 Add more integration tests (LM, stories, transformer, lstm) 2018-06-15 13:05:20 -06:00
Myle Ott
736fbee2a1 Suppress stdout in test_train 2018-06-15 13:05:20 -06:00
Myle Ott
cf1c64a5f7 Nits 2018-06-15 13:05:19 -06:00
alexeib
7d5604024b record end_of_epoch in checkpoint 2018-06-15 13:05:17 -06:00
alexeib
978c125aee fix restoring from middle of epoch; fix defaulting transformer dropout params 2018-06-15 13:05:17 -06:00
alexeib
4c2ef2de74 Conv lm implementation
This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf

There are 3 modes for constructing batches:

- token block: fill each sample with a specified number of tokens without regard for sentence delimiters - this is what was used for training in the paper
- complete: fill each sample with a specified number of tokens but make sure it contains only complete sentences (i.e. if the next sentence would go over the token block limit, move it to the next sample; see the sketch after this list) - this was used for evaluation in the paper
- eos: one sentence per sample (skip blank lines)
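
A minimal sketch of the "complete" mode (assumptions only, not fairseq's actual dataset code): pack whole sentences into blocks of at most block_size tokens, starting a new block whenever the next sentence would overflow.

    def complete_blocks(sentences, block_size):
        # Each sentence is a list of token ids; blocks never split a sentence.
        blocks, current, current_len = [], [], 0
        for sent in sentences:
            if current and current_len + len(sent) > block_size:
                blocks.append(current)
                current, current_len = [], 0
            current.extend(sent)
            current_len += len(sent)
        if current:
            blocks.append(current)
        return blocks

    # complete_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=5)
    # -> [[1, 2, 3, 4, 5], [6, 7, 8, 9]]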

some results:

GCNN-13 - GBW - 37.46
GCNN-14B - GBW - 33.88
GCNN-8 - Wiki103 - 43.76
GCNN-14 - Wiki103 - 35.66

train:

python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500

eval:

python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'
2018-06-15 13:05:16 -06:00
Myle Ott
ae2585d9fd Fix tests 2018-06-15 13:05:15 -06:00
Myle Ott
8afb77612c Fix tests 2018-06-15 13:05:11 -06:00
Myle Ott
ec0031df7b
Merge internal changes (#163) 2018-05-24 13:38:12 -04:00
Myle Ott
d3795d6cd1
Merge internal changes (#136)
Changes:
- 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
- c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
- 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
- small bugfixes for distributed training, LSTM, inverse square root LR scheduler
2018-04-02 10:13:07 -04:00
Myle Ott
e73fddf453 Filter padding properly in LabelSmoothedCrossEntropyCriterion (#229) 2018-03-05 14:20:29 -08:00
Myle Ott
6e4d370af9
More updates for PyTorch (#114) 2018-03-01 14:04:08 -05:00
Myle Ott
9438019ff0 Refactor incremental generation to be more explicit and less magical (#222) 2018-02-27 14:28:24 -08:00
Myle Ott
0d90e35f3b More unit test fixes 2018-02-27 14:28:24 -08:00
Myle Ott
29c8274128 Fix tests and flake8 2018-02-27 14:28:24 -08:00
Myle Ott
6641520612
fairseq-py goes distributed (#106)
This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes.

Changes:
- c7033ef: add support for distributed training! See updated README for usage.
- e016299: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc. (see the sketch after this list)
- 154e440: update LSTM implementation to use PackedSequence objects in the encoder, better following best practices and improving perf
- 90c2973 and 1da6265: improve unit test coverage
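
A hedged sketch of the model-registration pattern (names follow fairseq's public API of this era; treat the details as assumptions):

    from fairseq.models import FairseqModel, register_model, register_model_architecture

    @register_model("toy_seq2seq")
    class ToySeq2SeqModel(FairseqModel):
        @classmethod
        def build_model(cls, args, task):
            # Build an encoder/decoder from args and the task's dictionaries,
            # then return cls(encoder, decoder). Omitted in this sketch.
            raise NotImplementedError("sketch only")

    @register_model_architecture("toy_seq2seq", "toy_seq2seq_base")
    def toy_seq2seq_base(args):
        # Fill in defaults for any architecture hyperparameters.
        args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512)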
2018-02-27 17:09:42 -05:00
Myle Ott
e1f49695ee Rename LabelSmoothedCrossEntropy to LabelSmoothedNLLLoss 2017-11-08 08:01:31 -07:00
Myle Ott
6e4b7e22ee Refactor model definitions
* Move some functionality out of FConvModel into FairseqModel base class
* Move incremental decoding functionality into FairseqIncrementalDecoder module
* Refactor positional embeddings to be more specific to FConvModel
2017-11-08 07:59:22 -07:00
Sam Gross
ae0c05d920 Fix call ordering to ATen addmm and sum (#22) 2017-10-11 10:14:19 -04:00
Sergey Edunov
e734b0fa58 Initial commit 2017-09-14 17:22:43 -07:00