fairseq/tests
Sara Hanson a03fe6faf3 Implement sparse transformer fixed attention pattern (#804)
Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746

Pull Request resolved: https://github.com/pytorch/fairseq/pull/894

Adds an implementation of the sparse transformer to multi-head attention, using the fixed attention pattern specified in https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, the mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.
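
A minimal PyTorch sketch (illustrative only, not the code in this PR) of why a single -inf additive mask suffices: masked entries become exactly 0 after softmax, so attn_weights @ values already ignores them. The shapes and masked positions below are arbitrary assumptions.

import torch
import torch.nn.functional as F

seq_len, head_dim = 4, 8
scores = torch.randn(seq_len, seq_len)      # raw attention scores
values = torch.randn(seq_len, head_dim)

sparse_mask = torch.zeros(seq_len, seq_len)
sparse_mask[0, 2] = float("-inf")           # hypothetical masked positions
sparse_mask[3, 1] = float("-inf")

attn_weights = F.softmax(scores + sparse_mask, dim=-1)
assert attn_weights[0, 2] == 0              # -inf -> 0 after softmax
assert attn_weights[3, 1] == 0

out = attn_weights @ values                 # no second masking step needed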

Four inputs are added to the config: sparse, is_bidirectional, stride, and expressivity. When the sparse transformer is used, is_bidirectional, stride, and expressivity can be specified (defaults are provided). If is_bidirectional is False, values are masked using the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window plus a summary from every stride window; all other values are masked. Stride (L in the paper) controls the window size, and expressivity (c in the paper) controls the size of the summary.
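
A rough sketch of how a fixed-pattern mask could be built from stride and expressivity; the function name, loop structure, and window/summary rules here are assumptions for illustration, not the implementation tested in test_sparse_multihead_attention.py.

import torch

def fixed_attention_mask(seq_len, stride=4, expressivity=1, bidirectional=False):
    # 0.0 marks attendable positions; -inf marks masked positions.
    mask = torch.full((seq_len, seq_len), float("-inf"))
    for i in range(seq_len):
        for j in range(seq_len):
            same_window = (i // stride) == (j // stride)
            # Last `expressivity` positions of each stride window act as a summary.
            summary_col = (j % stride) >= stride - expressivity
            causal_ok = bidirectional or j <= i
            if causal_ok and (same_window or summary_col):
                mask[i, j] = 0.0
    return mask  # added to attention scores before softmax

print(fixed_attention_mask(8, stride=4, expressivity=1, bidirectional=True))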

Reviewed By: borguz

Differential Revision: D16042988

fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5
2019-07-22 16:42:55 -07:00
__init__.py fairseq-py goes distributed (#106) 2018-02-27 17:09:42 -05:00
test_average_checkpoints.py Implement reducing footprint of average checkpoint correctly (#747) 2019-05-24 12:12:24 -07:00
test_backtranslation_dataset.py Back translation + denoising in MultilingualTranslation task (#620) 2019-04-10 10:56:51 -07:00
test_binaries.py Move Masked LM components to legacy/ -- new ones are coming 2019-07-21 19:38:00 -07:00
test_character_token_embedder.py fix tests 2018-09-03 19:15:23 -04:00
test_concat_dataset.py Make ConcatDataset work in PytorchTranslateTask multi-path dataset loading (#730) 2019-05-20 11:31:53 -07:00
test_convtbc.py Remove more Variable() calls (#198) 2018-06-25 12:23:04 -04:00
test_dictionary.py Move string line encoding logic from tokenizer to Dictionary (unified diff). (#541) 2019-02-28 09:19:12 -08:00
test_iterators.py Further generalize EpochBatchIterator and move iterators into new file 2018-09-03 19:15:23 -04:00
test_label_smoothing.py Add FairseqTask 2018-06-15 13:05:22 -06:00
test_memory_efficient_fp16.py Fix resuming training when using --memory-efficient-fp16 2019-06-23 14:19:16 -07:00
test_multi_corpus_sampled_dataset.py rm default_key from MultiCorpusSampledDataset 2019-05-14 16:45:21 -07:00
test_noising.py Refactor BacktranslationDataset to be more reusable (#354) 2018-11-25 21:26:03 -08:00
test_reproducibility.py Merge internal changes (#654) 2019-04-29 19:50:58 -07:00
test_sequence_generator.py Nucleus (top-P) sampling (#710) 2019-07-17 06:21:33 -07:00
test_sequence_scorer.py Modularize generate.py (#351) 2019-02-22 10:08:52 -08:00
test_sparse_multihead_attention.py Implement sparse transformer fixed attention pattern (#804) 2019-07-22 16:42:55 -07:00
test_token_block_dataset.py Improve init speed of TokenBlockDataset and EpochBatchIterator 2019-05-07 07:08:53 -07:00
test_train.py Add --reset-dataloader 2019-05-30 11:41:40 -07:00
test_utils.py Simplify and generalize utils.make_positions 2019-04-15 07:32:11 -07:00
utils.py Updates to model API (#561) 2019-05-15 07:12:41 -07:00