fairseq/tests
Sara Hanson a03fe6faf3 Implement sparse transformer fixed attention pattern (#804)
Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746

Pull Request resolved: https://github.com/pytorch/fairseq/pull/894

Adds an implementation of the sparse transformer to multi-head attention, using the fixed attention pattern specified in https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, the mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.
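
A minimal PyTorch sketch (illustrative only, not the code in this PR) of why a single -inf additive mask suffices: masked entries become exactly 0 after softmax, so attn_weights @ values already ignores them. The shapes and masked positions below are arbitrary assumptions.

import torch
import torch.nn.functional as F

seq_len, head_dim = 4, 8
scores = torch.randn(seq_len, seq_len)      # raw attention scores
values = torch.randn(seq_len, head_dim)

sparse_mask = torch.zeros(seq_len, seq_len)
sparse_mask[0, 2] = float("-inf")           # hypothetical masked positions
sparse_mask[3, 1] = float("-inf")

attn_weights = F.softmax(scores + sparse_mask, dim=-1)
assert attn_weights[0, 2] == 0              # -inf -> 0 after softmax
assert attn_weights[3, 1] == 0

out = attn_weights @ values                 # no second masking step needed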

Four inputs are added to the config: sparse, is_bidirectional, stride, and expressivity. When the sparse transformer is used, is_bidirectional, stride, and expressivity can be specified (defaults are provided). If is_bidirectional is False, values are masked using the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window plus a summary from every stride window; all other values are masked. Stride (L in the paper) controls the window size, and expressivity (c in the paper) controls the size of the summary.
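
A rough sketch of how a fixed-pattern mask could be built from stride and expressivity; the function name, loop structure, and window/summary rules here are assumptions for illustration, not the implementation tested in test_sparse_multihead_attention.py.

import torch

def fixed_attention_mask(seq_len, stride=4, expressivity=1, bidirectional=False):
    # 0.0 marks attendable positions; -inf marks masked positions.
    mask = torch.full((seq_len, seq_len), float("-inf"))
    for i in range(seq_len):
        for j in range(seq_len):
            same_window = (i // stride) == (j // stride)
            # Last `expressivity` positions of each stride window act as a summary.
            summary_col = (j % stride) >= stride - expressivity
            causal_ok = bidirectional or j <= i
            if causal_ok and (same_window or summary_col):
                mask[i, j] = 0.0
    return mask  # added to attention scores before softmax

print(fixed_attention_mask(8, stride=4, expressivity=1, bidirectional=True))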

Reviewed By: borguz

Differential Revision: D16042988

fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5
2019-07-22 16:42:55 -07:00
__init__.py fairseq-py goes distributed (#106) 2018-02-27 17:09:42 -05:00
test_average_checkpoints.py Implement reducing footprint of average checkpoint correctly (#747) 2019-05-24 12:12:24 -07:00
test_backtranslation_dataset.py Back translation + denoising in MultilingualTranslation task (#620) 2019-04-10 10:56:51 -07:00
test_binaries.py Move Masked LM components to legacy/ -- new ones are coming 2019-07-21 19:38:00 -07:00
test_character_token_embedder.py fix tests 2018-09-03 19:15:23 -04:00
test_concat_dataset.py Make ConcatDataset work in PytorchTranslateTask multi-path dataset loading (#730) 2019-05-20 11:31:53 -07:00
test_convtbc.py Remove more Variable() calls (#198) 2018-06-25 12:23:04 -04:00
test_dictionary.py Move string line encoding logic from tokenizer to Dictionary (unified diff). (#541) 2019-02-28 09:19:12 -08:00
test_iterators.py Further generalize EpochBatchIterator and move iterators into new file 2018-09-03 19:15:23 -04:00
test_label_smoothing.py Add FairseqTask 2018-06-15 13:05:22 -06:00
test_memory_efficient_fp16.py Fix resuming training when using --memory-efficient-fp16 2019-06-23 14:19:16 -07:00
test_multi_corpus_sampled_dataset.py rm default_key from MultiCorpusSampledDataset 2019-05-14 16:45:21 -07:00
test_noising.py Refactor BacktranslationDataset to be more reusable (#354) 2018-11-25 21:26:03 -08:00
test_reproducibility.py Merge internal changes (#654) 2019-04-29 19:50:58 -07:00
test_sequence_generator.py Nucleus (top-P) sampling (#710) 2019-07-17 06:21:33 -07:00
test_sequence_scorer.py Modularize generate.py (#351) 2019-02-22 10:08:52 -08:00
test_sparse_multihead_attention.py Implement sparse transformer fixed attention pattern (#804) 2019-07-22 16:42:55 -07:00
test_token_block_dataset.py Improve init speed of TokenBlockDataset and EpochBatchIterator 2019-05-07 07:08:53 -07:00
test_train.py Add --reset-dataloader 2019-05-30 11:41:40 -07:00
test_utils.py Simplify and generalize utils.make_positions 2019-04-15 07:32:11 -07:00
utils.py Updates to model API (#561) 2019-05-15 07:12:41 -07:00