fairseq

mirror of https://github.com/facebookresearch/fairseq.git synced 2024-09-11 17:25:31 +03:00

History

Xian Li 573c2f4b60 Opensource code for Deep Transformer with Latent Depth (#2703)	2020-10-15 09:26:05 -07:00

Summary:

# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Opensource code for Deep Transformer with Latent Depth (https://arxiv.org/pdf/2009.13102.pdf).

New features and design choices made:
- New feature: allow non-residual block to be weighted by sample z (generated per batch) instead of `x = residual + x`.
  - Design choice: move `x = residual + x` in transformer_layer.py into a function where the subclass (with latent depth) could overwrite it to `x = residual + z*x`.
- New feature: allow TransformerEncoder or TransformerDecoder to have additional logits parameters which will generate the samples z.
  - Design choice: added subclass LatentTransformerEncoder and LatentTransformerDecoder, which has additional attributes for the logits parameters, and instantiate the corresponding LatentTransformerEncoderLayer and LatentTransformerDecoderLayer.
- New feature: allow multilingual_translation task to train with latent depth (results in the paper).
  - Design choice:
    - added additional arguments in the multilingual_translation task.
    - added option for multilingual_transformer to use LatentTransformerEncoder and LatentTransformerDecoder besides standard TransformerEncoder.
    - added option in multilingual_translation task's `train_step` to generate the samples z and compute the KL (and sparsity) loss per batch.

## PR review
Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2703
Reviewed By: myleott
Differential Revision: D24155059
Pulled By: xianxl
fbshipit-source-id: f3e41639429f9664ec5565839709aa857a643668
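The residual-connection design choice described in the commit message can be illustrated with a minimal sketch. This is not the code from the PR: the class names `Layer` and `LatentLayer`, the method name `residual_connection`, and the use of plain floats instead of tensors are all assumptions made for illustration. It only shows the pattern of moving `x = residual + x` into an overridable method so that a subclass can compute `x = residual + z*x`.

```python
class Layer:
    """Toy stand-in for a transformer layer; operates on plain floats."""

    def residual_connection(self, x, residual):
        # Standard residual connection: x = residual + x
        return residual + x

    def forward(self, x):
        residual = x
        x = 2.0 * x  # placeholder for the attention / feed-forward block
        return self.residual_connection(x, residual)


class LatentLayer(Layer):
    """Hypothetical subclass that weights the non-residual branch by z."""

    def __init__(self, z):
        # Assumption: z is supplied by the caller here; the commit message
        # says the samples z are generated per batch from logits parameters.
        self.z = z

    def residual_connection(self, x, residual):
        # Latent-depth variant: x = residual + z * x
        return residual + self.z * x
```

With `z = 0.0` the layer's block is skipped entirely and the input passes through the residual path unchanged; with `z = 1.0` the subclass behaves like the base layer.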
..
gpu	Fix hub (#2687)	2020-10-02 19:02:01 -07:00
speech_recognition	hydra fairseq 3 - inherit from legacy for fairseq classes	2020-09-09 17:02:13 -07:00
__init__.py	remediation of S205607	2020-07-17 17:21:51 -07:00
test_average_checkpoints.py	Small fixes	2019-08-19 15:08:25 -07:00
test_backtranslation_dataset.py	Deprecate the SequenceGenerator with the Scripted vision (#1120)	2020-04-07 13:28:30 -07:00
test_binaries.py	Opensource code for Deep Transformer with Latent Depth (#2703)	2020-10-15 09:26:05 -07:00
test_bmuf.py	Fix BMUF using 1 GPU	2020-04-16 11:25:35 -07:00
test_character_token_embedder.py	Relicense fairseq under MIT license (#786)	2019-07-30 07:48:23 -07:00
test_concat_dataset.py	Relicense fairseq under MIT license (#786)	2019-07-30 07:48:23 -07:00
test_constraints.py	Added constrained decoding (#1536) (#2402)	2020-08-20 11:59:53 -07:00
test_convtbc.py	Relicense fairseq under MIT license (#786)	2019-07-30 07:48:23 -07:00
test_dictionary.py	Allow dictionaries to overwrite entries with #fairseq:overwrite comment (#1073)	2020-03-08 06:52:00 -07:00
test_export.py	hydra fairseq 3 - inherit from legacy for fairseq classes	2020-09-09 17:02:13 -07:00
test_file_io.py	Added unit test for PathManager file io (with or without fvcore).	2019-12-09 14:19:51 -08:00
test_fp16_optimizer.py	Fix hub (#2687)	2020-10-02 19:02:01 -07:00
test_inference_dropout.py	Misc fixes (#2492)	2020-08-20 06:42:10 -07:00
test_iterators.py	Account for checkpoint updates when calling take on CountingIterator	2020-09-04 14:26:53 -07:00
test_label_smoothing.py	speech-to-text OSS	2020-10-14 12:30:05 -07:00
test_lstm_jitable.py	hydra fairseq 3 - inherit from legacy for fairseq classes	2020-09-09 17:02:13 -07:00
test_memory_efficient_fp16.py	Clean up tests	2020-01-22 11:29:20 -08:00
test_metrics.py	Fix logging of training sets (fixes #1632) (#1634)	2020-01-20 16:34:33 -08:00
test_multi_corpus_sampled_dataset.py	Relicense fairseq under MIT license (#786)	2019-07-30 07:48:23 -07:00
test_multihead_attention.py	Fixing key padding mask during transformer generation	2019-11-05 06:50:53 -08:00
test_noising.py	Relicense fairseq under MIT license (#786)	2019-07-30 07:48:23 -07:00
test_reproducibility.py	Fix validation happening twice at the end of epoch (#1934)	2020-04-03 16:38:39 -07:00
test_resampling_dataset.py	Add dataset class for weighted sampling with replacement. (#861)	2019-09-19 10:36:00 -07:00
test_sequence_generator.py	hydra fairseq 3 - inherit from legacy for fairseq classes	2020-09-09 17:02:13 -07:00
test_sequence_scorer.py	Relicense fairseq under MIT license (#786)	2019-07-30 07:48:23 -07:00
test_sparse_multihead_attention.py	Relicense fairseq under MIT license (#786)	2019-07-30 07:48:23 -07:00
test_token_block_dataset.py	Relicense fairseq under MIT license (#786)	2019-07-30 07:48:23 -07:00
test_train.py	Misc fixes (#2492)	2020-08-20 06:42:10 -07:00
test_utils.py	Updates full to no longer use deprecated integer fill_value type inference	2020-06-22 11:56:58 -07:00
utils.py	remove max_sentences from args, use batch_size instead (#1333)	2020-10-05 19:09:01 -07:00