Commit Graph

309 Commits

Author SHA1 Message Date
Marco Gaido
4ec169b988 Fix max_position resolution with tuples having len > 2 (#2028)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2027 .

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2028

Reviewed By: ngoyal2707

Differential Revision: D21134466

Pulled By: myleott

fbshipit-source-id: 070d7f971bc8d88ec1ca43d52797e2f0b07fb6af
2020-04-21 06:01:14 -07:00
Xianfeng Rui
57526c6343 Update Fairseq LSTM to jitable version (#2016)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2016

This updates the Fairseq LSTM to a jitable (TorchScript-scriptable) version.

Reviewed By: cndn

Differential Revision: D20937370

fbshipit-source-id: 26f677fcb58bbeaa507d303e9a81060ff78f0502
2020-04-16 15:49:56 -07:00
Nayan Singhal
89e75fa315 Fix BMUF using 1 GPU
Summary:
With 1 GPU, BMUF is not required; training works like simple single-model training.

Also adds a unit test for single-GPU BMUF.

Reviewed By: jay-mahadeokar

Differential Revision: D21033060

fbshipit-source-id: 9030187c05d49548222c8d1e2fe9534a6c6c4389
2020-04-16 11:25:35 -07:00
James Cross
c4697e83cb TorchScript support for AANTransformer
Summary: Moves `_test_save_and_load()` up to the top level for possible reuse across classes.

Reviewed By: cndn

Differential Revision: D20971566

fbshipit-source-id: b9d9c554d03f26cd43eee9f209e1c1367679af72
2020-04-10 18:23:50 -07:00
Ning Dong
b142b7d9ec Script _no_repeat_ngram in fb_simple_sequence_generator (#1963)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1963

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1128

It's a common issue that short inputs (< 5 tokens) get repeated output due to the default length constraints (max_len_a=1.1, max_len_b=5) https://fb.workplace.com/groups/2286753504877951/permalink/2674177509468880/.

In the future we want to use no_repeat_ngram to handle this issue. The functionality is in the sequence generator, but it needs to be scripted for production use.
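The ngram-repeat blocking described above can be sketched in plain Python. This is a simplified, hypothetical illustration of the idea (the function name and shapes are not fairseq's; the real implementation operates on batched tensors inside the scripted sequence generator):

```python
def banned_next_tokens(prev_tokens, ngram_size):
    """Return the set of tokens that would complete an already-generated
    ngram of length `ngram_size` (assumes ngram_size >= 2)."""
    if len(prev_tokens) < ngram_size:
        return set()
    # Map each (ngram_size - 1)-token prefix to the tokens that followed it.
    followers = {}
    for i in range(len(prev_tokens) - ngram_size + 1):
        prefix = tuple(prev_tokens[i:i + ngram_size - 1])
        followers.setdefault(prefix, set()).add(prev_tokens[i + ngram_size - 1])
    # Tokens banned now are those seen after the current trailing prefix.
    current_prefix = tuple(prev_tokens[-(ngram_size - 1):])
    return followers.get(current_prefix, set())
```

For example, with generated tokens `[1, 2, 3, 1, 2]` and `ngram_size=3`, emitting `3` next would repeat the trigram `(1, 2, 3)`, so `3` is banned.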

Reviewed By: liuchen9494

Differential Revision: D20801865

fbshipit-source-id: c3085f19921adb85415636d16ce31e3826642335
2020-04-10 14:44:42 -07:00
Ning Dong
08691f8d0b Support quantization in Fairseq Sequence generator
Summary: The fix in MHA was suggested by driazati; it avoids JIT compilation of an if branch in the MHA forward when scripting. Without it, quantization wouldn't work. Details in https://fb.workplace.com/groups/2240361332735959/permalink/626166461295703/

Reviewed By: jhcross

Differential Revision: D20881076

fbshipit-source-id: b50347b45cd7dbdef02ac7b71316ba734019f57e
2020-04-08 17:48:54 -07:00
Chen Liu
d37529ed23 Script reorder_incremental_state in fairseq baseline model (#1127)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1127

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1953

Script `reorder_incremental_states` in the base FairseqModel.
Remove the overriding scriptable `reorder_incremental_states` from TransformerModel.
Change decoder_len, since `len(Tuple)` is supported in TorchScript.

Relanded reverted diff D20797390

Reviewed By: myleott

Differential Revision: D20896200

fbshipit-source-id: cc4ae34f89f16007656cce6ec6f7e01b13899278
2020-04-07 15:01:31 -07:00
Chen Liu
1b749f4a34 Deprecate the SequenceGenerator with the scripted version (#1120)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1120

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1940

Deprecate the old SequenceGenerator in Fairseq in favor of the scripted version.

Passes all integration unit tests.

- Copy ScriptSequenceGenerator to SequenceGenerator:
  - Modified forward_decoder to fix a bug when using adaptive_softmax in `get_prob_normalize` (marked with an inline comment)
  - Added support for other EnsembleModels as input args (marked with an inline comment)
- Add `FBEnsembleModelWithFork` to support fork/join in EnsembleModel
  - Add `test_fb_ensemble_model` to test the fork/join feature
  - There are still bugs in the fork/join feature when running through the Fairseq interface (e.g. generation and interactive); needs further investigation P128130029. cc cndn, jhcross
- Modified the SequenceGenerator initialization interface
- Clean up the code: delete the unused functions `get_normalized_probs` and `_decode`

Reland reverted diff D20685075

Reviewed By: cndn

Differential Revision: D20895977

fbshipit-source-id: 424ee318e67d5d6ffed3edb92c7fa78485ba34af
2020-04-07 13:28:30 -07:00
Aapo Kyrola
966436403e Revert D20685075: Deprecate the SequenceGenerator with the scripted version
Differential Revision:
D20685075

Original commit changeset: 046b76874465

fbshipit-source-id: 7ec2a2ca3b90251a560e2323c22b52ec7436fecb
2020-04-07 00:59:53 -07:00
Aapo Kyrola
8a528888e4 Revert D20797390: Script reorder_incremental_state in fairseq baseline model
Differential Revision:
D20797390

Original commit changeset: ab29874973ad

fbshipit-source-id: efd2d720c96ee90d1e8dc36178e04f0bf5510278
2020-04-07 00:59:48 -07:00
Chen Liu
d369c88019 Script reorder_incremental_state in fairseq baseline model (#1127)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1127

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1953

Script `reorder_incremental_states` in the base FairseqModel.
Remove the overriding scriptable `reorder_incremental_states` from TransformerModel.
Change decoder_len, since `len(Tuple)` is supported in TorchScript.

Reviewed By: myleott

Differential Revision: D20797390

fbshipit-source-id: ab29874973adc5dbd556c591942a0e071c81fc52
2020-04-06 20:40:40 -07:00
Chen Liu
bc93681348 Deprecate the SequenceGenerator with the scripted version (#1120)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1120

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1940

Deprecate the old SequenceGenerator in Fairseq in favor of the scripted version.

Passes all integration unit tests.

- Copy ScriptSequenceGenerator to SequenceGenerator:
  - Modified forward_decoder to fix a bug when using adaptive_softmax in `get_prob_normalize` (marked with an inline comment)
  - Added support for other EnsembleModels as input args (marked with an inline comment)
- Add `FBEnsembleModelWithFork` to support fork/join in EnsembleModel
  - Add `test_fb_ensemble_model` to test the fork/join feature
  - There are still bugs in the fork/join feature when running through the Fairseq interface (e.g. generation and interactive); needs further investigation P128130029. cc cndn, jhcross
- Modified the SequenceGenerator initialization interface
- Clean up the code: delete the unused functions `get_normalized_probs` and `_decode`

Reviewed By: myleott

Differential Revision: D20685075

fbshipit-source-id: 046b76874465a70d8118a97ad670311c6ce1d1c8
2020-04-06 17:47:47 -07:00
Louis MARTIN
18831f9f83 Fix validation happening twice at the end of epoch (#1934)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes validation happening twice at the end of an epoch after the refactor. Spotted by freewym here: b5dad3b7e0 (r38103577)

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1934

Reviewed By: myleott

Differential Revision: D20724205

Pulled By: louismartin

fbshipit-source-id: 8c26c39b9904508780e8542813797c8e1306ca80
2020-04-03 16:38:39 -07:00
Anchit Gupta
f6f092f489 Make TransformerDecoupled model scriptable (#1125)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1125

Pull Request resolved: https://github.com/pytorch/translate/pull/695

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1927

- Switches the model to the scripted sequence generator recently implemented in fairseq. This involved making the input/output format of this model conform to that of the Fairseq TransformerEncoder/Decoder
- Modifies the `EncoderOut` format for the fairseq transformer and adds optional fields needed for the copy-pointer decoder
- Switches to using WordEmbedding directly instead of the non-scriptable EmbeddingList for the src/trg embedding layer
- Small assorted syntactic changes to make the model jit-scriptable
- Adds a torchscriptify method for this model. Preliminary latency seems similar to the unexported model; also verified that the outputs match
- The RoBERTa decoupled model is currently not scriptable because the base TransformerSentenceEncoder it builds on is not scriptable. We can look at adding that later

Reviewed By: einolghozati

Differential Revision: D20687247

fbshipit-source-id: 8232972bba2f1b2df4100f3c1776b6bad08a71db
2020-04-01 17:53:49 -07:00
Myle Ott
f2ae57908b Fix tests (#1110)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1110

Reviewed By: ngoyal2707

Differential Revision: D20649232

Pulled By: myleott

fbshipit-source-id: 55bc18284ac792012aaa794d5102c877ff781f8c
2020-03-26 07:59:28 -07:00
James Cross
fd76cb5b41 TestEncoder to return type EncoderOut (#1894)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1894

Having a uniform return type for `FairseqEncoder` makes these test models function more similarly to real models.

Reviewed By: myleott, cndn

Differential Revision: D20596971

fbshipit-source-id: a744614c015af9b150f2b0ae8381b1368556f738
2020-03-23 16:10:02 -07:00
David Příhoda
42f65d6577 Support multiple regression targets in sentence prediction (#1831)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1830
Adds tests for RoBERTa (masked_lm, classification, single regression, multiple regression)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1831

Reviewed By: ngoyal2707

Differential Revision: D20446010

Pulled By: myleott

fbshipit-source-id: 9f37bcedf0910d85446245d71bc234bc74c62da5
2020-03-21 16:55:26 -07:00
Myle Ott
5028ed1b6b Reduce device-to-host transfers (#1082)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1082

Differential Revision: D20365765

Pulled By: myleott

fbshipit-source-id: 7b6c14303b46b42db1a1e279c84dbe9cb2cf72f2
2020-03-11 05:57:16 -07:00
Marco Gaido
431d604f69 Fix generation with encoders that return an output of a different shape from the input (#1792)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1791.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1792

Reviewed By: jmp84

Differential Revision: D20322704

Pulled By: myleott

fbshipit-source-id: 3cfa1bddda06b966e9dc9bc8ff183009d844b23c
2020-03-10 11:51:08 -07:00
Myle Ott
937535dba0 Allow dictionaries to overwrite entries with #fairseq:overwrite comment (#1073)
Summary:
[This commit](dd1298e15f) made it so that duplicate entries in a dictionary are ignored. Unfortunately the Camembert model depends on overwriting `<unk>`, `<s>` and `</s>`.

The proposed solution here is to allow the dictionary to have entries like:
```
<unk> 999 #fairseq:overwrite
<s> 999 #fairseq:overwrite
</s> 999 #fairseq:overwrite
, 999
▁de 999
. 999
(...)
```

These entries preserve the old overwriting behavior. Thus we can release a new `camembert.v0.tar.gz` with a dictionary like the one above and it will work.
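The loading behavior described above can be sketched as a small parser. This is a simplified, hypothetical illustration (the function name and error type are not fairseq's; the real logic lives in fairseq's Dictionary class):

```python
def load_dictionary(lines):
    """Parse `<symbol> <count> [#fairseq:overwrite]` dictionary lines.

    A duplicate symbol raises unless its line carries the
    #fairseq:overwrite flag, in which case the new entry wins.
    """
    counts = {}
    for line in lines:
        parts = line.rsplit(" ", 1)
        if parts[-1] == "#fairseq:overwrite":
            overwrite = True
            line = parts[0]  # strip the flag before splitting off the count
        else:
            overwrite = False
        symbol, count = line.rsplit(" ", 1)
        if symbol in counts and not overwrite:
            raise ValueError(
                f"Duplicate symbol '{symbol}' without #fairseq:overwrite"
            )
        counts[symbol] = int(count)
    return counts
```

With this scheme, a dictionary containing `<unk> 999 #fairseq:overwrite` can redefine an existing entry, while plain duplicates are still rejected.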
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1073

Reviewed By: kahne

Differential Revision: D20284569

Pulled By: myleott

fbshipit-source-id: bf78fbff13c94bf8a6485cbdda62305ddc30c056
2020-03-08 06:52:00 -07:00
Elijah Rippeth
46b773a393 refactor namespaces in criterion interface (#1729)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1672 in part (part 1: [context](https://github.com/pytorch/fairseq/pull/1714#issuecomment-587507040))

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1729

Differential Revision: D20049353

Pulled By: myleott

fbshipit-source-id: 732077a1cc339c9f7ebe26dae42a7e8d7b5a07b4
2020-03-04 16:43:59 -08:00
Myle Ott
aa79bb9c37 Use 1-based indexing for epochs everywhere (#1053)
Summary:
We are somewhat inconsistent about whether we use 0-based or 1-based indexing for epochs. This fixes things to use 1-based indexing everywhere, including in logging and checkpoint naming.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1053

Reviewed By: spencerp

Differential Revision: D20160715

Pulled By: myleott

fbshipit-source-id: 4ed94f9c371e1bfe29bcfa087fa6756507d6e627
2020-03-04 16:37:24 -08:00
alexeib
3335de5f44 add vq-wav2vec (#1029)
Summary:
Sanitized vq-wav2vec implementation. I will also add docs for this. I have a fixed-up checkpoint that this code can load, and verified that it produces the same results as what we used in the paper.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1029

Differential Revision: D20129246

Pulled By: alexeib

fbshipit-source-id: f72f455e0c309168e644ab86ec18c768c308da98
2020-02-29 18:25:34 -08:00
Chen Liu
fdfdbec9e2 Rewrite the unit test of sequence generator
Summary:
1. Overwrite the base class function `get_normalized_probs` in the scriptable TransformerModel
2. Change the unit test setup to match the Transformer decoder output format
3. Initialize the buffer in the simple sequence generator [WIP]
   1. This is the initial step toward scripting the sequence generator, starting from the simple scriptable version.
4. Refactor the unit test of the simple sequence generator.
5. Change the input format of the simple sequence generator and its unit test.

Reviewed By: myleott

Differential Revision: D20017859

fbshipit-source-id: a3e93b57c22e49840e460469fa2b1c530346886d
2020-02-26 11:09:20 -08:00
Myle Ott
8845dcf5ff Move MoE files into examples (#1040)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1040

Differential Revision: D20030279

Pulled By: myleott

fbshipit-source-id: 76b48a62409020039225cf98e8fcf7a494d0b7f8
2020-02-21 14:13:37 -08:00
Myle Ott
12ab22e06c Fix deprecation warnings in unit tests (#1043)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1043

Differential Revision: D20030274

Pulled By: myleott

fbshipit-source-id: 34962c1caaf3879af8b527a266852e443b59ffe4
2020-02-21 11:44:28 -08:00
Ning Dong
3df10a9529 Add save and load tests to fairseq export test (#1653)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1653

Earlier we had some issues with pickling: type information got lost. Fixed in https://github.com/pytorch/pytorch/pull/32569.

These save_and_load tests are added as protection going forward.

Reviewed By: myleott

Differential Revision: D19435988

fbshipit-source-id: 560ea65ed3493bebcf394327818364b3fcd6fc92
2020-01-30 16:14:35 -08:00
Ning Dong
a07cb6f404 Script Fairseq transformer (#1011)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1011

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1620

Make the Fairseq transformer scriptable. Discussion points on possible code refactoring:

(1) The original decoder output is a tuple (x, {"attn": attn, "inner_states": inner_states}). TorchScript does not support dictionaries with values of different types (attn: Tensor, inner_states: List[Tensor]). The current workaround is to use [attn] for the attention field and access it via output["attn"][0] downstream. This is already done in fairspeq's custom transformer code. Another (maybe cleaner) alternative is to use a namedtuple for the decoder output, but that involves tons of downstream changes too.

(2) TorchScript currently doesn't support **kwargs. Some unused arguments might get passed in due to polymorphism. The only workaround I can think of is to add the possibly unused arguments explicitly (e.g., line 666 in transformer.py).
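The workaround in point (1) can be shown with a toy module. `TinyDecoder` is a hypothetical illustration, not a fairseq class; it only demonstrates wrapping the attention tensor in a one-element list so the scripted dict has a single value type:

```python
import torch
from typing import Dict, List


class TinyDecoder(torch.nn.Module):
    """Toy module illustrating the point-(1) workaround: TorchScript dicts
    need one value type, so the attention Tensor is wrapped in a
    one-element list to share List[Tensor] with inner_states."""

    def forward(self, x: torch.Tensor) -> Dict[str, List[torch.Tensor]]:
        inner_states: List[torch.Tensor] = [x, x * 2]
        attn = torch.softmax(x, dim=-1)
        return {"attn": [attn], "inner_states": inner_states}


scripted = torch.jit.script(TinyDecoder())
out = scripted(torch.ones(2, 3))
attn = out["attn"][0]  # downstream access pattern: output["attn"][0]
```

A namedtuple output would avoid the `[0]` indexing, at the cost of the downstream changes mentioned above.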

Reviewed By: myleott

Differential Revision: D19234599

fbshipit-source-id: db3dd364ecf3ae14fb7ac8c0928bd0ebe250f19d
2020-01-30 15:59:15 -08:00
Myle Ott
61aad8f9cd Force certain optimizers to set --fp16-no-flatten-grads (#1010)
Summary:
When training with `--fp16` we usually flatten the grads since it's faster. But flat grads are not semantically equivalent for certain optimizers (e.g., Adafactor, LAMB), thus the user needed to be aware of this and set `--fp16-no-flatten-grads`. Let's raise a RuntimeError in this case instead.
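The guard described above can be sketched as follows. The function and set names here are hypothetical (the real check sits in fairseq's trainer/optimizer setup); the sketch only shows raising a RuntimeError instead of silently training with non-equivalent flat grads:

```python
# Hypothetical names; optimizers whose update rule depends on per-parameter
# gradient shapes, so flattened FP16 grads change their semantics.
OPTIMIZERS_REQUIRING_UNFLATTENED_GRADS = {"adafactor", "lamb"}


def check_fp16_flatten(optimizer_name, fp16, fp16_no_flatten_grads):
    """Raise if flat FP16 grads would silently change optimizer behavior."""
    if (
        fp16
        and not fp16_no_flatten_grads
        and optimizer_name.lower() in OPTIMIZERS_REQUIRING_UNFLATTENED_GRADS
    ):
        raise RuntimeError(
            f"--optimizer {optimizer_name} requires --fp16-no-flatten-grads"
        )
```

For example, `check_fp16_flatten("adafactor", True, False)` raises, while Adam (or any optimizer with the flag set) passes through.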
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1010

Differential Revision: D19575773

Pulled By: myleott

fbshipit-source-id: bac99c3026f9870e6127e0fa55f70e8a3e4507dc
2020-01-28 08:02:30 -08:00
Myle Ott
88185fcc3f Cleanup new incremental state API (#1005)
Summary:
* Now that we have `FairseqIncrementalState`, we can move `get_incremental_state` and `set_incremental_state` to be methods on that class, instead of keeping the helper functions in `utils.py`. I think this will eventually help with type checking too.
* The incremental ID logic was overly complicated; we can just use `uuid` to generate a unique ID for every instance.
* Add missing `with_incremental_state` to light/dynamic conv modules.
* Add additional unit test: `test_incremental_state_multihead_attention`
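The uuid-based instance ID idea above can be sketched as a small mixin. This is a simplified, hypothetical version (method names follow the commit's description, but the real class lives in fairseq and carries more type annotations):

```python
import uuid


class FairseqIncrementalState:
    """Sketch: each instance gets a uuid so multiple modules can share
    one incremental_state dict without key collisions."""

    def __init__(self):
        self._incremental_state_id = str(uuid.uuid4())

    def _full_key(self, key):
        return f"{self.__class__.__name__}.{self._incremental_state_id}.{key}"

    def get_incremental_state(self, incremental_state, key):
        return incremental_state.get(self._full_key(key))

    def set_incremental_state(self, incremental_state, key, value):
        incremental_state[self._full_key(key)] = value
```

Two instances can then write the same logical key (e.g. "prev_key") into one shared dict without overwriting each other.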

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1005

Test Plan:
* unit tests

Also confirmed this matches master:
```
$ python generate.py ~/data/data-bin/wmt16_en_de_bpe32k --path /checkpoint/myleott/s3/models/wmt16.en-de.joined-dict.transformer/model.pt --beam 4 --lenpen 0.6 --remove-bpe --quiet
(...)
2020-01-22 09:53:38 | INFO | fairseq_cli.generate | Generate test with beam=4: BLEU4 = 29.28, 60.8/35.1/22.8/15.3 (BP=0.997, ratio=0.997, syslen=62859, reflen=63078)
```

Reviewed By: cndn

Differential Revision: D19517908

Pulled By: myleott

fbshipit-source-id: a406490e342d0d30a9231bf823d3350999bda4c0
2020-01-27 10:25:33 -08:00
Joshua Meier
9f4256edf6 Standalone LSTM decoder language model (#934)
Summary:
Currently, the LSTM models in Fairseq master can only be used in an encoder/decoder setting, for example, in `class LSTMModel(FairseqEncoderDecoderModel)`. This PR adds a standalone LSTM decoder language model.

Changes:
- adds support for `LSTMDecoder` in cases where an encoder is not present, for instance, where `encoder_output_units=0`.
- fixes bugs in `LSTMDecoder` that only become apparent when using it in a standalone fashion, for example, not handling `src_lengths` as an optional argument.
- adds `class LSTMLanguageModel(FairseqLanguageModel)` for training LSTM language models.
- tests for the `LSTMLanguageModel`. Changes to the `LSTMDecoder` are handled by existing test cases.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/934

Reviewed By: myleott

Differential Revision: D18816310

Pulled By: joshim5

fbshipit-source-id: 4773695a7f5d36aa773da8a45db2e02f76c968a9
2020-01-24 13:16:22 -08:00
Elijah Rippeth
f1d856e006 fix Windows build (#1007)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1007

# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1622

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1631

Differential Revision: D19555401

Pulled By: myleott

fbshipit-source-id: c62dfc109e09a7d732a9fc73ac6feef63a8dd341
2020-01-24 10:32:20 -08:00
Myle Ott
f4a9bc2ea6 Clean up tests
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1004

Differential Revision: D19517900

Pulled By: myleott

fbshipit-source-id: a588efeabd3119dd058067e82d1b21e4d81ae218
2020-01-22 11:29:20 -08:00
Ning Dong
89a2a0ccde Script SinusoidalPositionalEmbedding (#683)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/683

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1612

Make SinusoidalPositionalEmbedding scriptable. Mostly adding types. The only change that affects lots of downstream code is to have max_positions as a member variable instead of a method.

Reviewed By: myleott

Differential Revision: D18924939

fbshipit-source-id: 2b6486563e9ec5cc34bcf11acdff9054658f4674
2020-01-22 10:55:28 -08:00
Ning Dong
4e48c4ae5d Script MultiheadAttention (#1002)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1002

Pull Request resolved: https://github.com/pytorch/translate/pull/681

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1524

Make fairseq MultiheadAttention scriptable. Looking for feedback.

1. Add types
2. Move incremental state management logic from util functions to initializers. TorchScript in general doesn't support global dicts; as a result, modules containing multihead attention assign themselves a fairseq_instance_id in the initializer.
3. There might be opportunities to make the assertions and annotations cleaner.

Reviewed By: myleott

Differential Revision: D18772594

fbshipit-source-id: 377aef4bbb7ef51da5b6bac9a87a6f7b03b16fe1
2020-01-21 18:35:28 -08:00
Myle Ott
9f961964aa Fix logging of training sets (fixes #1632) (#1634)
Summary:
* fix: mid-epoch validation metrics were previously polluting training metrics
* fix: mid-epoch metrics were not properly saved/restored in checkpoints
* added tests, both for metrics and for mid-epoch reproducibility
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1634

Differential Revision: D19470714

Pulled By: myleott

fbshipit-source-id: 491fa8d830b653cdd6a86095645aabcac758d214
2020-01-20 16:34:33 -08:00
Jiatao Gu
60fbf64f30 Add --eval-bleu for translation
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/989

Reviewed By: MultiPath

Differential Revision: D19411162

Pulled By: myleott

fbshipit-source-id: 74842f0174f58e39a13fb90f3cc1170c63bc89be
2020-01-17 12:17:46 -08:00
Myle Ott
b488e1fe56 Reverse symlinks in root and fairseq_cli (2/3)
Summary: This is needed to support other build environments (e.g., Windows)

Reviewed By: ngoyal2707

Differential Revision: D19409984

fbshipit-source-id: e970510781abf92f1b02d0961bc30e1210b524dd
2020-01-17 08:26:20 -08:00
Myle Ott
fb76dac1c4 Switch to Python logging (+ lint) (#1627)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1627

Python logging offers a number of benefits, such as logging timestamps, better
cross-library compatibility, ability to add multiple output handlers, etc.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/646

Reviewed By: spencerp

Differential Revision: D15815620

Pulled By: myleott

fbshipit-source-id: 5e64e9929b5e4b9dd5bb49bcdf7c510631907134
2020-01-16 16:14:45 -08:00
Aleksandra Piktus
fab2e86e51 Add a diverse beam search variant to sequence_generator.py (#953)
Summary:
This PR implements a new generation strategy that we experimented with in project Pinocchio (https://github.com/fairinternal/Pinocchio), see the paper submission in: https://fburl.com/hduj2me7.

Specifically in this PR:
- added a Diverse Beam Search variant as described in https://arxiv.org/abs/1611.08562
- moved Search object creation out of `sequence_generation.py`, which limits the number of kwargs passed around
- made sure the above changes are backward compatible, based on grep - P124083926
- added test cases covering these scenarios
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/953

Test Plan:
- `python -m unittest tests.test_binaries -v`- including added test cases, see issues below for some details
- `python -m unittest tests.test_sequence_generator -v` - including added test cases
- tested locally in conjunction with the Pinocchio repo
- grepped for all instantiations of `SequenceGeneration`, made sure they're backward compatible

# Issues
- when I try to run all tests with the `python -m unittest tests.test_binaries -v` command, execution gets stuck on `test_binaries.TestTranslation.test_generation` - the test otherwise passes without problems when run individually. Is this a known problem?
- discovered T59235948 - assigned to fairseq oncall

Reviewed By: myleott, fabiopetroni

Differential Revision: D19142394

Pulled By: ola13

fbshipit-source-id: d24543424c14a9537e7b6485951d9f841da62b07
2020-01-06 08:24:02 -08:00
Myle Ott
fb2d29d2aa Fix multilingual translation
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/972

Differential Revision: D19265750

Pulled By: myleott

fbshipit-source-id: 4c432b0d3616a6194c2c0f61f97012937d22db6f
2020-01-06 07:13:10 -08:00
Peng-Jen Chen
4c5934ac61 Fix multilingual translation errors and add unit test
Summary:
- Fixes GitHub issues [1393](https://github.com/pytorch/fairseq/issues/1393) and [1315](https://github.com/pytorch/fairseq/issues/1315).
- Adds a unit test covering training, validation and generation for the multilingual model, to make sure they run without problems. (Correctness is not tested.)

Reviewed By: lematt1991

Differential Revision: D19149575

fbshipit-source-id: 9ec9000d037cc5c3bd8457feb527f2305375a442
2019-12-19 07:08:59 -08:00
Myle Ott
dfde36bc66 Create build.yml
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1515

Differential Revision: D19151562

Pulled By: myleott

fbshipit-source-id: 426eca1e449cac914d49877678323a6487c0adbe
2019-12-17 20:45:11 -08:00
Sujit Verma
28b131359b Added unit test for PathManager file io (with or without fvcore).
Summary: Added unit test for PathManager file io (with or without fvcore).

Reviewed By: theweiho

Differential Revision: D18880067

fbshipit-source-id: 969c2be90415d22041b8276b7a5ff264571561d0
2019-12-09 14:19:51 -08:00
Jiatao Gu
b0fb74f143 REFACTOR: NAT Implementation (#925)
Summary:
This diff mainly contains the implementation of the NAT-CRF model:
- Fast Structured Decoding for Sequence Models (NAT-CRF, Sun et al., 2019)

We implemented a dynamic CRF module and incorporated it into the implementation of the vanilla NAT model, in order to reproduce the performance reported in the paper.

We implemented length beam as well as reranking with a learned autoregressive model in the iterative-refinement generator.
We also implemented new ensemble code which enables ensembling for all NAT models, not only the Levenshtein Transformer itself. We refactored all the code and moved the models into ``fairseq/models/nat``.

Finally, we updated the README.md for NAT models.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/925

Differential Revision: D18738085

Pulled By: MultiPath

fbshipit-source-id: 4e421c5d52d2456fbe99e7863d715c756b1fd49b
2019-12-03 18:39:28 -08:00
Myle Ott
1c56594001 Fix lightconv_lm and add test (#932)
Summary:
Fixes https://github.com/fairinternal/fairseq-py/issues/536
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/932

Differential Revision: D18783032

Pulled By: myleott

fbshipit-source-id: a520faccc20be78296a228214923ee1495fb536f
2019-12-03 09:25:52 -08:00
Myle Ott
cb6c67bcdb Make torch.hub interface automatically apply tokenization and BPE
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/926

Differential Revision: D18685772

Pulled By: myleott

fbshipit-source-id: 0f99d79ed6ee72e9d3ced786d75ab9504d0dfcf0
2019-11-26 07:49:37 -08:00
Myle Ott
e26ee47a8c Fix LM generation and add unit test
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/896

Differential Revision: D18250948

Pulled By: myleott

fbshipit-source-id: 7a515311e18795670b29f5e24eeba7619a625da7
2019-11-13 14:37:12 -08:00
Myle Ott
27568a7ebe Merge TracingCompliantTransformer and regular Transformer, fix NAT tests
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/899

Differential Revision: D18373060

Pulled By: myleott

fbshipit-source-id: bb5510ec15799a0a10a7c0669e76d8200e1ba479
2019-11-13 09:12:13 -08:00
Spencer Poff
68dd3e171b Fixing key padding mask during transformer generation
Summary:
https://github.com/pytorch/fairseq/pull/1097 added key padding mask history in TransformerDecoderLayer, but in an edge case where only the current or only the previous key_padding_mask exists, the resulting key_padding_mask has the wrong size.

This diff adds empty columns in that case to ensure the key_padding_mask has a usable size.
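The column-padding fix can be sketched as follows. The helper name and signature are hypothetical (the real logic is inline in MultiheadAttention); the sketch shows padding the missing side with all-False columns so the concatenated mask covers the full source length:

```python
import torch


def combine_key_padding_masks(prev_mask, curr_mask, batch_size, src_len):
    """Combine previous and current key_padding_masks along the time dim.

    When only one of the two exists, the other side is filled with
    all-False (i.e. "not padded") columns so the result spans src_len.
    """
    if prev_mask is None and curr_mask is None:
        return None
    if prev_mask is None:
        prev_mask = torch.zeros(
            batch_size, src_len - curr_mask.size(1), dtype=torch.bool
        )
    if curr_mask is None:
        curr_mask = torch.zeros(
            batch_size, src_len - prev_mask.size(1), dtype=torch.bool
        )
    return torch.cat([prev_mask, curr_mask], dim=1)
```

Without the zero-filled columns, concatenating a lone 3-column history mask into a 5-step attention call would produce a mask of the wrong width.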

Reviewed By: myleott

Differential Revision: D18224313

fbshipit-source-id: c9fb7266baf0a2d79a66704e00a5ea8bd2987ff6
2019-11-05 06:50:53 -08:00
Nayan Singhal
b5f41f828b Add Unit test cases for BMUF
Summary:
This unit test guards the BMUF code.

Change:
1. distributed_init assumed we are always using a CUDA device, which is not the case when using the "gloo" backend on a CPU machine.

Reviewed By: jay-mahadeokar

Differential Revision: D17821391

fbshipit-source-id: 28e1bb39f7a4889b1dc6bd636b7c499e55bfc69a
2019-10-15 09:59:36 -07:00
Sarthak Garg
1c66792948 Implementation of the paper "Jointly Learning to Align and Translate with Transformer Models" (#877)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877

This PR implements guided alignment training described in  "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)".

In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095

Differential Revision: D17170337

Pulled By: myleott

fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
2019-09-30 06:57:32 -07:00
Stephan Peitz
4ac2c5f2cc Implementation of the WeCNLP abstract "Cross+Self-Attention for Transformer Models" (#1097)
Summary:
This PR implements a new attention module which combines cross-attention (encoder-decoder attention) and the decoder self-attention. This work was accepted as an abstract at WeCNLP 2019 (https://www.wecnlp.ai/wecnlp-2019).

Cross+Self-Attention reduces the number of parameters and increases inference speed without any degradation in translation quality.
More details can be found in the attached [abstract](https://github.com/pytorch/fairseq/files/3561282/paper.pdf)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1097

Differential Revision: D17653168

Pulled By: myleott

fbshipit-source-id: deb834c2c78a229d7418ffbfea20ba3ce252991c
2019-09-29 05:09:42 -07:00
Changhan Wang
86857a58bf Levenshtein Transformer paper code
Summary:
Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
* Added Levenshtein Transformer model, task and criterion class
* Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines
* Add an option for prepending BOS to dictionary class and translation task class

Reviewed By: myleott

Differential Revision: D17297372

fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
2019-09-27 13:58:45 -07:00
Jerry Ma
a8a85c2676 Add dataset class for weighted sampling with replacement. (#861)
Summary:
As discussed with Naman earlier today. Weighted sampling with
replacement can be done on a per-epoch basis using `set_epoch()`
functionality, which generates the samples as a function of random seed
and epoch.

Additionally, `FairseqTask` needs to set the starting epoch for the
dataset at the very beginning of iterator construction.

Not yet implemented is the per-epoch iterator construction, which
is necessary to actually regenerate the batches for each epoch.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861

Differential Revision: D17460687

Pulled By: jma127

fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658
2019-09-19 10:36:00 -07:00
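The per-epoch resampling idea above can be sketched as a draw that is a pure function of (seed, epoch): rebuilding the iterator for a given epoch reproduces exactly the same weighted sample with replacement. The helper name is hypothetical; the actual hook in fairseq is the dataset's set_epoch() machinery.

```python
import random

def resample_indices(weights, num_samples, seed, epoch):
    # Seed the RNG deterministically from (seed, epoch) so each epoch
    # gets a fresh but reproducible weighted sample with replacement.
    rng = random.Random(f"{seed}:{epoch}")
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)
```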
Myle Ott
6ce55e4b01 Small fixes
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/835

Differential Revision: D16904038

Pulled By: myleott

fbshipit-source-id: 2c9d0b913f8d688297ac80fcabd905bd1397f66a
2019-08-19 15:08:25 -07:00
Myle Ott
7c89e13f64 Fix tests
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/822

Differential Revision: D16800078

Pulled By: myleott

fbshipit-source-id: b86e08e01f2fe13c64b77f1d23a5f6800f252bf7
2019-08-13 20:36:00 -07:00
Myle Ott
d015d23a1f Add fairseq-validate
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/765

Differential Revision: D16763357

Pulled By: myleott

fbshipit-source-id: 758b03158e486ee82786e2d5bf4e46073b50c503
2019-08-13 13:07:04 -07:00
Dmytro Okhonko
72f9364cc6 Asr initial push (#810)
Summary:
Initial code for the speech recognition task.
Right now only one ASR model is added - https://arxiv.org/abs/1904.11660

Unit testing:
python -m unittest discover tests

Also ran model training with this code and obtained
5.0 test_clean | 13.4 test_other
on LibriSpeech with pytorch/audio features
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/810

Reviewed By: cpuhrsch

Differential Revision: D16706659

Pulled By: okhonko

fbshipit-source-id: 89a5f9883e50bc0e548234287aa0ea73f7402514
2019-08-08 02:46:12 -07:00
Myle Ott
4abadbdf77 Fix sampling with beam>1
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/792

Differential Revision: D16591987

Pulled By: myleott

fbshipit-source-id: d27c490ae75f80ded19226b8384f4776485dd694
2019-08-01 07:34:06 -07:00
Myle Ott
e75cff5f2c Relicense fairseq under MIT license (#786)
Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786

Differential Revision: D16560654

Pulled By: myleott

fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034
2019-07-30 07:48:23 -07:00
Sara Hanson
a03fe6faf3 Implement sparse transformer fixed attention pattern (#804)
Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746

Pull Request resolved: https://github.com/pytorch/fairseq/pull/894

Adding an implementation of the sparse transformer to multi-head attention using the fixed attention pattern specified https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, a mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.

Four inputs are added to the config: sparse, is_bidirectional, stride, expressivity. If we are using the sparse transformer, is_bidirectional, stride, and expressivity must be specified (there are defaults). If is_bidirectional is False, the mask is computed using the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window and a summary from every stride window--all other values are masked. Stride (L in the paper) controls the window size and expressivity (c in the paper) controls the size of the summary.

Reviewed By: borguz

Differential Revision: D16042988

fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5
2019-07-22 16:42:55 -07:00
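The -inf masking trick described above, in miniature for a single attention row (plain Python; fairseq operates on torch tensors): exp(-inf) is exactly 0, so masked positions get zero weight and the mask never has to be re-applied after multiplying attention weights with values.

```python
import math

def masked_softmax(scores, keep):
    # Masked positions get -inf before the softmax; exp(-inf) == 0.0,
    # so they contribute exactly zero probability afterwards.
    masked = [s if k else float("-inf") for s, k in zip(scores, keep)]
    mx = max(masked)  # at least one position must be kept
    exps = [math.exp(s - mx) for s in masked]
    z = sum(exps)
    return [e / z for e in exps]
```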
Myle Ott
47fd985269 Move Masked LM components to legacy/ -- new ones are coming
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740

Differential Revision: D16377797

Pulled By: myleott

fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f
2019-07-21 19:38:00 -07:00
Xing Zhou
e46b924dea Nucleus (top-P) sampling (#710)
Summary:
Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p.

To test it:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710

Test Plan:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3

python tests/test_sequence_generator.py

python tests/test_binaries.py

Reviewed By: myleott

Differential Revision: D16286688

Pulled By: xingz9

fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f
2019-07-17 06:21:33 -07:00
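A minimal sketch of the nucleus (top-P) filtering step over an already-normalized distribution (illustrative only; fairseq implements this over torch tensors inside its search module): keep the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalize and sample from that set.

```python
def top_p_filter(probs, p):
    # Sort token indices by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p:  # smallest set whose cumulative mass exceeds p
            break
    total = sum(probs[i] for i in keep)
    # Renormalize over the kept set; sampling then happens over this dict.
    return {i: probs[i] / total for i in keep}
```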
Myle Ott
efb4345042 Fix resuming training when using --memory-efficient-fp16
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/678

Differential Revision: D15956712

Pulled By: myleott

fbshipit-source-id: 5048d06ddfbec0045558a22c777a966cca1ec396
2019-06-23 14:19:16 -07:00
Bairen Yi
a8f28ecb63 Python3.5 compat (#794)
Summary:
See #467. Ping myleott to review.

This is a work-related contribution. Ping lark to review.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/794

Differential Revision: D15756816

Pulled By: myleott

fbshipit-source-id: 6dce3ff3a713bf5f60e5782bc260b2ca9d2c0a9b
2019-06-11 04:10:08 -07:00
Matt Le
fa7791df9a Change encoder_learned_pos default back to True for xlm_base
Reviewed By: pipibjc

Differential Revision: D15635402

fbshipit-source-id: e92fab914de40775d7bad851420355240d822bde
2019-06-06 07:38:17 -07:00
Matt Le
5408bc0821 Fix loading XLM pretraining
Summary: We never actually load the model parameters from an XLM model when using transformer_from_pretrained_xlm. Also, change encoder_learned_pos from True -> False

Reviewed By: liezl200

Differential Revision: D15629061

fbshipit-source-id: 759eadc88041eae94505477960de57dd78a99dcb
2019-06-04 15:36:55 -07:00
Myle Ott
ffc3bb5806 Add --reset-dataloader
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/613

Differential Revision: D15541384

Pulled By: myleott

fbshipit-source-id: ef2c0b0a51cdf37af2ccff0546f524d49f87e65d
2019-05-30 11:41:40 -07:00
Yongqiang Wang
8ce2c35d8e Implement reducing footprint of average checkpoint correctly (#747)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/747

In https://github.com/pytorch/fairseq/pull/647, checkpoint averaging
is not implemented correctly when it comes to shared parameters. This diff
has the right implementation and a test case to guard against future changes.

Reviewed By: myleott

Differential Revision: D15402943

fbshipit-source-id: 8004836d5c2571814ea54844650618008a9ee522
2019-05-24 12:12:24 -07:00
Ning Dong
ee28411f76 Make ConcatDataset work in PytorchTranslateTask multi-path dataset loading (#730)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/730

Pull Request resolved: https://github.com/pytorch/translate/pull/528

Add/modify necessary functions for ConcatDataset to work in PytorchTranslateTask and replace MultiCorpusSampledDataset, which doesn't support mixed batches.

Any idea on how to implement the collater here for mixed batches? For now I'm just using the collater of the first dataset.

Reviewed By: liezl200

Differential Revision: D15260872

fbshipit-source-id: 14b148c506e9f8ebf4fe60a49f95444d4123d76f
2019-05-20 11:31:53 -07:00
Myle Ott
3bfbb49ba5 Clean up sharded train iterator
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/586

Differential Revision: D15372949

Pulled By: myleott

fbshipit-source-id: c1cf1c645e8d55fc8568f23a47c45677ac9ab1da
2019-05-16 21:03:08 -07:00
Myle Ott
dffb167449 Updates to model API (#561)
Summary:
- `FairseqModel` -> `FairseqEncoderDecoderModel`
- add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
- `encoder_out_dict` -> `encoder_out`
- rm unused `remove_head` functions
- update docs
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561

Differential Revision: D15271142

Pulled By: myleott

fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264
2019-05-15 07:12:41 -07:00
Myle Ott
7432130eb0 rm default_key from MultiCorpusSampledDataset
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/575

Differential Revision: D15318004

Pulled By: myleott

fbshipit-source-id: ad918d71b1bd8074decf5ec3463dd9bc9487bbe9
2019-05-14 16:45:21 -07:00
Dmytro Okhonko
cd1e5c09fa Move save/load checkpoint functions to utils
Summary:
Move `load_checkpoint`, `save_checkpoint` and `reload_train` from train.py to checkpoint_utils.py
Move `get_perplexity` from train.py to utils.py.
This will make train.py lighter and allow us to reuse all this utils functionality when fairseq is used as an external library.

Reviewed By: myleott

Differential Revision: D15289607

fbshipit-source-id: 4b7c95225ac22e402bcda3497811361809110df1
2019-05-14 12:57:12 -07:00
Jingfei Du
93ec8d0bc6 expose arguments for bias_kv and zero_attn for masked_lm
Summary: the old no_bias_kv argument for masked_lm models is not used. Split it into 2 arguments and expose them.

Reviewed By: myleott

Differential Revision: D15266154

fbshipit-source-id: 60b041f8370ca1d8869ed3402fb9a67d1cd8e0e8
2019-05-08 17:48:29 -07:00
Davide Caroselli
a1c997bd9a Memory-Mapped IndexedDataset implementation (#589)
Summary:
Following discussion in https://github.com/pytorch/fairseq/issues/574:

 - Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder compatible with IndexedDataset/IndexedDatasetBuilder
- Update scripts/read_binarized.py to support new MMapIndexedDataset
- Options '--raw-text' and '--lazy-load' replaced with '--dataset-impl', and the option definition moved from custom task args to the more high-level options.add_dataset_args() (more appropriate)
- Also implemented utils functions in indexed_dataset: make_dataset(), dataset_exists()
Pull Request resolved: https://github.com/pytorch/fairseq/pull/589

Differential Revision: D14597128

Pulled By: myleott

fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
2019-05-07 07:13:52 -07:00
Myle Ott
e4edf27a97 Improve init speed of TokenBlockDataset and EpochBatchIterator
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/704

Differential Revision: D15221549

Pulled By: myleott

fbshipit-source-id: b0021acdc2d7792ce51421f1432e1f2bd8218f7b
2019-05-07 07:08:53 -07:00
Naman Goyal
0add50c2e0 allowing sharded dataset (#696)
Summary:
Co-authored-by: myleott <myleott@fb.com>

Changing `data` to be a `str` with a colon-separated list for loading sharded datasets. This change is useful for loading large datasets that cannot fit into memory. The large dataset can be sharded and each shard is then loaded in one epoch in a round-robin manner.

For example, if there are `5` shards of data and `10` epochs, then the shards will be iterated over as `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]`.

myleott We need to look into `translation.py` as it currently already expects a list and then concats the datasets.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/696

Differential Revision: D15214049

fbshipit-source-id: 03e43a7b69c7aefada2ca668abf1eac1969fe013
2019-05-06 15:27:17 -07:00
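The round-robin shard selection described above amounts to a modulo over the colon-separated list. A sketch under two stated assumptions: the helper name is illustrative, and epochs are taken as 1-based.

```python
def shard_for_epoch(data, epoch):
    # "data" is a colon-separated list of shard paths; epoch N (1-based)
    # loads shard (N - 1) mod num_shards, i.e. round-robin over the shards.
    shards = data.split(":")
    return shards[(epoch - 1) % len(shards)]
```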
Myle Ott
96ac28d33d Fix and generalize --temperature option (#508)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/508

The previous version applied the temperature after the softmax. Fix that, and
also generalize so it works with other search approaches.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/694

Differential Revision: D15175160

Pulled By: myleott

fbshipit-source-id: cc87ff0e97a8a1dd37f9983163f58a8641155ab0
2019-05-04 16:39:32 -07:00
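The fix above moves the temperature inside the softmax. A sketch of the corrected order of operations (plain Python; fairseq applies this to torch logits): scale the logits first, then normalize.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature *before* the softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it toward uniform.
    scaled = [l / temperature for l in logits]
    mx = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```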
Myle Ott
d45db80431 Merge internal changes (#654)
Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/654

Differential Revision: D15041794

Pulled By: myleott

fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
2019-04-29 19:50:58 -07:00
Liezl Puzon
57b6a6dbfb Fix fairseq unittest timeouts (#667)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/667

Use smaller models so that unittests won't timeout

Reviewed By: pipibjc

Differential Revision: D15056894

fbshipit-source-id: af9fbda6ea6e56cf82d52555620121b189e2f013
2019-04-25 08:39:36 -07:00
Liezl Puzon
5008fd4e5a XLM for NMT: option to only load encoder or decoder (#666)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/666

Option to load the XLM weights into only the encoder or the decoder

Reviewed By: pipibjc

Differential Revision: D14881004

fbshipit-source-id: 6d0d598ea9c445ec468f71b8e855712de89a5dac
2019-04-25 05:57:02 -07:00
Liezl Puzon
8da9b1c530 Load a XLM model into transformer encoder / decoder for MT training (#629)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/629

Use GeLU as an alternate activation layer for ReLU.

Reviewed By: lematt1991

Differential Revision: D14689851

fbshipit-source-id: 7ec81fa34bc7bd0e1e43b337847ae932dcbf8b15
2019-04-25 05:57:02 -07:00
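For reference, a common tanh approximation of GeLU, the alternate activation mentioned above (this is one standard formulation from Hendrycks & Gimpel, not necessarily the exact variant used in the diff):

```python
import math

def gelu(x):
    # tanh approximation of GeLU: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))
```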
Ning Dong
90d6eac2b3 Enable custom sampling strategy in MultiCorpusSampledDataset (#639)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/639

Add argument sampling_func in the constructor to enable custom sampling over a list of dataset keys. The default strategy is to sample uniformly as it did previously.

Reviewed By: liezl200

Differential Revision: D14965774

fbshipit-source-id: f3285688a9ae3729c0ba12c22254c1144d0eea9e
2019-04-16 23:29:02 -07:00
Myle Ott
e12e1d254c Simplify and generalize utils.make_positions
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/625

Differential Revision: D14822123

Pulled By: myleott

fbshipit-source-id: 8a263d30020588577ee02fb8c6959ff918705103
2019-04-15 07:32:11 -07:00
Peng-Jen Chen
d7e19573fa Back translation + denoising in MultilingualTranslation task (#620)
Summary:
- Add language token to MultilingualTranslation task
- Add back translation and denoising loss to MultilingualTranslation task
Pull Request resolved: https://github.com/pytorch/fairseq/pull/620

Reviewed By: liezl200

Differential Revision: D14756873

Pulled By: pipibjc

fbshipit-source-id: 89d668db26848fd95f446edf5923bab2113636f7
2019-04-10 10:56:51 -07:00
Dmytro Okhonko
860010e907 Handle 3+ dimensional input in sequence_generator + nits
Summary: sequence_generator assumes that the model input is a 2d tensor of longs. But it can be something like a 3d tensor of floats, and we should be able to handle this as long as the first dimension is batch size followed by source lengths.

Reviewed By: myleott

Differential Revision: D14420044

fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
2019-03-12 15:12:21 -07:00
Dmytro Okhonko
d17fa85135 Adadelta optimizer
Summary: Adding Adadelta optimizer to fairseq as wrapper around torch.optim.Adadelta

Reviewed By: myleott

Differential Revision: D14418635

fbshipit-source-id: 6bf5ec008e905a4a2cbf7415e9492f5eea3ff07f
2019-03-12 15:12:21 -07:00
Vladimir Karpukhin
f296824f40 Move string line encoding logic from tokenizer to Dictionary (unified diff). (#541)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/541

Just a combo of the stacked pair D14057943 & D14176011.
Made this as a separate diff because there seems to be some issue with porting a stacked change into the github repo

Differential Revision: D14251048

fbshipit-source-id: 0a47f534a69d6ab2ebe035fba40fd51748cccfb8
2019-02-28 09:19:12 -08:00
Myle Ott
bc919276a1 Add test for mixture of experts
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/543

Differential Revision: D14259481

Pulled By: myleott

fbshipit-source-id: fcb0a150b8e851cf86ea5ed1f083f56e1600588e
2019-02-28 08:56:24 -08:00
Myle Ott
44d27e645b Add Tensorboard support (#530)
Summary:
Enable with the `--tensorboard-logdir` option.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/530

Differential Revision: D14218430

Pulled By: myleott

fbshipit-source-id: e7a54f66f928e3bb02ae03fda09b22fa4fa7d053
2019-02-25 18:40:18 -08:00
Myle Ott
b65c579bed Modularize generate.py (#351)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/351

This makes it easier for tasks to plugin to generate.py/interactive.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/520

Differential Revision: D14183881

Pulled By: myleott

fbshipit-source-id: ede5e53ddc1215ed3b12b8f1eba048c946913c33
2019-02-22 10:08:52 -08:00
Davide Caroselli
bbb4120b00 Support custom Dictionary implementations in 'preprocess.py' (#448)
Summary:
The `preprocess.py` script has been refactored in order to:

1. Use the `options` module for command line arguments  parsing. This will give to `preprocess.py` the ability to load custom modules with `--user-dir` flag (already implemented to all other binaries)
2. Dictionary loading and building code has moved to Task implementation. This allows custom Dictionary classes to be used during the data generation step.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/448

Differential Revision: D13674819

Pulled By: myleott

fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
2019-02-01 09:45:59 -08:00
Myle Ott
3dce7c9fc0 Add --input option to interactive.py to support reading from file
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/484

Differential Revision: D13880636

Pulled By: myleott

fbshipit-source-id: 984b2e1c3b281c28243102eb971ea45ec891d94e
2019-01-30 09:46:05 -08:00
Myle Ott
42be3ebd41 Merge internal changes (#483)
Summary:
Changelog:
- `4889802`: can now detokenize sentencepiece output with `--remove-bpe=sentencepiece` (fixes #331). Also added `--sacrebleu` for computing detokenized BLEU.
- `0d76427`: fix assertion error when training language model with dataset containing empty sentences
- minor bug and style fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/483

Differential Revision: D13867899

Pulled By: myleott

fbshipit-source-id: 25c940b847fe270262ac8f5ac838407b3977fdda
2019-01-30 09:01:10 -08:00
Myle Ott
b41c74dc5b Add code for "Pay Less Attention with Lightweight and Dynamic Convolutions" (#473)
Summary:
Changelog:
- `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper
- `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu
- update READMEs
- misc fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/473

Differential Revision: D13819717

Pulled By: myleott

fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9
2019-01-25 15:40:26 -08:00
Myle Ott
7633129ba8 Merge internal changes (#283)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/283

Pull Request resolved: https://github.com/pytorch/fairseq/pull/428

Differential Revision: D13564190

Pulled By: myleott

fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
2019-01-04 20:03:19 -08:00
Myle Ott
3c19878f71 Refactor BacktranslationDataset to be more reusable (#354)
Summary:
- generalize AppendEosDataset -> TransformEosDataset
- remove EOS logic from BacktranslationDataset (use TransformEosDataset instead)
- BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself
Pull Request resolved: https://github.com/pytorch/fairseq/pull/354

Reviewed By: liezl200

Differential Revision: D12970233

Pulled By: myleott

fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
2018-11-25 21:26:03 -08:00
Myle Ott
0864a9c49d Fix build for docs
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/372

Differential Revision: D13114426

Pulled By: myleott

fbshipit-source-id: 6c24b96a3556a0ecd3d1f350642a884254a40bd3
2018-11-18 08:32:59 -08:00
Liezl Puzon
2b13f3c036 Support BPE end of word marker suffix in fairseq noising module
Summary:
There are 2 ways to implement BPE:
1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word
2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word

This adds some logic to account for either kind of BPE marker suffix. This diff adds a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data.

Reviewed By: xianxl

Differential Revision: D12919428

fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
2018-11-06 20:40:36 -08:00
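The two BPE marker conventions above can be told apart with a tiny predicate (names are illustrative; the diff's actual logic lives in the noising module): with an end-of-word marker the word ends where the marker appears, with a continuation marker it ends where the marker is absent.

```python
def is_word_end(subtoken, marker, marker_ends_word=True):
    # Style 1 (marker_ends_word=True): a suffix like "</w>" marks the
    # last subtoken of a word.
    # Style 2 (marker_ends_word=False): a suffix like "@@" marks a
    # continuation, so a word ends wherever the marker is absent.
    if marker_ends_word:
        return subtoken.endswith(marker)
    return not subtoken.endswith(marker)
```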
Liezl Puzon
b1521f962e Refactor fairseq/test_noising with a word shuffle helper function (#340)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/340

This allows us to do a lot less copy paste when adding new word shuffle function tests

Reviewed By: xianxl

Differential Revision: D12810304

fbshipit-source-id: a56b5df093d17be2b73837897c526978cab92b70
2018-11-01 17:13:05 -07:00
Liezl Puzon
0b05467dd8 Black formatting in fairseq/test_noising (#341)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/341

Use black formatting in test_noising.py

Reviewed By: xianxl

Differential Revision: D12810285

fbshipit-source-id: 5517dd5d2f086831f487d88acf6bc2fa18820297
2018-11-01 17:13:05 -07:00
Myle Ott
5bbd148e6e Fix tests + style nits + Python 3.5 compat
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/336

Differential Revision: D12876709

Pulled By: myleott

fbshipit-source-id: a31536e2eb93f752600b9940c28e9b9fcefc8b86
2018-11-01 01:28:30 -07:00
Xian Li
90c01b3a0b Extend WordShuffle noising function to apply to non-bpe tokens
Summary:
We'd like to reuse the noising functions and DenoisingDataset in
adversarial training. However, the current noising functions assume the input is
subword tokens. The goal of this diff is to extend it so the noising can be
applied to word tokens. Since we're mostly interested in word shuffle
noising, I only modified the WordShuffle class.

Reviewed By: liezl200

Differential Revision: D10523177

fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab
2018-10-26 18:20:11 -07:00
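Word-shuffle noising of the kind mentioned above is usually implemented by adding bounded uniform noise to each position index and sorting by the noisy index, so no word moves more than k positions. A sketch under that assumption (fairseq's WordShuffle operates on padded torch tensors):

```python
import random

def word_shuffle(words, max_distance, seed=0):
    # Each word gets a noisy key i + U(0, k); sorting by the noisy key
    # displaces any word by at most k positions (k=0 is the identity).
    rng = random.Random(seed)
    keys = [i + rng.uniform(0, max_distance) for i in range(len(words))]
    order = sorted(range(len(words)), key=keys.__getitem__)
    return [words[i] for i in order]
```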
Deepak Gopinath
613ffeea9c Add size method to BacktranslationDataset + misc fixes (#325)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/325

RoundRobinZipDataset requires size(index) method implemented in every dataset used. Also added missing return statements in a few methods.

Reviewed By: liezl200

Differential Revision: D10457159

fbshipit-source-id: 01856eb455f2f3a21e7fb723129ff35fbe29e0ae
2018-10-22 22:27:59 -07:00
Liezl Puzon
e286243c68 Add denoising dataset for denoising autoencoder (#306)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/306

This uses a source dataset to generate a batch of {source: noisy source, target: original clean source} which allows us to train a denoising autoencoding component as part of a seq2seq model.

Reviewed By: xianxl

Differential Revision: D10078981

fbshipit-source-id: 026225984d4a97062ac05dc3a36e79b5c841fe9c
2018-10-05 18:21:27 -07:00
Liezl Puzon
8798a24031 Have noising account for sentences with and without EOS (#305)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/305

Previously, noising code assumed that every sentence had an EOS which had to be excluded from noising operations (since we shouldn't drop, blank, or shuffle EOS). This logic allows the noising module to handle sentences with EOS and without EOS

Reviewed By: xianxl

Differential Revision: D10114425

fbshipit-source-id: 04ec8547343eb94266bda1ac7fca3d8a1991c9f4
2018-10-05 18:21:26 -07:00
Liezl Puzon
b9e29a4711 Option to remove EOS at source in backtranslation dataset
Summary:
If we want our parallel data to have EOS at the end of source, we keep the EOS at the end of the generated source dialect backtranslation.
If we don't want our parallel data to have EOS at the end of source, we **remove** the EOS at the end of the generated source dialect backtranslation.

Note: we always want EOS at the end of our target / reference in parallel data so our model can learn to generate a sentence at any arbitrary length. So we make sure that the original target has an EOS before returning a batch of {generated src, original target}. If our original targets in tgt dataset doesn't have an EOS, we append EOS to each tgt sample before collating.
We only do this for the purpose of collating a {generated src, original tgt} batch AFTER generating the backtranslations. We don't enforce any EOS before passing tgt to the tgt->src model for generating the backtranslation. The users of this dataset is expected to format tgt dataset examples in the correct format that the tgt->src model expects.

Reviewed By: jmp84

Differential Revision: D10157725

fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb
2018-10-03 18:23:32 -07:00
Myle Ott
fc677c945e Fix proxying in DistributedFairseqModel
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/302

Differential Revision: D10174608

Pulled By: myleott

fbshipit-source-id: 4e2dfc76eae97afc5488f29b47e74f9897a643ff
2018-10-03 16:22:35 -07:00
Liezl Puzon
f766c9a0d5 Pass in kwargs and SequenceGenerator class to init BacktranslationDataset
Summary: This generalizes BacktranslationDataset to allow us to use any SequenceGenerator class. For example, if we want to use this model in PyTorch Translate, we can pass the following to BacktranslationDataset init: (1) a PyTorch Translate SequenceGenerator class as generator_class and (2) the appropriate args for initializing that class as kwargs.

Reviewed By: xianxl

Differential Revision: D10156552

fbshipit-source-id: 0495d825bf4727da96d0d9a40dc434135ff3486c
2018-10-02 18:22:27 -07:00
Liezl Puzon
86e93f2bcf Explicitly list out generation args for backtranslation dataset
Summary:
Using argparse Namespace hides the actual args that are expected and makes code harder to read.

Note the difference in style for the args list

    def __init__(
        self,
        tgt_dataset,
        tgt_dict,
        backtranslation_model,
        unkpen,
        sampling,
        beam,
        max_len_a,
        max_len_b,
    ):

instead of

    def __init__(
        self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling,
        beam,  max_len_a, max_len_b,
    ):

Reviewed By: dpacgopinath

Differential Revision: D10152331

fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6
2018-10-02 15:45:38 -07:00
myleott
f8377a704c fbshipit-source-id: 6a835d32f9dc5e0de118f1b46d365d0e0cc85e11 2018-09-30 12:28:20 -07:00
Myle Ott
864b89d044 Online backtranslation module
Co-authored-by: liezl200 <lie@fb.com>
2018-09-25 17:36:43 -04:00
Alexei Baevski
cfd2a3a048 core changes to support latte collab 2018-09-25 17:36:43 -04:00
Myle Ott
fbe8ce65d3 Better support for various c10d API changes 2018-09-25 17:36:43 -04:00
Myle Ott
e775877f68 Add unit test to verify reproducibility after reloading checkpoints 2018-09-25 17:36:43 -04:00
Stephen Roller
bfeb773214 Pass encoder_input to generator, rather than src_tokens/src_lengths. 2018-09-25 17:36:43 -04:00
Myle Ott
8bd8ec8fa8 Update LM test with --no-c10d 2018-09-25 17:36:43 -04:00
Myle Ott
311d2c6ca9 Revert sequence generator changes 2018-09-25 17:36:43 -04:00
Stephen Roller
e6d45d5cd7 Generator: net_input instead of manual src_tokens. 2018-09-25 17:36:43 -04:00
Myle Ott
d473620e39 Test max_positions 2018-09-03 19:15:23 -04:00
Myle Ott
0a7f9e64bb Further generalize EpochBatchIterator and move iterators into new file 2018-09-03 19:15:23 -04:00
Myle Ott
2e507d3cb4 Clean up FairseqTask so that it's easier to extend/add new tasks 2018-09-03 19:15:23 -04:00
Myle Ott
8c0ca1a0c1 Diverse Beam Search 2018-09-03 19:15:23 -04:00
alexeib
f1d81db8b7 fix tests 2018-09-03 19:15:23 -04:00
Myle Ott
ef43da72d3 Factor out search logic in SequenceGenerator 2018-09-03 19:15:23 -04:00
alexeib
0b5166db2e fix tests 2018-09-03 19:15:23 -04:00
alexeib
2dc074d8f2 add flag that allows keeping optimizer config
adds -reset-optimizer, --reset-lr-scheduler, and --optimizer-overrides flags
2018-09-03 19:15:23 -04:00
Alexei Baevski
885e7ec9ec character token embeddings for word level predictions 2018-09-03 19:15:23 -04:00
Myle Ott
bb5f15d137 Iterate on need_attn and fix tests 2018-07-25 07:26:08 -07:00
Myle Ott
6edf81ddfe
Remove more Variable() calls (#198) 2018-06-25 12:23:04 -04:00
Myle Ott
74efc21403
Fix attention order in unit tests (fixes #195) (#197) 2018-06-25 12:16:10 -04:00
Myle Ott
c6fe9fc5e0 Fix for Dictionary.finalize 2018-06-24 13:19:07 -04:00
Myle Ott
6ec5022e57 Move reorder_encoder_out to FairseqEncoder and fix non-incremental decoding 2018-06-21 14:58:50 -04:00
Myle Ott
572a1d55df
Fix --output-format raw option to preprocess.py (Fixes #188) (#190) 2018-06-21 08:19:16 -04:00
Myle Ott
bfcc6ec739 Fix bidirectional lstm 2018-06-15 13:05:23 -06:00
Myle Ott
e89329d665 Updates for latest PyTorch 2018-06-15 13:05:22 -06:00
Myle Ott
ff68a9ef50 Add FairseqTask
A Task defines the data format, stores shared state (e.g., dictionaries) and provides helpers for building the model/criterion and calculating the loss.

Changes:
- Add TranslationTask and LanguageModelingTask. New tasks can be registered with @register_task decorator.
- Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position
- Remove LEFT_PAD_* constants and make them configurable per task
2018-06-15 13:05:22 -06:00
Myle Ott
16a72b4dd1 Add more integration tests (LM, stories, transformer, lstm) 2018-06-15 13:05:20 -06:00
Myle Ott
736fbee2a1 Suppress stdout in test_train 2018-06-15 13:05:20 -06:00
Myle Ott
cf1c64a5f7 Nits 2018-06-15 13:05:19 -06:00
alexeib
7d5604024b record end_of_epoch in checkpoint 2018-06-15 13:05:17 -06:00
alexeib
978c125aee fix restoring from middle of epoch; fix defaulting transformer dropout params 2018-06-15 13:05:17 -06:00
alexeib
4c2ef2de74 Conv lm implementation
This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf

There are 3 modes for constructing batches:

- token block: fill each sample with a specified number of tokens without regard for sentence delimiters - this is what was used for training in the paper
- complete: fill each sample with a specified number of tokens but make sure it contains only complete sentences (i.e. if next sentence goes over token block limit, move it to the next sample) - this was used for evaluation in the paper
- eos: one sentence per sample (skip blank lines)
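
The three batching modes above can be sketched in Python as follows. This is a minimal illustration under the stated assumptions, not fairseq's actual dataset code; `make_batches`, `sentences` (a list of token lists), and `block_size` are hypothetical names.

```python
def make_batches(sentences, block_size, mode):
    """Sketch of the three batching modes (illustrative, not fairseq's code).

    sentences: list of token lists; block_size: max tokens per sample;
    mode: "token", "complete", or "eos".
    """
    if mode == "eos":
        # one sentence per sample, skipping blank lines
        return [s for s in sentences if s]
    samples, cur = [], []
    for sent in sentences:
        if mode == "complete" and cur and len(cur) + len(sent) > block_size:
            # next sentence would overflow the block: start a new sample
            samples.append(cur)
            cur = []
        cur.extend(sent)
        if mode == "token":
            # fill ignoring sentence boundaries, splitting mid-sentence
            while len(cur) >= block_size:
                samples.append(cur[:block_size])
                cur = cur[block_size:]
    if cur:
        samples.append(cur)
    return samples
```

In "token" mode samples cut across sentence boundaries (as used for training in the paper), while "complete" mode defers an overflowing sentence to the next sample (as used for evaluation).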

some results (perplexity):

GCNN-13 - GBW - 37.46
GCNN-14B - GBW - 33.88
GCNN-8 - Wiki103 - 43.76
GCNN-14 - Wiki103 - 35.66

train:

python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500

eval:

python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'
2018-06-15 13:05:16 -06:00
Myle Ott
ae2585d9fd Fix tests 2018-06-15 13:05:15 -06:00
Myle Ott
8afb77612c Fix tests 2018-06-15 13:05:11 -06:00
Myle Ott
ec0031df7b
Merge internal changes (#163) 2018-05-24 13:38:12 -04:00
Myle Ott
d3795d6cd1
Merge internal changes (#136)
Changes:
- 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
- c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
- 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
- small bugfixes for distributed training, LSTM, inverse square root LR scheduler
2018-04-02 10:13:07 -04:00
Myle Ott
e73fddf453 Filter padding properly in LabelSmoothedCrossEntropyCriterion (#229) 2018-03-05 14:20:29 -08:00
Myle Ott
6e4d370af9
More updates for PyTorch (#114) 2018-03-01 14:04:08 -05:00
Myle Ott
9438019ff0 Refactor incremental generation to be more explicit and less magical (#222) 2018-02-27 14:28:24 -08:00
Myle Ott
0d90e35f3b More unit test fixes 2018-02-27 14:28:24 -08:00
Myle Ott
29c8274128 Fix tests and flake8 2018-02-27 14:28:24 -08:00
Myle Ott
6641520612
fairseq-py goes distributed (#106)
This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes.

Changes:
- c7033ef: add support for distributed training! See updated README for usage.
- e016299: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc.
- 154e440: update LSTM implementation to use PackedSequence objects in the encoder, better following best practices and improving perf
- 90c2973 and 1da6265: improve unit test coverage
2018-02-27 17:09:42 -05:00
Myle Ott
e1f49695ee Rename LabelSmoothedCrossEntropy to LabelSmoothedNLLLoss 2017-11-08 08:01:31 -07:00
Myle Ott
6e4b7e22ee Refactor model definitions
* Move some functionality out of FConvModel into FairseqModel base class
* Move incremental decoding functionality into FairseqIncrementalDecoder module
* Refactor positional embeddings to be more specific to FConvModel
2017-11-08 07:59:22 -07:00
Sam Gross
ae0c05d920 Fix call ordering to ATen addmm and sum (#22) 2017-10-11 10:14:19 -04:00
Sergey Edunov
e734b0fa58 Initial commit 2017-09-14 17:22:43 -07:00