fairseq

mirror of https://github.com/facebookresearch/fairseq.git synced 2024-08-16 20:10:40 +03:00

Author	SHA1	Message	Date
Myle Ott	a48f235636	Apply black+isort (#1357) Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1357 Reviewed By: alexeib Differential Revision: D24377772 fbshipit-source-id: 51581af041d42d62166b33a35a1a4228b1a76f0c	2020-10-18 18:14:51 -07:00
Chen Liu	1b749f4a34	Deprecate the SequenceGenerator with the Scripted vision (#1120) Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1120 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1940 Deprecate the SequenceGenerator in Fairseq with the Scripted vision. Pass all integration unit tests - Copy ScriptSequenceGenerator to SequenceGenerator: - Modified the forward_decoder to fix bug when using adaptive_softmax in `get_prob_normalize` (marked with the inline comment) - Add support for other EnsembleModels as input arg (marked with the inline comment) - Add `FBEnsembleModelWithFork` to support fork/join in ensemblemodel - Add `test_fb_ensemble_model` to test fork/join feature - Still have bugs in fork/join feature when running in the Fairseq interface (like generation and interactive). Need further investigation P128130029. cc cndn, jhcross - Modified the SequenceGenerator initialization interface - Clean up the code: delete unused functions `get_normalized_probs` and `_decode` Reland reverted diff D20685075 Reviewed By: cndn Differential Revision: D20895977 fbshipit-source-id: 424ee318e67d5d6ffed3edb92c7fa78485ba34af	2020-04-07 13:28:30 -07:00
Aapo Kyrola	966436403e	Revert D20685075: Deprecate the SequenceGenerator with the Scripted vision Differential Revision: D20685075 Original commit changeset: 046b76874465 fbshipit-source-id: 7ec2a2ca3b90251a560e2323c22b52ec7436fecb	2020-04-07 00:59:53 -07:00
Chen Liu	bc93681348	Deprecate the SequenceGenerator with the Scripted vision (#1120) Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1120 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1940 Deprecate the SequenceGenerator in Fairseq with the Scripted vision. Pass all integration unit tests - Copy ScriptSequenceGenerator to SequenceGenerator: - Modified the forward_decoder to fix bug when using adaptive_softmax in `get_prob_normalize` (marked with the inline comment) - Add support for other EnsembleModels as input arg (marked with the inline comment) - Add `FBEnsembleModelWithFork` to support fork/join in ensemblemodel - Add `test_fb_ensemble_model` to test fork/join feature - Still have bugs in fork/join feature when running in the Fairseq interface (like generation and interactive). Need further investigation P128130029. cc cndn, jhcross - Modified the SequenceGenerator initialization interface - Clean up the code: delete unused functions `get_normalized_probs` and `_decode` Reviewed By: myleott Differential Revision: D20685075 fbshipit-source-id: 046b76874465a70d8118a97ad670311c6ce1d1c8	2020-04-06 17:47:47 -07:00
Aleksandra Piktus	fab2e86e51	Add a diverse beam search variant to sequence_generator.py (#953) Summary: This PR implements a new generation strategy that we experimented with in project Pinocchio (https://github.com/fairinternal/Pinocchio), see the paper submission in: https://fburl.com/hduj2me7. Specifically in this PR: - added a Diverse Beam Search variant as described in https://arxiv.org/abs/1611.08562 - moved the Search object generation out of `sequence_generator.py`, which allows for limiting the number of kwargs passed around - made sure the above changes are backward compatible based on grep - P124083926 - added test cases covering these scenarios Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/953 Test Plan: - `python -m unittest tests.test_binaries -v` - including added test cases, see issues below for some details - `python -m unittest tests.test_sequence_generator -v` - including added test cases - tested locally in conjunction with the Pinocchio repo - grepped for all instantiations of `SequenceGenerator`, made sure they're backward compatible # Issues - when I try to run all tests with `python -m unittest tests.test_binaries -v` command, the execution gets stuck on `test_binaries.TestTranslation.test_generation` - the test otherwise passes without problems when run individually. Is this a known problem? - discovered T59235948 - assigned to fairseq oncall Reviewed By: myleott, fabiopetroni Differential Revision: D19142394 Pulled By: ola13 fbshipit-source-id: d24543424c14a9537e7b6485951d9f841da62b07	2020-01-06 08:24:02 -08:00
Myle Ott	e75cff5f2c	Relicense fairseq under MIT license (#786) Summary: The previous BSD+PATENTS license was controversial. We have been approved to relicense fairseq under the MIT license. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786 Differential Revision: D16560654 Pulled By: myleott fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034	2019-07-30 07:48:23 -07:00
Peng-Jen Chen	d7e19573fa	Back translation + denoising in MultilingualTranslation task (#620) Summary: - Add language token to MultilingualTranslation task - Add back translation and denoising loss to MultilingualTranslation task Pull Request resolved: https://github.com/pytorch/fairseq/pull/620 Reviewed By: liezl200 Differential Revision: D14756873 Pulled By: pipibjc fbshipit-source-id: 89d668db26848fd95f446edf5923bab2113636f7	2019-04-10 10:56:51 -07:00
Myle Ott	b65c579bed	Modularize generate.py (#351) Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/351 This makes it easier for tasks to plug in to generate.py/interactive.py Pull Request resolved: https://github.com/pytorch/fairseq/pull/520 Differential Revision: D14183881 Pulled By: myleott fbshipit-source-id: ede5e53ddc1215ed3b12b8f1eba048c946913c33	2019-02-22 10:08:52 -08:00
Myle Ott	3c19878f71	Refactor BacktranslationDataset to be more reusable (#354) Summary: - generalize AppendEosDataset -> TransformEosDataset - remove EOS logic from BacktranslationDataset (use TransformEosDataset instead) - BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself Pull Request resolved: https://github.com/pytorch/fairseq/pull/354 Reviewed By: liezl200 Differential Revision: D12970233 Pulled By: myleott fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36	2018-11-25 21:26:03 -08:00
Liezl Puzon	b9e29a4711	Option to remove EOS at source in backtranslation dataset Summary: If we want our parallel data to have EOS at the end of source, we keep the EOS at the end of the generated source dialect backtranslation. If we don't want our parallel data to have EOS at the end of source, we remove the EOS at the end of the generated source dialect backtranslation. Note: we always want EOS at the end of our target / reference in parallel data so our model can learn to generate a sentence of arbitrary length. So we make sure that the original target has an EOS before returning a batch of {generated src, original target}. If the original targets in the tgt dataset don't have an EOS, we append EOS to each tgt sample before collating. We only do this for the purpose of collating a {generated src, original tgt} batch AFTER generating the backtranslations. We don't enforce any EOS before passing tgt to the tgt->src model for generating the backtranslation. The users of this dataset are expected to format tgt dataset examples in the correct format that the tgt->src model expects. Reviewed By: jmp84 Differential Revision: D10157725 fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb	2018-10-03 18:23:32 -07:00
Liezl Puzon	f766c9a0d5	Pass in kwargs and SequenceGenerator class to init BacktranslationDataset Summary: This generalizes BacktranslationDataset to allow us to use any SequenceGenerator class. For example, if we want to use this model in PyTorch Translate, we can pass the following to BacktranslationDataset init: (1) a PyTorch Translate SequenceGenerator class as generator_class and (2) the appropriate args for initializing that class as kwargs. Reviewed By: xianxl Differential Revision: D10156552 fbshipit-source-id: 0495d825bf4727da96d0d9a40dc434135ff3486c	2018-10-02 18:22:27 -07:00
Liezl Puzon	86e93f2bcf	Explicitly list out generation args for backtranslation dataset Summary: Using argparse Namespace hides the actual args that are expected and makes code harder to read. Note the difference in style for the args list def __init__( self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling, beam, max_len_a, max_len_b, ): instead of def __init__( self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling, beam, max_len_a, max_len_b, ): Reviewed By: dpacgopinath Differential Revision: D10152331 fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6	2018-10-02 15:45:38 -07:00
Myle Ott	864b89d044	Online backtranslation module Co-authored-by: liezl200 <lie@fb.com>	2018-09-25 17:36:43 -04:00

13 Commits