13 Commits

Author SHA1 Message Date
Myle Ott
a48f235636 Apply black+isort (#1357)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1357

Reviewed By: alexeib

Differential Revision: D24377772

fbshipit-source-id: 51581af041d42d62166b33a35a1a4228b1a76f0c
2020-10-18 18:14:51 -07:00
Chen Liu
1b749f4a34 Deprecate the SequenceGenerator with the Scripted vision (#1120)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1120

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1940

Deprecate the SequenceGenerator in Fairseq with the Scripted version.

Pass all integration unit tests

- Copy ScriptSequenceGenerator to SequenceGenerator:
  - Modified the forward_decoder to fix a bug when using adaptive_softmax in `get_prob_normalize` (marked with an inline comment)
  - Add support for other EnsembleModels as an input arg (marked with an inline comment)
- Add `FBEnsembleModelWithFork` to support fork/join in the ensemble model
  - Add `test_fb_ensemble_model` to test the fork/join feature
  - The fork/join feature still has bugs when run through the Fairseq interface (e.g. generation and interactive). Needs further investigation: P128130029. cc cndn, jhcross
- Modified the SequenceGenerator initialization interface
- Clean up the code: delete the unused functions `get_normalized_probs` and `_decode`

Reland reverted diff D20685075

Reviewed By: cndn

Differential Revision: D20895977

fbshipit-source-id: 424ee318e67d5d6ffed3edb92c7fa78485ba34af
2020-04-07 13:28:30 -07:00
Aapo Kyrola
966436403e Revert D20685075: Deprecate the SequenceGenerator with the Scripted vision
Differential Revision:
D20685075

Original commit changeset: 046b76874465

fbshipit-source-id: 7ec2a2ca3b90251a560e2323c22b52ec7436fecb
2020-04-07 00:59:53 -07:00
Chen Liu
bc93681348 Deprecate the SequenceGenerator with the Scripted vision (#1120)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1120

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1940

Deprecate the SequenceGenerator in Fairseq with the Scripted version.

Pass all integration unit tests

- Copy ScriptSequenceGenerator to SequenceGenerator:
  - Modified the forward_decoder to fix a bug when using adaptive_softmax in `get_prob_normalize` (marked with an inline comment)
  - Add support for other EnsembleModels as an input arg (marked with an inline comment)
- Add `FBEnsembleModelWithFork` to support fork/join in the ensemble model
  - Add `test_fb_ensemble_model` to test the fork/join feature
  - The fork/join feature still has bugs when run through the Fairseq interface (e.g. generation and interactive). Needs further investigation: P128130029. cc cndn, jhcross
- Modified the SequenceGenerator initialization interface
- Clean up the code: delete the unused functions `get_normalized_probs` and `_decode`

Reviewed By: myleott

Differential Revision: D20685075

fbshipit-source-id: 046b76874465a70d8118a97ad670311c6ce1d1c8
2020-04-06 17:47:47 -07:00
Aleksandra Piktus
fab2e86e51 Add a diverse beam search variant to sequence_generator.py (#953)
Summary:
This PR implements a new generation strategy that we experimented with in project Pinocchio (https://github.com/fairinternal/Pinocchio), see the paper submission in: https://fburl.com/hduj2me7.

Specifically in this PR:
- added a Diverse Beam Search variant as described in https://arxiv.org/abs/1611.08562
- moved the Search object generation out of `sequence_generator.py`, which allows for limiting the number of kwargs passed around
- made sure the above changes are backward compatible based on grep - P124083926
- added test cases covering these scenarios
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/953
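
For readers unfamiliar with the cited paper, its core idea is to penalize each candidate by its rank among the siblings that share a parent hypothesis, so that the beam is less likely to fill up with expansions of a single parent. The following is a minimal pure-Python sketch of that idea only. The function name `sibling_rank_penalty`, the `diversity_rate` parameter, and the list-based inputs are hypothetical and are not taken from this PR's code.

```python
from typing import List, Tuple


def sibling_rank_penalty(
    parent_scores: List[float],
    child_logprobs: List[List[float]],
    beam_size: int,
    diversity_rate: float,
) -> List[Tuple[float, int, int]]:
    """Select `beam_size` candidates after a per-parent rank penalty.

    Returns (penalized_score, parent_index, token_index) tuples, best first.
    All names here are illustrative assumptions, not fairseq APIs.
    """
    candidates = []
    for parent_idx, (parent_score, logprobs) in enumerate(
        zip(parent_scores, child_logprobs)
    ):
        # Rank this parent's children from most to least likely.
        ranked = sorted(range(len(logprobs)), key=lambda t: logprobs[t], reverse=True)
        for rank, token_idx in enumerate(ranked[:beam_size], start=1):
            # The k-th best sibling is penalized by diversity_rate * k.
            score = parent_score + logprobs[token_idx] - diversity_rate * rank
            candidates.append((score, parent_idx, token_idx))
    # Keep the best candidates across all parents.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:beam_size]
```

With `diversity_rate=0.0` this reduces to an ordinary beam step; larger values make it more likely that the surviving candidates come from different parents.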

Test Plan:
- `python -m unittest tests.test_binaries -v` - including added test cases, see issues below for some details
- `python -m unittest tests.test_sequence_generator -v` - including added test cases
- tested locally in conjunction with the Pinocchio repo
- grepped for all instantiations of `SequenceGenerator`, made sure they're backward compatible

# Issues
- when I try to run all tests with the `python -m unittest tests.test_binaries -v` command, the execution gets stuck on `test_binaries.TestTranslation.test_generation` - the test otherwise passes without problems when run individually. Is this a known problem?
- discovered T59235948 - assigned to fairseq oncall

Reviewed By: myleott, fabiopetroni

Differential Revision: D19142394

Pulled By: ola13

fbshipit-source-id: d24543424c14a9537e7b6485951d9f841da62b07
2020-01-06 08:24:02 -08:00
Myle Ott
e75cff5f2c Relicense fairseq under MIT license (#786)
Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786

Differential Revision: D16560654

Pulled By: myleott

fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034
2019-07-30 07:48:23 -07:00
Peng-Jen Chen
d7e19573fa Back translation + denoising in MultilingualTranslation task (#620)
Summary:
- Add language token to MultilingualTranslation task
- Add back translation and denoising loss to MultilingualTranslation task
Pull Request resolved: https://github.com/pytorch/fairseq/pull/620

Reviewed By: liezl200

Differential Revision: D14756873

Pulled By: pipibjc

fbshipit-source-id: 89d668db26848fd95f446edf5923bab2113636f7
2019-04-10 10:56:51 -07:00
Myle Ott
b65c579bed Modularize generate.py (#351)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/351

This makes it easier for tasks to plug into generate.py/interactive.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/520

Differential Revision: D14183881

Pulled By: myleott

fbshipit-source-id: ede5e53ddc1215ed3b12b8f1eba048c946913c33
2019-02-22 10:08:52 -08:00
Myle Ott
3c19878f71 Refactor BacktranslationDataset to be more reusable (#354)
Summary:
- generalize AppendEosDataset -> TransformEosDataset
- remove EOS logic from BacktranslationDataset (use TransformEosDataset instead)
- BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself
Pull Request resolved: https://github.com/pytorch/fairseq/pull/354
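
The last bullet describes a dependency-injection style change: the dataset receives a callable rather than constructing a generator itself. A minimal sketch of that shape follows, assuming token sequences are plain lists of ints. The class name `FnBackedBacktranslationDataset` and the output keys are hypothetical and do not reproduce the actual fairseq implementation.

```python
from typing import Callable, Dict, List, Sequence


class FnBackedBacktranslationDataset:
    """Toy dataset that delegates backtranslation to an injected callable."""

    def __init__(
        self,
        tgt_dataset: Sequence[List[int]],
        backtranslation_fn: Callable[[List[int]], List[int]],
    ) -> None:
        self.tgt_dataset = tgt_dataset
        # The dataset no longer knows how backtranslations are produced.
        self.backtranslation_fn = backtranslation_fn

    def __len__(self) -> int:
        return len(self.tgt_dataset)

    def __getitem__(self, index: int) -> Dict[str, List[int]]:
        tgt = self.tgt_dataset[index]
        # Pair the generated source with the original target.
        return {"source": self.backtranslation_fn(tgt), "target": tgt}
```

Any function with a matching signature can be passed in, which is what makes the dataset reusable across generator implementations.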

Reviewed By: liezl200

Differential Revision: D12970233

Pulled By: myleott

fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
2018-11-25 21:26:03 -08:00
Liezl Puzon
b9e29a4711 Option to remove EOS at source in backtranslation dataset
Summary:
If we want our parallel data to have EOS at the end of source, we keep the EOS at the end of the generated source dialect backtranslation.
If we don't want our parallel data to have EOS at the end of source, we **remove** the EOS at the end of the generated source dialect backtranslation.

Note: we always want EOS at the end of our target / reference in parallel data so our model can learn to generate sentences of arbitrary length. So we make sure that the original target has an EOS before returning a batch of {generated src, original target}. If the original targets in the tgt dataset don't have an EOS, we append EOS to each tgt sample before collating.
We only do this for the purpose of collating a {generated src, original tgt} batch AFTER generating the backtranslations. We don't enforce any EOS before passing tgt to the tgt->src model for generating the backtranslation. Users of this dataset are expected to format tgt dataset examples in the format that the tgt->src model expects.
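
The EOS rules above can be sketched roughly as follows, assuming token sequences are plain lists of ints. The helper names `ensure_eos`, `strip_eos`, and `make_pair`, and the `remove_eos_at_src` flag name, are hypothetical illustrations and not the identifiers used in this diff.

```python
from typing import Dict, List


def ensure_eos(tokens: List[int], eos: int) -> List[int]:
    """Append EOS only if the sequence does not already end with it."""
    return tokens if tokens and tokens[-1] == eos else tokens + [eos]


def strip_eos(tokens: List[int], eos: int) -> List[int]:
    """Drop a trailing EOS if one is present."""
    return tokens[:-1] if tokens and tokens[-1] == eos else tokens


def make_pair(
    generated_src: List[int],
    original_tgt: List[int],
    eos: int,
    remove_eos_at_src: bool,
) -> Dict[str, List[int]]:
    """Build one {generated src, original tgt} example for collation."""
    # Source side: keep the generated EOS, or remove it when requested.
    src = strip_eos(generated_src, eos) if remove_eos_at_src else generated_src
    # Target side: always ends with EOS.
    return {"source": src, "target": ensure_eos(original_tgt, eos)}
```

Note that the target is only normalized when building the pair; nothing here touches the target before it is handed to the tgt->src model.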

Reviewed By: jmp84

Differential Revision: D10157725

fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb
2018-10-03 18:23:32 -07:00
Liezl Puzon
f766c9a0d5 Pass in kwargs and SequenceGenerator class to init BacktranslationDataset
Summary: This generalizes BacktranslationDataset to allow us to use any SequenceGenerator class. For example, if we want to use this model in PyTorch Translate, we can pass the following to BacktranslationDataset init: (1) a PyTorch Translate SequenceGenerator class as generator_class and (2) the appropriate args for initializing that class as kwargs.
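
A minimal sketch of this pattern, assuming the generator class takes the model as its first argument and exposes a `generate` method. `ReversingGenerator` and `ToyBacktranslationDataset` are hypothetical stand-ins, and only the `generator_class` and `kwargs` parameter names come from the summary above.

```python
class ReversingGenerator:
    """Hypothetical stand-in for a SequenceGenerator-like class."""

    def __init__(self, model, beam=1):
        self.model = model
        self.beam = beam

    def generate(self, tokens):
        # Toy "backtranslation": reverse the token order.
        return list(reversed(tokens))


class ToyBacktranslationDataset:
    """Toy dataset that builds its generator from an injected class and kwargs."""

    def __init__(self, tgt_dataset, backtranslation_model, generator_class, **kwargs):
        self.tgt_dataset = tgt_dataset
        # Any class with a compatible constructor can be supplied here.
        self.generator = generator_class(backtranslation_model, **kwargs)

    def __len__(self):
        return len(self.tgt_dataset)

    def __getitem__(self, index):
        tgt = self.tgt_dataset[index]
        return {"source": self.generator.generate(tgt), "target": tgt}
```

For example, `ToyBacktranslationDataset(data, model, ReversingGenerator, beam=5)` forwards `beam=5` to the generator's constructor without the dataset needing to know about that argument.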

Reviewed By: xianxl

Differential Revision: D10156552

fbshipit-source-id: 0495d825bf4727da96d0d9a40dc434135ff3486c
2018-10-02 18:22:27 -07:00
Liezl Puzon
86e93f2bcf Explicitly list out generation args for backtranslation dataset
Summary:
Using argparse Namespace hides the actual args that are expected and makes code harder to read.

Note the difference in style for the args list

    def __init__(
        self,
        tgt_dataset,
        tgt_dict,
        backtranslation_model,
        unkpen,
        sampling,
        beam,
        max_len_a,
        max_len_b,
    ):

instead of

    def __init__(
        self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling,
        beam,  max_len_a, max_len_b,
    ):

Reviewed By: dpacgopinath

Differential Revision: D10152331

fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6
2018-10-02 15:45:38 -07:00
Myle Ott
864b89d044 Online backtranslation module
Co-authored-by: liezl200 <lie@fb.com>
2018-09-25 17:36:43 -04:00