fairseq/tests
Liezl Puzon b9e29a4711 Option to remove EOS at source in backtranslation dataset
Summary:
If we want our parallel data to have EOS at the end of source, we keep the EOS at the end of the generated source dialect backtranslation.
If we don't want our parallel data to have EOS at the end of source, we **remove** the EOS at the end of the generated source dialect backtranslation.

Note: we always want EOS at the end of our target / reference in parallel data so our model can learn to generate a sentence of any arbitrary length. So we make sure that the original target has an EOS before returning a batch of {generated src, original target}. If the original targets in the tgt dataset don't have an EOS, we append EOS to each tgt sample before collating.
We only do this for the purpose of collating a {generated src, original tgt} batch AFTER generating the backtranslations. We don't enforce any EOS before passing tgt to the tgt->src model for generating the backtranslation. The users of this dataset are expected to format tgt dataset examples in the format that the tgt->src model expects.
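
The EOS handling described above can be sketched roughly as follows. This is a minimal illustration on plain Python lists, not the code from this commit: the function name `format_backtranslation_pair`, the flag name `remove_eos_at_src`, and the EOS index `2` are assumptions made for the example.

```python
from typing import List, Tuple

EOS = 2  # assumed EOS index for this sketch; in practice it comes from the dictionary


def format_backtranslation_pair(
    generated_src: List[int],
    original_tgt: List[int],
    eos: int = EOS,
    remove_eos_at_src: bool = False,
) -> Tuple[List[int], List[int]]:
    """Prepare one {generated src, original tgt} pair for collating.

    Hypothetical helper: it is applied AFTER the backtranslation has been
    generated and does not touch what is fed to the tgt->src model.
    """
    src = list(generated_src)
    tgt = list(original_tgt)

    # Source side: strip the trailing EOS only if the caller asked for it.
    if remove_eos_at_src and src and src[-1] == eos:
        src = src[:-1]

    # Target side: always make sure the reference ends with EOS.
    if not tgt or tgt[-1] != eos:
        tgt.append(eos)

    return src, tgt


if __name__ == "__main__":
    src, tgt = format_backtranslation_pair([5, 6, 7, EOS], [8, 9], remove_eos_at_src=True)
    print(src, tgt)  # [5, 6, 7] [8, 9, 2]
```

With `remove_eos_at_src=False` the generated source is returned unchanged, while the target still gets an EOS appended if it lacks one.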

Reviewed By: jmp84

Differential Revision: D10157725

fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb
2018-10-03 18:23:32 -07:00
__init__.py fairseq-py goes distributed (#106) 2018-02-27 17:09:42 -05:00
test_average_checkpoints.py Merge internal changes (#136) 2018-04-02 10:13:07 -04:00
test_backtranslation_dataset.py Option to remove EOS at source in backtranslation dataset 2018-10-03 18:23:32 -07:00
test_binaries.py Fix proxying in DistributedFairseqModel 2018-10-03 16:22:35 -07:00
test_character_token_embedder.py fix tests 2018-09-03 19:15:23 -04:00
test_convtbc.py Remove more Variable() calls (#198) 2018-06-25 12:23:04 -04:00
test_dictionary.py Remove more Variable() calls (#198) 2018-06-25 12:23:04 -04:00
test_iterators.py Further generalize EpochBatchIterator and move iterators into new file 2018-09-03 19:15:23 -04:00
test_label_smoothing.py Add FairseqTask 2018-06-15 13:05:22 -06:00
test_noising.py fbshipit-source-id: 6a835d32f9dc5e0de118f1b46d365d0e0cc85e11 2018-09-30 12:28:20 -07:00
test_reproducibility.py Add unit test to verify reproducibility after reloading checkpoints 2018-09-25 17:36:43 -04:00
test_sequence_generator.py Online backtranslation module 2018-09-25 17:36:43 -04:00
test_sequence_scorer.py Add FairseqTask 2018-06-15 13:05:22 -06:00
test_train.py core changes to support latte collab 2018-09-25 17:36:43 -04:00
test_utils.py Remove more Variable() calls (#198) 2018-06-25 12:23:04 -04:00
utils.py Online backtranslation module 2018-09-25 17:36:43 -04:00