fairseq/tests
Xing Zhou e46b924dea Nucleus (top-P) sampling (#710)
Summary:
Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p.
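
As a rough illustration of the rule described above, the following is a minimal standalone sketch in plain Python. It is not the code added in this commit: the names `nucleus_filter` and `sample_top_p` are hypothetical, and it works on a single list of probabilities rather than on batched model outputs. It keeps the highest-probability elements until their cumulative mass exceeds p, renormalizes, and samples from that set.

```python
import random


def nucleus_filter(probs, p):
    """Return (kept_indices, renormalized_probs) for top-p filtering.

    Hypothetical helper for illustration only. Keeps the smallest set of
    highest-probability elements whose cumulative mass exceeds p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Visit indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass > p:
            # Stop as soon as the cumulative mass exceeds p.
            break
    # Renormalize so the kept probabilities sum to 1.
    return kept, [probs[i] / mass for i in kept]


def sample_top_p(probs, p, rng=random):
    """Draw one index from the nucleus of `probs` (hypothetical helper)."""
    kept, renorm = nucleus_filter(probs, p)
    return rng.choices(kept, weights=renorm, k=1)[0]


if __name__ == "__main__":
    dist = [0.5, 0.3, 0.15, 0.05]
    print(nucleus_filter(dist, 0.6))
    print(sample_top_p(dist, 0.6, random.Random(0)))
```

With a small p such as the 0.3 used in the command below, a distribution whose top element already has more than 0.3 of the mass reduces to that single element, so sampling becomes deterministic at that step.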

To test it:
python generate.py ~myleott/data/data-bin/wmt17_zh_en_full/ --path ~myleott/zh_en/model.pt --remove-bpe --nbest 5 --beam 5 --sampling --sampling-topp 0.3
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710

Test Plan:
python generate.py ~myleott/data/data-bin/wmt17_zh_en_full/ --path ~myleott/zh_en/model.pt --remove-bpe --nbest 5 --beam 5 --sampling --sampling-topp 0.3

python tests/test_sequence_generator.py

python tests/test_binaries.py

Reviewed By: myleott

Differential Revision: D16286688

Pulled By: xingz9

fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f
2019-07-17 06:21:33 -07:00
__init__.py fairseq-py goes distributed (#106) 2018-02-27 17:09:42 -05:00
test_average_checkpoints.py Implement reducing footprint of average checkpoint correctly (#747) 2019-05-24 12:12:24 -07:00
test_backtranslation_dataset.py Back translation + denoising in MultilingualTranslation task (#620) 2019-04-10 10:56:51 -07:00
test_binaries.py Nucleus (top-P) sampling (#710) 2019-07-17 06:21:33 -07:00
test_character_token_embedder.py fix tests 2018-09-03 19:15:23 -04:00
test_concat_dataset.py Make ConcatDataset work in PytorchTranslateTask multi-path dataset loading (#730) 2019-05-20 11:31:53 -07:00
test_convtbc.py Remove more Variable() calls (#198) 2018-06-25 12:23:04 -04:00
test_dictionary.py Move string line encoding logic from tokenizer to Dictionary (unified diff). (#541) 2019-02-28 09:19:12 -08:00
test_iterators.py Further generalize EpochBatchIterator and move iterators into new file 2018-09-03 19:15:23 -04:00
test_label_smoothing.py Add FairseqTask 2018-06-15 13:05:22 -06:00
test_memory_efficient_fp16.py Fix resuming training when using --memory-efficient-fp16 2019-06-23 14:19:16 -07:00
test_multi_corpus_sampled_dataset.py rm default_key from MultiCorpusSampledDataset 2019-05-14 16:45:21 -07:00
test_noising.py Refactor BacktranslationDataset to be more reusable (#354) 2018-11-25 21:26:03 -08:00
test_reproducibility.py Merge internal changes (#654) 2019-04-29 19:50:58 -07:00
test_sequence_generator.py Nucleus (top-P) sampling (#710) 2019-07-17 06:21:33 -07:00
test_sequence_scorer.py Modularize generate.py (#351) 2019-02-22 10:08:52 -08:00
test_token_block_dataset.py Improve init speed of TokenBlockDataset and EpochBatchIterator 2019-05-07 07:08:53 -07:00
test_train.py Add --reset-dataloader 2019-05-30 11:41:40 -07:00
test_utils.py Simplify and generalize utils.make_positions 2019-04-15 07:32:11 -07:00
utils.py Updates to model API (#561) 2019-05-15 07:12:41 -07:00