Commit Graph

79 Commits

Author SHA1 Message Date
Myle Ott
b4d57c6d49 Move TPU grad reductions out of Trainer into TPUDistributedDataParallel (#1397)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1397
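
The core of the change: the gradient all-reduce for TPUs now lives inside the data-parallel wrapper itself rather than in the Trainer. A minimal sketch of that shape, assuming torch_xla's `xm.all_reduce`; the class and method names are illustrative, not the actual fairseq `TPUDistributedDataParallel`:

```python
import torch.nn as nn
import torch_xla.core.xla_model as xm  # assumes torch_xla is installed


class SimpleTPUDDP(nn.Module):
    """Hypothetical wrapper: the module owns the gradient all-reduce, so the
    Trainer no longer needs any TPU-specific reduction logic."""

    def __init__(self, module, world_size):
        super().__init__()
        self.module = module
        self.world_size = world_size

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)

    def all_reduce_grads(self):
        # Called once after the backward pass (and any gradient accumulation).
        grads = [p.grad for p in self.module.parameters() if p.grad is not None]
        xm.all_reduce(xm.REDUCE_SUM, grads, scale=1.0 / self.world_size)
```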

Data parallel command: `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --share-decoder-input-output-embed --lr 0.0001`

Data parallel before:
```
2020-11-04 08:20:13 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:20:13 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:20:13 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:20:13 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:20:14 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:20:14 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:20:14 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:20:19 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=5
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108002, wps=160870, ups=4.91, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.8, wps=517232, ups=15.77, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.4, wps=537322, ups=16.38, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492.1, wps=540488, ups=16.48, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603.2, wps=543411, ups=16.57, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=540359, ups=16.47, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:20 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=533446, ups=16.26, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:20 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.4, wps=539734, ups=16.46, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=6
```

Data parallel after:
```
2020-11-04 08:14:02 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:14:02 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:14:02 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:14:02 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:14:03 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:14:03 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:14:03 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:14:08 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108002, wps=157099, ups=4.79, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.8, wps=560049, ups=17.08, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.4, wps=558507, ups=17.03, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492.1, wps=514194, ups=15.68, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603.2, wps=552676, ups=16.85, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=546402, ups=16.66, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=508472, ups=15.5, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.4, wps=552493, ups=16.84, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=6
```

Data parallel command (no_c10d): `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --share-decoder-input-output-embed --lr 0.0001 --ddp-backend no_c10d`

Data parallel before:
```
2020-11-04 08:19:25 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:19:25 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:19:25 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:19:25 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:19:25 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:19:26 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:19:26 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:19:31 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:19:31 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108001, wps=141659, ups=4.32, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.9, wps=503762, ups=15.36, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.5, wps=488599, ups=14.9, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492, wps=507855, ups=15.48, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603, wps=503270, ups=15.34, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=467778, ups=14.26, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=503800, ups=15.36, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.3, wps=468486, ups=14.28, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=7
```

Data parallel after:
```
2020-11-04 08:14:50 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:14:50 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:14:50 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:14:50 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:14:50 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:14:51 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:14:51 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:14:56 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108001, wps=137677, ups=4.2, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.9, wps=519541, ups=15.84, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.5, wps=517063, ups=15.76, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492, wps=490728, ups=14.95, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603, wps=505262, ups=15.41, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=508874, ups=15.52, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:57 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=518028, ups=15.79, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:57 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.3, wps=515996, ups=15.73, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=7
```

Model parallel command: `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm_megatron --decoder-layers 4 --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --model-parallel-size 2 --share-decoder-input-output-embed --lr 0.0001`

Model parallel before:
```
2020-11-04 08:18:38 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:18:38 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:18:38 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 08:18:38 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:18:38 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:18:39 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:18:39 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:18:44 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:18:45 | INFO | train_inner | epoch 001:      2 / 7173 loss=55.997, ppl=7.19017e+16, wps=0, ups=0, wpb=16384, bsz=32, num_updates=1, lr=0.0001, gnorm=14.03, loss_scale=64, train_wall=1, wall=7
2020-11-04 08:18:45 | INFO | train_inner | epoch 001:      3 / 7173 loss=28.372, ppl=3.47501e+08, wps=48371.7, ups=2.95, wpb=16384, bsz=32, num_updates=2, lr=0.0001, gnorm=15.339, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      4 / 7173 loss=15.855, ppl=59276.8, wps=72422.5, ups=4.42, wpb=16384, bsz=32, num_updates=3, lr=0.0001, gnorm=4.189, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      5 / 7173 loss=14.713, ppl=26858.7, wps=72933.5, ups=4.45, wpb=16384, bsz=32, num_updates=4, lr=0.0001, gnorm=4.751, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      6 / 7173 loss=13.901, ppl=15299.7, wps=71974.8, ups=4.39, wpb=16384, bsz=32, num_updates=5, lr=0.0001, gnorm=4.361, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      7 / 7173 loss=13.312, ppl=10169.5, wps=72897.8, ups=4.45, wpb=16384, bsz=32, num_updates=6, lr=0.0001, gnorm=3.307, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:      8 / 7173 loss=12.914, ppl=7720.21, wps=73044.6, ups=4.46, wpb=16384, bsz=32, num_updates=7, lr=0.0001, gnorm=5.473, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:      9 / 7173 loss=12.56, ppl=6036.72, wps=73453.1, ups=4.48, wpb=16384, bsz=32, num_updates=8, lr=0.0001, gnorm=6.112, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:     10 / 7173 loss=12.116, ppl=4437.77, wps=73442.6, ups=4.48, wpb=16384, bsz=32, num_updates=9, lr=0.0001, gnorm=4.415, loss_scale=64, train_wall=0, wall=9
```

Model parallel after:
```
2020-11-04 08:12:09 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:12:09 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:12:09 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 08:12:09 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:12:09 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:12:10 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:12:10 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:12:16 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      2 / 7173 loss=55.997, ppl=7.19017e+16, wps=0, ups=0, wpb=16384, bsz=32, num_updates=1, lr=0.0001, gnorm=14.03, loss_scale=64, train_wall=1, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      3 / 7173 loss=28.372, ppl=3.47501e+08, wps=53097, ups=3.24, wpb=16384, bsz=32, num_updates=2, lr=0.0001, gnorm=15.339, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      4 / 7173 loss=15.855, ppl=59276.8, wps=72355.5, ups=4.42, wpb=16384, bsz=32, num_updates=3, lr=0.0001, gnorm=4.189, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      5 / 7173 loss=14.713, ppl=26858.7, wps=70526.4, ups=4.3, wpb=16384, bsz=32, num_updates=4, lr=0.0001, gnorm=4.751, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      6 / 7173 loss=13.901, ppl=15299.7, wps=73063.5, ups=4.46, wpb=16384, bsz=32, num_updates=5, lr=0.0001, gnorm=4.361, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      7 / 7173 loss=13.312, ppl=10169.5, wps=73559.4, ups=4.49, wpb=16384, bsz=32, num_updates=6, lr=0.0001, gnorm=3.307, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      8 / 7173 loss=12.914, ppl=7720.21, wps=72693.2, ups=4.44, wpb=16384, bsz=32, num_updates=7, lr=0.0001, gnorm=5.473, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      9 / 7173 loss=12.56, ppl=6036.72, wps=73531.2, ups=4.49, wpb=16384, bsz=32, num_updates=8, lr=0.0001, gnorm=6.112, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:19 | INFO | train_inner | epoch 001:     10 / 7173 loss=12.116, ppl=4437.77, wps=73187.6, ups=4.47, wpb=16384, bsz=32, num_updates=9, lr=0.0001, gnorm=4.415, loss_scale=64, train_wall=0, wall=10
```

Test Plan: Imported from OSS

Reviewed By: ngoyal2707

Differential Revision: D24729295

Pulled By: myleott

fbshipit-source-id: beee8bdece3eaa0419a2e813990420411e507c75
2020-11-05 15:29:33 -08:00
Myle Ott
dd52ed0f38 Small fixes (#1392)
Summary:
- Set default value of clip-norm back to 0.0 (disabled)
- Add comment explaining that we divide loss by log(2) to convert the base (see the short example after this list)
- Fix `--zero-optimizer=os` (fixes #2811)
- Update requirements to PyTorch >= 1.5
- Fix bug in fixed LR schedule
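
For reference, the base conversion in the second item is just a change of logarithm base; a minimal, self-contained illustration (plain Python, not fairseq code):

```python
import math

# Cross-entropy is computed with natural logarithms (nats); dividing by
# ln(2) re-expresses it in bits, since log2(x) = ln(x) / ln(2).
loss_nats = math.log(8)              # example value: ln(8) ≈ 2.079 nats
loss_bits = loss_nats / math.log(2)  # 3.0 bits
print(loss_bits)
```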

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1392

Reviewed By: alexeib

Differential Revision: D24714231

Pulled By: myleott

fbshipit-source-id: 63dc8cfc74683bbccbf05b44228014eb12ddbfc7
2020-11-03 20:45:06 -08:00
Armen Aghajanyan
f2fa07106c RXF OS Implementation (#2455)
Summary:
## What does this PR do?
Implements R3F and R4F coming from Facebook Research: https://arxiv.org/abs/2008.03156

This code was used to generate all the results from the paper excluding probing results.
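
For context, a rough sketch of the R3F idea (perturb the input embeddings with small noise and penalize the symmetric KL divergence between the clean and noisy output distributions). Function names and the uniform-noise choice are illustrative, not the fairseq criterion's actual API; assumes PyTorch >= 1.6 for `log_target`:

```python
import torch
import torch.nn.functional as F


def symmetric_kl(p_logits, q_logits):
    # Symmetric KL between the two output distributions (both given as logits).
    p = F.log_softmax(p_logits, dim=-1)
    q = F.log_softmax(q_logits, dim=-1)
    return (F.kl_div(p, q, log_target=True, reduction="sum")
            + F.kl_div(q, p, log_target=True, reduction="sum"))


def r3f_loss(model_forward, token_embeddings, task_loss_fn, eps=1e-5, lam=1.0):
    clean_logits = model_forward(token_embeddings)
    # Second pass on embeddings perturbed with small uniform noise.
    noise = torch.empty_like(token_embeddings).uniform_(-eps, eps)
    noisy_logits = model_forward(token_embeddings + noise)
    # Task loss on the clean pass plus the consistency regularizer.
    return task_loss_fn(clean_logits) + lam * symmetric_kl(clean_logits, noisy_logits)
```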

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2455

Reviewed By: myleott

Differential Revision: D23444863

Pulled By: AkshatSh

fbshipit-source-id: b724a6d6cc9cebfdb4bd219828afbb5679f2259b
2020-10-16 14:32:12 -07:00
Xian Li
573c2f4b60 Opensource code for Deep Transformer with Latent Depth (#2703)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Opensource code for Deep Transformer with Latent Depth (https://arxiv.org/pdf/2009.13102.pdf).

New features and design choices made:

- New feature: allow the non-residual branch to be weighted by a sample z (generated per batch) instead of the plain `x = residual + x`.
- Design choice: move `x = residual + x` in transformer_layer.py into a function that the subclass (with latent depth) can override to `x = residual + z*x` (see the sketch after this list).

- New feature: allow TransformerEncoder or TransformerDecoder to have additional logits parameters which will generate the samples z.
- Design choice: added subclass LatentTransformerEncoder and LatentTransformerDecoder, which has additional attributes for the logits parameters, and instantiate the corresponding LatentTransformerEncoderLayer and LatentTransformerDecoderLayer.

- New feature: allow multilingual_translation task to train with latent depth (results in the paper).
- Design choice:
  - added additional arguments in the multilingual_translation task.
  - added option for multilingual_transformer to use LatentTransformerEncoder and LatentTransformerDecoder besides standard TransformerEncoder.
  - added option in multilingual_translation task's `train_step` to generate the samples z and compute the KL (and sparsity) loss per batch.
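
As an illustration of the residual hook mentioned in the design choices, a minimal sketch; class names are hypothetical (the actual classes are LatentTransformerEncoderLayer/LatentTransformerDecoderLayer):

```python
import torch.nn as nn


class TransformerLayerSketch(nn.Module):
    # Residual addition factored into a method so a subclass can change it.
    def residual_connection(self, x, residual):
        return residual + x


class LatentDepthLayerSketch(TransformerLayerSketch):
    def __init__(self):
        super().__init__()
        self.z = 1.0  # per-batch sample set by the encoder/decoder before forward

    def residual_connection(self, x, residual):
        # z scales this block's contribution; z near 0 effectively skips the layer.
        return residual + self.z * x
```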

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2703

Reviewed By: myleott

Differential Revision: D24155059

Pulled By: xianxl

fbshipit-source-id: f3e41639429f9664ec5565839709aa857a643668
2020-10-15 09:26:05 -07:00
Chau Tran
a2d0be4989 Add CRISS README and code to fairseq (#1344)
Summary:
# Before submitting

- [N] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [Y] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [Y] Did you make sure to update the docs?
- [N/A] Did you write any new necessary tests?

## What does this PR do?
Add code to reproduce results from Cross-lingual Retrieval for Iterative Self-supervised Training.
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1344

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

See https://github.com/fairinternal/fairseq-py/tree/criss_pr/examples/criss

Reviewed By: myleott

Differential Revision: D24268469

Pulled By: chtran

fbshipit-source-id: d4dd36b22bde3c364ce6e935bd39baf8f96e0735
2020-10-14 10:34:51 -07:00
Myle Ott
a524832d1d Publish Linformer to public fairseq
Summary: Initial open source release for Linformer

Reviewed By: madian9

Differential Revision: D22771263

fbshipit-source-id: bf08c64c5ecb899db9da00b79d09f6308347c915
2020-09-28 15:32:20 -07:00
Myle Ott
703fd48bb1 Fix README and #2496 (#2505)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2505

Reviewed By: shruti-bh

Differential Revision: D23247882

Pulled By: myleott

fbshipit-source-id: 1cfc9e0128e1aa55a1aca31d8dd30f231558e70f
2020-08-20 15:44:17 -07:00
Matt Post
bd1b35d9b7 Added constrained decoding (#1536) (#2402)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

This PR implements constrained decoding ([Hokamp & Liu, 2017](https://www.aclweb.org/anthology/P17-1141/); [Post & Vilar, 2018](https://www.aclweb.org/anthology/N18-1119/)) with vectorization for batching ([Hu et al., 2019](https://www.aclweb.org/anthology/N19-1090/)). In addition, it adds *ordered constraints*, where the constraints are generated on the target side in order, with zero or more unconstrained tokens in between. This variant allows for optimizations that increase speed and BLEU scores (when testing with random scraps from the references).

### Usage and quick start

It works with `fairseq-interactive` via a new command-line option: `fairseq-interactive --constraints [ordered,unordered]`, defaulting to `ordered` if nothing is provided. When active, it will split lines from STDIN on `\t`, with separate constraints each separated by a tab. For example (after downloading the [Fairseq WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md)):

```bash
echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\tinfluence" \
  | [normalize.py](https://gist.github.com/mjpost/4c54446b7030d7c64b57461d27090650) \
  | [tok.py](https://gist.github.com/mjpost/ed7456f6a987c533102fc121678ed302) \
  | PYTHONPATH=$HOME/code/fairseq-constraints fairseq-interactive $modeldir \
  --bpe fastbpe \
  --bpe-codes $modeldir/bpecodes \
  --constraints \
  --constraints-both
  -s de -t en \
  --path $modeldir/model1.pt \
  --max-tokens 1000 \
  --beam 5 \
```

Adding the `--constraints-both` option causes it to batch-decode the input sentence both with and without the constraints. When run with the Fairseq WMT19 German--English model, the following results are produced (here run on a CPU, don't be alarmed by the times!):

```text
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
C-0     hard
C-0     influence
H-0     -1.5333266258239746     Mach@@ ine trans@@ lation is hard to influence .
D-0     -1.5333266258239746     Machine translation is hard to influence .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.8031 -0.1701 -11.7727 -0.1815 -0.1511
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
H-0     -0.3731671869754791     Mach@@ ine trans@@ lation is difficult to control .
D-0     -0.3731671869754791     Machine translation is difficult to control .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.1430 -0.1665 -0.8482 -0.1678 -0.1514
2020-07-31 12:17:55 | INFO | fairseq_cli.interactive | Total time: 12.803 seconds; translation time: 3.688
```

Note the new tags present in the output:

* `C-#` records active constraints (after applying preprocessing) for a sentence
* `W-#` reports the sentence-level translation time (a useful unrelated feature I hope you'll accept)

Some unit tests are written (`fairseq/test_constraints.py`) but not yet integrated. Advice here on where to place this is welcome. I also have not run this through lint; if someone can tell me the command to run, I'd appreciate it.

### Implementation notes

This is largely self-contained, implemented in a new `LexicallyConstrainedBeamSearch` class in `search.py`. It does require a few minimal hooks from `_generate()` in `sequence_generator.py`, to ensure that constraints are updated at each timestep. (Edit: most changes in that file are documentation clarifications, corrections, and updates). Unconstrained sentences that are intermingled with constrained ones will not incur any time penalty, so long as they do not occur in the same batch.
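
For intuition, a toy sketch of the kind of per-hypothesis state that gets advanced at each timestep; the real `LexicallyConstrainedBeamSearch` additionally handles unordered constraints, multi-token constraint tries, and vectorized batching:

```python
class OrderedConstraintStateSketch:
    """Illustrative only: tracks progress through a flat, ordered list of
    constraint token ids that must appear in the hypothesis in order."""

    def __init__(self, tokens):
        self.tokens = tokens   # e.g. token ids of all constraints, concatenated in order
        self.position = 0      # how many constraint tokens have been generated so far

    def advance(self, token):
        # Called once per decoding timestep with the token just generated.
        if self.position < len(self.tokens) and token == self.tokens[self.position]:
            self.position += 1

    @property
    def finished(self):
        return self.position == len(self.tokens)
```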

Addresses https://github.com/pytorch/fairseq/issues/1536.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2402

Reviewed By: alexeib

Differential Revision: D23188945

Pulled By: myleott

fbshipit-source-id: 9f5ed855f7a1dcf535b091c0ccf98b07fb9cbdd6
2020-08-20 11:59:53 -07:00
Myle Ott
9831634946 Misc fixes (#2448)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2448

Reviewed By: ngoyal2707

Differential Revision: D23011193

Pulled By: myleott

fbshipit-source-id: 1a29481707108e4465aca78ec1581fb79f05efba
2020-08-14 10:24:51 -07:00
alexeib
621e834103 wav2vec 2.0 (#1220)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1220

Test Plan: Please see examples/wav2vec/README.md for instructions

Reviewed By: edunov

Differential Revision: D22707565

Pulled By: alexeib

fbshipit-source-id: 0c0d4ca7acc933ef7c0062f8dce550b94e414680
2020-08-04 14:19:56 -07:00
Myle Ott
f0a61a2774 Miscellaneous fixes (#1196)
Summary:
Incorporate several fixes, incl. from OSS contributors:
- fix model argument in sequence generator in semisupervised_translation.py
- fix aggregate logging in semisupervised_translation.py
- Fix EOS token in multilingual_denoising
- Handle missing eos_idx in data_utils.collate_tokens
- Better OOM handling for single-GPU training
- fix prepend_bos argument in translation_from_pretrained_bart.py …
- Fix eos_idx in multilingual_denoising
- Small logging fixes
- Fix fb_hub on PyTorch 1.6
- Better variable names
- Add support for model parallel to interactive.py
- Use `//` operator to fix Integer division warning
- Set default `--clip-norm=0.0`
- Cleanup some binaries in root directory

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1196

Reviewed By: ngoyal2707

Differential Revision: D22162202

Pulled By: myleott

fbshipit-source-id: 835b0c0ad9246827f9d915fdb4e89d7b5be2475d
2020-06-24 10:08:53 -07:00
Myle Ott
145bc9de12 Several small fixes (incl. set default --data-buffer-size=10) (#2163)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2163

Reviewed By: ngoyal2707

Differential Revision: D21665601

Pulled By: myleott

fbshipit-source-id: 47673ff7f07acf0002c4e28380aa08ff917618ee
2020-05-26 15:59:59 -07:00
Xutai Ma
12d5e6ff16 Monotonic multihead attention (#1707)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Add code for published paper from FB

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

*Still WIP*
jmp84
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1707

Reviewed By: jmp84

Differential Revision: D21304498

Pulled By: xutaima

fbshipit-source-id: 073d522e0eeef3e02c83e4617b8e5b697ff6979b
2020-04-30 13:26:32 -07:00
Angela Fan
1c8ab79ca5 quant noise code, readme, start of adding quantization (#1896)
Summary:
FUNCTIONALITY:
This diff provides two core pieces of functionality
- Adds training with quantization noise from "Training with Quantization Noise for Extreme Model Compression", controlled by the "quant_noise" and "quant_noise_block_size" parameters. Added in embeddings, attention, and FFN for BERT and Transformer LM training (a rough sketch of the idea follows below).
- Adds quantization with product quantization based on code from "And the bit goes down: Revisiting the quantization of neural networks" (Stock et al, 2019). This is applied to a fairseq-trained model to quantize it after training.
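
A toy sketch of the quantization-noise idea from the first point: during training, randomly mask blocks of a weight matrix (rescaling the survivors) so the network learns to tolerate those blocks later being replaced by quantized versions. The function is illustrative, not the fairseq `quant_noise` module itself:

```python
import torch


def quant_noise_weight(weight, p, block_size):
    # weight: (out_features, in_features); one Bernoulli draw per block of
    # `block_size` consecutive input dimensions decides whether it is dropped.
    out_features, in_features = weight.shape
    assert in_features % block_size == 0
    mask = torch.bernoulli(
        torch.full((out_features, in_features // block_size), p, device=weight.device)
    )
    mask = mask.repeat_interleave(block_size, dim=1).bool()
    # Drop the selected blocks and rescale the rest, as in dropout.
    return weight.masked_fill(mask, 0.0) / (1.0 - p)
```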

TODO:
-> Pierre, look at quantization code
-> int4 and int8 quantization will be added soon.

EVALUATED TEST CASES:

0. Training of LM and BERT models starts from scratch with no errors -> yes

1. Retrain LM from scratch with code, no quantization, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_noise

2. Reload previously trained LM from scratch, not trained with quant noise, reproduces Wikitext-103 LM results -> yes

3. Train LM from scratch with code, not trained with quant noise, reproduces Wikitext-103 LM results -> yes, see /checkpoint/angelafan/qn_open_source_baseline

4. Train BERT model from scratch with code, no quantization, training curve looks the same as before -> yes

5. Check wps during training and wps during inference, no large change from before -> yes

6. Check structured dropout isn't being applied at eval time -> yes

7. Works in combination with LayerDrop -> yes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1896

Reviewed By: myleott

Differential Revision: D20609420

Pulled By: huihuifan

fbshipit-source-id: 94468dd811c4caaaef46a9fab2b8d381f9d2b955
2020-04-21 09:28:56 -07:00
Naman Goyal
78a995db2f adding readme and releasing megatron big model (#1124)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1124

Reviewed By: myleott

Differential Revision: D20749898

fbshipit-source-id: 42bca96d8d65158ae858ceaa7386afedf1696ebb
2020-04-03 12:03:56 -07:00
Myle Ott
5065077dfc Use cross entropy from apex for improved memory efficiency (#1122)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1122

Reviewed By: ngoyal2707

Differential Revision: D20745717

Pulled By: myleott

fbshipit-source-id: 877a1185f17952461ef204d8ad7f05b8d37b1fd9
2020-03-31 08:59:59 -07:00
Changhan Wang
fdac9bbce1 Byte-Level BPE paper code
Summary:
Implemented byte-level BPE described in ["Neural Machine Translation with Byte-Level Subwords"](https://arxiv.org/abs/1909.03341)
* Added bytes/characters/byte-level BPE tokenizers to fairseq.data.encoder
* Added detokenization option to generate.py
* Added an example under examples/byte_level_bpe
* Implemented Transformer model with Bi-GRU embedding contextualization: `examples/byte_level_bpe/gru_transformer.py`
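
To illustrate the byte-level idea (a fixed 256-symbol base vocabulary, so there are never out-of-vocabulary characters), a minimal sketch; the `<xx>` symbol format is made up for illustration and is not the paper's actual mapping:

```python
def text_to_byte_tokens(text):
    # Any string maps onto symbols drawn from a 256-entry base vocabulary.
    return [f"<{b:02x}>" for b in text.encode("utf-8")]


def byte_tokens_to_text(tokens):
    data = bytes(int(t[1:-1], 16) for t in tokens)
    # errors="ignore": a hypothesis may end on an incomplete UTF-8 sequence.
    return data.decode("utf-8", errors="ignore")


print(text_to_byte_tokens("héllo"))  # ['<68>', '<c3>', '<a9>', '<6c>', '<6c>', '<6f>']
```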

Reviewed By: myleott

Differential Revision: D20600963

fbshipit-source-id: 3eca4d046056c07f65333123416017a4eac04c8a
2020-03-24 07:59:05 -07:00
Myle Ott
11cc356395 fairseq requires PyTorch >= 1.4 (#1844)
Summary:
Fixes https://github.com/pytorch/fairseq/issues/1843
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1844

Differential Revision: D20468391

Pulled By: myleott

fbshipit-source-id: 0b2e2ba35c94eeb49d0e6bb05a8fefa4b847f46d
2020-03-16 07:45:50 -07:00
Stefan Schweter
3dd221c90f Readme: Fix link to mBART documentation (#1789)
Summary:
Hi,

this PR updates the link to mBART documentation in main readme.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1789

Differential Revision: D20322673

Pulled By: myleott

fbshipit-source-id: b59c94f49176ba5bbd664791818b5b8ce7402698
2020-03-07 10:04:46 -08:00
Yinhan Liu
5e79322b3a open source mbart (#1033)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1033

Differential Revision: D20122520

Pulled By: yinhanliu

fbshipit-source-id: e2fd93e2fa9b7a8e276acc4316a176ba3ceae4ed
2020-02-27 08:30:43 -08:00
Myle Ott
37bd90845c Update README (#1051)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1051

Differential Revision: D20119560

Pulled By: myleott

fbshipit-source-id: caf089341931990777393b916c846a332b76e9dc
2020-02-26 09:13:47 -08:00
Pengcheng YIN
9090dad8d1 Update README for apex (#1563)
Summary:
Recent releases of apex removed the `fused_adam_cuda` function used in 3f4fc50163/fairseq/optim/adam.py (L220). Users need to use the `--deprecated_fused_adam` option to install `fused_adam_cuda`.

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1563

Differential Revision: D19260517

Pulled By: myleott

fbshipit-source-id: 69af015f3ef1fa85b98d138c28876ada194c9437
2020-01-02 11:37:30 -08:00
Myle Ott
6b4700cea4 Update README.md
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1521

Differential Revision: D19159323

Pulled By: myleott

fbshipit-source-id: 7e3fefc29229a90bcffe8bb1c5cff3507712d94c
2019-12-18 07:04:19 -08:00
Myle Ott
dfde36bc66 Create build.yml
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1515

Differential Revision: D19151562

Pulled By: myleott

fbshipit-source-id: 426eca1e449cac914d49877678323a6487c0adbe
2019-12-17 20:45:11 -08:00
Myle Ott
05514f8a82 Update README to indicate we only support Python >= 3.6 (fixes #1317)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/952

Differential Revision: D19133348

Pulled By: myleott

fbshipit-source-id: 51f96ddb13386143fe0088f19f7cb0674755811f
2019-12-16 19:46:53 -08:00
Changhan Wang
7612eefc6d add VizSeq to README
Summary: add VizSeq to README

Reviewed By: MultiPath

Differential Revision: D18877679

fbshipit-source-id: f1de226e37b19ec967dfcec91216521d4e5b6e22
2019-12-08 11:08:46 -08:00
Myle Ott
cb6c67bcdb Make torch.hub interface automatically apply tokenization and BPE
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/926
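
A usage sketch of what this enables (model name and options follow the public fairseq README examples; treat the exact identifiers as illustrative):

```python
import torch

# With tokenizer and BPE specified at load time, raw strings can be
# translated directly, without manual preprocessing.
en2de = torch.hub.load(
    'pytorch/fairseq', 'transformer.wmt19.en-de.single_model',
    tokenizer='moses', bpe='fastbpe',
)
print(en2de.translate('Machine learning is great!'))
```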

Differential Revision: D18685772

Pulled By: myleott

fbshipit-source-id: 0f99d79ed6ee72e9d3ced786d75ab9504d0dfcf0
2019-11-26 07:49:37 -08:00
Louis Martin
b31849aa92 Camembert model and code (#904)
Summary:
Check locally that everything works fine.
Model is uploaded to fbaipublicfiles.

I fixed a few inconsistencies in the bpe encoding along the way, e.g. related to https://github.com/pytorch/fairseq/issues/1306.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/904

Reviewed By: ngoyal2707

Differential Revision: D18418345

Pulled By: louismartin

fbshipit-source-id: 53acb4d021581968d70430ee9babee07d6573c17
2019-11-10 11:29:07 -08:00
Naman Goyal
a92bcdad5a adding first version of bart code release (#902)
Summary:
This is the first version of BART code / model release.

It still requires a lot of cleanup, instructions, and making sure results are reproducible before we can release it.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/902

Differential Revision: D18389535

fbshipit-source-id: 77f16800307ce831bd29538fdd34800793210f46
2019-11-08 21:01:16 -08:00
ngoyal2707
e23e5eaa32 XLM-R code and model release (#900)
Summary:
TODO:
1) Need to update bibtex entry
2) Need to upload models, spm_vocab and dict.txt to public s3 location.

For Future:

1) I will probably add instructions to finetune on XNLI and NER, POS etc. but currently no timeline for that.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900

Reviewed By: myleott

Differential Revision: D18333076

Pulled By: myleott

fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
2019-11-05 15:02:43 -08:00
alexeib
4cb895b6f6 add pre-trained wav2vec model
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/884

Differential Revision: D17774515

Pulled By: alexeib

fbshipit-source-id: d1ffe8ab723fa284c69b067bbd43d699eaa2f02f
2019-10-04 17:10:39 -07:00
Sarthak Garg
1c66792948 Implementation of the paper "Jointly Learning to Align and Translate with Transformer Models" (#877)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877

This PR implements guided alignment training described in "Jointly Learning to Align and Translate with Transformer Models" (https://arxiv.org/abs/1909.02074).

In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training.
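
For intuition, a minimal sketch of supervising one attention head with an external alignment (a simple negative log-likelihood of the aligned source position per target token); the paper's actual loss also handles one-to-many and unaligned target words:

```python
import torch


def alignment_loss(attn_probs, align_targets, eps=1e-8):
    # attn_probs: (tgt_len, src_len) attention distribution of the supervised head
    # align_targets: (tgt_len,) index of the aligned source token per target token
    picked = attn_probs.gather(1, align_targets.unsqueeze(1)).squeeze(1)
    return -(picked + eps).log().mean()
```
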
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095

Differential Revision: D17170337

Pulled By: myleott

fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
2019-09-30 06:57:32 -07:00
Changhan Wang
86857a58bf Levenshtein Transformer paper code
Summary:
Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
* Added Levenshtein Transformer model, task and criterion class
* Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model classes for baselines
* Add an option for prepending BOS to dictionary class and translation task class

Reviewed By: myleott

Differential Revision: D17297372

fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
2019-09-27 13:58:45 -07:00
Myle Ott
b870468689 Update READMEs
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/823

Differential Revision: D16804995

Pulled By: myleott

fbshipit-source-id: abac5dc0ed6b7bfe2309ba273456e54b37340b2c
2019-08-14 08:28:36 -07:00
Myle Ott
a33ac060de Add Commonsense QA task
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1014

Differential Revision: D16784120

Pulled By: myleott

fbshipit-source-id: 946c0e33b594f8378e4ab6482ce49efcb36e1743
2019-08-13 07:52:57 -07:00
Vincent Quenneville-Belair
838e108a91 MacOS requires c++ flag (#1000)
Summary:
To install on macOS, `-stdlib=libc++` needs to be specified.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1000

Differential Revision: D16733819

Pulled By: myleott

fbshipit-source-id: 7a1ed11e2b4e1071e61c64c379c84f72e02ad2b5
2019-08-09 10:07:09 -07:00
Myle Ott
e75cff5f2c Relicense fairseq under MIT license (#786)
Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786

Differential Revision: D16560654

Pulled By: myleott

fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034
2019-07-30 07:48:23 -07:00
Xing Zhou
2fe45f09a1 Update README.md to add top-p sampling (#783)
Summary:
Update README.md to include the recently implemented top-p/nucleus sampling.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/783
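
For reference, a toy sketch of top-p (nucleus) sampling over a single next-token distribution; this shows the general technique, not fairseq's sampling implementation:

```python
import torch


def nucleus_sample(logits, p=0.9):
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Drop tokens whose preceding cumulative mass already exceeds p, so the
    # smallest set of tokens covering at least p probability is kept.
    sorted_probs[cumulative - sorted_probs > p] = 0.0
    sorted_probs /= sorted_probs.sum()
    choice = torch.multinomial(sorted_probs, 1)
    return sorted_idx[choice]
```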

Differential Revision: D16543974

Pulled By: myleott

fbshipit-source-id: 27c502af10ee390d29607038118a99ff0067aec4
2019-07-29 10:38:47 -07:00
Myle Ott
8d036c2fe0 Add RoBERTa
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/916

Differential Revision: D16537774

Pulled By: myleott

fbshipit-source-id: 86bb7b1913a428ee4a21674cc3fc7b39264067ec
2019-07-28 18:44:02 -07:00
alexeib
392fce8a98 wav2vec model (#654)
Summary:
Merging wav2vec to master. Includes renames (Cpc -> wav2vec) and some light example files.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/654

Differential Revision: D15913409

Pulled By: alexeib

fbshipit-source-id: f723e6f211706cd9431c7d76dc12c4e80c9cfc80
2019-06-19 19:24:48 -07:00
Bairen Yi
a8f28ecb63 Python3.5 compat (#794)
Summary:
See #467. Ping myleott to review.

This is a work-related contribution. Ping lark to review.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/794

Differential Revision: D15756816

Pulled By: myleott

fbshipit-source-id: 6dce3ff3a713bf5f60e5782bc260b2ca9d2c0a9b
2019-06-11 04:10:08 -07:00
Khoa Ho
d5f76d7446 Clarify mixed precision training support (#766)
Summary:
Change the wording to avoid confusion. Mixed precision ensures both higher arithmetic throughput and numerical stability, and is not exactly synonymous with pure half-precision/FP16 training. Also add a mention of tensor cores, since older-generation GPUs without tensor cores don't support true mixed precision training.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/766

Differential Revision: D15559565

Pulled By: myleott

fbshipit-source-id: c71e720772657bb3e8ad330b58bf69e23beb614e
2019-05-30 12:07:48 -07:00
Myle Ott
849605a0fd Update comments and citations
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/676

Differential Revision: D15114128

Pulled By: myleott

fbshipit-source-id: b11dde77b2f2610d33649101aea03fb5a3eeb56a
2019-04-29 08:01:16 -07:00
Myle Ott
e6422528da 0.6.1 -> 0.6.2 (#577)
Summary:
Changelog:
- 998ba4f: Add language models from Baevski & Auli (2018)
- 4294c4f: Add mixture of experts code from Shen et al. (2019)
- 0049349: Add example for multilingual training
- 48d9afb: Speed improvements, including fused operators from apex
- 44d27e6: Add Tensorboard support
- d17fa85: Add Adadelta optimizer
- 9e1c880: Add `FairseqEncoderModel`
- b65c579: Add `FairseqTask.inference_step` to modularize generate.py
- 2ad1178: Add back `--curriculum`
- Misc bug fixes and other features

Pull Request resolved: https://github.com/pytorch/fairseq/pull/577

Differential Revision: D14481233

Pulled By: myleott

fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
2019-03-15 10:27:01 -07:00
Myle Ott
48d9afbeb3 Speed improvements (#531)
Summary:
* Add FusedLayerNorm and FusedAdam
* Softmax and zero grad optimizations
Pull Request resolved: https://github.com/pytorch/fairseq/pull/531

Differential Revision: D14218457

Pulled By: myleott

fbshipit-source-id: 5656b2d0152cd85f77dc21ec0e1439ec04b9fa89
2019-03-14 11:42:19 -07:00
Myle Ott
392bdd6ce0 Update README for Mixture of Experts paper
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/522

Differential Revision: D14194672

Pulled By: myleott

fbshipit-source-id: 4ff669826c4313de6f12076915cfb1bd15289ef0
2019-02-22 16:34:45 -08:00
Myle Ott
4294c4f6d7 Add code for mixture of experts (#521)
Summary:
Code for the paper: [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](https://arxiv.org/abs/1902.07816).
Pull Request resolved: https://github.com/pytorch/fairseq/pull/521

Differential Revision: D14188021

Pulled By: myleott

fbshipit-source-id: ed5b1ed5ad9a582359bd5215fa2ea26dc76c673e
2019-02-22 13:14:09 -08:00
Myle Ott
fbd4cef9a5 Add fairseq to PyPI (#495)
Summary:
- fairseq can now be installed via pip: `pip install fairseq`
- command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/495

Differential Revision: D14017761

Pulled By: myleott

fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
2019-02-08 22:03:29 -08:00
Myle Ott
b41c74dc5b Add code for "Pay Less Attention with Lightweight and Dynamic Convolutions" (#473)
Summary:
Changelog:
- `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper
- `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu
- update READMEs
- misc fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/473

Differential Revision: D13819717

Pulled By: myleott

fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9
2019-01-25 15:40:26 -08:00
Huihui Fan
d9284ee7ea Fixes (#442)
Summary:
minor fixes:
1- adding fairseq logo
2- encoder padding for fconv self att
3- legacy ddp change
Pull Request resolved: https://github.com/pytorch/fairseq/pull/442

Differential Revision: D13651715

Pulled By: myleott

fbshipit-source-id: ac93c80f1dbffdfe03fbd4b8a8ea527aecb576a7
2019-01-14 08:58:51 -08:00