Commit Graph

1580 Commits

Myle Ott
b4d57c6d49 Move TPU grad reductions out of Trainer into TPUDistributedDataParallel (#1397)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1397

Data parallel command: `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --share-decoder-input-output-embed --lr 0.0001`
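The change moves the cross-replica gradient reduction out of the Trainer and into the module wrapper, which averages each parameter's gradient across replicas after the backward pass. A toy, framework-free sketch of what that reduction computes (the real wrapper would use an XLA/torch collective all-reduce, not plain Python):

```python
def all_reduce_grads(replica_grads):
    # Average each parameter's gradient across replicas, as a
    # DDP-style wrapper does after the backward pass.
    num_replicas = len(replica_grads)
    num_params = len(replica_grads[0])
    averaged = []
    for p in range(num_params):
        total = sum(grads[p] for grads in replica_grads)
        averaged.append(total / num_replicas)
    return averaged

# Gradients for 2 parameters on 4 replicas:
grads = [[1.0, 2.0], [3.0, 4.0], [1.0, 2.0], [3.0, 4.0]]
print(all_reduce_grads(grads))  # [2.0, 3.0]
```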

Data parallel before:
```
2020-11-04 08:20:13 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:20:13 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:20:13 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:20:13 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:20:14 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:20:14 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:20:14 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:20:19 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=5
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108002, wps=160870, ups=4.91, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.8, wps=517232, ups=15.77, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.4, wps=537322, ups=16.38, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492.1, wps=540488, ups=16.48, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603.2, wps=543411, ups=16.57, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:19 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=540359, ups=16.47, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:20 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=533446, ups=16.26, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:20:20 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.4, wps=539734, ups=16.46, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=6
```

Data parallel after:
```
2020-11-04 08:14:02 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:14:02 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:14:02 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:14:02 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:14:03 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:14:03 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:14:03 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:14:08 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108002, wps=157099, ups=4.79, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.8, wps=560049, ups=17.08, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.4, wps=558507, ups=17.03, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492.1, wps=514194, ups=15.68, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:08 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603.2, wps=552676, ups=16.85, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=546402, ups=16.66, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=508472, ups=15.5, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:09 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.4, wps=552493, ups=16.84, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=6
```

Data parallel command (no_c10d): `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --share-decoder-input-output-embed --lr 0.0001 --ddp-backend no_c10d`

Data parallel before:
```
2020-11-04 08:19:25 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:19:25 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:19:25 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:19:25 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:19:25 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:19:26 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:19:26 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:19:31 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:19:31 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108001, wps=141659, ups=4.32, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.9, wps=503762, ups=15.36, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.5, wps=488599, ups=14.9, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492, wps=507855, ups=15.48, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603, wps=503270, ups=15.34, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=467778, ups=14.26, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=503800, ups=15.36, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=7
2020-11-04 08:19:32 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.3, wps=468486, ups=14.28, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=7
```

Data parallel after:
```
2020-11-04 08:14:50 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:14:50 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:14:50 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 08:14:50 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:14:50 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:14:51 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:14:51 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:14:56 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      2 / 3587 loss=19.682, ppl=841142, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=13.17, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      3 / 3587 loss=16.721, ppl=108001, wps=137677, ups=4.2, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=4.507, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      4 / 3587 loss=16.07, ppl=68785.9, wps=519541, ups=15.84, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=2.737, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      5 / 3587 loss=15.714, ppl=53741.5, wps=517063, ups=15.76, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=2.542, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      6 / 3587 loss=15.441, ppl=44492, wps=490728, ups=14.95, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=2.485, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      7 / 3587 loss=15.199, ppl=37603, wps=505262, ups=15.41, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.382, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:56 | INFO | train_inner | epoch 001:      8 / 3587 loss=14.984, ppl=32414, wps=508874, ups=15.52, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.274, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:57 | INFO | train_inner | epoch 001:      9 / 3587 loss=14.7, ppl=26622.2, wps=518028, ups=15.79, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.16, loss_scale=64, train_wall=0, wall=6
2020-11-04 08:14:57 | INFO | train_inner | epoch 001:     10 / 3587 loss=14.482, ppl=22875.3, wps=515996, ups=15.73, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.055, loss_scale=64, train_wall=0, wall=7
```

Model parallel command: `python train.py ~/data/data-bin/wikitext-103-roberta-bpe-bin/ --task language_modeling --arch transformer_lm_megatron --decoder-layers 4 --batch-size 8 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --model-parallel-size 2 --share-decoder-input-output-embed --lr 0.0001`

Model parallel before:
```
2020-11-04 08:18:38 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:18:38 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:18:38 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 08:18:38 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:18:38 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:18:39 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:18:39 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:18:44 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:18:45 | INFO | train_inner | epoch 001:      2 / 7173 loss=55.997, ppl=7.19017e+16, wps=0, ups=0, wpb=16384, bsz=32, num_updates=1, lr=0.0001, gnorm=14.03, loss_scale=64, train_wall=1, wall=7
2020-11-04 08:18:45 | INFO | train_inner | epoch 001:      3 / 7173 loss=28.372, ppl=3.47501e+08, wps=48371.7, ups=2.95, wpb=16384, bsz=32, num_updates=2, lr=0.0001, gnorm=15.339, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      4 / 7173 loss=15.855, ppl=59276.8, wps=72422.5, ups=4.42, wpb=16384, bsz=32, num_updates=3, lr=0.0001, gnorm=4.189, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      5 / 7173 loss=14.713, ppl=26858.7, wps=72933.5, ups=4.45, wpb=16384, bsz=32, num_updates=4, lr=0.0001, gnorm=4.751, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      6 / 7173 loss=13.901, ppl=15299.7, wps=71974.8, ups=4.39, wpb=16384, bsz=32, num_updates=5, lr=0.0001, gnorm=4.361, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:18:46 | INFO | train_inner | epoch 001:      7 / 7173 loss=13.312, ppl=10169.5, wps=72897.8, ups=4.45, wpb=16384, bsz=32, num_updates=6, lr=0.0001, gnorm=3.307, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:      8 / 7173 loss=12.914, ppl=7720.21, wps=73044.6, ups=4.46, wpb=16384, bsz=32, num_updates=7, lr=0.0001, gnorm=5.473, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:      9 / 7173 loss=12.56, ppl=6036.72, wps=73453.1, ups=4.48, wpb=16384, bsz=32, num_updates=8, lr=0.0001, gnorm=6.112, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:18:47 | INFO | train_inner | epoch 001:     10 / 7173 loss=12.116, ppl=4437.77, wps=73442.6, ups=4.48, wpb=16384, bsz=32, num_updates=9, lr=0.0001, gnorm=4.415, loss_scale=64, train_wall=0, wall=9
```

Model parallel after:
```
2020-11-04 08:12:09 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 08:12:09 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 08:12:09 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 08:12:09 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 08:12:09 | INFO | fairseq.data.data_utils | loaded 1801350 examples from: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/train
2020-11-04 08:12:10 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 08:12:10 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 08:12:16 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      2 / 7173 loss=55.997, ppl=7.19017e+16, wps=0, ups=0, wpb=16384, bsz=32, num_updates=1, lr=0.0001, gnorm=14.03, loss_scale=64, train_wall=1, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      3 / 7173 loss=28.372, ppl=3.47501e+08, wps=53097, ups=3.24, wpb=16384, bsz=32, num_updates=2, lr=0.0001, gnorm=15.339, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      4 / 7173 loss=15.855, ppl=59276.8, wps=72355.5, ups=4.42, wpb=16384, bsz=32, num_updates=3, lr=0.0001, gnorm=4.189, loss_scale=64, train_wall=0, wall=8
2020-11-04 08:12:17 | INFO | train_inner | epoch 001:      5 / 7173 loss=14.713, ppl=26858.7, wps=70526.4, ups=4.3, wpb=16384, bsz=32, num_updates=4, lr=0.0001, gnorm=4.751, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      6 / 7173 loss=13.901, ppl=15299.7, wps=73063.5, ups=4.46, wpb=16384, bsz=32, num_updates=5, lr=0.0001, gnorm=4.361, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      7 / 7173 loss=13.312, ppl=10169.5, wps=73559.4, ups=4.49, wpb=16384, bsz=32, num_updates=6, lr=0.0001, gnorm=3.307, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      8 / 7173 loss=12.914, ppl=7720.21, wps=72693.2, ups=4.44, wpb=16384, bsz=32, num_updates=7, lr=0.0001, gnorm=5.473, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:18 | INFO | train_inner | epoch 001:      9 / 7173 loss=12.56, ppl=6036.72, wps=73531.2, ups=4.49, wpb=16384, bsz=32, num_updates=8, lr=0.0001, gnorm=6.112, loss_scale=64, train_wall=0, wall=9
2020-11-04 08:12:19 | INFO | train_inner | epoch 001:     10 / 7173 loss=12.116, ppl=4437.77, wps=73187.6, ups=4.47, wpb=16384, bsz=32, num_updates=9, lr=0.0001, gnorm=4.415, loss_scale=64, train_wall=0, wall=10
```

Test Plan: Imported from OSS

Reviewed By: ngoyal2707

Differential Revision: D24729295

Pulled By: myleott

fbshipit-source-id: beee8bdece3eaa0419a2e813990420411e507c75
2020-11-05 15:29:33 -08:00
Myle Ott
f57b148938 Require process group for all helpers in distributed_utils (#1395)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1395

Data parallel command: `python train.py --task dummy_lm   --arch transformer_lm --tokens-per-sample 512   --max-sentences 8 --decoder-attention-heads 8 --dropout 0.0 --activation-dropout 0.0   --optimizer adam --lr 0.0001   --log-format simple --log-interval 1 --no-save --clip-norm 0.0`

Data parallel before:
```
2020-11-04 07:14:16 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 07:14:16 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 07:14:16 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 07:14:16 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 07:14:16 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
2020-11-04 07:14:16 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 07:14:16 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 07:14:21 | INFO | train_inner | epoch 001:      1 / 1563 loss=16.297, ppl=80495, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=2.501, train_wall=2, wall=5
2020-11-04 07:14:21 | INFO | train_inner | epoch 001:      2 / 1563 loss=15.399, ppl=43203.8, wps=101398, ups=3.09, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=2.101, train_wall=0, wall=6
2020-11-04 07:14:21 | INFO | train_inner | epoch 001:      3 / 1563 loss=14.742, ppl=27411.2, wps=217567, ups=6.63, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=1.888, train_wall=0, wall=6
2020-11-04 07:14:21 | INFO | train_inner | epoch 001:      4 / 1563 loss=14.206, ppl=18899.3, wps=219413, ups=6.69, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=1.91, train_wall=0, wall=6
2020-11-04 07:14:22 | INFO | train_inner | epoch 001:      5 / 1563 loss=13.697, ppl=13282.1, wps=219446, ups=6.69, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=1.98, train_wall=0, wall=6
2020-11-04 07:14:22 | INFO | train_inner | epoch 001:      6 / 1563 loss=13.179, ppl=9274.18, wps=220131, ups=6.71, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.08, train_wall=0, wall=6
2020-11-04 07:14:22 | INFO | train_inner | epoch 001:      7 / 1563 loss=12.634, ppl=6358.37, wps=220236, ups=6.72, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.195, train_wall=0, wall=6
2020-11-04 07:14:22 | INFO | train_inner | epoch 001:      8 / 1563 loss=12.056, ppl=4256.86, wps=220392, ups=6.72, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.259, train_wall=0, wall=6
2020-11-04 07:14:22 | INFO | train_inner | epoch 001:      9 / 1563 loss=11.453, ppl=2804.05, wps=225842, ups=6.89, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.287, train_wall=0, wall=7
2020-11-04 07:14:22 | INFO | train_inner | epoch 001:     10 / 1563 loss=10.842, ppl=1835, wps=238808, ups=7.28, wpb=32768, bsz=64, num_updates=10, lr=0.0001, gnorm=2.311, train_wall=0, wall=7
```

Data parallel after:
```
2020-11-04 07:14:47 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 07:14:47 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8
2020-11-04 07:14:47 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-11-04 07:14:47 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 07:14:47 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
2020-11-04 07:14:47 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 07:14:47 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 07:14:52 | INFO | train_inner | epoch 001:      1 / 1563 loss=16.297, ppl=80495, wps=0, ups=0, wpb=32768, bsz=64, num_updates=1, lr=0.0001, gnorm=2.501, train_wall=2, wall=5
2020-11-04 07:14:52 | INFO | train_inner | epoch 001:      2 / 1563 loss=15.399, ppl=43203.8, wps=96089.4, ups=2.93, wpb=32768, bsz=64, num_updates=2, lr=0.0001, gnorm=2.101, train_wall=0, wall=5
2020-11-04 07:14:52 | INFO | train_inner | epoch 001:      3 / 1563 loss=14.742, ppl=27411.2, wps=239285, ups=7.3, wpb=32768, bsz=64, num_updates=3, lr=0.0001, gnorm=1.888, train_wall=0, wall=6
2020-11-04 07:14:53 | INFO | train_inner | epoch 001:      4 / 1563 loss=14.206, ppl=18899.3, wps=233039, ups=7.11, wpb=32768, bsz=64, num_updates=4, lr=0.0001, gnorm=1.91, train_wall=0, wall=6
2020-11-04 07:14:53 | INFO | train_inner | epoch 001:      5 / 1563 loss=13.697, ppl=13282.1, wps=237484, ups=7.24, wpb=32768, bsz=64, num_updates=5, lr=0.0001, gnorm=1.98, train_wall=0, wall=6
2020-11-04 07:14:53 | INFO | train_inner | epoch 001:      6 / 1563 loss=13.179, ppl=9274.18, wps=231683, ups=7.07, wpb=32768, bsz=64, num_updates=6, lr=0.0001, gnorm=2.08, train_wall=0, wall=6
2020-11-04 07:14:53 | INFO | train_inner | epoch 001:      7 / 1563 loss=12.634, ppl=6358.37, wps=233804, ups=7.13, wpb=32768, bsz=64, num_updates=7, lr=0.0001, gnorm=2.195, train_wall=0, wall=6
2020-11-04 07:14:53 | INFO | train_inner | epoch 001:      8 / 1563 loss=12.056, ppl=4256.86, wps=234025, ups=7.14, wpb=32768, bsz=64, num_updates=8, lr=0.0001, gnorm=2.259, train_wall=0, wall=6
2020-11-04 07:14:53 | INFO | train_inner | epoch 001:      9 / 1563 loss=11.453, ppl=2804.05, wps=238426, ups=7.27, wpb=32768, bsz=64, num_updates=9, lr=0.0001, gnorm=2.287, train_wall=0, wall=6
2020-11-04 07:14:53 | INFO | train_inner | epoch 001:     10 / 1563 loss=10.842, ppl=1835, wps=240069, ups=7.32, wpb=32768, bsz=64, num_updates=10, lr=0.0001, gnorm=2.311, train_wall=0, wall=6
```

Model parallel command: `python train.py --task dummy_lm --arch transformer_lm_megatron --decoder-layers 2 --batch-size 2 --tokens-per-sample 512 --log-format simple --log-interval 1 --fp16 --optimizer adam --model-parallel-size 2 --share-decoder-input-output-embed --lr 0.0001`

Model parallel before:
```
2020-11-04 07:12:22 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 07:12:22 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 2
2020-11-04 07:12:22 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 07:12:22 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 07:12:23 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 07:12:23 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 07:12:28 | INFO | train_inner | epoch 001:      1 / 12500 loss=60.017, ppl=1.16627e+18, wps=0, ups=0, wpb=4096, bsz=8, num_updates=1, lr=0.0001, gnorm=8.531, loss_scale=128, train_wall=2, wall=6
2020-11-04 07:12:28 | INFO | train_inner | epoch 001:      2 / 12500 loss=46.473, ppl=9.77028e+13, wps=48996.6, ups=11.95, wpb=4096, bsz=8, num_updates=2, lr=0.0001, gnorm=15.019, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:12:28 | INFO | train_inner | epoch 001:      3 / 12500 loss=30.525, ppl=1.54543e+09, wps=58424.2, ups=14.25, wpb=4096, bsz=8, num_updates=3, lr=0.0001, gnorm=13.936, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:12:28 | INFO | train_inner | epoch 001:      4 / 12500 loss=18.561, ppl=386799, wps=58399.5, ups=14.24, wpb=4096, bsz=8, num_updates=4, lr=0.0001, gnorm=7.251, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:12:28 | INFO | train_inner | epoch 001:      5 / 12500 loss=15.145, ppl=36230, wps=58275.6, ups=14.21, wpb=4096, bsz=8, num_updates=5, lr=0.0001, gnorm=2.392, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:12:28 | INFO | train_inner | epoch 001:      6 / 12500 loss=14.683, ppl=26304.2, wps=58704.8, ups=14.32, wpb=4096, bsz=8, num_updates=6, lr=0.0001, gnorm=2.487, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:12:28 | INFO | train_inner | epoch 001:      7 / 12500 loss=14.169, ppl=18418.9, wps=58449.2, ups=14.26, wpb=4096, bsz=8, num_updates=7, lr=0.0001, gnorm=2.45, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:12:29 | INFO | train_inner | epoch 001:      8 / 12500 loss=13.574, ppl=12197.4, wps=59106.5, ups=14.42, wpb=4096, bsz=8, num_updates=8, lr=0.0001, gnorm=2.393, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:12:29 | INFO | train_inner | epoch 001:      9 / 12500 loss=12.974, ppl=8047.87, wps=58619.6, ups=14.3, wpb=4096, bsz=8, num_updates=9, lr=0.0001, gnorm=2.317, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:12:29 | INFO | train_inner | epoch 001:     10 / 12500 loss=12.341, ppl=5187.55, wps=58166.5, ups=14.19, wpb=4096, bsz=8, num_updates=10, lr=0.0001, gnorm=2.213, loss_scale=128, train_wall=0, wall=6
```

Model parallel after:
```
2020-11-04 07:11:07 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-11-04 07:11:07 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 2
2020-11-04 07:11:07 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last-model_part-0.pt
2020-11-04 07:11:07 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-04 07:11:08 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-04 07:11:08 | INFO | fairseq.trainer | begin training epoch 1
2020-11-04 07:11:13 | INFO | train_inner | epoch 001:      1 / 12500 loss=60.017, ppl=1.16627e+18, wps=0, ups=0, wpb=4096, bsz=8, num_updates=1, lr=0.0001, gnorm=8.531, loss_scale=128, train_wall=2, wall=6
2020-11-04 07:11:13 | INFO | train_inner | epoch 001:      2 / 12500 loss=46.473, ppl=9.77028e+13, wps=47018.1, ups=11.47, wpb=4096, bsz=8, num_updates=2, lr=0.0001, gnorm=15.019, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:11:13 | INFO | train_inner | epoch 001:      3 / 12500 loss=30.525, ppl=1.54543e+09, wps=59292.6, ups=14.46, wpb=4096, bsz=8, num_updates=3, lr=0.0001, gnorm=13.936, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:11:13 | INFO | train_inner | epoch 001:      4 / 12500 loss=18.561, ppl=386799, wps=57708.9, ups=14.08, wpb=4096, bsz=8, num_updates=4, lr=0.0001, gnorm=7.251, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:11:14 | INFO | train_inner | epoch 001:      5 / 12500 loss=15.145, ppl=36230, wps=57427.4, ups=14.01, wpb=4096, bsz=8, num_updates=5, lr=0.0001, gnorm=2.392, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:11:14 | INFO | train_inner | epoch 001:      6 / 12500 loss=14.683, ppl=26304.2, wps=58730.2, ups=14.33, wpb=4096, bsz=8, num_updates=6, lr=0.0001, gnorm=2.487, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:11:14 | INFO | train_inner | epoch 001:      7 / 12500 loss=14.169, ppl=18418.9, wps=59523.2, ups=14.52, wpb=4096, bsz=8, num_updates=7, lr=0.0001, gnorm=2.45, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:11:14 | INFO | train_inner | epoch 001:      8 / 12500 loss=13.574, ppl=12197.4, wps=58945.2, ups=14.38, wpb=4096, bsz=8, num_updates=8, lr=0.0001, gnorm=2.393, loss_scale=128, train_wall=0, wall=6
2020-11-04 07:11:14 | INFO | train_inner | epoch 001:      9 / 12500 loss=12.974, ppl=8047.87, wps=59659.2, ups=14.55, wpb=4096, bsz=8, num_updates=9, lr=0.0001, gnorm=2.317, loss_scale=128, train_wall=0, wall=7
2020-11-04 07:11:14 | INFO | train_inner | epoch 001:     10 / 12500 loss=12.341, ppl=5187.55, wps=59681.4, ups=14.56, wpb=4096, bsz=8, num_updates=10, lr=0.0001, gnorm=2.213, loss_scale=128, train_wall=0, wall=7
```

Test Plan: Imported from OSS

Reviewed By: ngoyal2707

Differential Revision: D24728687

Pulled By: myleott

fbshipit-source-id: 2d387d022ee889494f429b98df1942167896e306
2020-11-05 09:44:32 -08:00
alexeib
b58f4f017e end to end hydra configs (#1393)
Summary:
this adds a `hydra_train` binary that uses Hydra configs and command-line overrides instead of argparse

use case 1: built-in configs + overrides from the command line

```
python fairseq_cli/hydra_train.py distributed_training.distributed_world_size=1 dataset.batch_size=2 task.data=/private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/ model=transformer_lm/transformer_lm_gpt task=language_modeling optimization.max_update=5000
```
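Each dotted override above either selects a config group or sets a leaf value. As a rough illustration only (not Hydra's actual implementation), an override like `dataset.batch_size=2` walks a nested config and replaces one leaf:

```python
def apply_override(cfg, override):
    # "dataset.batch_size=2" -> cfg["dataset"]["batch_size"] = 2
    path, _, raw = override.partition("=")
    keys = path.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Minimal literal parsing: try int, then float, else keep the string.
    try:
        value = int(raw)
    except ValueError:
        try:
            value = float(raw)
        except ValueError:
            value = raw
    node[keys[-1]] = value

cfg = {"dataset": {"batch_size": 8}, "optimization": {}}
apply_override(cfg, "dataset.batch_size=2")
apply_override(cfg, "optimization.max_update=5000")
apply_override(cfg, "task.data=/path/to/data-bin")
```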

use case 2: an external config file that replaces the bundled configs (dataclass defaults still apply)

```
python fairseq_cli/hydra_train.py --config-path ~/fairseq-py-dev/lm --config-name wiki103
```

the config file contains this:

```
# @package _group_

model:
  _name: transformer_lm
distributed_training:
  distributed_world_size: 1
dataset:
  batch_size: 2
task:
  _name: language_modeling
  data: /private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/
  add_bos_token: false
  max_target_positions: 1024
optimization:
  max_update: 50000
  lr: [ 0.25 ]
criterion: cross_entropy
optimizer: adam
lr_scheduler:
  _name: cosine
```

use case 3: an external config directory that provides additional configs (e.g. for models)

```
python fairseq_cli/hydra_train.py distributed_training.distributed_world_size=1 dataset.batch_size=2 task.data=/private/home/myleott/data/data-bin/wikitext-103-roberta-bpe-bin/ model=transformer_lm/2_layers task=language_modeling optimization.max_update=5000 --config-dir ~/fairseq-py-dev/lm/hydra
```

where ~/fairseq-py-dev/lm/hydra has the following structure:

- model
  - transformer_lm
    - 2_layers.yaml

and inside `2_layers.yaml` is a copy of `transformer_lm_gpt.yaml`, but with `decoder_layers` set to 2
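A rough sketch of the lookup this enables: `model=transformer_lm/2_layers` names a yaml file under a `model/` group inside one of the search directories. The helper below is illustrative only, not Hydra's real search logic:

```python
import tempfile
from pathlib import Path

def resolve_config(group_choice, config_dirs):
    # Hypothetical helper: "model=transformer_lm/2_layers" names
    # <dir>/model/transformer_lm/2_layers.yaml in some search dir.
    group, _, name = group_choice.partition("=")
    for d in config_dirs:
        candidate = Path(d) / group / f"{name}.yaml"
        if candidate.exists():
            return candidate
    return None

# Build a throwaway directory shaped like the tree above.
root = Path(tempfile.mkdtemp())
(root / "model" / "transformer_lm").mkdir(parents=True)
(root / "model" / "transformer_lm" / "2_layers.yaml").write_text("decoder_layers: 2\n")

found = resolve_config("model=transformer_lm/2_layers", [root])
```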

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1393

Reviewed By: myleott

Differential Revision: D24722252

Pulled By: alexeib

fbshipit-source-id: 758ea431fa099cd7c0e4daf41eff680df1d3b841
2020-11-04 18:20:12 -08:00
Alex Xiao
ea4ccd94de Load and broadcast fairseq checkpoints instead of having each rank load them individually
Summary:
This diff is based on feedback in D24379649

Before, when loading checkpoints:

Each rank loaded the checkpoint from Manifold independently.

Now:

Rank 0 loads the checkpoint from Manifold and broadcasts it to all other ranks, which saves IO.

Furthermore, when doing zero-sharding, we only broadcast the relevant parts of the optimizer state to each node. This makes checkpoint loading more memory-efficient and should enable loading models beyond 2-3B parameters.
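A toy single-process illustration of the IO saving, with plain-Python stand-ins for the real storage read and the `torch.distributed` broadcast:

```python
# Counting storage reads shows why rank-0-load + broadcast beats
# every rank reading the checkpoint file itself.
io_reads = 0

def load_from_storage():
    # Stand-in for reading the checkpoint file (e.g. from Manifold).
    global io_reads
    io_reads += 1
    return {"model": [0.1, 0.2], "optimizer": {"step": 100}}

def broadcast(obj, world_size):
    # Stand-in for a torch.distributed object broadcast: rank 0's
    # object is replicated to every other rank over the network.
    return [obj] * world_size

WORLD_SIZE = 8
checkpoint = load_from_storage()           # only rank 0 touches storage
states = broadcast(checkpoint, WORLD_SIZE)  # everyone else receives it
```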

Reviewed By: myleott

Differential Revision: D24660791

fbshipit-source-id: e30b2ea5990083375e4549f0427a112346ba170d
2020-11-04 12:57:29 -08:00
Myle Ott
1a709b2a40 Reproduce #1781. Add Weights and Biases support
Summary:

Fixes https://github.com/pytorch/fairseq/issues/1790.

Reviewed By: alexeib

Differential Revision: D24579153

fbshipit-source-id: 74a30effa164db9d6376554376e36b1f47618899

Co-authored-by: Nikolay Korolev <korolevns98@gmail.com>
Co-authored-by: Vlad Lyalin <Guitaricet@gmail.com>
2020-11-03 20:48:00 -08:00
Myle Ott
dd52ed0f38 Small fixes (#1392)
Summary:
- Set default value of clip-norm back to 0.0 (disabled)
- Add comment explaining that we divide loss by log(2) to convert the base
- Fix `--zero-optimizer=os` (fixes #2811)
- Update requirements to PyTorch >= 1.5
- Fix bug in fixed LR schedule
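The base-conversion comment in the list above boils down to one line of arithmetic; a minimal illustration (the loss value is made up):

```python
import math

# Cross-entropy comes out of the criterion in nats (natural log); dividing
# by log(2) converts it to bits, so the reported ppl is simply 2 ** loss.
loss_nats = 4.852                    # hypothetical per-token loss in nats
loss_bits = loss_nats / math.log(2)  # same quantity, base 2
ppl = 2 ** loss_bits                 # identical to math.exp(loss_nats)
```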

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1392

Reviewed By: alexeib

Differential Revision: D24714231

Pulled By: myleott

fbshipit-source-id: 63dc8cfc74683bbccbf05b44228014eb12ddbfc7
2020-11-03 20:45:06 -08:00
Joshua Meier
b120fbbe8f Fix correctness issue with megatron save/load checkpoints (#1386)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2681.

Proof that it's working now:
```
python fairseq_train.py --task masked_lm /checkpoint/bioseq_nonsecure/model-parallel-data/tiny_sample_valid_ur50-bin  --dataset-impl fasta  --save-dir checkpoints/mp-fix4    --dropout 0.1   --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0   --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07   --tokens-per-sample 128 --sample-break-mode none   --max-tokens 128 --no-progress-bar --log-interval 1 --seed 4 --max-epoch 1 --max-update 50 --encoder-layers 4  --arch model_parallel_roberta_large --model-parallel-size 2 --update-freq 2 --save-interval-updates 10

2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     11 / 78 loss=0.939, ppl=1.92, wps=116.7, ups=0.11, wpb=1024, bsz=8, num_updates=11, lr=1.47473e-06, gnorm=2.276, train_wall=0, wall=15
2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     12 / 78 loss=0.938, ppl=1.92, wps=15769.2, ups=15.38, wpb=1024, bsz=8, num_updates=12, lr=1.5997e-06, gnorm=2.612, train_wall=0, wall=15
2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     13 / 78 loss=0.877, ppl=1.84, wps=18658.8, ups=18.2, wpb=1024, bsz=8, num_updates=13, lr=1.72468e-06, gnorm=2.798, train_wall=0, wall=15
2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     14 / 78 loss=0.887, ppl=1.85, wps=18324.5, ups=17.88, wpb=1024, bsz=8, num_updates=14, lr=1.84965e-06, gnorm=2.326, train_wall=0, wall=15
2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     15 / 78 loss=0.867, ppl=1.82, wps=17616.5, ups=17.19, wpb=1024, bsz=8, num_updates=15, lr=1.97463e-06, gnorm=2.112, train_wall=0, wall=15
2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     16 / 78 loss=0.891, ppl=1.85, wps=18624.5, ups=18.17, wpb=1024, bsz=8, num_updates=16, lr=2.0996e-06, gnorm=2.123, train_wall=0, wall=16
2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     17 / 78 loss=0.887, ppl=1.85, wps=17972.5, ups=17.53, wpb=1024, bsz=8, num_updates=17, lr=2.22458e-06, gnorm=2.061, train_wall=0, wall=16
2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     18 / 78 loss=0.862, ppl=1.82, wps=14672.4, ups=14.32, wpb=1024, bsz=8, num_updates=18, lr=2.34955e-06, gnorm=2.282, train_wall=0, wall=16
2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     19 / 78 loss=0.876, ppl=1.83, wps=14398.6, ups=14.05, wpb=1024, bsz=8, num_updates=19, lr=2.47453e-06, gnorm=2.261, train_wall=0, wall=16
2020-10-29 18:42:08 | INFO | train_inner | epoch 001:     20 / 78 loss=0.818, ppl=1.76, wps=18652.2, ups=18.2, wpb=1024, bsz=8, num_updates=20, lr=2.5995e-06, gnorm=1.969, train_wall=0, wall=16

...relaunch...

2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     11 / 78 loss=0.939, ppl=1.92, wps=98.2, ups=0.1, wpb=1024, bsz=8, num_updates=11, lr=1.47473e-06, gnorm=2.276, train_wall=1, wall=0
2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     12 / 78 loss=0.938, ppl=1.92, wps=17137.8, ups=16.72, wpb=1024, bsz=8, num_updates=12, lr=1.5997e-06, gnorm=2.612, train_wall=0, wall=0
2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     13 / 78 loss=0.877, ppl=1.84, wps=17239.6, ups=16.82, wpb=1024, bsz=8, num_updates=13, lr=1.72468e-06, gnorm=2.798, train_wall=0, wall=0
2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     14 / 78 loss=0.887, ppl=1.85, wps=18132, ups=17.69, wpb=1024, bsz=8, num_updates=14, lr=1.84965e-06, gnorm=2.326, train_wall=0, wall=0
2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     15 / 78 loss=0.867, ppl=1.82, wps=17795.1, ups=17.36, wpb=1024, bsz=8, num_updates=15, lr=1.97463e-06, gnorm=2.112, train_wall=0, wall=0
2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     16 / 78 loss=0.891, ppl=1.85, wps=18021.3, ups=17.58, wpb=1024, bsz=8, num_updates=16, lr=2.0996e-06, gnorm=2.123, train_wall=0, wall=0
2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     17 / 78 loss=0.887, ppl=1.85, wps=16452.9, ups=16.05, wpb=1024, bsz=8, num_updates=17, lr=2.22458e-06, gnorm=2.061, train_wall=0, wall=0
2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     18 / 78 loss=0.862, ppl=1.82, wps=17563.3, ups=17.14, wpb=1024, bsz=8, num_updates=18, lr=2.34955e-06, gnorm=2.282, train_wall=0, wall=0
2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     19 / 78 loss=0.876, ppl=1.83, wps=16770.3, ups=16.36, wpb=1024, bsz=8, num_updates=19, lr=2.47453e-06, gnorm=2.261, train_wall=0, wall=0
2020-10-29 18:47:20 | INFO | train_inner | epoch 001:     20 / 78 loss=0.818, ppl=1.76, wps=16808.2, ups=16.4, wpb=1024, bsz=8, num_updates=20, lr=2.5995e-06, gnorm=1.969, train_wall=0, wall=0
```

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1386

Reviewed By: myleott

Differential Revision: D24640946

Pulled By: joshim5

fbshipit-source-id: cb141d92496b289a04d53f080ecd4d5ac6941672
2020-11-03 14:07:06 -08:00
Shashank Jain
de977736f9 Support running batch of sentences together on GPU with BART fill_mask (#2833)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/2833

Add support for filling masks using BART on a batch of sentences. This is helpful when running on GPU.

Reviewed By: myleott

Differential Revision: D24687773

fbshipit-source-id: 1b8005c18a09be526f40e9e2b99207afa38e0f1a
2020-11-02 17:17:15 -08:00
Yuqing Tang
de859692ff Enable translation_multi_simple_epoch to have different source and target dictionaries
Summary: In the past, we always used a shared dictionary for multilingual experiments. This diff re-enables different dictionaries for source and target languages by changing the assertion criteria and reverting to using language-specific dictionaries for source_dict and target_dict.

Reviewed By: chtran

Differential Revision: D24637682

fbshipit-source-id: a982e4f1e48395cc5bf10dc03b98fbe970062f8d
2020-10-30 18:25:25 -07:00
Myle Ott
a4356b1da2 Simplify --user-dir and require user-dir module name to be globally unique (#2815)
Summary:
This PR reverts recent changes that attempted to make `--user-dir` work with non-unique module names. But that new approach introduced other issues (e.g., poor compatibility with multiprocessing and Windows), so let's revert to the previous simpler implementation.
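A minimal sketch of what the simpler `--user-dir` handling looks like, assuming the directory basename is used directly as the module name (the helper below is illustrative, not fairseq's exact code):

```python
import importlib
import os
import sys


def import_user_module(module_path):
    # The directory's basename becomes the module name, which is why it
    # must be globally unique: a collision with an already-importable
    # module would shadow one or the other (hypothetical sketch).
    module_path = os.path.abspath(module_path)
    module_name = os.path.basename(module_path)
    if module_name not in sys.modules:
        sys.path.insert(0, os.path.dirname(module_path))
        importlib.import_module(module_name)
    return sys.modules[module_name]
```

Because it relies only on the normal import machinery, this plays well with multiprocessing (child processes can re-import by name) and with Windows, which is what the reverted approach struggled with.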

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2815

Reviewed By: alexeib

Differential Revision: D24611571

Pulled By: myleott

fbshipit-source-id: cecfe28395585ca0401f844f10bd0d49d014c4d8
2020-10-29 17:08:20 -07:00
Anuroop Sriram
6debe29150 Compute WER for Wav2Vec 2.0 Seq2Seq models (#1376)
Summary:
# Before submitting

- [X] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?

## What does this PR do?
Adds support to compute WER for wav2vec2.0 seq2seq models.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1376

Reviewed By: alexeib

Differential Revision: D24611516

Pulled By: anuroopsriram

fbshipit-source-id: dd7daab73ebccc21367dd51f41a11e89c404977b
2020-10-29 11:46:08 -07:00
Myle Ott
4cdc81f6f1 Support activation checkpointing in Transformer (#1378)
Summary:
Without activation checkpointing (peak GPU memory usage: 7138MiB)
```
$ python train.py --task dummy_mt --arch transformer --dropout 0.1 --max-tokens 4096 --optimizer adam --lr 0.00001 --log-format simple --log-interval 25 --fp16
(...)
2020-10-28 08:03:03 | INFO | train_inner | epoch 001:     25 / 92 loss=12.67, ppl=6517.2, wps=281380, ups=8.61, wpb=32640, bsz=1088, num_updates=25, lr=1e-05, gnorm=8.541, clip=0, loss_scale=128, train_wall=5, wall=10
2020-10-28 08:03:05 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-10-28 08:03:06 | INFO | train_inner | epoch 001:     51 / 92 loss=8.938, ppl=490.52, wps=302975, ups=9.28, wpb=32640, bsz=1088, num_updates=50, lr=1e-05, gnorm=6.395, clip=0, loss_scale=64, train_wall=3, wall=12
2020-10-28 08:03:08 | INFO | train_inner | epoch 001:     76 / 92 loss=3.855, ppl=14.47, wps=316039, ups=9.68, wpb=32640, bsz=1088, num_updates=75, lr=1e-05, gnorm=9.078, clip=0, loss_scale=64, train_wall=3, wall=15
2020-10-28 08:03:10 | INFO | fairseq_cli.train | begin validation on "valid" subset
2020-10-28 08:03:17 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 0.048 | ppl 1.03 | wps 1.09646e+06 | wpb 32640 | bsz 1088 | num_updates 91
```

With activation checkpointing (peak GPU memory usage: 6466MiB)
```
$ python train.py --checkpoint-activations --task dummy_mt --arch transformer --dropout 0.1 --max-tokens 4096 --optimizer adam --lr 0.00001 --log-format simple --log-interval 25 --fp16
(...)
2020-10-28 08:01:50 | INFO | train_inner | epoch 001:     25 / 92 loss=12.67, ppl=6517.22, wps=291110, ups=8.91, wpb=32640, bsz=1088, num_updates=25, lr=1e-05, gnorm=8.541, clip=0, loss_scale=128, train_wall=4, wall=9
2020-10-28 08:01:51 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 64.0
2020-10-28 08:01:52 | INFO | train_inner | epoch 001:     51 / 92 loss=8.938, ppl=490.54, wps=295438, ups=9.05, wpb=32640, bsz=1088, num_updates=50, lr=1e-05, gnorm=6.394, clip=0, loss_scale=64, train_wall=3, wall=12
2020-10-28 08:01:55 | INFO | train_inner | epoch 001:     76 / 92 loss=3.855, ppl=14.47, wps=308351, ups=9.45, wpb=32640, bsz=1088, num_updates=75, lr=1e-05, gnorm=9.082, clip=0, loss_scale=64, train_wall=3, wall=14
2020-10-28 08:01:57 | INFO | fairseq_cli.train | begin validation on "valid" subset
```
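Activation checkpointing itself is a thin wrapper around `torch.utils.checkpoint`; a minimal standalone sketch of the memory/compute trade-off, assuming a recent PyTorch that accepts the `use_reentrant` flag:

```python
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
x = torch.randn(4, 8, requires_grad=True)

# Only the input is kept alive during forward; the layer's intermediate
# activations are recomputed during backward, trading extra compute for
# lower peak memory (as the MiB numbers above show).
y = checkpoint(layer, x, use_reentrant=False)
y.sum().backward()
```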

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1378

Reviewed By: min-xu-ai

Differential Revision: D24593170

Pulled By: myleott

fbshipit-source-id: 701254e603a2277d22f8b3bcc3ebbade54bb7479
2020-10-28 18:35:56 -07:00
alexeib
b7d8b9dce2 fix architecture params (#1382)
Summary:
fixes architectures not getting applied to migrated models

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1382

Reviewed By: myleott

Differential Revision: D24603110

Pulled By: alexeib

fbshipit-source-id: 18f44d3736853282466feed5e8896db95338b097
2020-10-28 18:29:11 -07:00
freewym
9c66ff54c4 build_generator() in generator.py should accept cfg.generation instea… (#2813)
Summary:
…d of cfg.task

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2813

Reviewed By: alexeib

Differential Revision: D24604698

Pulled By: myleott

fbshipit-source-id: e41996147203ec47274ded803bab910460a19eb3
2020-10-28 18:21:08 -07:00
Myle Ott
e4e01780f8 Fix dummy LM task (#1381)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1381

Reviewed By: alexeib

Differential Revision: D24603479

Pulled By: myleott

fbshipit-source-id: 5aae8da9c0f20d6526c98b0b37bf9b32a8c78393
2020-10-28 18:19:07 -07:00
alexeib
65b02d529a fix wav2vec infer and finetuning (#1384)
Summary:
Fixes #2807, #2810, #2519

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1384

Reviewed By: myleott

Differential Revision: D24605451

Pulled By: alexeib

fbshipit-source-id: 46ec8f273ac2fab86bd444461e2706c35608b250
2020-10-28 17:18:12 -07:00
alexeib
f6d9313092 fix eval lm (#1380)
Summary:
fixes eval_lm, which wasn't parsing arguments correctly

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1380

Reviewed By: myleott

Differential Revision: D24600415

Pulled By: alexeib

fbshipit-source-id: eb56575bef4d20a3cd5cee3dcd279046f085d938
2020-10-28 14:59:44 -07:00
Elijah Rippeth
3c726544d2 fix issue where is_initialized is not available in single-worker paradigm (#2801)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/1205

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2801

Reviewed By: alexeib

Differential Revision: D24579193

Pulled By: myleott

fbshipit-source-id: bcb14bb588d4538398bff4114e0a387fd29818c5
2020-10-28 14:54:11 -07:00
Myle Ott
1bc83c703a Misc fixes (#2786)
Summary:
- Rename type -> key in fairseq/tasks/sentence_prediction.py (fixes https://github.com/pytorch/fairseq/issues/2746)
- Update preprocessing docs (fixes https://github.com/pytorch/fairseq/issues/2565)
- Turn off logging in test_fp16_optimizer.TestGradientScaling
- Documentation updates
- Remove some unused code
- Fix noisychannel example (fixes https://github.com/pytorch/fairseq/issues/2213)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2786

Reviewed By: shruti-bh

Differential Revision: D24515146

Pulled By: myleott

fbshipit-source-id: 86b0f5516c57610fdca801c60e58158ef052fc3a
2020-10-27 11:26:07 -07:00
Myle Ott
01be083e46 Centralize hydra init (and support packaged location of configs) (#2784)
Summary:
Configs can either be in `/fairseq/configs` (once the package is installed) or `/configs` (if using an editable installation). This centralizes the hydra init and supports these two possible config locations.
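Resolving the two possible config locations amounts to a simple filesystem probe; a hypothetical sketch (path names taken from the description above, not from the actual diff):

```python
import os


def infer_config_dir(root):
    # An installed package ships configs under fairseq/configs, while an
    # editable checkout keeps them under configs/ at the repo root
    # (illustrative layout based on the commit description).
    for candidate in (
        os.path.join(root, "fairseq", "configs"),
        os.path.join(root, "configs"),
    ):
        if os.path.isdir(candidate):
            return candidate
    raise FileNotFoundError(f"no config directory found under {root}")
```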

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2784

Reviewed By: alexeib

Differential Revision: D24513586

Pulled By: myleott

fbshipit-source-id: 8e10a88177ebcf809d5d37d448d2b384142febef
2020-10-27 07:46:44 -07:00
Shruti Bhosale
beeac0ad68 Get 12B M2M-100 model generation to work correctly on exactly 2 32gb gpus (#1366)
Summary:
# What does this PR do?
Addresses https://github.com/pytorch/fairseq/issues/2772 where external users can't generate using the model because the README is currently not accurate.
This PR fixes the issues in the README

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1366

Reviewed By: edunov

Differential Revision: D24455634

Pulled By: shruti-bh

fbshipit-source-id: 480a11f8b95d1278162d585700e58d467a35d35a
2020-10-27 02:15:13 -07:00
Vladimir Smirnov
81677d751d Update README.md (#2796)
Summary:
Fixed link.

# Before submitting

- [-] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [+] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [+] Did you make sure to update the docs?
- [-] Did you write any new necessary tests?

## What does this PR do?
Fixes link.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2796

Reviewed By: nlaptev

Differential Revision: D24538759

Pulled By: myleott

fbshipit-source-id: af947f432c34ca2aec35c9fe59dd1214e363450b
2020-10-26 08:18:52 -07:00
alexeib
3c41478083 fix loading emissions (#1375)
Summary:
broken in last change to infer.py

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1375

Reviewed By: xuqiantong

Differential Revision: D24531499

Pulled By: alexeib

fbshipit-source-id: fab60abf67a05c48e1ff750fac3ab6d4c0fa2770
2020-10-25 12:54:22 -07:00
alexeib
6ee0364685 fix building components when no configuration is provided (#1374)
Summary:
see title; in particular, this fixes running generate.py with --scoring wer

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1374

Reviewed By: kahne

Differential Revision: D24527059

Pulled By: alexeib

fbshipit-source-id: b01994441fda12eafd4e465d147047c6e84a8335
2020-10-24 21:20:05 -07:00
alexeib
c147060598 add new w2v models (#1373)
Summary:
update readme to add new wav2vec models (incl w/ self training)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1373

Reviewed By: michaelauli

Differential Revision: D24524182

Pulled By: alexeib

fbshipit-source-id: c918971f8009b11855908e71bfcc247cf6776a8f
2020-10-24 10:21:31 -07:00
Shashank Jain
4b0cf6649b Revert "Fix deprecated usage of nonzero()"
Summary: Reverting the diff because it has already been fixed in https://github.com/pytorch/pytorch/pull/45413

Reviewed By: myleott

Differential Revision: D24511658

fbshipit-source-id: a5561dae50d69a03443ca8a60bebe2cd064e3ee0
2020-10-23 14:45:25 -07:00
alexeib
2409d5a36e refactor dataclass related files, add proper types for static checkin… (#1371)
Summary:
- refactor dataclass/ hierarchy to make it a bit more sane (while avoiding circular references)
- add top level FairseqConfig
- change typehints to reflect the correct config type if it is known

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1371

Reviewed By: myleott

Differential Revision: D24469026

Pulled By: alexeib

fbshipit-source-id: 01f68918f761d51ec5216286b8959ad35f41a7b2
2020-10-23 00:07:33 -07:00
alexeib
cd2bba4419 rename remove_bpe to post_process; add aliasing (#1369)
Summary:
Some binaries (e.g. speech-based ones) used --post-process while others used --remove-bpe. --post-process is the more appropriate name, since the option now does more than just remove BPE. This renames remove_bpe to post_process, adds an alias so existing command lines keep working, and adds checkpoint upgrades so old checkpoints continue to work as well.
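Aliasing a renamed flag so old command lines keep working can be done directly in argparse by giving one option two spellings; a sketch (the `const` value is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
# One option, two spellings: the legacy --remove-bpe flag keeps working,
# and both write to the same post_process destination.
parser.add_argument(
    "--post-process", "--remove-bpe",
    dest="post_process", nargs="?", const="subword_nmt", default=None,
)

args = parser.parse_args(["--remove-bpe"])
```

Upgrading stored checkpoints is the complementary half: old state dicts that carry `remove_bpe` get the key rewritten to `post_process` at load time.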

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1369

Reviewed By: myleott

Differential Revision: D24465040

Pulled By: alexeib

fbshipit-source-id: 1b3e388291ccc403e76e069ef6606b80ead863a7
2020-10-22 16:31:49 -07:00
Myle Ott
e0737c3c29 Dynamically generate versions based on commit hash (#2774)
Summary:
This will produce version strings like `1.0.0a0+3065963`, similar to PyTorch version strings.
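Generating such a version string typically means shelling out to git at build time; a minimal sketch (the function name and fallback behavior are assumptions, not the actual setup.py code):

```python
import subprocess


def version_with_commit(base_version="1.0.0a0"):
    # Append the short commit hash, mirroring PyTorch-style version
    # strings such as 1.0.0a0+3065963.
    try:
        sha = (
            subprocess.check_output(
                ["git", "rev-parse", "--short", "HEAD"],
                stderr=subprocess.DEVNULL,
            )
            .decode()
            .strip()
        )
        return f"{base_version}+{sha}"
    except (subprocess.CalledProcessError, OSError):
        return base_version  # not a git checkout, e.g. an sdist install
```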

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2774

Reviewed By: alexeib

Differential Revision: D24453517

Pulled By: myleott

fbshipit-source-id: 03a0c324ed6124bbc513ba7edc954abd71d63a0f
2020-10-22 12:51:04 -07:00
Myle Ott
b8a938e96e BART hub fixes + improvements (#1342)
Summary:
- Make BART hub interface extend from GeneratorHubInterface (fixes #1748)
- Add mask filling interface for BART

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1342

Reviewed By: ngoyal2707

Differential Revision: D24264195

Pulled By: myleott

fbshipit-source-id: 0885f90a54fabe1672b1bfe137dfbccbc5d25d0e
2020-10-22 12:45:49 -07:00
alexeib
f0fcb55d5b fix #2764 (#1368)
Summary:
fix interactive.py + add args from tasks before registries (where we catch errors)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1368

Reviewed By: myleott

Differential Revision: D24462871

Pulled By: alexeib

fbshipit-source-id: 307b829c935aa5061bdd79d8cc339eaf87fd8845
2020-10-22 12:19:31 -07:00
Myle Ott
11aaffdd18 rm FairseqModel::upgrade_args, it's not needed anymore (#1363)
Summary:
Tests seem to pass without it, so let's remove it

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1363

Reviewed By: alexeib

Differential Revision: D24452369

Pulled By: myleott

fbshipit-source-id: 186933ff3ee16be61c77a9581658db8e853c1baa
2020-10-22 12:05:39 -07:00
Chau Tran
31c23baafc Fix fairseq/criss README
Summary: Add requirements, fix wrong command

Reviewed By: tangyuq

Differential Revision: D24452748

fbshipit-source-id: 4837610ea7e5b5df8caecc685226080cafddb3e0
2020-10-22 11:30:54 -07:00
alexeib
18cadab1d0 support new cfg based models; make sure --normalize is consistent in … (#1370)
Summary:
support new cfg-based models; make sure --normalize in infer is consistent with the model

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1370

Reviewed By: myleott

Differential Revision: D24467698

Pulled By: alexeib

fbshipit-source-id: 056b3608e3c1fe8acdb3e45e0306de5d874cb4d1
2020-10-22 07:08:54 -07:00
Pavel Soriano
751bcbfcb9 Changed EnvironmentError to RuntimeError in get_from_cache (#2767)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
No need I believe
- [x] Did you write any new necessary tests?
No
## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/2724

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Yes! It is not a big PR at all but it allowed me to familiarize with the caching/downloading logic used in fairseq (which is very similar to that used in pytorch/transformers)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2767

Reviewed By: edunov

Differential Revision: D24456055

Pulled By: myleott

fbshipit-source-id: bc634a9b97f957ecc5a8da57b112ff892e492107
2020-10-22 06:31:07 -07:00
Myle Ott
43c69a7666 Fix deprecated usage of nonzero() (#1364)
Summary:
PyTorch requires the `as_tuple` argument now, otherwise it prints warnings. Let's just fix this everywhere
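For reference, the two `as_tuple` modes return differently shaped results:

```python
import torch

t = torch.tensor([[0, 1], [2, 0]])

idx = t.nonzero(as_tuple=False)        # index matrix, shape (num_nonzero, ndim)
rows, cols = t.nonzero(as_tuple=True)  # one 1-D index tensor per dimension
```

Passing the argument explicitly silences the deprecation warning while keeping the legacy matrix-of-indices behavior where code expects it.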

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1364

Reviewed By: edunov

Differential Revision: D24452587

Pulled By: myleott

fbshipit-source-id: 7e6d424792ffec74a6197b2a266600cb13f24770
2020-10-22 06:28:27 -07:00
Myle Ott
0f44e89c38 Fix Latent Depth args (#1365)
Summary:
Args should be registered in the Model rather than modules

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1365

Reviewed By: pipibjc

Differential Revision: D24453007

Pulled By: myleott

fbshipit-source-id: d22b0d86a3c940456b394b005acab4bb6a3f5bed
2020-10-21 15:32:28 -07:00
Changhan Wang
ee450dde19 S2T multilingual example + bug fix
Summary:
* S2T multilingual example on MuST-C
* A bug fix for `speech_to_text_dataset` (for multilingual setting)

Reviewed By: jmp84

Differential Revision: D24339394

fbshipit-source-id: ef0c0be08137884897b532e45ebc56551d20be48
2020-10-21 08:10:47 -07:00
Myle Ott
eece1d7082 More detailed error message for data iterator size mismatch (#2768)
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/2768

Reviewed By: vimalmanohar

Differential Revision: D24446804

Pulled By: myleott

fbshipit-source-id: 19220f2fd3e3db49f7528f6fb17188834b09646f
2020-10-21 07:47:54 -07:00
Myle Ott
9b0611e678 Fix torch.hub (fixes #2756) (#2762)
Summary:
Typically `torch.hub.load(...)` doesn't call `pip install`, so our Cython components never get built. We have a hack in our hubconf that builds these components by running the equivalent of `python setup.py build_ext --inplace` using the setuptools sandbox: f6677b6755/hubconf.py (L52-L55).

Unfortunately, this sandbox gets mad if you modify the filesystem, which is what this recent change does: f6677b6755/setup.py (L203-L205). Combined, these two break torch.hub.

The solution is that when we're doing `build_ext`, don't setup the symlinks. This is fine, since `build_ext` doesn't actually build a package, so we don't care about including config or examples.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2762

Reviewed By: alexeib

Differential Revision: D24430228

Pulled By: myleott

fbshipit-source-id: e05d075a003ddfde196cb8a86b32882d73808015
2020-10-20 15:46:55 -07:00
freewym
d6f2c907be remove unnecessary logging configs (#2733)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
It's sufficient to call logging.basicConfig in the outermost calling code, like train.py or generate.py. The outer logging.basicConfig call (like [here](https://github.com/pytorch/fairseq/blob/master/fairseq_cli/generate.py#L54)) will be overridden if logging.basicConfig has already been invoked somewhere in the inner parts of the code.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2733

Reviewed By: alexeib

Differential Revision: D24418987

Pulled By: myleott

fbshipit-source-id: 862d200023357de8947799f380e513f4c411b143
2020-10-20 15:46:48 -07:00
Xu Song
8248a12a64 Upgrade args: max_sentences to batch_size (#2754)
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

Upgrade args: `max_sentences` to `batch_size`
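A rename like this usually comes with an upgrade shim so stale configs and checkpoints still load; a hypothetical sketch of that mapping:

```python
def upgrade_args(args):
    # Hypothetical shim: accept the old max_sentences key from stale
    # configs or checkpoints and map it onto the new batch_size name.
    if "max_sentences" in args and args.get("batch_size") is None:
        args["batch_size"] = args.pop("max_sentences")
    return args
```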

## Did you have fun?
Make sure you had fun coding �

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2754

Reviewed By: alexeib

Differential Revision: D24418980

Pulled By: myleott

fbshipit-source-id: 5269c2fc8c434513cc5114f7e9d2eccd0c553fbd
2020-10-20 15:42:54 -07:00
Alexei Baevski
f6677b6755 fix #2761, #2760
Summary:
Fixes issue #2761 and #2760
args from registries were not added to argparse

Reviewed By: myleott

Differential Revision: D24422792

fbshipit-source-id: c8a8e835965da5c4f527bd589bd621371441e7fe
2020-10-20 13:44:42 -07:00
alexeib
3b27ed7996 Enable Hydra configs in fairseq (#1343) (#1510)
Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1510

this is the main PR that switches on hydra functionality in fairseq

we migrate the "args" object into an omegaconf "DictConfig" at all legacy entry points

in addition, this migrates various components from secondary registries (like bpe encoders and tokenizers) to make the migration smoother

I am going through code that references migrated fairseq components and changing it to inherit from "Legacy*" components instead; hopefully tests will catch most of this

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1343

Reviewed By: myleott

Differential Revision: D23973928

Pulled By: alexeib

fbshipit-source-id: dd9554981fff51ea75c1ff343874d1d6e61793c9
2020-10-20 00:32:26 -07:00
Alexei Baevski
c76cb6dfb9 composite criterion should still use legacy criterion as it will break with subsequent diff
Summary: see title

Reviewed By: myleott

Differential Revision: D24393903

fbshipit-source-id: 4b972b8150c7228fb32977675c6c60b13d5194d0
2020-10-19 20:17:11 -07:00
Myle Ott
de5c2cb35a Fix model parallel LM (#1358)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1358

Reviewed By: alexeib

Differential Revision: D24393064

Pulled By: myleott

fbshipit-source-id: ee88fd1e7b203d7df6b7a65d3b1b1469e8fe9b6e
2020-10-19 14:15:02 -07:00
Myle Ott
9b8b464070 Package config and examples with fairseq (#1356)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1356

Reviewed By: alexeib

Differential Revision: D24385688

Pulled By: myleott

fbshipit-source-id: 72c4a702d93d2854a6409d42913d7413207cb61e
2020-10-19 09:24:04 -07:00
Angela Fan
e3168f74a8 minor fix for linter (#1360)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1360

Reviewed By: myleott

Differential Revision: D24393217

Pulled By: huihuifan

fbshipit-source-id: a110ef6958b1e15cd8c4e23b610db5cfc994f06d
2020-10-19 09:11:03 -07:00
Shruti Bhosale
65e11a37d5 Readme with instructions to generate and evaluate with a 12B model (#1351)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1351

Reviewed By: edunov

Differential Revision: D24386349

Pulled By: huihuifan

fbshipit-source-id: ade362d7cb64e24e6b2689ba87c53636073d2246
2020-10-19 06:11:59 -07:00
Myle Ott
a48f235636 Apply black+isort (#1357)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1357

Reviewed By: alexeib

Differential Revision: D24377772

fbshipit-source-id: 51581af041d42d62166b33a35a1a4228b1a76f0c
2020-10-18 18:14:51 -07:00