fairseq/docs
Sergey Edunov 1082ba352c Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
- There is no longer an FP16Trainer; mixed-precision training is now handled by an FP16Optimizer wrapper (see the first sketch below).
- Most of the distributed code is moved into a new wrapper class, DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time (see the wrapper sketch below).
- Trainer now requires an extra dummy_batch argument at initialization. When workers have an uneven number of batches, the trainer runs forward/backward on this dummy batch and hides its gradients by multiplying the loss by 0 (see the dummy-batch sketch below).
- Trainer.train_step now takes a list of samples, which allows a cleaner --update-freq implementation (see the gradient-accumulation sketch below).
2018-09-25 17:36:43 -04:00
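
The FP16Optimizer wrapper mentioned above follows the usual mixed-precision pattern: the model keeps FP16 parameters while the wrapped optimizer steps on an FP32 master copy, with loss scaling to avoid gradient underflow. A minimal sketch of that pattern, with illustrative class and method names rather than the actual fairseq API:

    import torch

    class FP16OptimizerSketch(object):
        """Wraps a regular optimizer: model params stay FP16, the wrapped
        optimizer updates an FP32 master copy of those params."""

        def __init__(self, fp16_params, optimizer_cls, loss_scale=128.0, **opt_kwargs):
            self.fp16_params = list(fp16_params)
            # FP32 master copy that the wrapped optimizer actually updates.
            self.fp32_params = [p.detach().clone().float().requires_grad_()
                                for p in self.fp16_params]
            self.optimizer = optimizer_cls(self.fp32_params, **opt_kwargs)
            self.loss_scale = loss_scale

        def backward(self, loss):
            # Scale the loss so small FP16 gradients do not underflow.
            (loss * self.loss_scale).backward()

        def step(self):
            # Copy (and unscale) FP16 grads into the FP32 master params.
            for p16, p32 in zip(self.fp16_params, self.fp32_params):
                if p16.grad is not None:
                    p32.grad = p16.grad.detach().float() / self.loss_scale
            self.optimizer.step()
            # Copy the updated FP32 weights back into the FP16 model params.
            with torch.no_grad():
                for p16, p32 in zip(self.fp16_params, self.fp32_params):
                    p16.copy_(p32)

        def zero_grad(self):
            for p in self.fp16_params + self.fp32_params:
                p.grad = None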
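
DistributedFairseqModel is described as behaving like DistributedDataParallel and a FairseqModel at the same time. A small sketch of that wrapper pattern, assuming attribute delegation to the wrapped module (the class below is illustrative, not the real implementation):

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel

    class DistributedModelWrapper(nn.Module):
        """Forwards through DDP so gradients are synced across workers, but
        also exposes the wrapped model's own attributes and methods."""

        def __init__(self, module, device_ids=None):
            super().__init__()
            self.ddp = DistributedDataParallel(module, device_ids=device_ids)

        def forward(self, *args, **kwargs):
            return self.ddp(*args, **kwargs)

        def __getattr__(self, name):
            # nn.Module resolves registered params/buffers/submodules; anything
            # else is delegated to the wrapped model (e.g. max_positions).
            try:
                return super().__getattr__(name)
            except AttributeError:
                return getattr(self.ddp.module, name)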
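
The dummy_batch mechanism keeps workers in lock-step when the number of batches per worker is uneven: a worker with no real batch still runs forward/backward so the distributed gradient all-reduce stays in sync, but multiplies its loss by 0 so the dummy batch contributes nothing. A rough sketch, assuming a hypothetical criterion(model, sample) callable that returns a scalar loss:

    def train_step_sketch(model, criterion, sample, dummy_batch):
        # sample is None when this worker has run out of real batches.
        is_dummy = sample is None
        if is_dummy:
            sample = dummy_batch
        loss = criterion(model, sample)
        if is_dummy:
            # Keep the same forward/backward and communication pattern,
            # but zero out the gradient contribution.
            loss = loss * 0
        loss.backward()
        return loss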
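
Passing a list of samples to Trainer.train_step maps directly onto gradient accumulation, which is what --update-freq does: accumulate gradients over every sample in the list, then take a single optimizer step. A simplified sketch (the real Trainer also handles gradient normalization and distributed syncing):

    def train_step_list_sketch(model, criterion, optimizer, samples):
        # len(samples) corresponds to --update-freq.
        optimizer.zero_grad()
        total_loss = 0.0
        for sample in samples:
            loss = criterion(model, sample)
            loss.backward()          # gradients accumulate across samples
            total_loss += loss.item()
        optimizer.step()             # one parameter update for the whole list
        return total_loss / max(len(samples), 1)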
File                              Last commit date              Last commit message
_static                           2018-09-03 19:15:23 -04:00    Add documentation
command_line_tools.rst            2018-09-03 19:15:23 -04:00    Add documentation
conf.py                           2018-09-25 17:36:43 -04:00    Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
criterions.rst                    2018-09-03 19:15:23 -04:00    Add documentation
data.rst                          2018-09-25 17:36:43 -04:00    Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
docutils.conf                     2018-09-03 19:15:23 -04:00    Add documentation
getting_started.rst               2018-09-17 22:34:17 -07:00    Fix docs
index.rst                         2018-09-03 19:15:23 -04:00    Add documentation
lr_scheduler.rst                  2018-09-03 19:15:23 -04:00    Add documentation
make.bat                          2018-09-03 19:15:23 -04:00    Add documentation
Makefile                          2018-09-03 19:15:23 -04:00    Add documentation
models.rst                        2018-09-03 19:15:23 -04:00    Add documentation
modules.rst                       2018-09-03 19:15:23 -04:00    Add documentation
optim.rst                         2018-09-03 19:15:23 -04:00    Add documentation
overview.rst                      2018-09-03 19:15:23 -04:00    Add documentation
requirements.txt                  2018-09-03 20:03:37 -04:00    Update documentation
tasks.rst                         2018-09-03 19:15:23 -04:00    Add documentation
tutorial_classifying_names.rst    2018-09-03 19:15:23 -04:00    Add documentation
tutorial_simple_lstm.rst          2018-09-03 19:15:23 -04:00    Add documentation