fairseq

mirror of https://github.com/facebookresearch/fairseq.git synced 2024-09-11 17:25:31 +03:00

Author	SHA1	Message	Date
Myle Ott	b41c74dc5b	Add code for "Pay Less Attention with Lightweight and Dynamic Convolutions" (#473) Summary: Changelog: - `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper - `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu - update READMEs - misc fixes Pull Request resolved: https://github.com/pytorch/fairseq/pull/473 Differential Revision: D13819717 Pulled By: myleott fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9	2019-01-25 15:40:26 -08:00
Davide Caroselli	ebaf8c5030	'--user-dir' documentation (correct) (#447) Summary: Command line option --user-dir documented in docs/overview.rst Pull Request resolved: https://github.com/pytorch/fairseq/pull/447 Differential Revision: D13674744 Pulled By: myleott fbshipit-source-id: 17049ee5c9f692f5298ef9fa7381ee583f269cde	2019-01-15 11:54:17 -08:00
Myle Ott	14bd9c62a3	Update docs for --lazy-load and torch.distributed.launch Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/433 Differential Revision: D13588032 Pulled By: myleott fbshipit-source-id: 0e5ff361e27b206c4490264f0f51863367499e81	2019-01-07 15:28:09 -08:00
Myle Ott	7633129ba8	Merge internal changes (#283) Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5	2019-01-04 20:03:19 -08:00
Sergey Edunov	1082ba352c	Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0 - no more FP16Trainer, we just have an FP16Optimizer wrapper - most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time - Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0 - Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq	2018-09-25 17:36:43 -04:00
Sergey Edunov	fe2d1581a4	Fix docs	2018-09-17 22:34:17 -07:00
Myle Ott	4a47b88992	Update documentation	2018-09-03 20:03:37 -04:00
Myle Ott	6381cc977f	Add documentation	2018-09-03 19:15:23 -04:00

8 Commits