fairseq

Mirror of https://github.com/facebookresearch/fairseq.git, synced 2024-09-19 13:17:39 +03:00

Author	SHA1	Message	Date
Myle Ott	864b89d044	Online backtranslation module. Co-authored-by: liezl200 <lie@fb.com>	2018-09-25 17:36:43 -04:00
Myle Ott	6edf81ddfe	Remove more Variable() calls (#198)	2018-06-25 12:23:04 -04:00
Myle Ott	74efc21403	Fix attention order in unit tests (fixes #195) (#197)	2018-06-25 12:16:10 -04:00
Myle Ott	6ec5022e57	Move reorder_encoder_out to FairseqEncoder and fix non-incremental decoding	2018-06-21 14:58:50 -04:00
Myle Ott	ff68a9ef50	Add FairseqTask. A Task defines the data format, stores shared state (e.g., dictionaries), and provides helpers for building the model/criterion and calculating the loss. Changes: - Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator. - Add EpochBatchIterator to encapsulate batching and saving/restoring the dataloader position. - Remove LEFT_PAD_* constants and make them configurable per task.	2018-06-15 13:05:22 -06:00
alexeib	4c2ef2de74	Conv lm implementation. This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf. There are 3 modes for constructing batches: - token block: fill each sample with a specified number of tokens without regard for sentence delimiters; this is what was used for training in the paper. - complete: fill each sample with a specified number of tokens, but make sure it contains only complete sentences (i.e., if the next sentence goes over the token block limit, move it to the next sample); this was used for evaluation in the paper. - eos: one sentence per sample (skip blank lines). Some results: GCNN-13 - GBW - 37.46; GCNN-14B - GBW - 33.88; GCNN-8 - Wiki103 - 43.76; GCNN-14 - Wiki103 - 35.66. Train: python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500 Eval: python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'	2018-06-15 13:05:16 -06:00
Myle Ott	8afb77612c	Fix tests	2018-06-15 13:05:11 -06:00
Myle Ott	d3795d6cd1	Merge internal changes (#136). Changes: - `7d19e36`: Add `--sampling` flag to generate.py to sample instead of doing beam search - `c777340`: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model - `3ea882c`: Add `--max-update` option to train.py to stop training after a given number of updates - small bugfixes for distributed training, LSTM, inverse square root LR scheduler	2018-04-02 10:13:07 -04:00
Myle Ott	e73fddf453	Filter padding properly in LabelSmoothedCrossEntropyCriterion (#229)	2018-03-05 14:20:29 -08:00
Myle Ott	9438019ff0	Refactor incremental generation to be more explicit and less magical (#222)	2018-02-27 14:28:24 -08:00
Myle Ott	6641520612	fairseq-py goes distributed (#106). This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes. Changes: - `c7033ef`: add support for distributed training! See updated README for usage. - `e016299`: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc. - `154e440`: update LSTM implementation to use PackedSequence objects in the encoder, better following best practices and improving perf - `90c2973` and `1da6265`: improve unit test coverage	2018-02-27 17:09:42 -05:00

11 Commits
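
The Conv lm commit (4c2ef2de74) lists three batch-construction modes. The sketch below illustrates them under a few assumptions: each sentence is already a list of token ids, a blank line is an empty list, and in "complete" mode a sentence longer than the block size gets a block of its own. The helper name make_blocks and the mode strings are hypothetical and are not fairseq's API.

```python
from typing import List

Sentence = List[int]  # token ids for one line of text (assumed representation)


def make_blocks(sentences: List[Sentence], block_size: int, mode: str) -> List[List[int]]:
    """Group sentences into samples; a hypothetical helper, not fairseq code."""
    if mode == "token_block":
        # Concatenate everything and cut fixed-size chunks, ignoring sentence
        # boundaries; the final chunk may be shorter than block_size.
        stream = [tok for sent in sentences for tok in sent]
        return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]
    if mode == "complete":
        # Pack whole sentences greedily; if the next sentence would overflow the
        # current block, close the block and start a new one with that sentence.
        blocks: List[List[int]] = []
        current: List[int] = []
        for sent in sentences:
            if current and len(current) + len(sent) > block_size:
                blocks.append(current)
                current = []
            current.extend(sent)  # an over-long sentence fills a block alone
        if current:
            blocks.append(current)
        return blocks
    if mode == "eos":
        # One sentence per sample, skipping blank (empty) lines.
        return [list(sent) for sent in sentences if sent]
    raise ValueError(f"unknown mode: {mode}")
```

For example, with block_size=4 and sentences [[1, 2, 3], [4, 5], [], [6, 7, 8, 9]], "token_block" yields [[1, 2, 3, 4], [5, 6, 7, 8], [9]]. Both "complete" and "eos" yield [[1, 2, 3], [4, 5], [6, 7, 8, 9]].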
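
The FairseqTask commit (ff68a9ef50) registers tasks with an @register_task decorator, and commit 6641520612 adds register_model, register_criterion and register_optimizer. The following is a minimal sketch of the general decorator-registry pattern those names suggest. TASK_REGISTRY, BaseTask and setup_task are hypothetical names, and fairseq's own implementation may differ in its checks and signatures.

```python
from typing import Callable, Dict, Type


class BaseTask:
    """Hypothetical stand-in for a task base class (not fairseq's FairseqTask)."""

    def __init__(self, args: dict) -> None:
        self.args = args


# Hypothetical registry: task name -> task class.
TASK_REGISTRY: Dict[str, Type[BaseTask]] = {}


def register_task(name: str) -> Callable[[Type[BaseTask]], Type[BaseTask]]:
    """Return a class decorator that records the decorated class under `name`."""

    def wrapper(cls: Type[BaseTask]) -> Type[BaseTask]:
        if name in TASK_REGISTRY:
            raise ValueError(f"cannot register duplicate task {name!r}")
        if not issubclass(cls, BaseTask):
            raise TypeError(f"{cls.__name__} must extend BaseTask")
        TASK_REGISTRY[name] = cls
        return cls  # return the class unchanged so it stays usable directly

    return wrapper


def setup_task(name: str, args: dict) -> BaseTask:
    """Instantiate a registered task by name (e.g. a name read from the command line)."""
    return TASK_REGISTRY[name](args)


@register_task("translation")
class TranslationTask(BaseTask):
    pass


@register_task("language_modeling")
class LanguageModelingTask(BaseTask):
    pass
```

The decorator keeps registration next to each class definition. Code that builds a task from a name string therefore never needs a hand-maintained lookup table.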
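
Commit d3795d6cd1 adds scripts/average_checkpoints.py, which averages multiple checkpoints into a combined model. This sketch assumes that averaging means an element-wise mean of each parameter across checkpoints that share parameter names and shapes. Plain Python lists stand in for tensors, and average_state_dicts is a hypothetical helper, not the script's code.

```python
from typing import Dict, List

# Stand-in for a model state dict: parameter name -> flattened values (assumption).
StateDict = Dict[str, List[float]]


def average_state_dicts(state_dicts: List[StateDict]) -> StateDict:
    """Element-wise mean of each parameter across checkpoints (hypothetical helper)."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    names = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != names:
            raise KeyError("checkpoints have different parameter names")
    n = len(state_dicts)
    averaged: StateDict = {}
    for name in state_dicts[0]:
        columns = [sd[name] for sd in state_dicts]
        if len({len(col) for col in columns}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*columns) groups the i-th value of this parameter from every checkpoint.
        averaged[name] = [sum(vals) / n for vals in zip(*columns)]
    return averaged
```

The sketch deliberately ignores non-parameter checkpoint contents such as optimizer state.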
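
Commit e73fddf453 makes LabelSmoothedCrossEntropyCriterion filter padding properly. The sketch below assumes one common label-smoothing formulation: (1 - epsilon) times the NLL of the target, plus (epsilon / V) times the negative sum of log-probabilities over a vocabulary of size V. Positions whose target equals padding_idx are excluded from both terms. The function name and its list-based inputs are hypothetical.

```python
from typing import List, Tuple


def label_smoothed_nll_loss(
    lprobs: List[List[float]], targets: List[int], epsilon: float, padding_idx: int
) -> Tuple[float, float]:
    """Return (smoothed_loss, nll_loss) summed over non-padding positions.

    Hypothetical helper: each row of `lprobs` holds log-probabilities over the
    vocabulary for one target position.
    """
    loss = 0.0
    nll_loss = 0.0
    for row, target in zip(lprobs, targets):
        if target == padding_idx:
            continue  # padding contributes to neither the NLL nor the smoothing term
        vocab_size = len(row)
        nll = -row[target]
        smooth = -sum(row)  # smoothing term spread uniformly over the vocabulary
        loss += (1.0 - epsilon) * nll + (epsilon / vocab_size) * smooth
        nll_loss += nll
    return loss, nll_loss
```

Without the padding check, padded positions would add their smoothing term to the loss even though they carry no real target.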