Myle Ott
be3515b289
More fully deprecate --raw-text and --lazy-load ( fixes #1488 )
...
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/947
Differential Revision: D19084273
Pulled By: myleott
fbshipit-source-id: de80d9abfac8e3d813a9c9b343b41327c500344e
2019-12-16 17:22:11 -08:00
Myle Ott
df2f84ce61
v0.8.0 -> v0.9.0 ( #1452 )
...
Summary:
Possibly breaking changes:
- Set global numpy seed (4a7cd58)
- Split `in_proj_weight` into separate k, v, q projections in MultiheadAttention (fdf4c3e)
- TransformerEncoder returns namedtuples instead of dict (27568a7)
New features:
- Add `--fast-stat-sync` option (e1ba32a)
- Add `--empty-cache-freq` option (315c463)
- Support criterions with parameters (ba5f829)
New papers:
- Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c9)
- Levenshtein Transformer (86857a5, ...)
- Cross+Self-Attention for Transformer Models (4ac2c5f)
- Jointly Learning to Align and Translate with Transformer Models (1c66792)
- Reducing Transformer Depth on Demand with Structured Dropout (dabbef4)
- Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5ea)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcda)
- CamemBERT: a French BERT (b31849a)
Speed improvements:
- Add CUDA kernels for LightConv and DynamicConv (f840564)
- Cythonization of various dataloading components (4fc3953, ...)
- Don't project mask tokens for MLM training (718677e)
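The "set global numpy seed" entry above is flagged as possibly breaking because data shuffling becomes tied to numpy's global RNG state. A minimal sketch of why that matters for reproducible dataloading (the `set_seed` helper and per-epoch offset are illustrative assumptions, not fairseq's actual code):

```python
import numpy as np

def set_seed(seed, epoch=0):
    # Illustrative helper: seed numpy's *global* RNG so shuffling is
    # reproducible across runs, offsetting by epoch so each epoch still
    # gets a different (but deterministic) shuffle.
    np.random.seed(seed + epoch)

set_seed(1)
a = np.random.permutation(10)
set_seed(1)
b = np.random.permutation(10)
assert (a == b).all()  # same seed -> identical shuffle order
```

The flip side is the "possibly breaking" part: any other code relying on numpy's global RNG now sees a fixed stream as well.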
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1452
Differential Revision: D18798409
Pulled By: myleott
fbshipit-source-id: 860a0d5aaf7377c8c9bd63cdb3b33d464f0e1727
2019-12-03 15:19:33 -08:00
Kevin
13d9e2baf8
Fix changes of file locations of subword-nmt ( #1219 )
...
Summary:
Solves https://github.com/pytorch/fairseq/issues/1218.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1219
Differential Revision: D18339541
Pulled By: myleott
fbshipit-source-id: 6d5bd7b60fa7fd30c038fdad54591343a01f228b
2019-11-07 09:08:29 -08:00
Myle Ott
a0f75996b1
Fix building of docs
...
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1340
Differential Revision: D18289455
Pulled By: myleott
fbshipit-source-id: a1c8163a35273b6c646d300142701e8a317d7378
2019-11-02 16:52:50 -07:00
Zhanghao Wu
2314979ea5
Update getting_started.rst ( #1188 )
...
Summary:
Hi,
I think there is a minor mistake in the doc. `--distributed-no-spawn` argument is needed for distributed training on multiple machines without `slurm`. Otherwise, the program will start 8 jobs on each GPU, when `nproc_per_node=8`.
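The fix described above concerns multi-node launches driven by `torch.distributed.launch`. A sketch of the kind of command involved, with hostnames, ports, data paths, and model arguments as placeholders (this is an assumption of typical usage, not a command taken from the docs):

```shell
# Run on each of 2 machines, with node_rank=0 and node_rank=1 respectively.
# torch.distributed.launch starts one process per GPU (--nproc_per_node=8);
# --distributed-no-spawn tells fairseq not to spawn its own per-GPU workers
# on top of those, which would otherwise yield 8 jobs per GPU.
python -m torch.distributed.launch --nproc_per_node=8 \
    --nnodes=2 --node_rank=0 \
    --master_addr=192.168.1.1 --master_port=12345 \
    $(which fairseq-train) data-bin/example \
    --arch transformer --distributed-no-spawn
```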
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1188
Differential Revision: D17627778
Pulled By: myleott
fbshipit-source-id: 35ab6b650dc1132d7cb2d150e80d2ebf0caf3e69
2019-09-27 07:27:28 -07:00
Jerry Ma
3f4fc50163
Miscellaneous documentation improvements: ( #868 )
...
Summary:
- More clearly document the correspondence between FairseqAdam and torch.optim.AdamW
- Add ResamplingDataset to Sphinx docs
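The FairseqAdam / torch.optim.AdamW correspondence mentioned above comes down to decoupled weight decay. A small numpy sketch of one AdamW-style step (illustrative only; `adamw_step` is a hypothetical helper, not fairseq's or PyTorch's implementation):

```python
import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    # One update with *decoupled* weight decay: the decay is applied
    # directly to the parameters instead of being folded into the
    # gradient, which is what makes this AdamW-like rather than Adam-like.
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)  # Adam step
    p = p - lr * weight_decay * p                # decoupled decay
    return p, m, v

p = np.ones(3)
p, m, v = adamw_step(p, np.zeros(3), np.zeros(3), np.zeros(3), t=1)
```

Note that with a zero gradient the Adam step is zero but the decay still shrinks the parameters; with L2-regularization-style (coupled) decay the two would interact through the moment estimates.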
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/868
Differential Revision: D17523244
Pulled By: jma127
fbshipit-source-id: 8e7b34b24889b2c8f70b09a52a625d2af135734b
2019-09-23 12:27:12 -07:00
Myle Ott
ffffe04ea1
v0.7.2 -> v0.8.0 ( #1017 )
...
Summary:
Changelog:
- Relicensed under MIT license
- Add RoBERTa
- Add wav2vec
- Add WMT'19 models
- Add initial ASR code
- Changed torch.hub interface (`generate` renamed to `translate`)
- Add `--tokenizer` and `--bpe`
- `f812e52`: Renamed data.transforms -> data.encoders
- `654affc`: New Dataset API (optional)
- `47fd985`: Deprecate old Masked LM components
- `5f78106`: Set mmap as default dataset format and infer format automatically
- Misc fixes for sampling
- Misc fixes to support PyTorch 1.2
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1017
Differential Revision: D16799880
Pulled By: myleott
fbshipit-source-id: 45ad8bc531724a53063cbc24ca1c93f715cdc5a7
2019-08-14 05:02:45 -07:00
Myle Ott
8835d93cf0
Standardize on 'teacher forcing' rather than 'input feeding' which is… ( #769 )
...
Summary:
Input feeding generally refers to a slightly different concept
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/769
Differential Revision: D16491898
Pulled By: myleott
fbshipit-source-id: 68573584e820f11f199db4e7e37e9ee7a69a3287
2019-07-25 07:24:07 -07:00
Myle Ott
8af5554269
Improve interactive generation (support --tokenizer and --bpe)
...
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/734
Differential Revision: D16377044
Pulled By: myleott
fbshipit-source-id: 37d5553d76aa7c653113fec089f59710281c31d7
2019-07-19 06:45:18 -07:00
Myle Ott
b002d0096e
v0.7.1 -> v0.7.2 ( #891 )
...
Summary:
No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/891
Differential Revision: D16377132
Pulled By: myleott
fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7
2019-07-19 06:33:40 -07:00
Myle Ott
881381cfc7
v0.7.1: fix PyPI setup and tests
...
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/818
Differential Revision: D15916265
Pulled By: myleott
fbshipit-source-id: c66c0bd988d3472c4150226952f34ee8d4c3db86
2019-06-20 06:28:37 -07:00
Myle Ott
bd710e75ae
v0.7.0 ( #817 )
...
Summary:
Notable (possibly breaking) changes:
- d45db80: Move checkpoint utility functions from utils.py into checkpoint_utils.py
- f2563c2: Move LM definitions into separate files
- dffb167: Updates to model API:
  - `FairseqModel` -> `FairseqEncoderDecoderModel`
  - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
  - `encoder_out_dict` -> `encoder_out`
  - rm unused `remove_head` functions
- 34726d5: Move `distributed_init` into `DistributedFairseqModel`
- cf17068: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
- d45db80: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
- 96ac28d: Rename `--sampling-temperature` -> `--temperature`
- fc1a19a: Deprecate dummy batches
- a1c997b: Add memory mapped datasets
- 0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"
Plus many additional features and bugfixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/817
Differential Revision: D15913844
Pulled By: myleott
fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b
2019-06-19 19:08:50 -07:00
Myle Ott
dffb167449
Updates to model API ( #561 )
...
Summary:
- `FairseqModel` -> `FairseqEncoderDecoderModel`
- add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
- `encoder_out_dict` -> `encoder_out`
- rm unused `remove_head` functions
- update docs
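The decomposition above splits a decoder's `forward` into feature extraction and output projection. A toy sketch of the shape of that API (illustrative only; `ToyDecoder` is hypothetical, not the real `FairseqDecoder`):

```python
import numpy as np

class ToyDecoder:
    # Sketch of the decomposed decoder API: `forward` is just
    # `output_layer(extract_features(x))`, so callers who only need
    # hidden states (e.g. for probing) can stop after extract_features.
    def __init__(self, hidden_dim, vocab_size, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.standard_normal((hidden_dim, vocab_size))

    def extract_features(self, x):
        # Stand-in for the real decoder stack: just a nonlinearity here.
        return np.tanh(x)

    def output_layer(self, features):
        # Project hidden features to vocabulary logits.
        return features @ self.proj

    def forward(self, x):
        return self.output_layer(self.extract_features(x))

dec = ToyDecoder(hidden_dim=4, vocab_size=10)
logits = dec.forward(np.ones((2, 4)))  # shape (2, 10)
```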
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561
Differential Revision: D15271142
Pulled By: myleott
fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264
2019-05-15 07:12:41 -07:00
zhiqiang
d0577ba7a5
Fix option in docs ( #735 )
...
Summary:
`--output-format` -> `--dataset-impl` in Tutorial: Classifying Names with a Character-Level RNN
Pull Request resolved: https://github.com/pytorch/fairseq/pull/735
Differential Revision: D15314625
Pulled By: myleott
fbshipit-source-id: 65b8efd1a367ca754e5b9dca088aefbc648864dd
2019-05-12 16:37:59 -07:00
Myle Ott
d45db80431
Merge internal changes ( #654 )
...
Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/654
Differential Revision: D15041794
Pulled By: myleott
fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
2019-04-29 19:50:58 -07:00
Myle Ott
e6422528da
0.6.1 -> 0.6.2 ( #577 )
...
Summary:
Changelog:
- 998ba4f: Add language models from Baevski & Auli (2018)
- 4294c4f: Add mixture of experts code from Shen et al. (2019)
- 0049349: Add example for multilingual training
- 48d9afb: Speed improvements, including fused operators from apex
- 44d27e6: Add Tensorboard support
- d17fa85: Add Adadelta optimizer
- 9e1c880: Add `FairseqEncoderModel`
- b65c579: Add `FairseqTask.inference_step` to modularize generate.py
- 2ad1178: Add back `--curriculum`
- Misc bug fixes and other features
Pull Request resolved: https://github.com/pytorch/fairseq/pull/577
Differential Revision: D14481233
Pulled By: myleott
fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
2019-03-15 10:27:01 -07:00
Vladimir Karpukhin
f296824f40
Move string line encoding logic from tokenizer to Dictionary (unified diff). ( #541 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/541
Just a combo of a stacked pair, D14057943 & D14176011.
Made this as a separate diff because there seems to be some issue with porting a stacked change into the GitHub repo.
Differential Revision: D14251048
fbshipit-source-id: 0a47f534a69d6ab2ebe035fba40fd51748cccfb8
2019-02-28 09:19:12 -08:00
Myle Ott
fbd4cef9a5
Add fairseq to PyPI ( #495 )
...
Summary:
- fairseq can now be installed via pip: `pip install fairseq`
- command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/495
Differential Revision: D14017761
Pulled By: myleott
fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
2019-02-08 22:03:29 -08:00
Myle Ott
b41c74dc5b
Add code for "Pay Less Attention with Lightweight and Dynamic Convolutions" ( #473 )
...
Summary:
Changelog:
- `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper
- `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu
- update READMEs
- misc fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/473
Differential Revision: D13819717
Pulled By: myleott
fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9
2019-01-25 15:40:26 -08:00
Davide Caroselli
ebaf8c5030
'--user-dir' documentation (correct) ( #447 )
...
Summary:
Command line option --user-dir documented in docs/overview.rst
Pull Request resolved: https://github.com/pytorch/fairseq/pull/447
Differential Revision: D13674744
Pulled By: myleott
fbshipit-source-id: 17049ee5c9f692f5298ef9fa7381ee583f269cde
2019-01-15 11:54:17 -08:00
Myle Ott
14bd9c62a3
Update docs for --lazy-load and torch.distributed.launch
...
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/433
Differential Revision: D13588032
Pulled By: myleott
fbshipit-source-id: 0e5ff361e27b206c4490264f0f51863367499e81
2019-01-07 15:28:09 -08:00
Myle Ott
7633129ba8
Merge internal changes ( #283 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/283
Pull Request resolved: https://github.com/pytorch/fairseq/pull/428
Differential Revision: D13564190
Pulled By: myleott
fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
2019-01-04 20:03:19 -08:00
Sergey Edunov
1082ba352c
Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
...
- no more FP16Trainer; we just have an FP16Optimizer wrapper
- most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra dummy_batch argument at initialization, on which we run forward/backward when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0
- Trainer.train_step now takes a list of samples, which will allow cleaner `--update-freq` handling
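The dummy-batch trick described above can be sketched numerically: running forward/backward on a loss scaled by 0 keeps workers in lockstep while contributing exactly zero gradient. An illustrative numpy example with a linear model and squared loss (the `grad_for_batch` helper is hypothetical, not fairseq's code):

```python
import numpy as np

def grad_for_batch(w, x, y, is_dummy):
    # Gradient of scale * 0.5 * ||x @ w - y||^2 with respect to w.
    # On dummy batches we scale the loss by 0, so the forward/backward
    # pass still happens but the gradient vanishes.
    scale = 0.0 if is_dummy else 1.0
    pred = x @ w
    return scale * (x.T @ (pred - y))

w = np.ones(3)
x = np.eye(3)
y = np.zeros(3)
g_dummy = grad_for_batch(w, x, y, is_dummy=True)   # all zeros
g_real = grad_for_batch(w, x, y, is_dummy=False)   # nonzero
```

This matters for synchronous distributed training: every worker must call backward the same number of times per step, even when it has run out of real batches.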
2018-09-25 17:36:43 -04:00
Sergey Edunov
fe2d1581a4
Fix docs
2018-09-17 22:34:17 -07:00
Myle Ott
4a47b88992
Update documentation
2018-09-03 20:03:37 -04:00
Myle Ott
6381cc977f
Add documentation
2018-09-03 19:15:23 -04:00