Commit Graph

60 Commits

Author SHA1 Message Date
Changhan Wang
0ac3f3270c add TTS
Summary: [fairseq-py] add TTS

Reviewed By: wnhsu

Differential Revision: D30720666

fbshipit-source-id: b5288acec72bea1d3a9f3884a4ed51b616c7a403
2021-09-13 18:13:45 -07:00
Pierre Andrews
68a81202a3 Indexed Huffman Coded dataset (#2029)
Summary:
## What does this PR do?

Currently, binarized datasets are stored as a binary representation of int tensors. At best, each int is coded as a uint16 on disk.

When coding a fixed-size-vocabulary dataset where we know the frequency of each symbol, and where some symbols are more common than others, we can do better. This happens in particular when binarizing a dataset split into subword units, as the most common "tokenizers" like BPE and SPM will choose subwords with high frequencies over subwords with low frequencies.

In practice, if we know the frequency of all symbols (or a good estimate), we can use entropy encoding methods to compress the data. The idea is to assign a compressed representation where frequent symbols have shorter representations than infrequent symbols.

In this PR, we build a Huffman code from a frequency table and use this code to encode a dataset. The PR provides the Huffman coder implementation (using the single-queue approach, as we usually start with a sorted set of symbols) as well as a memory-mapped dataset implementation that stores the data compressed with a Huffman code and can return indexed tensors from it.

Over a whole dataset, depending on how many symbols we sample to evaluate the frequency, we can save between 25% and 30% of storage space.
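
For illustration, here is a minimal, self-contained sketch of building a prefix code from a frequency table (a heap-based toy, not the single-queue coder added in this PR):
```python
import heapq
from collections import Counter

def build_huffman_code(frequencies):
    """Toy Huffman construction: repeatedly merge the two least frequent subtrees."""
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        # Prepend one bit to every code in each merged subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

print(build_huffman_code(Counter("aaaabbbccd")))
# e.g. {'a': '0', 'b': '10', 'd': '110', 'c': '111'}: frequent symbols get shorter bit strings.
```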

## Follow Ups

Currently the binarizer/preprocess script makes too many assumptions about the dataset writers, so the Huffman dataset writer cannot be used straight out of the box with it. I will open follow-up PRs to provide easy-to-use scripts to build such datasets. But it's as simple as doing:
```
# Assumed import path for the classes introduced in this PR:
from fairseq.data.huffman import HuffmanCodeBuilder, HuffmanMMapIndexedDatasetBuilder

# Count symbol frequencies over a sample file (one space-separated sentence per line).
code_builder = HuffmanCodeBuilder()
with open(sample_file, 'r', encoding="utf-8") as input:
    for line in input:
        code_builder.add(*line.strip().split(" "))

# Build the Huffman code from the collected frequency table.
coder = code_builder.build_code()

# Encode the full dataset and write it as an indexed, memory-mappable file.
with HuffmanMMapIndexedDatasetBuilder('/tmp/testing_huffman', coder) as builder:
    with open(dataset_file, 'r', encoding="utf-8") as input:
        for line in input:
            builder.add_item(line.strip().split(' '))
```

A lot of the `HuffmanMMapIndexedDataset` code comes from the normal `MMapIndexedDataset`, and we could probably extract the commonalities into a base class.

The `HuffmanCoder` is also really a special kind of `Dictionary`; again, a common base class could be abstracted out of them.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2029

Reviewed By: dianaml0

Differential Revision: D29557468

Pulled By: Mortimerp9

fbshipit-source-id: a01b6d98f38f937934cadebb3786133e257adefe
2021-08-31 01:12:35 -07:00
Omry Yadan
53802e7812 Compatibility fix with Hydra 1.1 (#3722)
Summary:
One of the changes in Hydra 1.1 is that the default composition order is changing.
This is documented [here](https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order).
In Hydra 1.1, a config overrides values introduced by its defaults list, while in Hydra 1.0 the defaults list overrode the values in the config.

fairseq currently depends on the previous behavior:
The class `FairseqConfig` defines config values and expects them to be overridden by the defaults list.
This results in a different config being created when running `fairseq_cli/hydra_train.py` with Hydra 1.0 versus 1.1.

Hydra 1.1 introduced the `_self_` keyword in the defaults list to control the composition order. In order to achieve the behavior of Hydra 1.0, `_self_` should be added as the first item in the defaults list.

To allow for a smoother migration, Hydra 1.0 ignores `_self_` starting from 1.0.7 (earlier versions will issue an error).

This diff adds `_self_` as the first item in the defaults list of the fairseq config, and introduces a dependency on Hydra 1.0.7 or newer.
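
As a rough illustration of what this composition-order change means (using plain OmegaConf merges with hypothetical keys, rather than Hydra's group/defaults machinery):
```python
from omegaconf import OmegaConf

primary = OmegaConf.create({"optimizer": {"lr": 0.5}})        # value defined in the config itself
from_defaults = OmegaConf.create({"optimizer": {"lr": 0.1}})  # value introduced by the defaults list

# Hydra 1.0 order (and Hydra 1.1 with `_self_` first): the defaults list overrides the config.
print(OmegaConf.merge(primary, from_defaults).optimizer.lr)   # 0.1
# Hydra 1.1 default order: the config overrides the defaults list.
print(OmegaConf.merge(from_defaults, primary).optimizer.lr)   # 0.5
```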

### Testing:
I ensured that the following yield the same composed config:
Default config with Hydra 1.0.6, 1.0.7 and 1.1.0

`examples/wav2vec/config/finetuning/base_10h.yaml` with Hydra 1.0.6, 1.0.7 and 1.1.0.

This can be achieved by outputting the generated config using `--cfg job` and comparing the outputs.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3722

Reviewed By: dianaml0

Differential Revision: D29917677

Pulled By: jieru-hu

fbshipit-source-id: 7e645b83cccb03fc80a6702e302c4643d2b14a78
2021-07-26 16:36:43 -07:00
Michael Lewis
7dafb05754 BASE layers (#1654)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1654

Reviewed By: myleott

Differential Revision: D27128074

Pulled By: shruti-bh

fbshipit-source-id: ac86d383cd53c9c9bdd946fea839a37b719d95e3
2021-03-29 18:02:50 -07:00
Frankie Robertson
61e46bb997 Fix attempt to unlink directory copied into source package (Python 3.9) (#3235)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [N/A] Did you make sure to update the docs?
- [N/A] Did you write any new necessary tests?

## What does this PR do?

Currently when installing the newest source package from PyPI I get an error like so:

```
Collecting fairseq
  Using cached fairseq-0.10.2.tar.gz (938 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /home/frankier/sources/datasets/.venv/bin/python3 /tmp/tmp_ujftsgi_in_process.py get_requires_for_build_wheel /tmp/tmpmn0eumq2
       cwd: /tmp/pip-install-dg5d6q9y/fairseq
  Complete output (31 lines):
  Traceback (most recent call last):
    File "setup.py", line 214, in <module>
      do_setup(package_data)
    File "setup.py", line 136, in do_setup
      setup(
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 152, in setup
      _install_setup_requires(attrs)
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 147, in _install_setup_requires
      dist.fetch_build_eggs(dist.setup_requires)
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 60, in fetch_build_eggs
      raise SetupRequirementsError(specifier_list)
  setuptools.build_meta.SetupRequirementsError: ['cython', 'numpy', 'setuptools>=18.0']

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/tmp/tmp_ujftsgi_in_process.py", line 280, in <module>
      main()
    File "/tmp/tmp_ujftsgi_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/tmp/tmp_ujftsgi_in_process.py", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-hag0sxvp/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 145, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 217, in <module>
      os.unlink(fairseq_examples)
  IsADirectoryError: [Errno 21] Is a directory: 'fairseq/examples'
  ----------------------------------------
ERROR: Command errored out with exit status 1: /home/frankier/sources/datasets/.venv/bin/python3 /tmp/tmp_ujftsgi_in_process.py get_requires_for_build_wheel /tmp/tmpmn0eumq2 Check the logs for full command output.
```

I believe the reason for this is that the source package contains a real examples directory, which was put there during package creation (it seems the symlink became a directory). Now, when setup.py is run again, it attempts to unlink that directory, which fails because a directory cannot be unlinked. This PR therefore only attempts to unlink it if it is a symlink. I have not thoroughly tested whether my proposed cause is the true cause, but this should fix it in any case.
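
A minimal sketch of the guard described above (mirroring the `fairseq_examples` name from the traceback; not necessarily the exact diff):
```python
import os

fairseq_examples = os.path.join("fairseq", "examples")
# In a git checkout this path is a symlink created by setup.py; in the sdist it is a
# real directory copied in at package-creation time, which os.unlink() cannot remove.
if os.path.islink(fairseq_examples):
    os.unlink(fairseq_examples)
```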

Note that the source package is only fetched because there is no wheel for Python 3.9; most users will not see this error because they will install from a wheel.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: https://github.com/pytorch/fairseq/pull/3235

Reviewed By: alexeib

Differential Revision: D26513259

Pulled By: myleott

fbshipit-source-id: 775d6c636a5867b9983bb6419829f13ee414e2fd
2021-02-20 06:23:45 -08:00
Myle Ott
cfbf0dddbc Small changes to make tests more reliable (#1572)
Summary:
After this, `python setup.py test` should be more reliable (including when multiple GPUs are present)

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1572

Reviewed By: alexeib

Differential Revision: D25984113

Pulled By: myleott

fbshipit-source-id: 7fef27ae90c079c07f592ed9fb350ccf8b56d23d
2021-01-21 07:33:54 -08:00
Sam Shleifer
bff7f85206 fastseq ngram blocking (#1509)
Summary:
Command:
```bash
fairseq-generate \
    ~myleott/data/data-bin/wmt16_en_de_bpe32k/ \
    --path /checkpoint/myleott/s3/models/wmt16.en-de.joined-dict.transformer/model.pt \
    --beam 4 --remove-bpe --lenpen 0.6 --batch-size 256 --no-repeat-ngram-size 3 \
    --gen-subset test --fp16
```

master/devfair: 297.8s (10.08 sentences/s, 286.47 tokens/s)
branch/devfair: 31.9s (94.27 sentences/s, 2678.66 tokens/s)

master/v100: 227.4s (13.21 sentences/s, 375.24 tokens/s)
branch/v100: 13.1s (228.68 sentences/s, 6497.99 tokens/s)
(all BLEU4=29.17)
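
For context, a minimal pure-Python sketch of the `--no-repeat-ngram-size` constraint being accelerated here (a reference implementation for a single hypothesis, not the fastseq/fairseq code):
```python
import torch

def ban_repeated_ngrams(tokens: torch.LongTensor, lprobs: torch.Tensor, n: int) -> torch.Tensor:
    """Set to -inf the log-prob of any token that would complete an n-gram already in `tokens`."""
    seq = tokens.tolist()
    if n <= 0 or len(seq) < n - 1:
        return lprobs
    prefix = tuple(seq[len(seq) - (n - 1):]) if n > 1 else tuple()
    banned = set()
    # Any token that followed this (n-1)-gram earlier in the hypothesis is banned.
    for i in range(len(seq) - n + 1):
        if tuple(seq[i:i + n - 1]) == prefix:
            banned.add(seq[i + n - 1])
    for tok in banned:
        lprobs[tok] = float("-inf")
    return lprobs
```
In beam search this runs once per hypothesis per decoding step, which is why a batched implementation pays off at large batch sizes.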

### ToDo:
- tests

### Future Work
- test other improvements proposed by fastseq.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1509

Reviewed By: myleott

Differential Revision: D25587857

Pulled By: sshleifer

fbshipit-source-id: d42af5c50e3f94c90e878f92da5ce5ef3fc8b988
2020-12-30 12:58:09 -08:00
Myle Ott
8d7ee5bf81 Fix hydra with Python 3.8 (#1511)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1511

Test Plan: Imported from OSS

Reviewed By: alexeib

Differential Revision: D25570468

Pulled By: myleott

fbshipit-source-id: 98efc6983479e163e6cf0a7ef33decaa1bc429f1
2020-12-15 17:47:39 -08:00
Myle Ott
bc4ebcafb4 Fix tests (#1482)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1482

Reviewed By: michaelauli

Differential Revision: D25318618

Pulled By: myleott

fbshipit-source-id: bed171ffe5ca10e8359be96a15d0fe9bb1a630ea
2020-12-03 18:18:11 -08:00
Myle Ott
3b77a61600 Add fairseq-hydra-train and update docs (#1449)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1449

Test Plan: Imported from OSS

Reviewed By: alexeib

Differential Revision: D25094525

Pulled By: myleott

fbshipit-source-id: 430387d11196d3292933bb168cf09ea16ebc0d3b
2020-11-20 06:00:59 -08:00
Myle Ott
41a61bd4e2 Add GitHub Action to build Python wheels (+ minor cleanup in build scripts) (#1447)
Summary:
Here's an example run in a forked repo: https://github.com/fairseq/fairseq/runs/1419699104

We can upload the wheels to PyPI to make `pip install fairseq` easier for folks.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1447

Reviewed By: lematt1991

Differential Revision: D25060753

Pulled By: myleott

fbshipit-source-id: 9fdc28cc7c8a172daac668dd09684ec43e2ff11a
2020-11-18 14:31:23 -08:00
alexeib
09a5d864fc move configs into fairseq dir (#1403)
Summary:
this way they get shipped together with the fairseq package

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1403

Reviewed By: myleott

Differential Revision: D24803076

Pulled By: alexeib

fbshipit-source-id: a9aa6e47a8ef26fae4d54691f1616a721b8f6112
2020-11-06 23:27:32 -08:00
Myle Ott
e0737c3c29 Dynamically generate versions based on commit hash (#2774)
Summary:
This will produce version strings like `1.0.0a0+3065963`, similar to PyTorch version strings.
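
A rough sketch of how such a version string can be derived at build time (not necessarily the exact setup.py logic):
```python
import subprocess

def get_version(base="1.0.0a0"):
    """Append the short git commit hash to the base version, PyTorch-style (e.g. 1.0.0a0+3065963)."""
    try:
        sha = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode("ascii").strip()
        return f"{base}+{sha}"
    except Exception:
        # Not a git checkout (e.g. building from an sdist): fall back to the base version.
        return base
```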

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2774

Reviewed By: alexeib

Differential Revision: D24453517

Pulled By: myleott

fbshipit-source-id: 03a0c324ed6124bbc513ba7edc954abd71d63a0f
2020-10-22 12:51:04 -07:00
Myle Ott
9b0611e678 Fix torch.hub (fixes #2756) (#2762)
Summary:
Typically `torch.hub.load(...)` doesn't call `pip install`, so our Cython components never get built. We have a hack in our hubconf that builds these components by running the equivalent of `python setup.py build_ext --inplace` using the setuptools sandbox: f6677b6755/hubconf.py (L52-L55).

Unfortunately, this sandbox gets mad if you modify the filesystem, which is what this recent change does: f6677b6755/setup.py (L203-L205). Combined, these two break torch.hub.

The solution is to not set up the symlinks when we're doing `build_ext`. This is fine since `build_ext` doesn't actually build a package, so we don't care about including the config or examples.
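
A minimal sketch of that guard (hypothetical paths, following the description above):
```python
import os
import sys

# Only create the packaging symlinks when actually building a package; skip them for
# `python setup.py build_ext --inplace`, which is what the torch.hub hack runs.
if "build_ext" not in sys.argv:
    if not os.path.exists("fairseq/examples"):
        os.symlink("../examples", "fairseq/examples")
```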

Pull Request resolved: https://github.com/pytorch/fairseq/pull/2762

Reviewed By: alexeib

Differential Revision: D24430228

Pulled By: myleott

fbshipit-source-id: e05d075a003ddfde196cb8a86b32882d73808015
2020-10-20 15:46:55 -07:00
Myle Ott
9b8b464070 Package config and examples with fairseq (#1356)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1356

Reviewed By: alexeib

Differential Revision: D24385688

Pulled By: myleott

fbshipit-source-id: 72c4a702d93d2854a6409d42913d7413207cb61e
2020-10-19 09:24:04 -07:00
Myle Ott
a48f235636 Apply black+isort (#1357)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1357

Reviewed By: alexeib

Differential Revision: D24377772

fbshipit-source-id: 51581af041d42d62166b33a35a1a4228b1a76f0c
2020-10-18 18:14:51 -07:00
Changhan Wang
1d1c145387 speech-to-text OSS
Summary:
Imported from https://github.com/fairinternal/fairseq-py/pull/1284. Updated according to PR comments.

Main changes:
* New task: `fairseq.tasks.speech_to_text`
  * Multilingual support: multiple train sub-splits, temperature-based sampling (see the sketch after this list), language ID tokens
* New dataset: `fairseq.data.audio.speech_to_text_dataset`
* Added accuracy metrics and BOS prefix removal to label smoothed cross entropy
* New models: Transformer (`fairseq.models.speech_to_text.s2t_transformer`) and BLSTM (`fairseq.models.speech_to_text.berard`)
* Extended scorers:
  * Added a base scorer class: `fairseq.scorers.BaseScorer` (the parent class for all scorers except the BLEU scorer in CPP)
  * Added an evaluation tokenizer: `fairseq.scorers.eval_tokenizer` which leverages sacreBLEU's built-in tokenizers and allows character-level tokenization as well as punctuation removal (for WER scoring).
  * Added chrF scorer: `fairseq.scorers.chrf`
* Online Mel-filter bank speech feature extraction (via CPP-based pyKaldi or Python-based TorchAudio): `fairseq.data.audio.audio_utils`
* Online speech feature transforms: `fairseq.data.audio.feature_transforms.*`
* Fixed the subsampled sequence lengths in VGGTransformer (`examples.speech_recognition.models.vggtransformer`)
* Examples under `examples/speech_to_text`:
  * LibriSpeech (ASR): better results than VGGTransformer with smaller Transformer-based models
  * MuST-C (ST): comparable to [SOTA results](https://arxiv.org/pdf/2004.10234.pdf) but with fewer tricks
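
As a small illustration of the temperature-based sampling mentioned above (a generic sketch with hypothetical numbers, not the exact fairseq code): each split's sampling probability is its share of the data raised to 1/T and renormalized, so T > 1 up-weights low-resource languages.
```python
def temperature_sampling_probs(sizes, temperature=1.5):
    """Dataset sizes -> sampling probabilities p_i proportional to (n_i / N) ** (1 / T)."""
    total = sum(sizes)
    weights = [(n / total) ** (1.0 / temperature) for n in sizes]
    z = sum(weights)
    return [w / z for w in weights]

print(temperature_sampling_probs([900, 90, 10]))  # roughly [0.79, 0.17, 0.04] vs. [0.90, 0.09, 0.01] at T=1
```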

Reviewed By: jmp84

Differential Revision: D24065273

fbshipit-source-id: 5f842ca9c826f92d4af660705611885fe440a9ab
2020-10-14 12:30:05 -07:00
Myle Ott
f902a363ab Small fixes (#1325)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1325

Reviewed By: ngoyal2707

Differential Revision: D24024198

Pulled By: myleott

fbshipit-source-id: c3b776970d625eff21a26bf7c86cd28ef9e9d2ef
2020-10-02 10:51:09 -07:00
Mu Tian
e7f76c4481 hydra-fairseq - add dataclass
Summary: hydra fairseq - add main common dataclasses as structured config

Reviewed By: alexeib

Differential Revision: D23375458

fbshipit-source-id: 4cb2802e523990d4e2b1a87e3cf1bc4dc852bc5b
2020-09-04 17:08:30 -07:00
alexeib
621e834103 wav2vec 2.0 (#1220)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1220

Test Plan: Please see examples/wav2vec/README.md for instructions

Reviewed By: edunov

Differential Revision: D22707565

Pulled By: alexeib

fbshipit-source-id: 0c0d4ca7acc933ef7c0062f8dce550b94e414680
2020-08-04 14:19:56 -07:00
Myle Ott
e198482e71 Fix binaries in root dir (#995)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/995

The symlinks approach didn't work with `python train.py`.

Differential Revision: D19451900

fbshipit-source-id: 2988eb48077cf8e0e078b9fca527a675132187db
2020-01-17 13:09:09 -08:00
Elijah Rippeth
cec0da2927 add other platforms to CI. (#1595)
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

Runs CI for `fairseq` on all major platforms provided by GitHub actions.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1595

Differential Revision: D19438282

Pulled By: myleott

fbshipit-source-id: a64db46d7785e6f583848f27699f6463c4dc3170
2020-01-17 00:15:10 -08:00
Jiatao Gu
a316bd99b7 CUDA implementation of Levenshtein distance for NAT training (#960)
Summary:
## What does this PR do?
CUDA implementation of Levenshtein distance for NAT training and other potential applications.
It will make training the Levenshtein Transformer slightly faster and clean up the functions.
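
For reference, the quantity being accelerated is the standard edit-distance dynamic program; a minimal CPU sketch is below (the PR provides a CUDA kernel, and NAT training also needs the edit operations, not just the distance):
```python
def levenshtein_distance(a, b):
    """Classic O(len(a) * len(b)) DP over insert/delete/substitute operations."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if tokens match)
            ))
        prev = curr
    return prev[-1]

assert levenshtein_distance("kitten", "sitting") == 3
```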
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/960

Test Plan: Imported from GitHub. Tested locally.

Reviewed By: cndn

Differential Revision: D19207096

Pulled By: MultiPath

fbshipit-source-id: 4890bbaa851ffd302648c0d949173158dc3167e2
2019-12-21 02:45:15 -08:00
Myle Ott
05514f8a82 Update README to indicate we only support Python >= 3.6 (fixes #1317)
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/952

Differential Revision: D19133348

Pulled By: myleott

fbshipit-source-id: 51f96ddb13386143fe0088f19f7cb0674755811f
2019-12-16 19:46:53 -08:00
Myle Ott
df2f84ce61 v0.8.0 -> v0.9.0 (#1452)
Summary:
Possibly breaking changes:
- Set global numpy seed (4a7cd58)
- Split `in_proj_weight` into separate k, v, q projections in MultiheadAttention (fdf4c3e)
- TransformerEncoder returns namedtuples instead of dict (27568a7)

New features:
- Add `--fast-stat-sync` option (e1ba32a)
- Add `--empty-cache-freq` option (315c463)
- Support criterions with parameters (ba5f829)

New papers:
- Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c9)
- Levenshtein Transformer (86857a5, ...)
- Cross+Self-Attention for Transformer Models (4ac2c5f)
- Jointly Learning to Align and Translate with Transformer Models (1c66792)
- Reducing Transformer Depth on Demand with Structured Dropout (dabbef4)
- Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5ea)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcda)
- CamemBERT: a French BERT (b31849a)

Speed improvements:
- Add CUDA kernels for LightConv and DynamicConv (f840564)
- Cythonization of various dataloading components (4fc3953, ...)
- Don't project mask tokens for MLM training (718677e)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1452

Differential Revision: D18798409

Pulled By: myleott

fbshipit-source-id: 860a0d5aaf7377c8c9bd63cdb3b33d464f0e1727
2019-12-03 15:19:33 -08:00
Myle Ott
cb6c67bcdb Make torch.hub interface automatically apply tokenization and BPE
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/926

Differential Revision: D18685772

Pulled By: myleott

fbshipit-source-id: 0f99d79ed6ee72e9d3ced786d75ab9504d0dfcf0
2019-11-26 07:49:37 -08:00
Myle Ott
4d21c157ad Have setup.py clean remove compiled Cython files
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/907

Differential Revision: D18480215

Pulled By: myleott

fbshipit-source-id: b02002f631f6d47380f309d4f464bd135d623280
2019-11-13 10:51:22 -08:00
Myle Ott
a0f75996b1 Fix building of docs
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1340

Differential Revision: D18289455

Pulled By: myleott

fbshipit-source-id: a1c8163a35273b6c646d300142701e8a317d7378
2019-11-02 16:52:50 -07:00
Changhan Wang
86857a58bf Levenshtein Transformer paper code
Summary:
Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
* Added Levenshtein Transformer model, task and criterion class
* Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines
* Add an option for prepending BOS to dictionary class and translation task class

Reviewed By: myleott

Differential Revision: D17297372

fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
2019-09-27 13:58:45 -07:00
Naman Goyal
1f0f7cd82c added cython to install_requires
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/856

Reviewed By: myleott

Differential Revision: D17162411

Pulled By: myleott

fbshipit-source-id: e70ecc802398bbba2b5326e9700f2121c422fd18
2019-09-03 09:08:38 -07:00
Myle Ott
8d4588b1ba Cleaner handling of numpy-based extensions in setup.py
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/853

Differential Revision: D17147879

Pulled By: myleott

fbshipit-source-id: b1f5e838533de62ade52fa82112ea5308734c70f
2019-08-31 16:53:34 -07:00
Myle Ott
746e59a262 Improve support for python setup.py build_ext --inplace
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/852

Differential Revision: D17147452

Pulled By: myleott

fbshipit-source-id: 5fd9c7da3cc019c7beec98d41db1aef1329ee57a
2019-08-31 13:44:22 -07:00
Myle Ott
d2410c4207 Minor cleanup for setup.py
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1078

Differential Revision: D17072514

Pulled By: myleott

fbshipit-source-id: 69a8c8c9cc7caa7e04c414329a5d79e6e1a6621c
2019-08-27 10:07:40 -07:00
Naman Goyal
396ff7f59f installing numpy headers for cython
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/848

Differential Revision: D17060283

fbshipit-source-id: c7e61cae76a0566cc3e2ddc3ab4d48f8dec9d777
2019-08-27 07:11:34 -07:00
Naman Goyal
8a8c0691ba fix cython dependency in the setup (#847)
Summary:
Fixes the `pytext` build broken by 4fc39538ae

Earlier versions of setuptools required `cython` to be installed before even starting setup.py. This change fixes it.
More details: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#180
and https://stackoverflow.com/questions/37471313/setup-requires-with-cython
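
For context, a minimal sketch of the pattern described in those links (an illustrative example, not the fairseq setup.py): with setuptools >= 18.0, Cython can go in `setup_requires` and `.pyx` sources can be passed straight to `Extension`, so Cython no longer has to be installed before setup.py starts:
```python
from setuptools import Extension, setup

setup(
    name="example-pkg",  # hypothetical package
    version="0.0.1",
    # setuptools >= 18.0 resolves setup_requires before building extensions,
    # so Cython does not need to be importable at the top of setup.py.
    setup_requires=["cython", "numpy", "setuptools>=18.0"],
    ext_modules=[Extension("example.fast", sources=["example/fast.pyx"])],
)
```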
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/847

Differential Revision: D16997450

fbshipit-source-id: 5f65026c228a1b94280ca73937078ee3e21ce4f8
2019-08-26 07:19:21 -07:00
Naman Goyal
4fc39538ae Cythonize token block dataset (#834)
Summary:
Cythonized the token block dataset code; it's `> 100x` faster. Building token blocks for the entire `bookwiki+CC+stories+openweb` corpus takes just ~`39.9` seconds.

TODO:
1) I think I can make it another 2x faster.
2) Cleanup.

EDIT History:
~~First pass at parallelizing `token_block_dataset`. The code feels somewhat complicated and cluttered.
This is 2-3x faster though in my tests on the `bookwiki` dataset with both `complete` and `complete_doc` modes.
myleott Can you take a look for correctness, as I am still not 100% sure that I am not missing corner cases.~~
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/834

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Test workflow: f133816198

Reviewed By: myleott

Differential Revision: D16970257

Pulled By: myleott

fbshipit-source-id: ec45a308193c9e9f3e7075336c15df4723228d6f
2019-08-23 07:32:36 -07:00
Myle Ott
ffffe04ea1 v0.7.2 -> v0.8.0 (#1017)
Summary:
Changelog:
- Relicensed under MIT license
- Add RoBERTa
- Add wav2vec
- Add WMT'19 models
- Add initial ASR code
- Changed torch.hub interface (`generate` renamed to `translate`)
- Add `--tokenizer` and `--bpe`
- f812e52: Renamed data.transforms -> data.encoders
- 654affc: New Dataset API (optional)
- `47fd985`: Deprecate old Masked LM components
- `5f78106`: Set mmap as default dataset format and infer format automatically
- Misc fixes for sampling
- Misc fixes to support PyTorch 1.2
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1017

Differential Revision: D16799880

Pulled By: myleott

fbshipit-source-id: 45ad8bc531724a53063cbc24ca1c93f715cdc5a7
2019-08-14 05:02:45 -07:00
Myle Ott
d015d23a1f Add fairseq-validate
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/765

Differential Revision: D16763357

Pulled By: myleott

fbshipit-source-id: 758b03158e486ee82786e2d5bf4e46073b50c503
2019-08-13 13:07:04 -07:00
Myle Ott
abb7ed4c91 Update READMEs for torch.hub
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/795

Differential Revision: D16620488

Pulled By: myleott

fbshipit-source-id: 1998a9ccd8816fc7f590861fb4898f910a36bc1e
2019-08-02 06:24:17 -07:00
Myle Ott
e75cff5f2c Relicense fairseq under MIT license (#786)
Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786

Differential Revision: D16560654

Pulled By: myleott

fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034
2019-07-30 07:48:23 -07:00
Myle Ott
b002d0096e v0.7.1 -> v0.7.2 (#891)
Summary:
No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/891

Differential Revision: D16377132

Pulled By: myleott

fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7
2019-07-19 06:33:40 -07:00
Louis MARTIN
cc292afaed Add specific compile flags for macOS (#862)
Summary:
Fairseq wouldn't install on macOS.
A workaround was found here: https://github.com/pytorch/fairseq/issues/289
This is now automatic in setup.py; maybe there's a cleaner way to do it.

I checked that it compiles fine on Linux and macOS.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/862

Differential Revision: D16142105

Pulled By: myleott

fbshipit-source-id: 998ac7781d7a1ac047f4f9239c1fe16eab4be0dd
2019-07-06 12:31:55 -07:00
Myle Ott
881381cfc7 v0.7.1: fix PyPI setup and tests
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/818

Differential Revision: D15916265

Pulled By: myleott

fbshipit-source-id: c66c0bd988d3472c4150226952f34ee8d4c3db86
2019-06-20 06:28:37 -07:00
Myle Ott
bd710e75ae v0.7.0 (#817)
Summary:
Notable (possibly breaking) changes:
- d45db80: Remove checkpoint utility functions from utils.py into checkpoint_utils.py
- f2563c2: Move LM definitions into separate files
- dffb167: Updates to model API:
  - `FairseqModel` -> `FairseqEncoderDecoderModel`
  - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
  - `encoder_out_dict` -> `encoder_out`
  - rm unused `remove_head` functions
- 34726d5: Move `distributed_init` into `DistributedFairseqModel`
- cf17068: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
- d45db80: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
- 96ac28d: Rename `--sampling-temperature` -> `--temperature`
- fc1a19a: Deprecate dummy batches
- a1c997b: Add memory mapped datasets
- 0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"

Plus many additional features and bugfixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/817

Differential Revision: D15913844

Pulled By: myleott

fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b
2019-06-19 19:08:50 -07:00
Bairen Yi
a8f28ecb63 Python3.5 compat (#794)
Summary:
See #467. Ping myleott to review.

This is a work-related contribution. Ping lark to review.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/794

Differential Revision: D15756816

Pulled By: myleott

fbshipit-source-id: 6dce3ff3a713bf5f60e5782bc260b2ca9d2c0a9b
2019-06-11 04:10:08 -07:00
Myle Ott
66f033e6a2 Update setup.py
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/580

Differential Revision: D14494390

Pulled By: myleott

fbshipit-source-id: 524cc16a106f2af630357e2ebdf7dde35fa7d494
2019-03-15 21:30:41 -07:00
Myle Ott
e6422528da 0.6.1 -> 0.6.2 (#577)
Summary:
Changelog:
- 998ba4f: Add language models from Baevski & Auli (2018)
- 4294c4f: Add mixture of experts code from Shen et al. (2019)
- 0049349: Add example for multilingual training
- 48d9afb: Speed improvements, including fused operators from apex
- 44d27e6: Add Tensorboard support
- d17fa85: Add Adadelta optimizer
- 9e1c880: Add `FairseqEncoderModel`
- b65c579: Add `FairseqTask.inference_step` to modularize generate.py
- 2ad1178: Add back `--curriculum`
- Misc bug fixes and other features

Pull Request resolved: https://github.com/pytorch/fairseq/pull/577

Differential Revision: D14481233

Pulled By: myleott

fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
2019-03-15 10:27:01 -07:00
Myle Ott
139e3a3c40 Add sacrebleu to requirements
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/542

Differential Revision: D14258895

Pulled By: myleott

fbshipit-source-id: 950a840e1d001a472be8d4737c9e4de5224137b3
2019-02-28 07:54:28 -08:00
Myle Ott
b65c579bed Modularize generate.py (#351)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/351

This makes it easier for tasks to plug into generate.py/interactive.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/520

Differential Revision: D14183881

Pulled By: myleott

fbshipit-source-id: ede5e53ddc1215ed3b12b8f1eba048c946913c33
2019-02-22 10:08:52 -08:00
Myle Ott
fbd4cef9a5 Add fairseq to PyPI (#495)
Summary:
- fairseq can now be installed via pip: `pip install fairseq`
- command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/495

Differential Revision: D14017761

Pulled By: myleott

fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
2019-02-08 22:03:29 -08:00