Commit Graph

4 Commits

Author SHA1 Message Date
Pierre Andrews
f591cc94ca upgrade black for lints (#3004)
Summary:
This is the same as https://github.com/fairinternal/fairseq-py/issues/3003 but for main instead of gshard.

The lint test will run the latest version of black, which is 22.1.0 right now and seems to be incompatible with the 21.12b0 version that is set up in pre-commit. This means that some files that had valid formatting in the past no longer do...

This PR reformats these files with 22.1.0 and auto-updates the pre-commit config to use that black version too.

(Note: this is the second time this has happened. A solution would be to pin the lint test to the same black version as the pre-commit hook, i.e. the version that was used to format everything cleanly, so that we have stable formatting.)
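
A hypothetical consistency check along those lines (not part of this PR; the config path and hook layout are assumptions) could compare the installed black against the rev pinned in `.pre-commit-config.yaml`:

```python
# Hypothetical CI helper (sketch only): fail if the black used for linting
# differs from the rev pinned for the black hook in .pre-commit-config.yaml.
from importlib.metadata import version

import yaml  # PyYAML


def pinned_black_rev(path=".pre-commit-config.yaml"):
    with open(path) as f:
        config = yaml.safe_load(f)
    for repo in config["repos"]:
        if repo["repo"].rstrip("/").endswith("/black"):
            return repo["rev"].lstrip("v")
    raise LookupError("no black hook found in pre-commit config")


assert version("black") == pinned_black_rev(), "lint black != pre-commit black"
```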

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3004

Reviewed By: dianaml0

Differential Revision: D33917490

Pulled By: Mortimerp9

fbshipit-source-id: d55e800b976f94545cdab4132daa7c45cbd0e34c
2022-02-02 04:31:33 -08:00
Sam Shleifer
2be2f3c7c1 Plasma tests: ask for less disk (#1893)
Summary:
Old logs:
```
/arrow/cpp/src/plasma/store.cc:1274: Allowing the Plasma store to use up to 107.374GB of memory.
```

New logs:
```
... up to 1e-05GB of memory.
```
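
For reference, a rough sketch of requesting a small store in a test, using the `pyarrow.plasma` helpers that older pyarrow releases ship (sizes and paths are illustrative, not the fairseq test code):

```python
import numpy as np
from pyarrow import plasma  # shipped by older pyarrow releases; removed in recent ones

# Ask for ~10 kB (1e-05 GB) instead of the ~107 GB shown in the old logs.
with plasma.start_plasma_store(int(1e4)) as (socket_path, _proc):
    client = plasma.connect(socket_path)
    oid = client.put(np.arange(10))               # store a tiny array
    assert (client.get(oid) == np.arange(10)).all()
    client.disconnect()
```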

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1893

Reviewed By: myleott

Differential Revision: D28641488

Pulled By: sshleifer

fbshipit-source-id: 3373526042cdcbf434c61790be62a09f15e6ad06
2021-05-24 09:00:18 -07:00
Sam Shleifer
da0432a3cd MultiGPU test and --log-file workaround (#1793)
Summary:
The initial problem I set out to solve was that it's not easy to add a multi-GPU test. I solved that problem, but it ruined log capturing, both with `self.assertLogs` and with `contextlib.redirect_stdout(StringIO())`.

After some brief digging, I gave up on trying to get those to work and instead added support for `--log-file AGI_v0.log`, which writes the `progress_bar.log()` statements to the log file as well as to `stdout`. This functionality is used by the resumption test.
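
The general pattern looks roughly like this (a minimal sketch with the standard `logging` module, not fairseq's `progress_bar` internals; the file name is arbitrary):

```python
import logging
import sys


def setup_logging(log_file: str) -> None:
    # Mirror every record to stdout and to a file a test can read back later.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s | %(levelname)s | %(message)s",
        handlers=[logging.StreamHandler(sys.stdout), logging.FileHandler(log_file)],
    )


setup_logging("AGI_v0.log")
logging.info("epoch 001 | loss 4.321")  # appears on stdout and in AGI_v0.log
```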

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1793

Reviewed By: myleott

Differential Revision: D27671192

Pulled By: sshleifer

fbshipit-source-id: bcba5f9df7a965889a4cd6993f7eeb0f14b770c6
2021-04-21 06:39:00 -07:00
Sam Shleifer
2235f86b40 PlasmaView: don't materialize array in memory (#1645)
Summary:
### Changes:
- `PlasmaArray` saves the underlying data to `self.array`; `PlasmaView` never does that, and instead fetches the data from `plasma_store` shared memory when it is needed.
- `PlasmaArray` starts a new, ephemeral plasma_store and puts a new array into it whenever it is pickled. With `--use-plasma-view`, a single server is started before `spawn`, and to accommodate this, arrays are put into it only once, in `PlasmaArray.__init__`.
- The user can now pass `--plasma-path` to explicitly control where the server is started.
- We now build plasma keys from `(split_path, (block_size, document_sep_len, str(break_mode), len(dataset)))`, so two jobs sharing a plasma server but using different datasets, or the same dataset with different command-line args, will not read each other's arrays (see the sketch below).
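
A simplified sketch of that put-once / fetch-on-demand idea (not the actual `PlasmaView` implementation; the socket path and dataset parameters are made up, and it assumes a plasma server is already running):

```python
import hashlib

import numpy as np
from pyarrow import plasma  # available in older pyarrow releases


def object_id_for(split_path, block_size, document_sep_len, break_mode, length):
    # Deterministic 20-byte Plasma ObjectID derived from the key described above.
    key = repr((split_path, (block_size, document_sep_len, str(break_mode), length)))
    return plasma.ObjectID(hashlib.sha1(key.encode("utf-8")).digest())


client = plasma.connect("/tmp/plasma")  # location controlled by --plasma-path
oid = object_id_for("train.bin", 512, 1, "complete", 1000)
if not client.contains(oid):
    client.put(np.arange(1000), object_id=oid)  # put once, e.g. before workers spawn
array = client.get(oid)  # fetched from shared memory only when it is needed
```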

### Results [pre March 1]
This saves some CPU memory (5-15%), according to both `psutil` and `psrecord`.
Here we run `base_cmd` (below) with `num_workers=0,2,8` on 2 GPUs and collect the logs; `branch` refers to `--use-plasma-view`, while `master` uses `PlasmaArray`.

```
+-------------------------+----------------+---------+-------+
| setting                 |   cpu_mem_used |     wps |   ppl |
+=========================+================+=========+=======+
| branch_nw0_gpu2_ddm.log |          12    | 55143.2 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw2_gpu2_ddm.log |          13.67 | 43377.6 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw8_gpu2_ddm.log |          18.36 | 53019.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw0_gpu2_ddm.log |          12.26 | 56733   | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw2_gpu2_ddm.log |          14.58 | 53337.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw8_gpu2_ddm.log |          21.1  | 53217.2 | 429.1 |
+-------------------------+----------------+---------+-------+
```

### Replication

1) get this branch
```bash
git fetch && git checkout share-plasma-server
```

2) Train a tiny model and save the logs

```bash

base_cmd () {
  fairseq-train --fp16 /private/home/sshleifer/data-bin/stories_mmap \
            --task language_modeling \
            --arch transformer_lm_gpt2_tiny \
            --sample-break-mode complete --tokens-per-sample 512 \
            --optimizer adam --clip-norm 0.0 --lr 0.0005 \
            --batch-size 1 \
            --max-update 200 --max-epoch 1 \
            --log-format simple --log-interval 100 \
            --restore-file x.pt --no-save \
            --skip-invalid-size-inputs-valid-test --disable-validation $@
}

USE_LOCK=1 CUDA_VISIBLE_DEVICES=0,1 base_cmd --num-workers 0 --use-plasma-view | tee branch_nw0_gpu2_ddm.log
```

### TODO:

- [x] test larger dataset
- [x] make it optional, cleanup
- [x] 1 GPU
- [x] unit-tests
- [x] ask hashing question on Stack Overflow: https://stackoverflow.com/questions/66354598/deterministic-method-to-hash-np-array-int (see the sketch after this list)
- [ ] measure whether `PlasmaArray`'s logic of disabling plasma for small arrays helps
- [x] test with fb_sweep
- [x] measure 4 GPU savings
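
One common answer to the linked hashing question, for reference, is to hash the raw buffer together with dtype and shape (a sketch only, not necessarily the approach this PR settled on):

```python
import hashlib

import numpy as np


def array_fingerprint(arr: np.ndarray) -> str:
    # Equal arrays give equal digests; dtype/shape are included so that, e.g.,
    # int32 and int64 arrays with the same values do not collide.
    arr = np.ascontiguousarray(arr)
    meta = f"{arr.dtype}|{arr.shape}".encode()
    return hashlib.sha1(meta + arr.tobytes()).hexdigest()


print(array_fingerprint(np.arange(5)))
```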

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1645

Test Plan: Read github PR description: https://github.com/fairinternal/fairseq-py/pull/1645

Reviewed By: myleott

Differential Revision: D26630365

Pulled By: sshleifer

fbshipit-source-id: b0c4163fbc97a7aefb116de70265fba11f6d7b42
2021-03-12 12:31:12 -08:00