Commit Graph

4 Commits

Author SHA1 Message Date
Pierre Andrews
f591cc94ca upgrade black for lints (#3004)
Summary:
This is the same as https://github.com/fairinternal/fairseq-py/issues/3003 but for main instead of gshard.

The lint test will run the latest version of black, which is 22.1.0 right now and seems to be incompatible with the 21.12b0 version that is set up in pre-commit. This means that some files that had valid formatting in the past no longer do...

This PR reformats these files with 22.1.0 and auto-updates the pre-commit config to use that black version too.

(Note: this is the second time this has happened. A solution would be to pin the lint test to the same black version as the pre-commit hook, i.e. the version that was used to format everything cleanly, so that we have stable formatting.)
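
A hypothetical consistency check along those lines (not part of this PR; the config path and hook layout are assumptions) could compare the installed black against the rev pinned in `.pre-commit-config.yaml`:

```python
# Hypothetical CI helper (sketch only): fail if the black used for linting
# differs from the rev pinned for the black hook in .pre-commit-config.yaml.
from importlib.metadata import version

import yaml  # PyYAML


def pinned_black_rev(path=".pre-commit-config.yaml"):
    with open(path) as f:
        config = yaml.safe_load(f)
    for repo in config["repos"]:
        if repo["repo"].rstrip("/").endswith("/black"):
            return repo["rev"].lstrip("v")
    raise LookupError("no black hook found in pre-commit config")


assert version("black") == pinned_black_rev(), "lint black != pre-commit black"
```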

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/3004

Reviewed By: dianaml0

Differential Revision: D33917490

Pulled By: Mortimerp9

fbshipit-source-id: d55e800b976f94545cdab4132daa7c45cbd0e34c
2022-02-02 04:31:33 -08:00
Sam Shleifer
2be2f3c7c1 Plasma tests: ask for less disk (#1893)
Summary:
Old logs:
```
/arrow/cpp/src/plasma/store.cc:1274: Allowing the Plasma store to use up to 107.374GB of memory.
```

New logs:
```
... up to 1e-05GB of memory.
```
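
For reference, a rough sketch of requesting a small store in a test, using the `pyarrow.plasma` helpers that older pyarrow releases ship (sizes and paths are illustrative, not the fairseq test code):

```python
import numpy as np
from pyarrow import plasma  # shipped by older pyarrow releases; removed in recent ones

# Ask for ~10 kB (1e-05 GB) instead of the ~107 GB shown in the old logs.
with plasma.start_plasma_store(int(1e4)) as (socket_path, _proc):
    client = plasma.connect(socket_path)
    oid = client.put(np.arange(10))               # store a tiny array
    assert (client.get(oid) == np.arange(10)).all()
    client.disconnect()
```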

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1893

Reviewed By: myleott

Differential Revision: D28641488

Pulled By: sshleifer

fbshipit-source-id: 3373526042cdcbf434c61790be62a09f15e6ad06
2021-05-24 09:00:18 -07:00
Sam Shleifer
da0432a3cd MultiGPU test and --log-file workaround (#1793)
Summary:
The initial problem I set out to solve was that it's not easy to add a multi-GPU test. I solved that problem, but it ruined log capturing, both with `self.assertLogs` and with `contextlib.redirect_stdout(StringIO())`.

After some brief digging, I gave up on trying to get those to work and instead added support for `--log-file AGI_v0.log`, which writes the `progress_bar.log()` statements to the log file as well as to `stdout`. This functionality is used by the resumption test.
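
The general pattern looks roughly like this (a minimal sketch with the standard `logging` module, not fairseq's `progress_bar` internals; the file name is arbitrary):

```python
import logging
import sys


def setup_logging(log_file: str) -> None:
    # Mirror every record to stdout and to a file a test can read back later.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s | %(levelname)s | %(message)s",
        handlers=[logging.StreamHandler(sys.stdout), logging.FileHandler(log_file)],
    )


setup_logging("AGI_v0.log")
logging.info("epoch 001 | loss 4.321")  # appears on stdout and in AGI_v0.log
```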

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1793

Reviewed By: myleott

Differential Revision: D27671192

Pulled By: sshleifer

fbshipit-source-id: bcba5f9df7a965889a4cd6993f7eeb0f14b770c6
2021-04-21 06:39:00 -07:00
Sam Shleifer
2235f86b40 PlasmaView: don't materialize array in memory (#1645)
Summary:
### Changes:
- `PlasmaArray` saves the underlying data to `self.array`; `PlasmaView` never does that, and instead fetches the data from `plasma_store` shared memory when it is needed.
- `PlasmaArray` starts a new, ephemeral plasma_store and puts a new array into it whenever it is pickled. With `--use-plasma-view`, a single server is started before `spawn`, and to accommodate this, arrays are put into it only once, in `PlasmaArray.__init__`.
- The user can now pass `--plasma-path` to explicitly control where the server is started.
- We now build plasma keys from `(split_path, (block_size, document_sep_len, str(break_mode), len(dataset)))`, so two jobs sharing a plasma server but using different datasets, or the same dataset with different command-line args, will not read each other's arrays (see the sketch below).
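
A simplified sketch of that put-once / fetch-on-demand idea (not the actual `PlasmaView` implementation; the socket path and dataset parameters are made up, and it assumes a plasma server is already running):

```python
import hashlib

import numpy as np
from pyarrow import plasma  # available in older pyarrow releases


def object_id_for(split_path, block_size, document_sep_len, break_mode, length):
    # Deterministic 20-byte Plasma ObjectID derived from the key described above.
    key = repr((split_path, (block_size, document_sep_len, str(break_mode), length)))
    return plasma.ObjectID(hashlib.sha1(key.encode("utf-8")).digest())


client = plasma.connect("/tmp/plasma")  # location controlled by --plasma-path
oid = object_id_for("train.bin", 512, 1, "complete", 1000)
if not client.contains(oid):
    client.put(np.arange(1000), object_id=oid)  # put once, e.g. before workers spawn
array = client.get(oid)  # fetched from shared memory only when it is needed
```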

### Results [pre March 1]
This saves some CPU memory (5-15%), according to both `psutil` and `psrecord`.
Here we run `base_cmd` (below) with `num_workers=0,2,8` on 2 GPUs and collect the logs; `branch` refers to `--use-plasma-view`, while `master` uses `PlasmaArray`.

```
+-------------------------+----------------+---------+-------+
| setting                 |   cpu_mem_used |     wps |   ppl |
+=========================+================+=========+=======+
| branch_nw0_gpu2_ddm.log |          12    | 55143.2 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw2_gpu2_ddm.log |          13.67 | 43377.6 | 429.1 |
+-------------------------+----------------+---------+-------+
| branch_nw8_gpu2_ddm.log |          18.36 | 53019.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw0_gpu2_ddm.log |          12.26 | 56733   | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw2_gpu2_ddm.log |          14.58 | 53337.9 | 429.1 |
+-------------------------+----------------+---------+-------+
| master_nw8_gpu2_ddm.log |          21.1  | 53217.2 | 429.1 |
+-------------------------+----------------+---------+-------+
```

### Replication

1) get this branch
```bash
git fetch && git checkout share-plasma-server
```

2) Train a tiny model and save the logs

```bash

base_cmd () {
  fairseq-train --fp16 /private/home/sshleifer/data-bin/stories_mmap \
            --task language_modeling \
            --arch transformer_lm_gpt2_tiny \
            --sample-break-mode complete --tokens-per-sample 512 \
            --optimizer adam --clip-norm 0.0 --lr 0.0005 \
            --batch-size 1 \
            --max-update 200 --max-epoch 1 \
            --log-format simple --log-interval 100 \
            --restore-file x.pt --no-save \
            --skip-invalid-size-inputs-valid-test --disable-validation $@
}

USE_LOCK=1 CUDA_VISIBLE_DEVICES=0,1 base_cmd --num-workers 0 --use-plasma-view | tee branch_nw0_gpu2_ddm.log
```

### TODO:

- [x] test larger dataset
- [x] make it optional, cleanup
- [x] 1 GPU
- [x] unit-tests
- [x] ask hashing question on Stack Overflow: https://stackoverflow.com/questions/66354598/deterministic-method-to-hash-np-array-int (see the sketch after this list)
- [ ] measure whether `PlasmaArray`'s logic of disabling plasma for small arrays helps
- [x] test with fb_sweep
- [x] measure 4 GPU savings
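
One common answer to the linked hashing question, for reference, is to hash the raw buffer together with dtype and shape (a sketch only, not necessarily the approach this PR settled on):

```python
import hashlib

import numpy as np


def array_fingerprint(arr: np.ndarray) -> str:
    # Equal arrays give equal digests; dtype/shape are included so that, e.g.,
    # int32 and int64 arrays with the same values do not collide.
    arr = np.ascontiguousarray(arr)
    meta = f"{arr.dtype}|{arr.shape}".encode()
    return hashlib.sha1(meta + arr.tobytes()).hexdigest()


print(array_fingerprint(np.arange(5)))
```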

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1645

Test Plan: Read github PR description: https://github.com/fairinternal/fairseq-py/pull/1645

Reviewed By: myleott

Differential Revision: D26630365

Pulled By: sshleifer

fbshipit-source-id: b0c4163fbc97a7aefb116de70265fba11f6d7b42
2021-03-12 12:31:12 -08:00