fairseq[-hydra]-train torchrun compatibility: default device_id set to LOCAL_RANK if exists (#4351)

Summary:
# Before submitting

- [x] Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/4302 (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

## Did you have fun?
I had fun when I figured out why torchrun was failing :)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4351

Reviewed By: shruti-bh

Differential Revision: D35784181

Pulled By: dianaml0

fbshipit-source-id: 560c7af12b2f9278cba6c85711b98b9e043d0ec9
Commit ab98e94046 (parent 72d3408481), authored by Colin Clement on 2022-04-28 10:55:42 -07:00, committed by Facebook GitHub Bot.


```diff
@@ -3,6 +3,7 @@
 # This source code is licensed under the MIT license found in the
 # LICENSE file in the root directory of this source tree.
+import os
 import sys
 from dataclasses import _MISSING_TYPE, dataclass, field
 from typing import Any, List, Optional
@@ -285,9 +286,9 @@ class DistributedTrainingConfig(FairseqDataclass):
         },
     )
     device_id: int = field(
-        default=0,
+        default=os.getenv("LOCAL_RANK", 0),
         metadata={
-            "help": "which GPU to use (usually configured automatically)",
+            "help": "which GPU to use (by default looks for $LOCAL_RANK, usually configured automatically)",
             "argparse_alias": "--local_rank",
         },
     )
```
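The behavior of the new default can be sketched as follows. This is a minimal illustration, not the fairseq implementation: `default_device_id` is a hypothetical helper. When launched with `torchrun`, each worker process receives a `LOCAL_RANK` environment variable; otherwise the device falls back to GPU 0. Note that `os.getenv` returns a string when the variable is set, so an explicit `int()` conversion is used here.

```python
import os

def default_device_id() -> int:
    """Return the GPU index for this process.

    Uses $LOCAL_RANK (set per-worker by torchrun) when present,
    and falls back to GPU 0 otherwise.
    """
    return int(os.getenv("LOCAL_RANK", 0))
```

For example, a worker started by `torchrun --nproc_per_node=4` with rank 3 sees `LOCAL_RANK=3` and would select GPU 3, while a plain single-process launch selects GPU 0.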