fairseq[-hydra]-train torchrun compatibility: default device_id set to LOCAL_RANK if exists (#4351)

Summary:
# Before submitting

- [x] Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes https://github.com/pytorch/fairseq/issues/4302 (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

## Did you have fun?
I had fun when I figured out why torchrun was failing :)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/4351

Reviewed By: shruti-bh

Differential Revision: D35784181

Pulled By: dianaml0

fbshipit-source-id: 560c7af12b2f9278cba6c85711b98b9e043d0ec9
Commit ab98e94046 (parent 72d3408481), authored by Colin Clement on 2022-04-28 10:55:42 -07:00, committed by Facebook GitHub Bot.


```diff
@@ -3,6 +3,7 @@
 # This source code is licensed under the MIT license found in the
 # LICENSE file in the root directory of this source tree.
+import os
 import sys
 from dataclasses import _MISSING_TYPE, dataclass, field
 from typing import Any, List, Optional
@@ -285,9 +286,9 @@ class DistributedTrainingConfig(FairseqDataclass):
         },
     )
     device_id: int = field(
-        default=0,
+        default=os.getenv("LOCAL_RANK", 0),
         metadata={
-            "help": "which GPU to use (usually configured automatically)",
+            "help": "which GPU to use (by default looks for $LOCAL_RANK, usually configured automatically)",
             "argparse_alias": "--local_rank",
         },
     )
```
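The behavior of the new default can be sketched as follows. This is a minimal illustration, not the fairseq implementation: `default_device_id` is a hypothetical helper. When launched with `torchrun`, each worker process receives a `LOCAL_RANK` environment variable; otherwise the device falls back to GPU 0. Note that `os.getenv` returns a string when the variable is set, so an explicit `int()` conversion is used here.

```python
import os

def default_device_id() -> int:
    """Return the GPU index for this process.

    Uses $LOCAL_RANK (set per-worker by torchrun) when present,
    and falls back to GPU 0 otherwise.
    """
    return int(os.getenv("LOCAL_RANK", 0))
```

For example, a worker started by `torchrun --nproc_per_node=4` with rank 3 sees `LOCAL_RANK=3` and would select GPU 3, while a plain single-process launch selects GPU 0.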