    self._job_env = JobEnvironment()
except RuntimeError as e:
    if SlurmJobEnvironment._env["job_id"] in os.environ:
        # identified a slurm env without submitit, so let's use it
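For context, the fallback in this hunk can be sketched as a self-contained function. This is a minimal sketch, not the PR's actual code: `resolve_job_info` and the `make_submitit_env` callable are hypothetical stand-ins for `JobEnvironment()`, and it assumes that `SlurmJobEnvironment._env["job_id"]` refers to the `SLURM_JOB_ID` variable.

```python
import os
from typing import Callable, Dict, Mapping, Optional


def resolve_job_info(
    make_submitit_env: Callable[[], Dict[str, str]],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Prefer submitit's environment; fall back to raw SLURM variables.

    Hypothetical helper: make_submitit_env stands in for JobEnvironment(),
    which raises RuntimeError when the job was not launched via submitit.
    """
    environ = os.environ if environ is None else environ
    try:
        return make_submitit_env()
    except RuntimeError:
        # Assumption: SLURM_JOB_ID is the variable behind
        # SlurmJobEnvironment._env["job_id"].
        if "SLURM_JOB_ID" in environ:
            # A SLURM job that was not launched by submitit: read it directly.
            return {
                "job_id": environ["SLURM_JOB_ID"],
                "global_rank": environ.get("SLURM_PROCID", "0"),
            }
        raise  # neither submitit nor SLURM: re-raise the original error
```

The design choice is to keep submitit's own detection as the first choice and only consult raw SLURM variables when it fails, so behavior for submitit-launched jobs is unchanged.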
As discussed, this is a really weird use case and a surprising thing to try to fix: it basically makes it possible for users who are not using submitit to launch jobs to use a helper function from submitit...
How is that a problem? I'm actually happy that we can avoid some inter-dependencies; it gives more freedom.
fmassa left a comment
From my understanding, this change enables someone not using submitit to still be able to retrieve those environment variables that are normally set by torchrun.
This seems a bit weird to me, as this is a helper function from within submitit, so I would expect it to only be relevant when using it in conjunction with submitit.
Maybe what we need to do instead is to see if we can set up those env vars in user code (maybe by using torchrun?).
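To make the user-code option concrete, one way to set up torchrun-style variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`) directly from SLURM is sketched below. This is a minimal sketch under stated assumptions, not submitit's helper: `export_torchrun_env` and `first_slurm_host` are hypothetical names, it assumes one process per task launched with `srun` (so `SLURM_PROCID`, `SLURM_LOCALID` and `SLURM_NTASKS` are set), uses a simplified nodelist parser, and picks port 29500 arbitrarily.

```python
import os
import re
from typing import Dict, MutableMapping, Optional


def first_slurm_host(nodelist: str) -> str:
    """Return the first hostname of a compact SLURM nodelist.

    Simplified parser (an assumption, not SLURM's own expansion): handles
    forms like "node7", "a01,a02" and "gpu[01-04,07]", not every syntax.
    """
    m = re.match(r"([^\[,]+)(?:\[([^\]]+)\])?", nodelist)
    if m is None:
        raise ValueError(f"cannot parse nodelist: {nodelist!r}")
    prefix, ranges = m.group(1), m.group(2)
    if ranges is None:
        return prefix
    # First entry inside the brackets, e.g. "01" from "01-04,07".
    return prefix + ranges.split(",")[0].split("-")[0]


def export_torchrun_env(
    environ: Optional[MutableMapping[str, str]] = None, master_port: int = 29500
) -> Dict[str, str]:
    """Fill torchrun-style variables from SLURM ones (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if "SLURM_JOB_ID" not in environ:
        raise RuntimeError("not running inside a SLURM job")
    values = {
        "RANK": environ["SLURM_PROCID"],
        "LOCAL_RANK": environ["SLURM_LOCALID"],
        "WORLD_SIZE": environ["SLURM_NTASKS"],
        "MASTER_ADDR": first_slurm_host(environ["SLURM_JOB_NODELIST"]),
        "MASTER_PORT": str(master_port),  # arbitrary default, an assumption
    }
    for key, value in values.items():
        environ.setdefault(key, value)  # keep anything already set
    # Report the values that are now effective in the environment.
    return {key: environ[key] for key in values}
```

Using `setdefault` means a job that was actually started by torchrun keeps torchrun's values, so the sketch is a no-op there.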
Can torchrun be used from Python rather than from the command line?
I'm fine with it being in user code; then again, with only a couple of changed lines we can easily accommodate more use cases without duplicating code, which also has its upsides :)