Multi server multi gpu #1367
… evaluate, and predict
Adds reproducible Slurm helpers for multinode and large-tile prediction workflows.
Docs for multi-GPU and multi-node workflows.
c609ac5 to 5f40515
5f40515 to 0905ce4
Codecov Report
❌ Patch coverage is …
@@ Coverage Diff @@
## main #1367 +/- ##
==========================================
- Coverage 86.87% 85.53% -1.35%
==========================================
Files 24 25 +1
Lines 3064 3326 +262
==========================================
+ Hits 2662 2845 +183
- Misses 402 481 +79
33ef29e to b9c119c
Can you confirm which SLURM script you used to check this, so I can match it?
bw4sz left a comment
I want to approve this. @jveitchmichaelis, any objections? I have spoken to Comet, and they agree that the missing multi-node GPU utilization graph is probably an issue on their end and not anything wrong here.
Remove references to torchrun; we can use srun alone.
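For context, here is a minimal sketch of why srun can be enough by itself. It assumes the script is started once per task by srun, so each process can read its rank layout from Slurm's own per-task environment variables (`SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`, `SLURM_NODEID`) instead of the values torchrun would export. The helper name `slurm_rank_info` is hypothetical and is not part of this PR.

```python
import os


def slurm_rank_info(env=None):
    """Read the rank layout from Slurm's per-task environment variables.

    Hypothetical helper for illustration only. Falls back to a
    single-process layout when the variables are absent (for example,
    when the script is run outside of srun).
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("SLURM_PROCID", 0)),        # global task index
        "world_size": int(env.get("SLURM_NTASKS", 1)),  # total number of tasks
        "local_rank": int(env.get("SLURM_LOCALID", 0)), # task index on this node
        "node_rank": int(env.get("SLURM_NODEID", 0)),   # index of this node
    }


if __name__ == "__main__":
    print(slurm_rank_info())
```

Whether the training code needs a helper like this at all depends on the launcher integration in use; it may already read these variables itself.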
To do: @henrykironde to compare this with #1304 and decide whether both are needed. We want to get this done, though, because the PR is broad and could make rebasing harder.