Description
Describe the bug
A model's status may show as FAILED (no base URL available) even though the Slurm job is still up and running, taking up resources. In such cases, it appears vLLM actually launched successfully. Here is the output of the .err file when this happens:
jq: error: writing output failed: Stale file handle
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:01<00:05, 1.72s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:03<00:03, 1.87s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:05<00:01, 1.94s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:07<00:00, 1.87s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:07<00:00, 1.87s/it]
(EngineCore_DP0 pid=72)
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 51/51 [00:06<00:00, 7.40it/s]
Capturing CUDA graphs (decode, FULL): 100%|██████████| 35/35 [00:04<00:00, 7.97it/s]
(APIServer pid=14) INFO: Started server process [14]
(APIServer pid=14) INFO: Waiting for application startup.
(APIServer pid=14) INFO: Application startup complete.
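The "Application startup complete." line above suggests the vLLM API server did come up. As a rough workaround until the status logic is fixed, a small helper can scan the .err log for that startup marker; this is a heuristic sketch of my own, not part of the vec-inf API:

```python
def vllm_started(err_log: str) -> bool:
    """Heuristic: return True if the vLLM API server logged a completed
    startup, based on the uvicorn marker seen in the .err output above."""
    return any(
        "Application startup complete." in line
        for line in err_log.splitlines()
    )

# Excerpt from the .err file shown above
sample = """(APIServer pid=14) INFO: Started server process [14]
(APIServer pid=14) INFO: Waiting for application startup.
(APIServer pid=14) INFO: Application startup complete."""
print(vllm_started(sample))  # → True
```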
To Reproduce
I used the standard Python API call (client.launch_model). The behaviour does not show up consistently.
The following is the most recent occurrence on Killarney:
slurmstepd: error: *** JOB 2241787 ON kn120 CANCELLED AT 2026-02-20T22:41:49 ***
Expected behavior
According to the output of vec-inf logs, the model is loaded and running, so the status should be READY.
Conversely, if the launch genuinely FAILED, the Slurm job should be cancelled so it no longer takes up computing resources.
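Until this is fixed upstream, one way to catch the inconsistent state is to cross-check the reported model status against the Slurm job state (e.g. obtained via squeue). The helper below is an illustrative sketch; the function and argument names are mine, not vec-inf's:

```python
def is_orphaned(model_status: str, slurm_state: str) -> bool:
    """Flag the bug described above: the client reports FAILED while the
    Slurm job is still RUNNING and consuming resources."""
    return (
        model_status.upper() == "FAILED"
        and slurm_state.upper() == "RUNNING"
    )

# slurm_state would come from e.g. `squeue -h -j <job_id> -o %T`
print(is_orphaned("FAILED", "RUNNING"))  # → True
print(is_orphaned("READY", "RUNNING"))   # → False
```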
Version
v0.8.1