fix: drop incorrect critic GPU add to rollout_num_gpus in colocate mode by aoshen02 · Pull Request #1950 · THUDM/slime

aoshen02 · 2026-05-27T01:30:07Z

Summary

Closes the root cause behind #1896 (which PR #1934 only worked around with a defensive boundary check).

In colocate mode slime/ray/placement_group.py:_create_placement_group allocates only actor_num_gpus_per_node * actor_num_nodes bundles in the placement group (the elif args.colocate: branch). The critic shares the actor's PG (result[\"critic\"] = result[\"actor\"] if args.use_critic else None) and so contributes no additional slots — actor / critic / rollout time-share the same physical GPUs via offload/onload.

But slime/utils/arguments.py still adds critic_num_gpus_per_node * critic_num_nodes onto args.rollout_num_gpus inside the colocate branch:

```python
if args.rollout_num_gpus != args.actor_num_gpus_per_node * args.actor_num_nodes:
logger.info(...)
args.rollout_num_gpus = args.actor_num_gpus_per_node * args.actor_num_nodes
if args.use_critic:
args.rollout_num_gpus += args.critic_num_gpus_per_node * args.critic_num_nodes # <-- buggy
```

This over-allocates the regular ServerGroupConfig.num_gpus past the PG size. start_engines (slime/ray/rollout.py) then iterates `len(all_engines) = rollout_num_gpus // gpus_per_engine` and indexes `reordered_gpu_ids[gpu_index]` beyond its length, raising the IndexError tracked in #1896:

```
IndexError: list index out of range
at base_gpu_id = int(reordered_gpu_ids[gpu_index])
in ServerGroup.start_engines (slime/ray/rollout.py)
```

Reproducer

`tests/test_qwen2.5_0.5B_ppo_critic_only_short.py` — `--advantage-estimator ppo` (so `use_critic=True`) + `--colocate` + `--rollout-num-gpus 2 --actor-num-gpus-per-node 4`. With the addition still in place, `rollout_num_gpus` is overridden to `4 + 4 = 8` but the colocate PG only has 4 bundles. Without it the rollout sizes correctly to the 4 colocated GPUs.

Verified locally on a 4-GPU H200 setup: the test crashed with the IndexError before this patch, and completes the full 2-step CI loop (`perf 1:` / `perf 2:` / `Job succeeded`) after.

Relationship to #1896 / #1934

[Bug] test_qwen2.5_0.5B_ppo_critic_only_short.py fails with IndexError at start_engines after #1866 (multi-role megatron config + stale sglang defaults) #1896 reported the exact same IndexError on the same test after Rename critic config to megatron config #1866 (`Rename critic config to megatron config`). The triage in [Bug] test_qwen2.5_0.5B_ppo_critic_only_short.py fails with IndexError at start_engines after #1866 (multi-role megatron config + stale sglang defaults) #1896 attributed it to stale sglang defaults in the test and suggested either fixing the test config or adding a boundary check.
Add GPU placement validation before starting rollout engines #1934 took the boundary-check route and merged a clearer ValueError in `ServerGroup.start_engines`. That converted a silent IndexError into a friendlier failure but did not change the underlying resource math — `rollout_num_gpus` is still over-allocated whenever `use_critic` and `colocate` are both set.

This PR fixes the resource math itself.

How the bug got in

`371c030e` ([Fix] ppo rollout engins for distribute, #394) added the `if args.use_critic: args.rollout_num_gpus += ...` lines but only touched `arguments.py`. It did not extend `placement_group.py` to give the critic its own bundles in colocate mode. A downstream fork (Miles, `radixark/miles`) does carry the matching `placement_group.py` change — Miles's colocate branch in `create_placement_groups` adds `if args.use_critic: num_gpus += critic_num_nodes * critic_num_gpus_per_node` and slices a separate `critic_pg_reordered_*`. In that world the `arguments.py` addition is consistent (PG grew, so rollout grows with it). In slime, since #394 only landed half of the change, the two files have been inconsistent and this code path was effectively broken whenever both `use_critic` and `colocate` were set.

Test plan

Verified `tests/test_qwen2.5_0.5B_ppo_critic_only_short.py` reproduces the original IndexError on `main` (HEAD `41dc3b6d`)
Verified the same test passes 2 RL steps end-to-end with this patch applied
No regression in the colocate non-critic path: `tests/test_qwen2.5_0.5B_short.py` still passes
Run the full PPO/colocate CI matrix

In colocate mode `slime/ray/placement_group.py:_create_placement_group` allocates only `actor_num_gpus_per_node * actor_num_nodes` bundles in the placement group (colocate branch). The critic shares the actor's PG (`result["critic"] = result["actor"]`) and so contributes no additional slots — actor, critic, and rollout time-share the same physical GPUs via offload/onload. The two lines being removed add `critic_num_gpus_per_node * critic_num_nodes` onto `args.rollout_num_gpus`, which over-allocates the regular `ServerGroupConfig.num_gpus` past the PG size. `start_engines` (`slime/ray/rollout.py`) then iterates `len(all_engines) = rollout_num_gpus // gpus_per_engine` and indexes `reordered_gpu_ids[gpu_index]` beyond its length, raising: IndexError: list index out of range at base_gpu_id = int(reordered_gpu_ids[gpu_index]) in ServerGroup.start_engines (slime/ray/rollout.py) Reproducer: `tests/test_qwen2.5_0.5B_ppo_critic_only_short.py` (`--advantage-estimator ppo` + `--colocate` + `--rollout-num-gpus 2` against actor 4 GPU). With the addition still in place, `rollout_num_gpus` is overridden to `4 + 4 = 8` but the PG only has 4 bundles; without it the rollout sizes correctly to the 4 colocated GPUs. This is the root cause of THUDM#1896 — PR THUDM#1934 added a defensive boundary check that surfaces the mismatch as a clearer `ValueError`, but the underlying resource math is still wrong. This patch removes the source of the mismatch instead of just catching its symptom. The blamed `if args.use_critic: args.rollout_num_gpus += ...` lines were added in 371c030 ([Fix] ppo rollout engins for distribute, THUDM#394), which only touched `arguments.py` and did not extend `placement_group.py` to match. The Miles RL framework (a downstream fork) does carry the matching `placement_group.py` change (critic gets its own bundles in colocate mode), where the addition would make sense; slime does not, so this code path was inconsistent in slime since THUDM#394. Signed-off-by: aoshen02 <aoshen@inferact.ai>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: drop incorrect critic GPU add to rollout_num_gpus in colocate mode#1950

fix: drop incorrect critic GPU add to rollout_num_gpus in colocate mode#1950
aoshen02 wants to merge 1 commit into
THUDM:mainfrom
aoshen02:aoshen/fix-colocate-critic-rollout-num-gpus

aoshen02 commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aoshen02 commented May 27, 2026

Summary

Reproducer

Relationship to #1896 / #1934

How the bug got in

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant