
Add Miles Qwen3-8B GRPO training example on H100 #44

Open

xyuzh wants to merge 7 commits into main from miles-qwen3-8b-h100

Conversation

xyuzh (Contributor) commented Feb 23, 2026

Summary

  • Adds a single-node RL training example for Qwen3-8B with GRPO on 8x H100-80GB using Anyscale
  • Uses Miles with Megatron backend (TP=2, DP=2) for training and 3 disaggregated SGLang engines for rollout
  • Includes Dockerfile, Anyscale job config, and entrypoint script that handles model download, HF→Megatron weight conversion, and async GRPO training

Files

| File | Description |
| --- | --- |
| miles_qwen3_8b_h100/Dockerfile.anyscale | Docker image with Miles, Megatron-LM, SGLang, flash-attn, TE |
| miles_qwen3_8b_h100/job.yaml | Anyscale job config (m5.2xlarge head + 1x p5.48xlarge worker) |
| miles_qwen3_8b_h100/entrypoint.sh | Downloads model/data, converts weights, runs async GRPO training |
| miles_qwen3_8b_h100/README.md | Setup instructions, cluster layout, and tuning guide |

Cluster Layout

Head node (m5.2xlarge):  driver only, no GPUs
Worker 0 (8x H100-80GB):
  GPU 0-3: Training (TP=2, DP=2)
  GPU 4-7: Rollout (3 SGLang engines + 1 driver)
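
The GPU split above can be sketched as a quick sanity check. This is not code from the PR; the constants simply restate the layout (TP=2 x DP=2 training GPUs, 3 SGLang engines plus one rollout driver on the remaining GPUs):

```python
# Sanity-check sketch of the 8-GPU split described above (illustration only,
# not code from this PR): training takes TP * DP GPUs, rollout the remainder.
TP, DP = 2, 2
NUM_GPUS = 8

train_gpus = list(range(TP * DP))              # GPUs 0-3: Megatron training
rollout_gpus = list(range(TP * DP, NUM_GPUS))  # GPUs 4-7: rollout

num_sglang_engines = 3                         # plus 1 rollout driver process
assert len(rollout_gpus) == num_sglang_engines + 1
```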

Test plan

  • Build the Docker image via Anyscale
  • Submit the job with `anyscale job submit -f job.yaml`
  • Verify weight conversion completes successfully
  • Confirm training loss decreases and reward increases over rollouts

xyuzh and others added 7 commits February 22, 2026 20:25
Single-node RL training of Qwen3-8B with GRPO on 8x H100-80GB using
Anyscale. Includes Dockerfile, job config, and entrypoint script that
handles model download, weight conversion, and async GRPO training
with Megatron backend (TP=2, DP=2) and 3 SGLang rollout engines.
- Remove ray job submit, call python directly
- Move env vars to appropriate locations (PYTHONPATH in Dockerfile, CUDA_DEVICE_MAX_CONNECTIONS in job.yaml)
- Simplify entrypoint.sh (remove unused vars, fix paths)
- Add timeout_s to job.yaml
- Restructure README to match other examples pattern
- Rename Dockerfile.anyscale -> Dockerfile
- Change python3 -> python throughout

Signed-off-by: Robert Nishihara <rkn@anyscale.com>
- Replace instance_type with required_resources and required_labels
- Specify H100 accelerator type using ray.io/accelerator-type label
- Define resource requirements: 8 CPUs/32Gi for head, 96 CPUs/512Gi/8 GPUs for workers
- Allows Anyscale to select optimal H100 instance type (e.g., p5.48xlarge)

Signed-off-by: Robert Nishihara <rkn@anyscale.com>
- Update worker resources to match p5.48xlarge specs: 192 vCPUs, 2048Gi memory
- Keeps 8 H100 GPUs with H100 accelerator type label

Signed-off-by: Robert Nishihara <rkn@anyscale.com>
- Add convert_weights_remote.py wrapper with @ray.remote(num_gpus=1)
- Ensures weight conversion runs on GPU worker instead of head node
- Fixes 'No NVIDIA driver' error when running conversion

Signed-off-by: Robert Nishihara <rkn@anyscale.com>
- Create train_remote.py with @ray.remote(num_gpus=4)
- Ensures training runs on GPU workers instead of head node
- Both weight conversion and training now use Ray remote

Signed-off-by: Robert Nishihara <rkn@anyscale.com>