
Add Miles Qwen3-8B GRPO training example on H100 #44

Open

xyuzh wants to merge 7 commits into main from miles-qwen3-8b-h100

Conversation

xyuzh (Contributor) commented Feb 23, 2026

Summary

  • Adds a single-node RL training example for Qwen3-8B with GRPO on 8x H100-80GB using Anyscale
  • Uses Miles with Megatron backend (TP=2, DP=2) for training and 3 disaggregated SGLang engines for rollout
  • Includes Dockerfile, Anyscale job config, and entrypoint script that handles model download, HF→Megatron weight conversion, and async GRPO training

Files

| File | Description |
| --- | --- |
| miles_qwen3_8b_h100/Dockerfile.anyscale | Docker image with Miles, Megatron-LM, SGLang, flash-attn, TE |
| miles_qwen3_8b_h100/job.yaml | Anyscale job config (m5.2xlarge head + 1x p5.48xlarge worker) |
| miles_qwen3_8b_h100/entrypoint.sh | Downloads model/data, converts weights, runs async GRPO training |
| miles_qwen3_8b_h100/README.md | Setup instructions, cluster layout, and tuning guide |

Cluster Layout

Head node (m5.2xlarge):  driver only, no GPUs
Worker 0 (8x H100-80GB):
  GPU 0-3: Training (TP=2, DP=2)
  GPU 4-7: Rollout (3 SGLang engines + 1 driver)
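
The GPU split above can be sketched as a quick sanity check. This is not code from the PR; the constants simply restate the layout (TP=2 x DP=2 training GPUs, 3 SGLang engines plus one rollout driver on the remaining GPUs):

```python
# Sanity-check sketch of the 8-GPU split described above (illustration only,
# not code from this PR): training takes TP * DP GPUs, rollout the remainder.
TP, DP = 2, 2
NUM_GPUS = 8

train_gpus = list(range(TP * DP))              # GPUs 0-3: Megatron training
rollout_gpus = list(range(TP * DP, NUM_GPUS))  # GPUs 4-7: rollout

num_sglang_engines = 3                         # plus 1 rollout driver process
assert len(rollout_gpus) == num_sglang_engines + 1
```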

Test plan

  • Build the Docker image via Anyscale
  • Submit the job with `anyscale job submit -f job.yaml`
  • Verify weight conversion completes successfully
  • Confirm training loss decreases and reward increases over rollouts

xyuzh and others added 7 commits February 22, 2026 20:25
Single-node RL training of Qwen3-8B with GRPO on 8x H100-80GB using
Anyscale. Includes Dockerfile, job config, and entrypoint script that
handles model download, weight conversion, and async GRPO training
with Megatron backend (TP=2, DP=2) and 3 SGLang rollout engines.
- Remove ray job submit, call python directly
- Move env vars to appropriate locations (PYTHONPATH in Dockerfile, CUDA_DEVICE_MAX_CONNECTIONS in job.yaml)
- Simplify entrypoint.sh (remove unused vars, fix paths)
- Add timeout_s to job.yaml
- Restructure README to match other examples pattern
- Rename Dockerfile.anyscale -> Dockerfile
- Change python3 -> python throughout

Signed-off-by: Robert Nishihara <rkn@anyscale.com>
- Replace instance_type with required_resources and required_labels
- Specify H100 accelerator type using ray.io/accelerator-type label
- Define resource requirements: 8 CPUs/32Gi for head, 96 CPUs/512Gi/8 GPUs for workers
- Allows Anyscale to select optimal H100 instance type (e.g., p5.48xlarge)

Signed-off-by: Robert Nishihara <rkn@anyscale.com>
- Update worker resources to match p5.48xlarge specs: 192 vCPUs, 2048Gi memory
- Keeps 8 H100 GPUs with H100 accelerator type label

Signed-off-by: Robert Nishihara <rkn@anyscale.com>
- Add convert_weights_remote.py wrapper with @ray.remote(num_gpus=1)
- Ensures weight conversion runs on GPU worker instead of head node
- Fixes 'No NVIDIA driver' error when running conversion

Signed-off-by: Robert Nishihara <rkn@anyscale.com>
- Create train_remote.py with @ray.remote(num_gpus=4)
- Ensures training runs on GPU workers instead of head node
- Both weight conversion and training now use Ray remote

Signed-off-by: Robert Nishihara <rkn@anyscale.com>