
compare to remote HEAD #1

Draft
CharlieFRuan wants to merge 37 commits into remote-origin from main

Conversation

@CharlieFRuan
Collaborator

No description provided.

joyemang33 and others added 30 commits March 6, 2026 22:36
Adds examples/evolve/ with the SkyRL training integration for the
EvolveAgent advisor RL loop (main_evolve.py + train_evolve.sh).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s=10

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
enable_auto_tool_choice + tool_call_parser=qwen3_coder (advisor uses get_call_code tool)
language_model_only=true + attention_backend=FLASH_ATTN

Intentionally omitting reasoning_parser so thinking tokens stay in
content and are captured in the training token sequence.
- CKPTS_DIR, EXPORTS_DIR, LOG_DIR → /data/qmang/outputs/ (avoid ~18GB checkpoints in home)
- HF_HOME → /data/qmang/hf_cache
- TRITON_CACHE_DIR → /data/qmang/triton_cache
- TORCH_HOME → /data/qmang/torch_cache
Make the script runnable with Qwen3 and on non-QMang machines
…on (NovaSky-AI#1294)

minor typo fix for your consideration.
…est for new inference codepath (NovaSky-AI#1301)

# What does this PR do?

Fixes outstanding CI failures. 

1. FlashRL integration test is failing on CI after the vllm 0.16.0
upgrade. The correct fix is to migrate the flashRL fork:
https://github.com/SumanthRH/vllm/tree/flashrl to the latest version.
(We currently only support one vllm version in SkyRL)
2.
`tests/backends/skyrl_train/gpu/gpu_ci/test_engine_generation.py::test_token_based_generation`
is failing for the new inference codepath. (Note that the corresponding
text-based generation test, `test_text_based_generation`, is not enabled for
the new path yet.) The error is as follows:

```bash
>       with InferenceEngineState.create(cfg, sleep_level=1) as engines:
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

tests/backends/skyrl_train/gpu/gpu_ci/test_engine_generation.py:218: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/backends/skyrl_train/gpu/utils.py:484: in create
    server_infos = server_group.start()
                   ^^^^^^^^^^^^^^^^^^^^
skyrl/backends/skyrl_train/inference_servers/server_group.py:166: in start
    server_infos = self._pool.start()
                   ^^^^^^^^^^^^^^^^^^
skyrl/backends/skyrl_train/inference_servers/server_pool.py:39: in start
    self._server_infos = ray.get(start_refs)
                         ^^^^^^^^^^^^^^^^^^^
../../.cache/uv/builds-v0/.tmprBPQeX/lib/python3.12/site-packages/ray/_private/auto_init_hook.py:22: in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
../../.cache/uv/builds-v0/.tmprBPQeX/lib/python3.12/site-packages/ray/_private/client_mode_hook.py:104: in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
../../.cache/uv/builds-v0/.tmprBPQeX/lib/python3.12/site-packages/ray/_private/worker.py:2961: in get
    values, debugger_breakpoint = worker.get_objects(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ray._private.worker.Worker object at 0x7d598c0f89b0>
object_refs = [ObjectRef(637d71cfd8c3d0934199ab468c294a5876a09f88c803000001000000), ObjectRef(4ff1d0239b93c6f55967e35303f114e59f3eddd7c803000001000000)]
timeout = None, return_exceptions = False, skip_deserialization = False
_tensor_transport = None

    def get_objects(
        self,
        object_refs: list,
        timeout: Optional[float] = None,
        return_exceptions: bool = False,
        skip_deserialization: bool = False,
        _tensor_transport: Optional[str] = None,
    ) -> Tuple[List[serialization.SerializedRayObject], bytes]:
        """Get the values in the object store associated with the IDs.
    
        Return the values from the local object store for object_refs. This
        will block until all the values for object_refs have been written to
        the local object store.
    
        Args:
            object_refs: A list of the object refs
                whose values should be retrieved.
            timeout: The maximum amount of time in
                seconds to wait before returning.
            return_exceptions: If any of the objects deserialize to an
                Exception object, whether to return them as values in the
                returned list. If False, then the first found exception will be
                raised.
            skip_deserialization: If true, only the buffer will be released and
                the object associated with the buffer will not be deserialized.
            _tensor_transport: [Alpha] The tensor transport to use to fetch `torch.Tensors` found in the Ray Direct Transport object. Currently, this supports "object_store" and "nixl".
        Returns:
            list: List of deserialized objects or None if skip_deserialization is True.
            bytes: UUID of the debugger breakpoint we should drop
                into or b"" if there is no breakpoint.
        """
        # Make sure that the values are object refs.
        for object_ref in object_refs:
            if not isinstance(object_ref, ObjectRef):
                raise TypeError(
                    f"Attempting to call `get` on the value {object_ref}, "
                    "which is not an ray.ObjectRef."
                )
        tensor_transport: TensorTransportEnum = (
            TensorTransportEnum.from_str(_tensor_transport)
            if _tensor_transport is not None
            else None
        )
        assert tensor_transport in [
            TensorTransportEnum.OBJECT_STORE,
            TensorTransportEnum.NIXL,
            None,
        ], "Currently, RDT only supports 'object_store' and 'nixl' for tensor transport in ray.get()."
        timeout_ms = (
            int(timeout * 1000) if timeout is not None and timeout != -1 else -1
        )
        serialized_objects: List[
            serialization.SerializedRayObject
        ] = self.core_worker.get_objects(
            object_refs,
            timeout_ms,
        )
    
        debugger_breakpoint = b""
        for data, metadata, _ in serialized_objects:
            if metadata:
                metadata_fields = metadata.split(b",")
                if len(metadata_fields) >= 2 and metadata_fields[1].startswith(
                    ray_constants.OBJECT_METADATA_DEBUG_PREFIX
                ):
                    debugger_breakpoint = metadata_fields[1][
                        len(ray_constants.OBJECT_METADATA_DEBUG_PREFIX) :
                    ]
        if skip_deserialization:
            return None, debugger_breakpoint
    
        values = self.deserialize_objects(
            serialized_objects, object_refs, tensor_transport_hint=tensor_transport
        )
        if not return_exceptions:
            # Raise exceptions instead of returning them to the user.
            for i, value in enumerate(values):
                if isinstance(value, RayError):
                    if isinstance(value, ray.exceptions.ObjectLostError):
                        global_worker.core_worker.log_plasma_usage()
                    if isinstance(value, RayTaskError):
>                       raise value.as_instanceof_cause()
E                       ray.exceptions.RayTaskError(AssertionError): ray::VLLMServerActor.start() (pid=2304492, ip=10.0.143.202, actor_id=4199ab468c294a5876a09f88c8030000, repr=<skyrl.backends.skyrl_train.inference_servers.vllm_server_actor.VLLMServerActor object at 0x73f1e5133950>)
E                         File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
E                           return self.__get_result()
E                                  ^^^^^^^^^^^^^^^^^^^
E                         File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
E                           raise self._exception
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/tmp/ray/session_2026-02-19_20-32-05_605637_5843/runtime_resources/working_dir_files/_ray_pkg_92c1802cabc39cf5/skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py", line 209, in start
E                           await self._wait_until_healthy()
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/tmp/ray/session_2026-02-19_20-32-05_605637_5843/runtime_resources/working_dir_files/_ray_pkg_92c1802cabc39cf5/skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py", line 224, in _wait_until_healthy
E                           raise exc
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/tmp/ray/session_2026-02-19_20-32-05_605637_5843/runtime_resources/working_dir_files/_ray_pkg_92c1802cabc39cf5/skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py", line 248, in _run_server
E                           self._engine = AsyncLLMEngine.from_engine_args(
E                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 251, in from_engine_args
E                           return cls(
E                                  ^^^^
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 148, in __init__
E                           self.engine_core = EngineCoreClient.make_async_mp_client(
E                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
E                           return DPAsyncMPClient(*client_args)
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 1082, in __init__
E                           self._ensure_stats_update_task()
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 1091, in _ensure_stats_update_task
E                           assert self.stats_update_address is not None
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                       AssertionError

```

The issue is that we are using a dense model and testing a DP > 1 setting
in external load balancer mode. However, in vLLM, the DPCoordinator is
configured only for MoE models in this setting. For a dense model with
external load balancing, DP > 1 simply means that we are handling all
the routing ourselves, so vLLM expects us not to pass any DP
arguments. See vllm-project/vllm#32252 for
discussion. The right fix here is to test an MoE model: this PR switches
to `Qwen/Qwen1.5-moe-a2.7b`.
3. Flakiness in
`tests/backends/skyrl_train/gpu/gpu_ci/test_skyrl_gym_generator.py::test_generator_multi_turn_search`
-> I again saw some flakiness for this test because of the strict
validation. The issue is that the Env expects the stop string to be the
ending *text*, but in reality vLLM stops when the stop string is
*part* of the ending *token*, so the trailing character can be a comma,
period, etc. I have removed this validation - it doesn't affect
correctness.
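A minimal standalone illustration (not SkyRL code; the stop string below is hypothetical) of why the strict end-of-text check is flaky: once the stop string appears inside a decoded token, the emitted text can carry trailing characters from that token.

```python
stop = "</search>"
# Suppose "</search>," is decoded as a single token; vLLM stops on this
# token because it contains the stop string, leaving a trailing comma.
emitted = "query</search>,"

# The strict check the Env used to apply fails even though generation
# stopped at the right place; a substring check still succeeds.
strict_ok = emitted.endswith(stop)  # False
loose_ok = stop in emitted          # True
print(strict_ok, loose_ok)
```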

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…Sky-AI#1295)

Hi!!
I was trying to use SkyRL to do RL alignment for the Granite 4.0 models
(e.g.,
[granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro)).
These models use a hybrid Mamba + Attention + MoE architecture where the
decoder layers don't return router logits in their output tuples.

This causes an `IndexError` during training because SkyRL
unconditionally sets `output_router_logits=True` for any model that has
this field in its config (`model_wrapper.py`). When the ForCausalLM
wrapper then calls `load_balancing_loss_func` with the empty `gate_logits`
tuple, it crashes:

```python
transformers/models/granitemoehybrid/modeling_granitemoehybrid.py, line 1752, in forward
      aux_loss = load_balancing_loss_func(
  transformers/models/granitemoehybrid/modeling_granitemoehybrid.py, line 1598, in load_balancing_loss_func
      compute_device = gate_logits[0].device
  IndexError: tuple index out of range
```

This PR skips setting `output_router_logits=True` when `model_type ==
"granitemoehybrid"`, fixing training for Granite hybrid MoE models.
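A hedged sketch of the guard this PR describes; the actual change lives in SkyRL's `model_wrapper.py` and may differ in detail, and `maybe_enable_router_logits` is a hypothetical helper name used here for illustration.

```python
from types import SimpleNamespace

# Hybrid Mamba + Attention + MoE decoder layers return no router logits,
# so requesting them would feed an empty gate_logits tuple to the aux-loss fn.
SKIP_ROUTER_LOGITS = {"granitemoehybrid"}

def maybe_enable_router_logits(hf_config) -> None:
    if not hasattr(hf_config, "output_router_logits"):
        return  # config has no such field; nothing to do
    if getattr(hf_config, "model_type", None) in SKIP_ROUTER_LOGITS:
        return  # would crash with IndexError: tuple index out of range
    hf_config.output_router_logits = True

granite = SimpleNamespace(model_type="granitemoehybrid", output_router_logits=False)
qwen_moe = SimpleNamespace(model_type="qwen2_moe", output_router_logits=False)
maybe_enable_router_logits(granite)
maybe_enable_router_logits(qwen_moe)
print(granite.output_router_logits, qwen_moe.output_router_logits)  # False True
```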
…vaSky-AI#1310)

## Summary
Restore GLM-4.7-Flash Megatron GRPO example script (from PR NovaSky-AI#1215),
adapted for the new `skyrl.train` entrypoint and config key format.

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-46-48.ap-northeast-1.compute.internal>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ovaSky-AI#1312)

# What does this PR do?

Updates `skyrl` and `skyrl-gym` package versions after 0.1.0 release

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
SumanthRH and others added 7 commits March 11, 2026 12:28
# What does this PR do?

Adds isort to `ruff` pre-commit hook

---------

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
…ovaSky-AI#1317)

After NovaSky-AI#1187, `eval_sampling_params.max_generate_length` was no
longer automatically set to `sampling_params.max_generate_length` in the
case that `eval_sampling_params` was not `None`. ([code
link](https://github.com/NovaSky-AI/SkyRL/blob/073b1f3b626b760885b94e6e53e4a8bce5df1a38/skyrl/train/config/config.py#L515))
This was causing example scripts to have matching training behavior but
diverging eval behavior. Manually setting
`eval_sampling_params.max_generate_length` in the example scripts fixes
this.
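A simplified standalone model of the resolution behavior described above (`resolve_eval_params` and the dataclass are illustrative assumptions, not the actual SkyRL config code):

```python
from dataclasses import dataclass

@dataclass
class SamplingParams:
    max_generate_length: int = 1024  # illustrative default

def resolve_eval_params(sampling_params, eval_sampling_params):
    # After #1187, a user-provided eval_sampling_params is used as-is, with
    # no inheritance of max_generate_length from the training params, so
    # scripts must set it explicitly to keep train/eval behavior aligned.
    if eval_sampling_params is None:
        return sampling_params
    return eval_sampling_params

train = SamplingParams(max_generate_length=8192)
evals = SamplingParams()  # forgot to set max_generate_length explicitly
resolved = resolve_eval_params(train, evals)
print(resolved.max_generate_length)  # 1024, not 8192: diverging eval behavior
```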


Examples of divergence due to unexpected lower
`eval_sampling_params.max_generate_length` cc: @justinvyu
<img width="2086" height="1332" alt="image"
src="https://github.com/user-attachments/assets/0b1cfee9-1dfd-4a25-b85c-cbeb1991e515"
/>
<img width="696" height="664" alt="image"
src="https://github.com/user-attachments/assets/07c0b300-6785-4b44-b46f-f4b840fa2a0c"
/>

…ky-AI#1318)

# What does this PR do?

Follow up to NovaSky-AI#1317 - we need to also add this to the doc pages. 

---------

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
…ngines) (NovaSky-AI#1300)

## Overview

Adds a `distributed_executor_backend` config option (`"mp"` | `"ray"`)
to `InferenceEngineConfig`, enabling the vLLM multiprocessing backend
for single-node multi-GPU serving without relying on Ray for
intra-engine worker management.

### Legacy inference stack (`ray_wrapped_inference_engine`)

When `"mp"` is selected, physical GPU IDs are pre-computed per
`(engine_idx, dp_rank)` from the shared placement group and passed to
each actor so `CUDA_VISIBLE_DEVICES` is set correctly for the mp-spawned
workers, ensuring each DP rank only sees its own `TP×PP` slice of GPUs.
The actor is placed on the correct node using
`PlacementGroupSchedulingStrategy` with the appropriate bundle index,
identical to the `"ray"` backend path.
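The GPU-slicing arithmetic can be illustrated as follows (an assumption about the layout, not the actual SkyRL code: it presumes contiguous TP×PP slices per DP rank within the placement group):

```python
def mp_visible_gpu_ids(engine_idx: int, dp_rank: int, dp_size: int,
                       tp_size: int, pp_size: int) -> list:
    # Each (engine_idx, dp_rank) pair owns a contiguous TPxPP slice of GPUs;
    # these IDs become CUDA_VISIBLE_DEVICES for the mp-spawned workers so
    # each DP rank only sees its own slice.
    gpus_per_rank = tp_size * pp_size
    base = (engine_idx * dp_size + dp_rank) * gpus_per_rank
    return list(range(base, base + gpus_per_rank))

# e.g. engine 0, dp_rank 1, dp_size 2, TP=2, PP=1 -> GPUs [2, 3]
print(mp_visible_gpu_ids(0, 1, 2, 2, 1))
```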

### New inference stack (`inference_servers`)

`ServerGroup` is made engine-agnostic via a `prepare_server_kwargs` hook
on `ServerActorProtocol`. This static method is called per-server before
actor creation, giving the actor class access to the placement group to
compute PG-dependent kwargs (e.g. resolving physical GPU IDs).
`VLLMServerActor` implements this hook to pre-compute
`mp_cuda_visible_devices` when the `"mp"` backend is requested, and
plumbs through backend selection and GPU visibility to vLLM.
Engine-specific kwargs (`distributed_executor_backend`, etc.) are passed
via `**server_actor_kwargs` rather than named parameters on
`ServerGroup`.
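The shape of the hook might look like the sketch below; `prepare_server_kwargs` and `ServerActorProtocol` are named in the PR, but the exact signature and the GPU math here are assumptions for illustration.

```python
from typing import Any, Protocol

class ServerActorProtocol(Protocol):
    @staticmethod
    def prepare_server_kwargs(server_idx: int, placement_group: Any) -> dict: ...

class VLLMServerActor:
    @staticmethod
    def prepare_server_kwargs(server_idx: int, placement_group: Any) -> dict:
        # Called per-server before actor creation, so PG-dependent kwargs
        # (e.g. physical GPU IDs for the "mp" backend) can be precomputed.
        gpus_per_server = 2  # illustrative; would be derived from the PG bundles
        base = server_idx * gpus_per_server
        ids = range(base, base + gpus_per_server)
        return {"mp_cuda_visible_devices": ",".join(map(str, ids))}

print(VLLMServerActor.prepare_server_kwargs(1, None))
```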

> [!NOTE]
> **Limitation (new inference stack):** The `mp` backend is **not yet
fully supported** with the new HTTP-based inference servers in colocated
mode (and is flaky in non-colocated mode) and is blocked at config
validation. Additionally, the new inference path requires each engine to
fit within a single node. These restrictions will be addressed in
follow-up PRs — issue tracking in
NovaSky-AI#1309


---------

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Follow up to NovaSky-AI#1271,
introduces IPC for the colocated case.

Since CUDA IPC is not natively supported in vLLM 0.16 (it will be available
in the next version), I copied the implementation and registered it as a
custom engine. This will eventually be removed in favor of the built-in
version.

Tests:
- [x] enable colocated tests in `test_policy_local_engines_e2e.py`
- [x] new weight sync unit test in `test_weight_sync.py`
- [x] e2e CI run:

---------

Signed-off-by: ahao-anyscale <ahao@anyscale.com>