
compare to remote HEAD #1

Draft
CharlieFRuan wants to merge 37 commits into remote-origin from main

Conversation

@CharlieFRuan
Collaborator

No description provided.

joyemang33 and others added 30 commits March 6, 2026 22:36
Adds examples/evolve/ with the SkyRL training integration for the
EvolveAgent advisor RL loop (main_evolve.py + train_evolve.sh).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s=10

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
enable_auto_tool_choice + tool_call_parser=qwen3_coder (advisor uses get_call_code tool)
language_model_only=true + attention_backend=FLASH_ATTN

Intentionally omitting reasoning_parser so thinking tokens stay in
content and are captured in the training token sequence.
- CKPTS_DIR, EXPORTS_DIR, LOG_DIR → /data/qmang/outputs/ (avoid ~18GB checkpoints in home)
- HF_HOME → /data/qmang/hf_cache
- TRITON_CACHE_DIR → /data/qmang/triton_cache
- TORCH_HOME → /data/qmang/torch_cache
Make the script runnable with Qwen3 and on non-QMang machines
…on (NovaSky-AI#1294)

minor typo fix for your consideration.
…est for new inference codepath (NovaSky-AI#1301)

# What does this PR do?

Fixes outstanding CI failures. 

1. FlashRL integration test is failing on CI after the vllm 0.16.0
upgrade. The correct fix is to migrate the flashRL fork:
https://github.com/SumanthRH/vllm/tree/flashrl to the latest version.
(We currently only support one vllm version in SkyRL)
2.
`tests/backends/skyrl_train/gpu/gpu_ci/test_engine_generation.py::test_token_based_generation`
is failing for the new inference codepath. (Note that the corresponding
text-based generation test, `test_text_based_generation`, is not enabled for
the new path yet.) The error is as follows:

```bash
>       with InferenceEngineState.create(cfg, sleep_level=1) as engines:
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

tests/backends/skyrl_train/gpu/gpu_ci/test_engine_generation.py:218: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/backends/skyrl_train/gpu/utils.py:484: in create
    server_infos = server_group.start()
                   ^^^^^^^^^^^^^^^^^^^^
skyrl/backends/skyrl_train/inference_servers/server_group.py:166: in start
    server_infos = self._pool.start()
                   ^^^^^^^^^^^^^^^^^^
skyrl/backends/skyrl_train/inference_servers/server_pool.py:39: in start
    self._server_infos = ray.get(start_refs)
                         ^^^^^^^^^^^^^^^^^^^
../../.cache/uv/builds-v0/.tmprBPQeX/lib/python3.12/site-packages/ray/_private/auto_init_hook.py:22: in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
../../.cache/uv/builds-v0/.tmprBPQeX/lib/python3.12/site-packages/ray/_private/client_mode_hook.py:104: in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
../../.cache/uv/builds-v0/.tmprBPQeX/lib/python3.12/site-packages/ray/_private/worker.py:2961: in get
    values, debugger_breakpoint = worker.get_objects(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ray._private.worker.Worker object at 0x7d598c0f89b0>
object_refs = [ObjectRef(637d71cfd8c3d0934199ab468c294a5876a09f88c803000001000000), ObjectRef(4ff1d0239b93c6f55967e35303f114e59f3eddd7c803000001000000)]
timeout = None, return_exceptions = False, skip_deserialization = False
_tensor_transport = None

    def get_objects(
        self,
        object_refs: list,
        timeout: Optional[float] = None,
        return_exceptions: bool = False,
        skip_deserialization: bool = False,
        _tensor_transport: Optional[str] = None,
    ) -> Tuple[List[serialization.SerializedRayObject], bytes]:
        """Get the values in the object store associated with the IDs.
    
        Return the values from the local object store for object_refs. This
        will block until all the values for object_refs have been written to
        the local object store.
    
        Args:
            object_refs: A list of the object refs
                whose values should be retrieved.
            timeout: The maximum amount of time in
                seconds to wait before returning.
            return_exceptions: If any of the objects deserialize to an
                Exception object, whether to return them as values in the
                returned list. If False, then the first found exception will be
                raised.
            skip_deserialization: If true, only the buffer will be released and
                the object associated with the buffer will not be deserialized.
            _tensor_transport: [Alpha] The tensor transport to use to fetch `torch.Tensors` found in the Ray Direct Transport object. Currently, this supports "object_store" and "nixl".
        Returns:
            list: List of deserialized objects or None if skip_deserialization is True.
            bytes: UUID of the debugger breakpoint we should drop
                into or b"" if there is no breakpoint.
        """
        # Make sure that the values are object refs.
        for object_ref in object_refs:
            if not isinstance(object_ref, ObjectRef):
                raise TypeError(
                    f"Attempting to call `get` on the value {object_ref}, "
                    "which is not an ray.ObjectRef."
                )
        tensor_transport: TensorTransportEnum = (
            TensorTransportEnum.from_str(_tensor_transport)
            if _tensor_transport is not None
            else None
        )
        assert tensor_transport in [
            TensorTransportEnum.OBJECT_STORE,
            TensorTransportEnum.NIXL,
            None,
        ], "Currently, RDT only supports 'object_store' and 'nixl' for tensor transport in ray.get()."
        timeout_ms = (
            int(timeout * 1000) if timeout is not None and timeout != -1 else -1
        )
        serialized_objects: List[
            serialization.SerializedRayObject
        ] = self.core_worker.get_objects(
            object_refs,
            timeout_ms,
        )
    
        debugger_breakpoint = b""
        for data, metadata, _ in serialized_objects:
            if metadata:
                metadata_fields = metadata.split(b",")
                if len(metadata_fields) >= 2 and metadata_fields[1].startswith(
                    ray_constants.OBJECT_METADATA_DEBUG_PREFIX
                ):
                    debugger_breakpoint = metadata_fields[1][
                        len(ray_constants.OBJECT_METADATA_DEBUG_PREFIX) :
                    ]
        if skip_deserialization:
            return None, debugger_breakpoint
    
        values = self.deserialize_objects(
            serialized_objects, object_refs, tensor_transport_hint=tensor_transport
        )
        if not return_exceptions:
            # Raise exceptions instead of returning them to the user.
            for i, value in enumerate(values):
                if isinstance(value, RayError):
                    if isinstance(value, ray.exceptions.ObjectLostError):
                        global_worker.core_worker.log_plasma_usage()
                    if isinstance(value, RayTaskError):
>                       raise value.as_instanceof_cause()
E                       ray.exceptions.RayTaskError(AssertionError): ray::VLLMServerActor.start() (pid=2304492, ip=10.0.143.202, actor_id=4199ab468c294a5876a09f88c8030000, repr=<skyrl.backends.skyrl_train.inference_servers.vllm_server_actor.VLLMServerActor object at 0x73f1e5133950>)
E                         File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
E                           return self.__get_result()
E                                  ^^^^^^^^^^^^^^^^^^^
E                         File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
E                           raise self._exception
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/tmp/ray/session_2026-02-19_20-32-05_605637_5843/runtime_resources/working_dir_files/_ray_pkg_92c1802cabc39cf5/skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py", line 209, in start
E                           await self._wait_until_healthy()
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/tmp/ray/session_2026-02-19_20-32-05_605637_5843/runtime_resources/working_dir_files/_ray_pkg_92c1802cabc39cf5/skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py", line 224, in _wait_until_healthy
E                           raise exc
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/tmp/ray/session_2026-02-19_20-32-05_605637_5843/runtime_resources/working_dir_files/_ray_pkg_92c1802cabc39cf5/skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py", line 248, in _run_server
E                           self._engine = AsyncLLMEngine.from_engine_args(
E                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 251, in from_engine_args
E                           return cls(
E                                  ^^^^
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 148, in __init__
E                           self.engine_core = EngineCoreClient.make_async_mp_client(
E                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
E                           return DPAsyncMPClient(*client_args)
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 1082, in __init__
E                           self._ensure_stats_update_task()
E                         File "/home/ray/.cache/uv/builds-v0/.tmpjRRfSQ/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 1091, in _ensure_stats_update_task
E                           assert self.stats_update_address is not None
E                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                       AssertionError

```

The issue is that we are using a dense model and testing a DP > 1 setting
in external load balancer mode. However, in vLLM, the DPCoordinator is
configured only for MoE models in this setting. For a dense model with
external load balancing, DP > 1 simply means that we are handling all
the routing ourselves, so vLLM expects us not to pass any DP
arguments. See vllm-project/vllm#32252 for
discussion. The right fix here is to test an MoE model: this PR switches
to `Qwen/Qwen1.5-moe-a2.7b`.
3. Flakiness in
`tests/backends/skyrl_train/gpu/gpu_ci/test_skyrl_gym_generator.py::test_generator_multi_turn_search`
-> I again saw some flakiness for this test because of the strict
validation. The issue is that the Env expects the stop string to be the
ending *text*, but in reality vLLM stops when the stop string is
*part* of the ending *token*, so the trailing character can be a comma,
period, etc. I have removed this validation - it doesn't affect
correctness.
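A minimal standalone illustration (not SkyRL code; the stop string below is hypothetical) of why the strict end-of-text check is flaky: once the stop string appears inside a decoded token, the emitted text can carry trailing characters from that token.

```python
stop = "</search>"
# Suppose "</search>," is decoded as a single token; vLLM stops on this
# token because it contains the stop string, leaving a trailing comma.
emitted = "query</search>,"

# The strict check the Env used to apply fails even though generation
# stopped at the right place; a substring check still succeeds.
strict_ok = emitted.endswith(stop)  # False
loose_ok = stop in emitted          # True
print(strict_ok, loose_ok)
```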

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…Sky-AI#1295)

Hi!!
I was trying to use SkyRL to do RL alignment for the Granite 4.0 models
(e.g.,
[granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro)).
These models use a hybrid Mamba + Attention + MoE architecture where the
decoder layers don't return router logits in their output tuples.

This causes an `IndexError` during training because SkyRL
unconditionally sets `output_router_logits=True` for any model that has
this field in its config (`model_wrapper.py`). When the ForCausalLM
wrapper then calls `load_balancing_loss_func` with the empty `gate_logits`
tuple, it crashes:

```python
transformers/models/granitemoehybrid/modeling_granitemoehybrid.py, line 1752, in forward
      aux_loss = load_balancing_loss_func(
  transformers/models/granitemoehybrid/modeling_granitemoehybrid.py, line 1598, in load_balancing_loss_func
      compute_device = gate_logits[0].device
  IndexError: tuple index out of range
```

This PR skips setting `output_router_logits=True` when `model_type ==
"granitemoehybrid"`, fixing training for Granite hybrid MoE models.
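A hedged sketch of the guard this PR describes; the actual change lives in SkyRL's `model_wrapper.py` and may differ in detail, and `maybe_enable_router_logits` is a hypothetical helper name used here for illustration.

```python
from types import SimpleNamespace

# Hybrid Mamba + Attention + MoE decoder layers return no router logits,
# so requesting them would feed an empty gate_logits tuple to the aux-loss fn.
SKIP_ROUTER_LOGITS = {"granitemoehybrid"}

def maybe_enable_router_logits(hf_config) -> None:
    if not hasattr(hf_config, "output_router_logits"):
        return  # config has no such field; nothing to do
    if getattr(hf_config, "model_type", None) in SKIP_ROUTER_LOGITS:
        return  # would crash with IndexError: tuple index out of range
    hf_config.output_router_logits = True

granite = SimpleNamespace(model_type="granitemoehybrid", output_router_logits=False)
qwen_moe = SimpleNamespace(model_type="qwen2_moe", output_router_logits=False)
maybe_enable_router_logits(granite)
maybe_enable_router_logits(qwen_moe)
print(granite.output_router_logits, qwen_moe.output_router_logits)  # False True
```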
…vaSky-AI#1310)

## Summary
Restore GLM-4.7-Flash Megatron GRPO example script (from PR NovaSky-AI#1215),
adapted for the new `skyrl.train` entrypoint and config key format.

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-46-48.ap-northeast-1.compute.internal>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ovaSky-AI#1312)

# What does this PR do?

Updates `skyrl` and `skyrl-gym` package versions after 0.1.0 release

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
SumanthRH and others added 7 commits March 11, 2026 12:28
# What does this PR do?

Adds isort to `ruff` pre-commit hook

---------

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
…ovaSky-AI#1317)

After NovaSky-AI#1187, `eval_sampling_params.max_generate_length` was no
longer automatically set to `sampling_params.max_generate_length` in the
case that `eval_sampling_params` was not `None`. ([code
link](https://github.com/NovaSky-AI/SkyRL/blob/073b1f3b626b760885b94e6e53e4a8bce5df1a38/skyrl/train/config/config.py#L515))
This was causing example scripts to have matching training behavior but
diverging eval behavior. Manually setting
`eval_sampling_params.max_generate_length` in the example scripts fixes
this.
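A simplified standalone model of the resolution behavior described above (`resolve_eval_params` and the dataclass are illustrative assumptions, not the actual SkyRL config code):

```python
from dataclasses import dataclass

@dataclass
class SamplingParams:
    max_generate_length: int = 1024  # illustrative default

def resolve_eval_params(sampling_params, eval_sampling_params):
    # After #1187, a user-provided eval_sampling_params is used as-is, with
    # no inheritance of max_generate_length from the training params, so
    # scripts must set it explicitly to keep train/eval behavior aligned.
    if eval_sampling_params is None:
        return sampling_params
    return eval_sampling_params

train = SamplingParams(max_generate_length=8192)
evals = SamplingParams()  # forgot to set max_generate_length explicitly
resolved = resolve_eval_params(train, evals)
print(resolved.max_generate_length)  # 1024, not 8192: diverging eval behavior
```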


Examples of divergence due to unexpected lower
`eval_sampling_params.max_generate_length` cc: @justinvyu
<img width="2086" height="1332" alt="image"
src="https://github.com/user-attachments/assets/0b1cfee9-1dfd-4a25-b85c-cbeb1991e515"
/>
<img width="696" height="664" alt="image"
src="https://github.com/user-attachments/assets/07c0b300-6785-4b44-b46f-f4b840fa2a0c"
/>

…ky-AI#1318)

# What does this PR do?

Follow up to NovaSky-AI#1317 - we need to also add this to the doc pages. 

---------

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
…ngines) (NovaSky-AI#1300)

## Overview

Adds a `distributed_executor_backend` config option (`"mp"` | `"ray"`)
to `InferenceEngineConfig`, enabling the vLLM multiprocessing backend
for single-node multi-GPU serving without relying on Ray for
intra-engine worker management.

### Legacy inference stack (`ray_wrapped_inference_engine`)

When `"mp"` is selected, physical GPU IDs are pre-computed per
`(engine_idx, dp_rank)` from the shared placement group and passed to
each actor so `CUDA_VISIBLE_DEVICES` is set correctly for the mp-spawned
workers, ensuring each DP rank only sees its own `TP×PP` slice of GPUs.
The actor is placed on the correct node using
`PlacementGroupSchedulingStrategy` with the appropriate bundle index,
identical to the `"ray"` backend path.
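The GPU-slicing arithmetic can be illustrated as follows (an assumption about the layout, not the actual SkyRL code: it presumes contiguous TP×PP slices per DP rank within the placement group):

```python
def mp_visible_gpu_ids(engine_idx: int, dp_rank: int, dp_size: int,
                       tp_size: int, pp_size: int) -> list:
    # Each (engine_idx, dp_rank) pair owns a contiguous TPxPP slice of GPUs;
    # these IDs become CUDA_VISIBLE_DEVICES for the mp-spawned workers so
    # each DP rank only sees its own slice.
    gpus_per_rank = tp_size * pp_size
    base = (engine_idx * dp_size + dp_rank) * gpus_per_rank
    return list(range(base, base + gpus_per_rank))

# e.g. engine 0, dp_rank 1, dp_size 2, TP=2, PP=1 -> GPUs [2, 3]
print(mp_visible_gpu_ids(0, 1, 2, 2, 1))
```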

### New inference stack (`inference_servers`)

`ServerGroup` is made engine-agnostic via a `prepare_server_kwargs` hook
on `ServerActorProtocol`. This static method is called per-server before
actor creation, giving the actor class access to the placement group to
compute PG-dependent kwargs (e.g. resolving physical GPU IDs).
`VLLMServerActor` implements this hook to pre-compute
`mp_cuda_visible_devices` when the `"mp"` backend is requested, and
plumbs through backend selection and GPU visibility to vLLM.
Engine-specific kwargs (`distributed_executor_backend`, etc.) are passed
via `**server_actor_kwargs` rather than named parameters on
`ServerGroup`.
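The shape of the hook might look like the sketch below; `prepare_server_kwargs` and `ServerActorProtocol` are named in the PR, but the exact signature and the GPU math here are assumptions for illustration.

```python
from typing import Any, Protocol

class ServerActorProtocol(Protocol):
    @staticmethod
    def prepare_server_kwargs(server_idx: int, placement_group: Any) -> dict: ...

class VLLMServerActor:
    @staticmethod
    def prepare_server_kwargs(server_idx: int, placement_group: Any) -> dict:
        # Called per-server before actor creation, so PG-dependent kwargs
        # (e.g. physical GPU IDs for the "mp" backend) can be precomputed.
        gpus_per_server = 2  # illustrative; would be derived from the PG bundles
        base = server_idx * gpus_per_server
        ids = range(base, base + gpus_per_server)
        return {"mp_cuda_visible_devices": ",".join(map(str, ids))}

print(VLLMServerActor.prepare_server_kwargs(1, None))
```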

> [!NOTE]
> **Limitation (new inference stack):** The `mp` backend is **not yet
fully supported** with the new HTTP-based inference servers in colocated
mode (and is flaky in non-colocated mode) and is blocked at config
validation. Additionally, the new inference path requires each engine to
fit within a single node. These restrictions will be addressed in
follow-up PRs — issue tracking in
NovaSky-AI#1309


---------

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Follow up to NovaSky-AI#1271,
introduces IPC for the colocated case.

Since CUDA IPC is not natively supported in vLLM 0.16 (it will be available
in the next version), I copied the implementation and registered it as a
custom engine. This will eventually be removed in favor of the built-in
version.

Tests:
- [x] enable colocated tests in `test_policy_local_engines_e2e.py`
- [x] new weight sync unit test in `test_weight_sync.py`
- [x] e2e CI run:

---------

Signed-off-by: ahao-anyscale <ahao@anyscale.com>