Adds examples/evolve/ with the SkyRL training integration for the EvolveAgent advisor RL loop (main_evolve.py + train_evolve.sh).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s=10
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- enable_auto_tool_choice + tool_call_parser=qwen3_coder (the advisor uses the get_call_code tool)
- language_model_only=true + attention_backend=FLASH_ATTN
- Intentionally omitting reasoning_parser so thinking tokens stay in content and are captured in the training token sequence.
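A rough sketch of the launch these settings map to. The CLI flag spellings and the `$MODEL` placeholder are assumptions; the actual invocation in train_evolve.sh may differ.

```shell
# Sketch only: vLLM-style flags implied by the settings above, not the actual script.
export VLLM_ATTENTION_BACKEND=FLASH_ATTN    # attention_backend=FLASH_ATTN
vllm serve "$MODEL" \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
# No --reasoning-parser: thinking tokens then remain in message content,
# so they are captured in the training token sequence.
```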
- CKPTS_DIR, EXPORTS_DIR, LOG_DIR → /data/qmang/outputs/ (avoid ~18GB of checkpoints in $HOME)
- HF_HOME → /data/qmang/hf_cache
- TRITON_CACHE_DIR → /data/qmang/triton_cache
- TORCH_HOME → /data/qmang/torch_cache
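The redirections above as a shell fragment; whether the first three variables share one directory or use per-run subdirectories under /data/qmang/outputs/ is an assumption.

```shell
# Redirect outputs and caches off the home filesystem, as described above.
export CKPTS_DIR=/data/qmang/outputs      # keeps ~18GB of checkpoints out of $HOME
export EXPORTS_DIR=/data/qmang/outputs
export LOG_DIR=/data/qmang/outputs
export HF_HOME=/data/qmang/hf_cache
export TRITON_CACHE_DIR=/data/qmang/triton_cache
export TORCH_HOME=/data/qmang/torch_cache
```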
Make the script runnable with Qwen3 and on non-QMang machines.
…on (NovaSky-AI#1294) Minor typo fix for your consideration.
…est for new inference codepath (NovaSky-AI#1301) # What does this PR do?

Fixes outstanding CI failures.

1. The FlashRL integration test is failing on CI after the vllm 0.16.0 upgrade. The correct fix is to migrate the flashRL fork (https://github.com/SumanthRH/vllm/tree/flashrl) to the latest version. (We currently only support one vllm version in SkyRL.)

2. `tests/backends/skyrl_train/gpu/gpu_ci/test_engine_generation.py::test_token_based_generation` is failing for the new inference codepath. (Note that the corresponding text-based test, `test_text_based_generation`, is not enabled for the new path yet.) The error, trimmed to the relevant frames:

   ```bash
   tests/backends/skyrl_train/gpu/gpu_ci/test_engine_generation.py:218:
   >   with InferenceEngineState.create(cfg, sleep_level=1) as engines:
   tests/backends/skyrl_train/gpu/utils.py:484: in create
       server_infos = server_group.start()
   skyrl/backends/skyrl_train/inference_servers/server_group.py:166: in start
       server_infos = self._pool.start()
   skyrl/backends/skyrl_train/inference_servers/server_pool.py:39: in start
       self._server_infos = ray.get(start_refs)
   ...
   E   ray.exceptions.RayTaskError(AssertionError): ray::VLLMServerActor.start()
   E     File ".../skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py", line 209, in start
   E       await self._wait_until_healthy()
   E     File ".../skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py", line 248, in _run_server
   E       self._engine = AsyncLLMEngine.from_engine_args(
   E     File ".../vllm/v1/engine/core_client.py", line 1082, in __init__
   E       self._ensure_stats_update_task()
   E     File ".../vllm/v1/engine/core_client.py", line 1091, in _ensure_stats_update_task
   E       assert self.stats_update_address is not None
   E   AssertionError
   ```

   The issue is that we are using a dense model and testing a DP > 1 setting in external load balancer mode. However, in vLLM, the DPCoordinator is configured only for MoE models in this setting. For a dense model with external load balancing, DP > 1 simply means that we are handling all the routing ourselves, so vLLM expects us to not pass any DP arguments. See vllm-project/vllm#32252 for discussion. The right fix here is to test an MoE model: the PR switches to `Qwen/Qwen1.5-moe-a2.7b`.

3. Flakiness in `tests/backends/skyrl_train/gpu/gpu_ci/test_skyrl_gym_generator.py::test_generator_multi_turn_search`: I again saw some flakiness for this test because of the strict validation. The issue is that the Env expects the stop string to be the ending *text*, but in reality vLLM will stop when the stop string is *part of* the ending *token*, so the trailing character can be a comma, period, etc. I have removed this validation; it doesn't affect correctness.

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
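The stop-string subtlety in item 3 above can be illustrated with a small sketch. This is not SkyRL's actual validation code; the function name and slack width are made up for illustration.

```python
def lenient_stop_check(text: str, stop: str, slack: int = 2) -> bool:
    """Accept outputs where the stop string appears within the last few
    characters, instead of requiring text.endswith(stop). vLLM stops as
    soon as the stop string appears inside the final decoded token, so
    the emitted text may carry a trailing comma, period, etc."""
    return stop in text[-(len(stop) + slack):]

# A strict suffix check fails when the final token decodes to, say,
# "</search>." with a trailing period:
assert not "query</search>.".endswith("</search>")
assert lenient_stop_check("query</search>.", "</search>")
```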
…Sky-AI#1295) Hi!! I was trying to use SkyRL to do RL alignment for the Granite 4.0 models (e.g., [granite-4.0-micro](https://huggingface.co/ibm-granite/granite-4.0-micro)). These models use a hybrid Mamba + Attention + MoE architecture whose decoder layers don't return router logits in their output tuples. This causes an `IndexError` during training, because SkyRL unconditionally sets `output_router_logits=True` for any model that has this field in its config (`model_wrapper.py`). When the ForCausalLM wrapper then calls `load_balancing_loss_func` with the empty `gate_logits` tuple, it crashes:

```python
transformers/models/granitemoehybrid/modeling_granitemoehybrid.py, line 1752, in forward
    aux_loss = load_balancing_loss_func(
transformers/models/granitemoehybrid/modeling_granitemoehybrid.py, line 1598, in load_balancing_loss_func
    compute_device = gate_logits[0].device
IndexError: tuple index out of range
```

This PR skips setting `output_router_logits=True` when `model_type == "granitemoehybrid"`, fixing training for Granite hybrid MoE models.
…vaSky-AI#1310) ## Summary

Restore the GLM-4.7-Flash Megatron GRPO example script (from PR NovaSky-AI#1215), adapted for the new `skyrl.train` entrypoint and config key format.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-46-48.ap-northeast-1.compute.internal>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ovaSky-AI#1312) # What does this PR do? Updates `skyrl` and `skyrl-gym` package versions after the 0.1.0 release.
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
# What does this PR do? Adds isort to the `ruff` pre-commit hook.
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
…ovaSky-AI#1317) After NovaSky-AI#1187, `eval_sampling_params.max_generate_length` was no longer automatically set to `sampling_params.max_generate_length` when `eval_sampling_params` was not `None` ([code link](https://github.com/NovaSky-AI/SkyRL/blob/073b1f3b626b760885b94e6e53e4a8bce5df1a38/skyrl/train/config/config.py#L515)). This was causing example scripts to have matching training behavior but diverging eval behavior. Manually setting `eval_sampling_params.max_generate_length` in example scripts fixes this.

Examples of divergence due to the unexpectedly lower `eval_sampling_params.max_generate_length` (cc: @justinvyu):

<img width="2086" height="1332" alt="image" src="https://github.com/user-attachments/assets/0b1cfee9-1dfd-4a25-b85c-cbeb1991e515" />
<img width="696" height="664" alt="image" src="https://github.com/user-attachments/assets/07c0b300-6785-4b44-b46f-f4b840fa2a0c" />
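The workaround in example scripts looks roughly like this; the `generator.` prefix, the entrypoint invocation, and the value 4096 are assumptions for illustration, while the two key names follow the PR.

```shell
# Pin the eval generation length to the training value explicitly; after
# NovaSky-AI#1187 it is no longer inherited once eval_sampling_params is set.
uv run -m skyrl.train \
  generator.sampling_params.max_generate_length=4096 \
  generator.eval_sampling_params.max_generate_length=4096
```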
…ky-AI#1318) # What does this PR do? Follow-up to NovaSky-AI#1317: we need to also add this to the doc pages.
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
…ngines) (NovaSky-AI#1300) ## Overview Adds a `distributed_executor_backend` config option (`"mp"` | `"ray"` ) to `InferenceEngineConfig`, enabling the vLLM multiprocessing backend for single-node multi-GPU serving without relying on Ray for intra-engine worker management. ### Legacy inference stack (`ray_wrapped_inference_engine`) When `"mp"` is selected, physical GPU IDs are pre-computed per `(engine_idx, dp_rank)` from the shared placement group and passed to each actor so `CUDA_VISIBLE_DEVICES` is set correctly for the mp-spawned workers, ensuring each DP rank only sees its own `TP×PP` slice of GPUs. The actor is placed on the correct node using `PlacementGroupSchedulingStrategy` with the appropriate bundle index, identical to the `"ray"` backend path. ### New inference stack (`inference_servers`) `ServerGroup` is made engine-agnostic via a `prepare_server_kwargs` hook on `ServerActorProtocol`. This static method is called per-server before actor creation, giving the actor class access to the placement group to compute PG-dependent kwargs (e.g. resolving physical GPU IDs). `VLLMServerActor` implements this hook to pre-compute `mp_cuda_visible_devices` when the `"mp"` backend is requested, and plumbs through backend selection and GPU visibility to vLLM. Engine-specific kwargs (`distributed_executor_backend`, etc.) are passed via `**server_actor_kwargs` rather than named parameters on `ServerGroup`. > [!NOTE] > **Limitation (new inference stack):** The `mp` backend is **not yet fully supported** with the new HTTP-based inference servers in colocated mode (and is flaky in non-colocated mode) and is blocked at config validation. Additionally, the new inference path requires each engine to fit within a single node. 
These restrictions will be addressed in follow-up PRs; issue tracking in NovaSky-AI#1309.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
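The per-rank GPU slicing described for the legacy stack can be sketched with simple index arithmetic. This is illustrative, not SkyRL's actual implementation, and it assumes the ranks of each engine occupy contiguous GPU slots in the placement group.

```python
def mp_visible_devices(engine_idx: int, dp_rank: int, dp_size: int,
                       tp_size: int, pp_size: int) -> str:
    """Compute the CUDA_VISIBLE_DEVICES string for one (engine_idx, dp_rank),
    so the mp-spawned workers see only their own TP x PP slice of GPUs."""
    gpus_per_rank = tp_size * pp_size
    start = (engine_idx * dp_size + dp_rank) * gpus_per_rank
    return ",".join(str(g) for g in range(start, start + gpus_per_rank))

# Engine 0, DP rank 1, with DP=2 and TP=2: GPUs 2 and 3.
assert mp_visible_devices(0, 1, 2, 2, 1) == "2,3"
```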
Follow-up to NovaSky-AI#1271; introduces IPC for the colocated case. Since CUDA IPC is not natively supported in vllm 0.16 (it will be available in the next version), I copied the implementation and registered it as a custom engine. This will eventually be removed in favor of the built-in version.

Tests:
- [x] enable colocated tests in `test_policy_local_engines_e2e.py`
- [x] new weight sync unit test in `test_weight_sync.py`
- [x] e2e CI run

Signed-off-by: ahao-anyscale <ahao@anyscale.com>