chore: sync to upstream 985961345a13f3e3bb15d29c94b011ba9a6b858b by AlpinDale · Pull Request #1666 · dphnAI/aphrodite-engine

AlpinDale · 2026-05-02T04:39:09Z

No description provided.

gemini-code-assist

Code Review

This pull request adds support for several new models, including CohereMoe, Laguna, MiMoV2Omni, and Moondream3, along with their respective reasoning and tool parsers. Key performance improvements include multi-stream GEMM overlap for DeepSeek-V4 and the introduction of ROCm AITER fusion passes. Additionally, it enables prompt_embeds in chat completions and introduces a system fingerprinting feature. Feedback highlights two significant concerns: the removal of early quant_dtype validation in all-to-all utilities, which may cause inefficient resource allocation before an error occurs, and the removal of router_logits_dtype validation in the NVFP4 MoE implementation, potentially leading to silent failures or incorrect computations when incompatible types are used.

gemini-code-assist · 2026-05-02T04:43:32Z

        max_num_tokens = get_current_aphrodite_config().scheduler_config.max_num_batched_tokens
+        if quant_config.quant_dtype is None:
+            dispatch_dtype_bytes_per_elem = 2
+            dispatch_scale_bytes_per_token = 0
+        elif quant_config.quant_dtype == "nvfp4":
+            dispatch_dtype_bytes_per_elem = 0
+            dispatch_scale_bytes_per_token = moe.hidden_dim // 16
+        elif quant_config.quant_dtype == "mxfp8":


The previous check, `if quant_config.quant_dtype != "nvfp4":`, was removed, so an unsupported `quant_dtype` passed to `FlashInferNVLinkOneSidedPrepareAndFinalize` is no longer rejected up front. A `NotImplementedError` is raised later, but validating early would avoid unnecessary setup.
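One way to read this suggestion is a fail-fast guard that runs before any buffer sizing. The sketch below is only an illustration: `check_quant_dtype` and `SUPPORTED_QUANT_DTYPES` are hypothetical names, not part of the PR, and the supported set is an assumption taken from the three branches visible in the hunk above (`None`, `"nvfp4"`, `"mxfp8"`).

```python
# Hypothetical helper, not from the PR: names and the supported set are
# assumptions based on the branches visible in the diff hunk.
SUPPORTED_QUANT_DTYPES = (None, "nvfp4", "mxfp8")


def check_quant_dtype(quant_dtype):
    """Reject unsupported quant dtypes before any setup work is done."""
    if quant_dtype not in SUPPORTED_QUANT_DTYPES:
        raise NotImplementedError(
            f"unsupported quant_dtype for one-sided dispatch: {quant_dtype!r}"
        )
    return quant_dtype
```

Calling such a guard at the top of the constructor would surface the error before `max_num_tokens` is read or any workspace is sized, which is the behavior the comment asks for.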

gemini-code-assist · 2026-05-02T04:43:32Z

    def _supports_router_logits_dtype(
        router_logits_dtype: torch.dtype | None,
        routing_method: RoutingMethodType,
    ) -> bool:
-        """
-        The FlashInfer TRTLLM NvFp4 kernel expects bfloat16 router_logits by default.
-        DeepSeekV3 routing supports float32 router_logits (converted internally).
-        Simulated routing generates synthetic decisions and is agnostic to dtype.
-        """
-        if router_logits_dtype == torch.float32:
-            # DeepSeekV3 routing handles float32 logits internally.
-            # Simulated routing generates synthetic decisions, so the
-            # kernel doesn't care about the actual logits dtype.
-            # https://github.com/flashinfer-ai/flashinfer/issues/2469
-            return routing_method in (
-                RoutingMethodType.DeepSeekV3,
-                RoutingMethodType.Simulated,
-            )
        return True


`_supports_router_logits_dtype` now returns `True` unconditionally, removing the check that restricted `torch.float32` router logits to the DeepSeekV3 and Simulated routing methods. This could lead to silent failures or incorrect computations if an incompatible dtype is used.
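For reference, the removed gate can be restated as a small standalone function. This is a sketch under stated assumptions, not the project's code: `RoutingMethodType` here is a stand-in enum (the `Renormalize` member is only an illustrative "other" method), and dtypes are plain strings standing in for `torch.dtype` values so the example runs without PyTorch.

```python
from enum import Enum, auto


class RoutingMethodType(Enum):
    # Stand-in for the real enum; members listed here are assumptions
    # except DeepSeekV3 and Simulated, which appear in the removed lines.
    DeepSeekV3 = auto()
    Simulated = auto()
    Renormalize = auto()


FLOAT32 = "float32"    # stand-in for torch.float32
BFLOAT16 = "bfloat16"  # stand-in for torch.bfloat16


def supports_router_logits_dtype(router_logits_dtype, routing_method):
    """Mirror the removed check: float32 logits only for two routing methods."""
    if router_logits_dtype == FLOAT32:
        return routing_method in (
            RoutingMethodType.DeepSeekV3,
            RoutingMethodType.Simulated,
        )
    # Every other dtype (including None) was accepted by the removed code.
    return True
```

With this gate in place, a float32 request paired with any other routing method is reported as unsupported instead of reaching the kernel.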

chore: sync to upstream 985961345a13f3e3bb15d29c94b011ba9a6b858b

9bbcb88

gemini-code-assist Bot reviewed May 2, 2026

AlpinDale merged commit 18f852d into main May 2, 2026
1 check failed

AlpinDale deleted the sync/vllm-0c99629 branch May 2, 2026 04:46

AlpinDale merged 1 commit into main from sync/vllm-0c99629
