[OpenVINO] Support Kimi 2.5 (kimi_k25) Export and Inference (Resolves issue #1647) #1648
MissLostCodes wants to merge 4 commits into huggingface:main
Conversation
Hi @rkazants and @popovaan! Why do some complex multi-modal architectures strictly require custom dummy generators, while others inherit perfectly from their parent classes? For example, contrast this with Qwen 3.5, which required the heavily customized Qwen3_5DummyPastKeyValuesGenerator to be written from scratch. Is the need for custom generators dictated by the novelty of internal memory states (like Qwen 3.5's hybrid linear/full attention cache parameters), or are there other architectural quirks that force us to write custom dummy generators instead of relying on standard inheritance?
Hi @rkazants, I tried to follow the same methodology used in your previous PR (Qwen support) while adding support for this model in OpenVINO. Going through your approach really helped me better understand the overall architecture and integration pattern for Hugging Face models. I'd really appreciate it if you could take a look at this PR and share your feedback. Please let me know if there's anything I might have missed or could improve. Thanks.
GOOD WORK!! |
What does this PR do?
This PR introduces OpenVINO export and inference support for the Kimi k2.5 Vision-Language Model (kimi_k25).
HF Model ID: moonshotai/Kimi-K2.5
Kimi k2.5 is a highly complex multi-modal architecture. Instead of introducing thousands of lines of redundant dummy input generators and custom patchers, this PR dynamically routes Kimi's sub-components to existing, well-tested OpenVINO configurations.
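Assuming this PR is merged, exporting the model would follow the standard optimum-cli flow (the output directory name here is illustrative):

```shell
# Hypothetical usage sketch: export Kimi k2.5 to OpenVINO IR via the
# standard optimum-cli interface; flags shown are the generic export flags.
optimum-cli export openvino \
    --model moonshotai/Kimi-K2.5 \
    --trust-remote-code \
    kimi_k25_openvino
```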
Architectural Design & Routing
By analyzing the KimiK25Config, the model structure maps perfectly to existing components in the optimum-intel library:
1. Vision Components (Spatio-Temporal ViT)
Kimi's vision encoder and patch-merging structure strongly align with the Qwen 3 Vision architecture.
Implementation:
KimiK25OpenVINOConfig directly inherits from Qwen3VLOpenVINOConfig.
Result:
All vision-related behaviors (VISION_EMBEDDINGS, VISION_EMBEDDINGS_MERGER, VISION_EMBEDDINGS_POS) are routed to the parent class. This reuses Qwen's DummyQwen3VLVisionEmbedInputGenerator (which already handles window_index, rotary_pos_emb, etc.) without requiring custom vision patchers.
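The inheritance pattern can be sketched as below. This is a minimal, self-contained illustration: the real Qwen3VLOpenVINOConfig in optimum-intel is far richer, and the method name and return value here are simplified stand-ins.

```python
# Simplified sketch of "inherit and route to the parent" -- not the real
# optimum-intel classes, just the shape of the pattern.
class Qwen3VLOpenVINOConfig:
    """Stand-in for the existing Qwen3-VL export config."""

    def get_dummy_input_generator(self, behavior: str) -> str:
        # The parent already knows how to build dummy vision inputs
        # (window_index, rotary_pos_emb, etc.).
        return f"DummyQwen3VLVisionEmbedInputGenerator[{behavior}]"


class KimiK25OpenVINOConfig(Qwen3VLOpenVINOConfig):
    # No vision overrides needed: every vision behavior falls through
    # to the parent class unchanged.
    pass


cfg = KimiK25OpenVINOConfig()
print(cfg.get_dummy_input_generator("vision_embeddings"))
```

Because the subclass adds no vision overrides, any fix or optimization landing in the Qwen3-VL config is picked up by Kimi for free.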
2. **Text & Language Components (Deepseek V3 MoE)**
According to the Kimi configuration ("architectures": ["DeepseekV3ForCausalLM"]), its text backbone is a Deepseek V3 Mixture of Experts (MoE) model.
Implementation:
Inside the with_behavior router, both TEXT_EMBEDDINGS and LANGUAGE behaviors are redirected specifically to the "deepseek_v3" string identifier.
Result:
The text generation pipeline reuses the existing DeepseekOpenVINOConfig and DeepseekPatcher, avoiding a manual reimplementation of the complex routing weights, expert gating, and shared expert layers.
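The routing described above can be sketched as follows. The enum and function names here are hypothetical simplifications (the real router lives inside the config's with_behavior method); only the "deepseek_v3" and vision routing targets come from the PR itself.

```python
# Hypothetical, simplified sketch of the behavior router: text behaviors
# redirect to the Deepseek V3 config, everything else stays with Qwen3-VL.
from enum import Enum


class KimiBehavior(str, Enum):  # stand-in for the real behavior enum
    VISION_EMBEDDINGS = "vision_embeddings"
    TEXT_EMBEDDINGS = "text_embeddings"
    LANGUAGE = "language"


def route_behavior(behavior: KimiBehavior) -> str:
    """Return the model-type identifier whose export config handles `behavior`."""
    if behavior in (KimiBehavior.TEXT_EMBEDDINGS, KimiBehavior.LANGUAGE):
        # Kimi's text backbone is Deepseek V3, so reuse its config/patcher.
        return "deepseek_v3"
    # Vision behaviors fall through to the Qwen3-VL parent.
    return "qwen3_vl"


print(route_behavior(KimiBehavior.LANGUAGE))       # deepseek_v3
print(route_behavior(KimiBehavior.VISION_EMBEDDINGS))  # qwen3_vl
```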
Note on Stateful Patching:
Since Deepseek V3 uses standard attention mechanisms (not a State Space Model like Mamba or LFM2), kimi_k25 was intentionally omitted from the SSM_MODELS list in stateful.py to ensure standard patch_stateful_decoder() handles the KV-cache correctly.
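The dispatch in stateful.py can be illustrated with a toy version. SSM_MODELS entries and the patch-function names below are simplified stand-ins for the real ones; the point is only that kimi_k25 is absent from the SSM list and therefore takes the standard KV-cache path.

```python
# Toy sketch of the stateful dispatch: models in SSM_MODELS get the
# state-space patching path, everything else gets the standard
# KV-cache decoder patching.
SSM_MODELS = ["mamba", "falcon_mamba", "lfm2"]  # illustrative entries only


def pick_stateful_patcher(model_type: str) -> str:
    if model_type in SSM_MODELS:
        return "patch_stateful_ssm"
    # kimi_k25 uses standard attention, so it lands here.
    return "patch_stateful_decoder"


print(pick_stateful_patcher("kimi_k25"))  # patch_stateful_decoder
```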
Resolves issue #1647