
[OpenVINO] Support Kimi 2.5 (kimi_k25) Export and Inference, Resolves issue #1647 (#1648)

Open
MissLostCodes wants to merge 4 commits into huggingface:main from MissLostCodes:add-kimik2.5-support

Conversation

@MissLostCodes
Contributor

What does this PR do?

This PR introduces OpenVINO export and inference support for the Kimi k2.5 Vision-Language Model (kimi_k25).
HF Model ID: moonshotai/Kimi-K2.5

Kimi k2.5 is a highly complex multi-modal architecture. Instead of introducing thousands of lines of redundant dummy input generators and custom patchers, this PR dynamically routes Kimi's sub-components to existing, heavily tested OpenVINO configurations.

Architectural Design & Routing

By analyzing the KimiK25Config, the model structure maps perfectly to existing components in the optimum-intel library:

1. Vision Components (Spatio-Temporal ViT)

Kimi's vision encoder and patch-merging structure strongly align with the Qwen 3 Vision architecture.

Implementation:

KimiK25OpenVINOConfig directly inherits from Qwen3VLOpenVINOConfig.

Result:

All vision-related behaviors (VISION_EMBEDDINGS, VISION_EMBEDDINGS_MERGER, VISION_EMBEDDINGS_POS) are gracefully routed back to the parent class.

This perfectly reuses Qwen's DummyQwen3VLVisionEmbedInputGenerator (for handling window_index, rotary_pos_emb, etc.) without requiring custom vision patchers.
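The inheritance-based reuse described above can be sketched as follows. The class names mirror the PR description, but the bodies are simplified, hypothetical stand-ins for the actual optimum-intel implementation, which is not shown here.

```python
# Hypothetical sketch: vision behaviors resolve to the parent class purely
# through inheritance. Method names and return values are illustrative
# stand-ins, not the real optimum-intel API.

class Qwen3VLOpenVINOConfig:
    """Stand-in for the existing, tested Qwen 3 VL export config."""

    def vision_embeddings_inputs(self):
        # In the real config, DummyQwen3VLVisionEmbedInputGenerator would
        # produce tensors such as window_index and rotary_pos_emb here.
        return ["hidden_states", "window_index", "rotary_pos_emb"]


class KimiK25OpenVINOConfig(Qwen3VLOpenVINOConfig):
    # No vision-specific overrides: VISION_EMBEDDINGS, VISION_EMBEDDINGS_MERGER
    # and VISION_EMBEDDINGS_POS all fall back to the parent's behavior.
    pass


print(KimiK25OpenVINOConfig().vision_embeddings_inputs())
```

Because the child class adds nothing vision-specific, any future fix to the Qwen 3 VL vision path is picked up by Kimi for free.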

2. **Text & Language Components (Deepseek V3 MoE)**

According to the Kimi configuration ("architectures": ["DeepseekV3ForCausalLM"]), its text backbone is a Deepseek V3 Mixture of Experts (MoE) model.

Implementation:

Inside the with_behavior router, both TEXT_EMBEDDINGS and LANGUAGE behaviors are redirected specifically to the "deepseek_v3" string identifier.

Result:

The text generation pipeline perfectly reuses the highly optimized DeepseekOpenVINOConfig and DeepseekPatcher.
We avoid manually rewriting the complex routing weights, expert gating, and shared expert layers.
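The `with_behavior` routing described above can be sketched roughly like this. The behavior names and the `"deepseek_v3"` identifier come from the PR text; the enum and function body are simplified assumptions, not the real optimum-intel router.

```python
# Illustrative sketch of the with_behavior routing: text behaviors are
# redirected to the "deepseek_v3" identifier, everything else stays on the
# Kimi (Qwen-3-VL-derived) config. Simplified stand-in code.
from enum import Enum


class Behavior(Enum):
    VISION_EMBEDDINGS = "vision_embeddings"
    TEXT_EMBEDDINGS = "text_embeddings"
    LANGUAGE = "language"


def with_behavior(behavior: Behavior) -> str:
    """Return the model-type string whose export config handles `behavior`."""
    if behavior in (Behavior.TEXT_EMBEDDINGS, Behavior.LANGUAGE):
        # Kimi's text backbone is DeepseekV3ForCausalLM, so reuse the
        # existing DeepseekOpenVINOConfig / DeepseekPatcher machinery.
        return "deepseek_v3"
    # Vision behaviors fall through to the inherited Qwen 3 VL handling.
    return "kimi_k25"


print(with_behavior(Behavior.LANGUAGE))           # deepseek_v3
print(with_behavior(Behavior.VISION_EMBEDDINGS))  # kimi_k25
```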

Note on Stateful Patching:

Since Deepseek V3 uses standard attention mechanisms (not a State Space Model like Mamba or LFM2), kimi_k25 was intentionally omitted from the SSM_MODELS list in stateful.py to ensure standard patch_stateful_decoder() handles the KV-cache correctly.
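The dispatch implied by that note can be sketched as below. `SSM_MODELS` and the `patch_stateful_decoder` name come from the PR description; the selector function and the SSM entries listed are illustrative assumptions about how stateful.py branches.

```python
# Hedged sketch of the stateful.py branching: models in SSM_MODELS get
# state-space-specific patching, everything else (including kimi_k25) gets
# the standard KV-cache path. Entries and function names are illustrative.

SSM_MODELS = ["mamba", "lfm2"]  # state-space families needing special handling


def select_stateful_patcher(model_type: str) -> str:
    if model_type in SSM_MODELS:
        return "patch_stateful_ssm"
    # kimi_k25 is deliberately NOT in SSM_MODELS: Deepseek V3 uses standard
    # attention, so generic KV-cache patching applies.
    return "patch_stateful_decoder"


print(select_stateful_patcher("kimi_k25"))  # patch_stateful_decoder
```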

Resolves issue #1647

@MissLostCodes
Contributor Author

Hi @rkazants and @popovaan!
While working on this implementation, I ran into a conceptual question about our exporter design that I was hoping you could clarify.

Why do some complex multi-modal architectures strictly require custom dummy generators, while others inherit perfectly from their parent classes?

For example:
Hunyuan V1 Dense flawlessly inherits from LlamaOpenVINOConfig and cleanly reuses the existing GemmaDummyPastKeyValuesGenerator.
Kimi 2.5 (in my PR) seamlessly inherits from Qwen3VLOpenVINOConfig and routes to existing DeepSeek components without needing any new custom dummy generators.

Contrast this with Qwen 3.5, which strictly required the heavily customized Qwen3_5DummyPastKeyValuesGenerator to be written from scratch.

Is this need for custom generators strictly dictated by the novelty of internal memory states (like Qwen 3.5's hybrid linear/full attention cache parameters), or are there other specific architectural quirks that force us to write custom dummy generators instead of relying on standard inheritance?
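To make the question concrete, here is a simplified, hypothetical contrast: a model with vanilla attention can inherit a standard past-KV dummy generator unchanged, while a hybrid-cache model has state the parent generator never produces. All class names and shapes below are made-up stand-ins, not optimum-intel code.

```python
# Hypothetical illustration: inheritance suffices when the cache layout
# matches the parent's; novel internal state forces a custom generator.

class DummyPastKeyValuesGenerator:
    """Stand-in base generator producing standard attention KV shapes."""

    def __init__(self, batch=1, heads=8, seq=16, head_dim=64):
        self.shape = (batch, heads, seq, head_dim)

    def generate(self):
        # Standard (batch, heads, seq, head_dim) KV tensors: any model with
        # vanilla attention can inherit this unchanged.
        return {"key": self.shape, "value": self.shape}


class HybridCacheDummyGenerator(DummyPastKeyValuesGenerator):
    """Stand-in for a model whose cache the parent cannot describe."""

    def generate(self):
        # A hybrid linear/full-attention model carries extra recurrent state,
        # so the inherited outputs are no longer sufficient and a bespoke
        # generator must be written. The extra shape here is illustrative.
        out = super().generate()
        out["conv_state"] = (self.shape[0], 128, 4)
        return out


print(sorted(HybridCacheDummyGenerator().generate()))
```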

@MissLostCodes
Contributor Author

Hi @rkazants,

I tried to follow the same methodology used in your previous Qwen support PR while adding support for this model in OpenVINO. Going through your approach really helped me better understand the overall architecture and integration pattern for Hugging Face models.

I’d really appreciate it if you could take a look at this PR and share your feedback. Please let me know if there’s anything I might have missed or could improve.

Thanks.

@feilonguu

GOOD WORK!!

