[OpenVINO] Support Kimi 2.5 (kimi_k25) Export and Inference (Resolves issue #1647) #1648
MissLostCodes wants to merge 4 commits into huggingface:main
Conversation
Hi @rkazants and @popovaan! Why do some complex multi-modal architectures strictly require custom dummy generators, while others inherit perfectly from their parent classes? For example, contrast this with Qwen 3.5, which required the heavily customized Qwen3_5DummyPastKeyValuesGenerator to be written from scratch. Is the need for custom generators dictated by the novelty of internal memory states (like Qwen 3.5's hybrid linear/full attention cache parameters), or are there other architectural quirks that force us to write custom dummy generators instead of relying on standard inheritance?
Hi @rkazants, I tried to follow the same methodology used in your previous PR (Qwen support) while adding support for this model in OpenVINO. Going through your approach really helped me better understand the overall architecture and integration pattern for Hugging Face models. I'd really appreciate it if you could take a look at this PR and share your feedback. Please let me know if there's anything I might have missed or could improve. Thanks.
GOOD WORK!! |
What does this PR do?
This PR introduces OpenVINO export and inference support for the Kimi k2.5 Vision-Language Model (kimi_k25).
HF Model ID: moonshotai/Kimi-K2.5
Kimi k2.5 is a highly complex multi-modal architecture. Instead of introducing thousands of lines of redundant dummy input generators and custom patchers, this PR dynamically routes Kimi's sub-components to existing, well-tested OpenVINO configurations.
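Assuming this PR is merged, exporting the model would follow the standard optimum-cli flow (the output directory name here is illustrative):

```shell
# Hypothetical usage sketch: export Kimi k2.5 to OpenVINO IR via the
# standard optimum-cli interface; flags shown are the generic export flags.
optimum-cli export openvino \
    --model moonshotai/Kimi-K2.5 \
    --trust-remote-code \
    kimi_k25_openvino
```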
Architectural Design & Routing
By analyzing the KimiK25Config, the model structure maps perfectly to existing components in the optimum-intel library:
1. Vision Components (Spatio-Temporal ViT)
Kimi's vision encoder and patch-merging structure strongly align with the Qwen 3 Vision architecture.
Implementation:
KimiK25OpenVINOConfig directly inherits from Qwen3VLOpenVINOConfig.
Result:
All vision-related behaviors (VISION_EMBEDDINGS, VISION_EMBEDDINGS_MERGER, VISION_EMBEDDINGS_POS) are routed to the parent class. This reuses Qwen's DummyQwen3VLVisionEmbedInputGenerator (which already handles window_index, rotary_pos_emb, etc.) without requiring custom vision patchers.
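The inheritance pattern can be sketched as below. This is a minimal, self-contained illustration: the real Qwen3VLOpenVINOConfig in optimum-intel is far richer, and the method name and return value here are simplified stand-ins.

```python
# Simplified sketch of "inherit and route to the parent" -- not the real
# optimum-intel classes, just the shape of the pattern.
class Qwen3VLOpenVINOConfig:
    """Stand-in for the existing Qwen3-VL export config."""

    def get_dummy_input_generator(self, behavior: str) -> str:
        # The parent already knows how to build dummy vision inputs
        # (window_index, rotary_pos_emb, etc.).
        return f"DummyQwen3VLVisionEmbedInputGenerator[{behavior}]"


class KimiK25OpenVINOConfig(Qwen3VLOpenVINOConfig):
    # No vision overrides needed: every vision behavior falls through
    # to the parent class unchanged.
    pass


cfg = KimiK25OpenVINOConfig()
print(cfg.get_dummy_input_generator("vision_embeddings"))
```

Because the subclass adds no vision overrides, any fix or optimization landing in the Qwen3-VL config is picked up by Kimi for free.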
2. **Text & Language Components (Deepseek V3 MoE)**
According to the Kimi configuration ("architectures": ["DeepseekV3ForCausalLM"]), its text backbone is a Deepseek V3 Mixture of Experts (MoE) model.
Implementation:
Inside the with_behavior router, both TEXT_EMBEDDINGS and LANGUAGE behaviors are redirected specifically to the "deepseek_v3" string identifier.
Result:
The text generation pipeline reuses the existing DeepseekOpenVINOConfig and DeepseekPatcher, avoiding a manual reimplementation of the complex routing weights, expert gating, and shared expert layers.
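The routing described above can be sketched as follows. The enum and function names here are hypothetical simplifications (the real router lives inside the config's with_behavior method); only the "deepseek_v3" and vision routing targets come from the PR itself.

```python
# Hypothetical, simplified sketch of the behavior router: text behaviors
# redirect to the Deepseek V3 config, everything else stays with Qwen3-VL.
from enum import Enum


class KimiBehavior(str, Enum):  # stand-in for the real behavior enum
    VISION_EMBEDDINGS = "vision_embeddings"
    TEXT_EMBEDDINGS = "text_embeddings"
    LANGUAGE = "language"


def route_behavior(behavior: KimiBehavior) -> str:
    """Return the model-type identifier whose export config handles `behavior`."""
    if behavior in (KimiBehavior.TEXT_EMBEDDINGS, KimiBehavior.LANGUAGE):
        # Kimi's text backbone is Deepseek V3, so reuse its config/patcher.
        return "deepseek_v3"
    # Vision behaviors fall through to the Qwen3-VL parent.
    return "qwen3_vl"


print(route_behavior(KimiBehavior.LANGUAGE))       # deepseek_v3
print(route_behavior(KimiBehavior.VISION_EMBEDDINGS))  # qwen3_vl
```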
Note on Stateful Patching:
Since Deepseek V3 uses standard attention mechanisms (not a State Space Model like Mamba or LFM2), kimi_k25 was intentionally omitted from the SSM_MODELS list in stateful.py to ensure standard patch_stateful_decoder() handles the KV-cache correctly.
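The dispatch in stateful.py can be illustrated with a toy version. SSM_MODELS entries and the patch-function names below are simplified stand-ins for the real ones; the point is only that kimi_k25 is absent from the SSM list and therefore takes the standard KV-cache path.

```python
# Toy sketch of the stateful dispatch: models in SSM_MODELS get the
# state-space patching path, everything else gets the standard
# KV-cache decoder patching.
SSM_MODELS = ["mamba", "falcon_mamba", "lfm2"]  # illustrative entries only


def pick_stateful_patcher(model_type: str) -> str:
    if model_type in SSM_MODELS:
        return "patch_stateful_ssm"
    # kimi_k25 uses standard attention, so it lands here.
    return "patch_stateful_decoder"


print(pick_stateful_patcher("kimi_k25"))  # patch_stateful_decoder
```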
Resolves issue #1647