
[OpenVINO] Support Qwen3.5 and Qwen3.5-MoE #1634

Closed
rkazants wants to merge 13 commits into huggingface:transformers-v5 from rkazants:support_qwen3_5


rkazants (Collaborator) commented Mar 8, 2026

What does this PR do?

Fixes 181271, 181280, 182003

Installation instructions:

pip install git+https://github.com/rkazants/optimum-intel.git@support_qwen3_5
pip install --pre -U openvino openvino-tokenizers nncf --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
pip install transformers==5.2.0
pip install requests torchvision opencv-python
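
Because the commands above pin transformers to 5.2.0 and pull nightly OpenVINO wheels, it can help to confirm what actually got installed before exporting. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is illustrative and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names taken from the pip commands above.
    names = ["optimum-intel", "openvino", "openvino-tokenizers", "nncf", "transformers"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```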

Export command line:

optimum-cli export openvino -m Qwen/Qwen3.5-0.8B Qwen3.5-0.8B

Inference script:

from transformers import AutoProcessor
from transformers.video_utils import load_video
from huggingface_hub import hf_hub_download
from optimum.intel.openvino import OVModelForVisualCausalLM

model_dir = "Qwen/Qwen3.5-0.8B"

processor = AutoProcessor.from_pretrained(model_dir)
model = OVModelForVisualCausalLM.from_pretrained(model_dir)

# Prepare video input
video_path = hf_hub_download(
    repo_id="raushan-testing-hf/videos-test",
    filename="sample_demo_1.mp4",
    repo_type="dataset",
)
input_video, _ = load_video(video_path, num_frames=10, backend="opencv")

messages = [
    {"role": "user", "content": [
        {"type": "video"},
        {"type": "text", "text": "Why is this video funny?"},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], videos=[input_video], return_tensors="pt")

# Run inference
output_ids = model.generate(**inputs, max_new_tokens=100)
output_text = processor.decode(output_ids[0], skip_special_tokens=True)

print(output_text)

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Copilot AI and others added 4 commits March 8, 2026 22:37
Add conversion rule for the RecurrentAttentionCellOp operation used
for GatedDeltaNet patching in OpenVINO PyTorch frontend.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: rkazants <35459624+rkazants@users.noreply.github.com>
savvadesogle commented

Thank you!! 🙏♥️😊

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

ikirsh commented Mar 13, 2026

Can we ensure this PR includes a hardware compatibility check for the Core Ultra 200 series (245 through 285) and other Xe platforms?

Previous OpenVINO MoE optimizations have caused kernel-level failures on these platforms without any documented warnings. We need to verify that this PR either provides full support or, at a minimum, documents and implements a graceful exit with an error message rather than a system crash.

See this issue:

  • gpt-oss-20b-int4-ov runs on CPU but triggers OOM on iGPU #34416

and related issues:

  • qwen3-30b-a3b on ovms, works on CPU, crashs with out of memory on iGPU #34187
  • Qwen3-Coder-30B-A3B-Instruct-int4-ov runs on CPU but triggers OOM on iGPU #34415
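
One possible shape for the "graceful exit" requested above is to try the preferred device first and fall back with a readable message. This is only a sketch under stated assumptions: `load_with_fallback` is a hypothetical helper, not part of optimum-intel or OpenVINO, and it assumes the failure surfaces as a Python exception while the model is being loaded or compiled. Whether the failures in the linked issues surface that way is not established in this thread.

```python
def load_with_fallback(load, devices=("GPU", "CPU")):
    """Try `load(device)` for each device in order and return (device, model).

    Only Python-level exceptions can be caught here. A driver or kernel
    crash that takes down the whole process is outside what this can handle.
    """
    errors = {}
    for device in devices:
        try:
            return device, load(device)
        except Exception as exc:  # broad on purpose: failure types vary by plugin
            errors[device] = exc
            print(f"Loading on {device} failed: {exc!r}")
    details = "; ".join(f"{d}: {e!r}" for d, e in errors.items())
    raise RuntimeError(f"Model could not be loaded on any of {list(devices)} ({details})")


# Hypothetical usage with the model class from the PR description
# (assumes the model was exported to the local directory "Qwen3.5-0.8B"):
#
# from optimum.intel.openvino import OVModelForVisualCausalLM
# device, model = load_with_fallback(
#     lambda d: OVModelForVisualCausalLM.from_pretrained("Qwen3.5-0.8B", device=d)
# )
```

Passing the loader as a callable keeps the retry logic independent of any particular model class, so the same wrapper can be exercised with a stub in tests.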

rkazants changed the title from "[OpenVINO] Support Qwen3.5" to "[OpenVINO] Support Qwen3.5 and Qwen3.5-MoE" on Mar 22, 2026
rkazants linked an issue on Mar 23, 2026 that may be closed by this pull request

lzhu41 commented Mar 27, 2026

Hi all, when will this PR be merged? The Qwen3.5 dense and MoE models are currently the most important models in the PRC, so this has a big business impact for our BU. Thank you!


malasy commented Apr 5, 2026

ValueError: Asked to export a qwen3_5 model for the task text-generation-with-past, but the Optimum OpenVINO exporter only supports the tasks image-text-to-text for qwen3_5. Please use a supported task. Please open an issue at https://github.com/huggingface/optimum-intel/issues if you would like the task text-generation-with-past to be supported in the OpenVINO export for qwen3_5.
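
The message above says the exporter on this branch accepts only the `image-text-to-text` task for `qwen3_5`. The command that produced the error is not shown in the thread, so the following is a sketch rather than a confirmed fix: it builds the export command from the PR description with an explicit `--task` argument. The helper name `export_command` is illustrative, and whether this avoids the error on this branch is untested here.

```python
import shlex


def export_command(model_id, output_dir, task="image-text-to-text"):
    """Build an `optimum-cli export openvino` argument list with an explicit task."""
    return ["optimum-cli", "export", "openvino", "-m", model_id, "--task", task, output_dir]


if __name__ == "__main__":
    cmd = export_command("Qwen/Qwen3.5-0.8B", "Qwen3.5-0.8B")
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
    # To run it: import subprocess; subprocess.run(cmd, check=True)
```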

echarlaix deleted the branch huggingface:transformers-v5 on April 15, 2026 07:21
echarlaix closed this on Apr 15, 2026
WizardlyBump17 commented

Hello. I see that this pull request and the Gemma 4 one were closed. Is there a reason? Will we be seeing those models on OpenVINO any time soon?

sund00bie commented

> Hello. I see that this pull request and the Gemma 4 one were closed. Is there a reason? Will we be seeing those models on OpenVINO any time soon?

These are the PRs you'll need to follow now:

openvinotoolkit/openvino.genai#3644
openvinotoolkit/openvino.genai#3717

Development

Successfully merging this pull request may close these issues.

Qwen3.5 Family Support ❤️

9 participants