ggml-zendnn : adaptive fallback to CPU backend for small batch sizes by z-sachin · Pull Request #22681 · ggml-org/llama.cpp

z-sachin · 2026-05-04T14:11:14Z

Overview

Introduces an adaptive fallback mechanism in the ZenDNN backend that ensures ZenDNN never regresses against the native CPU backend, and also updates ZenDNN to the latest version (ZenDNN-2026-WW17).

Problem
ZenDNN's lowoha::matmul is slower than ggml-cpu for:

Decode phase (N=1, single token generation)
Small prompts
Small inner dimensions (K ≤ 256)

Adds an adaptive fallback in supports_op to avoid ZenDNN overhead on small ops: the op falls back to ggml-cpu when K ≤ 256, N ≤ 128, M ≤ 96, or, for MoE, n_experts > 32. Because N ≤ 128 covers N=1, the decode phase always falls back. The behavior is controlled via the GGML_ZENDNN_ADAPTIVE_FALLBACK env var; it is enabled by default, and setting it to 0 disables it.
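The rule above can be illustrated with a minimal C++ sketch. It is not the PR's actual code: the function names (`parse_adaptive_fallback`, `adaptive_fallback_enabled`, `is_small_for_zendnn`, `should_fallback_to_cpu`) are hypothetical, and the mapping of M, N, K and n_experts onto ggml tensor fields is left to the caller. It assumes that only the exact value "0" disables the fallback, and that n_experts is 0 for dense matmuls.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: interprets GGML_ZENDNN_ADAPTIVE_FALLBACK.
// Assumption: an unset variable or any value other than "0" keeps the
// fallback enabled (the PR states it is enabled by default).
static bool parse_adaptive_fallback(const char * value) {
    if (value == nullptr) {
        return true; // unset: enabled by default
    }
    return std::strcmp(value, "0") != 0;
}

// Hypothetical helper: reads the env var once per process and caches it.
static bool adaptive_fallback_enabled() {
    static const bool enabled =
        parse_adaptive_fallback(std::getenv("GGML_ZENDNN_ADAPTIVE_FALLBACK"));
    return enabled;
}

// Hypothetical shape check using the thresholds quoted in the PR.
// n_experts is the expert count for MoE matmuls and 0 for dense ones.
static bool is_small_for_zendnn(int64_t M, int64_t N, int64_t K, int64_t n_experts) {
    if (K <= 256 || N <= 128 || M <= 96) {
        return true; // covers decode (N == 1), since 1 <= 128
    }
    if (n_experts > 32) {
        return true; // MoE with many experts
    }
    return false;
}

// Hypothetical combined predicate: true means "leave this op to ggml-cpu".
static bool should_fallback_to_cpu(int64_t M, int64_t N, int64_t K, int64_t n_experts) {
    return adaptive_fallback_enabled() && is_small_for_zendnn(M, N, K, n_experts);
}
```

In a supports_op implementation, such a predicate would presumably be used as `return !should_fallback_to_cpu(...)`: when the ZenDNN backend reports an op as unsupported, the ggml scheduler places that op on another backend, here the CPU. Caching the env var in a function-local static means it is read once, so it has to be set before the process starts (e.g. `GGML_ZENDNN_ADAPTIVE_FALLBACK=0`).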

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: Yes, I have used AI to refactor the PR content.


ggml-gh-bot · 2026-05-04T14:16:09Z

Hi @z-sachin, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

taronaeo · 2026-05-04T16:36:23Z

cc: @z-vishal

z-vishal · 2026-05-05T05:29:44Z

Thanks @z-sachin for the fallback logic. For now this makes sense, since ZenDNN currently underperforms the CPU backend on small batch sizes (1 to 96) with 96 threads. The ZenDNN team is working on this performance issue, and we will remove this fallback logic once ZenDNN catches up on small batch sizes.
cc: @amukho @avinashcpandey

ggml-zendnn : add runtime env var GGML_ZENDNN_ADAPTIVE_FALLBACK to control adaptive fallback (default: enabled)

a38a4ca

github-actions added the labels ggml (changes relating to the ggml tensor library for machine learning) and AMD ZenDNN (issues related to the AMD ZenDNN backend) on May 4, 2026

z-sachin wants to merge 1 commit into ggml-org:master from z-sachin:ggml-zendnn/adaptive-fallback-env