
ggml-zendnn : adaptive fallback to CPU backend for small batch sizes#22681

Open
z-sachin wants to merge 1 commit into ggml-org:master from z-sachin:ggml-zendnn/adaptive-fallback-env

Conversation


@z-sachin z-sachin commented May 4, 2026

Overview

Introduces an adaptive fallback mechanism in the ZenDNN backend that ensures ZenDNN never regresses against the native CPU backend, and also updates to the latest ZenDNN version (ZenDNN-2026-WW17).

Problem
ZenDNN's lowoha::matmul is slower than ggml-cpu for:

  • Decode phase (N=1, single token generation)
  • Small prompts
  • Small inner dimensions (K ≤ 256)

Adds an adaptive fallback in supports_op to avoid ZenDNN overhead on small ops: the backend falls back to ggml-cpu when K ≤ 256, N ≤ 128, M ≤ 96, or, for MoE, when n_experts > 32. The decode phase (N=1) always falls back. The behavior is controlled via the GGML_ZENDNN_ADAPTIVE_FALLBACK environment variable: enabled by default, set it to 0 to disable.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes, I have used AI to refactor the PR content.


ggml-gh-bot Bot commented May 4, 2026

Hi @z-sachin, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

@github-actions github-actions Bot added the ggml (changes relating to the ggml tensor library for machine learning) and AMD ZenDNN (issues related to the AMD ZenDNN backend) labels on May 4, 2026
Member

taronaeo commented May 4, 2026

cc: @z-vishal

Contributor

z-vishal commented May 5, 2026

Thanks @z-sachin for the fallback logic. For now this makes sense, since ZenDNN currently underperforms the CPU backend on small batch sizes (1 to 96) with 96 threads. The ZenDNN team is working on this perf issue; we will remove this fallback logic once ZenDNN catches up on small batch sizes.
cc: @amukho @avinashcpandey


Labels

AMD ZenDNN Issues related to the AMD ZenDNN backend ggml changes relating to the ggml tensor library for machine learning
