metal : promote mul_mv/mul_mm batch divisors to function constants#22711
Open
guyfischman wants to merge 1 commit into ggml-org:master from
Conversation
Hi @guyfischman, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
Author
For reference, I made (using AI) this PoC to see when a literal vs an FC wins, and by how much, over kernel args/uniform buffers: https://github.com/SovereignSoft/agx-idiv-demo
Overview
The batch-dimension fold i12 = im % args.ne12; i13 = im / args.ne12; and the GQA divisors i12 / args.r2, i13 / args.r3 execute on every dispatch of every mat-vec / mat-mat kernel. On Apple AGX, a runtime integer divmod is a ~700-cycle software path, while a compile-time-known divisor folds to a no-op (ne12 = 1), a shift (power-of-two divisor), or a magic multiply (any other constant).
This PR promotes ne12, r2, r3 (and ne13 for mul_mm) to function constants so they are baked into the PSO at compile time:
FC_mul_mm_ne12/ne13/r2/r3 - bound by mul_mm pipeline from op shape
FC_mul_mv_ne12/r2/r3 - bound by mul_mv and mul_mv_ext pipelines from op shape; mul_mv_id binds to 1 since kernel_mul_mv_id wraps the impls with args0 = { ne12=1, r2=1, r3=1 }
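In MSL terms the promotion looks roughly like the sketch below. The FC identifier names are taken from this description; the function-constant indices and the surrounding kernel body are assumptions for illustration, not quoted from the patch.

```metal
// Sketch only -- simplified, not the actual patch.
// Each divisor is a function constant, so its value is baked into the
// PSO at pipeline-compile time; the divides below then fold to a no-op,
// a shift, or a magic multiply instead of a runtime divmod.
constant int16_t FC_mul_mv_ne12 [[function_constant(0)]]; // index assumed
constant int16_t FC_mul_mv_r2   [[function_constant(1)]]; // index assumed
constant int16_t FC_mul_mv_r3   [[function_constant(2)]]; // index assumed

// Inside a mul_mv kernel, the batch fold becomes:
//   const int i12 = im % FC_mul_mv_ne12;
//   const int i13 = im / FC_mul_mv_ne12;
// and the GQA head mapping uses i12 / FC_mul_mv_r2 and i13 / FC_mul_mv_r3.
```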
The pipeline-cache key now includes ne12/r2/r3 so a Gemma-2 GQA PSO is not reused for a TinyLlama (ne12=1) op or vice versa.
This is the metal analog of #22650.
Additional information
Replaced args.ne12/r2/r3 with the FCs at 23 mul_mv impl sites, in both mul_mv_ext impls, and in both kernel_mul_mm variants (tensor and non-tensor branches).
Other kernels that use args.ne12 (kernel_bin_fuse, kernel_soft_max, kernel_set_rows) are built by other pipelines that do not bind these FCs, so they are unchanged.
The four FC values are bound as int16_t to match the existing convention (nsg, nxpsg). Pipeline-cache call sites assert ne12/ne13/r2/r3 ≤ INT16_MAX before the cast, to guard against silent wraparound on edge cases like very large prompt batches; the guard is a no-op for typical inference shapes.
Correctness
./build/bin/test-backend-ops -b Metal
3/3 backends passed
Also verified bit-identical output with llama-perplexity on all models listed below.
Performance
Apple M4 Pro, llama-bench -p 512 -n 128 -r 20, tg128 in t/s, master → patched:
TinyLlama Q4_0: 239.05 ± 1.73 → 246.57 ± 0.93 (+3.15%)
Llama 3.2 1B Q4_0: 227.58 ± 1.84 → 230.96 ± 2.65 (+1.49%, within noise)
Gemma 3 1B Q4_K_M (GQA): 164.67 ± 4.59 → 173.52 ± 0.75 (+5.37%)
Gemma 2 2B Q4_K_M (GQA): 103.41 ± 0.39 → 106.66 ± 0.24 (+3.14%)
Mistral 7B Q4_0 (GQA): 52.10 ± 0.08 → 52.73 ± 0.07 (+1.21%)
pp512 is unchanged across all five models (within ±1% noise): mul_mv is not on the prompt path, and the mul_mm shapes that hit the new FCs are too few to surface above bench noise.
Requirements