feat: add analytic performance models (GEMM latency, ASM profiler, roofline) #3

Open
booth-algo wants to merge 2 commits into main from kev/aten-analytic

booth-algo (Collaborator) commented Mar 31, 2026

Summary

  • GEMM latency comparison model with RTL validation tables
  • LLaMA/LLaDA performance model (prefill + decode cycle estimation)
  • ASM profiler for instruction-level cycle counting
  • Decoder roofline model (compute vs memory bound analysis)
  • Minor fixes to utilisation_model.py TOML parsing
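For readers unfamiliar with the roofline idea, here is a minimal sketch of the standard formulation (attainable throughput is the lesser of peak compute and bandwidth times arithmetic intensity). The function names and hardware numbers are hypothetical and are not taken from the files in this PR.

```python
def attainable_ops_per_cycle(peak_ops_per_cycle, bytes_per_cycle, ops, bytes_moved):
    """Standard roofline: min(peak compute, bandwidth * arithmetic intensity)."""
    intensity = ops / bytes_moved  # ops per byte moved
    return min(peak_ops_per_cycle, bytes_per_cycle * intensity)


def classify(peak_ops_per_cycle, bytes_per_cycle, ops, bytes_moved):
    """Label a kernel by which roof limits it."""
    intensity = ops / bytes_moved
    ridge = peak_ops_per_cycle / bytes_per_cycle  # intensity where the roofs meet
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical decode-style GEMV: a d x d weight matrix at 1 byte per weight,
# 2 ops (multiply + add) per weight, so the intensity is 2 ops/byte.
d = 4096
ops = 2 * d * d
bytes_moved = d * d
print(classify(1024, 64, ops, bytes_moved))
print(attainable_ops_per_cycle(1024, 64, ops, bytes_moved))
```

With these assumed numbers the ridge point is 16 ops/byte, so a kernel with intensity 2 sits on the bandwidth roof.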

Files (9)

All under analytic_models/ — fully independent, no dependencies on other PRs.

Test plan

  • No runtime dependencies on testbench or ops code
  • Manual: `just perf llama-3.1-8b`, `just asm-profile`

⚠️ The nix-build CI job fails due to a pre-existing DNS issue (libtorch download in the nix sandbox). This is unrelated to this PR.

…ofline)

- GEMM latency comparison model with RTL validation tables
- LLaMA/LLaDA performance model (prefill + decode cycle estimation)
- ASM profiler for instruction-level cycle counting
- Decoder roofline model (compute vs memory bound analysis)
- Minor fixes to utilisation_model.py TOML parsing

When seq_len < MLEN (e.g. dLLM block_length=32 vs MLEN=2048), the batched
GEMM cost M_BTMM = (MLEN/BLEN)^2 * BLEN assumes a full MLEN×MLEN tile,
overestimating the QK^T cost by (MLEN/seq_len)^2 = 4096x for seq_len=32.

Fix: compute effective tile cost as (ceil(eff_rows/BLEN) * ceil(eff_cols/BLEN))
* BLEN where eff_rows = min(seq_len, MLEN), eff_cols = min(kv_size, MLEN).

This reduces LLaDA-8B (seq=32, B=16) transformer layer estimate from
147M to 13M cycles, with attention dropping from 135M to 771K.
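
The two tile-cost formulas in the commit message above can be compared with a short sketch. The function names are hypothetical, BLEN=16 is an assumed value (the commit message does not state it), and kv_size=32 is assumed to match seq_len; this is an illustration of the arithmetic, not the code in analytic_models/.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def full_tile_cost(mlen, blen):
    """Old model: always charge a full MLEN x MLEN tile."""
    return (mlen // blen) ** 2 * blen


def effective_tile_cost(seq_len, kv_size, mlen, blen):
    """Fixed model: charge only the rows and columns actually occupied."""
    eff_rows = min(seq_len, mlen)
    eff_cols = min(kv_size, mlen)
    return ceil_div(eff_rows, blen) * ceil_div(eff_cols, blen) * blen


MLEN, BLEN = 2048, 16  # BLEN=16 is an assumption for illustration
old = full_tile_cost(MLEN, BLEN)
new = effective_tile_cost(32, 32, MLEN, BLEN)
print(old, new, old // new)  # 262144 64 4096
```

When seq_len and kv_size both reach MLEN, the two formulas agree, so the fix only changes estimates for partially filled tiles.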