feat: add analytic performance models (GEMM latency, ASM profiler, roofline) by booth-algo · Pull Request #3 · AICrossSim/PLENA_Simulator

booth-algo · 2026-03-31T12:29:10Z

Summary

GEMM latency comparison model with RTL validation tables
LLaMA/LLaDA performance model (prefill + decode cycle estimation)
ASM profiler for instruction-level cycle counting
Decoder roofline model (compute vs memory bound analysis)
Minor fixes to utilisation_model.py TOML parsing

Files (9)

All under analytic_models/ — fully independent, no dependencies on other PRs.

Test plan

No runtime dependencies on testbench or ops code
Manual: run `just perf llama-3.1-8b` and `just asm-profile`

⚠️ nix-build CI job fails due to pre-existing DNS issue (libtorch download in nix sandbox). Unrelated to this PR.

When seq_len < MLEN (e.g. dLLM block_length=32 vs MLEN=2048), the batched GEMM cost M_BTMM = (MLEN/BLEN)^2 * BLEN assumes a full MLEN×MLEN tile, overestimating the QKT cost by (MLEN/seq_len)^2 = (2048/32)^2 = 4096x for seq_len=32.

Fix: compute the effective tile cost as ceil(eff_rows/BLEN) * ceil(eff_cols/BLEN) * BLEN, where eff_rows = min(seq_len, MLEN) and eff_cols = min(kv_size, MLEN).

This reduces the LLaDA-8B (seq=32, B=16) transformer layer estimate from 147M to 13M cycles, with attention dropping from 135M to 771K cycles.
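The two tile-cost formulas can be compared with a minimal Python sketch. This is an illustration under stated assumptions, not the code in this PR: the function names are hypothetical, BLEN=4 is an assumed value (the description does not give one), and kv_size is taken equal to seq_len so that the (MLEN/seq_len)^2 ratio applies exactly.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def btmm_full_tile_cost(mlen: int, blen: int) -> int:
    """Original estimate: always charges a full MLEN x MLEN tile."""
    return ceil_div(mlen, blen) ** 2 * blen


def btmm_effective_tile_cost(seq_len: int, kv_size: int, mlen: int, blen: int) -> int:
    """Fixed estimate: charges only the rows and columns actually occupied."""
    eff_rows = min(seq_len, mlen)  # rows are clamped to one tile
    eff_cols = min(kv_size, mlen)  # columns are clamped to one tile
    return ceil_div(eff_rows, blen) * ceil_div(eff_cols, blen) * blen


# MLEN=2048 and seq_len=32 come from the description; BLEN=4 is an assumed value.
MLEN, BLEN, SEQ = 2048, 4, 32
full = btmm_full_tile_cost(MLEN, BLEN)
eff = btmm_effective_tile_cost(SEQ, SEQ, MLEN, BLEN)
ratio = full // eff
print(full, eff, ratio)  # 1048576 256 4096
```

When seq_len and kv_size are both at least MLEN, the clamps make the effective cost equal to the full-tile cost, so the fix only changes estimates for short sequences.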

booth-algo mentioned this pull request Mar 31, 2026

feat: testbench core infrastructure (PLENAProgram, DeveloperCompiler, SubMatrixManager) #4

Open

2 tasks

booth-algo force-pushed the kev/aten-analytic branch from 15c26c9 to 42a0bde, March 31, 2026 12:39

booth-algo mentioned this pull request Mar 31, 2026

feat: ATen-style operator dispatch, compiler, and testbench infrastructure #1

Closed

booth-algo force-pushed the kev/aten-analytic branch 2 times, most recently from 07c9b13 to d221c19, March 31, 2026 13:43

booth-algo force-pushed the kev/aten-analytic branch from d221c19 to 7e91946, March 31, 2026 14:23

booth-algo requested a review from GeorgeWu1204 March 31, 2026 14:24

booth-algo force-pushed the kev/aten-analytic branch from cae6f49 to d15d1f6, April 1, 2026 15:52

booth-algo wants to merge 2 commits into main from kev/aten-analytic
