feat: add analytic performance models (GEMM latency, ASM profiler, roofline) #3
Open
booth-algo wants to merge 2 commits into main
…ofline)

- GEMM latency comparison model with RTL validation tables
- LLaMA/LLaDA performance model (prefill + decode cycle estimation)
- ASM profiler for instruction-level cycle counting
- Decoder roofline model (compute vs memory bound analysis)
- Minor fixes to utilisation_model.py TOML parsing
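For readers unfamiliar with the compute vs memory bound analysis mentioned above, here is a minimal sketch of the generic roofline idea. It is not the code from this PR: the function name `roofline_bound` and all four parameters are hypothetical, and the sketch assumes compute and memory traffic overlap perfectly, so the slower of the two sets the cycle count.

```python
def roofline_bound(flops, bytes_moved, peak_flops_per_cycle, bytes_per_cycle):
    """Classify a kernel as compute or memory bound (generic roofline sketch).

    All names and parameters are illustrative assumptions, not the PR's API.
    """
    # Cycles needed if only arithmetic throughput limited the kernel.
    compute_cycles = flops / peak_flops_per_cycle
    # Cycles needed if only memory bandwidth limited the kernel.
    memory_cycles = bytes_moved / bytes_per_cycle
    # Assuming perfect overlap, the larger of the two is the estimate.
    bound = "compute" if compute_cycles >= memory_cycles else "memory"
    return max(compute_cycles, memory_cycles), bound


if __name__ == "__main__":
    # Made-up numbers: low arithmetic intensity, so memory traffic dominates.
    print(roofline_bound(flops=100, bytes_moved=1000,
                         peak_flops_per_cycle=10, bytes_per_cycle=10))
```

Decode steps typically have low arithmetic intensity, which is why a roofline view is a common first check for a decoder model.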
When seq_len < MLEN (e.g. dLLM block_length=32 vs MLEN=2048), the batched GEMM cost M_BTMM = (MLEN/BLEN)^2 * BLEN assumes a full MLEN×MLEN tile. This overestimates the QK^T cost by (MLEN/seq_len)^2 = (2048/32)^2 = 4096x for seq_len=32.

Fix: compute the effective tile cost as ceil(eff_rows/BLEN) * ceil(eff_cols/BLEN) * BLEN, where eff_rows = min(seq_len, MLEN) and eff_cols = min(kv_size, MLEN).

This reduces the LLaDA-8B (seq=32, B=16) transformer layer estimate from 147M to 13M cycles, with attention dropping from 135M to 771K cycles.
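The two cost formulas in this commit message can be compared directly. The sketch below is a hedged illustration, not the PR's actual code: the function names are hypothetical, and BLEN=4 is an arbitrary value chosen for the example (the commit message does not state BLEN).

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def btmm_full_tile_cycles(mlen, blen):
    """Original cost: assumes a full MLEN x MLEN tile."""
    return (mlen // blen) ** 2 * blen


def btmm_effective_cycles(seq_len, kv_size, mlen, blen):
    """Fixed cost: clamp the tile to the rows and columns actually used."""
    eff_rows = min(seq_len, mlen)
    eff_cols = min(kv_size, mlen)
    return ceil_div(eff_rows, blen) * ceil_div(eff_cols, blen) * blen


if __name__ == "__main__":
    MLEN, BLEN = 2048, 4       # BLEN is an assumed, illustrative value
    seq_len = kv_size = 32     # dLLM block_length from the commit message
    full = btmm_full_tile_cycles(MLEN, BLEN)
    eff = btmm_effective_cycles(seq_len, kv_size, MLEN, BLEN)
    print(full, eff, full // eff)  # the ratio is (MLEN/seq_len)^2
```

When seq_len and kv_size are both at least MLEN, the clamped formula reduces to the original one, so the fix only changes estimates for short sequences.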
Summary
Files (9)
All under analytic_models/: fully independent, no dependencies on other PRs.

Test plan

- just perf llama-3.1-8b
- just asm-profile