Commit c17fe24
feat(burn): AttentionTable intercept in matmul — O(1) table lookup path
When a compiled attention table is registered for a given d_head dimension,
burn's matmul bypasses BLAS entirely and uses precomputed table lookup:
matmul(Q, K^T), where K has d_head columns:
1. Look up d_head in ATTENTION_CACHE to get a CompiledAttention.
2. Hit: read table[q_palette_idx][k_palette_idx] for each output element (O(1) per element).
3. Miss: fall through to ndarray::linalg::general_mat_mul (O(d_head) per element).
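The hit/miss dispatch above can be sketched as follows. This is a minimal illustration, not the code in matmul.rs: the field types, the function name attention_scores, the explicit cache argument, and the cast of table entries to f32 are all assumptions, and a plain dot-product loop stands in for the BLAS fallback.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the commit's CompiledAttention (field types assumed).
struct CompiledAttention {
    table: Vec<u16>,        // 256 * 256 entries, row-major
    q_assignments: Vec<u8>, // palette index for each row of Q
    k_assignments: Vec<u8>, // palette index for each row of K (column of K^T)
}

/// Computes Q * K^T for row-major q (n_q x d_head) and k (n_k x d_head).
/// `cache` plays the role of ATTENTION_CACHE; passing it in is an assumption.
fn attention_scores(
    cache: &HashMap<usize, CompiledAttention>,
    q: &[f32],
    k: &[f32],
    d_head: usize,
) -> Vec<f32> {
    let n_q = q.len() / d_head;
    let n_k = k.len() / d_head;
    let mut out = vec![0.0f32; n_q * n_k];
    match cache.get(&d_head) {
        // Hit: one table read per output element, independent of d_head.
        Some(att) => {
            for i in 0..n_q {
                let row = att.q_assignments[i] as usize * 256;
                for j in 0..n_k {
                    let col = att.k_assignments[j] as usize;
                    out[i * n_k + j] = att.table[row + col] as f32;
                }
            }
        }
        // Miss: ordinary dot products, d_head multiply-adds per element
        // (stand-in for ndarray::linalg::general_mat_mul).
        None => {
            for i in 0..n_q {
                for j in 0..n_k {
                    out[i * n_k + j] = (0..d_head)
                        .map(|t| q[i * d_head + t] * k[j * d_head + t])
                        .sum::<f32>();
                }
            }
        }
    }
    out
}
```

The hit path never touches the Q and K values themselves, so the result is only as accurate as the palette assignments behind the table.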
API:
register_attention_table(d_head, table) — register compiled table
has_attention_table(d_head) → bool — check if table exists
clear_attention_cache() — remove all tables
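One way the three registry functions could be backed by a process-wide cache is sketched below. Only the function names and ATTENTION_CACHE come from the commit message; the parameter types, the RwLock/Arc storage, and the get_attention_table helper are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock, RwLock};

/// Hypothetical, reduced CompiledAttention: just the lookup table.
pub struct CompiledAttention {
    pub table: Vec<u16>,
}

type Registry = RwLock<HashMap<usize, Arc<CompiledAttention>>>;

// Global registry keyed by d_head, created lazily on first use.
static ATTENTION_CACHE: OnceLock<Registry> = OnceLock::new();

fn cache() -> &'static Registry {
    ATTENTION_CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Registers (or replaces) the compiled table for `d_head`.
pub fn register_attention_table(d_head: usize, table: CompiledAttention) {
    cache().write().unwrap().insert(d_head, Arc::new(table));
}

/// Returns true if a table is registered for `d_head`.
pub fn has_attention_table(d_head: usize) -> bool {
    cache().read().unwrap().contains_key(&d_head)
}

/// Removes all registered tables.
pub fn clear_attention_cache() {
    cache().write().unwrap().clear();
}

/// Hypothetical helper (not named in the commit): fetch a shared handle
/// so matmul can read the table without holding the lock.
pub fn get_attention_table(d_head: usize) -> Option<Arc<CompiledAttention>> {
    cache().read().unwrap().get(&d_head).cloned()
}
```

Keying by d_head alone means two models with the same head dimension would share one table in this sketch, so callers would need to clear the cache when switching weights.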
CompiledAttention:
- 256×256 u16 distance table (256 × 256 × 2 bytes = 128 KiB; fits in L2 on most CPUs, and in L1 only on cores with a large L1 data cache)
- q_assignments: per-row palette index (from Base17 projection)
- k_assignments: per-col palette index
- Pipeline: GGUF weights → dequant → Base17 → palette → table
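The table size and the final "palette → table" step can be illustrated with a short sketch. The commit message does not say how distances are computed, so the L1 metric over integer palette vectors, the saturation to u16, and the names table_bytes and build_table are assumptions; the earlier pipeline stages (GGUF dequantization, Base17 projection) are not modeled.

```rust
use std::mem::size_of;

const PALETTE_SIZE: usize = 256;

/// Memory footprint of the 256x256 u16 table, in bytes.
fn table_bytes() -> usize {
    PALETTE_SIZE * PALETTE_SIZE * size_of::<u16>()
}

/// Builds a row-major 256x256 table of pairwise distances between palette
/// entries. Assumption: L1 distance, saturated to u16::MAX. Slots for unused
/// palette indices stay at zero.
fn build_table(palette: &[Vec<i32>]) -> Vec<u16> {
    assert!(palette.len() <= PALETTE_SIZE, "palette index must fit in u8");
    let mut table = vec![0u16; PALETTE_SIZE * PALETTE_SIZE];
    for (i, a) in palette.iter().enumerate() {
        for (j, b) in palette.iter().enumerate() {
            // Sum of absolute coordinate differences between entries i and j.
            let d: u32 = a.iter().zip(b).map(|(x, y)| x.abs_diff(*y)).sum();
            table[i * PALETTE_SIZE + j] = d.min(u16::MAX as u32) as u16;
        }
    }
    table
}
```

Because each assignment is one of 256 palette indices, q_assignments and k_assignments can be stored as one byte per row or column.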
The intercept is transparent: with no table registered, matmul uses BLAS as before.
matmul.rs is now a real file (not a symlink) because this commit modifies it.
30 tests passing. Zero regressions.
https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o71
parent 41b3373 commit c17fe24
1 file changed
Lines changed: 467 additions & 1 deletion