Commit c17fe24

feat(burn): AttentionTable intercept in matmul — O(1) table lookup path
When a compiled attention table is registered for a given d_head dimension, burn's matmul bypasses BLAS entirely and uses precomputed table lookup.

matmul(Q, K^T), where K has d_head columns:
  1. Check ATTENTION_CACHE for d_head → CompiledAttention
  2. Hit: table[q_palette_idx][k_palette_idx] per element (O(1))
  3. Miss: fall through to ndarray::linalg::general_mat_mul (O(d))

API:
  register_attention_table(d_head, table): register compiled table
  has_attention_table(d_head) → bool: check if table exists
  clear_attention_cache(): remove all tables

CompiledAttention:
  - 256×256 u16 distance table (128KB, fits L1 cache)
  - q_assignments: per-row palette index (from Base17 projection)
  - k_assignments: per-col palette index
  - Pipeline: GGUF weights → dequant → Base17 → palette → table

The intercept is transparent: no table registered = BLAS as before.

matmul.rs is now a real file (not symlink) since we modified it.

30 tests passing. Zero regressions.

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
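The hit/miss flow described in the commit message can be sketched roughly as follows. This is not the code from matmul.rs, which is not shown on this page. The struct layout, the `matmul_qkt` name, the `Mutex<HashMap>` cache, and the shape check on the hit path are all assumptions made for illustration, and a plain dot-product loop stands in for `ndarray::linalg::general_mat_mul` so the example needs only the standard library.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

const PALETTE: usize = 256;

/// Hypothetical layout; field names follow the commit message.
pub struct CompiledAttention {
    /// 256 x 256 row-major u16 table (256 * 256 * 2 bytes = 128 KiB).
    pub table: Vec<u16>,
    /// Palette index for each row of Q.
    pub q_assignments: Vec<u8>,
    /// Palette index for each column of K^T.
    pub k_assignments: Vec<u8>,
}

/// Assumed cache shape: d_head -> compiled table.
fn cache() -> &'static Mutex<HashMap<usize, Arc<CompiledAttention>>> {
    static ATTENTION_CACHE: OnceLock<Mutex<HashMap<usize, Arc<CompiledAttention>>>> =
        OnceLock::new();
    ATTENTION_CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

pub fn register_attention_table(d_head: usize, table: CompiledAttention) {
    assert_eq!(table.table.len(), PALETTE * PALETTE, "table must be 256 x 256");
    cache().lock().unwrap().insert(d_head, Arc::new(table));
}

pub fn has_attention_table(d_head: usize) -> bool {
    cache().lock().unwrap().contains_key(&d_head)
}

pub fn clear_attention_cache() {
    cache().lock().unwrap().clear();
}

/// Q is rows x d_head and K is cols x d_head, both row-major.
/// Returns the rows x cols result of Q * K^T, row-major.
pub fn matmul_qkt(q: &[f32], k: &[f32], d_head: usize) -> Vec<f32> {
    let (rows, cols) = (q.len() / d_head, k.len() / d_head);
    let hit = cache().lock().unwrap().get(&d_head).cloned();
    if let Some(t) = hit {
        // Assumption: only intercept when the stored assignments match the operand shapes.
        if t.q_assignments.len() == rows && t.k_assignments.len() == cols {
            let mut out = Vec::with_capacity(rows * cols);
            for &qi in &t.q_assignments {
                let base = qi as usize * PALETTE;
                for &ki in &t.k_assignments {
                    // One table read per output element, independent of d_head.
                    out.push(t.table[base + ki as usize] as f32);
                }
            }
            return out;
        }
    }
    // Miss: O(d_head) work per element (stand-in for the BLAS path).
    let mut out = vec![0.0f32; rows * cols];
    for i in 0..rows {
        for j in 0..cols {
            out[i * cols + j] = (0..d_head)
                .map(|x| q[i * d_head + x] * k[j * d_head + x])
                .sum::<f32>();
        }
    }
    out
}
```

In this sketch the hit path never reads the values of Q or K: the result comes entirely from the stored palette assignments, so it is only meaningful if the registered table was compiled from the same weights being multiplied. Callers that never register a table always take the fallback branch, which matches the "transparent" behaviour the commit message describes.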
1 parent 41b3373 commit c17fe24

1 file changed

Lines changed: 467 additions & 1 deletion

File tree

crates/burn/src/ops/matmul.rs

Lines changed: 0 additions & 1 deletion
This file was deleted.
