manishklach / intent-attention-kernel

Intent-aware KV execution prototype for agentic long-context inference: semantic block selection, dynamic scoring, KV quantization modeling, speculative prefetch simulation, CPU reference implementations, and future Triton/CUDA kernels.

Topics: cost-model, triton, memory-bandwidth, gpu-kernels, mixed-precision, prefetching, inference-optimization, kv-cache, sparse-attention, long-context, paged-attention, semantic-routing, agentic-ai, block-sparse-attention, kernel-research, block-attention, semantic-attention, kv-quantization, paged-kv, kv-cache-optimization

Updated May 29, 2026. Language: Python
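To make "semantic block selection" more concrete, here is a minimal pure-Python sketch of one common block-sparse attention scheme. It scores each fixed-size KV block by the dot product between the query and the block's mean key (its centroid), keeps the `top_k` highest-scoring blocks, and runs ordinary scaled softmax attention over only the tokens in those blocks. This is an illustrative assumption, not the repository's method: the centroid-scoring rule and the function names `block_centroids`, `select_blocks`, and `sparse_attention` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_centroids(keys: List[Vector], block_size: int) -> List[List[float]]:
    """Hypothetical helper: mean key vector of each contiguous block of tokens."""
    cents = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]  # last block may be partial
        dim = len(block[0])
        cents.append([sum(k[d] for k in block) / len(block) for d in range(dim)])
    return cents


def select_blocks(query: Vector, keys: List[Vector], block_size: int, top_k: int) -> List[int]:
    """Hypothetical scoring rule: rank blocks by query . centroid, keep top_k (in block order)."""
    cents = block_centroids(keys, block_size)
    scores = [dot(query, c) for c in cents]
    ranked = sorted(range(len(cents)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def sparse_attention(query: Vector, keys: List[Vector], values: List[Vector],
                     block_size: int, top_k: int) -> List[float]:
    """Scaled softmax attention restricted to tokens in the selected blocks."""
    chosen = select_blocks(query, keys, block_size, top_k)
    idx = [t for b in chosen
           for t in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[t]) * scale for t in idx]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    dim_v = len(values[0])
    return [sum(w * values[t][d] for w, t in zip(weights, idx)) / z for d in range(dim_v)]


# Toy example: 8 tokens in 4 blocks of 2, with 2-dimensional keys.
query = [1.0, 0.0]
keys = [[0, 1], [0, 1], [1, 0], [1, 0], [-1, 0], [-1, 0], [0.5, 0.5], [0.5, 0.5]]
values = [[float(t)] for t in range(len(keys))]
print(select_blocks(query, keys, block_size=2, top_k=2))  # prints [1, 3]
print(sparse_attention(query, keys, values, block_size=2, top_k=2))
```

With `top_k` equal to the number of blocks, this sketch reduces to dense attention. A smaller `top_k` trades some accuracy for reading fewer KV blocks, which is the memory-bandwidth saving that block-sparse schemes aim for.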