Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Updated May 11, 2026 - Python
vMLX - JANGTQ Uber Compressed MLX Models - L2 disk cache (survives restarts) + L1 paged cache (fast time-to-first-token) + hybrid SSM scheduler + continuous batching, and more.
KV cache with PagedAttention versus PagedAttention + TurboQuant: experiments across token counts comparing memory, latency, and accuracy.
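The paged KV caching these repositories build on can be illustrated with a short sketch: logical token positions are mapped through a block table onto fixed-size physical blocks in a shared pool, so sequences grow without contiguous preallocation. This is a minimal illustration of the general technique, not code from either project; `BlockTable`, `BLOCK_SIZE`, and the method names are hypothetical.

```python
# Minimal PagedAttention-style paging sketch (illustrative only;
# BlockTable and BLOCK_SIZE are hypothetical names, not a real API).
import numpy as np

BLOCK_SIZE = 16  # tokens stored per physical block


class BlockTable:
    """Maps one sequence's logical token positions to physical cache blocks."""

    def __init__(self, num_blocks: int, head_dim: int):
        # One physical KV pool, shared across sequences in a real system.
        self.kv_pool = np.zeros((num_blocks, BLOCK_SIZE, head_dim), dtype=np.float32)
        self.free_blocks = list(range(num_blocks))
        self.table: list[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append(self, kv_vector: np.ndarray) -> None:
        """Write one token's KV vector, allocating a new block on demand."""
        if self.num_tokens % BLOCK_SIZE == 0:
            self.table.append(self.free_blocks.pop())  # lazy allocation
        block = self.table[self.num_tokens // BLOCK_SIZE]
        self.kv_pool[block, self.num_tokens % BLOCK_SIZE] = kv_vector
        self.num_tokens += 1

    def gather(self) -> np.ndarray:
        """Materialize the sequence's KV entries in logical order."""
        flat = self.kv_pool[self.table].reshape(-1, self.kv_pool.shape[-1])
        return flat[: self.num_tokens]


# Usage: 20 tokens span two 16-token blocks, allocated only as needed.
tbl = BlockTable(num_blocks=8, head_dim=4)
for i in range(20):
    tbl.append(np.full(4, float(i)))
```

Fixed-size blocks are what make the cache "elastic": freeing a finished sequence returns whole blocks to the pool, so memory fragments far less than with per-sequence contiguous tensors.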
[MLSys-26] FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management