Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Updated May 11, 2026 - Python
vMLX - JANGTQ Uber Compressed MLX Models - L2 disk cache (survives restarts) + L1 paged cache (fast time-to-first-token) + hybrid SSM scheduler + continuous batching, and more.
KV cache with PagedAttention versus PagedAttention + TurboQuant: experiments across token counts comparing memory, latency, and accuracy.
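The paged KV caching these repositories build on can be illustrated with a short sketch: logical token positions are mapped through a block table onto fixed-size physical blocks in a shared pool, so sequences grow without contiguous preallocation. This is a minimal illustration of the general technique, not code from either project; `BlockTable`, `BLOCK_SIZE`, and the method names are hypothetical.

```python
# Minimal PagedAttention-style paging sketch (illustrative only;
# BlockTable and BLOCK_SIZE are hypothetical names, not a real API).
import numpy as np

BLOCK_SIZE = 16  # tokens stored per physical block


class BlockTable:
    """Maps one sequence's logical token positions to physical cache blocks."""

    def __init__(self, num_blocks: int, head_dim: int):
        # One physical KV pool, shared across sequences in a real system.
        self.kv_pool = np.zeros((num_blocks, BLOCK_SIZE, head_dim), dtype=np.float32)
        self.free_blocks = list(range(num_blocks))
        self.table: list[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append(self, kv_vector: np.ndarray) -> None:
        """Write one token's KV vector, allocating a new block on demand."""
        if self.num_tokens % BLOCK_SIZE == 0:
            self.table.append(self.free_blocks.pop())  # lazy allocation
        block = self.table[self.num_tokens // BLOCK_SIZE]
        self.kv_pool[block, self.num_tokens % BLOCK_SIZE] = kv_vector
        self.num_tokens += 1

    def gather(self) -> np.ndarray:
        """Materialize the sequence's KV entries in logical order."""
        flat = self.kv_pool[self.table].reshape(-1, self.kv_pool.shape[-1])
        return flat[: self.num_tokens]


# Usage: 20 tokens span two 16-token blocks, allocated only as needed.
tbl = BlockTable(num_blocks=8, head_dim=4)
for i in range(20):
    tbl.append(np.full(4, float(i)))
```

Fixed-size blocks are what make the cache "elastic": freeing a finished sequence returns whole blocks to the pool, so memory fragments far less than with per-sequence contiguous tensors.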
[MLSys-26] FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management