feat: add TurboQuantStaticCache variant with pre-allocated buffers #1

Closed
tengomucho wants to merge 1 commit into back2matching:main from tengomucho:static_turbo_cache

@tengomucho

Implements TurboQuantStaticLayer and TurboQuantStaticCache that pre-allocate all memory (compressed indices, norms, residual FP16, output buffers) at init. Zero dynamic growth during generation, allowing a predictable VRAM budget.

  • New file: static_cache.py
  • 17 tests in test_static_cache.py
  • Exported TurboQuantStaticCache from __init__.py
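
Because every buffer is sized at init, the memory footprint can in principle be computed before generation starts. The following is a minimal sketch of that arithmetic, not the code in static_cache.py: the class name `StaticCacheBudget`, all of its fields, the bit-packed index layout, the one-FP16-norm-per-vector layout, and the per-layer output buffer are assumptions made for illustration.

```python
from dataclasses import dataclass

FP16_BYTES = 2  # size of one half-precision value


@dataclass(frozen=True)
class StaticCacheBudget:
    """Hypothetical shape parameters; none of these names come from the PR."""

    num_layers: int
    batch_size: int
    num_kv_heads: int
    head_dim: int
    max_seq_len: int   # total token capacity, fixed at init
    residual_len: int  # most recent tokens kept uncompressed in FP16 (assumed)
    index_bits: int    # bits per quantized coordinate (assumed)

    def bytes_per_tensor(self) -> int:
        """Bytes reserved for the keys (or the values) of a single layer."""
        vectors_per_token = self.batch_size * self.num_kv_heads
        compressed_len = self.max_seq_len - self.residual_len

        # Bit-packed quantization indices, rounded up to whole bytes.
        index_bits_total = (
            vectors_per_token * compressed_len * self.head_dim * self.index_bits
        )
        index_bytes = (index_bits_total + 7) // 8
        # One FP16 norm per compressed vector (assumed layout).
        norm_bytes = vectors_per_token * compressed_len * FP16_BYTES
        # Uncompressed FP16 window for the most recent tokens.
        residual_bytes = (
            vectors_per_token * self.residual_len * self.head_dim * FP16_BYTES
        )
        # FP16 output buffer covering the full capacity (assumed per layer).
        output_bytes = (
            vectors_per_token * self.max_seq_len * self.head_dim * FP16_BYTES
        )
        return index_bytes + norm_bytes + residual_bytes + output_bytes

    def total_bytes(self) -> int:
        """Keys and values for every layer; independent of tokens generated."""
        return 2 * self.num_layers * self.bytes_per_tensor()


budget = StaticCacheBudget(
    num_layers=2,
    batch_size=1,
    num_kv_heads=2,
    head_dim=8,
    max_seq_len=16,
    residual_len=4,
    index_bits=4,
)
print(budget.total_bytes())
```

The result depends only on the shapes chosen at init, which is what makes the budget predictable. The trade-off is that the full capacity is reserved up front, even for short generations.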

@tengomucho
Author

It seems the repository author is not interested in this, closing.

@tengomucho closed this on Apr 15, 2026