
Commit 0f1d1a6

docs: v0.6.2 — turbo_kv_4bo/3bo per-channel outlier types
CHANGELOG and ROADMAP updates documenting the Variant G outlier handling work. Headline table shows full Pareto landscape across turbo_kv_3b/4b/3bo/5b/4bo with their bytes/block, compression, PPL on Llama 3.2 3B, and production/research status. Closes the per-channel outlier handling item from issue #15.
1 parent 5b5e4b7 commit 0f1d1a6

File tree: 2 files changed (+29, −2 lines)


CHANGELOG.md

Lines changed: 27 additions & 0 deletions
@@ -1,5 +1,32 @@
# Changelog

## [0.6.2] — 2026-04-08

### Highlights

- **🆕 `turbo_kv_4bo` / `turbo_kv_3bo`** — Per-block outlier handling research types. Each block stores the K=8 channels with the largest |rotated[i]| as exact FP16 values that overwrite the codebook reconstruction at dequant time. This is a simpler local form of the per-channel outlier handling described in the Google TurboQuant paper.
- **Karpathy-loop validation**: per-channel outliers cut the PPL gap **by more than half** on Llama 3.2 3B (4b: +5.3% → 4bo: +2.2%). Effect is model-dependent — see notes below.
- **Issue #15 progress**: closes the per-channel outlier handling exploration item. 5b remains the recommended quality option; 4bo/3bo ship as experimental.
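The outlier mechanism above is described only in prose. Here is a minimal C sketch of the encode/dequant halves, with invented names; the real codebook quantizer is elided, and FP32 stands in for the FP16 storage:

```c
/* Hypothetical sketch of the per-block outlier pass (names invented;
 * the real turbo_kv_*bo types also run a codebook quantizer, elided
 * here). Encode picks the K channels with the largest |rotated[i]|;
 * dequant overwrites the base reconstruction with those exact values. */
#include <math.h>

#define BLOCK_DIM   128
#define K_OUTLIERS  8

typedef struct {
    unsigned char idx[K_OUTLIERS]; /* channel indices within the block */
    float         val[K_OUTLIERS]; /* exact values (FP16 on disk)      */
} outlier_record;

/* Encode side: repeated argmax over |rotated[i]|, skipping channels
 * already taken, to collect the K largest-magnitude channels. */
static void record_outliers(const float *rotated, outlier_record *rec)
{
    for (int k = 0; k < K_OUTLIERS; k++) {
        int best = 0;
        float best_mag = -1.0f;
        for (int i = 0; i < BLOCK_DIM; i++) {
            int taken = 0;
            for (int j = 0; j < k; j++)
                if (rec->idx[j] == (unsigned char)i)
                    taken = 1;
            if (!taken && fabsf(rotated[i]) > best_mag) {
                best_mag = fabsf(rotated[i]);
                best = i;
            }
        }
        rec->idx[k] = (unsigned char)best;
        rec->val[k] = rotated[best];
    }
}

/* Dequant side: run after the base codebook reconstruction, so the
 * outlier channels end up exact regardless of quantization error. */
static void apply_outliers(float *reconstructed, const outlier_record *rec)
{
    for (int k = 0; k < K_OUTLIERS; k++)
        reconstructed[rec->idx[k]] = rec->val[k];
}
```

The selection is O(K·D) per block, which is cheap next to the rotation itself for K=8, D=128.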
### KV quantization quality (Llama 3.2 3B, FP32 = 13.56 PPL)

| Type | Bytes/block | Compression | PPL | Δ vs FP32 | Status |
|---|---:|---:|---:|---:|---|
| `turbo_kv_3b` | 56 | 9.1× | 15.39 | +13.5% | aggressive |
| `turbo_kv_4b` ⭐ default | 72 | 7.1× | 14.28 | +5.3% | production |
| **`turbo_kv_3bo`** 🧪 | 80 | 6.4× | 14.03 | +3.5% | research |
| **`turbo_kv_5b`** 🏆 quality | 88 | 5.8× | **13.60** | **+0.34%** | production |
| **`turbo_kv_4bo`** 🧪 | 96 | 5.3× | 13.86 | +2.2% | research |
### Notes on the outlier types

Per-channel outlier handling is **data-dependent**:

- On Llama 3.2 3B (head_dim=128, heavier tails), `3bo` Pareto-improves over `4b`
- On SmolLM2 135M (smaller dimensions), `3bo` regresses past `4b` because the 3-bit base is too coarse
- `4bo` is dominated by `5b` on both models — slightly bigger and slightly worse

Until per-model auto-selection is implemented, the Pareto-optimal recommendations remain `turbo_kv_4b` (default) and `turbo_kv_5b` (quality). The outlier types are exposed for researchers and benchmarking.
## [0.6.1] — 2026-04-08

### Highlights

ROADMAP.md

Lines changed: 2 additions & 2 deletions
@@ -70,8 +70,8 @@ A C reference engine for KV cache quantization research.
  - [x] Identify the gap in literal port (commit 4da6915 — QJL contributes byte-identical zero)
  - [x] Variant F: drop QJL stage, double codebook size (commit ac3c46a — beats baseline)
  - [x] 5-bit codebook variant for ~5 bpc quality budget (commit 87e14cb)
- - [x] Regression tests pinning quality (commit on this release)
- - [ ] Per-channel outlier handling (Google paper's 32-channel split) — issue #15
+ - [x] Regression tests pinning quality (commit 475872c)
+ - [x] Per-channel outlier handling (turbo_kv_4bo/3bo, commits 4576910 + 5b5e4b7) — model-dependent, ships as research types; 5b remains the simpler quality champion
  - [ ] Paper-faithful Llama 3.1 8B + LongBench-E reproduction — issue #15

### Planned (after Direction 2 reproduction)
