Commit d78b3f8
Custom quantization guide + quant.h full sync
docs/custom-quantization.md (550+ lines):
- Step-by-step: implement 8-bit uniform KV quantization from scratch
- Block structure definition, quantize/dequantize/attention functions
- Registration in tq_traits.c, Google Test suite template
- Verification with score.sh
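
For orientation, a minimal sketch of what 8-bit uniform KV quantization could look like. This is not the code from docs/custom-quantization.md: the names kv8_block, kv8_quantize, kv8_dequantize, kv8_dot and the block size of 32 are hypothetical, and the per-block min/scale layout is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define KV8_BLOCK 32 /* hypothetical block size, not taken from quant.h */

/* Hypothetical block layout: one scale and one offset per 32 values. */
typedef struct {
    float   scale;          /* step size: (max - min) / 255 */
    float   min;            /* value represented by code 0 */
    uint8_t q[KV8_BLOCK];   /* 8-bit codes */
} kv8_block;

/* Quantize KV8_BLOCK floats into one block using a uniform 256-level grid. */
static void kv8_quantize(const float *x, kv8_block *b) {
    assert(x != 0 && b != 0);
    float lo = x[0], hi = x[0];
    for (int i = 1; i < KV8_BLOCK; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    b->scale = (hi - lo) / 255.0f;
    b->min = lo;
    /* A constant block has scale 0; every code is then 0. */
    float inv = b->scale > 0.0f ? 1.0f / b->scale : 0.0f;
    for (int i = 0; i < KV8_BLOCK; i++) {
        int r = (int)((x[i] - lo) * inv + 0.5f); /* round to nearest, input >= 0 */
        if (r > 255) r = 255;
        b->q[i] = (uint8_t)r;
    }
}

/* Reconstruct floats: x ~= min + scale * code. */
static void kv8_dequantize(const kv8_block *b, float *out) {
    for (int i = 0; i < KV8_BLOCK; i++)
        out[i] = b->min + b->scale * (float)b->q[i];
}

/* Attention score contribution: dot(query, key block) without
 * materializing the dequantized key.
 * sum q_i * (min + scale * c_i) = scale * sum(q_i * c_i) + min * sum(q_i) */
static float kv8_dot(const float *query, const kv8_block *k) {
    float acc = 0.0f, qsum = 0.0f;
    for (int i = 0; i < KV8_BLOCK; i++) {
        acc  += query[i] * (float)k->q[i];
        qsum += query[i];
    }
    return k->scale * acc + k->min * qsum;
}
```

With this layout the round-trip error per element is bounded by half a step (scale / 2), and the fused dot matches a dot product against the dequantized block up to float rounding.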
quant.h full sync with latest source:
- IQ3_XXS: grid codebook + dequant + fused dot (NEON)
- IQ4_NL: NEON tbl lookup optimization
- Gemma 4: dual-FFN, QK-norm KV, GeGLU, attention_scale=1.0
- GGUF embedding: output_gguf for large vocab (saves 2.8GB)
- Llama 3: EOS tokens (128001/128006/128009)
- Thought token filtering (Gemma 4 + Llama 3)
- MoE use_gelu flag
Verified: cc -std=c11 -DQUANT_IMPLEMENTATION quant.h compiles clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0946455 commit d78b3f8
2 files changed
Lines changed: 520 additions & 17 deletions