
Commit d78b3f8

unamedkr and claude committed
Custom quantization guide + quant.h full sync
docs/custom-quantization.md (550+ lines):
- Step-by-step: implement 8-bit uniform KV quantization from scratch
- Block structure definition, quantize/dequantize/attention functions
- Registration in tq_traits.c, Google Test suite template
- Verification with score.sh

quant.h full sync with latest source:
- IQ3_XXS: grid codebook + dequant + fused dot (NEON)
- IQ4_NL: NEON tbl lookup optimization
- Gemma 4: dual-FFN, QK-norm KV, GeGLU, attention_scale=1.0
- GGUF embedding: output_gguf for large vocab (saves 2.8GB)
- Llama 3: EOS tokens (128001/128006/128009)
- Thought token filtering (Gemma 4 + Llama 3)
- MoE use_gelu flag

Verified: cc -std=c11 -DQUANT_IMPLEMENTATION quant.h compiles clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
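
For readers unfamiliar with the technique the guide covers, the following is a minimal sketch of 8-bit uniform block quantization with a block structure, quantize/dequantize functions, and a dot product against a quantized key block. It is not taken from docs/custom-quantization.md or quant.h: the names `q8_block`, `q8_quantize`, `q8_dequantize`, `q8_dot`, and the block size of 32 are all hypothetical, and the real guide may use a different layout or scaling scheme.

```c
#include <stddef.h>
#include <stdint.h>

#define Q8_BLOCK 32 /* hypothetical block size, not from the commit */

/* Hypothetical block layout: per-block offset and step, plus 8-bit codes. */
typedef struct {
    float   min;          /* smallest value in the block */
    float   scale;        /* step size: (max - min) / 255 */
    uint8_t q[Q8_BLOCK];  /* codes in [0, 255] */
} q8_block;

/* Quantize Q8_BLOCK floats onto a uniform 256-level grid spanning [min, max]. */
static void q8_quantize(const float *x, q8_block *b) {
    float lo = x[0], hi = x[0];
    for (size_t i = 1; i < Q8_BLOCK; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    b->min = lo;
    b->scale = (hi - lo) / 255.0f;
    for (size_t i = 0; i < Q8_BLOCK; i++) {
        /* A constant block has scale 0; every code is then 0. */
        float t = b->scale > 0.0f ? (x[i] - lo) / b->scale : 0.0f;
        int r = (int)(t + 0.5f); /* round to nearest; t is non-negative */
        if (r > 255) r = 255;
        b->q[i] = (uint8_t)r;
    }
}

/* Reconstruct approximate floats from the codes. */
static void q8_dequantize(const q8_block *b, float *y) {
    for (size_t i = 0; i < Q8_BLOCK; i++)
        y[i] = b->min + b->scale * (float)b->q[i];
}

/* Dot product of a float query with one quantized key block,
 * dequantizing on the fly (the core of an attention score). */
static float q8_dot(const float *query, const q8_block *b) {
    float acc = 0.0f;
    for (size_t i = 0; i < Q8_BLOCK; i++)
        acc += query[i] * (b->min + b->scale * (float)b->q[i]);
    return acc;
}
```

With round-to-nearest, the reconstruction error per element is bounded by about half of `scale`, which is the property a round-trip test for such a format would typically check.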
1 parent 0946455 commit d78b3f8

2 files changed

Lines changed: 520 additions & 17 deletions
