Commit d78b3f8

Custom quantization guide + quant.h full sync

docs/custom-quantization.md (550+ lines):
- Step-by-step: implement 8-bit uniform KV quantization from scratch
- Block structure definition, quantize/dequantize/attention functions
- Registration in tq_traits.c, Google Test suite template
- Verification with score.sh

quant.h full sync with latest source:
- IQ3_XXS: grid codebook + dequant + fused dot (NEON)
- IQ4_NL: NEON tbl lookup optimization
- Gemma 4: dual-FFN, QK-norm KV, GeGLU, attention_scale=1.0
- GGUF embedding: output_gguf for large vocab (saves 2.8GB)
- Llama 3: EOS tokens (128001/128006/128009)
- Thought token filtering (Gemma 4 + Llama 3)
- MoE use_gelu flag

Verified: cc -std=c11 -DQUANT_IMPLEMENTATION quant.h compiles clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
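
For orientation, here is a minimal sketch of what 8-bit uniform block quantization with quantize, dequantize, and attention-style dot functions can look like. It is not taken from docs/custom-quantization.md or quant.h: the `demo_` names, the block size of 32, and the scale-plus-minimum layout are all assumptions made for illustration, and the registration step in tq_traits.c is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BLOCK_SIZE 32 /* hypothetical block size, not from the commit */

/* Hypothetical block layout: one scale and one minimum per block of values. */
typedef struct {
    float   scale;              /* step between adjacent quantization levels */
    float   min;                /* value represented by code 0 */
    uint8_t q[DEMO_BLOCK_SIZE]; /* 8-bit codes in [0, 255] */
} demo_block_u8;

/* Quantize DEMO_BLOCK_SIZE floats into one block (uniform, asymmetric). */
static void demo_quantize_u8(const float *x, demo_block_u8 *out) {
    assert(x != NULL && out != NULL);
    float lo = x[0], hi = x[0];
    for (size_t i = 1; i < DEMO_BLOCK_SIZE; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    float scale = (hi - lo) / 255.0f;
    out->scale = scale;
    out->min = lo;
    for (size_t i = 0; i < DEMO_BLOCK_SIZE; i++) {
        /* A constant block has scale 0; every code is then 0. */
        float t = scale > 0.0f ? (x[i] - lo) / scale : 0.0f;
        int r = (int)(t + 0.5f); /* round to nearest; t is never negative */
        if (r > 255) r = 255;
        out->q[i] = (uint8_t)r;
    }
}

/* Reconstruct approximate floats from one block. */
static void demo_dequantize_u8(const demo_block_u8 *in, float *y) {
    assert(in != NULL && y != NULL);
    for (size_t i = 0; i < DEMO_BLOCK_SIZE; i++) {
        y[i] = in->min + in->scale * (float)in->q[i];
    }
}

/* Dot product of a float query with a quantized key block, as used for an
 * attention score. The scale and minimum are factored out of the loop:
 * sum(qv[i] * (min + scale * q[i])) = scale * sum(qv[i] * q[i]) + min * sum(qv[i]). */
static float demo_dot_u8(const float *qv, const demo_block_u8 *k) {
    assert(qv != NULL && k != NULL);
    float acc = 0.0f, qsum = 0.0f;
    for (size_t i = 0; i < DEMO_BLOCK_SIZE; i++) {
        acc  += qv[i] * (float)k->q[i];
        qsum += qv[i];
    }
    return k->scale * acc + k->min * qsum;
}
```

With this layout the round-trip error per element is bounded by half of `scale`, which is the kind of property a test suite for a new quantization type would typically check.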

1 parent 0946455 commit d78b3f8

2 files changed

docs
- custom-quantization.md
quant.h
