Commit 2899fb8
fix(gemma4): numeric comparison with MLX-LM — divergence after layer 0
Layer-by-layer comparison with MLX-LM (google/gemma-4-E2B-it BF16):
Embedding (BOS token 2):
MLX: -1.6406, -1.5312, 0.1885, -1.4844
Ours: -1.6290, -1.5228, 0.1948, -1.4874
Diff: < 0.012 (Q5_0 vs BF16 quantization noise) ✅
Attn norm output (layer 0):
MLX: -10.5625, -8.3125, 1.375, -12.1875
Ours: -10.4733, -8.3217, 1.4276, -12.2401
Diff: < 0.1 ✅
Q projection (layer 0):
MLX: -4.375, 21.25, -0.797, 5.125
Ours: -4.306, 21.226, -0.711, 5.157
Diff: < 0.1 ✅
K projection (layer 0):
MLX: 2.547, 3.141, -0.029, 1.133
Ours: 2.298, 3.182, 0.165, 1.169
Diff: < 0.25 (slightly larger, but within Q8_0 tolerance) ✅
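Each of the checks above is a max-absolute-difference comparison against a quantization-dependent tolerance. A minimal sketch of that check (helper names are hypothetical; NumPy arrays stand in for the real tensor types):

```python
import numpy as np

def max_abs_diff(ref, ours):
    """Largest elementwise deviation between two activation tensors."""
    return float(np.max(np.abs(np.asarray(ref) - np.asarray(ours))))

def check(name, ref, ours, tol):
    """Print and return whether the tensors agree within tolerance."""
    d = max_abs_diff(ref, ours)
    status = "OK" if d <= tol else "DIVERGED"
    print(f"{name}: max|diff| = {d:.4f} (tol {tol}) {status}")
    return d <= tol

# K projection, layer 0: looser Q8_0 tolerance than the
# Q5_0-vs-BF16 noise budget used for the embedding check.
check("k_proj layer 0",
      [2.547, 3.141, -0.029, 1.133],
      [2.298, 3.182, 0.165, 1.169],
      tol=0.25)
```

The K-projection values from the log sit right at the edge of the 0.25 budget (max deviation 0.249 on the first element), which is why that check passes with a caveat.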
FINAL LOGITS (last position):
MLX logits[100] (<|channel>): 22.88 (TOP-1)
Ours logits[100]: -16.90 ← WRONG
MLX logits[0:3]: -22.38, 7.09, -3.48
Ours logits[0:3]: -23.73, -2.68, 5.50
CONCLUSION: Embedding → attn_norm → Q/K projection are correct.
Divergence happens INSIDE or AFTER the attention computation in
layer 0, then compounds through 35 layers to produce completely
wrong final logits (~40 logit difference on critical tokens).
Next: compare attention output and FFN output at layer 0.
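The next step above amounts to walking both implementations' intermediate tensors in order and reporting the first tap that exceeds tolerance. A sketch, assuming hypothetical hooks that capture named activations from each model:

```python
import numpy as np

def first_divergence(ref_taps, our_taps, tol=0.25):
    """ref_taps / our_taps: ordered lists of (name, tensor) pairs captured
    at matching points (embedding, attn_norm, q/k/v proj, attn_out,
    ffn_out, ...). Returns the first tap name whose max absolute
    difference exceeds tol, or None if all taps agree."""
    for (name, ref), (_, ours) in zip(ref_taps, our_taps):
        if np.max(np.abs(np.asarray(ref) - np.asarray(ours))) > tol:
            return name
    return None

# Toy example: agreement at the embedding, divergence at attn_out.
ref = [("embed", [1.0, 2.0]), ("attn_out", [0.5, -0.5])]
ours = [("embed", [1.01, 2.0]), ("attn_out", [5.0, -0.5])]
print(first_divergence(ref, ours))  # → attn_out
```

Once the first diverging tap is known, the search narrows to the ops between it and the last matching tap (here: RoPE, the attention scores/softmax, and the output projection).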
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 file changed: +12 −0 (insertions at original lines 14242–14245, 14292–14294, 15457–15461)