Commit 7e2ca31

unamedkr and claude committed
fix: restore Phi-3.5 as RLV default (Qwen3.5-4B: 6/20 on large-doc)
Qwen3.5-4B large-doc: 6/20 (30%) vs Phi-3.5: 19/20 (95%).
Phi-3.5's dense attention + 32K vocab is optimal for document QA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3ad0b80 commit 7e2ca31

File tree

1 file changed: +2 −2 lines changed


bench/rlv/stages/_llm.py

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@
 # quant.h as a single translation unit — no sync issues.
 # Phi-3.5: ~1.15 tok/s (CPU NEON), ~6.5 tok/s reported in PR #79.
 # Q8_0 is 2x faster than Q4_K_M on NEON (simpler dequant, 3.0 vs 1.5 tok/s).
-DEFAULT_MODEL = REPO / "models" / "Qwen3.5-4B-Q4_K_M.gguf"
+DEFAULT_MODEL = REPO / "models" / "Phi-3.5-mini-instruct-Q8_0.gguf"
 DEFAULT_SERVER_BINARY = REPO / "build_metal" / "quant-server-unified"
 DEFAULT_SERVER_HOST = "127.0.0.1"
 DEFAULT_SERVER_PORT = 8421  # arbitrary, avoid conflicts with 8080
@@ -44,7 +44,7 @@
 CLIFF_BUDGET = {
     "models/Llama-3.2-3B-Instruct-Q8_0.gguf": 1024,
     "models/Llama-3.2-1B-Instruct-Q8_0.gguf": 512,
-    "models/Qwen3.5-4B-Q4_K_M.gguf": 1024,
+    "models/Phi-3.5-mini-instruct-Q8_0.gguf": 1024,
     "models/Phi-3.5-mini-instruct-Q4_K_M.gguf": 1024,
 }
 
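For context, a minimal sketch of how a path-keyed `CLIFF_BUDGET` table like the one in the diff might be consulted. How `_llm.py` actually uses the dict is not shown in this commit, so the `REPO` value, the `cliff_budget_for` helper, and the fallback default here are illustrative assumptions, not the file's real API:

```python
from pathlib import Path

# Hypothetical repo root; the real REPO is defined elsewhere in _llm.py.
REPO = Path("/opt/rlv")

DEFAULT_MODEL = REPO / "models" / "Phi-3.5-mini-instruct-Q8_0.gguf"

# Per-model budgets, keyed by repo-relative path (values from the diff).
CLIFF_BUDGET = {
    "models/Llama-3.2-3B-Instruct-Q8_0.gguf": 1024,
    "models/Llama-3.2-1B-Instruct-Q8_0.gguf": 512,
    "models/Phi-3.5-mini-instruct-Q8_0.gguf": 1024,
    "models/Phi-3.5-mini-instruct-Q4_K_M.gguf": 1024,
}

def cliff_budget_for(model: Path, default: int = 512) -> int:
    """Look up a model's budget by its path relative to REPO (hypothetical helper)."""
    key = model.relative_to(REPO).as_posix()
    return CLIFF_BUDGET.get(key, default)

print(cliff_budget_for(DEFAULT_MODEL))  # → 1024
```

Keying by repo-relative POSIX path (rather than by `Path` object) keeps lookups stable across platforms and matches the string keys used in the diff.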

0 commit comments