Commit bbb9159

unamedkr and claude committed
feat: disable Qwen3 thinking mode by default (/no_think)
Qwen3-4B defaults to thinking mode ("Okay, the user asked..."), wasting tokens on reasoning chains. Adding /no_think to the system prompt produces direct answers.

Before: "Okay, the user asked... Let me recall... Gravity is a fu"
After: "Gravity is the force that attracts any object with mass..."
Speed: 4.3 tok/s (unchanged)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e273f2b commit bbb9159

File tree

1 file changed: 1 addition, 1 deletion


bench/rlv/stages/_llm.py

Lines changed: 1 addition & 1 deletion

@@ -209,7 +209,7 @@ def stop_server():
 # reasoning chains in chat mode. Verified with the Acme test doc:
 # without this, the model picks the first entity (primacy bias);
 # with this, it correctly identifies the requested role.
-DEFAULT_SYSTEM_PROMPT = "Answer in one short sentence. No reasoning steps."
+DEFAULT_SYSTEM_PROMPT = "/no_think\nAnswer in one short sentence. No reasoning steps."


 MAX_LLM_RETRIES = 2  # retry once on transient server errors
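The change above only edits the system-prompt string; a minimal sketch of how such a prompt ends up in a chat request might look like the following. This assumes an OpenAI-style messages payload; `build_messages` is a hypothetical helper for illustration, not a function from `bench/rlv/stages/_llm.py`.

```python
# /no_think is Qwen3's soft switch that suppresses the model's reasoning
# block. The prompt text below mirrors the diff; everything else here is
# an illustrative sketch, not the repo's actual request-building code.
DEFAULT_SYSTEM_PROMPT = (
    "/no_think\nAnswer in one short sentence. No reasoning steps."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Assemble an OpenAI-style chat payload with thinking disabled."""
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("What is gravity?")
```

Because the switch rides along in the system message, every request through this helper gets direct answers without touching per-request sampling or server flags.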
