
fix(c-generative-ui): use gpt-5 + minimal reasoning for planner LLM #372

Merged
blove merged 1 commit into main from claude/c-genui-llm-config on May 16, 2026

Conversation

blove (Contributor) commented May 16, 2026

Summary

gpt-5-mini ignored the "EXACTLY ONE tool" directive added in PR #363 and kept calling all four data tools on every filter follow-up. The prompt rewrite was strict and explicit ("call EXACTLY ONE tool", "Do NOT call the other tools") but the model's default reasoning prefers thoroughness over literal directive-following.

Fix

Split the LLMs in dashboard_graph.py / graph.py:

  • _llm (gpt-5-mini) — unchanged for generate_shell + respond. Cheap, good enough for prose + JSON-spec emission.
  • _planner_llm (gpt-5, reasoning_effort="minimal") — bound to tools, used in plan_tools. gpt-5 follows directives more precisely; reasoning_effort="minimal" suppresses the "let me be thorough" deliberation that drives the fan-out.
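The split described above can be sketched roughly as follows. This is a minimal illustration, not the code in dashboard_graph.py or graph.py: the model names, the reasoning_effort value, and the node names come from this description, while `build_llms`, `LLM_KWARGS`, `PLANNER_LLM_KWARGS`, and `NODE_MODEL_ROLE` are hypothetical names chosen for the example.

```python
from typing import Any, Dict, Sequence, Tuple

# Constructor arguments taken from the PR description.
LLM_KWARGS: Dict[str, Any] = {"model": "gpt-5-mini"}
PLANNER_LLM_KWARGS: Dict[str, Any] = {"model": "gpt-5", "reasoning_effort": "minimal"}

# Which graph node uses which model role, per the PR description.
NODE_MODEL_ROLE: Dict[str, str] = {
    "generate_shell": "llm",
    "respond": "llm",
    "plan_tools": "planner",
}


def build_llms(all_tools: Sequence[Any]) -> Tuple[Any, Any]:
    """Return (llm, planner_llm). Hypothetical helper, not the repo's code."""
    # Imported lazily so the tables above can be inspected without
    # langchain-openai installed or OPENAI_API_KEY set.
    from langchain_openai import ChatOpenAI

    # Cheap model for prose and JSON-spec emission; no tools bound.
    llm = ChatOpenAI(**LLM_KWARGS)
    # Planner model: the only one that gets the data tools.
    planner_llm = ChatOpenAI(**PLANNER_LLM_KWARGS).bind_tools(list(all_tools))
    return llm, planner_llm
```

Keeping the tools bound only to the planner means the cheaper model can never issue tool calls, so any fan-out has to originate in plan_tools.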

Test plan

  • Standalone smoke against the planner directly:
    llm = ChatOpenAI(model='gpt-5', reasoning_effort='minimal').bind_tools(ALL_TOOLS)
    result = await llm.ainvoke([sys, HumanMessage("Filter to cancelled flights only")])
    [tc["name"] for tc in result.tool_calls]
    # → ['query_recent_disruptions']
  • Chrome MCP end-to-end with backend running real LLM:
    • "Show me the dashboard" → 4 stat cards + 2 charts + 1 grid populate
    • "Filter to cancelled flights only" → backend AI tool_calls: ['query_recent_disruptions'] (one tool, not four), data grid updates to 3 cancelled rows
  • CI
  • Cost note: the planner now makes one gpt-5 call per dashboard turn, so each follow-up costs more than it did on gpt-5-mini. Worth it for the directive adherence; can revisit if cost becomes material.
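The "one tool, not four" check in the test plan could be automated with a small helper along these lines. This is a sketch under assumptions: `tool_call_names` and `assert_single_tool` are hypothetical names, and the input is assumed to be a list of dicts with a "name" key, which is the shape LangChain exposes on `AIMessage.tool_calls`. The only tool name taken from this PR is query_recent_disruptions; `other_tool` in the usage note is a placeholder.

```python
from typing import Any, List, Mapping, Sequence


def tool_call_names(tool_calls: Sequence[Mapping[str, Any]]) -> List[str]:
    """Extract tool names, in call order, from tool-call dicts."""
    return [tc["name"] for tc in tool_calls]


def assert_single_tool(
    tool_calls: Sequence[Mapping[str, Any]], expected: str
) -> List[str]:
    """Raise AssertionError unless exactly one call was made, to `expected`."""
    names = tool_call_names(tool_calls)
    if names != [expected]:
        raise AssertionError(f"expected exactly [{expected!r}], got {names!r}")
    return names
```

For example, `assert_single_tool(result.tool_calls, "query_recent_disruptions")` would pass for the filter follow-up above and raise if the planner also called a second tool such as `other_tool`.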

Files

  • cockpit/langgraph/streaming/python/src/dashboard_graph.py — add _planner_llm; swap _llm_with_tools source
  • cockpit/chat/generative-ui/python/src/graph.py — same in standalone

🤖 Generated with Claude Code

gpt-5-mini ignored the "EXACTLY ONE tool" directive added in PR #363
and kept calling all four data tools on every filter follow-up.
Verified live: my prompt rewrite was strict and explicit ("call EXACTLY
ONE tool", "Do NOT call the other tools") but gpt-5-mini still fanned
out — the model's default reasoning prefers thoroughness over literal
directive-following.

Split the LLMs:
- `_llm` (gpt-5-mini) — unchanged for shell-gen + respond. Cheap and
  good enough for prose + JSON-spec emission.
- `_planner_llm` (gpt-5, reasoning_effort='minimal') — bound to tools,
  used in plan_tools. gpt-5 follows directives more precisely;
  reasoning_effort='minimal' suppresses the "let me be thorough"
  deliberation that drives the fan-out.

Standalone smoke (separate from chrome):
  prompt: "Filter to cancelled flights only"
  result: ['query_recent_disruptions']  ← exactly one

Chrome MCP end-to-end:
  backend log confirms AI tool_calls: ['query_recent_disruptions'],
  data grid updates to 3 cancelled rows.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel Bot commented May 16, 2026

Deployment status:
Project: cacheplane, Deployment: Ready, Updated (UTC): May 16, 2026 5:54pm

@blove blove merged commit 25792f3 into main May 16, 2026
15 of 16 checks passed