fix(c-generative-ui): use gpt-5 + minimal reasoning for planner LLM by blove · Pull Request #372 · cacheplane/angular-agent-framework

blove · 2026-05-16T17:51:43Z

Summary

gpt-5-mini ignored the "EXACTLY ONE tool" directive added in PR #363 and kept calling all four data tools on every filter follow-up. The prompt rewrite was strict and explicit ("call EXACTLY ONE tool", "Do NOT call the other tools") but the model's default reasoning prefers thoroughness over literal directive-following.

Fix

Split the LLMs in dashboard_graph.py / graph.py:

_llm (gpt-5-mini) — unchanged for generate_shell + respond. Cheap, good enough for prose + JSON-spec emission.
_planner_llm (gpt-5, reasoning_effort="minimal") — bound to tools, used in plan_tools. gpt-5 follows directives more precisely; reasoning_effort="minimal" suppresses the "let me be thorough" deliberation that drives the fan-out.
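
The split described above can be sketched as a small per-node configuration table. This is a minimal illustration, not the code from this PR: `LLMConfig`, `NODE_LLM`, and `chat_openai_kwargs` are hypothetical names. Only the model names, the `reasoning_effort="minimal"` setting, and the node names (`generate_shell`, `respond`, `plan_tools`) come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical container for per-node model settings."""
    model: str
    reasoning_effort: Optional[str] = None  # None means "use the model default"
    bind_tools: bool = False


# Cheap model for prose and JSON-spec emission (the `_llm` role).
_LLM = LLMConfig(model="gpt-5-mini")
# Stronger model with minimal reasoning for tool planning (the `_planner_llm` role).
_PLANNER_LLM = LLMConfig(model="gpt-5", reasoning_effort="minimal", bind_tools=True)

# Which graph node uses which configuration.
NODE_LLM = {
    "generate_shell": _LLM,
    "respond": _LLM,
    "plan_tools": _PLANNER_LLM,
}


def chat_openai_kwargs(node: str) -> dict:
    """Build the keyword arguments one might pass to ChatOpenAI for a node."""
    cfg = NODE_LLM[node]
    kwargs = {"model": cfg.model}
    if cfg.reasoning_effort is not None:
        kwargs["reasoning_effort"] = cfg.reasoning_effort
    return kwargs


if __name__ == "__main__":
    for node in NODE_LLM:
        print(node, chat_openai_kwargs(node))
```

Under these assumptions, the planner would be built along the lines of `ChatOpenAI(**chat_openai_kwargs("plan_tools")).bind_tools(ALL_TOOLS)`, which matches the constructor call used in the smoke test in the test plan.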

Test plan

Standalone smoke against the planner directly:

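from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
# sys and ALL_TOOLS are assumed to be defined already (planner system message, tool list)
llm = ChatOpenAI(model='gpt-5', reasoning_effort='minimal').bind_tools(ALL_TOOLS)
msg = await llm.ainvoke([sys, HumanMessage("Filter to cancelled flights only")])
# [c["name"] for c in msg.tool_calls] → ['query_recent_disruptions']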
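
The smoke check above boils down to asserting that the planner's reply carries exactly one tool call. A minimal sketch of that assertion follows, assuming the reply exposes its tool calls as a list of dicts with a `name` key (the shape of LangChain's `AIMessage.tool_calls`). `tool_names` and `assert_single_tool` are hypothetical helpers, and the `args` payload in the usage lines is made up for illustration.

```python
from typing import Iterable, List, Mapping


def tool_names(tool_calls: Iterable[Mapping]) -> List[str]:
    """Return the tool names in call order."""
    return [call["name"] for call in tool_calls]


def assert_single_tool(tool_calls: Iterable[Mapping], expected: str) -> None:
    """Raise AssertionError unless exactly one call was made, to `expected`."""
    names = tool_names(tool_calls)
    assert names == [expected], f"expected only {expected!r}, got {names}"


if __name__ == "__main__":
    # Shape mirrors AIMessage.tool_calls; the args payload is illustrative only.
    calls = [
        {"name": "query_recent_disruptions", "args": {"status": "cancelled"}, "id": "call_1"}
    ]
    assert_single_tool(calls, "query_recent_disruptions")
    print(tool_names(calls))
```

A check like this fails loudly on the fan-out case (several names in the list), which is the regression this PR targets.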

Chrome MCP end-to-end with backend running real LLM:
- "Show me the dashboard" → 4 stat cards + 2 charts + 1 grid populate
- "Filter to cancelled flights only" → backend AI tool_calls: ['query_recent_disruptions'] (one tool, not four), data grid updates to 3 cancelled rows

CI

Cost note: the planner now uses gpt-5 on every dashboard turn, so each follow-up incurs one more expensive call. Worth it for the directive adherence; can revisit if cost becomes material.

Files

cockpit/langgraph/streaming/python/src/dashboard_graph.py — add _planner_llm; swap _llm_with_tools source
cockpit/chat/generative-ui/python/src/graph.py — same in standalone

🤖 Generated with Claude Code

gpt-5-mini ignored the "EXACTLY ONE tool" directive added in PR #363 and kept calling all four data tools on every filter follow-up. Verified live: my prompt rewrite was strict and explicit ("call EXACTLY ONE tool", "Do NOT call the other tools") but gpt-5-mini still fanned out; the model's default reasoning prefers thoroughness over literal directive-following.

Split the LLMs:
- `_llm` (gpt-5-mini): unchanged for shell-gen + respond. Cheap and good enough for prose + JSON-spec emission.
- `_planner_llm` (gpt-5, reasoning_effort='minimal'): bound to tools, used in plan_tools. gpt-5 follows directives more precisely; reasoning_effort='minimal' suppresses the "let me be thorough" deliberation that drives the fan-out.

Standalone smoke (separate from chrome):
prompt: "Filter to cancelled flights only"
result: ['query_recent_disruptions'] ← exactly one

Chrome MCP end-to-end: backend log confirms AI tool_calls: ['query_recent_disruptions'], data grid updates to 3 cancelled rows.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel · 2026-05-16T17:51:48Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
cacheplane	Ready	Preview, Comment	May 16, 2026 5:54pm

blove merged commit 25792f3 into main May 16, 2026
15 of 16 checks passed

blove mentioned this pull request May 16, 2026

feat(c-a2ui): LLM-driven aviation booking form (PR 4 of 4) #380

Merged

5 tasks

blove merged 1 commit into main from claude/c-genui-llm-config
