fix(c-generative-ui): tighten plan_tools — one tool per filter, not four by blove · Pull Request #363 · cacheplane/angular-agent-framework

blove · 2026-05-16T16:18:06Z

Summary

"Filter to cancelled flights only" calls all 4 dashboard tools (kpis + trend + airlines + disruptions) and then dumps the filtered result as plaintext — visually nothing updates in the data_grid, the LLM hallucinates a workaround in prose. The per-turn plan_tools system context just said "decide which tools to call" — too permissive; gpt-5-mini defaults to refreshing everything.

Fix

Tighter rules in the per-turn system context (not just the static prompt file, which the model demonstrably ignores for tool-selection decisions):

FILTER / SCOPE ("filter to X", "show last N", "limit to Y", "sort by Z", "only show…", "top N") → call EXACTLY ONE tool with new parameters, no spec regen
STRUCTURAL ("add a card", "remove the table") → regen spec, call only tools needed for new components
QUESTION ("why", "how", "explain") → no tools, no JSON, prose only

Calling all four is now explicitly reserved for "refresh" / "reload" / "update everything".

Test plan

Local build passes
CI matrix
Post-merge chrome MCP smoke: "Filter to cancelled flights only" → only query_recent_disruptions fires (with type="cancelled"), data_grid updates to 3 rows, no spec regen

Files

cockpit/langgraph/streaming/python/src/dashboard_graph.py — rewrite plan_tools context (umbrella)
cockpit/chat/generative-ui/python/src/graph.py — same rewrite in standalone copy

🤖 Generated with Claude Code

…scope "Filter to cancelled flights only" was calling all 4 tools (kpis + trend + airlines + disruptions) and dumping the filtered result as plaintext instead of letting the data_grid component re-render. The per-turn plan_tools context just said "decide which tools to call" — too permissive; gpt-5-mini defaults to refreshing everything. Tighter rules now in the per-turn system context (not just the static prompt file, which the model demonstrably ignores for tool-selection): 1) FILTER / SCOPE → exactly ONE tool, the one backing the affected component, with new parameters. No spec regen. 2) STRUCTURAL → regen spec, then call only tools for NEW components. 3) QUESTION → no tools, no JSON, just prose. Calling all four is now explicitly reserved for "refresh" / "reload" / "update everything". Applied to both umbrella backend and standalone. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel · 2026-05-16T16:18:11Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
cacheplane	Ready	Preview, Comment	May 16, 2026 4:19pm

…372) gpt-5-mini ignored the "EXACTLY ONE tool" directive added in PR #363 and kept calling all four data tools on every filter follow-up. Verified live: my prompt rewrite was strict and explicit ("call EXACTLY ONE tool", "Do NOT call the other tools") but gpt-5-mini still fanned out — the model's default reasoning prefers thoroughness over literal directive-following. Split the LLMs: - `_llm` (gpt-5-mini) — unchanged for shell-gen + respond. Cheap and good enough for prose + JSON-spec emission. - `_planner_llm` (gpt-5, reasoning_effort='minimal') — bound to tools, used in plan_tools. gpt-5 follows directives more precisely; reasoning_effort='minimal' suppresses the "let me be thorough" deliberation that drives the fan-out. Standalone smoke (separate from chrome): prompt: "Filter to cancelled flights only" result: ['query_recent_disruptions'] ← exactly one Chrome MCP end-to-end: backend log confirms AI tool_calls: ['query_recent_disruptions'], data grid updates to 3 cancelled rows. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

blove merged commit f68b562 into main May 16, 2026
16 checks passed

blove mentioned this pull request May 16, 2026

fix(c-generative-ui): use gpt-5 + minimal reasoning for planner LLM #372

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(c-generative-ui): tighten plan_tools — one tool per filter, not four#363

fix(c-generative-ui): tighten plan_tools — one tool per filter, not four#363
blove merged 1 commit into
mainfrom
claude/c-genui-prompt-categorization

blove commented May 16, 2026

Uh oh!

vercel Bot commented May 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

blove commented May 16, 2026

Summary

Fix

Test plan

Files

Uh oh!

vercel Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented May 16, 2026 •

edited

Loading