feat(runs): show AI request + structured response in step inspector #42
Merged
Conversation
- Snapshot AI step request (model, messages, temperature, tools, …) onto `step.input` before the LLM call so it survives the response overwrite.
- Restructure `StepInspector` to render input alongside output via a responsive 2-column grid in NB/IDE modes.
- Detect AI-shaped output (content + usage + model) and render an AI Response card with content, tool calls, usage pills, and a raw JSON toggle instead of a single JSON dump.
- Detect agent-shaped output (messages + iterations) and render a conversation thread with status / iteration / token pills.
- Render AI input as a request card with model / temp / max_tokens chips and a role-tagged message thread.
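The server-side snapshot in this PR is Python, but the ordering it relies on can be sketched language-neutrally in TypeScript. All names below are illustrative (assumed from the description above), not the PR's actual code:

```typescript
// Hypothetical shape of the AI request snapshot stored on step.input.
// Field names follow the PR description (model, messages, temperature, ...);
// the exact keys in the real payload may differ.
type AiRequestSnapshot = {
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature?: number;
  max_tokens?: number;
  tools?: unknown[];
};

type Step = { input?: unknown; output?: unknown };

// Persist the request onto step.input BEFORE the LLM call, so it is not
// lost when the response later overwrites step.output.
function snapshotRequest(step: Step, req: AiRequestSnapshot): void {
  step.input = { ...req }; // shallow copy taken pre-call
}
```

With the snapshot taken pre-call, the inspector can show the request and the response side by side even after `step.output` is set.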
Pull request overview
This PR enhances run step inspection by persisting AI step request payloads on the server and improving the dashboard inspector UI to render AI/agent inputs and outputs in a more structured, side-by-side format.
Changes:
- Server: snapshot AI step request parameters onto `step.input` before executing the LiteLLM call.
- Dashboard: refactor `StepInspector` to show input/output side-by-side in NB/IDE modes and add shape-aware rendering blocks for AI responses and agent conversations.
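The agent-conversation detection mentioned above could be implemented roughly as follows. This is a hedged sketch: the field names `messages` and `iterations` come from the PR description, everything else is assumed:

```typescript
type AgentOutput = {
  messages: Array<{ role: string; content?: string }>;
  iterations: number;
  status?: string;
};

// Heuristic guard: treat an output as agent-shaped when it carries a
// messages array plus a numeric iterations counter.
function isAgentOutput(value: unknown): value is AgentOutput {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return Array.isArray(v.messages) && typeof v.iterations === 'number';
}
```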
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| server/src/flowforge_server/services/executor.py | Stores the AI request payload in step.input before the model call so it remains visible after step.output is set. |
| dashboard/src/app/(dashboard)/runs/[id]/page.tsx | Adds shape-aware UI blocks for AI request/response + agent conversation and updates NB/IDE layouts to a responsive two-column grid. |
Comment on lines +336 to +342
```ts
type AiOutput = {
  content?: string;
  model?: string;
  provider?: string;
  usage?: { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number; cost_usd?: number; latency_ms?: number };
  finish_reason?: string;
  tool_calls?: Array<{ id?: string; function?: { name?: string; arguments?: unknown } }>;
```
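Given a type like the `AiOutput` above, the AI-shape detection (content + usage + model) might be a guard along these lines. A sketch only, not the PR's actual check:

```typescript
// Minimal local copy of the fields the guard inspects.
type AiOutputShape = {
  content?: string;
  model?: string;
  usage?: { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number };
};

// Heuristic guard matching the PR description: an output is AI-shaped
// when content, usage and model are all present.
function isAiOutput(value: unknown): value is AiOutputShape {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.content === 'string'
    && typeof v.model === 'string'
    && typeof v.usage === 'object' && v.usage !== null;
}
```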
Comment on lines +446 to +454
```tsx
{toolCalls.map((tc, idx) => {
  const name = tc.function?.name ?? '(unnamed)';
  const args = tc.function?.arguments;
  const argsStr = typeof args === 'string' ? args : JSON.stringify(args ?? {}, null, 2);
  return (
    <div key={tc.id ?? idx} style={{ borderLeft: '2px solid var(--brand)', paddingLeft: 10 }}>
      <div className="mono" style={{ fontSize: 12, marginBottom: 4 }}><b>{name}</b></div>
      <CodeBlock language="json" code={argsStr} />
    </div>
```
`AIResponse.to_dict()` emits tool calls as `{id, name, arguments}`, not the OpenAI-wrapped `{function: {name, arguments}}`. The inspector was reading `tc.function.name` / `tc.function.arguments` and would always render "(unnamed)" with empty args. Switch to `tc.name` / `tc.arguments` with a fallback to the wrapped shape for safety.
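The suggested fix, reading the flat `{id, name, arguments}` shape first and keeping the wrapped shape as a fallback, could be sketched as:

```typescript
// Flat shape emitted by AIResponse.to_dict(): { id, name, arguments }.
// Wrapped OpenAI shape kept as a fallback: { function: { name, arguments } }.
type ToolCall = {
  id?: string;
  name?: string;
  arguments?: unknown;
  function?: { name?: string; arguments?: unknown };
};

function toolCallName(tc: ToolCall): string {
  return tc.name ?? tc.function?.name ?? '(unnamed)';
}

function toolCallArgs(tc: ToolCall): string {
  const args = tc.arguments ?? tc.function?.arguments;
  // String arguments pass through; objects are pretty-printed as JSON.
  return typeof args === 'string' ? args : JSON.stringify(args ?? {}, null, 2);
}
```

The helper names here are hypothetical; the point is the `tc.name ?? tc.function?.name` precedence order.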
Summary
- Snapshot the AI step request parameters (`model`, `messages`, `temperature`, `max_tokens`, `tools`, `tool_choice`, `use_cache`) onto `step.input` before the LiteLLM call, so the dashboard can show what was sent alongside the response. Previously this data was lost when `step.output` was overwritten with the LLM result.
- Refactor `StepInspector` to render input and output side-by-side in NB/IDE modes via a responsive `auto-fit, minmax(320px, 1fr)` grid (collapses to one column on narrow widths).
- Detect AI-shaped output (`content` + `usage` + `model`) and render a new AI Response sub-section with content (text or JSON), tool calls (per-call name + args), usage pills (model, tokens, cost, latency, finish_reason), and a collapsible raw JSON toggle instead of a single JSON dump.
- Detect agent-shaped output (`messages[]` + `iterations`) and render an Agent Conversation sub-section with status/iteration/token pills, final output, and a role-tagged message thread.
- Render AI input as a request card with `model` / `temp` / `max_tokens` / tool-count chips above a role-tagged message thread.
- Non-AI `step.run` outputs (e.g. the `plan_searches` case) still render as JSON.

Test plan
- Run a flow containing a `step.ai` step; confirm `steps.input` is now populated in the DB and appears in the run detail panel.
- Confirm a `step.run` step still renders raw JSON (no regression).