diff --git a/docs/AGENTS.md b/docs/AGENTS.md index 835e971cc0..2d05b31fe6 100644 --- a/docs/AGENTS.md +++ b/docs/AGENTS.md @@ -59,6 +59,7 @@ description: Agent instructions for AI assistants working on the Mux codebase Use `agent-browser` for web automation. Run `agent-browser --help` for all commands. Core workflow: + 1. `agent-browser open ` - Navigate to page 2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2) 3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs @@ -128,7 +129,7 @@ Mobile app tests live in `mobile/src/**/*.test.ts` and use Bun's built-in test r - Never use emoji characters as UI icons or status indicators; emoji rendering varies across platforms and fonts. - Prefer SVG icons (usually from `lucide-react`) or shared icon components under `src/browser/components/icons/`. - For tool call headers, use `ToolIcon` from `src/browser/components/tools/shared/ToolPrimitives.tsx`. -- If a tool/agent provides an emoji string (e.g., `status_set` or `displayStatus`), render via `EmojiIcon` (`src/browser/components/icons/EmojiIcon.tsx`) instead of rendering the emoji. +- If a tool/agent provides an emoji string (e.g., todo-derived status or `displayStatus`), render via `EmojiIcon` (`src/browser/components/icons/EmojiIcon.tsx`) instead of rendering the emoji. - If a new emoji appears in tool output, extend `EmojiIcon` to map it to an SVG icon. - Colors defined in `src/browser/styles/globals.css` (`:root @theme` block). Reference via CSS variables (e.g., `var(--color-plan-mode)`), never hardcode hex values. - For incrementing numeric UI (costs, timers, token counts, percentages), use semantic numeric typography utilities (`counter-nums` / `counter-nums-mono`) to prevent width jitter. @@ -229,9 +230,9 @@ Freely make breaking changes, and reorganize / cleanup IPC as needed. - E2E tests (tests/e2e) work with Radix but are slow (~2min startup); reserve for scenarios that truly need real Electron. 
- Only use `validateApiKeys()` in tests that actually make AI API calls. -## Tool: status_set +## Tool: todo_write -- Set status url to the Pull Request once opened +- Keep the TODO list current during multi-step work; sidebar progress is derived from it. ## GitHub diff --git a/docs/agents/index.mdx b/docs/agents/index.mdx index a620176d5f..9dc78e03ec 100644 --- a/docs/agents/index.mdx +++ b/docs/agents/index.mdx @@ -632,7 +632,6 @@ tools: - ask_user_question - todo_read - todo_write - - status_set - notify - analytics_query --- diff --git a/docs/agents/instruction-files.mdx b/docs/agents/instruction-files.mdx index a1716706ad..bd653c4d14 100644 --- a/docs/agents/instruction-files.mdx +++ b/docs/agents/instruction-files.mdx @@ -69,7 +69,7 @@ Be terse and to the point. ## Model: openai:.\*codex -Use status reporting tools every few minutes. +Keep the todo list current every few minutes while a task is in flight. ``` ### Tool Prompts @@ -92,12 +92,12 @@ Customize how the AI uses specific tools by appending instructions to their desc - Run `prettier --write` after editing files -## Tool: status_set +## Tool: todo_write -- Set status URL to the Pull Request once opened +- Keep the TODO list current during multi-step work; sidebar progress is derived from it. ``` -**Common tools** (varies by model/provider): `bash`, `file_read`, `file_edit_replace_string`, `file_edit_insert`, `propose_plan`, `ask_user_question`, `todo_write`, `todo_read`, `status_set`, `web_fetch`, `web_search`. +**Common tools** (varies by model/provider): `bash`, `file_read`, `file_edit_replace_string`, `file_edit_insert`, `propose_plan`, `ask_user_question`, `todo_write`, `todo_read`, `web_fetch`, `web_search`. 
 ## Practical layout
diff --git a/docs/config/notifications.mdx b/docs/config/notifications.mdx
index 4fe62b021a..893ff28f02 100644
--- a/docs/config/notifications.mdx
+++ b/docs/config/notifications.mdx
@@ -43,7 +43,7 @@ The recommended way to configure the `notify` tool is via a `Tool: notify` scope
 - Notify on CI failures or deployment issues
 - Notify when waiting for user input longer than 30 seconds
 - Do not notify for routine status updates
-- Use status_set for progress updates instead
+- Use `todo_write` for routine progress updates instead
 ```
 
 See [Instruction Files](/agents/instruction-files) for more on scoped instructions.
@@ -94,7 +94,7 @@ notify: {
   description:
     "Send a system notification to the user. Use this to alert the user about important events that require their attention, such as long-running task completion, errors requiring intervention, or questions. " +
     "Notifications appear as OS-native notifications (macOS Notification Center, Windows Toast, Linux). " +
-    "Infer whether to send notifications from user instructions. If no instructions provided, reserve notifications for major wins or blocking issues. Do not use for routine status updates (use status_set instead).",
+    "Infer whether to send notifications from user instructions. If no instructions provided, reserve notifications for major wins or blocking issues. Do not use for routine progress updates — keep the todo list current instead.",
   schema: z
     .object({
       title: z
diff --git a/docs/hooks/tools.mdx b/docs/hooks/tools.mdx
index 4c8559c3c7..1dffd488ce 100644
--- a/docs/hooks/tools.mdx
+++ b/docs/hooks/tools.mdx
@@ -574,17 +574,6 @@ If a value is too large for the environment, it may be omitted (not set). Mux al
-
-status_set (3)
-
-| Env var                  | JSON path | Type   | Description                                                                                                                                  |
-| ------------------------ | --------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------- |
-| `MUX_TOOL_INPUT_EMOJI`   | `emoji`   | string | A single emoji character representing the current activity                                                                                   |
-| `MUX_TOOL_INPUT_MESSAGE` | `message` | string | A brief description of the current activity (auto-truncated to 60 chars with ellipsis if needed)                                             |
-| `MUX_TOOL_INPUT_URL`     | `url`     | string | Optional URL to external resource with more details (e.g., Pull Request URL). The URL persists and is displayed to the user for easy access. |
-
-
-
switch_agent (3) diff --git a/src/browser/features/Tools/ProposePlan/ProposePlanToolCall.stories.tsx b/src/browser/features/Tools/ProposePlan/ProposePlanToolCall.stories.tsx index 643877b558..4d1d215bcc 100644 --- a/src/browser/features/Tools/ProposePlan/ProposePlanToolCall.stories.tsx +++ b/src/browser/features/Tools/ProposePlan/ProposePlanToolCall.stories.tsx @@ -6,12 +6,9 @@ import { createUserMessage, createAssistantMessage, createProposePlanTool, - createStatusTool, + createTodoWriteTool, } from "@/browser/stories/mockFactory"; -import { - PLAN_AUTO_ROUTING_STATUS_EMOJI, - PLAN_AUTO_ROUTING_STATUS_MESSAGE, -} from "@/common/constants/planAutoRoutingStatus"; +import { PLAN_AUTO_ROUTING_STATUS_MESSAGE } from "@/common/constants/planAutoRoutingStatus"; const meta = { ...appMeta, title: "App/Chat/Tools/ProposePlan" }; export default meta; @@ -221,13 +218,7 @@ export const ProposePlanAutoRoutingDecisionGap: AppStory = { createAssistantMessage("msg-3", "Selecting the right executor for this plan.", { historySequence: 3, timestamp: STABLE_TIMESTAMP - 220000, - toolCalls: [ - createStatusTool( - "call-status-1", - PLAN_AUTO_ROUTING_STATUS_EMOJI, - PLAN_AUTO_ROUTING_STATUS_MESSAGE - ), - ], + toolCalls: [createTodoWriteTool("call-status-1", PLAN_AUTO_ROUTING_STATUS_MESSAGE)], }), ], }) diff --git a/src/browser/features/Tools/Shared/getToolComponent.ts b/src/browser/features/Tools/Shared/getToolComponent.ts index 47ee7926dc..83863132ad 100644 --- a/src/browser/features/Tools/Shared/getToolComponent.ts +++ b/src/browser/features/Tools/Shared/getToolComponent.ts @@ -56,11 +56,17 @@ interface ToolRegistryEntry { * Registry mapping tool names to their components and validation schemas. * Adding a new tool: add one line here. * - * Note: Some tools (ask_user_question, propose_plan, todo_write, status_set) require + * Note: Some tools (ask_user_question, propose_plan, todo_write) require * props like workspaceId/toolCallId that aren't available in nested context. 
This is * fine because the backend excludes these from code_execution sandbox (see EXCLUDED_TOOLS * in src/node/services/ptc/toolBridge.ts). They can never appear in nested tool calls. */ +const legacyStatusSetSchema = z.object({ + emoji: z.string(), + message: z.string(), + url: z.string().url().optional().nullable(), +}); + const TOOL_REGISTRY: Record = { bash: { component: BashToolCall, schema: TOOL_DEFINITIONS.bash.schema }, file_read: { component: FileReadToolCall, schema: TOOL_DEFINITIONS.file_read.schema }, @@ -120,7 +126,8 @@ const TOOL_REGISTRY: Record = { schema: TOOL_DEFINITIONS.propose_plan.schema, }, todo_write: { component: TodoToolCall, schema: TOOL_DEFINITIONS.todo_write.schema }, - status_set: { component: StatusSetToolCall, schema: TOOL_DEFINITIONS.status_set.schema }, + // Legacy-only transcript renderer for historical status_set calls. + status_set: { component: StatusSetToolCall, schema: legacyStatusSetSchema }, switch_agent: { component: SwitchAgentToolCall, schema: TOOL_DEFINITIONS.switch_agent.schema, diff --git a/src/browser/features/Tools/StatusSet/StatusSetToolCall.stories.tsx b/src/browser/features/Tools/StatusSet/StatusSetToolCall.stories.tsx deleted file mode 100644 index 6be5a698c1..0000000000 --- a/src/browser/features/Tools/StatusSet/StatusSetToolCall.stories.tsx +++ /dev/null @@ -1,31 +0,0 @@ -import type { Meta, StoryObj } from "@storybook/react-vite"; -import { StatusSetToolCall } from "@/browser/features/Tools/StatusSetToolCall"; -import { lightweightMeta } from "@/browser/stories/meta.js"; - -const meta = { - ...lightweightMeta, - title: "App/Chat/Tools/StatusSet", - component: StatusSetToolCall, -} satisfies Meta; - -export default meta; - -type Story = StoryObj; - -/** Chat with agent status indicator */ -export const WithAgentStatus: Story = { - args: { - args: { - emoji: "πŸš€", - message: "PR #1234 waiting for CI", - url: "https://github.com/example/repo/pull/1234", - }, - result: { - success: true, - emoji: "πŸš€", - 
message: "PR #1234 waiting for CI", - url: "https://github.com/example/repo/pull/1234", - }, - status: "completed", - }, -}; diff --git a/src/browser/features/Tools/TodoToolCall.tsx b/src/browser/features/Tools/TodoToolCall.tsx index 0138870b6c..e9e31d3863 100644 --- a/src/browser/features/Tools/TodoToolCall.tsx +++ b/src/browser/features/Tools/TodoToolCall.tsx @@ -1,5 +1,8 @@ import React from "react"; +import { EmojiIcon } from "@/browser/components/icons/EmojiIcon/EmojiIcon"; +import { TodoList } from "@/browser/components/TodoList/TodoList"; import type { TodoWriteToolArgs, TodoWriteToolResult } from "@/common/types/tools"; +import { deriveTodoStatus } from "@/common/utils/todoList"; import { ToolContainer, ToolHeader, @@ -9,7 +12,6 @@ import { ToolIcon, } from "./Shared/ToolPrimitives"; import { useToolExpansion, getStatusDisplay, type ToolStatus } from "./Shared/toolUtils"; -import { TodoList } from "@/browser/components/TodoList/TodoList"; interface TodoToolCallProps { args: TodoWriteToolArgs; @@ -24,12 +26,27 @@ export const TodoToolCall: React.FC = ({ }) => { const { expanded, toggleExpanded } = useToolExpansion(false); // Collapsed by default const statusDisplay = getStatusDisplay(status); + const todoStatusPreview = deriveTodoStatus(args.todos); + const fallbackPreview = + args.todos.length === 0 + ? "Cleared todo list" + : `${args.todos.length} item${args.todos.length === 1 ? "" : "s"}`; return ( β–Ά + + {todoStatusPreview ? 
( + <> + + {todoStatusPreview.message} + + ) : ( + {fallbackPreview} + )} + {statusDisplay} diff --git a/src/browser/stores/WorkspaceStore.test.ts b/src/browser/stores/WorkspaceStore.test.ts index ec32e9bc03..7b88319e63 100644 --- a/src/browser/stores/WorkspaceStore.test.ts +++ b/src/browser/stores/WorkspaceStore.test.ts @@ -8,6 +8,7 @@ import { getAutoCompactionThresholdKey, getAutoRetryKey, getPinnedTodoExpandedKey, + getStatusStateKey, } from "@/common/constants/storage"; import type { TodoItem } from "@/common/types/tools"; import { WorkspaceStore } from "./WorkspaceStore"; @@ -1783,7 +1784,8 @@ describe("WorkspaceStore", () => { streaming: true, lastModel: "claude-sonnet-4", lastThinkingLevel: "high", - agentStatus: { emoji: "πŸ”§", message: "Running checks", url: "https://example.com" }, + todoStatus: { emoji: "πŸ”„", message: "Run checks" }, + hasTodos: true, }; // Recreate the store so the first activity.list call uses this test snapshot. @@ -1809,10 +1811,122 @@ describe("WorkspaceStore", () => { expect(state.canInterrupt).toBe(true); expect(state.currentModel).toBe(activitySnapshot.lastModel); expect(state.currentThinkingLevel).toBe(activitySnapshot.lastThinkingLevel); - expect(state.agentStatus).toEqual(activitySnapshot.agentStatus ?? undefined); + expect(state.agentStatus).toEqual(activitySnapshot.todoStatus ?? 
undefined);
     expect(state.recencyTimestamp).toBe(activitySnapshot.recency);
   });
 
+  it("falls back to persisted activity todoStatus for active workspaces when replayed todos are absent", async () => {
+    const workspaceId = "active-activity-todo-fallback";
+    const activitySnapshot: WorkspaceActivitySnapshot = {
+      recency: new Date("2024-01-04T09:00:00.000Z").getTime(),
+      streaming: true,
+      lastModel: "claude-sonnet-4",
+      lastThinkingLevel: null,
+      todoStatus: { emoji: "🔄", message: "Persisted todo snapshot" },
+      hasTodos: true,
+    };
+
+    store.dispose();
+    store = new WorkspaceStore(mockOnModelUsed);
+    mockActivityList.mockResolvedValue({ [workspaceId]: activitySnapshot });
+    // eslint-disable-next-line @typescript-eslint/no-unsafe-argument, @typescript-eslint/no-explicit-any
+    store.setClient(mockClient as any);
+    await new Promise((resolve) => setTimeout(resolve, 0));
+
+    createAndAddWorkspace(store, workspaceId);
+    const state = store.getWorkspaceState(workspaceId);
+    expect(state.agentStatus).toEqual(activitySnapshot.todoStatus ??
undefined);
+  });
+
+  it("derives active workspace status from the current todo list", () => {
+    const workspaceId = "active-todo-status-workspace";
+    createAndAddWorkspace(store, workspaceId);
+    seedPinnedTodos(store, workspaceId, [
+      { content: "Run typecheck", status: "in_progress" },
+      { content: "Add regression test", status: "pending" },
+    ]);
+
+    const state = store.getWorkspaceState(workspaceId);
+    expect(state.agentStatus).toEqual({ emoji: "🔄", message: "Run typecheck" });
+  });
+
+  it("prefers todo-derived activity status for inactive workspaces", async () => {
+    const workspaceId = "activity-fallback-todo-status-workspace";
+    const activitySnapshot: WorkspaceActivitySnapshot = {
+      recency: new Date("2024-01-04T12:00:00.000Z").getTime(),
+      streaming: true,
+      lastModel: "claude-sonnet-4",
+      lastThinkingLevel: "high",
+      todoStatus: { emoji: "🔄", message: "Run typecheck" },
+      hasTodos: true,
+    };
+
+    store.dispose();
+    store = new WorkspaceStore(mockOnModelUsed);
+    mockActivityList.mockResolvedValue({ [workspaceId]: activitySnapshot });
+    // eslint-disable-next-line @typescript-eslint/no-unsafe-argument, @typescript-eslint/no-explicit-any
+    store.setClient(mockClient as any);
+    await new Promise((resolve) => setTimeout(resolve, 0));
+
+    createAndAddWorkspace(store, workspaceId, { createdAt: "2020-01-01T00:00:00.000Z" }, false);
+
+    const state = store.getWorkspaceState(workspaceId);
+    expect(state.agentStatus).toEqual(activitySnapshot.todoStatus ??
undefined);
+  });
+
+  it("prefers transient displayStatus over todo-derived status for inactive workspaces", async () => {
+    const workspaceId = "activity-fallback-display-status-workspace";
+    const activitySnapshot: WorkspaceActivitySnapshot = {
+      recency: new Date("2024-01-04T15:00:00.000Z").getTime(),
+      streaming: false,
+      lastModel: "claude-sonnet-4",
+      lastThinkingLevel: null,
+      displayStatus: { emoji: "🤔", message: "Deciding execution strategy" },
+      todoStatus: { emoji: "🔄", message: "Run typecheck" },
+      hasTodos: true,
+    };
+
+    store.dispose();
+    store = new WorkspaceStore(mockOnModelUsed);
+    mockActivityList.mockResolvedValue({ [workspaceId]: activitySnapshot });
+    // eslint-disable-next-line @typescript-eslint/no-unsafe-argument, @typescript-eslint/no-explicit-any
+    store.setClient(mockClient as any);
+    await new Promise((resolve) => setTimeout(resolve, 0));
+
+    createAndAddWorkspace(store, workspaceId, { createdAt: "2020-01-01T00:00:00.000Z" }, false);
+
+    const state = store.getWorkspaceState(workspaceId);
+    expect(state.agentStatus).toEqual(activitySnapshot.displayStatus ??
undefined);
+  });
+
+  it("suppresses stale legacy status fallback when activity says the todo list is empty", async () => {
+    const workspaceId = "activity-fallback-empty-todo-status";
+    const activitySnapshot: WorkspaceActivitySnapshot = {
+      recency: new Date("2024-01-04T18:00:00.000Z").getTime(),
+      streaming: false,
+      lastModel: "claude-sonnet-4",
+      lastThinkingLevel: null,
+      hasTodos: false,
+    };
+
+    localStorageBacking.set(
+      getStatusStateKey(workspaceId),
+      JSON.stringify({ emoji: "🔍", message: "Old persisted status" })
+    );
+
+    store.dispose();
+    store = new WorkspaceStore(mockOnModelUsed);
+    mockActivityList.mockResolvedValue({ [workspaceId]: activitySnapshot });
+    // eslint-disable-next-line @typescript-eslint/no-unsafe-argument, @typescript-eslint/no-explicit-any
+    store.setClient(mockClient as any);
+    await new Promise((resolve) => setTimeout(resolve, 0));
+
+    createAndAddWorkspace(store, workspaceId, { createdAt: "2020-01-01T00:00:00.000Z" }, false);
+
+    const state = store.getWorkspaceState(workspaceId);
+    expect(state.agentStatus).toBeUndefined();
+  });
+
   it("fires response-complete callback when a background workspace stops streaming", async () => {
     const activeWorkspaceId = "active-workspace";
     const backgroundWorkspaceId = "background-workspace";
diff --git a/src/browser/stores/WorkspaceStore.ts b/src/browser/stores/WorkspaceStore.ts
index 9739386d73..88f8aaf270 100644
--- a/src/browser/stores/WorkspaceStore.ts
+++ b/src/browser/stores/WorkspaceStore.ts
@@ -44,6 +44,7 @@ import {
 } from "@/common/types/stream";
 import { MapStore } from "./MapStore";
 import { createDisplayUsage, recomputeUsageCosts } from "@/common/utils/tokens/displayUsage";
+import { deriveTodoStatus } from "@/common/utils/todoList";
 import { getModelStats } from "@/common/utils/tokens/modelStats";
 import { resolveModelForMetadata } from "@/common/utils/providers/modelEntries";
 import { computeProvidersConfigFingerprint } from "@/common/utils/providers/configFingerprint";
@@
-317,8 +318,8 @@ function collapsePinnedTodoOnStreamStop(workspaceId: string, hasTodos: boolean): } function areAgentStatusesEqual( - a: WorkspaceActivitySnapshot["agentStatus"] | undefined, - b: WorkspaceActivitySnapshot["agentStatus"] | undefined + a: { emoji: string; message: string; url?: string } | undefined | null, + b: { emoji: string; message: string; url?: string } | undefined | null ): boolean { if (a === b) { return true; @@ -1566,11 +1567,14 @@ export class WorkspaceStore { !canInterrupt; const isHydratingTranscript = isActiveWorkspace && transient.isHydratingTranscript && !transient.caughtUp; - const agentStatus = useAggregatorState - ? aggregator.getAgentStatus() - : activity - ? (activity.agentStatus ?? undefined) - : aggregator.getAgentStatus(); + const aggregatorTodos = aggregator.getCurrentTodos(); + const displayStatus = useAggregatorState ? undefined : (activity?.displayStatus ?? undefined); + const todoStatus = useAggregatorState + ? (deriveTodoStatus(aggregatorTodos) ?? activity?.todoStatus ?? undefined) + : (activity?.todoStatus ?? + (activity?.hasTodos === false ? undefined : deriveTodoStatus(aggregatorTodos))); + const fallbackAgentStatus = useAggregatorState ? aggregator.getAgentStatus() : undefined; + const agentStatus = displayStatus ?? todoStatus ?? 
fallbackAgentStatus; // Live streaming stats const activeStreamMessageId = aggregator.getActiveStreamMessageId(); @@ -1597,7 +1601,7 @@ export class WorkspaceStore { currentModel, currentThinkingLevel, recencyTimestamp, - todos: aggregator.getCurrentTodos(), + todos: aggregatorTodos, loadedSkills: aggregator.getLoadedSkills(), skillLoadErrors: aggregator.getSkillLoadErrors(), lastAbortReason: aggregator.getLastAbortReason(), @@ -2275,7 +2279,9 @@ export class WorkspaceStore { previous?.lastModel !== snapshot?.lastModel || previous?.lastThinkingLevel !== snapshot?.lastThinkingLevel || previous?.recency !== snapshot?.recency || - !areAgentStatusesEqual(previous?.agentStatus, snapshot?.agentStatus); + previous?.hasTodos !== snapshot?.hasTodos || + !areAgentStatusesEqual(previous?.displayStatus, snapshot?.displayStatus) || + !areAgentStatusesEqual(previous?.todoStatus, snapshot?.todoStatus); if (!changed) { return; diff --git a/src/browser/stories/App.demo.stories.tsx b/src/browser/stories/App.demo.stories.tsx index 09b97e4a28..348aad776b 100644 --- a/src/browser/stories/App.demo.stories.tsx +++ b/src/browser/stories/App.demo.stories.tsx @@ -14,7 +14,7 @@ import { createFileReadTool, createFileEditTool, createTerminalTool, - createStatusTool, + createTodoWriteTool, createStaticChatHandler, createStreamingChatHandler, type GitStatusFixture, @@ -41,9 +41,9 @@ export default { * - SSH and local runtime badges * - Active workspace with full chat history * - Streaming workspace showing working state - * - All tool types: read_file, file_edit, terminal, status_set + * - All tool types: read_file, file_edit, terminal, todo_write * - Reasoning blocks - * - Agent status indicator + * - Todo-derived workspace status indicator */ export const Comprehensive: AppStory = { render: () => ( @@ -168,12 +168,11 @@ export const Comprehensive: AppStory = { timestamp: STABLE_TIMESTAMP - 200000, reasoning: "All tests pass. 
Time to create a PR for review.", toolCalls: [ - createStatusTool( - "call-4", - "πŸš€", - "PR #1234 waiting for CI", - "https://github.com/example/repo/pull/1234" - ), + createTodoWriteTool("call-4", [ + { content: "Updated auth endpoint and opened PR #1234", status: "completed" }, + { content: "PR #1234 waiting for CI", status: "in_progress" }, + { content: "Respond to review feedback", status: "pending" }, + ]), ], }), ]; diff --git a/src/browser/stories/App.projectSettings.stories.tsx b/src/browser/stories/App.projectSettings.stories.tsx index a60a1cf6aa..043c9ee123 100644 --- a/src/browser/stories/App.projectSettings.stories.tsx +++ b/src/browser/stories/App.projectSettings.stories.tsx @@ -38,7 +38,6 @@ const MOCK_TOOLS = [ "web_fetch", "todo_write", "todo_read", - "status_set", ]; const POSTHOG_TOOLS = [ @@ -504,8 +503,8 @@ export const ProjectSettingsWithToolAllowlist: AppStory = { const body = within(canvasElement.ownerDocument.body); await body.findByText("mux"); - // Should show "3/8" tools indicator (3 allowed out of 8 total) - await body.findByText(/3\/8/); + // Should show "3/N" tools indicator where N tracks the mocked tool catalog. 
+ await body.findByText(new RegExp(`3/${MOCK_TOOLS.length}`)); }, }; diff --git a/src/browser/stories/App.readmeScreenshots.stories.tsx b/src/browser/stories/App.readmeScreenshots.stories.tsx index 205a14c8c6..841fb99f6d 100644 --- a/src/browser/stories/App.readmeScreenshots.stories.tsx +++ b/src/browser/stories/App.readmeScreenshots.stories.tsx @@ -16,7 +16,7 @@ import { createUserMessage, createAssistantMessage, createProposePlanTool, - createStatusTool, + createTodoWriteTool, createFileReadTool, createFileEditTool, createBashTool, @@ -42,10 +42,11 @@ import { getModelKey, getProjectScopeId, getRightSidebarLayoutKey, - getStatusStateKey, getThinkingLevelKey, } from "@/common/constants/storage"; import { DEFAULT_MODEL } from "@/common/constants/knownModels"; +import type { TodoItem } from "@/common/types/tools"; +import { deriveTodoStatus } from "@/common/utils/todoList"; export default { ...appMeta, @@ -81,6 +82,25 @@ const IPHONE_17_PRO_MAX = { height: 956, } as const; +function buildStoryTodos(message: string, status: TodoItem["status"] = "in_progress"): TodoItem[] { + switch (status) { + case "completed": + return [{ content: message, status: "completed" }]; + case "pending": + return [ + { content: "Captured current context", status: "completed" }, + { content: message, status: "pending" }, + ]; + case "in_progress": + default: + return [ + { content: "Captured current context", status: "completed" }, + { content: message, status: "in_progress" }, + { content: "Share a final update", status: "pending" }, + ]; + } +} + function createMultiModelSessionUsage(totalUsd: number): MockSessionUsage { // Split cost into model rows to make the Costs tab look realistic (cached + cacheCreate present). 
const primary = totalUsd * 0.62; @@ -450,12 +470,7 @@ index 0000000..def5678 "bun test -- layout", "PASS tests/ui/layout.test.tsx (4 tests)\n\nTests: 4 passed, 4 total\nTime: 0.42s" ), - createStatusTool( - "call-status-1", - "πŸš€", - "PR ready", - "https://github.com/coder/mux/pull/2035" - ), + createTodoWriteTool("call-status-1", "PR ready", "completed"), ], }), ]), @@ -489,8 +504,8 @@ index 0000000..def5678 }; // README: docs/img/agent-status.webp -// This story keeps the left sidebar expanded and seeds varied status_set tool calls -// so workspace rows show realistic in-progress agent activity. +// This story keeps the left sidebar expanded and seeds varied todo lists +// so workspace rows show realistic todo-derived agent activity. export const AgentStatusSidebar: AppStory = { render: () => ( { + const recency = NOW - (index + 1) * 60_000; + return [ + fixture.id, + { + recency, + streaming: false, + lastModel: null, + lastThinkingLevel: null, + todoStatus: deriveTodoStatus(fixture.todos) ?? 
null, + hasTodos: fixture.todos.length > 0, + }, + ]; + }) + ); const workspaces = workspaceFixtures.map((fixture, index) => { const createdAt = new Date(NOW - (index + 1) * 60_000).toISOString(); @@ -649,25 +655,18 @@ export const AgentStatusSidebar: AppStory = { historySequence: 2, timestamp: STABLE_TIMESTAMP - 110_000, toolCalls: [ - createStatusTool( + createTodoWriteTool( "call-1", - "πŸ”§", - "Regenerating README screenshots and validating Chromatic diffs", - primaryWorkspace.statusUrl + buildStoryTodos( + "Regenerating README screenshots and validating Chromatic diffs" + ) ), ], }), createAssistantMessage("msg-3", primaryWorkspace.assistantText, { historySequence: 3, timestamp: STABLE_TIMESTAMP - 100_000, - toolCalls: [ - createStatusTool( - "call-2", - primaryWorkspace.statusEmoji, - primaryWorkspace.statusMessage, - primaryWorkspace.statusUrl - ), - ], + toolCalls: [createTodoWriteTool("call-2", primaryWorkspace.todos)], }), ]), ], @@ -678,14 +677,7 @@ export const AgentStatusSidebar: AppStory = { createAssistantMessage("msg-1", fixture.assistantText, { historySequence: 1, timestamp: STABLE_TIMESTAMP - (95_000 - index * 4_000), - toolCalls: [ - createStatusTool( - "call-1", - fixture.statusEmoji, - fixture.statusMessage, - fixture.statusUrl - ), - ], + toolCalls: [createTodoWriteTool("call-1", fixture.todos)], }), ]), ] as const; @@ -699,6 +691,7 @@ export const AgentStatusSidebar: AppStory = { return createMockORPCClient({ projects: groupWorkspacesByProject(workspaces), workspaces, + workspaceActivitySnapshots, onChat: createOnChatAdapter(chatHandlers), }); }} @@ -791,9 +784,8 @@ export const GitStatusPopover: AppStory = { historySequence: 2, timestamp: STABLE_TIMESTAMP - 130_000, toolCalls: [ - createStatusTool( + createTodoWriteTool( "call-status-1", - "πŸ”", "Inspecting local vs origin commits to prepare a safe rebase plan" ), ], @@ -966,7 +958,7 @@ graph TD historySequence: 5, timestamp: STABLE_TIMESTAMP - 5_000, toolCalls: [ - 
createStatusTool("call-status-1", "πŸ“", "Building README screenshot stories"), + createTodoWriteTool("call-status-1", "Building README screenshot stories"), ], }), ]), @@ -1152,12 +1144,7 @@ export const CostsTabRich: AppStory = { historySequence: 6, timestamp: STABLE_TIMESTAMP - 5_000, toolCalls: [ - createStatusTool( - "call-status-1", - "πŸš€", - "PR #427 opened", - "https://github.com/mux/mux/pull/427" - ), + createTodoWriteTool("call-status-1", "PR #427 opened", "completed"), ], }), ]), @@ -1218,9 +1205,8 @@ export const ContextManagementDialog: AppStory = { totalTokens: 118_000, }, toolCalls: [ - createStatusTool( + createTodoWriteTool( "call-status-1", - "πŸ”§", "Reviewing context usage and idle compaction settings" ), ], @@ -1307,9 +1293,8 @@ export const MobileServerMode: AppStory = { historySequence: 2, timestamp: STABLE_TIMESTAMP - 30_000, toolCalls: [ - createStatusTool( + createTodoWriteTool( "call-status-1", - "πŸ”§", "Adapting the layout for mobile-sized viewport constraints" ), ], @@ -1335,7 +1320,7 @@ export const MobileServerMode: AppStory = { // README: docs/img/orchestrate-agents.webp // Parent workspace is selected in plan mode while six running child workspaces -// show nested status indicators in the expanded left sidebar. +// show nested todo-derived progress in the expanded left sidebar. export const OrchestrateAgents: AppStory = { // Override the module-level 1900px decorator so the app itself renders at 1200px, // matching the narrower capture viewport for a tighter orchestrator screenshot. 
@@ -1364,8 +1349,7 @@ export const OrchestrateAgents: AppStory = { name: "auth-middleware", agentType: "exec" as const, title: "Implement auth middleware", - statusEmoji: "πŸ”§", - statusMessage: "Implementing auth middleware", + todos: buildStoryTodos("Implementing auth middleware"), assistantMessage: "Wiring auth middleware into each service entrypoint.", }, { @@ -1373,8 +1357,7 @@ export const OrchestrateAgents: AppStory = { name: "token-service", agentType: "exec" as const, title: "Build token refresh service", - statusEmoji: "πŸ”", - statusMessage: "Reading token validation logic", + todos: buildStoryTodos("Reading token validation logic"), assistantMessage: "Auditing refresh token validation before implementing rotation.", }, { @@ -1382,8 +1365,7 @@ export const OrchestrateAgents: AppStory = { name: "rbac-policies", agentType: "exec" as const, title: "Add RBAC policy engine", - statusEmoji: "πŸ“", - statusMessage: "Writing policy evaluation tests", + todos: buildStoryTodos("Writing policy evaluation tests"), assistantMessage: "Building RBAC fixtures and policy matching assertions.", }, { @@ -1391,8 +1373,7 @@ export const OrchestrateAgents: AppStory = { name: "session-store", agentType: "exec" as const, title: "Migrate session storage to Redis", - statusEmoji: "πŸš€", - statusMessage: "Running integration tests", + todos: buildStoryTodos("Running integration tests"), assistantMessage: "Running Redis-backed session integration coverage now.", }, { @@ -1400,8 +1381,7 @@ export const OrchestrateAgents: AppStory = { name: "api-gateway", agentType: "exec" as const, title: "Configure API gateway routes", - statusEmoji: "πŸ”§", - statusMessage: "Wiring up rate limiting", + todos: buildStoryTodos("Wiring up rate limiting"), assistantMessage: "Updating gateway route config with auth + throttling guards.", }, { @@ -1409,8 +1389,7 @@ export const OrchestrateAgents: AppStory = { name: "audit-logging", agentType: "explore" as const, title: "Investigate audit log 
schema", - statusEmoji: "πŸ”", - statusMessage: "Reviewing existing log entries", + todos: buildStoryTodos("Reviewing existing log entries"), assistantMessage: "Inspecting current audit log rows to document schema constraints.", }, ]; @@ -1462,6 +1441,23 @@ Configure gateway auth guards, rate limits, and protected route wiring. Document and validate audit log schema requirements before rollout. `; + const workspaceActivitySnapshots = Object.fromEntries( + subtaskFixtures.map((fixture, index) => { + const recency = NOW - (index + 1) * 2_000; + return [ + fixture.id, + { + recency, + streaming: false, + lastModel: null, + lastThinkingLevel: null, + todoStatus: deriveTodoStatus(fixture.todos) ?? null, + hasTodos: fixture.todos.length > 0, + }, + ]; + }) + ); + const chatHandlers = new Map>([ [ workspaceId, @@ -1495,13 +1491,7 @@ Document and validate audit log schema requirements before rollout. createAssistantMessage(`msg-sub-${index + 1}`, fixture.assistantMessage, { historySequence: 1, timestamp: STABLE_TIMESTAMP - 45_000 + index * 1_000, - toolCalls: [ - createStatusTool( - `call-status-sub-${index + 1}`, - fixture.statusEmoji, - fixture.statusMessage - ), - ], + toolCalls: [createTodoWriteTool(`call-status-sub-${index + 1}`, fixture.todos)], }), ]), ] as const @@ -1511,6 +1501,7 @@ Document and validate audit log schema requirements before rollout. 
return createMockORPCClient({ projects: groupWorkspacesByProject(workspaces), workspaces, + workspaceActivitySnapshots, onChat: createOnChatAdapter(chatHandlers), }); }} diff --git a/src/browser/stories/mockFactory.ts b/src/browser/stories/mockFactory.ts index 912db5b50e..b8ce11df33 100644 --- a/src/browser/stories/mockFactory.ts +++ b/src/browser/stories/mockFactory.ts @@ -10,6 +10,7 @@ import type { ProjectConfig } from "@/node/config"; import type { FrontendWorkspaceMetadata } from "@/common/types/workspace"; import type { WorkspaceChatMessage, ChatMuxMessage } from "@/common/orpc/types"; +import type { TodoItem } from "@/common/types/tools"; import type { MuxMessageMetadata, MuxTextPart, @@ -383,19 +384,21 @@ export function createTerminalTool( }; } -export function createStatusTool( +export function createTodoWriteTool( toolCallId: string, - emoji: string, - message: string, - url?: string + todosOrMessage: TodoItem[] | string, + status: TodoItem["status"] = "in_progress" ): MuxPart { + const todos = + typeof todosOrMessage === "string" ? 
[{ content: todosOrMessage, status }] : todosOrMessage; + return { type: "dynamic-tool", toolCallId, - toolName: "status_set", + toolName: "todo_write", state: "output-available", - input: { emoji, message, url }, - output: { success: true, emoji, message, url }, + input: { todos }, + output: { success: true, count: todos.length }, }; } diff --git a/src/browser/stories/mocks/orpc.ts b/src/browser/stories/mocks/orpc.ts index face349ea6..29f295d337 100644 --- a/src/browser/stories/mocks/orpc.ts +++ b/src/browser/stories/mocks/orpc.ts @@ -9,7 +9,10 @@ import type { AgentDefinitionPackage, } from "@/common/types/agentDefinition"; import type { AgentSkillDescriptor, AgentSkillIssue } from "@/common/types/agentSkill"; -import type { FrontendWorkspaceMetadata } from "@/common/types/workspace"; +import type { + FrontendWorkspaceMetadata, + WorkspaceActivitySnapshot, +} from "@/common/types/workspace"; import type { ProjectConfig } from "@/node/config"; import { DEFAULT_LAYOUT_PRESETS_CONFIG, @@ -115,6 +118,8 @@ export interface MockORPCClientOptions { workspaces?: FrontendWorkspaceMetadata[]; /** Pre-seeded multi-project git status rows keyed by workspace ID. */ projectGitStatusesByWorkspace?: Map; + /** Pre-seeded workspace activity snapshots for sidebar status/streaming stories. 
*/ + workspaceActivitySnapshots?: Record; /** Initial task settings for config.getConfig (e.g., Settings → Tasks section) */ taskSettings?: Partial; /** Initial unified AI defaults for agents (plan/exec/compact + subagents) */ @@ -315,6 +320,7 @@ export function createMockORPCClient(options: MockORPCClientOptions = {}): APICl projects: providedProjects = new Map(), workspaces: inputWorkspaces = [], projectGitStatusesByWorkspace = new Map(), + workspaceActivitySnapshots = {}, onChat, executeBash, providersConfig = { anthropic: { apiKeySet: true, isEnabled: true, isConfigured: true } }, @@ -1549,7 +1555,7 @@ export function createMockORPCClient(options: MockORPCClientOptions = {}): APICl await new Promise(() => undefined); }, activity: { - list: () => Promise.resolve({}), + list: () => Promise.resolve(workspaceActivitySnapshots), subscribe: async function* () { yield* []; await new Promise(() => undefined); diff --git a/src/common/constants/storage.ts b/src/common/constants/storage.ts index bb76134fb7..c4289a76ed 100644 --- a/src/common/constants/storage.ts +++ b/src/common/constants/storage.ts @@ -478,8 +478,9 @@ export function getFileTreeExpandStateKey(workspaceId: string): string { export const REVIEW_FILE_TREE_VIEW_MODE_KEY = "reviewFileTreeViewMode"; /** - * Get the localStorage key for persisted agent status for a workspace + * Get the localStorage key for persisted legacy agent status for a workspace. * Stores the most recent successful status_set payload (emoji, message, url) + * so historical status rows and older sessions can still be reconstructed. 
* Format: "statusState:{workspaceId}" */ diff --git a/src/common/orpc/schemas/workspace.ts b/src/common/orpc/schemas/workspace.ts index d4cf5b0bf9..c78ed9303b 100644 --- a/src/common/orpc/schemas/workspace.ts +++ b/src/common/orpc/schemas/workspace.ts @@ -171,9 +171,13 @@ export const WorkspaceActivitySnapshotSchema = z.object({ lastThinkingLevel: ThinkingLevelSchema.nullable().meta({ description: "Last thinking/reasoning level used in this workspace", }), - agentStatus: WorkspaceAgentStatusSchema.nullable().optional().meta({ + displayStatus: WorkspaceAgentStatusSchema.nullable().optional().meta({ description: - "Most recent status_set value for this workspace (used to surface background progress in sidebar).", + "Transient non-todo status for system-driven background progress (for example executor routing).", + }), + todoStatus: WorkspaceAgentStatusSchema.nullable().optional().meta({ + description: + "Status derived from the current todo list (preferred background progress surface in the sidebar).", }), hasTodos: z.boolean().optional().meta({ description: "Whether the workspace still had todos when streaming last stopped", diff --git a/src/common/types/tools.ts b/src/common/types/tools.ts index ab6d596bdf..f715f0df4c 100644 --- a/src/common/types/tools.ts +++ b/src/common/types/tools.ts @@ -314,8 +314,11 @@ export interface TodoWriteToolResult { count: number; } -// Status Set Tool Types — derived from schema (avoid drift) -export type StatusSetToolArgs = z.infer; +export interface StatusSetToolArgs { + emoji: string; + message: string; + url?: string | null; +} // Bash Output Tool Types — derived from schema (avoid drift) export type BashOutputToolArgs = z.infer; diff --git a/src/common/utils/todoList.ts b/src/common/utils/todoList.ts index 580d141681..8bdd31955c 100644 --- a/src/common/utils/todoList.ts +++ b/src/common/utils/todoList.ts @@ -6,6 +6,11 @@ interface TodoLikeItem { status: TodoLikeStatus; } +export interface TodoStatusSummary { + emoji: "✓" | 
"🔄" | "○"; + message: string; +} + export function renderTodoItemsAsMarkdownList(todos: TodoItem[]): string { return todos .map((todo) => { @@ -16,6 +21,32 @@ export function renderTodoItemsAsMarkdownList(todos: TodoItem[]): string { .join("\n"); } +/** + * Sidebar and landing-card status should reflect the most actionable todo item, + * so we surface in-progress work first, then the next pending task, and finally + * the most recent completion while the finished list is still visible. + */ +export function deriveTodoStatus(todos: readonly TodoItem[]): TodoStatusSummary | undefined { + const inProgressTodo = todos.find((todo) => todo.status === "in_progress"); + if (inProgressTodo) { + return { emoji: "🔄", message: inProgressTodo.content }; + } + + const pendingTodo = todos.find((todo) => todo.status === "pending"); + if (pendingTodo) { + return { emoji: "○", message: pendingTodo.content }; + } + + for (let index = todos.length - 1; index >= 0; index--) { + const todo = todos[index]; + if (todo.status === "completed") { + return { emoji: "✓", message: todo.content }; + } + } + + return undefined; +} + /** * `propose_plan` ends the active planning turn immediately, so any in-progress * todo steps need to flip to completed even though the model does not get a diff --git a/src/common/utils/tools/toolDefinitions.ts b/src/common/utils/tools/toolDefinitions.ts index 1588f7913c..2ebf37e986 100644 --- a/src/common/utils/tools/toolDefinitions.ts +++ b/src/common/utils/tools/toolDefinitions.ts @@ -33,7 +33,6 @@ import { BASH_HARD_MAX_LINES, BASH_MAX_LINE_BYTES, BASH_MAX_TOTAL_BYTES, - STATUS_MESSAGE_MAX_LENGTH, WEB_FETCH_MAX_OUTPUT_BYTES, } from "@/common/constants/toolLimits"; import { @@ -1449,42 +1448,6 @@ export const TOOL_DEFINITIONS = { description: "Read the current todo list", schema: z.object({}), }, - status_set: { - description: - "Set a status indicator to show what Assistant is currently doing. 
The status is set IMMEDIATELY \n" + - "when this tool is called, even before other tool calls complete.\n" + - "\n" + - "WHEN TO SET STATUS:\n" + - "- Set status when beginning concrete work (file edits, running tests, executing commands)\n" + - "- Update status as work progresses through distinct phases\n" + - "- Set a final status after completion, only claim success when certain (e.g., after confirming checks passed)\n" + - "- DO NOT set status during initial exploration, file reading, or planning phases\n" + - "\n" + - "The status is cleared when a new user message comes in. Validate your approach is feasible \n" + - "before setting status - failed tool calls after setting status indicate premature commitment.\n" + - "\n" + - "URL PARAMETER:\n" + - "- Optional 'url' parameter links to external resources (e.g., PR URL: 'https://github.com/owner/repo/pull/123')\n" + - "- Prefer stable URLs that don't change often - saving the same URL twice is a no-op\n" + - "- URL persists until replaced by a new status with a different URL", - schema: z - .object({ - emoji: z.string().describe("A single emoji character representing the current activity"), - message: z - .string() - .describe( - `A brief description of the current activity (auto-truncated to ${STATUS_MESSAGE_MAX_LENGTH} chars with ellipsis if needed)` - ), - url: z - .string() - .url() - .nullish() - .describe( - "Optional URL to external resource with more details (e.g., Pull Request URL). The URL persists and is displayed to the user for easy access." - ), - }) - .strict(), - }, bash_output: { description: 'DEPRECATED: use task_await instead (pass bash-prefixed taskId like "bash:"). ' + @@ -1645,7 +1608,7 @@ CREATE TABLE IF NOT EXISTS delegation_rollups ( description: "Send a system notification to the user. Use this to alert the user about important events that require their attention, such as long-running task completion, errors requiring intervention, or questions. 
" + "Notifications appear as OS-native notifications (macOS Notification Center, Windows Toast, Linux). " + - "Infer whether to send notifications from user instructions. If no instructions provided, reserve notifications for major wins or blocking issues. Do not use for routine status updates (use status_set instead).", + "Infer whether to send notifications from user instructions. If no instructions provided, reserve notifications for major wins or blocking issues. Do not use for routine progress updates — keep the todo list current instead.", schema: z .object({ title: z @@ -2118,7 +2081,6 @@ export function getAvailableTools( "system1_keep_ranges", "todo_write", "todo_read", - "status_set", "notify", ...(enableAnalyticsQuery ? ["analytics_query"] : []), "web_fetch", diff --git a/src/common/utils/tools/tools.ts b/src/common/utils/tools/tools.ts index a93bc68a82..c454337656 100644 --- a/src/common/utils/tools/tools.ts +++ b/src/common/utils/tools/tools.ts @@ -12,7 +12,6 @@ import { createFileEditInsertTool } from "@/node/services/tools/file_edit_insert import { createAskUserQuestionTool } from "@/node/services/tools/ask_user_question"; import { createProposePlanTool } from "@/node/services/tools/propose_plan"; import { createTodoWriteTool, createTodoReadTool } from "@/node/services/tools/todo"; -import { createStatusSetTool } from "@/node/services/tools/status_set"; import { createNotifyTool } from "@/node/services/tools/notify"; import { createAnalyticsQueryTool } from "@/node/services/tools/analyticsQuery"; import { createDesktopTools } from "@/node/services/tools/desktopTools"; @@ -393,7 +392,6 @@ export async function getToolsForModel( system1_keep_ranges: createSystem1KeepRangesTool(config), todo_write: createTodoWriteTool(config), todo_read: createTodoReadTool(config), - status_set: createStatusSetTool(config), notify: createNotifyTool(config), ...(config.analyticsService ? 
{ diff --git a/src/node/builtinAgents/mux.md b/src/node/builtinAgents/mux.md index acb48ac02f..fe5cc9ee83 100644 --- a/src/node/builtinAgents/mux.md +++ b/src/node/builtinAgents/mux.md @@ -22,7 +22,6 @@ tools: - ask_user_question - todo_read - todo_write - - status_set - notify - analytics_query --- diff --git a/src/node/services/ExtensionMetadataService.test.ts b/src/node/services/ExtensionMetadataService.test.ts index 9647206b7a..b0ee04559f 100644 --- a/src/node/services/ExtensionMetadataService.test.ts +++ b/src/node/services/ExtensionMetadataService.test.ts @@ -53,44 +53,63 @@ describe("ExtensionMetadataService", () => { expect(snapshot.streaming).toBe(false); expect(snapshot.lastModel).toBeNull(); expect(snapshot.lastThinkingLevel).toBeNull(); - expect(snapshot.agentStatus).toBeNull(); const snapshots = await service.getAllSnapshots(); expect(snapshots.get("workspace-1")).toEqual(snapshot); }); - test("setAgentStatus persists status_set payload", async () => { - const status = { emoji: "🔧", message: "Applying patch", url: "https://example.com/pr/123" }; + test("setAgentStatus persists transient display status payload", async () => { + const displayStatus = { emoji: "🤔", message: "Deciding execution strategy" }; - const snapshot = await service.setAgentStatus("workspace-3", status); - expect(snapshot.agentStatus).toEqual(status); + const snapshot = await service.setAgentStatus("workspace-display-status", displayStatus); + expect(snapshot.displayStatus).toEqual(displayStatus); - const withoutUrl = await service.setAgentStatus("workspace-3", { - emoji: "✅", - message: "Checks passed", - }); - // status_set often omits url after the first call; keep the last known URL. 
- expect(withoutUrl.agentStatus).toEqual({ - emoji: "✅", - message: "Checks passed", - url: status.url, + const cleared = await service.setAgentStatus("workspace-display-status", null); + expect(cleared.displayStatus).toBeUndefined(); + }); + + test("clearing transient display status also clears legacy carried-over status", async () => { + await writeFile( + filePath, + JSON.stringify({ + version: 1, + workspaces: { + "workspace-display-status-clear": { + recency: 123, + streaming: false, + lastModel: null, + lastThinkingLevel: null, + agentStatus: { emoji: "🔧", message: "Legacy background status" }, + }, + }, + }), + "utf-8" + ); + + await service.setAgentStatus("workspace-display-status-clear", { + emoji: "🤔", + message: "Deciding execution strategy", }); + const cleared = await service.setAgentStatus("workspace-display-status-clear", null); - const snapshots = await service.getAllSnapshots(); - expect(snapshots.get("workspace-3")?.agentStatus).toEqual(withoutUrl.agentStatus); + expect(cleared.displayStatus).toBeUndefined(); + expect(cleared.todoStatus).toBeUndefined(); + }); - const cleared = await service.setAgentStatus("workspace-3", null); - expect(cleared.agentStatus).toBeNull(); + test("setTodoStatus persists todo-derived progress and clears it when the list empties", async () => { + const todoStatus = { emoji: "🔄", message: "Running checks" }; - const afterClearWithoutUrl = await service.setAgentStatus("workspace-3", { - emoji: "🧪", - message: "Re-running", - }); - expect(afterClearWithoutUrl.agentStatus).toEqual({ - emoji: "🧪", - message: "Re-running", - url: status.url, - }); + const withTodos = await service.setTodoStatus("workspace-todos", todoStatus, true); + expect(withTodos.todoStatus).toEqual(todoStatus); + expect(withTodos.hasTodos).toBe(true); + + const cleared = await service.setTodoStatus("workspace-todos", null, false); + expect(cleared.todoStatus).toBeUndefined(); + expect(cleared.hasTodos).toBe(false); + + const snapshots = 
await service.getAllSnapshots(); + expect(snapshots.get("workspace-todos")?.todoStatus).toBeUndefined(); + expect(snapshots.get("workspace-todos")?.hasTodos).toBe(false); }); test("concurrent cross-workspace mutations preserve both workspace entries", async () => { @@ -128,10 +147,10 @@ describe("ExtensionMetadataService", () => { await Promise.all([ service.updateRecency("ws-1", 101), service.setStreaming("ws-2", true, { model: "anthropic/sonnet" }), - service.setAgentStatus("ws-3", { emoji: "⚙️", message: "Working" }), + service.setTodoStatus("ws-3", { emoji: "🔄", message: "Working" }, true), service.updateRecency("ws-4", 404), service.setStreaming("ws-5", false), - service.setAgentStatus("ws-6", null), + service.setTodoStatus("ws-6", null, false), service.updateRecency("ws-7", 707), service.setStreaming("ws-8", true, { model: "openai/gpt-5", @@ -146,14 +165,89 @@ describe("ExtensionMetadataService", () => { expect(snapshots.size).toBe(8); expect(snapshots.get("ws-1")?.recency).toBe(101); expect(snapshots.get("ws-2")?.lastModel).toBe("anthropic/sonnet"); - expect(snapshots.get("ws-3")?.agentStatus).toEqual({ emoji: "⚙️", message: "Working" }); + expect(snapshots.get("ws-3")?.todoStatus).toEqual({ emoji: "🔄", message: "Working" }); expect(snapshots.get("ws-4")?.recency).toBe(404); expect(snapshots.get("ws-5")?.streaming).toBe(false); - expect(snapshots.get("ws-6")?.agentStatus).toBeNull(); + expect(snapshots.get("ws-6")?.todoStatus).toBeUndefined(); expect(snapshots.get("ws-7")?.recency).toBe(707); expect(snapshots.get("ws-8")?.lastThinkingLevel).toBe("high"); }); + test("legacy agentStatus is projected into todoStatus when todoStatus is absent", async () => { + await writeFile( + filePath, + JSON.stringify({ + version: 1, + workspaces: { + "workspace-legacy-status": { + recency: 123, + streaming: false, + lastModel: null, + lastThinkingLevel: null, + agentStatus: { emoji: "🔧", message: "Legacy background status" }, + }, + }, + }), + "utf-8" + ); 
+ + const snapshots = await service.getAllSnapshots(); + expect(snapshots.get("workspace-legacy-status")?.todoStatus).toEqual({ + emoji: "🔧", + message: "Legacy background status", + }); + }); + + test("malformed todoStatus falls back to legacy agentStatus when todos were never explicitly cleared", async () => { + await writeFile( + filePath, + JSON.stringify({ + version: 1, + workspaces: { + "workspace-malformed-todo-status": { + recency: 123, + streaming: false, + lastModel: null, + lastThinkingLevel: null, + todoStatus: { nope: true }, + agentStatus: { emoji: "🔧", message: "Legacy background status" }, + }, + }, + }), + "utf-8" + ); + + const snapshots = await service.getAllSnapshots(); + expect(snapshots.get("workspace-malformed-todo-status")?.todoStatus).toEqual({ + emoji: "🔧", + message: "Legacy background status", + }); + }); + + test("legacy agentStatus does not repopulate todoStatus after an explicit empty todo snapshot", async () => { + await writeFile( + filePath, + JSON.stringify({ + version: 1, + workspaces: { + "workspace-cleared-legacy-status": { + recency: 123, + streaming: false, + lastModel: null, + lastThinkingLevel: null, + agentStatus: { emoji: "🔧", message: "Legacy background status" }, + hasTodos: false, + }, + }, + }), + "utf-8" + ); + + const snapshots = await service.getAllSnapshots(); + expect(snapshots.get("workspace-cleared-legacy-status")?.todoStatus).toBeUndefined(); + expect(snapshots.get("workspace-cleared-legacy-status")?.hasTodos).toBe(false); + }); + test("toSnapshot coerces malformed hasTodos to undefined", async () => { await writeFile( filePath, @@ -195,7 +289,6 @@ describe("ExtensionMetadataService", () => { streaming: false, lastModel: null, lastThinkingLevel: null, - agentStatus: null, }); const snapshots = await service.getAllSnapshots(); @@ -226,13 +319,11 @@ describe("ExtensionMetadataService", () => { expect(streaming.streaming).toBe(true); expect(streaming.lastModel).toBe("anthropic/sonnet"); 
expect(streaming.lastThinkingLevel).toBe("high"); - expect(streaming.agentStatus).toBeNull(); const cleared = await service.setStreaming("workspace-2", false); expect(cleared.streaming).toBe(false); expect(cleared.lastModel).toBe("anthropic/sonnet"); expect(cleared.lastThinkingLevel).toBe("high"); - expect(cleared.agentStatus).toBeNull(); const snapshots = await service.getAllSnapshots(); expect(snapshots.get("workspace-2")).toEqual(cleared); diff --git a/src/node/services/ExtensionMetadataService.ts b/src/node/services/ExtensionMetadataService.ts index ab56cc20f6..4cd28091b3 100644 --- a/src/node/services/ExtensionMetadataService.ts +++ b/src/node/services/ExtensionMetadataService.ts @@ -24,7 +24,8 @@ import { log } from "@/node/services/log"; * - streamingGeneration: Monotonic stream counter used to detect newer background turns * - lastModel: Last model used in this workspace * - lastThinkingLevel: Last thinking/reasoning level used in this workspace - * - agentStatus: Most recent status_set payload (for sidebar progress in background workspaces) + * - displayStatus: Current non-todo status payload for transient system-driven progress + * - todoStatus: Status derived from the current todo list (preferred sidebar progress surface) * - hasTodos: Whether the workspace still had todos when streaming last stopped * * File location: ~/.mux/extensionMetadata.json @@ -38,6 +39,7 @@ import { log } from "@/node/services/log"; export interface ExtensionMetadataStreamingUpdate { model?: string; thinkingLevel?: ExtensionMetadata["lastThinkingLevel"]; + todoStatus?: ExtensionAgentStatus | null; hasTodos?: boolean; generation?: number; } @@ -80,6 +82,7 @@ export class ExtensionMetadataService { lastModel: null, lastThinkingLevel: null, agentStatus: null, + displayStatus: null, lastStatusUrl: null, }; data.workspaces[workspaceId] = created; @@ -192,6 +195,13 @@ export class ExtensionMetadataService { if (update.thinkingLevel !== undefined) { workspace.lastThinkingLevel = 
update.thinkingLevel; } + if (update.todoStatus !== undefined) { + if (update.todoStatus) { + workspace.todoStatus = update.todoStatus; + } else { + delete workspace.todoStatus; + } + } if (update.hasTodos !== undefined) { workspace.hasTodos = update.hasTodos; } @@ -199,7 +209,25 @@ export class ExtensionMetadataService { } /** - * Update the latest status_set payload for a workspace. + * Update the todo-derived status payload for a workspace. + */ + async setTodoStatus( + workspaceId: string, + todoStatus: ExtensionAgentStatus | null, + hasTodos: boolean + ): Promise { + return this.mutateWorkspaceSnapshot(workspaceId, Date.now(), (workspace) => { + if (todoStatus) { + workspace.todoStatus = todoStatus; + } else { + delete workspace.todoStatus; + } + workspace.hasTodos = hasTodos; + }); + } + + /** + * Update the latest transient non-todo status payload for a workspace. */ async setAgentStatus( workspaceId: string, @@ -207,13 +235,13 @@ export class ExtensionMetadataService { ): Promise { return this.mutateWorkspaceSnapshot(workspaceId, Date.now(), (workspace) => { const previousUrl = - coerceAgentStatus(workspace.agentStatus)?.url ?? + coerceAgentStatus(workspace.displayStatus)?.url ?? coerceStatusUrl(workspace.lastStatusUrl) ?? null; if (agentStatus) { const carriedUrl = agentStatus.url ?? previousUrl ?? undefined; - workspace.agentStatus = + workspace.displayStatus = carriedUrl !== undefined ? { ...agentStatus, @@ -222,8 +250,11 @@ export class ExtensionMetadataService { : agentStatus; workspace.lastStatusUrl = carriedUrl ?? null; } else { + workspace.displayStatus = null; + // Once a transient display status clears, also clear any legacy status payload so + // upgraded workspaces do not resurface stale pre-todo progress on the next snapshot. 
workspace.agentStatus = null; - // Keep lastStatusUrl across clears so the next status_set without `url` + // Keep lastStatusUrl across clears so the next transient status without `url` // can still reuse the previous deep link. workspace.lastStatusUrl = previousUrl; } diff --git a/src/node/services/agentDefinitions/builtInAgentContent.generated.ts b/src/node/services/agentDefinitions/builtInAgentContent.generated.ts index d8b5013d1d..0066f3c737 100644 --- a/src/node/services/agentDefinitions/builtInAgentContent.generated.ts +++ b/src/node/services/agentDefinitions/builtInAgentContent.generated.ts @@ -8,7 +8,7 @@ export const BUILTIN_AGENT_CONTENT = { "desktop": "---\nname: Desktop\ndescription: Visual desktop automation agent for GUI-heavy, screenshot-intensive workflows\nbase: exec\nui:\n hidden: true\n routable: true\n requires:\n - desktop\nsubagent:\n runnable: true\n append_prompt: |\n You are a desktop automation sub-agent running in a child workspace.\n\n - Your job: interact with the desktop GUI via screenshot-driven automation.\n - Always take a screenshot before starting a GUI interaction sequence.\n - Follow the grounding loop: screenshot → identify target → act → screenshot to verify.\n - After completing the task, summarize the outcome back to the parent with only\n the result plus selected evidence (e.g., a final screenshot path).\n - Do not expand scope beyond the delegated desktop task.\n - Call `agent_report` exactly once when done.\nprompt:\n append: true\nai:\n thinkingLevel: medium\ntools:\n add:\n - desktop_screenshot\n - desktop_move_mouse\n - desktop_click\n - desktop_double_click\n - desktop_drag\n - desktop_scroll\n - desktop_type\n - desktop_key_press\n remove:\n # Desktop agent should not recursively orchestrate child agents\n - task\n - task_await\n - task_list\n - task_terminate\n - task_apply_git_patch\n # No planning tools\n - propose_plan\n - ask_user_question\n # Internal-only\n - system1_keep_ranges\n # Global config tools\n 
- mux_agents_.*\n - agent_skill_write\n---\n\nYou are a desktop automation agent.\n\n- **Screenshot-first rule:** Always take a `desktop_screenshot` before beginning any GUI interaction loop. Never act on stale visual state.\n- **Grounding loop:** Follow `screenshot → identify target coordinates → act (click/type/drag) → screenshot to verify` for each major interaction. Every major interaction step should end with a screenshot to verify the expected result.\n- **Coordinate precision:** Use screenshot analysis to identify precise pixel coordinates for clicks, drags, and other positional actions. Account for window position, display scaling, and DPI before acting.\n- **Defensive interaction patterns:**\n - Wait briefly after clicks before verifying because menus and dialogs may animate.\n - For text input, click the target field first, verify focus, then type.\n - For drag operations, verify both the start and end positions with screenshots.\n - If an unexpected dialog or popup appears, take another screenshot and adapt to the new state.\n- **Scrolling:** Use `desktop_scroll` to navigate within windows, then take a screenshot after scrolling to verify the new content is visible.\n- **Error recovery:** If an action does not produce the expected result, take another screenshot, reassess the current state, and retry with adjusted coordinates.\n- **Reporting:** When complete, summarize only the outcome and key evidence back to the parent agent, such as the final screenshot confirming success. Do not send raw coordinate logs.\n", "exec": "---\nname: Exec\ndescription: Implement changes in the repository\nui:\n color: var(--color-exec-mode)\nsubagent:\n runnable: true\n append_prompt: |\n You are running as a sub-agent in a child workspace.\n\n - Take a single narrowly scoped task and complete it end-to-end. 
Do not expand scope.\n - If the task brief includes clear starting points and acceptance criteria (or a concrete approved plan handoff) — implement it directly.\n Do not spawn `explore` tasks or write a \"mini-plan\" unless you are concretely blocked by a missing fact (e.g., a file path that doesn't exist, an unknown symbol name, or an error that contradicts the brief).\n - When you do need repo context you don't have, prefer 1–3 narrow `explore` tasks (possibly in parallel) over broad manual file-reading.\n - If the task brief is missing critical information (scope, acceptance, or starting points) and you cannot infer it safely after a quick `explore`, do not guess.\n Stop and call `agent_report` once with 1–3 concrete questions/unknowns for the parent agent, and do not create commits.\n - Run targeted verification and create one or more git commits.\n - Never amend existing commits — always create new commits on top.\n - **Before your stream ends, you MUST call `agent_report` exactly once with:**\n - What changed (paths / key details)\n - What you ran (tests, typecheck, lint)\n - Any follow-ups / risks\n (If you forget, the parent will inject a follow-up message and you'll waste tokens.)\n - You may call task/task_await/task_list/task_terminate to delegate further when available.\n Delegation is limited by Max Task Nesting Depth (Settings → Agents → Task Settings).\n - Do not call propose_plan.\ntools:\n add:\n # Allow all tools by default (includes MCP tools which have dynamic names)\n # Use tools.remove in child agents to restrict specific tools\n - .*\n remove:\n # Exec mode doesn't use planning tools\n - propose_plan\n - ask_user_question\n # Internal-only tools\n - system1_keep_ranges\n # Global config tools are restricted to the mux agent\n - mux_agents_.*\n - agent_skill_write\n - agent_skill_delete\n - mux_config_read\n - mux_config_write\n - skills_catalog_.*\n - analytics_query\n---\n\nYou are in Exec mode.\n\n- If an accepted `` block is 
provided, treat it as the contract and implement it directly. Only do extra exploration if the plan references non-existent files/symbols or if errors contradict it.\n- Use `explore` sub-agents just-in-time for missing repo context (paths/symbols/tests); don't spawn them by default.\n- Trust Explore sub-agent reports as authoritative for repo facts (paths/symbols/callsites). Do not redo the same investigation yourself; only re-check if the report is ambiguous or contradicts other evidence.\n- For correctness claims, an Explore sub-agent report counts as having read the referenced files.\n- Make minimal, correct, reviewable changes that match existing codebase patterns.\n- Prefer targeted commands and checks (typecheck/tests) when feasible.\n- Treat as a standing order: keep running checks and addressing failures until they pass or a blocker outside your control arises.\n\n## Desktop Automation\n\nWhen a task involves repeated screenshot/action/verify loops for desktop GUI interaction (for example, clicking through application UIs, filling desktop app forms, or visually verifying GUI state), delegate to the `desktop` agent via `task` rather than performing desktop automation inline. The desktop agent is purpose-built for the screenshot → act → verify grounding loop.\n", "explore": "---\nname: Explore\ndescription: Read-only exploration of repository, environment, web, etc. 
Useful for investigation before making changes.\nbase: exec\nui:\n hidden: true\nsubagent:\n runnable: true\n skip_init_hook: true\n append_prompt: |\n You are an Explore sub-agent running inside a child workspace.\n\n - Explore the repository to answer the prompt using read-only investigation.\n - Return concise, actionable findings (paths, symbols, callsites, and facts).\n - When you have a final answer, call agent_report exactly once.\n - Do not call agent_report until you have completed the assigned task.\ntools:\n # Remove editing and task tools from exec base (read-only agent; skill tools are kept)\n remove:\n - file_edit_.*\n - task\n - task_apply_git_patch\n - task_.*\n---\n\nYou are in Explore mode (read-only).\n\n=== CRITICAL: READ-ONLY MODE - NO FILE MODIFICATIONS ===\n\n- You MUST NOT manually create, edit, delete, move, copy, or rename tracked files.\n- You MUST NOT stage/commit or otherwise modify git state.\n- You MUST NOT use redirect operators (>, >>) or heredocs to write to files.\n - Pipes are allowed for processing, but MUST NOT be used to write to files (for example via `tee`).\n- You MUST NOT run commands that are explicitly about modifying the filesystem or repo state (rm, mv, cp, mkdir, touch, git add/commit, installs, etc.).\n- You MAY run verification commands (fmt-check/lint/typecheck/test) even if they create build artifacts/caches, but they MUST NOT modify tracked files.\n - After running verification, check `git status --porcelain` and report if it is non-empty.\n- Prefer `file_read` for reading file contents (supports offset/limit paging).\n- Use bash for read-only operations (rg, ls, git diff/show/log, etc.) 
and verification commands.\n", - "mux": "---\nname: Chat With Mux\ndescription: Configure Mux settings, skills, and agent instructions\nui:\n hidden: true\n routable: true\nsubagent:\n runnable: false\ntools:\n add:\n - mux_agents_read\n - mux_agents_write\n - mux_config_read\n - mux_config_write\n - agent_skill_read\n - agent_skill_read_file\n - agent_skill_list\n - agent_skill_write\n - agent_skill_delete\n - skills_catalog_search\n - skills_catalog_read\n - ask_user_question\n - todo_read\n - todo_write\n - status_set\n - notify\n - analytics_query\n---\n\nYou are the **Mux system assistant**.\n\nYour tools are **context-aware** β€” they automatically target the right scope:\n\n**In a project workspace** (routed via Auto):\n\n- **Project skills**: Create, update, list, and delete project skills (`.mux/skills/`)\n- **Project instructions**: Edit the project's `AGENTS.md`\n\n**In the system workspace** (Chat with Mux):\n\n- **Global skills**: Create, update, list, and delete global skills (`~/.mux/skills/`)\n- **Global instructions**: Edit the mux-wide `~/.mux/AGENTS.md`\n\n**Always global** (regardless of context):\n\n- **App config**: Read and write Mux configuration (`~/.mux/config.json`)\n\n## Safety rules\n\n- You do **not** have access to arbitrary filesystem tools.\n- You do **not** have access to project secrets.\n- Before writing AGENTS.md, you must:\n 1. Read the current file (`mux_agents_read`).\n 2. Propose the exact change (show the new content or a concise diff).\n 3. Ask for explicit confirmation via `ask_user_question`.\n 4. 
Only then call `mux_agents_write` with `confirm: true`.\n- Before writing a skill, show the proposed `SKILL.md` content and confirm.\n\nIf the user declines, do not write anything.\n", + "mux": "---\nname: Chat With Mux\ndescription: Configure Mux settings, skills, and agent instructions\nui:\n hidden: true\n routable: true\nsubagent:\n runnable: false\ntools:\n add:\n - mux_agents_read\n - mux_agents_write\n - mux_config_read\n - mux_config_write\n - agent_skill_read\n - agent_skill_read_file\n - agent_skill_list\n - agent_skill_write\n - agent_skill_delete\n - skills_catalog_search\n - skills_catalog_read\n - ask_user_question\n - todo_read\n - todo_write\n - notify\n - analytics_query\n---\n\nYou are the **Mux system assistant**.\n\nYour tools are **context-aware** β€” they automatically target the right scope:\n\n**In a project workspace** (routed via Auto):\n\n- **Project skills**: Create, update, list, and delete project skills (`.mux/skills/`)\n- **Project instructions**: Edit the project's `AGENTS.md`\n\n**In the system workspace** (Chat with Mux):\n\n- **Global skills**: Create, update, list, and delete global skills (`~/.mux/skills/`)\n- **Global instructions**: Edit the mux-wide `~/.mux/AGENTS.md`\n\n**Always global** (regardless of context):\n\n- **App config**: Read and write Mux configuration (`~/.mux/config.json`)\n\n## Safety rules\n\n- You do **not** have access to arbitrary filesystem tools.\n- You do **not** have access to project secrets.\n- Before writing AGENTS.md, you must:\n 1. Read the current file (`mux_agents_read`).\n 2. Propose the exact change (show the new content or a concise diff).\n 3. Ask for explicit confirmation via `ask_user_question`.\n 4. 
Only then call `mux_agents_write` with `confirm: true`.\n- Before writing a skill, show the proposed `SKILL.md` content and confirm.\n\nIf the user declines, do not write anything.\n", "name_workspace": "---\nname: Name Workspace\ndescription: Generate workspace name and title from user message\nui:\n hidden: true\nsubagent:\n runnable: false\ntools:\n require:\n - propose_name\n---\n\nYou are a workspace naming assistant. Your only job is to call the `propose_name` tool with a suitable name and title.\n\nDo not emit text responses. Call the `propose_name` tool immediately.\n", "orchestrator": "---\nname: Orchestrator\ndescription: Coordinate sub-agent implementation and apply patches\nbase: exec\nsubagent:\n runnable: false\n append_prompt: |\n You are running as a sub-agent orchestrator in a child workspace.\n\n - Your parent workspace handles all PR management.\n Do NOT create pull requests, push to remote branches, or run any\n `gh pr` / `git push` commands. This applies even if AGENTS.md or\n other instructions say otherwise β€” those PR instructions target the\n top-level workspace only.\n - Orchestrate your delegated subtasks (spawn, await, apply patches,\n verify locally), then call `agent_report` exactly once with:\n - What changed (paths / key details)\n - What you ran (tests, typecheck, lint)\n - Any follow-ups / risks\n - Do not expand scope beyond the delegated task.\ntools:\n add:\n - ask_user_question\n remove:\n - propose_plan\n # Keep Orchestrator focused on coordination: no direct file edits.\n - file_edit_.*\n---\n\nYou are an internal Orchestrator agent running in Exec mode.\n\n**Mission:** coordinate implementation by delegating investigation + coding to sub-agents, then integrating their patches into this workspace.\n\nWhen a plan is present (default):\n\n- Treat the accepted plan as the source of truth. Its file paths, symbols, and structure were validated during planning β€” do not routinely spawn `explore` to re-confirm them. 
Exception: if the plan references stale paths or appears to have been authored/edited by the user without planner validation, a single targeted `explore` to sanity-check critical paths is acceptable.\n- Spawning `explore` to gather _additional_ context beyond what the plan provides is encouraged (e.g., checking whether a helper already exists, locating test files not mentioned in the plan, discovering existing patterns to match). This produces better implementation task briefs.\n- Do not spawn `explore` just to verify that a planner-generated plan is correct β€” that is the planner's job, and the plan was accepted by the user.\n- Convert the plan into concrete implementation subtasks and start delegation (`exec` for low complexity, `plan` for higher complexity).\n\nWhat you are allowed to do directly in this workspace:\n\n- Spawn/await/manage sub-agent tasks (`task`, `task_await`, `task_list`, `task_terminate`).\n- Apply patches (`task_apply_git_patch`).\n- Use `bash` for orchestration workflows: repo coordination via `git`/`gh`, targeted post-apply verification runs, and waiting on review/CI completion after PR updates (for example: `git push`, `gh pr comment`, `gh pr view`, `gh pr checks --watch`). Only run `gh pr create` when the user explicitly asks you to open a PR.\n- Ask clarifying questions with `ask_user_question` when blocked.\n- Coordinate targeted verification after integrating patches by running focused checks directly (when appropriate) or delegating runs to `explore`/`exec`.\n- Delegate patch-conflict reconciliation to `exec` sub-agents.\n\nHard rules (delegate-first):\n\n- Trust `explore` sub-agent reports as authoritative for repo facts (paths/symbols/callsites). 
Do not redo the same investigation yourself; only re-check if the report is ambiguous or contradicts other evidence.\n- For correctness claims, an `explore` sub-agent report counts as having read the referenced files.\n- **Do not do broad repo investigation here.** If you need context, spawn an `explore` sub-agent with a narrow prompt (keeps this agent focused on coordination).\n- **Do not implement features/bugfixes directly here.** Spawn `exec` (simple) or `plan` (complex) sub-agents and have them complete the work end-to-end.\n- **Do not use `bash` for file reads/writes, manual code editing, or broad repo exploration.** `bash` in this workspace is for orchestration-only operations: `git`/`gh` repo management, targeted post-apply verification checks, and waiting for PR review/CI outcomes. If direct checks fail due to code issues, delegate fixes to `exec`/`plan` sub-agents instead of implementing changes here.\n- **Never read or scan session storage.** This includes `~/.mux/sessions/**` and `~/.mux/sessions/subagent-patches/**`. Treat session storage as an internal implementation detail; do not shell out to locate patch artifacts on disk. Only use `task_apply_git_patch` to access patches.\n\nDelegation guide:\n\n- Use `explore` for narrowly-scoped read-only questions (confirm an assumption, locate a symbol/callsite, find relevant tests). 
Avoid \"scan the repo\" prompts.\n- Use `exec` for straightforward, low-complexity work where the implementation path is obvious from the task brief.\n - Good fit: single-file edits, localized wiring to existing helpers, straightforward command execution, or narrowly scoped follow-ups with clear acceptance.\n - Provide a compact task brief (so the sub-agent can act without reading the full plan) with:\n - Task: one sentence\n - Background (why this matters): 1–3 bullets\n - Scope / non-goals: what to change, and what not to change\n - Starting points: relevant files/symbols/paths (from prior exploration)\n - Acceptance: bullets / checks\n - Deliverables: commits + verification commands to run\n - Constraints:\n - Do not expand scope.\n - Prefer `explore` tasks for repo investigation (paths/symbols/tests/patterns) to preserve your context window for implementation.\n Trust Explore reports as authoritative; do not re-verify unless ambiguous/contradictory.\n If starting points + acceptance are already clear, skip initial explore and only explore when blocked.\n - Create one or more git commits before `agent_report`.\n- Use `plan` for higher-complexity subtasks that touch multiple files/locations, require non-trivial investigation, or have an unclear implementation approach.\n - Default to `plan` when a subtask needs coordinated updates across multiple locations, unless the edits are mechanical and already fully specified.\n - For higher-complexity implementation work, prefer `plan` over `exec` so the sub-agent can do targeted research and produce a precise plan before implementation begins.\n - Good fit: multi-file refactors, cross-module behavior changes, unfamiliar subsystems, or work where sequencing/dependencies need discovery.\n - Plan subtasks automatically hand off to implementation after a successful `propose_plan`; expect the usual task completion output once implementation finishes.\n - For `plan` briefs, prioritize goal + constraints + acceptance criteria 
over file-by-file diff instructions.\n- Use `desktop` for GUI-heavy desktop automation that requires repeated screenshot → act → verify loops (for example, interacting with application windows, clicking through UI flows, or visual verification). The desktop agent enforces a grounding discipline that keeps visual context local.\n\nRecommended Orchestrator → Exec task brief template:\n\n- Task: \n- Background (why this matters):\n - \n- Scope / non-goals:\n - Scope: \n - Non-goals: \n- Starting points: \n- Dependencies / assumptions:\n - Assumes: \n - If unmet: stop and report back; do not expand scope to create prerequisites.\n- Acceptance: \n- Deliverables:\n - Commits: \n - Verification: \n- Constraints:\n - Do not expand scope.\n - Prefer `explore` tasks for repo investigation (paths/symbols/tests/patterns) to preserve your context window for implementation.\n Trust Explore reports as authoritative; do not re-verify unless ambiguous/contradictory.\n If starting points + acceptance are already clear, skip initial explore and only explore when blocked.\n - Create one or more git commits before `agent_report`.\n\nDependency analysis (required before spawning implementation tasks — `exec` or `plan`):\n\n- For each candidate subtask, write:\n - Outputs: files/targets/artifacts introduced/renamed/generated\n - Inputs / prerequisites (including for verification): what must already exist\n- A subtask is \"independent\" only if its patch can be applied + verified on the current parent workspace HEAD, without any other pending patch.\n- Parallelism is the default: maximize the size of each independent batch and run it in parallel.\n Use the sequential protocol only when a subtask has a concrete prerequisite on another subtask's outputs.\n- If task B depends on outputs from task A:\n - Do not spawn B until A has completed and A's patch is applied in the parent workspace.\n - If the dependency chain is tight (download → generate → wire-up), prefer one `exec` task 
rather than splitting.\n\nExample dependency chain (schema download → generation):\n\n- Task A outputs: a new download target + new schema files.\n- Task B inputs: those schema files; verifies by running generation.\n- Therefore: run Task A (await + apply patch) before spawning Task B.\n\nPatch integration loop (default):\n\n1. Identify a batch of independent subtasks.\n2. Spawn one implementation sub-agent task per subtask with `run_in_background: true` (`exec` for low complexity, `plan` for higher complexity).\n3. Await the batch via `task_await`.\n4. For each successful implementation task (`exec` directly, or `plan` after auto-handoff to implementation), integrate patches one at a time:\n - Treat every successful child task with a `taskId` as pending patch integration, whether the completion arrived inline from `task` or later from `task_await`.\n - Complete each dry-run + real-apply pair before starting the next patch. Applying one patch changes `HEAD`, which can invalidate later dry-run results.\n - Dry-run apply: `task_apply_git_patch` with `dry_run: true`.\n - If dry-run succeeds, immediately apply for real: `task_apply_git_patch` with `dry_run: false`.\n - Do not assume an inline `status: completed` result means the child changes are already present in this workspace.\n - If dry-run fails, treat it as a patch conflict and delegate reconciliation:\n 1. Do not attempt a real apply for that patch in this workspace.\n 2. Spawn a dedicated `exec` task. In the brief, include the original failing `task_id` and instruct the sub-agent to replay that patch via `task_apply_git_patch`, resolve conflicts in its own workspace, run `git am --continue`, commit the resolved result, and report back with a new patch to apply cleanly.\n - If real apply fails unexpectedly:\n 1. Restore a clean working tree before delegating: run `git am --abort` via `bash` only when a git-am session is in progress; if abort reports no operation in progress, continue.\n 2. 
Then follow the same delegated reconciliation flow above.\n5. Verify + review:\n - Run focused verification directly with `bash` when practical (for example: targeted tests or the repo's standard full-validation command), or delegate verification to `explore`/`exec` when investigation/fixes are likely.\n - Use `git`/`gh` directly for PR orchestration when a PR already exists (pushes, review-request comments, replies to review remarks, and CI/check-status waiting loops). Create a new PR only when the user explicitly asks.\n - PASS: summary-only (no long logs).\n - FAIL: include the failing command + key error lines; then delegate a fix to `exec`/`plan` and re-verify.\n\nSequential protocol (only for dependency chains):\n\n1. Spawn the prerequisite implementation task (`exec` or `plan`, based on complexity) with `run_in_background: false`.\n2. If step 1 returns `queued`/`running` without a completed report, call `task_await` with the returned `taskId` before attempting any patch apply. If step 1 returns `status: completed` inline, that same `taskId` still requires patch application.\n3. Dry-run apply its patch (`dry_run: true`); then apply for real (`dry_run: false`). If either step fails, follow the conflict playbook above (including `git am --abort` only when a real apply leaves a git-am session in progress).\n4. Only after the patch is applied, spawn the dependent implementation task.\n5. Repeat until the dependency chain is complete.\n\nNote: child workspaces are created at spawn time. 
Spawning dependents too early means they work from the wrong repo snapshot and get forced into scope expansion.\n\nKeep context minimal:\n\n- Do not request, paste, or restate large plans.\n- Prefer short, actionable prompts, but include enough context that the sub-agent does not need your plan file.\n - Child workspaces do not automatically have access to the parent's plan file; summarize just the relevant slice or provide file pointers.\n- Prefer file paths/symbols over long prose.\n", "plan": "---\nname: Plan\ndescription: Create a plan before coding\nui:\n color: var(--color-plan-mode)\nsubagent:\n runnable: true\ntools:\n add:\n # Allow all tools by default (includes MCP tools which have dynamic names)\n # Use tools.remove in child agents to restrict specific tools\n - .*\n remove:\n # Plan should not apply sub-agent patches.\n - task_apply_git_patch\n # Global config tools are restricted to the mux agent\n - mux_agents_.*\n - agent_skill_write\n - agent_skill_delete\n - mux_config_read\n - mux_config_write\n - skills_catalog_.*\n - analytics_query\n require:\n - propose_plan\n # Note: file_edit_* tools ARE available but restricted to plan file only at runtime\n # Note: task tools ARE enabled - Plan delegates to Explore sub-agents\n---\n\nYou are in Plan Mode.\n\n- Every response MUST produce or update a plan.\n- Match the plan's size and structure to the problem.\n- Keep the plan self-contained and scannable.\n- Assume the user wants the completed plan, not a description of how you would make one.\n\n## Investigate only what you need\n\nBefore proposing a plan, figure out what you need to verify and gather that evidence.\n\n- When delegation is available, use Explore sub-agents for repo investigation. In Plan Mode, only\n spawn `agentId: \"explore\"` tasks.\n- Give each Explore task specific deliverables, and parallelize them when that helps.\n- Trust completed Explore reports for repo facts. 
Do not re-investigate just to second-guess them.\n If something is missing, ambiguous, or conflicting, spawn another focused Explore task.\n- If task delegation is unavailable, do the narrowest read-only investigation yourself.\n- Reserve `file_read` for the plan file itself, user-provided text already in this conversation,\n and that narrow fallback. When reading the plan file, prefer `file_read` over `bash cat` so long\n plans do not get compacted.\n- Wait for any spawned Explore tasks before calling `propose_plan`.\n\n## Write the plan\n\n- Use whatever structure best fits the problem: a few bullets, phases, workstreams, risks, or\n decision points are all fine.\n- Include the context, constraints, evidence, and concrete path forward somewhere in that\n structure.\n- Name the files, symbols, or subsystems that matter, and order the work so an implementer can\n follow it.\n- Keep uncertainty brief and local to the relevant step. Use `ask_user_question` when you need the\n user to decide something.\n- Include small code snippets only when they materially reduce ambiguity.\n- Put long rationale or background into `
/` blocks.\n\n## Questions and handoff\n\n- If you need clarification from the user, use `ask_user_question` instead of asking in chat or\n adding an \"Open Questions\" section to the plan.\n- Ask up to 4 questions at a time (2–4 options each; \"Other\" remains available for free-form\n input).\n- After you get answers, update the plan and then call `propose_plan` when it is ready for review.\n- After calling `propose_plan`, do not paste the plan into chat or mention the plan file path.\n- If the user wants edits to other files, ask them to switch to Exec mode.\n\nWorkspace-specific runtime instructions (plan file path, edit restrictions, nesting warnings) are\nprovided separately.\n", diff --git a/src/node/services/agentSkills/builtInSkillContent.generated.ts b/src/node/services/agentSkills/builtInSkillContent.generated.ts index 3ff5f5d62d..c14056c35a 100644 --- a/src/node/services/agentSkills/builtInSkillContent.generated.ts +++ b/src/node/services/agentSkills/builtInSkillContent.generated.ts @@ -198,6 +198,7 @@ export const BUILTIN_SKILL_FILES: Record> = { "Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.", "", "Core workflow:", + "", "1. `agent-browser open ` - Navigate to page", "2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)", '3. 
`agent-browser click @e1` / `fill @e2 "text"` - Interact using refs', @@ -267,7 +268,7 @@ export const BUILTIN_SKILL_FILES: Record> = { "- Never use emoji characters as UI icons or status indicators; emoji rendering varies across platforms and fonts.", "- Prefer SVG icons (usually from `lucide-react`) or shared icon components under `src/browser/components/icons/`.", "- For tool call headers, use `ToolIcon` from `src/browser/components/tools/shared/ToolPrimitives.tsx`.", - "- If a tool/agent provides an emoji string (e.g., `status_set` or `displayStatus`), render via `EmojiIcon` (`src/browser/components/icons/EmojiIcon.tsx`) instead of rendering the emoji.", + "- If a tool/agent provides an emoji string (e.g., todo-derived status or `displayStatus`), render via `EmojiIcon` (`src/browser/components/icons/EmojiIcon.tsx`) instead of rendering the emoji.", "- If a new emoji appears in tool output, extend `EmojiIcon` to map it to an SVG icon.", "- Colors defined in `src/browser/styles/globals.css` (`:root @theme` block). 
Reference via CSS variables (e.g., `var(--color-plan-mode)`), never hardcode hex values.", "- For incrementing numeric UI (costs, timers, token counts, percentages), use semantic numeric typography utilities (`counter-nums` / `counter-nums-mono`) to prevent width jitter.", @@ -368,9 +369,9 @@ export const BUILTIN_SKILL_FILES: Record> = { "- E2E tests (tests/e2e) work with Radix but are slow (~2min startup); reserve for scenarios that truly need real Electron.", "- Only use `validateApiKeys()` in tests that actually make AI API calls.", "", - "## Tool: status_set", + "## Tool: todo_write", "", - "- Set status url to the Pull Request once opened", + "- Keep the TODO list current during multi-step work; sidebar progress is derived from it.", "", "## GitHub", "", @@ -1273,7 +1274,6 @@ export const BUILTIN_SKILL_FILES: Record> = { " - ask_user_question", " - todo_read", " - todo_write", - " - status_set", " - notify", " - analytics_query", "---", @@ -1644,7 +1644,7 @@ export const BUILTIN_SKILL_FILES: Record> = { "", "## Model: openai:.\\*codex", "", - "Use status reporting tools every few minutes.", + "Keep the todo list current every few minutes while a task is in flight.", "```", "", "### Tool Prompts", @@ -1667,12 +1667,12 @@ export const BUILTIN_SKILL_FILES: Record> = { "", "- Run `prettier --write` after editing files", "", - "## Tool: status_set", + "## Tool: todo_write", "", - "- Set status URL to the Pull Request once opened", + "- Keep the TODO list current during multi-step work; sidebar progress is derived from it.", "```", "", - "**Common tools** (varies by model/provider): `bash`, `file_read`, `file_edit_replace_string`, `file_edit_insert`, `propose_plan`, `ask_user_question`, `todo_write`, `todo_read`, `status_set`, `web_fetch`, `web_search`.", + "**Common tools** (varies by model/provider): `bash`, `file_read`, `file_edit_replace_string`, `file_edit_insert`, `propose_plan`, `ask_user_question`, `todo_write`, `todo_read`, `web_fetch`, `web_search`.", "", 
"## Practical layout", "", @@ -2517,7 +2517,7 @@ export const BUILTIN_SKILL_FILES: Record> = { "- Notify on CI failures or deployment issues", "- Notify when waiting for user input longer than 30 seconds", "- Do not notify for routine status updates", - "- Use status_set for progress updates instead", + "- Use `todo_write` for routine progress updates instead", "```", "", "See [Instruction Files](/agents/instruction-files) for more on scoped instructions.", @@ -2568,7 +2568,7 @@ export const BUILTIN_SKILL_FILES: Record> = { " description:", ' "Send a system notification to the user. Use this to alert the user about important events that require their attention, such as long-running task completion, errors requiring intervention, or questions. " +', ' "Notifications appear as OS-native notifications (macOS Notification Center, Windows Toast, Linux). " +', - ' "Infer whether to send notifications from user instructions. If no instructions provided, reserve notifications for major wins or blocking issues. Do not use for routine status updates (use status_set instead).",', + ' "Infer whether to send notifications from user instructions. If no instructions provided, reserve notifications for major wins or blocking issues. Do not use for routine progress updates β€” keep the todo list current instead.",', " schema: z", " .object({", " title: z", @@ -4676,17 +4676,6 @@ export const BUILTIN_SKILL_FILES: Record> = { "
", "", "
", - "status_set (3)", - "", - "| Env var | JSON path | Type | Description |", - "| ------------------------ | --------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------- |", - "| `MUX_TOOL_INPUT_EMOJI` | `emoji` | string | A single emoji character representing the current activity |", - "| `MUX_TOOL_INPUT_MESSAGE` | `message` | string | A brief description of the current activity (auto-truncated to 60 chars with ellipsis if needed) |", - "| `MUX_TOOL_INPUT_URL` | `url` | string | Optional URL to external resource with more details (e.g., Pull Request URL). The URL persists and is displayed to the user for easy access. |", - "", - "
", - "", - "
", "switch_agent (3)", "", "| Env var | JSON path | Type | Description |", diff --git a/src/node/services/tools/status_set.test.ts b/src/node/services/tools/status_set.test.ts deleted file mode 100644 index c9ee6bcd2c..0000000000 --- a/src/node/services/tools/status_set.test.ts +++ /dev/null @@ -1,246 +0,0 @@ -import { describe, it, expect } from "bun:test"; -import { createStatusSetTool } from "./status_set"; -import type { ToolConfiguration } from "@/common/utils/tools/tools"; -import { createRuntime } from "@/node/runtime/runtimeFactory"; -import type { ToolExecutionOptions } from "ai"; -import { STATUS_MESSAGE_MAX_LENGTH } from "@/common/constants/toolLimits"; - -describe("status_set tool validation", () => { - const mockConfig: ToolConfiguration = { - cwd: "/test", - runtime: createRuntime({ type: "local", srcBaseDir: "/tmp" }), - runtimeTempDir: "/tmp", - workspaceId: "test-workspace", - }; - - const mockToolCallOptions: ToolExecutionOptions = { - toolCallId: "test-call-id", - messages: [], - }; - - describe("emoji validation", () => { - it("should accept single emoji characters", async () => { - const tool = createStatusSetTool(mockConfig); - - const emojis = ["πŸ”", "πŸ“", "βœ…", "πŸš€", "⏳"]; - for (const emoji of emojis) { - const result = (await tool.execute!({ emoji, message: "Test" }, mockToolCallOptions)) as { - success: boolean; - emoji: string; - message: string; - }; - expect(result).toEqual({ success: true, emoji, message: "Test" }); - } - }); - - it("should accept emojis with variation selectors", async () => { - const tool = createStatusSetTool(mockConfig); - - // Emojis with variation selectors (U+FE0F) - const emojis = ["✏️", "βœ…", "➑️", "β˜€οΈ"]; - for (const emoji of emojis) { - const result = (await tool.execute!({ emoji, message: "Test" }, mockToolCallOptions)) as { - success: boolean; - emoji: string; - message: string; - }; - expect(result).toEqual({ success: true, emoji, message: "Test" }); - } - }); - - it("should accept emojis 
with skin tone modifiers", async () => { - const tool = createStatusSetTool(mockConfig); - - const emojis = ["πŸ‘‹πŸ»", "πŸ‘‹πŸ½", "πŸ‘‹πŸΏ"]; - for (const emoji of emojis) { - const result = (await tool.execute!({ emoji, message: "Test" }, mockToolCallOptions)) as { - success: boolean; - emoji: string; - message: string; - }; - expect(result).toEqual({ success: true, emoji, message: "Test" }); - } - }); - - it("should reject multiple emojis", async () => { - const tool = createStatusSetTool(mockConfig); - - const result1 = (await tool.execute!( - { emoji: "πŸ”πŸ“", message: "Test" }, - mockToolCallOptions - )) as { success: boolean; error: string }; - expect(result1.success).toBe(false); - expect(result1.error).toBe("emoji must be a single emoji character"); - - const result2 = (await tool.execute!( - { emoji: "βœ…βœ…", message: "Test" }, - mockToolCallOptions - )) as { success: boolean; error: string }; - expect(result2.success).toBe(false); - expect(result2.error).toBe("emoji must be a single emoji character"); - }); - - it("should reject text (non-emoji)", async () => { - const tool = createStatusSetTool(mockConfig); - - const result1 = (await tool.execute!( - { emoji: "a", message: "Test" }, - mockToolCallOptions - )) as { - success: boolean; - error: string; - }; - expect(result1.success).toBe(false); - expect(result1.error).toBe("emoji must be a single emoji character"); - - const result2 = (await tool.execute!( - { emoji: "abc", message: "Test" }, - mockToolCallOptions - )) as { success: boolean; error: string }; - expect(result2.success).toBe(false); - expect(result2.error).toBe("emoji must be a single emoji character"); - - const result3 = (await tool.execute!( - { emoji: "!", message: "Test" }, - mockToolCallOptions - )) as { - success: boolean; - error: string; - }; - expect(result3.success).toBe(false); - expect(result3.error).toBe("emoji must be a single emoji character"); - }); - - it("should reject empty emoji", async () => { - const tool = 
createStatusSetTool(mockConfig); - - const result = (await tool.execute!({ emoji: "", message: "Test" }, mockToolCallOptions)) as { - success: boolean; - error: string; - }; - expect(result.success).toBe(false); - expect(result.error).toBe("emoji must be a single emoji character"); - }); - - it("should reject emoji with text", async () => { - const tool = createStatusSetTool(mockConfig); - - const result1 = (await tool.execute!( - { emoji: "πŸ”a", message: "Test" }, - mockToolCallOptions - )) as { success: boolean; error: string }; - expect(result1.success).toBe(false); - expect(result1.error).toBe("emoji must be a single emoji character"); - - const result2 = (await tool.execute!( - { emoji: "xπŸ”", message: "Test" }, - mockToolCallOptions - )) as { success: boolean; error: string }; - expect(result2.success).toBe(false); - expect(result2.error).toBe("emoji must be a single emoji character"); - }); - }); - - describe("message validation", () => { - it(`should accept messages up to ${STATUS_MESSAGE_MAX_LENGTH} characters`, async () => { - const tool = createStatusSetTool(mockConfig); - - const result1 = (await tool.execute!( - { emoji: "βœ…", message: "a".repeat(STATUS_MESSAGE_MAX_LENGTH) }, - mockToolCallOptions - )) as { success: boolean; message: string }; - expect(result1.success).toBe(true); - expect(result1.message).toBe("a".repeat(STATUS_MESSAGE_MAX_LENGTH)); - - const result2 = (await tool.execute!( - { emoji: "βœ…", message: "Analyzing code structure" }, - mockToolCallOptions - )) as { success: boolean }; - expect(result2.success).toBe(true); - }); - - it(`should truncate messages longer than ${STATUS_MESSAGE_MAX_LENGTH} characters with ellipsis`, async () => { - const tool = createStatusSetTool(mockConfig); - - // Test with MAX_LENGTH + 1 characters - const result1 = (await tool.execute!( - { emoji: "βœ…", message: "a".repeat(STATUS_MESSAGE_MAX_LENGTH + 1) }, - mockToolCallOptions - )) as { success: boolean; message: string }; - 
expect(result1.success).toBe(true); - expect(result1.message).toBe("a".repeat(STATUS_MESSAGE_MAX_LENGTH - 1) + "…"); - expect(result1.message.length).toBe(STATUS_MESSAGE_MAX_LENGTH); - - // Test with longer message - const longMessage = - "This is a very long message that exceeds the 60 character limit and should be truncated"; - const result2 = (await tool.execute!( - { emoji: "βœ…", message: longMessage }, - mockToolCallOptions - )) as { success: boolean; message: string }; - expect(result2.success).toBe(true); - expect(result2.message).toBe(longMessage.slice(0, STATUS_MESSAGE_MAX_LENGTH - 1) + "…"); - expect(result2.message.length).toBe(STATUS_MESSAGE_MAX_LENGTH); - }); - - it("should accept empty message", async () => { - const tool = createStatusSetTool(mockConfig); - - const result = (await tool.execute!({ emoji: "βœ…", message: "" }, mockToolCallOptions)) as { - success: boolean; - }; - expect(result.success).toBe(true); - }); - }); - - describe("url parameter", () => { - it("should accept valid URLs", async () => { - const tool = createStatusSetTool(mockConfig); - - const validUrls = [ - "https://github.com/owner/repo/pull/123", - "http://example.com", - "https://example.com/path/to/resource?query=param", - ]; - - for (const url of validUrls) { - const result = (await tool.execute!( - { emoji: "πŸ”", message: "Test", url }, - mockToolCallOptions - )) as { - success: boolean; - url: string; - }; - expect(result.success).toBe(true); - expect(result.url).toBe(url); - } - }); - - it("should work without URL parameter", async () => { - const tool = createStatusSetTool(mockConfig); - - const result = (await tool.execute!( - { emoji: "βœ…", message: "Test" }, - mockToolCallOptions - )) as { - success: boolean; - url?: string; - }; - expect(result.success).toBe(true); - expect(result.url).toBeUndefined(); - }); - - it("should omit URL from result when undefined", async () => { - const tool = createStatusSetTool(mockConfig); - - const result = (await tool.execute!( 
- { emoji: "✅", message: "Test", url: undefined }, - mockToolCallOptions - )) as { - success: boolean; - }; - expect(result.success).toBe(true); - expect("url" in result).toBe(false); - }); - }); -}); diff --git a/src/node/services/tools/status_set.ts b/src/node/services/tools/status_set.ts deleted file mode 100644 index 0a8b138cc8..0000000000 --- a/src/node/services/tools/status_set.ts +++ /dev/null @@ -1,77 +0,0 @@ -import { tool } from "ai"; -import type { ToolFactory } from "@/common/utils/tools/tools"; -import { TOOL_DEFINITIONS } from "@/common/utils/tools/toolDefinitions"; -import { STATUS_MESSAGE_MAX_LENGTH } from "@/common/constants/toolLimits"; -import type { StatusSetToolResult } from "@/common/types/tools"; - -/** - * Validates that a string is a single emoji character - * Uses Intl.Segmenter to count grapheme clusters (handles variation selectors, skin tones, etc.) - */ -function isValidEmoji(str: string): boolean { - if (!str) return false; - - // Use Intl.Segmenter to count grapheme clusters (what users perceive as single characters) - // This properly handles emojis with variation selectors (like ✏️), skin tones, flags, etc. 
- const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" }); - const segments = [...segmenter.segment(str)]; - - // Must be exactly one grapheme cluster - if (segments.length !== 1) { - return false; - } - - // Check if it's an emoji using Unicode properties - const emojiRegex = /^[\p{Emoji_Presentation}\p{Extended_Pictographic}]/u; - return emojiRegex.test(segments[0].segment); -} - -/** - * Truncates a message to a maximum length, adding an ellipsis if truncated - */ -function truncateMessage(message: string, maxLength: number): string { - if (message.length <= maxLength) { - return message; - } - // Truncate to maxLength-1 and add ellipsis (total = maxLength) - return message.slice(0, maxLength - 1) + "…"; -} - -/** - * Status set tool factory for AI assistant - * Creates a tool that allows the AI to set status indicator showing current activity - * - * The status is displayed IMMEDIATELY when this tool is called, even before other - * tool calls complete. This prevents agents from prematurely declaring success - * (e.g., "PR checks passed") when operations are still pending. Agents should only - * set success status after confirming the outcome of long-running operations. 
- * - * @param config Required configuration (not used for this tool, but required by interface) - */ -export const createStatusSetTool: ToolFactory = () => { - return tool({ - description: TOOL_DEFINITIONS.status_set.description, - inputSchema: TOOL_DEFINITIONS.status_set.schema, - execute: ({ emoji, message, url }): Promise => { - // Validate emoji - if (!isValidEmoji(emoji)) { - return Promise.resolve({ - success: false, - error: "emoji must be a single emoji character", - }); - } - - // Truncate message if necessary - const truncatedMessage = truncateMessage(message, STATUS_MESSAGE_MAX_LENGTH); - - // Tool execution is a no-op on the backend - // The status is tracked by StreamingMessageAggregator and displayed in the frontend - return Promise.resolve({ - success: true, - emoji, - message: truncatedMessage, - ...(url && { url }), - }); - }, - }); -}; diff --git a/src/node/services/workspaceService.test.ts b/src/node/services/workspaceService.test.ts index 499a7aac56..23e308577a 100644 --- a/src/node/services/workspaceService.test.ts +++ b/src/node/services/workspaceService.test.ts @@ -20,7 +20,11 @@ import type { ExtensionMetadataService, ExtensionMetadataStreamingUpdate, } from "./ExtensionMetadataService"; -import type { FrontendWorkspaceMetadata, WorkspaceMetadata } from "@/common/types/workspace"; +import type { + FrontendWorkspaceMetadata, + WorkspaceActivitySnapshot, + WorkspaceMetadata, +} from "@/common/types/workspace"; import type { TaskService } from "./taskService"; import type { BackgroundProcessManager } from "./backgroundProcessManager"; import type { TerminalService } from "@/node/services/terminalService"; @@ -1613,7 +1617,10 @@ describe("WorkspaceService idle compaction dispatch", () => { await internals.updateStreamingStatus(workspaceId, false); expect(internals.idleCompactingWorkspaces.has(workspaceId)).toBe(false); - expect(setStreaming).toHaveBeenCalledWith(workspaceId, false, { hasTodos: false }); + 
expect(setStreaming).toHaveBeenCalledWith(workspaceId, false, { + hasTodos: false, + todoStatus: null, + }); }); }); @@ -1718,6 +1725,81 @@ describe("WorkspaceService streaming generation guard", () => { expect(setStreaming).toHaveBeenCalledWith(workspaceId, true, { model: "openai:gpt-4o" }); }); + test("todo snapshot refreshes run in call order for consecutive updates", async () => { + const workspaceId = "ws-todo-refresh-order"; + const firstWriteDeferred = createDeferred(); + const setTodoStatus = mock( + ( + _workspaceId: string, + todoStatus: { emoji: string; message: string } | null, + hasTodos: boolean + ) => { + if (todoStatus?.message === "First task") { + return firstWriteDeferred.promise; + } + return Promise.resolve({ + recency: Date.now(), + streaming: false, + lastModel: null, + lastThinkingLevel: null, + todoStatus, + hasTodos, + }); + } + ); + + let readCount = 0; + readTodosSpy = spyOn(todoStorageModule, "readTodosForSessionDir").mockImplementation(() => { + readCount += 1; + if (readCount === 1) { + return Promise.resolve([{ content: "First task", status: "in_progress" }]); + } + return Promise.resolve([{ content: "Second task", status: "in_progress" }]); + }); + + ( + workspaceService as unknown as { + extensionMetadata: ExtensionMetadataService; + } + ).extensionMetadata = { + setTodoStatus, + } as unknown as ExtensionMetadataService; + + const internals = workspaceService as unknown as { + updateTodoStatusFromStorage: (workspaceId: string) => Promise; + }; + + const firstRefresh = internals.updateTodoStatusFromStorage(workspaceId); + const secondRefresh = internals.updateTodoStatusFromStorage(workspaceId); + + await new Promise((resolve) => setTimeout(resolve, 0)); + expect(setTodoStatus).toHaveBeenCalledTimes(1); + expect(readCount).toBe(1); + + firstWriteDeferred.resolve({ + recency: Date.now(), + streaming: false, + lastModel: null, + lastThinkingLevel: null, + todoStatus: { emoji: "🔄", message: "First task" }, + hasTodos: true, + }); + + 
await Promise.all([firstRefresh, secondRefresh]); + + expect(setTodoStatus).toHaveBeenCalledTimes(2); + expect(setTodoStatus.mock.calls[0]).toEqual([ + workspaceId, + { emoji: "🔄", message: "First task" }, + true, + ]); + expect(setTodoStatus.mock.calls[1]).toEqual([ + workspaceId, + { emoji: "🔄", message: "Second task" }, + true, + ]); + }); + test("handleStreamCompletion captures generation before awaiting recency updates", async () => { const workspaceId = "ws-stream-completion-generation"; const recencyDeferred = createDeferred(); @@ -3019,9 +3101,75 @@ describe("WorkspaceService metadata listeners", () => { expect(setStreaming).toHaveBeenCalledTimes(1); expect(setStreaming).toHaveBeenCalledWith(workspaceId, false, { hasTodos: false, + todoStatus: null, generation: 0, }); }); + + test("todo_write events publish todo-derived sidebar status", async () => { + const workspaceId = "ws-todo-status"; + const setTodoStatus = mock(() => + Promise.resolve({ + recency: Date.now(), + streaming: true, + lastModel: null, + lastThinkingLevel: null, + agentStatus: null, + }) + ); + const readTodosSpy = spyOn(todoStorageModule, "readTodosForSessionDir").mockResolvedValue([ + { content: "Run typecheck", status: "in_progress" }, + { content: "Add tests", status: "pending" }, + ]); + + class FakeAIService extends EventEmitter { + isStreaming = mock(() => false); + getWorkspaceMetadata = mock(() => + Promise.resolve({ success: false as const, error: "not found" }) + ); + } + + const aiService = new FakeAIService() as unknown as AIService; + const mockConfig: Partial = { + srcDir: "/tmp/src", + getSessionDir: mock(() => "/tmp/test/sessions"), + findWorkspace: mock(() => null), + loadConfigOrDefault: mock(() => ({ projects: new Map() })), + }; + const mockExtensionMetadata: Partial = { setTodoStatus }; + + new WorkspaceService( + mockConfig as Config, + historyService, + aiService, + mockInitStateManager as InitStateManager, + mockExtensionMetadata as ExtensionMetadataService, 
+ mockBackgroundProcessManager as BackgroundProcessManager + ); + + try { + aiService.emit("tool-call-end", { + type: "tool-call-end", + workspaceId, + messageId: "msg-1", + toolCallId: "tool-1", + toolName: "todo_write", + result: { success: true, count: 2 }, + timestamp: Date.now(), + }); + + await new Promise((resolve) => setTimeout(resolve, 0)); + + expect(readTodosSpy).toHaveBeenCalledWith("/tmp/test/sessions"); + expect(setTodoStatus).toHaveBeenCalledWith( + workspaceId, + { emoji: "🔄", message: "Run typecheck" }, + true + ); + } finally { + readTodosSpy.mockRestore(); + } + }); }); describe("WorkspaceService archive lifecycle hooks", () => { diff --git a/src/node/services/workspaceService.ts b/src/node/services/workspaceService.ts index 6c7b9faf5f..fd779a0eaa 100644 --- a/src/node/services/workspaceService.ts +++ b/src/node/services/workspaceService.ts @@ -51,6 +51,7 @@ import { detectDefaultTrunkBranch, listLocalBranches } from "@/node/git"; import { shellQuote } from "@/node/runtime/backgroundCommands"; import { extractEditedFilePaths } from "@/common/utils/messages/extractEditedFiles"; import { buildCompactionMessageText } from "@/common/utils/compaction/compactionPrompt"; +import { deriveTodoStatus } from "@/common/utils/todoList"; import { fileExists } from "@/node/utils/runtime/fileExists"; import { orchestrateFork } from "@/node/services/utils/forkOrchestrator"; import { generateWorkspaceIdentity } from "@/node/services/workspaceTitleGenerator"; @@ -165,7 +166,11 @@ const MAX_WORKSPACE_NAME_COLLISION_RETRIES = 3; // Shared type for workspace-scoped AI settings (model + thinking) type WorkspaceAISettings = z.infer; -type WorkspaceAgentStatus = NonNullable; +interface WorkspaceAgentStatus { + emoji: string; + message: string; + url?: string; +} type WorkspaceRuntimeStatus = "running" | "stopped" | "unknown" | "unsupported"; const POST_COMPACTION_METADATA_REFRESH_DEBOUNCE_MS = 100; @@ -1081,6 +1086,10 @@ export class WorkspaceService extends 
EventEmitter { // from older streams from clobbering a newer streaming=true snapshot after async awaits. private readonly streamingGenerations = new Map(); + // Serialize todo snapshot refreshes so back-to-back todo_write/propose_plan updates cannot + // finish out of order and briefly restore stale progress in workspace activity metadata. + private readonly todoStatusUpdateQueue = new Map>(); + // AbortControllers for in-progress workspace initialization (postCreateSetup + initWorkspace). // // Why this lives here: archive/remove are the user-facing lifecycle operations that should @@ -1251,6 +1260,8 @@ export class WorkspaceService extends EventEmitter { "result" in v; const extractStatusSetResult = (result: unknown): WorkspaceAgentStatus | null => isObj(result) && result.success === true ? coerceAgentStatus(result) : null; + const isSuccessfulToolResult = (result: unknown): result is { success: true } => + isObj(result) && result.success === true; // Update streaming status and recency on stream start this.aiService.on("stream-start", (data: unknown) => { if (isStreamStartEvent(data)) { @@ -1283,16 +1294,26 @@ export class WorkspaceService extends EventEmitter { }); this.aiService.on("tool-call-end", (data: unknown) => { - if (!isToolCallEndEvent(data) || data.replay === true || data.toolName !== "status_set") { + if (!isToolCallEndEvent(data) || data.replay === true) { return; } - const agentStatus = extractStatusSetResult(data.result); - if (!agentStatus) { + if (data.toolName === "status_set") { + const agentStatus = extractStatusSetResult(data.result); + if (!agentStatus) { + return; + } + + void this.updateAgentStatus(data.workspaceId, agentStatus); return; } - void this.updateAgentStatus(data.workspaceId, agentStatus); + if ( + (data.toolName === "todo_write" || data.toolName === "propose_plan") && + isSuccessfulToolResult(data.result) + ) { + void this.updateTodoStatusFromStorage(data.workspaceId); + } }); } @@ -1345,19 +1366,44 @@ export class 
WorkspaceService extends EventEmitter { ); } + private async updateTodoStatusFromStorage(workspaceId: string): Promise { + const previousUpdate = this.todoStatusUpdateQueue.get(workspaceId) ?? Promise.resolve(); + const nextUpdate = previousUpdate + .catch(() => undefined) + .then(async () => { + const sessionDir = this.config.getSessionDir(workspaceId); + const todos = await readTodosForSessionDir(sessionDir); + const todoStatus = deriveTodoStatus(todos) ?? null; + + await this.emitWorkspaceActivityUpdate(workspaceId, "update workspace todo status", () => + this.extensionMetadata.setTodoStatus(workspaceId, todoStatus, todos.length > 0) + ); + }); + + this.todoStatusUpdateQueue.set(workspaceId, nextUpdate); + try { + await nextUpdate; + } finally { + if (this.todoStatusUpdateQueue.get(workspaceId) === nextUpdate) { + this.todoStatusUpdateQueue.delete(workspaceId); + } + } + } + private async updateStreamingStatus( workspaceId: string, streaming: boolean, update: ExtensionMetadataStreamingUpdate = {} ): Promise { try { - let { hasTodos } = update; - if (!streaming && hasTodos === undefined) { - // Stop snapshots need an authoritative todo bit even for background workspaces, + let { hasTodos, todoStatus } = update; + if (!streaming && (hasTodos === undefined || todoStatus === undefined)) { + // Stop snapshots need an authoritative todo summary even for background workspaces, // and centralizing the read here preserves the fire-and-forget abort/error handlers. const sessionDir = this.config.getSessionDir(workspaceId); const todos = await readTodosForSessionDir(sessionDir); - hasTodos = todos.length > 0; + hasTodos ??= todos.length > 0; + todoStatus ??= deriveTodoStatus(todos) ?? null; } if ( !streaming && @@ -1371,6 +1417,7 @@ export class WorkspaceService extends EventEmitter { const snapshot = await this.extensionMetadata.setStreaming(workspaceId, streaming, { ...update, + ...(todoStatus !== undefined ? { todoStatus } : {}), ...(hasTodos !== undefined ? 
{ hasTodos } : {}), }); // Idle compaction tagging is stop-snapshot only. Never tag streaming=true updates, diff --git a/src/node/utils/extensionMetadata.ts b/src/node/utils/extensionMetadata.ts index a0b8f87bc8..551f77d181 100644 --- a/src/node/utils/extensionMetadata.ts +++ b/src/node/utils/extensionMetadata.ts @@ -22,9 +22,11 @@ export interface ExtensionMetadata { lastModel: string | null; lastThinkingLevel: ThinkingLevel | null; agentStatus: ExtensionAgentStatus | null; + displayStatus?: ExtensionAgentStatus | null; + todoStatus?: ExtensionAgentStatus | null; hasTodos?: boolean; - // Persists the latest status_set URL so later status_set calls without a URL - // can still carry the last deep link even after agentStatus is cleared. + // Persists the latest display-status URL so later updates without a URL + // can still carry the last deep link even after displayStatus is cleared. lastStatusUrl?: string | null; } @@ -78,6 +80,19 @@ export function coerceExtensionMetadata(value: unknown): ExtensionMetadata | nul return null; } + const displayStatus = + "displayStatus" in record + ? record.displayStatus === null + ? null + : (coerceAgentStatus(record.displayStatus) ?? undefined) + : undefined; + const todoStatus = + "todoStatus" in record + ? record.todoStatus === null + ? null + : (coerceAgentStatus(record.todoStatus) ?? undefined) + : undefined; + return { recency: record.recency, streaming: record.streaming, @@ -87,6 +102,8 @@ export function coerceExtensionMetadata(value: unknown): ExtensionMetadata | nul lastModel: typeof record.lastModel === "string" ? record.lastModel : null, lastThinkingLevel: isThinkingLevel(record.lastThinkingLevel) ? record.lastThinkingLevel : null, agentStatus: coerceAgentStatus(record.agentStatus), + ...(displayStatus !== undefined ? { displayStatus } : {}), + ...(todoStatus !== undefined ? { todoStatus } : {}), ...(typeof record.hasTodos === "boolean" ? 
{ hasTodos: record.hasTodos } : {}), lastStatusUrl: coerceStatusUrl(record.lastStatusUrl), }; @@ -95,6 +112,17 @@ export function coerceExtensionMetadata(value: unknown): ExtensionMetadata | nul export function toWorkspaceActivitySnapshot( metadata: ExtensionMetadata ): WorkspaceActivitySnapshot { + const displayStatus = metadata.displayStatus !== undefined ? metadata.displayStatus : null; + const todoStatus = + metadata.todoStatus !== undefined + ? metadata.todoStatus + : metadata.hasTodos === false + ? null + : // Upgrade bridge: existing extensionMetadata.json entries may only have the old + // agentStatus field. Project that forward into todoStatus until a fresh todo_write + // or stream-stop snapshot rewrites the workspace metadata. + coerceAgentStatus(metadata.agentStatus); + return { recency: metadata.recency, streaming: metadata.streaming, @@ -103,7 +131,8 @@ export function toWorkspaceActivitySnapshot( : {}), lastModel: metadata.lastModel ?? null, lastThinkingLevel: metadata.lastThinkingLevel ?? null, - agentStatus: coerceAgentStatus(metadata.agentStatus), + ...(displayStatus ? { displayStatus } : {}), + ...(todoStatus ? { todoStatus } : {}), ...(typeof metadata.hasTodos === "boolean" ? { hasTodos: metadata.hasTodos } : {}), }; } diff --git a/tests/ui/workspaces/intermediateStatus.test.ts b/tests/ui/workspaces/intermediateStatus.test.ts index 96df38e154..2ca6e37501 100644 --- a/tests/ui/workspaces/intermediateStatus.test.ts +++ b/tests/ui/workspaces/intermediateStatus.test.ts @@ -1,7 +1,7 @@ /** - * UI integration test for the “working but no status_set yet” intermediate status. + * UI integration test for the “working but no todo-derived status yet” intermediate state. * - * Expectation: While a stream is starting and no status_set tool has been received, + * Expectation: While a stream is starting and the agent has not written a todo list yet, * the workspace sidebar row should show provider icon + model name + starting label. 
*/ @@ -27,7 +27,7 @@ describe("Workspace intermediate status (mock AI router)", () => { await preloadTestModules(); }); - test("shows model + starting while stream is starting and before status_set", async () => { + test("shows model + starting while stream is starting and before todos appear", async () => { const app = await createAppHarness({ branchPrefix: "status-intermediate" }); const collector = createStreamCollector(app.env.orpc, app.workspaceId);