ref: Unify rendering pipeline with structured output and snapshot tests#318
Closed
cameroncooke wants to merge 50 commits intomainfrom
Closed
ref: Unify rendering pipeline with structured output and snapshot tests#318cameroncooke wants to merge 50 commits intomainfrom
cameroncooke wants to merge 50 commits intomainfrom
Conversation
- Migrate all test tools to canonical single-pipeline pattern with createPendingXcodebuildResponse (test_sim, test_device, test_macos) - Add test discovery preflight to device and macOS test tools - Migrate 5 query tools to consistent formatting with front matter, structured error blocks, and clean output (list_schemes, show_build_settings, get_sim_app_path, get_mac_app_path, get_device_app_path) - Add manifest-level when field for next step templates (success/failure/always) to control when next steps appear - Group warnings before summary with yellow triangle symbol - Fix compiler error rendering for multi-line errors - Fix test failure rendering to use consistent indented format with relative file paths - Fix linker warning rendering to use structured format - Remove dead code (legacy finalization helpers, dead modules, unused DI patterns) - Extract duplicate deriveDiagnosticBaseDir to shared module - Create shared xcodebuild-error-utils for query tool error parsing
Replace ad-hoc text formatting across all tools with a single structured event pipeline. Every tool now emits typed PipelineEvent objects (header, statusLine, section, detailTree, table, fileRef, summary) that flow through shared renderers for consistent output formatting. Key changes: - Add PipelineEvent union type replacing XcodebuildEvent with canonical types (HeaderEvent, StatusLineEvent, SectionEvent, DetailTreeEvent, TableEvent, FileRefEvent) plus xcodebuild-specific types (BuildStageEvent, CompilerWarningEvent, CompilerErrorEvent, test events) - Add toolResponse() entry point that feeds events through shared resolveRenderers() pipeline (MCP buffer + CLI stdout) - Migrate all ~50 tool files to use toolResponse([events]) instead of createTextResponse/createErrorResponse/ad-hoc content construction - Migrate infrastructure (typed-tool-factory, tool-invoker, simulator-utils, axe-helpers, ui-automation-guard, validation, xcode-tools-bridge) to route all errors through the pipeline - Delete createErrorResponse, createTextResponse, consolidateContentForClaudeCode as there is now a single call site for response construction (toolResponse) - Default manifest nextSteps to when:success so next steps never appear on error paths - Fix daemon-routed tools: only set pipelineStreamMode:'complete' when a CLI renderer actually wrote to stdout - Regenerate all 74 snapshot fixtures with consistent pipeline formatting
Add vitest.snapshot.config.ts for snapshot integration tests. Exclude src/snapshot-tests/ from the default vitest config so `npm test` runs only unit tests. Update package.json scripts so each test suite has its own config with no overlap: - npm test: unit tests only (vitest.config.ts) - npm run test:snapshot: snapshot integration tests (vitest.snapshot.config.ts) - npm run test:smoke: smoke tests (vitest.smoke.config.ts)
10 rounds of code simplification after the pipeline migration: - Extract shared allText() test helper, replacing 52 duplicate definitions (allText/textOf/joinText) across 48 test files - Extract shared executeAxeCommand to ui-automation/shared/axe-command.ts, removing ~500 lines of identical code across 10 UI automation tools - Deduplicate header() construction across 22 tool files - Deduplicate build stage label maps in event-formatting.ts - Reuse formatHumanCompilerWarningEvent in formatGroupedWarnings - Eliminate 2 barrel re-export files (responses/index.ts, validation/index.ts) - Remove dead exports (summary builder, XcodebuildEvent alias) - Remove dead doctor showAsciiLogo parameter - Remove ~100+ redundant comments across test and tool files - Normalize assertion patterns (toBeFalsy for success, toBe(true) for error) - Fix formatting across all modified files Net: -1700 lines, no behavioral changes.
- Fix 3-space indent for header params and icon-prefixed section lines to align with emoji grid - Fix duplicate headers in xcodebuild error paths: build-utils no longer emits its own header when a pipeline is active - Add deviceId to pipeline header params for device build-and-run and test tools - Add -allowProvisioningUpdates to xcodebuild commands to prevent interactive keychain prompts corrupting CLI output - Preserve test failure details and discovery in snapshot normalizer (removed over-aggressive TEST_FAILURE_BLOCK_REGEX and TEST_DISCOVERY_REGEX stripping) - Fix section line splitting in debugging tools so multi-line output (stack, variables, lldb-command) is properly indented per line - Add debugging happy-path snapshot tests (attach, stack, add-breakpoint, lldb-command, remove-breakpoint, detach) using live simulator debugger - Add device error-path fixtures (stop, install, launch) and fix test--failure fixture naming - Add @main to Calculator example app for Xcode 26.4 compatibility - Regenerate all snapshot fixtures
Add withErrorHandling() higher-order wrapper and toErrorMessage() utility to eliminate identical try/catch boilerplate repeated across nearly every *Logic function in src/mcp/tools/. The wrapper centralizes error normalization, optional logging, and toolResponse assembly while preserving exact error messages via per-tool options. Tools with class-specific error branching (UI automation) or raw ToolResponse returns (get_*_app_path) use the mapError escape hatch. No behavior changes — all 1466 unit tests and 135 snapshot tests pass with zero fixture updates.
…acy code
Convert 9 manual-handler tools to use createTypedToolWithContext, fixing a
validation bug in scaffold_ios_project and scaffold_macos_project where
.parse() threw unhandled ZodErrors instead of returning tool error responses.
- scaffold_ios_project, scaffold_macos_project: createTypedToolWithContext
- xcode_ide_call_tool, xcode_ide_list_tools: createTypedToolWithContext
- xcode_tools_bridge_{disconnect,status,sync}: createTypedToolWithContext
- session_show_defaults: createTypedToolWithContext
- swift_package_stop: createTypedToolWithContext
Remove legacy compatibility code:
- Remove default-export fallback from import-tool-module.ts
- Strip dead types from plugin-types.ts (PluginMeta, WorkflowGroup, etc.)
- Inline handler type in runtime/types.ts
- Delete unused workflow-selection.ts and its tests
Resources now use the same manifest-driven architecture as tools: - YAML manifests in manifests/resources/ define metadata, URIs, and predicates - Dynamic module import via import-resource-module.ts (named exports only) - Predicate-aware registration filters resources using PredicateContext - xcode-ide-state is now gated by runningUnderXcodeAgent predicate Resource modules converted from default exports to named handler exports. Resource registration in bootstrap.ts now receives PredicateContext. Added resource visibility helpers to exposure.ts, resource manifest schema and loader support, snapshot tests for 4 resources, and updated docs.
…setup Production executor factories (getDefaultCommandExecutor, getDefaultFileSystemExecutor, getDefaultInteractiveSpawner) no longer contain test-environment checks. Safety blocking for unit tests is now handled by vitest-executor-safety.setup.ts which installs noop overrides via beforeEach/afterEach, loaded only by the unit test vitest config. - Remove all process.env.VITEST/NODE_ENV/SNAPSHOT_TEST_REAL_EXECUTOR checks from command.ts and interactive-process.ts - Add interactive spawner override seam (__setTestInteractiveSpawnerOverride) - Add createNoopInteractiveSpawner to mock-executors.ts - Remove SNAPSHOT_TEST_REAL_EXECUTOR env toggles from snapshot harnesses - Refactor swift_package_run.ts to remove test-env branching - Update smoke harness to patch interactive spawner overrides
Document findings from investigating the structured event pipeline migration status, renderer architecture rationale, and holdout tools.
…line Re-migrate tools that were previously migrated to the pipeline (ac33b97f) then reverted to manual text construction (c0693a1d). Extends the pipeline infrastructure where needed to produce the fixture-defined output format rather than bypassing the pipeline. Infrastructure extensions: - Add blankLineAfterTitle to SectionEvent for Errors/platform group spacing - Add suppressCliStream to toolResponse() for late-bound CLI next steps - Add extractQueryErrorMessages() structured parser in xcodebuild-error-utils - Update formatSectionEvent() to honor blank line and empty line rendering Tool migrations: - get_sim_app_path: replace formatToolPreflight + manual text with events - get_device_app_path: same migration pattern - get_mac_app_path: same migration pattern - list_devices: replace renderGroupedDevices() with section events per platform - swift_package_run: remove manual error fallback content - swift_package_test: remove manual error fallback content All 1446 unit tests pass. All snapshot fixtures match unmodified.
…nt union The renderer interface handles generic PipelineEvents used by all tools, not just xcodebuild. Rename to PipelineRenderer to reflect actual usage. Split PipelineEvent into CommonPipelineEvent (generic UI/output events) and BuildTestPipelineEvent (xcodebuild/swift-toolchain-specific events) to make the type boundary explicit. The full PipelineEvent union remains as their combination for backward compatibility.
Normalize environment-specific data in resource-harness that caused flaky doctor and device snapshot failures: - argv and execPath (vary by node/vitest install location) - nvm version in PATH entries - node_modules/.bin PATH entries injected by npm at runtime - Process tree line count (varies per run context) - Device connection status checkmarks (depend on physical device state)
The build_run_sim, build_run_device, and build_run_macos orchestrators inlined the same command construction and result parsing that existed in their corresponding step tools (install, launch, boot). This created duplication across 8 files that diverged independently. Extract reusable internal primitives into three new modules: - simulator-steps.ts: findSimulatorById, installAppOnSimulator, launchSimulatorApp - device-steps.ts: installAppOnDevice, launchAppOnDevice - macos-steps.ts: launchMacApp Both orchestrators and step-tool handlers now delegate to these shared primitives, following the existing handleTestLogic pattern used by the test tools.
…alone logging Remove the 5 standalone logging tools (start/stop device/sim log capture, launch_app_logs_sim) and the logging workflow. Log capture is now automatic in build_run and launch_app tools, with log file paths included in responses. Adds derived-data-path and log-paths utilities to support the integrated log capture approach. Tool count reduced from 76 to 71 across 13 workflows.
Remove the __flowdeck_fixtures__, __fixtures_designed__ directories, all flowdeck test files, and the flowdeck harness. These alternative fixture formats are no longer needed.
Enhance displayPath to use ~/ shorthand for paths under the user's home directory when cwd-relative display isn't possible. Apply displayPath to log paths in build-run results, launch-app details, and swift-package run output. Update snapshot normalizer to handle ~/ for fixture compatibility.
Existing changes from prior work (home directory path collapsing, simulator steps, snapshot normalization). These are the starting point. Baseline: 5 pre-existing test failures, 1399 passing.
New files: - src/rendering/types.ts: RenderSession, ToolHandlerContext, ImageAttachment types - src/rendering/render.ts: createRenderSession(strategy), renderEvents() Text strategy replicates mcp-renderer grouping/flush logic. JSON strategy produces JSONL output. Both track events, attachments, and error state.
Session and context infrastructure: - RenderSession created by boundary, passed through invoker to handlers - ToolHandlerContext available via AsyncLocalStorage (getHandlerContext) - Factory wrappers detect void returns and build ToolResponse from session - Backward compatible: existing handlers returning ToolResponse still work Unit tests: 2 pre-existing failures, 1403 passing Snapshot tests: 1 env-specific failure (doctor PATH), 131 passing
65 tool handlers migrated + 50 test files updated. Unit tests: 2 pre-existing failures, 1403 passing. Snapshot tests: 60 failures from double-rendering (CLI boundary not yet updated — needs steps 6-7).
toolResponse() now skips CLI renderers when running inside the factory's AsyncLocalStorage context. This prevents double-rendering for migrated tools. Remaining: 7 snapshot failures from CLI text renderer formatting differences. The render session uses MCP renderer formatting, but CLI subprocess tests expect CLI text renderer formatting. This requires CLI boundary (step 6) to be implemented.
…s 3+6) Migrated 65 non-pipeline tool handlers to ctx.emit pattern via getHandlerContext(). All tools use AsyncLocalStorage to access the handler context without positional parameter conflicts. CLI boundary: printToolResponse re-renders session events through the CLI text renderer for proper spacing. Next-steps appended as events in _meta for unified rendering. Double-rendering prevented by suppressing CLI renderer in toolResponse() when inside handler context. Unit tests: 2 pre-existing failures, 1403 passing Snapshot tests: 1 env-specific failure (doctor PATH), 131 passing
- Pipeline accepts emit callback from handler context - Inline finalization replaces pending pattern - Parser debug file ref defaults to false - Error fallback policy defaults to if-no-structured-diagnostics Unit tests: 2 pre-existing failures, 1402 passing Snapshot tests: 10 failures to investigate
Pipeline tools migrated from pending pattern to inline finalization with ctx.emit. Updated swift-package build error fixture to reflect improved structured error output (errors section + summary + log path). Unit tests: 2 pre-existing failures, 1402 passing Snapshot tests: 1 env-specific failure (doctor PATH), 131 passing
Added normalizations for volatile doctor output values: - Node.js version, OS release, CPU, memory - Xcode version and paths, dependency versions - TMPDIR, homedir, full PATH section - Tool counts and server version Updated debugging stack trace fixture for current runtime frames. Updated swift-package build error fixture for structured error output. All snapshot tests pass. Unit tests: 2 pre-existing failures, 1402 passing.
- Run docs:update for TOOLS.md and TOOLS-CLI.md - Update debugging stack trace fixture for current runtime frames
Technical debt from the refactor that can be addressed incrementally: - Remove hybrid toolResponse() usage from withErrorHandling mapError paths - Clean up ToolResponse return types in tool handler signatures - Daemon protocol v2 (events over wire) - Delete dead renderer code - Encapsulate ToolResponse type
Removed the IIFE + toolResponse() + _meta.events extraction pattern from ~40 tool handlers. Tools now emit events directly via ctx.emit() without the intermediate ToolResponse construction. Remaining toolResponse() callers are legitimate: - mapError callbacks in withErrorHandling (returns ToolResponse by design) - Dual-mode tools (list_sims, list_devices) for non-ctx fallback - Pipeline tool error paths - External-facing functions (doctor, xcode-ide bridge) Unit tests: 2 pre-existing failures, 1402 passing Snapshot tests: all 13 files pass
- Added emit function to mapError callback for direct event emission - Converted all mapError callbacks from toolResponse() to emit() - Converted pipeline pre-flight errors to ctx.emit() - Converted record_sim_video to ctx.emit() - Updated record_sim_video tests for new pattern toolResponse() removed from 48 of 52 tool handler files. Remaining 4 are legitimate external API boundaries: - doctor.ts (external callers: doctor-cli, resources) - xcode-ide/shared.ts (bridge callers) - list_sims.ts, list_devices.ts (resource callers without handler context) Unit tests: 2 pre-existing failures, 1402 passing Snapshot tests: all 13 files pass
Zero toolResponse() callers remain in src/mcp/tools/. - doctor.ts: runDoctor returns events via eventsToToolResponse() - xcode-ide/shared.ts: withBridgeToolHandler emits via ctx directly - list_sims.ts, list_devices.ts: removed dual-mode fallback, always use ctx.emit (resources already set up AsyncLocalStorage) - New utility: src/utils/events-to-tool-response.ts for callers that need ToolResponse without the old renderer pipeline Unit tests: 2 pre-existing failures, 1402 passing Snapshot tests: all 13 files pass
… 10) Deleted dead code: - src/utils/tool-response.ts (zero callers) - src/utils/renderers/mcp-renderer.ts (zero callers) - src/utils/renderers/cli-jsonl-renderer.ts (zero callers) - src/utils/__tests__/tool-response.test.ts - src/utils/renderers/__tests__/mcp-renderer.test.ts Cleaned up: - renderers/index.ts: removed resolveRenderers() and dead imports - xcodebuild-pipeline.ts: removed renderer fallback (emit is required) - xcodebuild-output.ts: removed mcpContent from PipelineResult - Pipeline tests updated to provide emit callback toolResponse() is gone. Events flow through ctx.emit only. Unit tests: 2 pre-existing failures, 135 files pass Snapshot tests: all 13 files pass, 133 tests pass
6 build/build-run tool handlers no longer reference ToolResponse. Return types narrowed from Promise<ToolResponse | void> to Promise<void>. 7 files legitimately retain ToolResponse: - doctor, xcode-ide/shared (external API) - list_sims, list_devices (dual-mode for resources) - test_sim/device/macos (handleTestLogic return type)
…inal) - test_sim/device/macos: handleTestLogic returns void, emits via ctx - list_sims/list_devices: removed dual-mode fallback, always emit via ctx - doctor: runDoctor returns inferred type, uses renderEvents - build tool return types narrowed to Promise<void> Only src/mcp/tools/xcode-ide/shared.ts retains ToolResponse — it receives (not constructs) a ToolResponse from the external Xcode bridge. toolResponse() deleted. resolveRenderers() deleted. mcp-renderer.ts deleted. cli-jsonl-renderer.ts deleted. Unit tests: 2 pre-existing failures, 135 pass Snapshot tests: all 13 files pass, 133 tests pass
Bridge now returns BridgeToolResult (events + images + metadata) instead of ToolResponse. shared.ts emits bridge events directly via ctx without unpacking a ToolResponse wrapper. Zero ToolResponse references remain in src/mcp/tools/.
ToolResponse now exists only at transport boundaries: - types/common.ts (definition) - tool-registry.ts (MCP envelope construction) - daemon protocol/server/client (wire format) - runtime/tool-invoker.ts (postProcessToolResponse for MCP boundary) Eliminated from: - CLI output (printSessionOutput replaces printToolResponse) - CLI register-tool-commands (reads from session directly) - CLI tool-catalog (emits into session) - typed-tool-factory (returns void, no ToolResponse construction) - tool-error-handling (legacy non-ctx overload deleted) - build-utils (returns BuildCommandResult, not ToolResponse) - list_sims/list_devices (dual-mode removed, always use ctx) - resources/simulators/devices (set up AsyncLocalStorage) Deleted: src/utils/events-to-tool-response.ts (zero callers) Unit tests: 2 pre-existing failures, 1394 passing Snapshot tests: 2 env-specific failures (no physical device), 131 passing
RenderSession.emit() now returns void instead of TextRenderOp, removing the dual text/event tracking pattern. CLI output modes (text, json) are handled by dedicated render sessions (cli-text, cli-json) that write to stdout directly, eliminating the printSessionOutput intermediary. Daemon protocol bumped to v2 with DaemonToolResult replacing ToolResponse on the wire. Added DaemonVersionMismatchError and auto-restart logic so CLI transparently recovers when connecting to a stale daemon. Removed postProcessToolResponse in favor of postProcessSession which operates on sessions directly across MCP, daemon, and snapshot harness boundaries.
The tool factory functions already returned a ToolTestResult-shaped object at runtime when called without a ToolHandlerContext (test mode), but lied to TypeScript via `as unknown as void`. This caused 350+ typecheck errors across test files accessing .content and .isError on void. Add ToolHandler overloaded interface that reflects the actual runtime behavior: void with context (production), ToolTestResult without (tests). Update callHandler return type and allText helper to accept broader content types. Fix miscellaneous test type issues (findLast, readonly properties, mock return types).
resolveAppPid scanned all output lines with a bracket regex that matched 3+ digit numbers like [404], causing HTTP status codes to be returned as PIDs. Restrict to checking only the first non-empty line using the colon format (e.g. com.example.app: 42567), which is the actual simctl output pattern.
The tilde replacement only handled ~/path patterns. Add a rule for bare ~ (preceded by whitespace or colon, followed by whitespace or EOL) so that outputs like 'Home: ~' are normalized to 'Home: <HOME>'.
|
🚧 Skipped: PR exceeds review size limit. Please split into smaller PRs and re-run. |
commit: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Major refactor of how all tool and resource output is produced across both MCP and CLI transports. Previously, rendering logic was scattered across individual tool handlers — each tool constructed its own
ToolResponsedirectly, leading to inconsistent output between MCP and CLI, duplicated formatting, and no way to verify output without manual testing.This PR introduces a unified rendering pipeline with a clean separation: tools emit structured events, a render session produces text, and protocol-specific result types are constructed only at the transport boundary. Snapshot tests lock down the output contract so regressions are caught automatically.
Architecture
PipelineEventobjects through a handler context. The render session owns the result.ctx.emitin real-time.Render strategies
Two strategies —
text(human-readable) andjson(JSONL) — selected at the boundary. The text strategy uses existing formatters fromevent-formatting.ts. The JSON strategy isJSON.stringify(event) + '\n'.Boundary flows
Key changes
return toolResponse([...])toctx.emit(event)calls via typed tool factoriessimulator-steps,macos-steps,test-preflight,swift-test-discovery) to deduplicate build/run/test workflowsstart_sim_log_cap,stop_sim_log_cap,start_device_log_cap,stop_device_log_cap,launch_app_logs_sim)~/in output for readability