test(cockpit): aimock e2e — c-subagents (Phase 3)#364
Merged
Conversation
…oundary First implementer attempt (Task 3 direct LLM invocation) failed because the c-subagents `task` tool dispatches to subagent functions that each run their own LLM-driven agent loops. Aimock 404s on the un-captured subagent LLM calls at replay time. Replace the direct-LLM-invocation capture with aimock --record mode: run real langgraph dev against aimock proxying to real OpenAI. Captures every LLM call in the full graph (orchestrator + each subagent + nested tool sub-rounds) at the HTTP boundary. Reusable pattern for future multi-LLM examples (c-interrupts, c-generative-ui dashboard).
…proach Switch from a direct-LLM Python script to a shell script that proxies the real langgraph dev server through aimock in --record mode. Captures every LLM call uniformly (orchestrator + each subagent's nested calls + tool sub-rounds) at the HTTP layer. Fixture: 9 entries covering the orchestrator's three task dispatches plus each subagent's tool round-trip (research/booking/itinerary).
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a per-example aimock e2e for `c-subagents` (orchestrator LLM with a `task` tool that dispatches subagents). Third per-example spec under the harness library landed in Phase 2 (#356).
What changed
Sits on Phase 2 (#356) + c-* aviation refactor PR 1 (#347).
New capture pattern (reusable)
Future cockpit examples with nested LLM flows (c-interrupts when refactored, c-generative-ui dashboard) use the same aimock --record approach. Pattern: start aimock with `--record --provider-openai`, start langgraph dev with `OPENAI_BASE_URL` pointed at aimock, submit a run via the LangGraph SDK HTTP API, poll until success, aimock writes the fixture file.
Note: the underlying aimock CLI binary is `llmock` (legacy alias), not `aimock` (primary, requires `--config `). Both ship in the same npm package.
Spec assertion shape
Same as chat-aimock Phase 2d (research-subagent): asserts on the durable tool-call chip ("Called task" button rendered by chat-tool-calls primitive) + a content phrase from the captured continuation. Avoided `` because that primitive only renders while a subagent is in RUNNING state — once subagents complete (which is the state `sendPromptAndWait` returns at), the cards are filtered out of the DOM.
Test plan
Spec: `docs/superpowers/specs/2026-05-16-cockpit-aimock-c-subagents-design.md`
Plan: `docs/superpowers/plans/2026-05-16-cockpit-aimock-c-subagents.md`