| 1 | +# aimock E2E — Phase 2c: A2UI v1 single-bubble invariant |
| 2 | + |
| 3 | +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development. Steps use checkbox (`- [ ]`) syntax. |
| 4 | + |
| 5 | +**Goal:** Add a Playwright scenario that drives an A2UI v1 GenUI prompt through the aimock harness and asserts the single-bubble invariant from PR #297 — exactly ONE assistant bubble per GenUI turn, with `<a2ui-surface>` rendered inside it. |
| 6 | + |
| 7 | +**Architecture:** The mock returns a tool-call response (`render_a2ui_surface` with envelopes as args). The Python graph processes the tool_call and re-emits the surface as `---a2ui_JSON---\n`-prefixed content in the same AI message. Angular renders one bubble with the surface inside. |
| 8 | + |
| 9 | +**Scope:** single-bubble invariant only. Progressive mount (the `a2ui-partial` event-stream chunked-args behavior) is deferred to Phase 2d. |
| 10 | + |
| 11 | +**Sits on:** Phase 2b ([#314](https://github.com/cacheplane/angular-agent-framework/pull/314)) — directory-of-fixtures runner. Plan lives at `docs/superpowers/plans/2026-05-15-aimock-a2ui-single-bubble.md`. |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## Working environment |
| 16 | + |
| 17 | +- Worktree: `/tmp/aimock-2c` (branch `claude/aimock-a2ui-single-bubble`). |
| 18 | +- `node_modules` symlinked from main checkout. |
| 19 | +- License header `// SPDX-License-Identifier: MIT` on line 1 of every new TS file. |
| 20 | +- One commit per task. DO NOT push, amend, or run `git add -A`. |
| 21 | +- The fixture file format is the same as Phase 2b — `{fixtures: [{match, response}]}`. The `response` field gains a `toolCalls` entry instead of `content`. |
| 22 | + |
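As a rough illustration of the fixture shape described in the last bullet, the sketch below checks that a parsed fixture file has a `fixtures` array whose entries each carry a `match` object and a `response` with either `content` or `toolCalls`. The helper name `validateFixtureFile` and its error messages are hypothetical; this is not an aimock API, and aimock's own loader may accept fields that this sketch does not know about.

```javascript
// Hypothetical helper (not part of aimock): checks the fixture file shape
// {fixtures: [{match, response}]}, where each response carries either
// `content` (as in Phase 2b) or `toolCalls` (as in Phase 2c).
function validateFixtureFile(file) {
  const errors = [];
  if (!file || !Array.isArray(file.fixtures)) {
    return ["top-level `fixtures` must be an array"];
  }
  file.fixtures.forEach((fixture, i) => {
    if (!fixture || typeof fixture.match !== "object" || fixture.match === null) {
      errors.push(`fixtures[${i}].match must be an object`);
    }
    const response = fixture ? fixture.response : undefined;
    if (!response || typeof response !== "object") {
      errors.push(`fixtures[${i}].response must be an object`);
      return;
    }
    const hasContent = typeof response.content === "string";
    const hasToolCalls =
      Array.isArray(response.toolCalls) && response.toolCalls.length > 0;
    if (!hasContent && !hasToolCalls) {
      errors.push(`fixtures[${i}].response needs content or toolCalls`);
    }
    if (hasToolCalls) {
      response.toolCalls.forEach((call, j) => {
        if (!call || typeof call.name !== "string" || call.name.length === 0) {
          errors.push(`fixtures[${i}].response.toolCalls[${j}].name must be a non-empty string`);
        }
      });
    }
  });
  return errors;
}

// A tool-call fixture in the Phase 2c style yields no errors.
const toolCallFixture = {
  fixtures: [
    {
      match: { userMessage: "show me a tiny surface" },
      response: { toolCalls: [{ name: "render_a2ui_surface", arguments: { envelopes: [] } }] },
    },
  ],
};
console.log(validateFixtureFile(toolCallFixture));
```

A check like this could be run over a fixture before starting the harness, so that a malformed file shows up as a readable message rather than as a prompt that never matches.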
| 23 | +--- |
| 24 | + |
| 25 | +## Task 0: De-risk the tool-call flow |
| 26 | + |
| 27 | +**Files:** None (investigation only). |
| 28 | + |
| 29 | +This task validates the integration assumption that aimock can serve tool-call responses to the langgraph Python agent and that the resulting bubble carries the surface. If anything fails, STOP and report — the scope needs to shrink or the spec needs revision. |
| 30 | + |
| 31 | +- [ ] **Step 1: Validate the mock fixture format for tool-calls** |
| 32 | + |
| 33 | +Write a one-off scratch fixture at `/tmp/aimock-tc-fixture.json`: |
| 34 | + |
| 35 | +```json |
| 36 | +{ |
| 37 | + "fixtures": [ |
| 38 | + { |
| 39 | + "match": { "userMessage": "show me a tiny surface" }, |
| 40 | + "response": { |
| 41 | + "toolCalls": [ |
| 42 | + { |
| 43 | + "name": "render_a2ui_surface", |
| 44 | + "arguments": { |
| 45 | + "envelopes": [ |
| 46 | + { |
| 47 | + "surfaceUpdate": { |
| 48 | + "surfaceId": "s1", |
| 49 | + "components": [ |
| 50 | + { "id": "root", "component": { "Text": { "text": { "literalString": "Hello from the mock!" } } } } |
| 51 | + ] |
| 52 | + } |
| 53 | + }, |
| 54 | + { "beginRendering": { "surfaceId": "s1", "root": "root" } } |
| 55 | + ] |
| 56 | + } |
| 57 | + } |
| 58 | + ] |
| 59 | + } |
| 60 | + } |
| 61 | + ] |
| 62 | +} |
| 63 | +``` |
| 64 | + |
| 65 | +Write a scratch Node script at `/tmp/aimock-tc-smoke.mjs`: |
| 66 | + |
| 67 | +```javascript |
| 68 | +import { LLMock } from "@copilotkit/aimock"; |
| 69 | +import OpenAI from "openai"; |
| 70 | + |
| 71 | +const mock = new LLMock({ port: 0 }); |
| 72 | +mock.loadFixtureFile("/tmp/aimock-tc-fixture.json"); |
| 73 | +await mock.start(); |
| 74 | +console.log("aimock url:", mock.url); |
| 75 | + |
| 76 | +const client = new OpenAI({ apiKey: "test", baseURL: `${mock.url}/v1` }); |
| 77 | +const completion = await client.chat.completions.create({ |
| 78 | + model: "gpt-4o", |
| 79 | + messages: [{ role: "user", content: "show me a tiny surface" }], |
| 80 | + tools: [ |
| 81 | + { |
| 82 | + type: "function", |
| 83 | + function: { |
| 84 | + name: "render_a2ui_surface", |
| 85 | + parameters: { type: "object", properties: { envelopes: { type: "array" } } }, |
| 86 | + }, |
| 87 | + }, |
| 88 | + ], |
| 89 | +}); |
| 90 | + |
| 91 | +console.log(JSON.stringify(completion.choices[0].message, null, 2)); |
| 92 | +await mock.stop(); |
| 93 | +``` |
| 94 | + |
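| 95 | +Run from a copy inside the worktree, because Node resolves bare ESM imports relative to the script's own directory, not the current directory: |
| 96 | +```bash |
| 97 | +cd /tmp/aimock-2c |
| 98 | +npm install --no-save --no-package-lock @copilotkit/aimock openai |
| 99 | +cp /tmp/aimock-tc-smoke.mjs ./aimock-tc-smoke.tmp.mjs && node ./aimock-tc-smoke.tmp.mjs; rm -f ./aimock-tc-smoke.tmp.mjs |
| 100 | +``` |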
| 101 | + |
| 102 | +Expected: the printed message contains a `tool_calls` array with a `render_a2ui_surface` call whose `arguments` (a JSON string) parses to `{envelopes: [...]}` matching the fixture. |
| 103 | + |
| 104 | +If the mock does not emit `tool_calls`, do not continue to Step 2. The mock may require a different matcher shape (e.g., the request must declare the tool in its `tools` array AND the fixture must match on `toolName`). Change the fixture's `match` to `{toolName: "render_a2ui_surface"}`, re-run, and report which shape worked. If neither shape works, STOP and report. |
| 105 | + |
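The expected-output check in Step 1 can be sketched as a small helper that finds the named call in an OpenAI-style assistant message and parses its `arguments` string. The helper name `findToolCallArgs`, the sample message, and the `call_1` id are illustrative assumptions; only the `tool_calls[].function.name` and `tool_calls[].function.arguments` paths come from the Chat Completions message format.

```javascript
// Sketch: pull a named tool call out of an OpenAI-style chat completion
// message and parse its JSON-string `arguments`.
// `findToolCallArgs` is a hypothetical helper, not an aimock or OpenAI API.
function findToolCallArgs(message, toolName) {
  const calls = Array.isArray(message?.tool_calls) ? message.tool_calls : [];
  const call = calls.find((c) => c?.function?.name === toolName);
  if (!call) return null;
  // In the Chat Completions format, `arguments` is a JSON-encoded string.
  return JSON.parse(call.function.arguments);
}

// Sample message shaped like the output the smoke script is expected to print.
const sampleMessage = {
  role: "assistant",
  content: null,
  tool_calls: [
    {
      id: "call_1", // illustrative id
      type: "function",
      function: {
        name: "render_a2ui_surface",
        arguments: JSON.stringify({
          envelopes: [{ beginRendering: { surfaceId: "s1", root: "root" } }],
        }),
      },
    },
  ],
};

const args = findToolCallArgs(sampleMessage, "render_a2ui_surface");
console.log(JSON.stringify(args, null, 2));
```

If the helper returns `null` for the real completion, the mock did not emit the call under that name, which corresponds to the failure case described above.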
| 106 | +- [ ] **Step 2: Verify the langgraph Python agent honors a tool-call response** |
| 107 | + |
| 108 | +Locate the existing Phase 2a smoke flow's tool-call handling: |
| 109 | + |
| 110 | +```bash |
| 111 | +grep -n "render_a2ui_surface\|bind_tools" /tmp/aimock-2c/examples/chat/python/src/streaming/envelope_tool.py /tmp/aimock-2c/examples/chat/python/src/graph.py |
| 112 | +``` |
| 113 | + |
| 114 | +Confirm the agent binds the `render_a2ui_surface` tool to the LLM. If the binding is conditional (e.g., gated on Gen UI mode), document the condition — the spec assumes the default flow includes A2UI tool binding. |
| 115 | + |
| 116 | +- [ ] **Step 3: Run a manual end-to-end probe** |
| 117 | + |
| 118 | +Start the harness with the scratch tool-call fixture: |
| 119 | + |
| 120 | +```bash |
| 121 | +cd /tmp/aimock-2c |
| 122 | +ln -sf /tmp/aimock-tc-fixture.json examples/chat/aimock-e2e/fixtures/tc-probe.json |
| 123 | +# Run the smoke spec with a custom prompt — modify smoke.spec.ts inline (do NOT commit) to send "show me a tiny surface" and pause for inspection. |
| 124 | +``` |
| 125 | + |
| 126 | +Skip this step if Step 1's printed `tool_calls` shape is unambiguous; the Playwright spec in Task 2 will exercise the full path. The point of Step 3 is to catch agent-side surprises (e.g., the agent rejects `tool_calls` whose `finish_reason` does not match its expectation). If you can resolve any agent-side issue by reading code, do so; if not, STOP and report. |
| 127 | + |
| 128 | +- [ ] **Step 4: Clean up** |
| 129 | + |
| 130 | +```bash |
| 131 | +rm -f /tmp/aimock-tc-fixture.json /tmp/aimock-tc-smoke.mjs |
| 132 | +rm -f /tmp/aimock-2c/examples/chat/aimock-e2e/fixtures/tc-probe.json |
| 133 | +``` |
| 134 | + |
| 135 | +Confirm working tree is clean: `git status`. |
| 136 | + |
| 137 | +- [ ] **Step 5: Report** |
| 138 | + |
| 139 | +Report either `DE-RISK COMPLETE` or `DE-RISK FAILED`. Include: |
| 140 | +- The fixture `match` shape that worked (`{userMessage}`, `{toolName}`, or both). |
| 141 | +- The exact shape of `completion.choices[0].message.tool_calls[0]` (key path to `arguments`). |
| 142 | +- Whether the agent has any conditional gating on tool binding. |
| 143 | + |
| 144 | +If de-risk passes, proceed to Task 1. If it fails in a way that makes the single-bubble assertion impossible (e.g., mock won't emit tool_calls at all), STOP and escalate. |
| 145 | + |
| 146 | +--- |
| 147 | + |
| 148 | +## Task 1: Add the `a2ui-surface.json` fixture |
| 149 | + |
| 150 | +**Files:** |
| 151 | +- Create: `examples/chat/aimock-e2e/fixtures/a2ui-surface.json` |
| 152 | + |
| 153 | +- [ ] **Step 1: Write the fixture** |
| 154 | + |
| 155 | +Write `examples/chat/aimock-e2e/fixtures/a2ui-surface.json`. **Adapt the `match` shape to whatever Task 0 verified worked.** The contents below use `userMessage`; if Task 0 found `toolName` is required, use that instead. |
| 156 | + |
| 157 | +```json |
| 158 | +{ |
| 159 | + "fixtures": [ |
| 160 | + { |
| 161 | + "match": { "userMessage": "show me a tiny surface" }, |
| 162 | + "response": { |
| 163 | + "toolCalls": [ |
| 164 | + { |
| 165 | + "name": "render_a2ui_surface", |
| 166 | + "arguments": { |
| 167 | + "envelopes": [ |
| 168 | + { |
| 169 | + "surfaceUpdate": { |
| 170 | + "surfaceId": "s1", |
| 171 | + "components": [ |
| 172 | + { |
| 173 | + "id": "root", |
| 174 | + "component": { |
| 175 | + "Text": { "text": { "literalString": "Hello from the mock!" } } |
| 176 | + } |
| 177 | + } |
| 178 | + ] |
| 179 | + } |
| 180 | + }, |
| 181 | + { "beginRendering": { "surfaceId": "s1", "root": "root" } } |
| 182 | + ] |
| 183 | + } |
| 184 | + } |
| 185 | + ] |
| 186 | + } |
| 187 | + } |
| 188 | + ] |
| 189 | +} |
| 190 | +``` |
| 191 | + |
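Because the Task 2 spec asserts on the exact text rendered from this fixture, it can help to confirm which strings the fixture actually carries. The sketch below walks a fixture object and collects every `literalString` value; `collectLiteralStrings` is a hypothetical helper, and the fixture is inlined here (rather than read from disk) only to keep the example self-contained.

```javascript
// Hypothetical helper: walk any nested value and collect every
// `literalString` it contains, in document order.
function collectLiteralStrings(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((item) => collectLiteralStrings(item, found));
  } else if (node && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "literalString" && typeof value === "string") {
        found.push(value);
      } else {
        collectLiteralStrings(value, found);
      }
    }
  }
  return found;
}

// Same structure as the a2ui-surface.json fixture, inlined for the sketch.
const fixtureFile = {
  fixtures: [
    {
      match: { userMessage: "show me a tiny surface" },
      response: {
        toolCalls: [
          {
            name: "render_a2ui_surface",
            arguments: {
              envelopes: [
                {
                  surfaceUpdate: {
                    surfaceId: "s1",
                    components: [
                      {
                        id: "root",
                        component: { Text: { text: { literalString: "Hello from the mock!" } } },
                      },
                    ],
                  },
                },
                { beginRendering: { surfaceId: "s1", root: "root" } },
              ],
            },
          },
        ],
      },
    },
  ],
};

const strings = fixtureFile.fixtures.flatMap((fixture) =>
  (fixture.response.toolCalls ?? []).flatMap((call) => collectLiteralStrings(call.arguments)),
);
console.log(strings);
```

If the fixture text is ever edited, a check like this makes the mismatch with the spec's `toContainText` assertion visible before a full Playwright run.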
| 192 | +- [ ] **Step 2: Commit Task 1** |
| 193 | + |
| 194 | +```bash |
| 195 | +cd /tmp/aimock-2c |
| 196 | +git add examples/chat/aimock-e2e/fixtures/a2ui-surface.json |
| 197 | +git commit -m "feat(examples-chat): add a2ui surface fixture" |
| 198 | +``` |
| 199 | + |
| 200 | +--- |
| 201 | + |
| 202 | +## Task 2: Add the `a2ui-single-bubble.spec.ts` Playwright spec |
| 203 | + |
| 204 | +**Files:** |
| 205 | +- Create: `examples/chat/aimock-e2e/a2ui-single-bubble.spec.ts` |
| 206 | + |
| 207 | +- [ ] **Step 1: Write the spec** |
| 208 | + |
| 209 | +Write `examples/chat/aimock-e2e/a2ui-single-bubble.spec.ts`: |
| 210 | + |
| 211 | +```typescript |
| 212 | +// SPDX-License-Identifier: MIT |
| 213 | +import { test, expect } from '@playwright/test'; |
| 214 | + |
| 215 | +test('a2ui single bubble: one assistant bubble carries tool_calls + rendered surface', async ({ page }) => { |
| 216 | + await page.goto('/embed'); |
| 217 | + |
| 218 | + const input = page.getByRole('textbox', { name: /message|prompt/i }); |
| 219 | + await input.fill('show me a tiny surface'); |
| 220 | + await page.getByRole('button', { name: /send/i }).click(); |
| 221 | + |
| 222 | + // Surface element appears inside the conversation. |
| 223 | + const surface = page.locator('a2ui-surface'); |
| 224 | + await expect(surface).toBeVisible({ timeout: 30_000 }); |
| 225 | + await expect(surface).toContainText('Hello from the mock!'); |
| 226 | + |
| 227 | + // Single-bubble invariant: count assistant messages once the surface is mounted. |
| 228 | + // Skeleton bubbles (chat-genui-skeleton) must NOT exist as separate <chat-message>. |
| 229 | + const assistantBubbles = page.locator('chat-message').filter({ |
| 230 | + has: page.locator('a2ui-surface, chat-streaming-md, [data-role="assistant"]'), |
| 231 | + }); |
| 232 | + await expect(assistantBubbles).toHaveCount(1); |
| 233 | + |
| 234 | + // No standalone skeleton in the DOM after the turn. |
| 235 | + await expect(page.locator('chat-genui-skeleton')).toHaveCount(0); |
| 236 | +}); |
| 237 | +``` |
| 238 | + |
| 239 | +**Note on the assistantBubbles selector:** The filter targets `<chat-message>` elements that contain at least one of: an `<a2ui-surface>` (the GenUI bubble), a `<chat-streaming-md>` (the markdown bubble), or any element marked `data-role="assistant"` (defensive — if the chat composition uses a role attribute elsewhere). If the actual class/structure differs, adjust the selector — but it MUST result in `toHaveCount(1)` for the single-bubble invariant. Do not loosen the count assertion. |
| 240 | + |
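To make the counting rule in the note concrete, here is a simplified model of it in plain JavaScript. It is not Playwright code: each bubble is reduced to a list of descendant markers, and a bubble counts as an assistant bubble when it contains at least one of the three markers named in the selector. The `chat-user-text` tag and the two sample turns are made up for illustration.

```javascript
// Simplified model of the selector (NOT Playwright code): a bubble counts as
// an assistant bubble if it contains at least one of these markers.
const ASSISTANT_MARKERS = [
  "a2ui-surface",
  "chat-streaming-md",
  '[data-role="assistant"]',
];

function countAssistantBubbles(bubbles) {
  return bubbles.filter((bubble) =>
    bubble.descendants.some((marker) => ASSISTANT_MARKERS.includes(marker)),
  ).length;
}

// Healthy turn: one user bubble plus one assistant bubble holding the surface.
const healthyTurn = [
  { descendants: ["chat-user-text"] }, // hypothetical tag for the user bubble
  { descendants: ["a2ui-surface"] },
];

// Regressed turn: the surface and a stray markdown bubble render separately.
const regressedTurn = [
  { descendants: ["chat-user-text"] },
  { descendants: ["chat-streaming-md"] },
  { descendants: ["a2ui-surface"] },
];

console.log(countAssistantBubbles(healthyTurn), countAssistantBubbles(regressedTurn));
```

In this model the healthy turn yields a count of 1 and the regressed turn a count of 2, which is the difference the `toHaveCount(1)` assertion is meant to detect.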
| 241 | +- [ ] **Step 2: Run the spec** |
| 242 | + |
| 243 | +```bash |
| 244 | +cd /tmp/aimock-2c |
| 245 | +npx playwright install --with-deps chromium |
| 246 | +cd examples/chat/python |
| 247 | +uv sync |
| 248 | +cd /tmp/aimock-2c/examples/chat/aimock-e2e |
| 249 | +npx playwright test a2ui-single-bubble.spec.ts |
| 250 | +``` |
| 251 | + |
| 252 | +Expected: 1 test passes. Wall-clock ~60–120s. |
| 253 | + |
| 254 | +If it fails with surface not visible: the agent flow may need additional triggers (e.g., a Gen UI mode setting in the palette). Check the existing smoke spec's flow — it just sends a prompt; A2UI is the default Gen UI mode per the smoke checklist, so no palette change should be needed. Report the failure and the trace. |
| 255 | + |
| 256 | +If the `toHaveCount(1)` assertion fails with a received count of 2, this is a real regression in the single-bubble behavior (PR #297). Capture the trace and DO NOT modify the test; this is precisely the regression Phase 2c exists to catch. |
| 257 | + |
| 258 | +- [ ] **Step 3: Run the full Playwright suite** |
| 259 | + |
| 260 | +```bash |
| 261 | +cd /tmp/aimock-2c/examples/chat/aimock-e2e |
| 262 | +npx playwright test |
| 263 | +``` |
| 264 | + |
| 265 | +Expected: 5 tests pass (1 smoke + 3 markdown + 1 a2ui-single-bubble). |
| 266 | + |
| 267 | +- [ ] **Step 4: Commit Task 2** |
| 268 | + |
| 269 | +```bash |
| 270 | +cd /tmp/aimock-2c |
| 271 | +git add examples/chat/aimock-e2e/a2ui-single-bubble.spec.ts |
| 272 | +git commit -m "test(examples-chat): A2UI single-bubble invariant aimock scenario" |
| 273 | +``` |
| 274 | + |
| 275 | +--- |
| 276 | + |
| 277 | +## Self-review checklist |
| 278 | + |
| 279 | +- [x] Task 0 de-risk runs before any committed code lands. |
| 280 | +- [x] Single-bubble invariant assertion uses `toHaveCount(1)` against a filtered selector — not a brittle CSS selector. |
| 281 | +- [x] Skeleton-bubble residue assertion (`chat-genui-skeleton` count 0) lives alongside the single-bubble assertion. |
| 282 | +- [x] Existing Phase 2a + 2b specs still pass (smoke + 3 markdown). |
| 283 | +- [x] No production code touched. |
| 284 | +- [x] Fixture content is exact; assertions must match it without mutation. |
| 285 | +- [x] aimock library name appears only in fixture/source TS imports and plan/spec/README contexts (established in Phase 2a/2b). |