Commit 5c024e4

test(examples-chat): A2UI single-bubble invariant — Phase 2c (#322)
* docs: add Phase 2c plan - A2UI single-bubble invariant

  Drives an A2UI v1 GenUI prompt through the harness via a mock tool-call response carrying envelopes as args. Asserts exactly one assistant bubble per turn (PR #297 single-bubble invariant) and the absence of standalone skeleton bubbles. Progressive mount (chunked tool-call args) deferred to Phase 2d. Task 0 de-risks the mock + langgraph tool-call handoff before any committed code lands.

* feat(examples-chat): add a2ui surface fixture
* feat(examples-chat): replace a2ui fixture with real captured envelopes
* test(examples-chat): A2UI single-bubble invariant scenario
1 parent 64c95bd commit 5c024e4

3 files changed

Lines changed: 439 additions & 0 deletions

Lines changed: 285 additions & 0 deletions
@@ -0,0 +1,285 @@
# aimock E2E - Phase 2c: A2UI v1 single-bubble invariant

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development. Steps use checkbox (`- [ ]`) syntax.

**Goal:** Add a Playwright scenario that drives an A2UI v1 GenUI prompt through the aimock harness and asserts the single-bubble invariant from PR #297: exactly ONE assistant bubble per GenUI turn, with `<a2ui-surface>` rendered inside it.

**Architecture:** The mock returns a tool-call response (`render_a2ui_surface` with envelopes as args). The Python graph processes the tool_call and re-emits the surface as `---a2ui_JSON---\n`-prefixed content in the same AI message. Angular renders one bubble with the surface inside.
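
The prefix convention in the architecture note can be illustrated with a small sketch. Only the marker string comes from this plan; `splitA2uiContent` and its return shape are hypothetical, and the sketch deliberately does not parse the payload because the plan does not specify its exact layout.

```typescript
// Marker string taken from the plan text. Everything else is a hypothetical helper.
const A2UI_PREFIX = "---a2ui_JSON---\n";

interface SplitContent {
  isA2ui: boolean; // true when the content starts with the marker
  payload: string; // text after the marker, or the untouched content
}

// Separates the marker from the rest of an AI message's content.
function splitA2uiContent(content: string): SplitContent {
  if (content.slice(0, A2UI_PREFIX.length) === A2UI_PREFIX) {
    return { isA2ui: true, payload: content.slice(A2UI_PREFIX.length) };
  }
  return { isA2ui: false, payload: content };
}
```

A renderer following this convention would mount a surface only when `isA2ui` is true and treat everything else as ordinary markdown content.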

**Scope:** single-bubble invariant only. Progressive mount (the `a2ui-partial` event-stream chunked-args behavior) is deferred to Phase 2d.

**Sits on:** Phase 2b ([#314](https://github.com/cacheplane/angular-agent-framework/pull/314)), the directory-of-fixtures runner. Plan lives at `docs/superpowers/plans/2026-05-15-aimock-a2ui-single-bubble.md`.

---

## Working environment

- Worktree: `/tmp/aimock-2c` (branch `claude/aimock-a2ui-single-bubble`).
- `node_modules` symlinked from main checkout.
- License header `// SPDX-License-Identifier: MIT` on line 1 of every new TS file.
- One commit per task. DO NOT push, amend, or `git add -A`.
- The fixture file format is the same as Phase 2b: `{fixtures: [{match, response}]}`. The `response` field gains a `toolCalls` entry instead of `content`.
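
As a rough sketch, the fixture shape described in the last bullet can be written down as TypeScript types. These types are inferred from the fixtures in this plan and are assumptions: the mock's real schema may accept more fields, and `toolCallNames` is a hypothetical helper.

```typescript
// Hypothetical types inferred from this plan's fixtures, not the mock's published schema.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

interface Fixture {
  // Task 0 decides whether userMessage, toolName, or both are needed.
  match: { userMessage?: string; toolName?: string };
  // Phase 2b fixtures carry content; Phase 2c fixtures carry toolCalls.
  response: { content?: string; toolCalls?: ToolCall[] };
}

interface FixtureFile {
  fixtures: Fixture[];
}

// Lists the names of every tool call a fixture file would emit.
function toolCallNames(file: FixtureFile): string[] {
  const names: string[] = [];
  for (const fixture of file.fixtures) {
    for (const call of fixture.response.toolCalls ?? []) {
      names.push(call.name);
    }
  }
  return names;
}

const exampleFixtureFile: FixtureFile = {
  fixtures: [
    {
      match: { userMessage: "show me a tiny surface" },
      response: {
        toolCalls: [{ name: "render_a2ui_surface", arguments: { envelopes: [] } }],
      },
    },
  ],
};
```

Typing the fixture this way makes the Phase 2b to 2c difference explicit: the same `match` block, with `toolCalls` taking the place of `content` in `response`.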

---

## Task 0: De-risk the tool-call flow

**Files:** None (investigation only).

This task validates the integration assumption that aimock can serve tool-call responses to the langgraph Python agent and that the resulting bubble carries the surface. If anything fails, STOP and report: either the scope needs to shrink or the spec needs revision.

- [ ] **Step 1: Validate the mock fixture format for tool-calls**

Write a one-off scratch fixture at `/tmp/aimock-tc-fixture.json`:

```json
{
  "fixtures": [
    {
      "match": { "userMessage": "show me a tiny surface" },
      "response": {
        "toolCalls": [
          {
            "name": "render_a2ui_surface",
            "arguments": {
              "envelopes": [
                {
                  "surfaceUpdate": {
                    "surfaceId": "s1",
                    "components": [
                      { "id": "root", "component": { "Text": { "text": { "literalString": "Hello from the mock!" } } } }
                    ]
                  }
                },
                { "beginRendering": { "surfaceId": "s1", "root": "root" } }
              ]
            }
          }
        ]
      }
    }
  ]
}
```

Write a scratch Node script at `/tmp/aimock-tc-smoke.mjs`:

```javascript
import { LLMock } from "@copilotkit/aimock";
import OpenAI from "openai";

// Start the mock with the scratch fixture loaded.
const mock = new LLMock({ port: 0 });
mock.loadFixtureFile("/tmp/aimock-tc-fixture.json");
await mock.start();
console.log("aimock url:", mock.url);

// Point the OpenAI client at the mock instead of the real API.
const client = new OpenAI({ apiKey: "test", baseURL: `${mock.url}/v1` });
const completion = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "show me a tiny surface" }],
  // Declare the tool so the request resembles what the agent would send.
  tools: [
    {
      type: "function",
      function: {
        name: "render_a2ui_surface",
        parameters: { type: "object", properties: { envelopes: { type: "array" } } },
      },
    },
  ],
});

// Print the assistant message; the tool_calls array should show up here.
console.log(JSON.stringify(completion.choices[0].message, null, 2));
await mock.stop();
```

Run:
```bash
cd /tmp/aimock-2c
npm install --no-save --no-package-lock @copilotkit/aimock openai
node /tmp/aimock-tc-smoke.mjs
```

Expected: the printed message contains a `tool_calls` array with a `render_a2ui_surface` call whose `arguments` (string) parses to `{envelopes: [...]}` matching the fixture.
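
The expected shape can be checked mechanically. The sketch below assumes the standard OpenAI chat-completions layout, where each entry in `tool_calls` carries `function.name` and a JSON-encoded `function.arguments` string; `extractEnvelopes` is a hypothetical helper, not part of aimock or the agent.

```typescript
// Minimal structural type for the part of the assistant message used here.
interface ToolCallMessage {
  tool_calls?: { function: { name: string; arguments: string } }[];
}

// Returns the envelopes array from the first matching tool call, or throws.
function extractEnvelopes(message: ToolCallMessage, toolName: string): unknown[] {
  for (const call of message.tool_calls ?? []) {
    if (call.function.name !== toolName) continue;
    // arguments arrives as a JSON string, not as an object.
    const parsed: unknown = JSON.parse(call.function.arguments);
    if (typeof parsed === "object" && parsed !== null) {
      const envelopes = (parsed as { envelopes?: unknown }).envelopes;
      if (Array.isArray(envelopes)) return envelopes;
    }
  }
  throw new Error(`no ${toolName} call with an envelopes array`);
}
```

In the smoke script this would be called as `extractEnvelopes(completion.choices[0].message, "render_a2ui_surface")`; a thrown error is the signal to try the alternative matcher shape.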

If the mock does not emit `tool_calls`: STOP. The mock may require a different matcher shape (e.g., the request must declare the tool in its `tools` array AND the mock matches on `toolName`). Try adjusting the fixture's `match` to `{toolName: "render_a2ui_surface"}` and re-run. Report which shape worked.

- [ ] **Step 2: Verify the langgraph Python agent honors a tool-call response**

Locate the existing Phase 2a smoke flow's tool-call handling:

```bash
grep -n "render_a2ui_surface\|bind_tools" /tmp/aimock-2c/examples/chat/python/src/streaming/envelope_tool.py /tmp/aimock-2c/examples/chat/python/src/graph.py
```

Confirm the agent binds the `render_a2ui_surface` tool to the LLM. If the binding is conditional (e.g., gated on Gen UI mode), document the condition; the spec assumes the default flow includes A2UI tool binding.

- [ ] **Step 3: Run a manual end-to-end probe**

Start the harness with the scratch tool-call fixture:

```bash
cd /tmp/aimock-2c
ln -sf /tmp/aimock-tc-fixture.json examples/chat/aimock-e2e/fixtures/tc-probe.json
# Run the smoke spec with a custom prompt: modify smoke.spec.ts inline (do NOT commit) to send "show me a tiny surface" and pause for inspection.
```

Skip this step if Step 1's printed `tool_calls` shape is unambiguous; the Playwright spec in Task 2 will exercise the full path. The point of Step 3 is to catch agent-side surprises (e.g., the agent rejects `tool_calls` whose `finish_reason` does not match its expectation). If you can resolve any agent-side issue by reading code, do so; if not, STOP and report.

- [ ] **Step 4: Clean up**

```bash
rm -f /tmp/aimock-tc-fixture.json /tmp/aimock-tc-smoke.mjs
rm -f /tmp/aimock-2c/examples/chat/aimock-e2e/fixtures/tc-probe.json
```

Confirm working tree is clean: `git status`.

- [ ] **Step 5: Report**

DE-RISK COMPLETE or DE-RISK FAILED. Include:
- The fixture `match` shape that worked (`{userMessage}`, `{toolName}`, or both).
- The exact shape of `completion.choices[0].message.tool_calls[0]` (key path to `arguments`).
- Whether the agent has any conditional gating on tool binding.

If de-risk passes, proceed to Task 1. If it fails in a way that makes the single-bubble assertion impossible (e.g., mock won't emit tool_calls at all), STOP and escalate.

---

## Task 1: Add the `a2ui-surface.json` fixture

**Files:**
- Create: `examples/chat/aimock-e2e/fixtures/a2ui-surface.json`

- [ ] **Step 1: Write the fixture**

Write `examples/chat/aimock-e2e/fixtures/a2ui-surface.json`. **Adapt the `match` shape to whatever Task 0 verified worked.** The contents below use `userMessage`; if Task 0 found `toolName` is required, use that instead.

```json
{
  "fixtures": [
    {
      "match": { "userMessage": "show me a tiny surface" },
      "response": {
        "toolCalls": [
          {
            "name": "render_a2ui_surface",
            "arguments": {
              "envelopes": [
                {
                  "surfaceUpdate": {
                    "surfaceId": "s1",
                    "components": [
                      {
                        "id": "root",
                        "component": {
                          "Text": { "text": { "literalString": "Hello from the mock!" } }
                        }
                      }
                    ]
                  }
                },
                { "beginRendering": { "surfaceId": "s1", "root": "root" } }
              ]
            }
          }
        ]
      }
    }
  ]
}
```
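
Before committing a hand-written fixture, it can help to confirm that every `beginRendering.root` points at a component defined by a `surfaceUpdate` for the same surface. The sketch below is one way to do that; the envelope types are copied from the fixture above and are assumptions about A2UI v1, not a complete schema, and `findDanglingRoots` is a hypothetical helper.

```typescript
// Envelope shapes as they appear in the fixture; treat them as assumptions.
interface SurfaceUpdate {
  surfaceUpdate: { surfaceId: string; components: { id: string; component?: unknown }[] };
}
interface BeginRendering {
  beginRendering: { surfaceId: string; root: string };
}
type Envelope = SurfaceUpdate | BeginRendering;

// Returns "surfaceId/root" for every beginRendering whose root was never defined.
function findDanglingRoots(envelopes: Envelope[]): string[] {
  const known: Record<string, boolean> = {};
  // First pass: record every component id per surface.
  for (const env of envelopes) {
    if ("surfaceUpdate" in env) {
      for (const c of env.surfaceUpdate.components) {
        known[`${env.surfaceUpdate.surfaceId}/${c.id}`] = true;
      }
    }
  }
  // Second pass: check each root reference against the recorded ids.
  const dangling: string[] = [];
  for (const env of envelopes) {
    if ("beginRendering" in env) {
      const key = `${env.beginRendering.surfaceId}/${env.beginRendering.root}`;
      if (!known[key]) dangling.push(key);
    }
  }
  return dangling;
}
```

For the fixture above the function returns an empty array, since `root` is defined on surface `s1` before rendering begins.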

- [ ] **Step 2: Commit Task 1**

```bash
cd /tmp/aimock-2c
git add examples/chat/aimock-e2e/fixtures/a2ui-surface.json
git commit -m "feat(examples-chat): add a2ui surface fixture"
```

---

## Task 2: Add the `a2ui-single-bubble.spec.ts` Playwright spec

**Files:**
- Create: `examples/chat/aimock-e2e/a2ui-single-bubble.spec.ts`

- [ ] **Step 1: Write the spec**

Write `examples/chat/aimock-e2e/a2ui-single-bubble.spec.ts`:

```typescript
// SPDX-License-Identifier: MIT
import { test, expect } from '@playwright/test';

test('a2ui single bubble: one assistant bubble carries tool_calls + rendered surface', async ({ page }) => {
  await page.goto('/embed');

  const input = page.getByRole('textbox', { name: /message|prompt/i });
  await input.fill('show me a tiny surface');
  await page.getByRole('button', { name: /send/i }).click();

  // Surface element appears inside the conversation.
  const surface = page.locator('a2ui-surface');
  await expect(surface).toBeVisible({ timeout: 30_000 });
  await expect(surface).toContainText('Hello from the mock!');

  // Single-bubble invariant: count assistant messages once the surface is mounted.
  // Skeleton bubbles (chat-genui-skeleton) must NOT exist as separate <chat-message>.
  const assistantBubbles = page.locator('chat-message').filter({
    has: page.locator('a2ui-surface, chat-streaming-md, [data-role="assistant"]'),
  });
  await expect(assistantBubbles).toHaveCount(1);

  // No standalone skeleton in the DOM after the turn.
  await expect(page.locator('chat-genui-skeleton')).toHaveCount(0);
});
```

**Note on the assistantBubbles selector:** The filter targets `<chat-message>` elements that contain at least one of: an `<a2ui-surface>` (the GenUI bubble), a `<chat-streaming-md>` (the markdown bubble), or any element marked `data-role="assistant"` (defensive, in case the chat composition uses a role attribute elsewhere). If the actual class/structure differs, adjust the selector, but it MUST result in `toHaveCount(1)` for the single-bubble invariant. Do not loosen the count assertion.

- [ ] **Step 2: Run the spec**

```bash
cd /tmp/aimock-2c
npx playwright install --with-deps chromium
cd examples/chat/python
uv sync
cd /tmp/aimock-2c/examples/chat/aimock-e2e
npx playwright test a2ui-single-bubble.spec.ts
```

Expected: 1 test passes. Wall-clock ~60–120s.

If it fails with surface not visible: the agent flow may need additional triggers (e.g., a Gen UI mode setting in the palette). Check the existing smoke spec's flow: it just sends a prompt, and A2UI is the default Gen UI mode per the smoke checklist, so no palette change should be needed. Report the failure and the trace.

If the `toHaveCount(1)` assertion fails because 2 bubbles were found: this is a real regression in the single-bubble behavior (PR #297). Capture the trace and DO NOT modify the test. This is precisely the regression Phase 2c exists to catch.

- [ ] **Step 3: Run the full Playwright suite**

```bash
cd /tmp/aimock-2c/examples/chat/aimock-e2e
npx playwright test
```

Expected: 5 tests pass (1 smoke + 3 markdown + 1 a2ui-single-bubble).

- [ ] **Step 4: Commit Task 2**

```bash
cd /tmp/aimock-2c
git add examples/chat/aimock-e2e/a2ui-single-bubble.spec.ts
git commit -m "test(examples-chat): A2UI single-bubble invariant aimock scenario"
```

---

## Self-review checklist

- [x] Task 0 de-risk runs before any committed code lands.
- [x] Single-bubble invariant assertion uses `toHaveCount(1)` against a filtered selector, not a brittle CSS selector.
- [x] Skeleton-bubble residue assertion (`chat-genui-skeleton` count 0) lives alongside the single-bubble assertion.
- [x] Existing Phase 2a + 2b specs still pass (smoke + 3 markdown).
- [x] No production code touched.
- [x] Fixture content is exact; assertions must match it without mutation.
- [x] aimock library name appears only in fixture/source TS imports and plan/spec/README contexts (established in Phase 2a/2b).
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
// SPDX-License-Identifier: MIT
import { test, expect } from '@playwright/test';

test('a2ui single bubble: one assistant bubble carries the rendered surface', async ({ page }) => {
  await page.goto('/embed');

  const input = page.getByRole('textbox', { name: /message|prompt/i });
  await input.fill('Demo: render a feedback form');
  await page.getByRole('button', { name: /send/i }).click();

  // Surface element materializes in the DOM. Use toBeAttached rather than
  // toBeVisible: the bubble container can have zero computed size during
  // progressive mount and Playwright's strict visibility heuristic flags
  // that even when the surface is rendering correctly.
  const surface = page.locator('a2ui-surface');
  await expect(surface).toBeAttached({ timeout: 45_000 });

  // Surface has the rendered Column structure (from the captured fixture).
  await expect.poll(async () => surface.locator('a2ui-column, [class*="column"]').count(), {
    timeout: 30_000,
  }).toBeGreaterThan(0);

  // Single-bubble invariant (PR #297): exactly one <chat-message> carries the
  // assistant turn. Skeleton residue from progressive mount must not survive.
  const assistantBubbles = page.locator('chat-message').filter({
    has: page.locator('a2ui-surface, chat-streaming-md'),
  });
  await expect(assistantBubbles).toHaveCount(1);
  await expect(page.locator('chat-genui-skeleton')).toHaveCount(0);
});
