…o keys

The compliance heartbeat has been writing zero rows to agent_storyboard_status since the SDK switched comply() to storyboard-driven testing.

The SDK emits one TestResult per phase of each storyboard, keyed `<storyboard_id>/<phase_id>` in result.tracks[].scenarios[].scenario (see @adcp/sdk compliance/storyboard-tracks.ts). The old implementation walked the YAML's per-step `comply_scenario` field (bare names like `signals_flow`, `capability_discovery`) and looked them up in the SDK's scenario map. Every lookup missed → testedCount === 0 → every storyboard skipped at the `continue` guard.

Effect across the registry:

  agent_storyboard_status total rows: 6 (across 4 agents)
  rows written by triggered_by='heartbeat': 0
  rows surviving were legacy bare-name keys from old manual runs

This silently broke the AAO Verified badge pipeline (no storyboard rows → deriveVerificationStatus has nothing to verify against) and every agent's dashboard `storyboards_passing: 0 / N` was misleading: the runner wasn't failing storyboards, the parser was dropping them.

Surfaced by escalation #329: Evgeny's agent was running 30/30 scenarios clean but showing `degraded` because specialism_status.signal-owned read 'untested' from a never-populated agent_storyboard_status row.

Fix: read SDK output directly. Group scenarios by storyboard id, roll per-step pass counts up from each phase's `steps` array, fall back to phase-level counts when steps are absent. The `storyboardIds` override is preserved for explicit-IDs callers that need an `untested` entry when the runner didn't run a requested storyboard. The unused YAML `comply_scenario` field is no longer load-bearing for status mapping (the SDK already knows which storyboards it ran).

Tests: 9 cases covering all-pass, partial, all-fail, phase-only fallback, legacy bare-name skip, empty input, and explicit-IDs untested gap.

Stack note: this is orthogonal to Emma's #4247 compliance-state unification stack (#4250, #4263, #4264, #4268, #4274) which collapses agent_test_history into agent_compliance_runs. Different files; rebases cleanly in either order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
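The grouping and roll-up described in this commit message can be sketched roughly as follows. This is an illustration, not the code in the PR: only `tracks[].scenarios[].scenario` and the per-phase `steps` array are named in the source, so the `passed` fields, the `StoryboardRollup` shape, the phase ids in the sample, and the helper name `rollUpByStoryboard` are all assumptions.

```typescript
// Hypothetical shapes. Only `scenario` and `steps` are named in the source;
// the `passed` flags are assumed for illustration.
interface StepResult { passed: boolean }
interface ScenarioResult {
  scenario: string;      // "<storyboard_id>/<phase_id>"
  passed: boolean;       // assumed phase-level verdict
  steps?: StepResult[];  // may be absent
}
interface ComplyResult { tracks: { scenarios: ScenarioResult[] }[] }

interface StoryboardRollup {
  storyboardId: string;
  stepsPassed: number;
  stepsTotal: number;
}

function rollUpByStoryboard(result: ComplyResult): StoryboardRollup[] {
  const byId = new Map<string, StoryboardRollup>();
  for (const track of result.tracks) {
    for (const s of track.scenarios) {
      const slash = s.scenario.indexOf("/");
      if (slash <= 0) continue; // legacy bare-name key: nothing to group under
      const storyboardId = s.scenario.slice(0, slash);
      const row = byId.get(storyboardId) ?? { storyboardId, stepsPassed: 0, stepsTotal: 0 };
      if (s.steps && s.steps.length > 0) {
        // Per-step counts when the phase carries a steps array.
        row.stepsPassed += s.steps.filter((st) => st.passed).length;
        row.stepsTotal += s.steps.length;
      } else {
        // Phase-level fallback: the whole phase counts as one unit.
        row.stepsPassed += s.passed ? 1 : 0;
        row.stepsTotal += 1;
      }
      byId.set(storyboardId, row);
    }
  }
  return Array.from(byId.values());
}

const sample: ComplyResult = {
  tracks: [{
    scenarios: [
      { scenario: "signal_owned/discovery", passed: false, steps: [{ passed: true }, { passed: false }] },
      { scenario: "signal_owned/activation", passed: true }, // no steps: fallback
      { scenario: "signals_flow", passed: true },            // bare name: skipped
    ],
  }],
};

const rollups = rollUpByStoryboard(sample);
console.log(rollups); // one row for "signal_owned": 2 of 3 units passed
```

Because a fallback phase contributes one unit while a phase with steps contributes one unit per step, the resulting counts mix granularities, so totals from different rows should not be compared directly.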
…he fix

Runs comply() against an agent URL and prints what deriveStoryboardStatuses would produce, without DB writes. Used to validate the SDK-6.x scenario-key fix against real agents (adcp-signals-adaptor.evgeny-193.workers.dev/mcp and wonderstruck.sales-agent.scope3.com/mcp) before merging.

Will stay useful for future SDK upgrades that touch scenario emission or storyboard-track aggregation. Same pattern as the diagnose-agent-comply-queue script from #4361.

Usage:

  npx tsx server/src/scripts/test-comply-storyboard-statuses.ts <agent-url> [<agent-url> ...]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-ids check, add 3 edge tests

Addresses code-reviewer feedback on PR #4364:

- JSDoc on deriveStoryboardStatuses now calls out that steps_passed/total are not directly comparable across rows (some rows are real step counts, some are phase-level fallbacks when the SDK omits per-step data).
- Comment pinning the storyboard-id invariant (flat ids, no `/`) so the indexOf split stays correct as new storyboards land.
- Defensive `result.tracks ?? []` so a malformed result doesn't throw.
- Hoist `storyboardIds && length > 0` into a single `hasExplicitIds` const used at both the toEmit decision and the no-data fallback.
- Three new test cases:
  * same storyboard split across multiple tracks aggregates correctly
  * result.tracks absent → []
  * non-string scenario values (null, number) → skipped without throwing

12/12 vitest passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
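The defensive parsing listed in this commit can be illustrated with a small sketch. The `result.tracks ?? []` guard, the `indexOf` split and the flat-id invariant come from the commit message; the `LooseResult` type, the helper name `parseStoryboardIds`, and the sample phase ids are assumptions made for this example only.

```typescript
// Hypothetical loose input type: every level may be missing or malformed.
interface LooseResult {
  tracks?: { scenarios?: { scenario?: unknown }[] }[];
}

function parseStoryboardIds(result: LooseResult): string[] {
  const ids = new Set<string>();
  for (const track of result.tracks ?? []) {        // absent tracks → no rows
    for (const s of track.scenarios ?? []) {
      if (typeof s.scenario !== "string") continue; // null, number, ... → skip
      // Invariant: storyboard ids are flat (no "/"), so the first "/" always
      // separates the storyboard id from the phase id.
      const slash = s.scenario.indexOf("/");
      if (slash <= 0) continue;                     // bare legacy name → skip
      ids.add(s.scenario.slice(0, slash));
    }
  }
  return Array.from(ids);
}

const noTracks = parseStoryboardIds({});
const malformed = parseStoryboardIds({
  tracks: [{ scenarios: [{ scenario: null }, { scenario: 42 }] }],
});
const splitAcrossTracks = parseStoryboardIds({
  tracks: [
    { scenarios: [{ scenario: "signal_owned/phase_a" }] },
    { scenarios: [{ scenario: "signal_owned/phase_b" }] },
  ],
});
console.log(noTracks, malformed, splitAcrossTracks);
```

Splitting on the first `/` rather than the last keeps any `/` inside a phase id attached to the phase, which is why the flat-id invariant matters only for the storyboard half of the key.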
Expert review pass: both clear
code-reviewer
No blockers. All actionable feedback addressed in c11abeb:
12/12 vitest passing.
adtech-product-expert
Verdict: ship, with two product-side follow-ups (not blockers):
Plan
Trust-but-verify quotes:
Got it: both reviewers clear, merge plan looks solid. Happy to open the follow-up issue for the track-summary vs. storyboard-percentage dashboard reconciliation if you want to hand that off; just say the word.
Generated by Claude Code
This was referenced May 11, 2026
bokelley added a commit that referenced this pull request May 11, 2026
…4374)

Adds an "X / Y storyboards passing" element between the SDK headline ("2 silent" etc.) and the track pills, with a tooltip explaining the relationship:

  storyboards = canonical conformance unit (each applicable specialism + protocol baseline + universal check is one storyboard, pass or fail)
  track pills = SDK's coarse roll-up that can read as "passing" even when underlying storyboards are partial; useful for a quick glance but misleading in isolation

Track pills gain their own tooltip pointing readers at the Verification panel for per-storyboard detail.

Resolves the Evgeny-shape disconnect from escalation #329: track summary showed "2 silent / 30 of 30 scenarios passing" while the agent's signal_owned specialism storyboard was 1/5 steps. With the data flowing correctly after PR #4364, this surface change closes the loop on the adtech-product reviewer's "deprecate track summary on the public dashboard, keep it operator-only" call by making the storyboard count visually prominent and clarifying that the SDK track pills are debug context.

Push A item 4 of 4 in the compliance reporting fidelity initiative.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
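As a rough illustration of the "X / Y storyboards passing" element, the headline could be derived from per-storyboard rows as below. The status labels other than `untested`, the row shape, the second storyboard id, and the function name `storyboardHeadline` are assumptions for this sketch; the commit does not show the dashboard code.

```typescript
// Assumed status labels; only "untested" appears in the PR text.
type StoryboardStatus = "passing" | "partial" | "failing" | "untested";
interface StoryboardRow { storyboardId: string; status: StoryboardStatus }

// A storyboard counts toward X only when it passes outright, so a partial
// storyboard is never rounded up the way a coarse track roll-up can be.
function storyboardHeadline(rows: StoryboardRow[]): string {
  const passing = rows.filter((r) => r.status === "passing").length;
  return `${passing} / ${rows.length} storyboards passing`;
}

const headline = storyboardHeadline([
  { storyboardId: "signal_owned", status: "partial" },      // e.g. 1/5 steps
  { storyboardId: "protocol_baseline", status: "passing" }, // hypothetical id
]);
console.log(headline);
```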
Summary
The compliance heartbeat has been writing zero rows to `agent_storyboard_status` since `comply()` switched to storyboard-driven testing. Every agent's dashboard `storyboards_passing: 0/N` was misleading: the parser was dropping the results.

Root cause
SDK 6.x emits one `TestResult` per phase of each storyboard, keyed `<storyboard_id>/<phase_id>` in `result.tracks[].scenarios[].scenario` (see `@adcp/sdk` `compliance/storyboard-tracks.ts:54`). The old `deriveStoryboardStatuses` walked the YAML's per-step `comply_scenario` field (bare names like `signals_flow`, `capability_discovery`) and looked them up in a Map keyed by the SDK's scenario strings. Every lookup missed → `testedCount === 0` → every storyboard skipped at the `continue` guard. No rows written. No badges issued.

Scope
Direct DB query against prod: no heartbeat run has ever written to `agent_storyboard_status`. This affects every agent's badge eligibility, every `specialism_status` value, every `storyboards_passing` count.

Surfaced by escalation #329: Evgeny's agent runs 30/30 scenarios clean but shows `degraded` because `specialism_status.signal-owned = "untested"` reads from a never-populated row.

Fix
Read SDK output directly. Group `result.tracks[].scenarios[]` by `<storyboard_id>` parsed from the scenario string, roll per-step pass counts up from each phase's `steps` array, fall back to phase-level counts when steps are absent.

The `storyboardIds` override is preserved for explicit-IDs callers (manual evals that need an `untested` entry when the runner didn't run a requested storyboard).

The YAML's `comply_scenario` field is no longer load-bearing for status mapping: the SDK already knows which storyboards it ran. The field is left in place (still useful for human documentation / planning).

Tests
`server/tests/unit/derive-storyboard-statuses.test.ts`: 9 cases, including:

- `steps` array absent → fall back to phase-level counts
- `storyboardIds` with runner gap → 'untested' entry
- `storyboardIds` ignoring extra storyboards in result

All passing locally; type check clean.
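The two `storyboardIds` cases above can be pictured with the following sketch of the explicit-IDs override. It is a simplified stand-in, not the PR's implementation: `hasExplicitIds` and `toEmit` are names taken from the review follow-up commit, while the `Rollup` shape, the status labels other than `untested`, the sample ids other than `signal_owned`, and the helpers `toStatus` and `selectRows` are assumptions.

```typescript
interface Rollup { storyboardId: string; stepsPassed: number; stepsTotal: number }
type Status = "passing" | "partial" | "failing" | "untested"; // assumed labels
interface StatusRow extends Rollup { status: Status }

function toStatus(r: Rollup): Status {
  if (r.stepsTotal === 0) return "untested";
  if (r.stepsPassed === r.stepsTotal) return "passing";
  return r.stepsPassed === 0 ? "failing" : "partial";
}

function selectRows(rollups: Rollup[], storyboardIds?: string[]): StatusRow[] {
  const byId = new Map(rollups.map((r) => [r.storyboardId, r] as const));
  const explicit = storyboardIds ?? [];
  const hasExplicitIds = explicit.length > 0;
  // With explicit ids, emit exactly those ids; otherwise emit whatever ran.
  const toEmit = hasExplicitIds ? explicit : rollups.map((r) => r.storyboardId);
  return toEmit.map((id) => {
    // A requested storyboard the runner never ran becomes a 0/0 'untested' row.
    const r = byId.get(id) ?? { storyboardId: id, stepsPassed: 0, stepsTotal: 0 };
    return { ...r, status: toStatus(r) };
  });
}

const ran: Rollup[] = [
  { storyboardId: "signal_owned", stepsPassed: 1, stepsTotal: 5 },
  { storyboardId: "extra_storyboard", stepsPassed: 2, stepsTotal: 2 },
];

const explicitRows = selectRows(ran, ["signal_owned", "not_run"]);
const allRows = selectRows(ran);
console.log(explicitRows.map((r) => `${r.storyboardId}: ${r.status}`));
console.log(allRows.map((r) => `${r.storyboardId}: ${r.status}`));
```

With explicit ids, `extra_storyboard` is dropped and `not_run` surfaces as `untested`; without them, every storyboard the runner reported is emitted.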
Stack note
Orthogonal to Emma's #4247 compliance-state unification stack (#4250, #4263, #4264, #4268, #4274), which is about which tables carry compliance state (collapsing `agent_test_history`). This PR is about parsing the SDK output correctly into the existing tables. Different files (`compliance-testing.ts` vs her `compliance-db.ts` / `member-tools.ts` / `registry-api.ts`); rebases cleanly in either order.

Follow-ups
Once merged + deployed, the next heartbeat tick will start populating `agent_storyboard_status` properly for all 18 registered agents. A heartbeat sweep cycle drains in ~2h. After that, badge issuance via `processAgentBadges` should re-fire on the next heartbeat for any agent whose declared specialisms now have passing storyboard rows.

🤖 Generated with Claude Code