Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .changeset/fix-storyboard-status-scenario-keys.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
---

Rewrites `deriveStoryboardStatuses` to read SDK 6.x's storyboard-keyed scenarios. `comply()` emits `result.tracks[].scenarios[].scenario` as `<storyboard_id>/<phase_id>` (one per phase), but the old implementation walked YAML steps' `comply_scenario` fields and looked up bare names like `signals_flow` / `capability_discovery` — every lookup missed, so `testedCount === 0` skipped every storyboard. Net effect: zero rows in `agent_storyboard_status` have ever been written by the compliance heartbeat. The dashboard's "X passing / Y total" was structurally `0 / N` across the registry, every declared specialism was `untested`, and the AAO Verified badge pipeline silently stopped issuing.

New implementation groups scenarios by storyboard id, rolls per-step pass counts up from each phase's `steps` array (with phase-level fallback when steps are absent), and supports the existing `storyboardIds` override for explicit-IDs callers that need an untested entry when the runner didn't run a requested storyboard. Surfaced by escalation #329 — Evgeny's agent was passing 30/30 scenarios but showing `degraded` because the storyboard counts never updated.
111 changes: 73 additions & 38 deletions server/src/addie/services/compliance-testing.ts
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,6 @@ import type {
TriggeredBy,
} from '../../db/compliance-db.js';

import { getStoryboard, getAllStoryboards } from '../../services/storyboards.js';
import type { Storyboard } from '../../services/storyboards.js';

// ── Re-exports ────────────────────────────────────────────────────

Expand Down Expand Up @@ -227,67 +225,104 @@ function mapOverallStatus(status: string): OverallRunStatus {
/**
* Derive per-storyboard pass/fail from a compliance result.
*
* Maps scenario results back to storyboard steps via comply_scenario.
* For explicit runs (storyboardIds provided), only those storyboards
* are evaluated. For heartbeat runs, all storyboards with matching
* scenarios are evaluated.
* `comply()` emits one `TestResult` per *phase* of each storyboard it ran,
* keyed `<storyboard_id>/<phase_id>` in `result.tracks[].scenarios[].scenario`
* (see `@adcp/sdk` `compliance/storyboard-tracks.ts`). We group those by
* storyboard id and roll step-level pass counts up from each phase's
* `steps` array — which is what `agent_storyboard_status.steps_passed/total`
* record.
*
* Modes:
* - heartbeat path (no `storyboardIds`): emit an entry for every storyboard
* the SDK actually produced data for.
* - explicit-IDs path (`storyboardIds` non-empty): emit one entry per id,
* with `status='untested'` for any id the SDK didn't run.
*
* `steps_passed` / `steps_total` reflect what the SDK reported for that
* storyboard in this run. Two storyboards (or the same storyboard across
* different runs) may count steps differently: most rows are real step
* counts; rows where the SDK emitted phases without per-step data fall back
* to phase-level counts. The values are meaningful within a single row
* (passed/total ratio, status derivation) but should not be compared across
* rows without checking which mode produced them.
*/
export function deriveStoryboardStatuses(
result: ComplianceResult,
storyboardIds?: string[],
): StoryboardStatusEntry[] {
// Build scenario → passed map from all track results
const scenarioResults = new Map<string, boolean>();
for (const track of result.tracks) {
interface Aggregate {
stepsPassed: number;
stepsTotal: number;
phasesPassed: number;
phasesTotal: number;
}
const perStoryboard = new Map<string, Aggregate>();
// Storyboard ids in `static/compliance/source/**/index.yaml` are flat
// identifiers (no `/`); splitting on the first `/` therefore always yields
// the storyboard id followed by the phase id. The `<= 0` guard also
// rejects pathological leading-slash strings.
const tracks = result.tracks ?? [];

for (const track of tracks) {
for (const s of track.scenarios) {
scenarioResults.set(s.scenario, s.overall_passed);
const sepIdx = typeof s.scenario === 'string' ? s.scenario.indexOf('/') : -1;
if (sepIdx <= 0) continue; // skip legacy bare-name scenarios (no longer emitted by storyboard-driven comply())
const sbId = s.scenario.slice(0, sepIdx);
let agg = perStoryboard.get(sbId);
if (!agg) {
agg = { stepsPassed: 0, stepsTotal: 0, phasesPassed: 0, phasesTotal: 0 };
perStoryboard.set(sbId, agg);
}
agg.phasesTotal++;
if (s.overall_passed) agg.phasesPassed++;

// Roll per-step results up from the phase. Some SDK paths emit a phase
// without a `steps` array (e.g. resource-resolution failures); we then
// fall back to phase-level counts below so the storyboard still
// reports a status.
const steps = s.steps ?? [];
for (const step of steps) {
agg.stepsTotal++;
if (step.passed) agg.stepsPassed++;
}
}
}

if (scenarioResults.size === 0) return [];

const storyboardsToCheck: Storyboard[] = storyboardIds
? storyboardIds.map(id => getStoryboard(id)).filter((s): s is Storyboard => !!s)
: getAllStoryboards();
// Decide which storyboard ids to emit entries for.
const hasExplicitIds = !!storyboardIds && storyboardIds.length > 0;
const toEmit = hasExplicitIds ? storyboardIds! : Array.from(perStoryboard.keys());

const entries: StoryboardStatusEntry[] = [];

for (const sb of storyboardsToCheck) {
// Collect steps with comply_scenario
const testableSteps: Array<{ stepId: string; scenario: string }> = [];
for (const phase of sb.phases) {
for (const step of phase.steps) {
if (step.comply_scenario) {
testableSteps.push({ stepId: step.id, scenario: step.comply_scenario });
}
for (const sbId of toEmit) {
const agg = perStoryboard.get(sbId);
if (!agg) {
// Explicit id requested but the runner didn't produce data for it.
if (hasExplicitIds) {
entries.push({ storyboard_id: sbId, status: 'untested', steps_passed: 0, steps_total: 0 });
}
continue;
}

if (testableSteps.length === 0) continue;

// Only include storyboards where at least one scenario was tested
const testedCount = testableSteps.filter(s => scenarioResults.has(s.scenario)).length;
if (testedCount === 0 && !storyboardIds) continue;

const passedCount = testableSteps.filter(s => scenarioResults.get(s.scenario) === true).length;
const totalSteps = testableSteps.length;
const useSteps = agg.stepsTotal > 0;
const passed = useSteps ? agg.stepsPassed : agg.phasesPassed;
const total = useSteps ? agg.stepsTotal : agg.phasesTotal;

let status: StoryboardStatusEntry['status'];
if (testedCount === 0) {
if (total === 0) {
status = 'untested';
} else if (passedCount === totalSteps) {
} else if (passed === total) {
status = 'passing';
} else if (passedCount === 0) {
} else if (passed === 0) {
status = 'failing';
} else {
status = 'partial';
}

entries.push({
storyboard_id: sb.id,
storyboard_id: sbId,
status,
steps_passed: passedCount,
steps_total: totalSteps,
steps_passed: passed,
steps_total: total,
});
}

Expand Down
95 changes: 95 additions & 0 deletions server/src/scripts/test-comply-storyboard-statuses.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
/**
* Run `comply()` against an agent URL and print what
* `deriveStoryboardStatuses` would produce. Read-only — no DB writes.
*
* Lets us validate the new SDK-6.x scenario-key parser against real agents
* before merging. Mirrors what the compliance heartbeat does for the
* storyboard-status piece, but prints to stdout instead of recording.
*
* Usage:
* npx tsx server/src/scripts/test-comply-storyboard-statuses.ts <agent-url>
* npx tsx server/src/scripts/test-comply-storyboard-statuses.ts <url1> <url2> ...
*/

import { AAO_UA_COMPLIANCE } from '../config/user-agents.js';
import {
comply,
deriveStoryboardStatuses,
complianceResultToDbInput,
type ComplyOptions,
} from '../addie/services/compliance-testing.js';

const urls = process.argv.slice(2).filter(a => !a.startsWith('--'));

if (urls.length === 0) {
console.error('Usage: test-comply-storyboard-statuses.ts <agent-url> [<agent-url> ...]');
process.exit(1);
}

async function probe(agentUrl: string): Promise<void> {
console.log(`\n${'='.repeat(80)}\nAgent: ${agentUrl}\n${'='.repeat(80)}`);
const start = Date.now();

const opts: ComplyOptions = {
test_session_id: `local-probe-${Date.now()}`,
timeout_ms: 90_000,
userAgent: AAO_UA_COMPLIANCE,
};

let result;
try {
result = await comply(agentUrl, opts);
} catch (err) {
console.log(` comply() threw: ${err instanceof Error ? err.message : String(err)}`);
return;
}

const duration = Date.now() - start;
console.log(`\nOverall: ${result.overall_status} (${duration}ms)`);
console.log(`Headline: ${result.summary.headline}`);
console.log(`Declared specialisms: ${JSON.stringify(result.agent_profile?.specialisms ?? [])}`);
console.log(`Storyboards executed: ${JSON.stringify(result.storyboards_executed ?? '(field absent)')}`);

console.log(`\nTracks:`);
for (const t of result.tracks) {
console.log(` ${t.track.padEnd(20)} status=${t.status.padEnd(8)} scenarios=${t.scenarios.length}`);
for (const s of t.scenarios.slice(0, 6)) {
const pass = s.overall_passed ? '✓' : '✗';
const stepCount = s.steps?.length ?? 0;
const stepsPassed = s.steps?.filter(st => st.passed).length ?? 0;
console.log(` ${pass} ${s.scenario.padEnd(50)} steps=${stepsPassed}/${stepCount}`);
}
if (t.scenarios.length > 6) {
console.log(` … +${t.scenarios.length - 6} more`);
}
}

console.log(`\nderiveStoryboardStatuses() output (what the heartbeat would persist):`);
const entries = deriveStoryboardStatuses(result);
if (entries.length === 0) {
console.log(` (empty — nothing to persist)`);
} else {
for (const e of entries) {
console.log(` ${e.storyboard_id.padEnd(40)} ${e.status.padEnd(10)} steps=${e.steps_passed}/${e.steps_total}`);
}
}

console.log(`\ncomplianceResultToDbInput().storyboard_statuses (full input shape):`);
const dbInput = complianceResultToDbInput(result, agentUrl, 'production', 'manual');
console.log(` count: ${dbInput.storyboard_statuses?.length ?? 0}`);
if (dbInput.storyboard_statuses?.length) {
console.log(JSON.stringify(dbInput.storyboard_statuses, null, 2));
}
}

async function main(): Promise<void> {
for (const url of urls) {
await probe(url);
}
console.log('');
}

main().catch((err) => {
console.error('Probe failed:', err);
process.exit(1);
});
Loading
Loading