fix(addie): owner evaluate_agent_quality writes to canonical compliance state by bokelley · Pull Request #4250 · adcontextprotocol/adcp

bokelley · 2026-05-08T19:01:23Z

Refs #4247 — PR 1 of 4 in the compliance-state unification initiative.

Summary

Closes the 12-hour gap between an owner running evaluate_agent_quality (via Addie chat or the test_adcp_agent MCP tool) and the public /api/registry/agents/:url/compliance endpoint reflecting the result.

Root cause: evaluate_agent_quality wrote to agent_test_history / agent_contexts.last_test_* (not visible to the compliance API). The heartbeat is the only prior writer to the canonical tables, so owner-triggered results were invisible until the next 12-hour cron.

What this PR does (owner path only):

Checks whether the calling org owns the agent via member_profiles.agents @> $1::jsonb JOIN organization_memberships — identical to resolveAgentOwnerOrg in registry-api.ts:4733.
If owner AND compliance_opt_out is not set: calls complianceResultToDbInput() + complianceDb.recordComplianceRun() with triggered_by = 'owner_test' and dry_run = false.
Adds 'owner_test' to the TriggeredBy type and both triggered_by CHECK constraints (migration 471 — covers agent_compliance_runs and agent_storyboard_status).
Explicitly omits notifyComplianceChange to prevent Slack spam on iterative owner test runs.
Retains the legacy agentContextDb.recordTest() write for backward compatibility (deprecated in PR 3).

Known gap documented for follow-up: AAO Verified badge state is still updated only by the heartbeat (badge issuance calls processAgentBadges which is in the heartbeat loop, not in recordComplianceRun). Compliance status reflects owner test results immediately; badge re-issuance still requires the next heartbeat. Tracked in #4247.

Non-breaking justification: Adds an optional write path to existing canonical tables with a new triggered_by = 'owner_test' value. No fields removed, no types changed, no existing writes modified. The new enum value is additive; the constraint drop-and-recreate is non-destructive DDL. Existing consumers of the compliance API gain data, not schema changes.

Pre-PR review

code-reviewer: approved after fixes — ownership query strengthened to join organization_memberships, compliance_opt_out guard added, dry_run: false comment added, log level on owner-check failure raised to warn
internal-tools-strategist: approved — ownership check (member_profiles.agents @> $1::jsonb) matches established codebase pattern; notifyComplianceChange suppression is correct per notification subscriber scope (Slack + external consumers via Scope3); badge-issuance gap acknowledged as follow-up

Triage-managed PR. This bot does not currently iterate on
review comments or PR conversation threads (only on the source
issue). To unblock:

Push fixup commits directly: gh pr checkout <num> →
fix → push.

Or re-trigger: comment /triage execute on the source
issue.

See #3121
for context.

Session: https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb

Generated by Claude Code

EmmaLouise2018 · 2026-05-08T19:16:50Z

Reviewing this against the updated #4247 plan. Right shape for PR 1 — owner-only canonical writes + triggered_by = 'owner_test' + compliance_opt_out guard + Slack-noise suppression all match the design. Two asks before merge to keep the public contract honest:

1. `verdict_source` field on the compliance response (load-bearing)

/api/registry/agents/:url/compliance is a public registry surface. What it reflects is changing, even though the field names aren't:

Pre-PR: "this agent's last scheduled verdict" (heartbeat-only).
Post-PR: "this agent's last verdict from any source" (heartbeat + owner_test).

A consumer scraping this endpoint daily learned heartbeat history; post-PR they'll see any verdict, possibly an owner running it on demand to test a fix. Tell the caller, don't guess silently. Add verdict_source: 'heartbeat' | 'owner_test' to the response so downstream filters correctly. Reads off the triggered_by of the latest agent_compliance_runs row joined to agent_compliance_status — same source that's already populated in this PR.

The OperatorLookupResultSchema analog (AgentComplianceDetailSchema or whichever is canonical for /compliance) gets the additive optional field. Frozen reporting contract argument applies; don't change semantics on the wire without a flag.

Alternative: a SHOULD-clause in the changeset spelling out the semantic shift for downstream scrapers. Pick one. Don't ship neither.

2. Last-write-wins race test (pin the contract)

Heartbeat and owner_test can fire within minutes. Two concurrent writers, one row in agent_compliance_status. The implicit rule under the current code is last-write-wins on (agent_url). State it. A test pins the rule:

// heartbeat-style write at T → status reflects T
// owner_test write at T+1s → status reflects T+1s
// owner_test at T+2s, heartbeat at T+3s → status reflects T+3s

A future refactor that switches to "first-write-wins" or "merge" would silently change the public contract. The test is the contract.

Lineage

Unify compliance state: every storyboard run writes to one canonical path (heartbeat + Addie + dashboard tests) #4247 — full unification plan (this PR is PR 1 of 4)
fix(registry): heartbeat picks up member-registered agents — write-side seed + read-side CTE #4252 — heartbeat-picks-up-member-agents (sibling fix on the visibility-miss bug class)
fix(registry): require declared agent type + auto-backfill primary_brand_domain #4235 — type-required + brand-domain backfill (the prior visibility-miss fix)

Approving on the strength of the owner-only canonical write pattern plus compliance_opt_out guard. Two asks above before merge — both small. The badge-issuance gap acknowledged in the PR description is the right deferral; that's an issue against #4247 to track.

@EmmaLouise2018

…ns test Address review feedback from @EmmaLouise2018 on PR #4250: 1. `verdict_source` field on /api/registry/agents/:url/compliance — `AgentComplianceDetailSchema` gains optional `verdict_source`: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null — `getComplianceStatus` and `bulkGetComplianceStatus` join `agent_compliance_runs` via LATERAL subquery (dry_run=false, ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by of the most recent run. No migration needed. — Endpoint response emits `verdict_source: status.last_triggered_by`. — `AgentComplianceStatus` interface gets `last_triggered_by` field. 2. Last-write-wins contract test — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT DO UPDATE semantics: every recordComplianceRun call overwrites agent_compliance_status regardless of triggered_by source. A future change to first-write-wins or priority ordering would break these tests. https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ

bokelley · 2026-05-08T19:26:16Z

Both asks addressed in the fixup commit (0c996a6):

1. verdict_source on /api/registry/agents/:url/compliance

AgentComplianceDetailSchema gets a new optional field:

verdict_source: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null

getComplianceStatus and bulkGetComplianceStatus now include a LATERAL subquery against agent_compliance_runs (dry_run = false, ORDER BY tested_at DESC LIMIT 1) to surface triggered_by as last_triggered_by on AgentComplianceStatus. The endpoint emits it as verdict_source. No migration needed — the data is already there.

2. Last-write-wins contract test

New compliance-db-last-write-wins.test.ts (4 cases) pins:

Every recordComplianceRun call issues ON CONFLICT (agent_url) DO UPDATE — never DO NOTHING.
triggered_by is forwarded verbatim for both heartbeat and owner_test sources.
getComplianceStatus returns last_triggered_by from the most recent non-dry run (asserts dry_run = false and ORDER BY tested_at DESC LIMIT 1 are in the SQL).

A future switch to first-write-wins or source-priority filtering breaks these tests.

All 3302 unit tests pass.

Session: https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ

Generated by Claude Code

bokelley · 2026-05-09T00:02:20Z

Code review (expert pass): solid root, ready to merge after stripping committed dist/ files.

Block:

dist/schemas/onboarding-openapi.{js,d.ts,js.map,d.ts.map} are checked in. Generated artifacts shouldn't be in the diff. Remove and amend.

Nits (non-blocking):

server/src/addie/mcp/member-tools.ts:3578 — comment says "fall through to the legacy agent_test_history path below" but fix(addie): backfill owner test history + stop dual-write for owner runs #4264 makes that conditional. Will be stale by the time fix(addie): backfill owner test history + stop dual-write for owner runs #4264 lands.
compliance-db-last-write-wins.test.ts mocks query/getClient. Repo convention is real-Postgres tests. Acceptable as a SQL-string contract test; flag for follow-up.

After dist strip, mark ready and the chain unblocks (#4263 is clean too — see #4264 for the real chain blocker).

bokelley · 2026-05-09T00:04:11Z

Blocker addressed — pushed c71809a0.

Removed dist/schemas/onboarding-openapi.{js,d.ts,js.map,d.ts.map} via git rm.
Added .gitignore rules for dist/schemas/*.{js,js.map,d.ts,d.ts.map} so build artifacts in that directory can't sneak back in.

Nits noted (not fixed):

member-tools.ts:3578 comment — will be stale once fix(addie): backfill owner test history + stop dual-write for owner runs #4264 lands. Flagged for whoever owns fix(addie): backfill owner test history + stop dual-write for owner runs #4264 to clean up on merge.
compliance-db-last-write-wins.test.ts mock pattern — acknowledged as a SQL-string contract test deviation from real-Postgres convention; follow-up tracked in the PR body.

Ready to mark out of draft when you are.

Triaged by Claude Code. Session: https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx

Generated by Claude Code

…ce state Closes the 12-hour gap between owner-triggered storyboard runs and the public /api/registry/agents/:url/compliance endpoint (issue #4247, PR 1 of 4). When evaluate_agent_quality is triggered by the agent owner, the result is now written to agent_compliance_status + agent_compliance_runs + agent_storyboard_status with triggered_by = 'owner_test'. Non-owner runs continue writing to agent_test_history (deprecated in PR 3). Migration 471 adds 'owner_test' to both triggered_by CHECK constraints. notifyComplianceChange is intentionally suppressed for owner runs to prevent iteration-loop Slack spam. https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb

@EmmaLouise2018

…ns test Address review feedback from @EmmaLouise2018 on PR #4250: 1. `verdict_source` field on /api/registry/agents/:url/compliance — `AgentComplianceDetailSchema` gains optional `verdict_source`: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null — `getComplianceStatus` and `bulkGetComplianceStatus` join `agent_compliance_runs` via LATERAL subquery (dry_run=false, ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by of the most recent run. No migration needed. — Endpoint response emits `verdict_source: status.last_triggered_by`. — `AgentComplianceStatus` interface gets `last_triggered_by` field. 2. Last-write-wins contract test — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT DO UPDATE semantics: every recordComplianceRun call overwrites agent_compliance_status regardless of triggered_by source. A future change to first-write-wins or priority ordering would break these tests. https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ

…rtifacts Generated JS/TS files don't belong in source control. Also adds .gitignore rules for dist/schemas/*.{js,d.ts,*.map} to prevent recurrence. https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx

…tion_queue on main)

PR 2 of the #4247 unification stack. Reads two fields PR #4250 added to the compliance API but the dashboard wasn't yet rendering: - compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)" / "(webhook)" after Last checked, so operators see whether the current verdict came from their own evaluate_agent_quality run or the scheduled heartbeat. - history panel: per-run badge with the same source label, info-blue for owner_test and neutral for the rest. Pre-PR-1 rows render with neutral — no regression. No backend changes; pure UI surfacing of fields already in the API. Stacked on PR #4250.

bokelley added the claude-triaged Issue has been triaged by the Claude Code triage routine. Remove to re-triage. label May 8, 2026

bokelley mentioned this pull request May 8, 2026

Unify compliance state: every storyboard run writes to one canonical path (heartbeat + Addie + dashboard tests) #4247

Open

12 tasks

claude and others added 4 commits May 8, 2026 20:07

chore(dist): remove accidentally committed onboarding-openapi build a…

ea37a3b

…rtifacts Generated JS/TS files don't belong in source control. Also adds .gitignore rules for dist/schemas/*.{js,d.ts,*.map} to prevent recurrence. https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx

fix(migrate): renumber 471 → 472 (resolve clash with manager_revalida…

42e7f37

…tion_queue on main)

EmmaLouise2018 force-pushed the claude/issue-4247-owner-test-canonical-write branch from c71809a to 42e7f37 Compare May 9, 2026 00:09

fix(addie): address Brian's review — DDL lock guard + cross-org gap note

54a4f16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(addie): owner evaluate_agent_quality writes to canonical compliance state#4250

fix(addie): owner evaluate_agent_quality writes to canonical compliance state#4250
bokelley wants to merge 5 commits intomainfrom
claude/issue-4247-owner-test-canonical-write

bokelley commented May 8, 2026

Uh oh!

EmmaLouise2018 commented May 8, 2026

Uh oh!

bokelley commented May 8, 2026

Uh oh!

bokelley commented May 9, 2026

Uh oh!

bokelley commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

bokelley commented May 8, 2026

Summary

Pre-PR review

Uh oh!

EmmaLouise2018 commented May 8, 2026

1. verdict_source field on the compliance response (load-bearing)

2. Last-write-wins race test (pin the contract)

Lineage

Uh oh!

bokelley commented May 8, 2026

Uh oh!

bokelley commented May 9, 2026

Uh oh!

bokelley commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

1. `verdict_source` field on the compliance response (load-bearing)