Skip to content

fix(addie): owner evaluate_agent_quality writes to canonical compliance state#4250

Draft
bokelley wants to merge 5 commits intomainfrom
claude/issue-4247-owner-test-canonical-write
Draft

fix(addie): owner evaluate_agent_quality writes to canonical compliance state#4250
bokelley wants to merge 5 commits intomainfrom
claude/issue-4247-owner-test-canonical-write

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

@bokelley bokelley commented May 8, 2026

Refs #4247 — PR 1 of 4 in the compliance-state unification initiative.

Summary

Closes the 12-hour gap between an owner running evaluate_agent_quality (via Addie chat or the test_adcp_agent MCP tool) and the public /api/registry/agents/:url/compliance endpoint reflecting the result.

Root cause: evaluate_agent_quality wrote to agent_test_history / agent_contexts.last_test_* (not visible to the compliance API). The heartbeat is the only prior writer to the canonical tables, so owner-triggered results were invisible until the next 12-hour cron.

What this PR does (owner path only):

  1. Checks whether the calling org owns the agent via member_profiles.agents @> $1::jsonb JOIN organization_memberships — identical to resolveAgentOwnerOrg in registry-api.ts:4733.
  2. If owner AND compliance_opt_out is not set: calls complianceResultToDbInput() + complianceDb.recordComplianceRun() with triggered_by = 'owner_test' and dry_run = false.
  3. Adds 'owner_test' to the TriggeredBy type and both triggered_by CHECK constraints (migration 471 — covers agent_compliance_runs and agent_storyboard_status).
  4. Explicitly omits notifyComplianceChange to prevent Slack spam on iterative owner test runs.
  5. Retains the legacy agentContextDb.recordTest() write for backward compatibility (deprecated in PR 3).

Known gap documented for follow-up: AAO Verified badge state is still updated only by the heartbeat (badge issuance calls processAgentBadges which is in the heartbeat loop, not in recordComplianceRun). Compliance status reflects owner test results immediately; badge re-issuance still requires the next heartbeat. Tracked in #4247.

Non-breaking justification: Adds an optional write path to existing canonical tables with a new triggered_by = 'owner_test' value. No fields removed, no types changed, no existing writes modified. The new enum value is additive; the constraint drop-and-recreate is non-destructive DDL. Existing consumers of the compliance API gain data, not schema changes.

Pre-PR review

  • code-reviewer: approved after fixes — ownership query strengthened to join organization_memberships, compliance_opt_out guard added, dry_run: false comment added, log level on owner-check failure raised to warn
  • internal-tools-strategist: approved — ownership check (member_profiles.agents @> $1::jsonb) matches established codebase pattern; notifyComplianceChange suppression is correct per notification subscriber scope (Slack + external consumers via Scope3); badge-issuance gap acknowledged as follow-up

Triage-managed PR. This bot does not currently iterate on
review comments or PR conversation threads (only on the source
issue). To unblock:

  • Push fixup commits directly: gh pr checkout <num>
    fix → push.
  • Or re-trigger: comment /triage execute on the source
    issue.

See #3121
for context.

Session: https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb


Generated by Claude Code

@bokelley bokelley added the claude-triaged Issue has been triaged by the Claude Code triage routine. Remove to re-triage. label May 8, 2026
@EmmaLouise2018
Copy link
Copy Markdown
Contributor

Reviewing this against the updated #4247 plan. Right shape for PR 1 — owner-only canonical writes + triggered_by = 'owner_test' + compliance_opt_out guard + Slack-noise suppression all match the design. Two asks before merge to keep the public contract honest:

1. verdict_source field on the compliance response (load-bearing)

/api/registry/agents/:url/compliance is a public registry surface. What it reflects is changing, even though the field names aren't:

  • Pre-PR: "this agent's last scheduled verdict" (heartbeat-only).
  • Post-PR: "this agent's last verdict from any source" (heartbeat + owner_test).

A consumer scraping this endpoint daily learned heartbeat history; post-PR they'll see any verdict, possibly an owner running it on demand to test a fix. Tell the caller, don't guess silently. Add verdict_source: 'heartbeat' | 'owner_test' to the response so downstream filters correctly. Reads off the triggered_by of the latest agent_compliance_runs row joined to agent_compliance_status — same source that's already populated in this PR.

The OperatorLookupResultSchema analog (AgentComplianceDetailSchema or whichever is canonical for /compliance) gets the additive optional field. Frozen reporting contract argument applies; don't change semantics on the wire without a flag.

Alternative: a SHOULD-clause in the changeset spelling out the semantic shift for downstream scrapers. Pick one. Don't ship neither.

2. Last-write-wins race test (pin the contract)

Heartbeat and owner_test can fire within minutes. Two concurrent writers, one row in agent_compliance_status. The implicit rule under the current code is last-write-wins on (agent_url). State it. A test pins the rule:

// heartbeat-style write at T → status reflects T
// owner_test write at T+1s → status reflects T+1s
// owner_test at T+2s, heartbeat at T+3s → status reflects T+3s

A future refactor that switches to "first-write-wins" or "merge" would silently change the public contract. The test is the contract.

Lineage

Approving on the strength of the owner-only canonical write pattern plus compliance_opt_out guard. Two asks above before merge — both small. The badge-issuance gap acknowledged in the PR description is the right deferral; that's an issue against #4247 to track.

bokelley pushed a commit that referenced this pull request May 8, 2026
…ns test

Address review feedback from @EmmaLouise2018 on PR #4250:

1. `verdict_source` field on /api/registry/agents/:url/compliance
   — `AgentComplianceDetailSchema` gains optional `verdict_source`:
     'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null
   — `getComplianceStatus` and `bulkGetComplianceStatus` join
     `agent_compliance_runs` via LATERAL subquery (dry_run=false,
     ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by
     of the most recent run.  No migration needed.
   — Endpoint response emits `verdict_source: status.last_triggered_by`.
   — `AgentComplianceStatus` interface gets `last_triggered_by` field.

2. Last-write-wins contract test
   — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT
     DO UPDATE semantics: every recordComplianceRun call overwrites
     agent_compliance_status regardless of triggered_by source.  A
     future change to first-write-wins or priority ordering would
     break these tests.

https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ
@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 8, 2026

Both asks addressed in the fixup commit (0c996a6):

1. verdict_source on /api/registry/agents/:url/compliance

AgentComplianceDetailSchema gets a new optional field:

verdict_source: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null

getComplianceStatus and bulkGetComplianceStatus now include a LATERAL subquery against agent_compliance_runs (dry_run = false, ORDER BY tested_at DESC LIMIT 1) to surface triggered_by as last_triggered_by on AgentComplianceStatus. The endpoint emits it as verdict_source. No migration needed — the data is already there.

2. Last-write-wins contract test

New compliance-db-last-write-wins.test.ts (4 cases) pins:

  • Every recordComplianceRun call issues ON CONFLICT (agent_url) DO UPDATE — never DO NOTHING.
  • triggered_by is forwarded verbatim for both heartbeat and owner_test sources.
  • getComplianceStatus returns last_triggered_by from the most recent non-dry run (asserts dry_run = false and ORDER BY tested_at DESC LIMIT 1 are in the SQL).

A future switch to first-write-wins or source-priority filtering breaks these tests.

All 3302 unit tests pass.

Session: https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ


Generated by Claude Code

@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 9, 2026

Code review (expert pass): solid root, ready to merge after stripping committed dist/ files.

Block:

  • dist/schemas/onboarding-openapi.{js,d.ts,js.map,d.ts.map} are checked in. Generated artifacts shouldn't be in the diff. Remove and amend.

Nits (non-blocking):

After dist strip, mark ready and the chain unblocks (#4263 is clean too — see #4264 for the real chain blocker).

@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 9, 2026

Blocker addressed — pushed c71809a0.

  • Removed dist/schemas/onboarding-openapi.{js,d.ts,js.map,d.ts.map} via git rm.
  • Added .gitignore rules for dist/schemas/*.{js,js.map,d.ts,d.ts.map} so build artifacts in that directory can't sneak back in.

Nits noted (not fixed):

Ready to mark out of draft when you are.


Triaged by Claude Code. Session: https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx


Generated by Claude Code

claude and others added 4 commits May 8, 2026 20:07
…ce state

Closes the 12-hour gap between owner-triggered storyboard runs and the public
/api/registry/agents/:url/compliance endpoint (issue #4247, PR 1 of 4).

When evaluate_agent_quality is triggered by the agent owner, the result is now
written to agent_compliance_status + agent_compliance_runs + agent_storyboard_status
with triggered_by = 'owner_test'. Non-owner runs continue writing to agent_test_history
(deprecated in PR 3). Migration 471 adds 'owner_test' to both triggered_by CHECK
constraints. notifyComplianceChange is intentionally suppressed for owner runs to
prevent iteration-loop Slack spam.

https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb
…ns test

Address review feedback from @EmmaLouise2018 on PR #4250:

1. `verdict_source` field on /api/registry/agents/:url/compliance
   — `AgentComplianceDetailSchema` gains optional `verdict_source`:
     'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null
   — `getComplianceStatus` and `bulkGetComplianceStatus` join
     `agent_compliance_runs` via LATERAL subquery (dry_run=false,
     ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by
     of the most recent run.  No migration needed.
   — Endpoint response emits `verdict_source: status.last_triggered_by`.
   — `AgentComplianceStatus` interface gets `last_triggered_by` field.

2. Last-write-wins contract test
   — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT
     DO UPDATE semantics: every recordComplianceRun call overwrites
     agent_compliance_status regardless of triggered_by source.  A
     future change to first-write-wins or priority ordering would
     break these tests.

https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ
…rtifacts

Generated JS/TS files don't belong in source control. Also adds
.gitignore rules for dist/schemas/*.{js,d.ts,*.map} to prevent recurrence.

https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx
@EmmaLouise2018 EmmaLouise2018 force-pushed the claude/issue-4247-owner-test-canonical-write branch from c71809a to 42e7f37 Compare May 9, 2026 00:09
EmmaLouise2018 added a commit that referenced this pull request May 9, 2026
PR 2 of the #4247 unification stack. Reads two fields PR #4250 added
to the compliance API but the dashboard wasn't yet rendering:

- compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)"
  / "(webhook)" after Last checked, so operators see whether the
  current verdict came from their own evaluate_agent_quality run or
  the scheduled heartbeat.
- history panel: per-run badge with the same source label, info-blue
  for owner_test and neutral for the rest. Pre-PR-1 rows render with
  neutral — no regression.

No backend changes; pure UI surfacing of fields already in the API.
Stacked on PR #4250.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-triaged Issue has been triaged by the Claude Code triage routine. Remove to re-triage.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants