RFC: AAO Verified via canonical test campaigns (supersedes two-tier model in #2965)

# AAO Verified (Live) via Canonical Test Campaigns

**Status:** Revised draft (v3) — naming aligned with PR [#2153](https://github.com/adcontextprotocol/adcp/pull/2153): single brand mark "AAO Verified" with axis qualifiers `(Spec)` and `(Live)` instead of the v2 draft's two distinct mark names ("AdCP Conformant" / "AAO Verified"). The wire format reflects this with `verification_modes: string[]` in the JWT and registry API — `["spec"]` today, `["spec", "live"]` once canonical campaigns light up. End-state machinery is unchanged; only the public framing collapses to a single brand word with composable qualifiers. See PR #2153 for the rename in code + docs.
**Author:** Brian O'Kelley + Claude
**Date:** April 2026
**Milestone:** 4.0 (final trigger flip for `(Live)`). Transitional machinery already shipping in 3.x: merged PR #3001 (eight observability checks + `attestation_verifier` scope) + PR #2153 (qualifier framing in code, JWT, registry API, SVG).

## TL;DR

**AAO Verified (Live)** is the top-tier trust mark for AdCP agents. Today (3.x), AAO issues it by continuously observing the seller's own campaigns running on a designated compliance account ([`docs/building/aao-verified.mdx`](https://github.com/adcontextprotocol/adcp/blob/main/docs/building/aao-verified.mdx)). In 4.0, AAO becomes the operator: it runs a canonical test campaign per declared specialism, weekly, through the seller's real ad-server integration. AAO Verified (Live) issues when the canonical campaign stays healthy and revokes when it doesn't. The same eight checks from 3.x apply per canonical campaign; only the issuance trigger changes — sellers don't re-plumb.

**AAO Verified (Spec)** (storyboard-issued, 3.x and 4.0) remains the lower-weight publishable mark. Verified (Live) ⇒ Verified (Spec) — a storyboard regression blocks AAO Verified (Live) issuance.

If you've read [`docs/building/aao-verified.mdx`](https://github.com/adcontextprotocol/adcp/blob/main/docs/building/aao-verified.mdx), the end state is: same machinery, different trigger. This RFC defines the trigger flip, scopes the per-specialism work to get there, and sizes AAO's operational commitment honestly.

---

## The problem with "passes the basic tests"

Our current badge path (#2153) issues a mark to agents that pass storyboards against `simulate_delivery`. That's wire-format conformance. A buyer seeing "Verified" on an agent's website is not thinking *"the JSON shapes are right."* They're thinking *"I can actually buy media through this agent."*

Those are different claims. Storyboards prove the first. Only a live campaign proves the second. The two-mark model preserves both claims as distinct publishable signals; canonical campaigns flip what "Verified" means.

## Two marks, not one

AdCP 3.x (shipping via merged [#3001](https://github.com/adcontextprotocol/adcp/pull/3001)) defines two distinct public trust marks:

| Mark | What it means | How it's issued (3.x) | How it's issued (4.0, this RFC) |
|------|---------------|-----------------------|--------------------------------|
| **AAO Verified (Spec)** | Wire format is right: the agent implements its declared `supported_protocols` and `specialisms` per the storyboard suite. | Storyboards pass. | Same — storyboards pass. |
| **AAO Verified (Live)** | The declared capability is actually implemented in the seller's live production stack: real impressions, real inventory, sustained over weeks. | Continuous observability of whatever campaigns the seller runs on a designated compliance account. | **AAO-operated canonical test campaign** for the specialism runs weekly and stays healthy. |

**Containment.** Verified (Live) ⇒ Verified (Spec). A seller cannot hold AAO Verified (Live) without AAO Verified (Spec): a storyboard regression blocks AAO Verified (Live) issuance even when live campaigns look fine. You can hold Verified (Spec) without Verified (Live) (storyboards passing but no live traffic yet); you cannot hold Verified (Live) without Verified (Spec).
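
A minimal sketch of how a consumer of the `verification_modes` array might enforce containment. The `"spec"` and `"live"` values come from the status note above; the function name, the canonical ordering, and the error behavior are assumptions made for illustration, not part of the registry API.

```python
from typing import Iterable

# Mode strings taken from the RFC's wire-format description.
VALID_MODES = {"spec", "live"}


def validate_verification_modes(modes: Iterable[str]) -> list[str]:
    """Return the modes in a fixed order, or raise if containment is violated.

    Hypothetical helper: the real registry may validate differently.
    """
    seen = set(modes)
    unknown = seen - VALID_MODES
    if unknown:
        raise ValueError(f"unknown verification modes: {sorted(unknown)}")
    # Containment: (Live) implies (Spec), so "live" may never appear alone.
    if "live" in seen and "spec" not in seen:
        raise ValueError("'live' requires 'spec'")
    return [m for m in ("spec", "live") if m in seen]
```

Under this sketch, `["spec"]` and `["spec", "live"]` are accepted, while `["live"]` alone is rejected.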

**Why two marks, not one.** The earlier single-bar framing dropped storyboards from the public-badge path entirely. Buyers got a stronger signal, but pre-production agents, non-revenue pilots, and specialisms without canonical flows lost any public signal. The two-mark model keeps AAO Verified (Spec) as a useful lower-weight mark (searchable registry, "storyboards passing") and reserves AAO Verified (Live) for in-market sellers with canonical-campaign health.

### Per-check mapping: 3.x enrollment-based → 4.0 canonical-campaign-based

The same eight checks apply at both versions; the locus of observation moves from "the compliance account's ongoing activity" to "AAO's canonical flight."

| # | Check | 3.x (enrollment) | 4.0 (canonical campaign) |
|---|-------|------------------|--------------------------|
| 1 | **Liveness** | At least one active buy in the compliance account ≥ 80% of the 7–14 day rolling window | At least one healthy completed canonical flight per scheduled cadence (weekly cadence = ≥ 1 healthy flight per rolling 7-day window; missed flights count as failures, not as quiet periods) |
| 2 | **Freshness** | Same `get_media_buy_delivery` query on day N vs N+1 returns different numbers | Same query repeated *during* a canonical flight's window returns changing numbers across the flight |
| 3 | **Plausibility** | Monotonic impressions, `by_package` sums, non-zero where expected, `pacing_index` consistency | Same checks applied to the canonical flight's reported delivery |
| 4 | **Filter correctness** | `start_date` / `end_date` narrow results against account history | Same applied to the canonical flight |
| 5 | **Reporting-surface cross-consistency** | All declared reporting surfaces agree across the compliance account window | All declared surfaces agree for the canonical flight; skipped if polling is the only declared surface |
| 6 | **Lifecycle correctness** | Completed / paused / canceled buys behave correctly across the account | The canonical flight completes, pauses, and cancels correctly when AAO induces those transitions |
| 7 | **Introspection consistency** | `authorization` object on `sync_accounts` / `list_accounts` matches actual enforcement | Same — runner verifies `authorization` before and during each canonical flight |
| 8 | **Seller-initiated state-transition propagation** | Trafficker / finance / lifecycle changes surface within seller's declared status-freshness tolerance | Same applied to the canonical flight; AAO induces seller-initiated transitions in some weeks to test propagation |
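
As an illustration of check 3 (Plausibility), the sketch below tests two of the listed properties, monotonic impressions and `by_package` sums, plus a non-zero final total, over a series of delivery snapshots. The `DeliverySnapshot` shape and its field names are hypothetical and do not mirror the actual `get_media_buy_delivery` response; `pacing_index` consistency is left out.

```python
from dataclasses import dataclass


@dataclass
class DeliverySnapshot:
    """Hypothetical, simplified view of one delivery report."""
    total_impressions: int
    by_package: dict[str, int]


def plausibility_failures(snapshots: list[DeliverySnapshot]) -> list[str]:
    """Return human-readable failures; an empty list means the series looks plausible."""
    failures: list[str] = []
    for i, snap in enumerate(snapshots):
        # Per-package impressions should add up to the reported total.
        if sum(snap.by_package.values()) != snap.total_impressions:
            failures.append(f"snapshot {i}: by_package sum != total")
        # Cumulative impressions should never go backwards.
        if i > 0 and snap.total_impressions < snapshots[i - 1].total_impressions:
            failures.append(f"snapshot {i}: impressions decreased")
    # A flight that ran should have reported some delivery by its last snapshot.
    if snapshots and snapshots[-1].total_impressions == 0:
        failures.append("no delivery reported by end of flight")
    return failures
```

The same function would apply unchanged to a 3.x compliance-account window or a 4.0 canonical flight; only the source of the snapshots differs.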

## The 4.0 proposal: canonical test campaigns

**AAO Verified (Live) in 4.0 = an AAO-operated canonical test campaign for this specialism is currently running through your agent and reporting is flowing.**

One trigger. One bar. You earn it by letting AAO run real buys through you; you lose it when those buys stop working.

### Specialism as contract

Each specialism that supports canonical campaigns gets exactly one **canonical test flow** — a real end-to-end interaction that exercises the production code path for that capability. AAO runs it weekly. If every step works, the AAO Verified (Live) mark stays green for that specialism; if any step fails, it degrades.

"Canonical test campaign" is a shorthand; the real idea is "AAO runs an end-to-end canonical flow through your agent that exercises the same code path buyers would hit." The flow shape varies by specialism — a media-buy sales specialism runs a PSA flight; a governance specialism runs a compliant/non-compliant brief pair; a signals specialism runs a discover-then-activate flow.

The common thread: AAO is an operator participating in the ecosystem. Every AAO Verified (Live) agent has AAO as an active counterparty exercising their real code paths weekly. Breakage on any seller surfaces within a week.

### Canonical campaign spec format

Each specialism with a canonical flow gets a sibling YAML alongside its storyboard:

```
/compliance/{version}/specialisms/{id}/index.yaml          (storyboards — AAO Verified (Spec))
/compliance/{version}/specialisms/{id}/canonical-campaign.yaml  (AAO Verified (Live), 4.0)
```

`canonical-campaign.yaml` declares: the brief, budget envelope, flight length, creative shape, expected per-check coverage, and attribution rules (which checks contribute to which specialism marks for composed flights). AAO authors and maintains these alongside the storyboards; sellers don't author them.

### Composition attribution rules

A single canonical flight that exercises composed specialisms (e.g., `sales-guaranteed` + `creative-template` + `governance-aware-seller`) needs deterministic attribution when checks fail. Rules:

- Each check in `canonical-campaign.yaml` declares which specialism(s) it contributes to.
- A single-specialism check failure degrades only that specialism's mark.
- A check contributing to multiple specialisms uses **most-specific-specialism-first** resolution: a failure on a check tagged `[sales-guaranteed, creative-template]` defaults to the more-specific specialism if the failure mode unambiguously implicates one (e.g., trafficked-creative success + delivery failure → `sales-guaranteed`); when the failure is ambiguous, all tagged specialisms degrade.
- Sellers MAY appeal an attribution after the fact via the dispute path (see [Open questions](#open-questions)).
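
The resolution rules above can be sketched as a small function. This is an illustrative reading of the rules, not the runner's implementation; the function name and the `implicated` parameter (the specialism a failure mode unambiguously points at, if any) are assumptions.

```python
from typing import Optional


def attribute_failure(tagged: list[str], implicated: Optional[str] = None) -> set[str]:
    """Return the specialisms whose marks degrade for one failed check.

    tagged:     specialisms the check declares it contributes to.
    implicated: the specialism unambiguously implicated by the failure mode,
                or None when the failure is ambiguous (hypothetical input).
    """
    if not tagged:
        raise ValueError("a check must contribute to at least one specialism")
    # Single-specialism check: only that mark degrades.
    if len(tagged) == 1:
        return {tagged[0]}
    # Multi-specialism check with an unambiguous failure mode: degrade that one.
    if implicated is not None and implicated in tagged:
        return {implicated}
    # Ambiguous failure: every tagged specialism degrades.
    return set(tagged)
```

For the example in the rules, a delivery failure after a successful creative trafficking step would be passed as `attribute_failure(["sales-guaranteed", "creative-template"], implicated="sales-guaranteed")` and degrade only `sales-guaranteed`.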

### Runner request-signing integration

Sellers that advertise `request_signing.required_for` covering tasks the canonical runner uses (`create_media_buy`, `sync_creatives`, `update_media_buy`, etc.) MUST be invoked through AAO's signed-request runner. The runner honors the seller's declared signing profile; failure to sign breaks the canonical flow but is attributed to the runner (an AAO-side bug), not the seller. Conversely, a seller that requires signing but accepts unsigned canonical-runner requests fails the canonical campaign for the relevant signing-required specialism.

This is implicit verification of `signed-requests` (see [Per-specialism coverage](#per-specialism-coverage)) — every canonical flight against a signing-required seller exercises the signing pipeline.

## Per-specialism coverage

This section walks through all 20 stable specialisms. The Target column indicates when AAO plans to stand up the canonical flow; "Conformant-only" specialisms keep the AAO Verified (Spec) mark but have no path to AAO Verified (Live).

The "Why" column states the access blocker or design constraint when complexity is High; for Low-complexity specialisms, the canonical flow shape is short.

### Media-buy specialisms

| Specialism | Canonical flow | Why (Low/Medium/High) | Target |
|------------|----------------|-----------------------|--------|
| `sales-non-guaranteed` | Programmatic PSA, 24-hour pacing check, AAO as buyer | **Low** — auction inventory is universally available; fastest feedback loop | **Pilot (phase 1)** |
| `sales-guaranteed` | :30 PSA, 7-day guaranteed flight, delivery heartbeat | Low–Medium — requires guaranteed-inventory partner | Phase 2 |
| `sales-proposal-mode` | AAO issues brief → seller returns proposal → AAO accepts → buy runs 7 days | Medium — exercises proposal negotiation loop end-to-end | Phase 2 |
| `sales-catalog-driven` | Catalog-driven PSA (e.g., retail-media test SKU) with conversion-tracking ping | Medium — requires catalog seed and attribution test surface | Phase 3 |
| `sales-broadcast-tv` | :30 PSA in primetime, watch C3 → C7 maturation | **High** — broadcast inventory access at scale is the blocker; PSA slots already mostly committed (Ad Council holds most primetime); C7 settlement is 15–22 days. May be reclassified to Conformant-only if PSA partnership doesn't land. | Phase 4 (aspirational) |
| `sales-social` | Platform-native PSA flow per platform | **High** — each platform (Meta / TikTok / Snap / X / LinkedIn) needs its own integration; tenant-isolation makes "AAO probes targeting" a security review for some platforms; the universe of social sellers is small. May be reclassified to per-platform opt-in only. | Phase 4 (aspirational) |
| `governance-aware-seller` | Canonical campaign with governance hooks set; seller propagates approvals/conditions/denials unchanged | Low — composes with whatever sales specialism the seller also holds; same flight, additional checks | Phase 2 (alongside sales) |
| `audience-sync` | Sync synthetic-but-resolvable test list; verify reflection in delivery | Medium — privacy surface (synthetic identifiers owned by AAO; documented consent posture) | Phase 3 |

### Creative specialisms

| Specialism | Canonical flow | Why | Target |
|------------|----------------|-----|--------|
| `creative-template` | Weekly brief → trafficked creative → served on AAO's canonical sales partner | Low — composes with a sales specialism; same flight | Phase 2 |
| `creative-generative` | Weekly brief → generated spot → serves on AAO's canonical sales partner | Low–Medium — composes with sales; generation cost is the only delta | Phase 2 |
| `creative-ad-server` | PSA tagged with known macros; verify click/impression via macro callbacks; ad-server health probe | Low — single canonical PSA per channel family covers it | Phase 2 |

### Signals specialisms

| Specialism | Canonical flow | Why | Target |
|------------|----------------|-----|--------|
| `signal-marketplace` | Discover canonical test signal → activate → confirm targeting takes effect in a sales flow | Medium — requires sales-flow partner | Phase 3 |
| `signal-owned` | AAO exposes canonical first-party test segment → seller resolves it on the buy path → verify reflection | Medium — synthetic test segment plumbing | Phase 3 |

### Governance specialisms

| Specialism | Canonical flow | Why | Target |
|------------|----------------|-----|--------|
| `content-standards` | Submit one brand-safe and one unsafe brief weekly; verify approval/rejection behavior | Low — decision-check, not delivery-check | Phase 2 |
| `property-lists` | Sync canonical test property list; verify targeting narrows per the list during a canonical flight | Low — small synthetic property list, easy to author | Phase 2 |
| `collection-lists` | Sync canonical test collection list; verify inclusion/exclusion on content programs | Low | Phase 3 |
| `governance-delivery-monitor` | Active canonical campaign under monitoring; AAO induces drift; verify alert fires within threshold | Medium — composes with a sales flow | Phase 3 |
| `governance-spend-authority` | Submit conditional-approval brief weekly; watch human-in-loop approval flow complete within SLA | **Medium–High** — latency-sensitive (requires human response within SLA); requires AAO-side human-in-loop partner | Phase 4 |

### Brand specialisms

| Specialism | Canonical flow | Why | Target |
|------------|----------------|-----|--------|
| `brand-rights` | — | The synthetic license acquisition flow doesn't exercise the real code path (real licenses are negotiated, priced, countersigned). Re-evaluate once `brand-rights` matures past preview status. | **Conformant-only (4.0)** |

### Security / transport specialisms

| Specialism | Canonical flow | Why | Target |
|------------|----------------|-----|--------|
| `signed-requests` | — | Implicitly exercised by every canonical campaign — every runner request to a signing-required seller tests the signing pipeline (see [Runner request-signing integration](#runner-request-signing-integration) above). A standalone canonical flow would be redundant. **Taxonomy note:** `signed-requests` is classified as a specialism in 3.x by historical accident; it is a cross-protocol transport-layer concern, not a media-buy specialism. [#3075](https://github.com/adcontextprotocol/adcp/issues/3075) tracks reclassification as a universal capability-gated storyboard; the deprecation landed as a patch in [#3076](https://github.com/adcontextprotocol/adcp/pull/3076). | **Conformant-only (independent verification implicit in other canonical flows)** |

### Capabilities without a specialism (Conformant-only)

These are declared in `get_adcp_capabilities` but are not specialisms — listed here so readers don't expect canonical flows for them. Their behavior is verified by storyboards and (where applicable) implicitly exercised by every canonical campaign.

| Capability | Why | Verification |
|------------|-----|--------------|
| `webhook_signing` | Implicitly exercised — every canonical-flight webhook tests signing | Storyboards + implicit during canonical flights |
| `idempotency` | Storyboard-provable; no ongoing live-data aspect | Storyboards |
| `compliance_testing` (test controller) | Self-describing surface for the compliance runner itself | Storyboards |

## Pilot and phase plan

The RFC commits AAO to running canonical campaigns at ecosystem scale. That's a substantial operational commitment. The pilot exists to test that AAO can staff it before 4.0 locks in the trigger flip — *if AAO can't staff it, the trigger flip doesn't happen and AAO Verified (Live) stays at the 3.x enrollment-based mechanism indefinitely.*

### Phase 1 — Pilot (target: 30–60 days)

- **Scope**: one specialism (`sales-non-guaranteed`), 5–10 participating sellers (start at 5, expand to 8–10 by week 3 if onboarding holds)
- **Cadence**: weekly 24-hour flights — ~20–40 flights total
- **Staffing**: one on-call engineer (part-time, best-effort response)
- **Budget**: ~$500/month for paid inventory if remnant/PSA options are unavailable
- **Learn**: observed failure rate, time-to-diagnosis, tooling gaps, seller friction in onboarding
- **Phase-2 transition criteria** (all required):
  - Median seller enrollment time < 2 hours; no seller exceeds 8 hours
  - Every incident root-caused and documented within 48 hours (frequency is OK; opacity is not)
  - False-positive revocation rate (canonical-campaign failure attributed to seller when actually AAO-side) < 5% across the pilot window
  - At least one seller publicly committed to keep participating post-pilot
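
The four transition criteria can be expressed as a single gate. The sketch below is one possible encoding; the `PilotResults` fields are hypothetical, and the false-positive rate assumes the denominator is all canonical-campaign failures attributed to sellers during the pilot, which the criteria above do not spell out.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PilotResults:
    """Hypothetical summary of pilot measurements."""
    enrollment_hours: list[float]       # one entry per enrolled seller
    incident_rca_hours: list[float]     # hours to documented root cause, per incident
    seller_attributed_failures: int     # failures initially attributed to sellers
    of_which_aao_side: int              # ...that turned out to be AAO-side
    sellers_committed_post_pilot: int


def phase2_gate(r: PilotResults) -> dict[str, bool]:
    """Evaluate each Phase-2 transition criterion; all must be True to proceed."""
    if r.seller_attributed_failures:
        fp_rate = r.of_which_aao_side / r.seller_attributed_failures
    else:
        fp_rate = 0.0
    return {
        "enrollment": bool(r.enrollment_hours)
        and median(r.enrollment_hours) < 2
        and max(r.enrollment_hours) <= 8,
        "root_cause": all(h <= 48 for h in r.incident_rca_hours),
        "false_positive": fp_rate < 0.05,
        "commitment": r.sellers_committed_post_pilot >= 1,
    }
```

A pilot proceeds to Phase 2 only when `all(phase2_gate(results).values())` holds; returning the per-criterion breakdown keeps the failing criterion visible.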

Pilot sellers should span seller shapes rather than maximize count: one SSP with mature remnant/PSA operations, one direct publisher, one ad network, one DSP-side partner if one wants in, and one self-hosted reference implementation. Breadth of seller archetypes matters more than the number of sellers.

### Phase 2 — Broaden to clean flows (target: 90 days after Phase 1 exit)

Add: `sales-guaranteed`, `sales-proposal-mode`, `governance-aware-seller`, `creative-ad-server`, `creative-template`, `content-standards`, `property-lists`.

- ~7 specialisms × ~10 sellers × weekly = ~70 flights/week
- Staffing: 1 FTE ops engineer + part-time TPM
- Budget: $2–3K/month inventory + content sourcing
- **Phase-3 transition criteria**: all Phase-2 specialisms have ≥ 5 verified sellers; PSA content partnership (e.g., Ad Council MOU) signed for at least 2 channel families; runner uptime ≥ 99.5% over 60 days

### Phase 3 — Design-moderate flows (target: 6 months after Phase 2 exit)

Add: `sales-catalog-driven`, `audience-sync`, `signal-owned`, `signal-marketplace`, `collection-lists`, `governance-delivery-monitor`, `creative-generative`.

- ~14 specialisms × ~20 sellers × weekly = ~280 flights/week
- Staffing: 1 FTE ops + 1 FTE runner engineer + part-time TPM; on-call rotation
- Budget: $5–8K/month; synthetic-identity and test-segment infrastructure built
- **Phase-4 transition criteria**: composition attribution rules (canonical-campaign.yaml) tested in production; seller appeal/dispute path exercised at least twice end-to-end; legal sign-off on synthetic-identity posture for `audience-sync`
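
The Phase 2 and Phase 3 volume estimates are straightforward products. The short calculation below reproduces them, assuming (as the estimates do) that every seller runs every in-scope specialism once per week, which makes these upper bounds.

```python
def flights_per_week(specialisms: int, sellers: int, flights_per_seller_week: int = 1) -> int:
    """Upper-bound weekly flight count: every seller runs every specialism."""
    return specialisms * sellers * flights_per_seller_week


phase2 = flights_per_week(specialisms=7, sellers=10)
phase3 = flights_per_week(specialisms=14, sellers=20)

# Weekly totals, then Phase 3 expressed as flights per day.
print(phase2, phase3, phase3 / 7)  # 70 280 40.0
```

Roughly 40 flights per day at Phase 3 is the baseline that the Phase 4 specialisms add to.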

### Phase 4 — Design-heavy flows (target: 6 months after Phase 3 exit)

Aspirational. Any of the four Phase-4 specialisms may fail to land; the spec acknowledges this and accepts Conformant-only as the steady state for any specialism whose canonical flow can't be staffed or funded.

- `sales-broadcast-tv` — gated on Ad Council partnership or equivalent licensed PSA inventory
- `sales-social` — gated on per-platform Marketing Partner relationships; expect 2–3 of 5 platforms, not all
- `governance-spend-authority` — gated on AAO-side human-in-loop partner and SLA commitment
- `brand-rights` — reclassified to Conformant-only unless real-license flow becomes feasible

### Honest maximum

This is a real institutional commitment. **The pilot exists to test that AAO can staff it before 4.0 locks in the trigger flip.** If the answer is no, AAO Verified (Live) stays at the 3.x enrollment-based mechanism — the spec is forward-compatible.

At Phase 4 steady state, AAO is running ~50 canonical flights per day across the ecosystem. The expert-review estimate of the fully-loaded annual cost is in the range of **$900K–$1.2M** (~3.5–4 FTE + inventory + legal + tooling), not the ~$400K implied by the per-phase numbers above. Roles undersized in the per-phase view that the operational-plan issue should size honestly: seller-success/onboarding (1 FTE), content/partnerships lead (0.5 FTE), legal review (~$40K/yr external), comms / customer-marketing (0.25 FTE). Inventory at steady state likely runs $25–35K/month including CTV CPMs, not $10–15K.

### Operational plan (separate issue)

A separate "AAO canonical-campaign runner: operational plan" issue should cover service ownership (in-house vs. contracted), monitoring stack, PSA content partnerships (Ad Council MOU first), 24/7 on-call policies, budget approval, legal review cadence, and seller-success staffing. That issue should close before 4.0 ships; this RFC's spec content can land independently.

## How the mark works (4.0 end state)

### Issuance

AAO creates the canonical test campaign for each declared specialism via standard AdCP (`create_media_buy` → `sync_creatives` → delivery observation). If every step succeeds and reporting looks healthy after 7 days, the AAO Verified (Live) mark issues for that specialism.

The eight checks from the 3.x transitional spec apply per canonical campaign rather than per compliance account; see the [per-check mapping table](#per-check-mapping-3x-enrollment-based--40-canonical-campaign-based).

### Maintenance

Weekly refresh flight per specialism. Reporting heartbeats continuously. AAO re-runs storyboards against the seller's live agent endpoint **daily** to confirm AAO Verified (Spec) remains current — the canonical-campaign cadence and the storyboard cadence are independent, so wire-format regressions are caught within 24 hours regardless of the canonical-flight schedule.

### Degradation and grace

- Any check fails → mark enters **7-day grace** (extended from the v1 draft's 48 hours, in light of seller-side commercial harm risk; transient ad-server issues need real ops response time, including weekends)
- 7 days continuous failure → revoke for that specialism (other specialisms unaffected)
- Storyboard regression → enters the same 7-day grace, applies to all of that seller's AAO Verified (Live) specialisms (containment); a seller has 7 days to fix the wire-format regression before AAO Verified (Live) revokes
- Membership lapses → revoke immediately (existing AAO-membership behavior)

During grace, the public mark renders as "AAO Verified (Live) — Monitoring" rather than disappearing. This avoids flap-induced commercial harm, where a seller loses a deal because of a transient lapse that gets fixed within hours.
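
The grace rules for a single specialism reduce to a small state function. The sketch below is a simplified model under stated assumptions: it tracks only the start of the current continuous failure, treats canonical-check and storyboard failures identically, and uses hypothetical state names.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

GRACE = timedelta(days=7)  # grace period from the rules above


class MarkState(Enum):
    VERIFIED = "verified"
    MONITORING = "monitoring"  # rendered publicly with the Monitoring suffix
    REVOKED = "revoked"


def mark_state(
    failing_since: Optional[datetime],
    now: datetime,
    membership_active: bool = True,
) -> MarkState:
    """Derive the public state of one specialism's (Live) mark."""
    # A membership lapse revokes immediately, with no grace.
    if not membership_active:
        return MarkState.REVOKED
    # No ongoing failure: the mark is healthy.
    if failing_since is None:
        return MarkState.VERIFIED
    # Seven days of continuous failure revokes; anything shorter is grace.
    if now - failing_since >= GRACE:
        return MarkState.REVOKED
    return MarkState.MONITORING
```

Recovery falls out naturally: a successful canonical flight resets `failing_since` to `None`, and the next evaluation returns `VERIFIED`.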

### Recovery

Next successful canonical flight (or storyboard re-run for storyboard regressions) → mark reissues for that specialism.

### Per-specialism independence

A seller declaring multiple specialisms gets multiple AAO Verified (Live) marks, one per specialism, that degrade independently. If one canonical flow breaks, only that specialism lapses; the others stay green.

**Buyer-facing presentation.** For UX legibility at scale, the registry / brand.json renders specialisms grouped into four buyer-legible categories: **Sales** (all `sales-*`), **Creative** (`creative-*`), **Signals** (`signal-*`), **Governance** (`governance-*`, `content-standards`, `property-lists`, `collection-lists`, `audience-sync`). The category mark renders as `Verified` only if all of the seller's claimed specialisms in that category are currently Verified; one lapsing specialism flips the category to `Verified — Monitoring` with a tooltip listing the specific lapsed specialism. This keeps the buyer-facing signal clean while preserving the per-specialism technical reality underneath.
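
A sketch of the category roll-up described above. The prefix-to-category mapping follows the paragraph literally (so `governance-aware-seller` lands in Governance via the `governance-*` pattern); the function names and return shape are hypothetical and not the registry's API.

```python
from typing import Optional

# Specialisms assigned to Governance by name rather than by prefix.
_GOVERNANCE_EXTRAS = {"content-standards", "property-lists", "collection-lists", "audience-sync"}
_PREFIXES = {
    "sales-": "Sales",
    "creative-": "Creative",
    "signal-": "Signals",
    "governance-": "Governance",
}


def category_of(specialism: str) -> Optional[str]:
    """Return the buyer-facing category, or None if the specialism has none."""
    if specialism in _GOVERNANCE_EXTRAS:
        return "Governance"
    for prefix, category in _PREFIXES.items():
        if specialism.startswith(prefix):
            return category
    return None


def roll_up(claimed: dict[str, bool]) -> dict[str, list[str]]:
    """Map each category to its lapsed specialisms.

    claimed maps specialism id -> currently Verified (Live).
    An empty list means the category renders as Verified; a non-empty
    list means Monitoring, with the list feeding the tooltip.
    """
    result: dict[str, list[str]] = {}
    for specialism, verified in claimed.items():
        category = category_of(specialism)
        if category is None:
            continue  # e.g. Conformant-only specialisms outside the four categories
        lapsed = result.setdefault(category, [])
        if not verified:
            lapsed.append(specialism)
    return result
```

For a seller with `sales-guaranteed` healthy and `sales-social` lapsed, the Sales entry would be `["sales-social"]`, which maps to the Monitoring rendering with that specialism in the tooltip.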

## What storyboards become

Storyboards remain essential and publicly visible — they are the **AAO Verified (Spec)** mark.

- **AAO Verified (Spec)** is storyboard-issued. Published, searchable in the registry, useful for pre-production agents, beta sellers, specialisms without canonical flows, and developers checking their implementation.
- **Pre-req for AAO Verified (Live).** A seller cannot be AAO Verified (Live) without being AAO Verified (Spec).
- **Daily re-run against the seller's live agent endpoint** is what keeps AAO Verified (Spec) current. AAO's runner authoritatively decides whether storyboards are passing; sellers can dispute a result via the appeal path.
- **Regression test.** Wire-format violations that live data might paper over still get caught.
- **Local dev-CI.** Storyboards run fast against `@adcp/client`'s test controller during development.

A seller whose storyboards are 100% green still doesn't get AAO Verified (Live) until the canonical campaign runs cleanly for 7 days. A seller whose canonical campaign is healthy but who fails a storyboard enters the same 7-day grace as a canonical-campaign failure.

## What the seller provides (4.0)

Same as 3.x enrollment — none of the seller-facing obligations change at the trigger flip:

1. **One compliance account** on their platform (sandbox or real).
2. **PSA inventory access** (or an equivalent zero-cost path — remnant, house, or a small real budget under $100/week).
3. **The `attestation_verifier` scope** (from #2964 / #2994, shipping in 3.x) granted to AAO's compliance identity.

No new seller obligation at 4.0. The trigger flips on AAO's side; sellers don't re-enroll or re-plumb.

## Why this is better than the 3.x enrollment-based model

1. **AAO-as-operator.** AAO is in-market — placing real buys, accepting real reporting, exercising real signal flows. The mark is a byproduct of AAO's real operator activity, not a separate verification pipeline.
2. **Teach-to-test is infeasible at this surface area.** Stubbing a 7-day canonical flight across create → traffic → delivery → maturation while handling an ambiguous mix of canonical and probe traffic over weeks is essentially building a working ad server. The canonical-campaign trigger combined with secondary-identity probing closes the most plausible attack surfaces.
3. **Predictable canonical content.** AAO sources one canonical PSA per channel family; sellers plug in to a known flow rather than inventing one.
4. **Continuous reality check with ecosystem CI.** Every AAO Verified (Live) agent has a known-good counterparty exercising their real code paths weekly. Breakage on any seller surfaces within a week across the ecosystem.
5. **Per-specialism independence with buyer-legible category roll-up.** A seller declaring multiple specialisms can have some lapse while others stay green; the registry's category roll-up keeps the buyer signal legible at scale.

## Open questions

1. **PSA content sourcing.** Ad Council partnership is the highest-confidence path; AAO-branded "Running on AAO" PSAs are a fallback (own production cost ~$200–300K/year for a full multi-channel rotation; Ad Council MOU is ~6 months of BD). First-year content commitment lives in the operational-plan issue.
2. **Sellers that genuinely can't take PSAs.** Retail media (auction integrity, merchandising contracts), some social platforms (tenant-isolation security review). Fallback paths: small real budget, slower cadence (monthly), or stay Conformant-only.
3. **Buyer demand commitment.** AAO Verified (Live) as a "filter" is aspirational until a tier-1 holdco (WPP / Publicis / Omnicom / IPG) publicly requires it in an agentic-buying RFP. Without that, AAO Verified (Live) is a nice-to-have. **This is a gating question for the 4.0 trigger flip and should be tracked separately, not as a spec question.**
4. **Privacy / legal posture for `audience-sync`.** Synthetic identifiers owned by AAO with documented opt-in; or AAO staff MAID/HEM list with consent on file. Whichever AAO's legal review supports. Lives in operational-plan issue.
5. **Seller appeal / dispute path.** When a canonical flight fails for AAO-side reasons (PSA inventory outage, runner bug, attribution misfire), how does the seller dispute the lapse? AAO publishes a dispute SLA (e.g., 48h to first response, 5 business days to resolution); appeals don't extend grace, they reverse revocations after the fact.
6. **AAO's own runner audit posture.** Who audits AAO's canonical-campaign runner? Self-attested today; longer-term, an MRC partnership where MRC accredits the runner's process is worth exploring (AAO operates, MRC accredits the operation — strategic complement, not competitor).
7. **Strategic positioning vs. MRC / TAG / IAB / BPA.** AAO-as-operator is genuinely new in the ad-tech attestation landscape. The strategic case is "agent-native" (existing bodies assume human ad ops; AAO is the only body that understands AdCP wire format). MRC partnership is worth exploring before Phase 2.
8. **Multi-specialism flight design.** A canonical flight can exercise composed specialisms (e.g., `sales-guaranteed` + `creative-template` + `governance-aware-seller`) under a single flight. AAO SHOULD reuse canonical flights to minimize flight count; `canonical-campaign.yaml` declares which specialisms a flight covers. Attribution rules above resolve partial failures.

## Failure-mode contingencies

Five realistic failure modes named to make the operational-plan issue concrete:

1. **Seller serves PSAs cleanly to AAO, fakes data to probes.** Mitigation: third-party measurement cross-check on canonical flights (DV / IAS impressions vs. AAO's reporting). Adds budget line ~$2–3K/month at Phase 3 and beyond. Not in the per-phase numbers; should be added to the operational-plan issue.
2. **PSA content partnership falls through mid-year.** Mitigation: pre-produce one fallback PSA per channel family during Phase 2 (~$60K one-time) as insurance.
3. **Integration breaks; seller loses revenue from "formerly Verified" status.** Mitigation: 7-day grace (above) plus public "Monitoring" state during grace. This may warrant insurance covering sellers who are wrongly lapsed and commercially harmed, subject to legal review.
4. **Buyers conflate AAO Verified (Live) with brand-safety / viewability.** Mitigation: standardized disclaimer on every surface — *"AAO Verified (Live) attests to AdCP capability conformance and live ad-server integration. It is not a brand-safety, viewability, or measurement-quality claim."*
5. **24/7 on-call misses an incident; dozens lapse.** Mitigation: staged-degradation rule (no mass-lapse exceeding N sellers within any 6-hour window without TPM review); commercial-harm insurance (legal line item).
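
The staged-degradation rule in item 5 can be sketched as a guard evaluated before each lapse. The threshold N is left open above, so it appears here as the `max_lapses` parameter; the function name and the choice to count the pending lapse toward the limit are assumptions.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=6)  # window length from the staged-degradation rule


def needs_tpm_review(recent_lapses: list[datetime], now: datetime, max_lapses: int) -> bool:
    """True if lapsing one more seller now would exceed max_lapses in the trailing window.

    recent_lapses: timestamps of lapses already applied (hypothetical input).
    max_lapses:    the rule's N; its value is not fixed by the RFC.
    """
    in_window = [t for t in recent_lapses if timedelta(0) <= now - t < WINDOW]
    # Count the pending lapse itself toward the limit.
    return len(in_window) + 1 > max_lapses
```

A runner using this guard would hold further lapses in the Monitoring state until a TPM clears them, rather than revoking in bulk.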

## Impact on shipped PRs

- **#2153 (badge infrastructure)** — merged as MVP with storyboard-triggered issuance. In the two-mark model, that becomes the **AAO Verified (Spec)** mark permanently (storyboards remain the trigger). **AAO Verified (Live)** gets a parallel trigger once the canonical-campaign runner comes online per specialism. Nothing to revert; the badge machinery is reusable for the second mark.
- **#3001 (AAO Verified (Live) transitional spec, merged)** — shipped the two-mark language, eight-check observability, `attestation_verifier` scope integration, Path A/B enrollment, webhook-ownership contract. All that machinery is reusable under canonical-campaign triggering. The "Transitional framing" Note at the top of `aao-verified.mdx` points at this RFC.
- **#2994 (accounts authorization, open, targeting 3.1)** — introduces the `attestation_verifier` scope on sync/list responses. Required for AAO to hold the right scope when running canonical campaigns. Lands in 3.1.
- **#3016 (post-#2993 `valid_actions` tightening, open, targeting 3.1)** — closes the empty-`valid_actions` loophole that would let a seller hide non-AdCP buys from AAO. Required for brownfield Path B of the canonical runner. Lands in 3.1.

## Phased migration path

The 3.x ship and the Phase 1 pilot deliberately overlap. The 3.x enrollment-based AAO Verified (Live) ships in late June 2026; the canonical-campaign pilot can run concurrently because the two don't conflict: a seller in the pilot is dual-instrumented during the overlap.

1. **3.x ships (late June 2026).** AAO Verified (Live) available via enrollment-based continuous observation. AAO Verified (Spec) available via storyboard pass. Containment relationship enforced. All machinery (`attestation_verifier` scope, eight checks, Path A/B, webhook-ownership contract) in place.
2. **Pilot (June–August 2026, runs concurrently with 3.x).** AAO stands up the canonical-campaign runner for `sales-non-guaranteed` only, 5–10 sellers. Pilot is a non-committal trial; AAO Verified (Live) still issues via 3.x enrollment-based observation during this period. Learn from failures.
3. **Phased rollout (Q4 2026 onward).** Per-specialism, AAO flips AAO Verified (Live) issuance from 3.x enrollment-based → canonical-campaign-based. Phase 2 / 3 / 4 as above. **Both issuance paths coexist as OR during the transition window per specialism — either path grants the mark.** AAO publishes the per-specialism cutover date at least 30 days in advance; on the cutover date, enrollment-based issuance retires for that specialism and only canonical-campaign-based issuance grants the mark going forward. Storyboards remain mandatory throughout.
4. **4.0 ships (target: early 2027 or early 2028, depending on pilot outcomes).** Enrollment-based AAO Verified (Live) issuance retires for any specialism where a canonical campaign is operational. Specialisms without canonical flows (`signed-requests`, `brand-rights` if it stays preview, transport capabilities) stay Conformant-only, that is, eligible for AAO Verified (Spec) only.
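
The per-specialism issuance rule in steps 3 and 4 can be sketched as a single predicate. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, health is reduced to a boolean, and the 7-day grace period is deliberately left out.

```python
from datetime import date


def live_mark_granted(today, cutover, storyboards_pass,
                      enrollment_healthy, canonical_healthy):
    """Decide whether AAO Verified (Live) is granted for one specialism.

    `cutover` is the published per-specialism cutover date, or None if no
    cutover has been announced. All names here are hypothetical.
    """
    # Storyboards remain mandatory throughout: a regression blocks (Live).
    if not storyboards_pass:
        return False
    # On and after the cutover date, only the canonical campaign counts.
    if cutover is not None and today >= cutover:
        return canonical_healthy
    # Before cutover, the two issuance paths coexist as OR.
    return enrollment_healthy or canonical_healthy
```

For example, a seller that is healthy only on the enrollment path keeps the mark the day before cutover and loses it on the cutover date unless its canonical campaign is also healthy.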

The expert-review feedback flagged the late-June 2026 → early 2027 timeline as compressed (Ad Council MOU alone is 6 months; FTE hires 4–5 months). **Realistic 4.0 ship target may be early 2028 instead of early 2027,** depending on Phase 1 / 2 outcomes. The spec is forward-compatible either way; the trigger flip is per-specialism and per-cutover-date, not a single big-bang event.

3.x machinery is the foundation; 4.0 is the trigger flip per specialism; the spec's normative seller obligations don't change between versions.

## Relation to other 4.0 work

The 4.0 Tier-2-equivalent check expansions filed separately are additive to the canonical-campaign machinery:

- [#3017](https://github.com/adcontextprotocol/adcp/issues/3017) — creative-approval pipeline liveness (applies to `creative-template`, `creative-generative`, social sales specialisms)
- [#3018](https://github.com/adcontextprotocol/adcp/issues/3018) — cancellation-propagation timing (applies to all sales specialisms)
- [#3019](https://github.com/adcontextprotocol/adcp/issues/3019) — billing reconciliation touch (opt-in, applies to sales specialisms)
- [#3020](https://github.com/adcontextprotocol/adcp/issues/3020) — IO / JWS signing workflow liveness (applies to `sales-guaranteed`)
- [#3009](https://github.com/adcontextprotocol/adcp/issues/3009) — multi-subscriber `reporting_webhooks[]` (relaxes Path B2's dedicated-tenant requirement)

These become additional checks *within* each canonical campaign once the runner lands. #3020 (JWS signing) composes naturally with `sales-guaranteed`'s canonical flow.
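
One way to picture how these checks attach to a canonical campaign is a lookup keyed by specialism. The sketch below is hypothetical throughout: it assumes "sales specialisms" means every identifier starting with `sales-`, reads "social sales specialisms" as `sales-social`, and models the opt-in for #3019 as a boolean flag. #3009 is omitted because it relaxes an enrollment requirement rather than adding a check.

```python
def additional_checks(specialism, billing_opt_in=False):
    """List the 4.0 check expansions that would apply to one specialism.

    Applicability follows the issue list above; the prefix test for
    "sales specialisms" is an assumption, not a rule from the spec.
    """
    is_sales = specialism.startswith("sales-")
    checks = []
    if specialism in ("creative-template", "creative-generative", "sales-social"):
        checks.append("#3017 creative-approval pipeline liveness")
    if is_sales:
        checks.append("#3018 cancellation-propagation timing")
    if is_sales and billing_opt_in:
        checks.append("#3019 billing reconciliation touch")
    if specialism == "sales-guaranteed":
        checks.append("#3020 IO / JWS signing workflow liveness")
    return checks
```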

---

## Earlier draft notes (v2 changes)

This v2 incorporates expert-review feedback on the v1 revision:

- **Specialism count corrected.** v1 said "19 stable specialisms"; the canonical enum has 20 (added `signed-requests` to the Security/transport subsection as Conformant-only).
- **Per-check mapping table added.** v1 said "the eight checks apply per canonical campaign rather than per compliance account" without showing the mapping; v2 walks each check with 3.x → 4.0 behavior side-by-side.
- **Composition attribution rules promoted from "open question" to main body.** v1 left this as a question; v2 specifies most-specific-specialism-first resolution and locates the rules in `canonical-campaign.yaml`.
- **Canonical-campaign spec format committed.** v1 left `canonical-campaign.yaml` as a question; v2 commits to the path and contents.
- **Runner request-signing integration spelled out.** v1 placed `signed-requests` as Conformant-only without explaining how the runner itself complies with seller signing requirements.
- **AND vs OR resolved.** v1's phased migration said both issuance paths "coexist" without saying whether the mark needs both; v2 commits to OR with dated per-specialism cutover ≥ 30 days in advance.
- **Storyboard regression revocation specified.** v1 said "automatic" without a cadence or grace; v2 commits to daily storyboard re-runs and the same 7-day grace as canonical-campaign failures.
- **Grace extended from 48h to 7 days.** v1's 48h grace was commercially harsh (weekend incidents, ad-ops response times); v2 extends to 7 days with public "Monitoring" state.
- **Per-specialism UX rendering specified.** v1 left buyer-facing UX ambiguous; v2 commits to category-grouped rendering with tooltip drill-down.
- **Liveness check (check 1) reinterpreted for canonical cadence.** v1's "80% of rolling window" doesn't translate to weekly flights (~14% liveness); v2 restates as ≥1 healthy flight per scheduled cadence.
- **`brand-rights` reclassified to Conformant-only.** v1 had it in Phase 4 with synthetic license acquisition; v2 acknowledges that flow doesn't exercise the real code path.
- **Phase-transition criteria added per phase.** v1 only had Phase-1 exit criteria; v2 has criteria gating Phase 2 → 3 → 4 transitions.
- **Failure-mode contingencies added.** Five realistic failure modes named explicitly.
- **Honest-maximum framing moved before the budget numbers.** v1 buried "the pilot exists to test that AAO can staff it" after the cost estimates.
- **Tone hedged.** "Teach-to-test structurally impossible" → "infeasible at this surface area" (matches merged `aao-verified.mdx`).
- **Timeline acknowledged as compressed.** v1's early-2027 4.0 target was unrealistic per expert review; v2 acknowledges early-2028 as the realistic alternative depending on pilot outcomes.
- **Buyer demand and MRC positioning surfaced as open questions** (Q3, Q7), not assumptions.



| # | Check | 3.x (enrollment) | 4.0 (canonical campaign) |
|---|---|---|---|
| 1 | Liveness | At least one active buy in the compliance account ≥ 80% of the 7–14 day rolling window | At least one healthy completed canonical flight per scheduled cadence (weekly cadence = ≥ 1 healthy flight per rolling 7-day window; missed flights count as failures, not as quiet periods) |
| 2 | Freshness | Same `get_media_buy_delivery` query on day N vs N+1 returns different numbers | Same query during a canonical flight's flight window returns changing numbers across the flight |
| 3 | Plausibility | Monotonic impressions, `by_package` sums, non-zero where expected, `pacing_index` consistency | Same checks applied to the canonical flight's reported delivery |
| 4 | Filter correctness | `start_date` / `end_date` narrow results against account history | Same applied to the canonical flight |
| 5 | Reporting-surface cross-consistency | All declared reporting surfaces agree across the compliance account window | All declared surfaces agree for the canonical flight; skipped if polling is the only declared surface |
| 6 | Lifecycle correctness | Completed / paused / canceled buys behave correctly across the account | The canonical flight completes, pauses, and cancels correctly when AAO induces those transitions |
| 7 | Introspection consistency | `authorization` object on `sync_accounts` / `list_accounts` matches actual enforcement | Same — runner verifies `authorization` before and during each canonical flight |
| 8 | Seller-initiated state-transition propagation | Trafficker / finance / lifecycle changes surface within seller's declared status-freshness tolerance | Same applied to the canonical flight; AAO induces seller-initiated transitions in some weeks to test propagation |
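
Checks 1 and 3 in their 4.0 form lend themselves to a short sketch. The code below is illustrative only: the function names and input shapes are hypothetical, the snapshot dict is a simplification rather than the real `get_media_buy_delivery` response, and the plausibility function covers only the monotonicity and `by_package` sum conditions.

```python
from datetime import datetime, timedelta


def liveness_ok(now, flights, window=timedelta(days=7)):
    """Check 1, weekly cadence: at least one healthy completed canonical
    flight ended inside the rolling window before `now`.

    `flights` is a list of (completed_at, healthy) tuples. A missed flight
    has no entry, so it can never satisfy the check, which is how
    "missed flights count as failures" falls out of this sketch.
    """
    cutoff = now - window
    return any(healthy and cutoff < completed_at <= now
               for completed_at, healthy in flights)


def plausible(snapshots):
    """Check 3 (partial): cumulative impressions never decrease, and each
    snapshot's per-package impressions sum to its total.

    Each snapshot is {"impressions": int, "by_package": [int, ...]}.
    """
    totals = [s["impressions"] for s in snapshots]
    monotonic = all(a <= b for a, b in zip(totals, totals[1:]))
    sums_match = all(sum(s["by_package"]) == s["impressions"] for s in snapshots)
    return monotonic and sums_match
```

The window-based form also sidesteps the problem with the 3.x threshold: a single 24-hour flight per week is active for roughly 1/7, or about 14%, of the time, which presumably is why an "≥ 80% of the window" rule cannot carry over.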

| Mark | What it means | How it's issued (3.x) | How it's issued (4.0, this RFC) |
|---|---|---|---|
| AAO Verified (Spec) | Wire format is right: the agent implements its declared `supported_protocols` and `specialisms` per the storyboard suite. | Storyboards pass. | Same — storyboards pass. |
| AAO Verified (Live) | The declared capability is actually implemented in the seller's live production stack: real impressions, real inventory, sustained over weeks. | Continuous observability of whatever campaigns the seller runs on a designated compliance account. | AAO-operated canonical test campaign for the specialism runs weekly and stays healthy. |
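
The containment relationship between the two marks maps directly onto the `verification_modes` array carried in the JWT and registry API. A minimal sketch, assuming a hypothetical helper name and assuming that an agent with no storyboard pass is represented by an empty list (the RFC does not say how that case is encoded):

```python
def verification_modes(storyboards_pass, live_healthy):
    """Derive the `verification_modes` list for one agent.

    (Live) implies (Spec): without a storyboard pass neither qualifier is
    emitted, even if the live checks are healthy.
    """
    if not storyboards_pass:
        return []  # assumption: empty list stands for "no mark"
    return ["spec", "live"] if live_healthy else ["spec"]
```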

| Specialism | Canonical flow | Why (Low/Medium/High) | Target |
|---|---|---|---|
| `sales-non-guaranteed` | Programmatic PSA, 24-hour pacing check, AAO as buyer | Low — auction inventory is universally available; fastest feedback loop | Pilot (phase 1) |
| `sales-guaranteed` | :30 PSA, 7-day guaranteed flight, delivery heartbeat | Low–Medium — requires guaranteed-inventory partner | Phase 2 |
| `sales-proposal-mode` | AAO issues brief → seller returns proposal → AAO accepts → buy runs 7 days | Medium — exercises proposal negotiation loop end-to-end | Phase 2 |
| `sales-catalog-driven` | Catalog-driven PSA (e.g., retail-media test SKU) with conversion-tracking ping | Medium — requires catalog seed and attribution test surface | Phase 3 |
| `sales-broadcast-tv` | :30 PSA in primetime, watch C3 → C7 maturation | High — broadcast inventory access at scale is the blocker; PSA slots already mostly committed (Ad Council holds most primetime); C7 settlement is 15–22 days. May be reclassified to Conformant-only if PSA partnership doesn't land. | Phase 4 (aspirational) |
| `sales-social` | Platform-native PSA flow per platform | High — each platform (Meta / TikTok / Snap / X / LinkedIn) needs its own integration; tenant-isolation makes "AAO probes targeting" a security review for some platforms; the universe of social sellers is small. May be reclassified to per-platform opt-in only. | Phase 4 (aspirational) |
| `governance-aware-seller` | Canonical campaign with governance hooks set; seller propagates approvals/conditions/denials unchanged | Low — composes with whatever sales specialism the seller also holds; same flight, additional checks | Phase 2 (alongside sales) |
| `audience-sync` | Sync synthetic-but-resolvable test list; verify reflection in delivery | Medium — privacy surface (synthetic identifiers owned by AAO; documented consent posture) | Phase 3 |

| Specialism | Canonical flow | Why | Target |
|---|---|---|---|
| `creative-template` | Weekly brief → trafficked creative → served on AAO's canonical sales partner | Low — composes with a sales specialism; same flight | Phase 2 |
| `creative-generative` | Weekly brief → generated spot → serves on AAO's canonical sales partner | Low–Medium — composes with sales; generation cost is the only delta | Phase 2 |
| `creative-ad-server` | PSA tagged with known macros; verify click/impression via macro callbacks; ad-server health probe | Low — single canonical PSA per channel family covers it | Phase 2 |

| Specialism | Canonical flow | Why | Target |
|---|---|---|---|
| `signal-marketplace` | Discover canonical test signal → activate → confirm targeting takes effect in a sales flow | Medium — requires sales-flow partner | Phase 3 |
| `signal-owned` | AAO exposes canonical first-party test segment → seller resolves it on the buy path → verify reflection | Medium — synthetic test segment plumbing | Phase 3 |

| Specialism | Canonical flow | Why | Target |
|---|---|---|---|
| `content-standards` | Submit one brand-safe and one unsafe brief weekly; verify approval/rejection behavior | Low — decision-check, not delivery-check | Phase 2 |
| `property-lists` | Sync canonical test property list; verify targeting narrows per the list during a canonical flight | Low — small synthetic property list, easy to author | Phase 2 |
| `collection-lists` | Sync canonical test collection list; verify inclusion/exclusion on content programs | Low | Phase 3 |
| `governance-delivery-monitor` | Active canonical campaign under monitoring; AAO induces drift; verify alert fires within threshold | Medium — composes with a sales flow | Phase 3 |
| `governance-spend-authority` | Submit conditional-approval brief weekly; watch human-in-loop approval flow complete within SLA | Medium–High — latency-sensitive (requires human response within SLA); requires AAO-side human-in-loop partner | Phase 4 |

| Capability | Why | Verification |
|---|---|---|
| `webhook_signing` | Implicitly exercised — every canonical-flight webhook tests signing | Storyboards + implicit during canonical flights |
| `idempotency` | Storyboard-provable; no ongoing live-data aspect | Storyboards |
| `compliance_testing` (test controller) | Self-describing surface for the compliance runner itself | Storyboards |

RFC: AAO Verified via canonical test campaigns (supersedes two-tier model in #2965) #3046
