diff --git a/.changeset/identitymatch-fcap-architecture-spec.md b/.changeset/identitymatch-fcap-architecture-spec.md new file mode 100644 index 0000000000..69f9f519fb --- /dev/null +++ b/.changeset/identitymatch-fcap-architecture-spec.md @@ -0,0 +1,36 @@ +--- +"adcontextprotocol": patch +--- + +IdentityMatch & frequency capping architecture, with both the wire-spec change and the implementation guidance landing as authoritative protocol docs. + +**Wire spec changes** (`identity-match-response.json`): +- Adds `serve_window_sec` (integer, 1–300, default 60) — per-package single-shot fcap window. After serving the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again. Not a router response cache TTL. +- Deprecates `ttl_sec`. Originally documented as a router cache TTL but operationally functioned as a per-package serve throttle. Senders during the deprecation window populate both fields; receivers prefer `serve_window_sec`. Removed in a 3.0.x release ≥ 6 weeks after the 2026-04-26 notice (earliest 2026-06-07). + +**Doc updates** (authoritative implementation guidance): +- `docs/trusted-match/specification.mdx` — adds `serve_window_sec` field, marks `ttl_sec` deprecated, adds normative conformance invariants for IdentityMatch eligibility (audience intersection, fcap merge across identities, active state, audience freshness). Updates the caching section for the new contract. 
+- `docs/trusted-match/identity-match-implementation.mdx` (new page) — implementation guide covering the `fcap_keys` label model with tenant prefix and charset, reference valkey-backed data model (user-profile, exposure-log, package-config, and fcap_policy records), merge rules with MAX recommended, SDK primitives (`decodeTmpx`, `writeExposure`, `upsertAudience`, `upsertPackage`, `upsertFcapPolicy`, `inspectExposures`), pluggable store interfaces (FrequencyStore / AudienceStore / PackageStore / FcapPolicyStore), production topology pattern (pub/sub buffering between tracking endpoint and store writer), and Redis-command walkthroughs for the five conformance scenarios. +- `docs/trusted-match/buyer-guide.mdx` — updates frequency-cap management and the serve-window contract sections; cross-links to the implementation page. +- `docs/trusted-match/migration-from-axe.mdx` — adds OpenRTB 2.6 `User.eids[]` cross-walk for buyers bridging from OpenRTB-shaped pipelines. + +**Three-layer model:** +- Wire spec (normative) — what crosses an agent boundary. +- Conformance invariants (normative) — backend-agnostic eligibility logic. +- Reference data model (non-normative) — Scope3's valkey-backed implementation choice. Buyers may use Aerospike, DynamoDB, or anything else; the SDK exposes pluggable store interfaces. The protocol describes WHAT the service must compute, not HOW it stores the data. + +**SDK primitives** ship across `@adcp/client` (TS), `adcp-go`, and `adcp` (Python). Same primitive surface in all three languages. Impression handling is two composable functions (`decodeTmpx` + `writeExposure`), not one bundled call — production tracking endpoints decode at intake and write downstream behind a pub/sub buffer; bundling would force synchronous topology.
+ +**Architecture history** preserved at `specs/identitymatch-fcap-architecture.md` (slimmed from 485 to 136 lines) — captures the design decisions, the deferred security/privacy follow-ups, the rollout plan, and consolidated Slack/PR-review threads. Implementation details now live in `docs/`. + +All TMP surfaces remain `x-status: experimental`. Wire change is purely additive (`serve_window_sec`); the `ttl_sec` removal lands in a later 3.0.x. + +**Tracked deferred follow-ups** (not in this PR): +- TMPX harvest → competitor-suppression attack +- Eligibility-as-audience-membership oracle (honeypot package_ids) +- Consent revocation between IdentityMatch and impression +- Side-channel via eligibility deltas +- `hashed_email` in TMPX leak surface +- DoS amplification via large `package_ids[]` +- Where do fcap policies live on the wire (currently SDK-only) +- Identity-graph plug-point interface for SDK diff --git a/CHANGELOG.md b/CHANGELOG.md index bbc303a47d..36c5aabced 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,11 @@ # Changelog +## Upcoming + +### Deprecation Notices — experimental surfaces + +- **TMP `identity-match-response.ttl_sec` is deprecated; replaced by `serve_window_sec`.** Notice published 2026-04-26. The `ttl_sec` field was documented as a router response cache TTL but operationally functioned as a per-package single-shot fcap, conflating two distinct concerns and silently breaking either when tuned. Replacement field `serve_window_sec` (integer, 1–300, default 60) carries the corrected semantic — *after serving the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again.* This is **not** a router response cache. Multi-impression frequency capping is a separate concern handled by buyer-side exposure records and policies, updated out-of-band via TMPX impression callbacks regardless of this window. 
During the deprecation period, senders SHOULD populate both `ttl_sec` and `serve_window_sec` with the same value; receivers SHOULD prefer `serve_window_sec` when both are present. Per the [experimental-status contract](docs/reference/experimental-status.mdx), the `ttl_sec` field MAY be removed no earlier than **2026-06-07** (6 weeks after this notice) in a 3.0.x release. `serve_window_sec` lands additively in 3.0.1 alongside this notice. Tracked in `specs/identitymatch-fcap-architecture.md`. + ## 3.0.0 See [release notes](docs/reference/release-notes.mdx) for migration guidance, or [prerelease upgrade notes](docs/reference/migration/prerelease-upgrades.mdx) for rc.3 adopters. diff --git a/docs/trusted-match/buyer-guide.mdx b/docs/trusted-match/buyer-guide.mdx index c466fcc41e..964abb1e02 100644 --- a/docs/trusted-match/buyer-guide.mdx +++ b/docs/trusted-match/buyer-guide.mdx @@ -16,7 +16,7 @@ A buyer agent exposes two HTTP/2 endpoints under a single base URL — `POST /co | Message type | Receives | Returns | |---|---|---| | `context_match_request` | Page/content signals, placement, geo | Offers with creative manifests | -| `identity_match_request` | Opaque user token, all active package IDs | Eligible package IDs + TTL | +| `identity_match_request` | Opaque user token, all active package IDs | Eligible package IDs + `serve_window_sec` | Each endpoint handles one message type. Both must respond in under 50ms. The router enforces this budget and will skip slow providers. @@ -120,11 +120,11 @@ The router sends you one or more opaque identity tokens and a list of ALL your a "type": "identity_match_response", "request_id": "id-9c4e", "eligible_package_ids": ["acme-outdoor-q2", "acme-loyalty-retarget"], - "ttl_sec": 60 + "serve_window_sec": 60 } ``` -Return only the package IDs that pass your eligibility checks. Packages not in the list are treated as ineligible. 
The `ttl_sec` tells the router how long to cache this response — during that window, the router returns cached eligibility without re-querying you. The publisher uses cached eligibility to allocate across whatever placements exist. Set the TTL based on how quickly your eligibility state changes (frequency caps, audience updates, etc.). +Return only the package IDs that pass your eligibility checks. Packages not in the list are treated as ineligible. The `serve_window_sec` is a **per-package single-shot fcap**: after the publisher serves the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again. Default 60s, max 300s. This is not a router response cache TTL — see [The serve-window contract](#the-serve-window-contract). **What you never receive** in Identity Match: page URLs, content topics, keywords, article text, or any content signal. You cannot determine what the user is looking at. @@ -143,21 +143,24 @@ You have no role in this step. The publisher controls activation. ## Frequency Cap Management -Cross-publisher frequency capping is the primary use case for Identity Match. Your agent maintains frequency state per user token: +Cross-publisher frequency capping is the primary use case for Identity Match. Your agent maintains frequency state per user identity: -- **Count impressions** by user token + package ID -- **Track recency** — when was the last impression for this token? 
-- **Apply caps** from the media buy: `max_impressions` per `window`, minimum `recency` between exposures -- **Exclude the package** from `eligible_package_ids` when a cap is hit -- **Set `ttl_sec`** to reflect how long this eligibility is valid — a shorter TTL means the router re-checks sooner, which is useful when a cap is close to being reached +- **Count impressions** per fcap key (campaign, advertiser, creative, line item, or whatever dimensions you cap on) per resolved user identity +- **Apply policies** with a window and max count +- **Merge across identities** for users with multiple resolved tokens (RampID + ID5 + MAID for the same person) — see [identity handling](/docs/trusted-match/identity-match-implementation#identity-handling-and-cross-identity-dedup) +- **Exclude packages** from `eligible_package_ids` when any cap on the package trips Because Identity Match runs across all publishers using TMP, a user who saw your ad on Publisher A will correctly show as over-frequency on Publisher B — even though you can't see which publisher sent the request. +For the implementation details — the fcap_keys label model, the reference valkey data model, audience and exposure record shapes, the SDK primitives, and conformance scenarios — see [Identity Match implementation](/docs/trusted-match/identity-match-implementation). + ### How Buyers Learn About Exposures -The `tmpx` field on the Identity Match response carries a TMPX token — an HPKE-encrypted blob containing the user's resolved identity tokens. The publisher substitutes `{TMPX}` into creative tracking URLs. When the ad serves, your impression pixel receives the encrypted token. Your cluster master decrypts it, logs the exposure against the user, and replicates updated frequency state to read replicas. This gives you real-time per-user exposure signals without the publisher seeing user identity. 
+The `tmpx` field on the Identity Match response carries a TMPX token — an HPKE-encrypted blob containing the user's resolved identity tokens. The publisher substitutes `{TMPX}` into creative tracking URLs. When the ad serves, your impression pixel receives the encrypted token. Your impression handler decrypts it (via the SDK's `decodeTmpx` primitive) and writes the exposure increment to your store (via `writeExposure`). Most production deployments separate decode (synchronous, at intake) from write (asynchronous, behind a queue) for buffering — see the implementation page for the topology pattern. + +This gives you real-time per-user exposure signals without the publisher seeing user identity. -See [TMPX Exposure Tokens](/docs/trusted-match/specification#tmpx-exposure-tokens) for the encryption format and binary token structure. +See [TMPX Exposure Tokens](/docs/trusted-match/specification#tmpx-exposure-tokens) for the encryption format and binary token structure, and [Identity Match implementation](/docs/trusted-match/identity-match-implementation#sdk-primitives) for the SDK functions. ## Provider Registration @@ -200,16 +203,18 @@ Common scenarios: - **Internal failure**: Return an error response. The router skips your provider and proceeds with other providers. - **Timeout**: If you can't respond within the latency budget, the router skips you. No error response needed — the router handles this. -## The TTL Caching Contract +## The serve-window contract + +The `serve_window_sec` field on Identity Match responses is a **per-package single-shot fcap** between the buyer and the publisher: + +- For each package in `eligible_package_ids`, the publisher MAY serve the user **at most one impression** on that package within `serve_window_sec` seconds. +- After the publisher has served one impression on each eligible package, the publisher MUST re-query Identity Match before serving any of those packages to the same user again. 
+- Multi-impression frequency capping (5/day, 100/month, etc.) is separate. It lives in your buyer-side state and is updated out-of-band via TMPX impression callbacks regardless of `serve_window_sec`. The serve window is the protocol-level throttle; multi-impression caps are buyer-internal policy. -The `ttl_sec` field on Identity Match responses is a caching contract between the buyer and the router: +The router MAY apply an internal deduplication cache keyed by `{identities_hash, provider_id, package_ids_hash, consent_hash}` (see spec for canonical bytes), but the publisher's binding contract is the serve-window throttle, not the router's cache window. -- The router caches the response for `ttl_sec` seconds, keyed by `{identities_hash, provider_id, package_ids_hash, consent_hash}` (see spec for canonical bytes). `identities_hash` is computed over the per-provider filtered subset you received — your cache partition is scoped to the identity types you resolve. -- During that window, the router returns cached eligibility without re-querying the buyer -- The publisher uses cached eligibility to allocate across whatever placements exist — a single pre-roll, a CTV ad pod, or a web page with multiple ad units -- The buyer doesn't need to know how many placements exist or how the publisher allocates +**Choosing a serve_window_sec value**: Default 60 seconds. Range 1–300. Anything longer than 300 makes per-package fcap too coarse for typical campaigns. Anything shorter than your IdentityMatch round-trip just adds load. 60 is a good default; tune downward if eligibility state shifts faster (close to a cap, audience just changed) or upward (max 300) if your IdentityMatch service is at load and the campaigns are tolerant of coarser fcap. -**Choosing a TTL**: Set the TTL based on how quickly your eligibility state changes. If frequency caps reset hourly, a 300-second TTL is reasonable. 
If a user is close to a cap limit, return a shorter TTL (e.g., 30 seconds) so the router re-checks sooner. ## Performance Requirements @@ -233,7 +238,7 @@ Buyers receive real-time per-user exposure signals via the `{TMPX}` macro. The I | | OpenRTB | TMP | |---|---|---| | **You receive** | Full bid request (user + content + device) | Either content OR identity, never both | -| **You return** | Bid price | Offer (creative manifest) or eligible package IDs + TTL | +| **You return** | Bid price | Offer (creative manifest) or eligible package IDs + serve window | | **Auction** | Exchange runs auction | No auction — publisher joins locally | | **Frequency** | Per-DSP only | Cross-publisher via Identity Match | | **Integration** | Per-exchange SSP adapter | Two endpoints (context + identity), any surface | diff --git a/docs/trusted-match/identity-match-implementation.mdx b/docs/trusted-match/identity-match-implementation.mdx new file mode 100644 index 0000000000..d7245110ef --- /dev/null +++ b/docs/trusted-match/identity-match-implementation.mdx @@ -0,0 +1,429 @@ +--- +title: Identity Match Implementation Guide +sidebarTitle: IdentityMatch Implementation +description: "Implementation guidance for the buyer-side IdentityMatch service — fcap_keys label model, exposure-log reference data model, SDK primitives, and conformance scenarios." +"og:title": "AdCP TMP IdentityMatch Implementation Guide" +--- + +# Identity Match Implementation Guide + +This page covers how to implement the buyer side of TMP's Identity Match operation. The wire spec lives in the [TMP specification](/docs/trusted-match/specification); the conformance invariants the service must satisfy are also normative there. What lives on this page is **implementation guidance** — the data model, the SDK primitives, and the operational shape of a working IdentityMatch service. Storage backend is an implementer choice; the SDK exposes pluggable interfaces. 
+ +The reference data model on this page is **valkey-backed and log-based**, matching the existing reference implementation in [`adcp-go/targeting/`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting). Other buyers may use Aerospike, DynamoDB, PostgreSQL, in-memory state, or anything else — as long as the conformance invariants hold, the service is valid. + +## Three layers + +| Layer | Status | What it covers | +|---|---|---| +| Wire spec | Normative | The HTTP JSON request/response on `POST /identity`, the `serve_window_sec` semantic, the TMPX binary format. See the [TMP specification](/docs/trusted-match/specification). | +| Conformance invariants | Normative | The eligibility logic an IdentityMatch service MUST compute (audience intersection, fcap evaluation across identities, active state, audience freshness). See [Conformance invariants for IdentityMatch eligibility](/docs/trusted-match/specification#conformance-invariants-for-identitymatch-eligibility). | +| Reference data model | Non-normative | The valkey-backed implementation choice in `adcp-go/targeting/` — Redis primitives, key patterns, field names. The rest of this page. | + +## fcap_keys label model + +A frequency cap is identified by a tag of the form `dimension:value`: + +``` +campaign:42 +campaign_group:7 +advertiser:13 +creative:8 +``` + +Packages declare which `fcap_keys` they belong to; exposure log entries record which keys the impression counts toward; policies (window, max count) are attached per-key. + +``` +package 2342: fcap_keys ["campaign:42", "campaign_group:7", "advertiser:13"] +policy "campaign:42": {window_sec: 60, max_count: 5} +policy "advertiser:13": {window_sec: 86400, max_count: 20} +``` + +Multi-tenant operators adopt a tenant prefix (`buyer-acme:campaign:42`) as a deployment convention to prevent key collisions across advertiser orgs on shared state — this is an operator-level choice, not a protocol requirement. 
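As a minimal sketch of that label grammar — `parse_fcap_key` is a hypothetical helper, not an SDK function — splitting `dimension:value` with an optional tenant prefix and enforcing the segment charset so the `:` delimiter stays unambiguous:

```python
import re

# Each label segment is restricted to [a-zA-Z0-9_-]+ so the ":" delimiter
# stays unambiguous; colon-bearing values (e.g. URLs) must be hashed first.
SEGMENT = re.compile(r"^[a-zA-Z0-9_-]+$")

def parse_fcap_key(key: str) -> tuple:
    """Split a fcap_key into (tenant, dimension, value).

    Accepts "dimension:value" or "tenant:dimension:value" — the tenant
    prefix is a deployment convention, not a protocol requirement.
    """
    parts = key.split(":")
    if len(parts) == 2:
        tenant, dimension, value = None, parts[0], parts[1]
    elif len(parts) == 3:
        tenant, dimension, value = parts
    else:
        raise ValueError(f"malformed fcap_key: {key!r}")
    for segment in parts:
        if not SEGMENT.match(segment):
            raise ValueError(f"bad segment {segment!r} in fcap_key {key!r}")
    return tenant, dimension, value
```

A colon-bearing value such as a raw URL fails validation here, which is why such values must be hashed or shortened before use as a value segment.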
+ +**Charset constraint.** Each segment matches `[a-zA-Z0-9_-]+` so the `:` delimiter is unambiguous. URL-bearing or otherwise colon-bearing values must be hashed or shortened before use as a value segment. + +**Why labels not hierarchy.** Cap dimensions are heterogeneous across customers — some want creative-level caps, some line-item, some flight, some advertiser-roll-up. A fixed schema either over-prescribes or under-serves. Labels also make cross-seller fcap automatic: any policy whose key is shared across sellers (e.g., `buyer-acme:advertiser:13`) enforces across all of them with no extra mode. + +**Cross-cutting policies are explicit, not implied.** A campaign that needs both a per-campaign and a per-advertiser cap declares both keys and gets two policy lookups at check time. There is no implicit roll-up. + +> **Note on the current `adcp-go/targeting` reference implementation:** as of this writing the reference impl uses scalar `package_id` and `campaign_id` rather than arbitrary `fcap_keys`. The generalization to the label model documented here is in progress and is what the spec defines. New implementations SHOULD build against the `fcap_keys` model directly. + +## Identity handling and cross-identity dedup + +The protocol does not dictate a canonical user ID. Buyers will use multiple identity providers (RampID, ID5, MAID, UID2, publisher-issued tokens) in parallel — Scope3's identity graph is canonical only for Scope3-hosted IdentityMatch instances. Other operators run their own graph or none at all. + +The reference impl handles this cleanly: **per-impression `impression_id` written to every identity log, deduplicated by `impression_id` at read time.** This makes the count exact regardless of whether identities are canonicalized upstream: + +- A user with three resolved identities (`rampid:abc`, `id5:def`, `maid:ghi`) on a single impression: the impression's `impression_id` is appended to all three identity logs. 
At eligibility time, reading all three logs and deduplicating by `impression_id` recovers a single exposure. +- A user whose identity resolution toggles across impressions (some impressions resolve `rampid` only, some resolve `id5` only): each impression has its own `impression_id`. The dedup union across the user's identity logs returns all distinct impressions correctly. **No merge rule needed; no under-counting.** + +This is why the reference impl uses a log rather than counters: counters can't dedup across identities without an external mechanism. The log approach is correct by construction for graphless and graph-canonicalizing operators alike. + +### `impression_id` generation rules + +The TMPX impression callback decodes the resolved identities (typically up to 3, per the [TMPX size budget](/docs/trusted-match/specification#size-budget)). The impression handler generates one `impression_id` at decode time and appends an `ExposureEntry` with that id to each identity's log. + +Critical invariants for `impression_id`: + +1. **Globally unique across all sellers, all sources, all time.** A single buyer agent serves impressions sourced from many sellers. If two sellers' impressions on the same user collide on `impression_id`, the read-time dedup falsely merges them as one impression and the cap under-counts. Use UUIDv4 (≥122 bits randomness) or an equivalent generator with collision-resistance across distributed instances. +2. **Generated by the buyer's impression handler at TMPX decode**, not by the seller, the publisher, the router, or the TMPX nonce. The TMPX nonce is per-IdentityMatch-evaluation and SHARED across all impressions in the serve window — not unique per impression. Seller-supplied IDs would collide across sellers. Publisher-supplied IDs would collide across publishers. Only the buyer agent has the global view to mint a unique id. +3. 
**One `impression_id` per impression, written to ALL of the user's resolved identity logs for that impression.** This is what enables the read-time dedup. If the buyer instead generated different ids per identity, the dedup contract breaks and the same impression would count once per resolved identity. +4. **Pixel retries are a separate concern.** The same pixel firing twice (network retry, refresh, etc.) MUST NOT generate two `impression_id`s — that would double-count a single impression. Either (a) the impression handler dedupes incoming requests by a separate idempotency key carried in the pixel URL or `Idempotency-Key` header, or (b) the deployment accepts a small over-count from pixel retries as benign. Cross-identity dedup and per-pixel idempotency are different problems with different mitigations. + +## Reference data model (valkey-backed, log-based) + +This is the layout used by [`adcp-go/targeting/`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting). Storage choice is implementation; any backend that satisfies the conformance invariants is conformant. + +Valkey/Redis does not validate writes against a schema definition. The contract is enforced by the SDK on the write side and by the IdentityMatch reader on the read side. Library discipline (not database constraints) is what makes this work. + +### Exposure log (per identity) + +``` +type: STRING (binary-encoded []ExposureEntry, lazy-pruned to window) +key: user:exposures:{HashToken(uid_type + ":" + user_token)} +value: [ + { impression_id, fcap_keys[], timestamp }, + ... +] +``` + +`HashToken` is a 16-byte SHA-256 prefix, hex-encoded. Binary entry encoding keeps the log compact ([`exposure_binary.go`](https://github.com/adcontextprotocol/adcp-go/blob/main/targeting/exposure_binary.go)) — a 30-day log for a typical user is a few KB. + +Each entry records: + +- `impression_id` — generated by the impression handler at TMPX decode (UUID, ~16-20 bytes serialized). 
Used for cross-identity dedup at read time. Same value written to every identity log for one impression. +- `fcap_keys[]` — the labels this impression counts toward (e.g. `["campaign:42", "advertiser:13"]`). +- `timestamp` — unix seconds when the impression occurred. + +### User profile (per identity, optional) + +``` +type: STRING (JSON-encoded UserProfile) +key: user:profile:{HashToken(uid_type + ":" + user_token)} +value: { segments: { "seg_id": intent_score, ... } } +``` + +Audience-membership lookup. Populated by the buyer's audience pipeline (typically `sync_audiences`). At eligibility time, the IdentityMatch service unions segment memberships across all the user's identities, then intersects with each candidate package's required audiences. + +### Package config (per package) + +``` +type: STRING (JSON-encoded PackageIdentityConfig) +key: package:identity:{package_id} +value: { + target_segments: ["seg_a", "seg_b"], + fcap_keys: ["campaign:42", "advertiser:13"], + active: true, + updated_at: <unix_ts> +} +``` + +Written by the buyer's package-CRUD writethrough. Loaded with a single `MGet` for all candidate packages at eligibility time, then cached in-process per (seller_id, property_id, country) for ~5 minutes. + +### Fcap policy (per fcap_key) + +``` +type: STRING (JSON-encoded FcapPolicy) +key: fcap_policy:{fcap_key} +value: { + window_sec: <int>, + max_count: <int>, + active: true, + updated_at: <unix_ts> +} +``` + +Sliding window via `now - window_sec` filter at read. No FIXED/SLIDING toggle; the read-time filter handles both implicitly. + +### Read pattern + +For an IdentityMatch request with N identities and M candidate packages, the entire eligibility evaluation is **one MGet round-trip plus in-process computation**: + +1. `MGet` `[user:profile:{h1}, user:exposures:{h1}, ..., user:profile:{hN}, user:exposures:{hN}]` — 2N keys, single round-trip. +2. Parse profiles (JSON, small) and exposure logs (binary, zero-copy). +3.
Union segment memberships across identities; build user segment set. +4. For each candidate package: check segment match (set intersection), then check fcap eligibility by scanning the user's exposure log entries with lazy dedup by `impression_id`, filtered by `fcap_key` and `timestamp >= now - window`. +5. Return `eligible_package_ids`. + +Package configs (`package:identity:*`) and policies (`fcap_policy:*`) are loaded out-of-band — typically batch-loaded at resolver startup or refreshed every ~5 minutes — so they don't add to per-request round-trips. + +### Write pattern + +On TMPX decode at the impression handler: + +1. Generate `impression_id`. +2. Resolve `fcap_keys` for the package(s) the impression counts toward. +3. For each identity in the TMPX: + - `Get user:exposures:{h}` → parse binary log + - Append new `ExposureEntry` + - Prune entries older than the longest active window (default 30 days) + - `Set user:exposures:{h}` → serialized binary log + +The read-modify-write per identity is **not atomic**. Concurrent writes for the same user can lose an exposure. The reference impl ([`engine.go:478`](https://github.com/adcontextprotocol/adcp-go/blob/main/targeting/engine.go#L478)) explicitly accepts this trade — under-counting under contention is benign for fcap purposes. Atomic append via Lua or a future `Store.Append` method is a deferred optimization. + +## SDK primitives + +The SDK ships impression handling as **two composable functions**, not a single bundled call. Production tracking endpoints typically decode at intake, publish to pub/sub for buffering, and let a downstream worker write the store at its own pace. Bundling decode+write into a single call would force a synchronous topology and prevent that buffering pattern. 
+ +``` +decodeTmpx(raw_tmpx) -> ExposureLog + Decrypts HPKE ciphertext, parses the published TMPX binary format + (/docs/trusted-match/specification#binary-format), returns the resolved identity entries + in a structured form ready for serialization onto a topic or for direct write. + +writeExposure(log, fcap_keys, store_context) -> { ok } + Appends entries to each identity's exposure log with a fresh impression_id + and the supplied fcap_keys. Prunes entries older than the longest active + window. store_context wires the FrequencyStore implementation + (valkey, Aerospike, DynamoDB, etc.). +``` + +Plus the buyer-side management plane: + +``` +upsertAudience(audience_id, members, opts) // wraps sync_audiences add/remove deltas +upsertPackage(seller_agent_url, package_id, fcap_keys, audience_ids, opts) +upsertFcapPolicy(fcap_key, {window_sec, max_count}) // e.g. "campaign:42" +inspectExposures(uid_type, user_token, fcap_key?) // test helper; returns matching log entries +``` + +Plus HPKE encrypt/decrypt as net-new SDK primitives (X25519 KEM, ChaCha20-Poly1305, HKDF-SHA256 per RFC 9180 `mode_base`). The encrypt path is needed by the IdentityMatch service emitting TMPX; decrypt by the impression handler invoking `decodeTmpx`. + +The same primitive surface ships in `@adcp/client` (TS), `adcp-go`, and `adcp` (Python). Implementer chooses the language; spec/SDK does not dictate where the logic runs. + +## Pluggable store interfaces + +The SDK exposes a `Store` interface — modeled on [`adcp-go/targeting/store.go`](https://github.com/adcontextprotocol/adcp-go/blob/main/targeting/store.go) — that an IdentityMatch service implementation calls to satisfy the conformance invariants. Buyers running their own backend (Aerospike, DynamoDB, proprietary KV) implement these interfaces against their store; the SDK ships a reference valkey-backed connector. The interfaces, not the storage layout, are what the SDK contracts on. 
+ +Core operations the IdentityMatch path needs: + +``` +Get(key) -> string, exists +MGet(keys...) -> [string] // batched, single round-trip +Set(key, value, ttl) +SetMembers(key) -> [string] // SET-typed reads (audiences, fcap_keys) +SetIntersect(keys...) -> [string] // efficient audience intersection +ZAdd / ZCount / ZRangeByScore // ZSET-typed if you store logs as sorted sets +``` + +Specific signatures are tracked under `adcp-client#1005`. The point at protocol level: the SDK is store-agnostic by design. + +## Production topology pattern + +A typical Scope3-style impression pipeline: + +``` +publisher pixel fires {TMPX} → tracking endpoint + │ + decodeTmpx (synchronous, at intake) + │ + ▼ + pub/sub topic + │ + frequency_writer worker + │ + writeExposure (asynchronous) + │ + ▼ + valkey +``` + +Decode at intake; emit to pub/sub for buffering; downstream worker writes the store at its own pace. Buffering, retries, dedup, observability, and abuse protection live at the queue layer — none of that is the SDK's job. The SDK ships the two functions; deployment topology composes them. + +A simpler synchronous pipeline (decode + write in the same handler) is also valid for low-volume deployments. The SDK supports both because the primitives are composable. + +## Performance — measured + +Numbers below are from [`targeting/scale_test.go`](https://github.com/adcontextprotocol/adcp-go/blob/main/targeting/scale_test.go), against the in-memory mock store, on a single goroutine. They isolate the in-process eligibility logic from network round-trips so you can reason about per-request CPU separately from valkey latency. + +### Single-dimension scaling + +**Frequency cap evaluation per eval, single package, 1 identity:** + +| Prior exposures in user's log | Eval latency | +|---|---| +| 0 | 368 ns | +| 10 | 613 ns | +| 100 | 5.3 µs | +| 1,000 | 53 µs | +| 10,000 | 118 µs | + +Linear scan with binary lazy dedup; flat below 100 entries, sub-millisecond at 10K. 
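The scan those numbers measure can be sketched as follows — a hypothetical Python rendering of the window filter, key filter, and lazy dedup by `impression_id` (entry shape follows the reference data model above; these are not SDK functions):

```python
import time

def count_exposures(logs, fcap_key, window_sec, now=None):
    """Count distinct impressions for one fcap_key across a user's
    per-identity exposure logs, deduplicating by impression_id.
    Entry shape: {impression_id, fcap_keys, timestamp}."""
    now = time.time() if now is None else now
    seen = set()
    for log in logs:                      # one log per resolved identity
        for entry in log:
            if entry["timestamp"] < now - window_sec:
                continue                  # outside the sliding window
            if fcap_key not in entry["fcap_keys"]:
                continue
            seen.add(entry["impression_id"])  # cross-identity dedup
    return len(seen)

def cap_tripped(logs, fcap_key, policy):
    return count_exposures(logs, fcap_key, policy["window_sec"]) >= policy["max_count"]
```

Three entries mirrored into two identity logs still count as three impressions, not six — the property the multi-identity dedup scenario exercises.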
+ +**Resolver (load media buys + build indexes for a seller, 2 store round-trips):** + +| Media buys | Packages | Cold | +|---|---|---| +| 1 | 2 | 2.2 µs | +| 100 | 200 | 223 µs | +| 500 | 1000 | 2.07 ms | + +The resolver runs once per (seller, property, country) and is cached. After cache warmup, eligibility evaluation against the cached resolver is dominated by per-package log scans rather than resolver lookup. + +### Combined scaling (packages × log_size × identities) + +The combined case is what production sizing depends on. Numbers from `TestScale_IdentityMatch_CPU_Combined` — same mock-store isolation, varying all three dimensions: + +| packages | log entries | identities | CPU/eval | +|---|---|---|---| +| 100 | 100 | 3 | 90 µs | +| 100 | 1,000 | 3 | 1.0 ms | +| 100 | 10,000 | 3 | 7.2 ms | +| 1,000 | 100 | 3 | 0.78 ms | +| 1,000 | 1,000 | 3 | 7.5 ms ← realistic Scope3-shape load | +| 1,000 | 10,000 | 3 | 58 ms ← pathological tail | + +CPU scales linearly in `packages × log_entries × identities` — the eligibility logic re-scans each user's exposure log per candidate package via `CheckFrequencyRulesMultiLog` ([`engine.go`](https://github.com/adcontextprotocol/adcp-go/blob/main/targeting/engine.go)). + +### Throughput per CPU core + +| Profile | Per-eval CPU | QPS / core | +|---|---|---| +| Median (100 pkg × 100 log × 3 ids) | 90 µs | ~11,000 | +| Realistic Scope3 (1000 pkg × 1000 log × 3 ids) | 7.5 ms | ~130 | +| Heavy tail (1000 pkg × 10K log × 3 ids) | 58 ms | ~17 | + +Eligibility evaluation has no shared state across requests — embarrassingly parallel. Scaling out is "add cores" or "add instances," with no eligibility-path bottleneck. + +### End-to-end with valkey + +Real-world latency adds the network round-trip to valkey for the user-data MGet. Typically 150 µs–1 ms depending on co-location. 
End-to-end: + +- **Median**: ~0.5–1.5 ms (CPU + valkey round-trip) +- **Realistic Scope3-shape**: ~7–10 ms +- **Pathological tail**: ~60 ms (outside the 30 ms p95 latency budget — heavy users on busy publishers are the risk) + +### Algorithmic optimization (landed) + +Original impl rescanned the exposure log per candidate package: `O(packages × log_entries × identities)`. The optimization pre-buckets the log by filter hash once per request; per-package check walks only the matching bucket. Heuristic-gated at `numPackages > 50` so small-package requests stay on the naive path (avoids regression on small requests with heavy logs). + +Tracked at [adcp-go#103](https://github.com/adcontextprotocol/adcp-go/pull/103). Measured speedups vs the original implementation: + +| packages | log entries | identities | Before | After | Speedup | +|----------|------------:|-----------:|----------:|---------:|--------:| +| 1000 | 100 | 3 | 784 µs | 71 µs | 11.0× | +| 1000 | 1000 | 3 | 7,566 µs | 287 µs | 26.4× | +| 1000 | 10000 | 3 | 57,861 µs | 1,500 µs | ~38× | + +The pathological tail drops from 58ms to 1.5ms, well within the latency budget. Below the threshold (≤50 packages), the naive path is preserved. + +### What hasn't been measured + +The above is mock-store CPU only. Real production sizing also depends on: +- **Network round-trip to valkey** under contention — needs measurement against the actual deployment +- **valkey memory and CPU** for the user-data working set at production scale +- **Tail-latency behavior under load** (not single-goroutine throughput) +- **Heavy-user impression-distribution shape** — what fraction of users hit 1K+ entries in the 30-day window + +Production benchmarks tracked as a rollout-plan deliverable. + +## Conformance scenarios + +Five scenarios mapping to the conformance invariants. SDK-driven integration tests can run these against a live valkey + IdentityMatch service. 
Scenarios use the `fcap_keys` label model documented above; the reference impl is mid-generalization from scalar `package_id`/`campaign_id` to arbitrary fcap_keys. + +All scenarios assume `serve_window_sec = 60` (default), `package = "pkg-42"`, `seller_agent_url = "https://seller-a.example"`. + +### 1. Per-key cap trips after N exposures + +**Setup:** + +``` +SET package:identity:pkg-42 = { + target_segments: ["seg_test"], + fcap_keys: ["campaign:42"], + active: true +} +SET fcap_policy:campaign:42 = { window_sec: 86400, max_count: 5, active: true } +SET user:profile: = { segments: { "seg_test": 1.0 } } +``` + +**Step 1** — wire call: `identity_match_request {identities: [{rampid, abc}], package_ids: [pkg-42]}` → expect `eligible_package_ids: [pkg-42]`, `serve_window_sec: 60`, `tmpx: `. + +**Step 2** — buyer-internal, 5 impressions: for each, decode TMPX, generate `impression_id`, write entry to user's exposure log with `fcap_keys: ["campaign:42"]`. After 5 impressions the log contains 5 entries matching that key in the current window. + +**Step 3** — wire call: same `identity_match_request`. Eligibility scans the log, counts 5 matching entries, compares to `max_count: 5` → cap tripped → `eligible_package_ids: []`. + +### 2. Multi-identity dedup + +User has two resolved identities (`rampid:abc` and `id5:def`). Setup as Scenario 1, plus `user:profile:` with the same segments. + +**Step 1** — buyer-internal, 3 impressions, each decoded with both identities resolved in the TMPX. Each impression writes the same `impression_id` to BOTH identity logs. + +``` +user:exposures: = [ + { impression_id: "imp-001", fcap_keys: ["campaign:42"], ts: ... }, + { impression_id: "imp-002", fcap_keys: ["campaign:42"], ts: ... }, + { impression_id: "imp-003", fcap_keys: ["campaign:42"], ts: ... } +] +user:exposures: = [ same three entries ] +``` + +**Step 2** — wire call: `identity_match_request {identities: [{rampid,abc}, {id5,def}], package_ids: [pkg-42]}`. 
Eligibility reads both logs, dedups by `impression_id`, finds 3 distinct impressions. Under cap of 5 → eligible. + +**Step 3** — buyer-internal: 3 more impressions, but identity resolution only gets `rampid:abc` (id5 lookup fails for these). + +``` +user:exposures: += [ imp-004, imp-005, imp-006 ] +user:exposures: unchanged +``` + +**Step 4** — wire call: same request. Eligibility dedups: union of `{imp-001, imp-002, imp-003}` (in both logs) ∪ `{imp-004, imp-005, imp-006}` (only in rampid log) = 6 distinct. Cap of 5 tripped → `eligible_package_ids: []`. + +This is the case where counter-based approaches with merge rules under-count. The log approach with `impression_id` dedup gets the right answer regardless of identity-resolution stability. + +### 3. Audience drift via sync_audiences + +Setup as Scenario 1, with the user initially in `seg_test`. + +**Step 1** — wire call: `identity_match_request` → `eligible_package_ids: [pkg-42]`. + +**Step 2** — buyer-internal, simulate `sync_audiences` removing the user from the segment: + +``` +SET user:profile: = { segments: { } } +``` + +**Step 3** — wait `serve_window_sec` seconds (60) so the publisher re-queries. + +**Step 4** — wire call: same `identity_match_request`. Eligibility checks audience intersection: user's empty segment set ∩ package's `[seg_test]` = ∅ → package dropped → `eligible_package_ids: []`. + +### 4. Cross-seller advertiser cap + +Setup: two packages on different sellers, both mapped to the same `advertiser:13` cap: + +``` +SET package:identity:pkg-A = { fcap_keys: ["advertiser:13"], active: true } +SET package:identity:pkg-B = { fcap_keys: ["advertiser:13"], active: true } +SET fcap_policy:advertiser:13 = { window_sec: 86400, max_count: 10, active: true } +``` + +**Step 1** — wire call from Seller A: `package_ids: [pkg-A]` → eligible. + +**Step 2** — buyer-internal, 10 impressions on pkg-A, each entry's `fcap_keys` includes `advertiser:13`. + +**Step 3** — wire call from Seller B: `package_ids: [pkg-B]`.
Eligibility scans the user's log, counts entries matching `advertiser:13` within window: 10 ≥ max_count → `eligible_package_ids: []`. + +The advertiser-level cap is enforced across sellers because the `fcap_key` is shared. No cross-seller coordination needed; the buyer agent is the single source of truth. + +### 5. Serve-window throttle + +Setup as Scenario 1. + +**Step 1** — wire call at `t=0`: `identity_match_request` → `eligible_package_ids: [pkg-42]`, `serve_window_sec: 60`. + +**Step 2** — publisher serves one impression on pkg-42 within the 60s window. + +**Step 3** — at `t=30s`, publisher receives another ad opportunity for the same user. Per the `serve_window_sec` semantics, the publisher MUST NOT re-serve pkg-42 from the cached eligibility — pkg-42 is exhausted in this window. + +**Step 4** — at `t=61s`, publisher re-queries: `identity_match_request` → fresh eligibility. The buyer agent does not need to track per-publisher window state; it just answers freshly when re-queried. + +This is the semantic the `serve_window_sec` field encodes. The buyer agent's job is correctness on each query; the publisher's job is honoring the one-impression-per-package contract within the window.
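Scenario 2's dedup arithmetic is easy to check with an in-memory sketch (Python, illustrative data shapes, not the SDK store interface):

```python
def distinct_impressions(logs, fcap_key, now, window_sec):
    """Union the user's exposure logs across identities, then dedup by
    impression_id: an impression written to several identity logs counts once."""
    seen = set()
    for log in logs:
        for e in log:
            if e["ts"] >= now - window_sec and fcap_key in e["fcap_keys"]:
                seen.add(e["impression_id"])
    return len(seen)

NOW = 1_000_000.0

def entry(i):
    return {"impression_id": i, "fcap_keys": ["campaign:42"], "ts": NOW}

# Scenario 2 state: imp-001..003 in both logs, imp-004..006 only in the rampid log.
rampid_log = [entry(f"imp-00{n}") for n in range(1, 7)]
id5_log = [entry(f"imp-00{n}") for n in range(1, 4)]

count = distinct_impressions([rampid_log, id5_log], "campaign:42", NOW, 86400)
```

The union counts 6 distinct impressions, so the cap of 5 trips. That is the answer counter-based merge rules cannot reliably reproduce when identity resolution toggles across impressions.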
+ +## See also + +- [TMP Specification](/docs/trusted-match/specification) — wire spec, TMPX format, conformance invariants +- [Buyer Guide](/docs/trusted-match/buyer-guide) — buyer agent integration, Context Match + Identity Match flows +- [Migration from AXE](/docs/trusted-match/migration-from-axe) — for buyers transitioning from AXE-shaped pipelines, including the OpenRTB User.eids cross-walk +- [Privacy architecture](/docs/trusted-match/privacy-architecture) — what each party learns +- [Router architecture](/docs/trusted-match/router-architecture) — provider registration, fan-out, latency +- [`adcp-go/targeting/`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting) — reference implementation in Go diff --git a/docs/trusted-match/migration-from-axe.mdx b/docs/trusted-match/migration-from-axe.mdx index 673cdbd3f6..fe24cbe81e 100644 --- a/docs/trusted-match/migration-from-axe.mdx +++ b/docs/trusted-match/migration-from-axe.mdx @@ -85,3 +85,21 @@ New media buys should omit AXE fields entirely. 
The buyer agent's Context Match - **`sync_creatives`** — Same creative sync - **GAM as the ad server** — TMP still sets key-values that GAM evaluates - **Geographic and other targeting overlays** — These are media buy fields, not execution-layer concerns + +## OpenRTB User.eids cross-walk + +For buyers bridging from OpenRTB-shaped pipelines, the TMP Identity Match `identities[]` shape maps to OpenRTB 2.6 `User.eids[]` as follows: + +| AdCP TMP `identities[].uid_type` | OpenRTB 2.6 `User.eids[].source` | Notes | +|---|---|---| +| `rampid` / `rampid_derived` | `liveramp.com` | `atype: 1` for maintained, `atype: 3` for derived | +| `id5` | `id5-sync.com` | | +| `uid2` | `uidapi.com` | `atype: 3` | +| `euid` | `euid.eu` | | +| `pairid` | `iabtechlab.com/pair` | | +| `maid` | `adid` (Android) / `idfa` (iOS) | Typically carried in `Device.ifa` rather than `User.eids` in OpenRTB | +| `hashed_email` | `liveintent.com` or buyer-specific | `atype: 3` | +| `publisher_first_party` | publisher-defined `source` URL | | +| `other` | buyer-defined `source` URL | | + +The TMP `user_token` field corresponds to `User.eids[].uids[].id`. AdCP carries up to 3 identities per Identity Match request (HPKE size budget — see [TMPX size budget](/docs/trusted-match/specification#size-budget)); OpenRTB has no such limit, so a buyer bridging from OpenRTB into TMP must apply a buyer-configured priority order to truncate (typically: deterministic graphs first — UID2, RampID — then probabilistic or publisher-scoped IDs).
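For the bridging direction (OpenRTB in, TMP out), the cross-walk plus the truncation rule can be sketched as follows. The `source` strings follow the table above; the priority order, function name, and the subset of mapped sources are illustrative, not normative:

```python
# Map OpenRTB eids.source -> TMP uid_type (subset of the cross-walk table).
SOURCE_TO_UID_TYPE = {
    "liveramp.com": "rampid",
    "id5-sync.com": "id5",
    "uidapi.com": "uid2",
    "euid.eu": "euid",
}

# Example buyer priority: deterministic graphs first, then the rest.
PRIORITY = ["uid2", "rampid", "euid", "id5"]

def eids_to_identities(eids, max_identities=3):
    """Convert OpenRTB 2.6 User.eids[] into TMP identities[], truncated
    to max_identities by the buyer-configured priority order."""
    identities = []
    for eid in eids:
        uid_type = SOURCE_TO_UID_TYPE.get(eid.get("source"))
        if uid_type is None:
            continue  # unmapped sources are dropped, not guessed
        for uid in eid.get("uids", []):
            identities.append({"uid_type": uid_type, "user_token": uid["id"]})
    identities.sort(key=lambda i: PRIORITY.index(i["uid_type"]))  # stable sort
    return identities[:max_identities]
```

Deterministic graphs sort first, so with four resolvable eids it is the lowest-priority entry (here `id5`) that gets truncated.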
diff --git a/docs/trusted-match/specification.mdx b/docs/trusted-match/specification.mdx index 53138ef1cb..5169aed477 100644 --- a/docs/trusted-match/specification.mdx +++ b/docs/trusted-match/specification.mdx @@ -7,7 +7,7 @@ description: Authoritative message type definitions, field tables, privacy requi # Trusted Match Protocol Specification -**Experimental.** The Trusted Match Protocol is part of AdCP 3.0 as an experimental surface — it may change between 3.x releases with at least 6 weeks' notice. Sellers implementing TMP MUST declare `trusted_match.core` in `experimental_features`. See [experimental status](/docs/reference/experimental-status) for the full contract. +**Experimental.** The Trusted Match Protocol is part of AdCP 3.0 as an experimental surface — it may change between 3.x releases with at least 6 weeks' notice. Sellers implementing TMP MUST declare `trusted_match.core` in `experimental_features`. See [experimental status](/docs/reference/experimental-status) for the full contract. Fields on this surface are not subject to deprecation cycles until 3.0.0 GA. This is the authoritative reference for the Trusted Match Protocol (TMP). For conceptual introductions, see the [overview](/docs/trusted-match/) and [core concepts](/docs/trusted-match/context-and-identity). @@ -24,7 +24,7 @@ Specific areas expected to evolve include TMPX exposure tokens, country-partitio | **Offer** | A buyer's response to a context match request. Ranges from simple activation (package_id only) to rich proposals with brand, price, summary, and creative manifest. | | **Available package** | A package from an active media buy that is eligible for evaluation on a given placement. Package metadata — including the originating seller agent — is synced at media buy time. See [Package Sync](#package-sync). | | **Seller agent** | The buyer-side agent that sold the package into a publisher. 
Identified by the agent URL declared in the publisher's `adagents.json` `authorized_agents[].url`. Every `AvailablePackage` is bound to exactly one seller agent at sync time. | -| **Eligibility** | List of eligible package IDs returned by Identity Match, plus a TTL caching contract. The buyer computes eligibility from frequency caps, audience membership, and other signals; the reasons are opaque to the publisher. | +| **Eligibility** | List of eligible package IDs returned by Identity Match, plus a serve-window throttle. The buyer computes eligibility from frequency caps, audience membership, and other signals; the reasons are opaque to the publisher. | | **Artifact** | A typed content reference associated with a publisher property (article URL, episode EIDR, show Gracenote ID, music ISRC, product GTIN, conversation turn). Each artifact has a `type` and `value`. Referenced in context match requests. | | **Temporal decorrelation** | Random delay introduced between Context Match and Identity Match requests to prevent timing-based correlation. | @@ -195,19 +195,34 @@ Each entry in `identities` is an `{user_token, uid_type}` pair: ### IdentityMatchResponse -Returned by the buyer agent. A list of eligible package IDs with a caching TTL. +Returned by the buyer agent. A list of eligible package IDs with a serve-window throttle. | Field | Type | Required | Description | |---|---|---|---| | `type` | string | Yes | `"identity_match_response"`. Message type discriminator for deserialization. | | `request_id` | string | Yes | Echo of the request's `request_id`. | | `eligible_package_ids` | List\ | Yes | Package IDs the user is eligible for. Packages not listed are ineligible. | -| `ttl_sec` | integer | Yes | How long the router should cache this response, in seconds. A value of `0` means do not cache — re-query on every request. | +| `serve_window_sec` | integer | Yes | Per-package single-shot fcap window, in seconds. Range: 1–300. Default: 60. 
After serving the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again. This is **not** a router response cache TTL — it is a buyer-asserted serve throttle. Multi-impression frequency caps are handled separately by buyer-side exposure records, updated out-of-band via TMPX impression callbacks. | | `tmpx` | string | No | HPKE-encrypted exposure token containing resolved user identity tokens. The publisher substitutes this into creative tracking URLs as `{TMPX}`. The buyer's impression pixel receives the token, enabling real-time per-user frequency state updates. Wire format: `kid.base64url_nopad(ciphertext)` (unpadded, no `=` characters). Publishers MUST treat this value as opaque pass-through data. | -The response includes eligible package IDs, a TTL, and an optional `tmpx` field. The TMPX token is an HPKE-encrypted exposure token that flows through creative tracking URLs to the buyer's impression pixel, enabling real-time per-user frequency state updates without exposing user identity to the publisher. The buyer computes eligibility from whatever identity signals they have (frequency caps, audience membership, purchase history) and returns only the packages that pass. The publisher does not need to know why a package was excluded — just which packages are eligible. +The response includes eligible package IDs, a serve-window throttle, and an optional `tmpx` field. The TMPX token is an HPKE-encrypted exposure token that flows through creative tracking URLs to the buyer's impression pixel, enabling real-time per-user frequency state updates without exposing user identity to the publisher. The buyer computes eligibility from whatever identity signals they have (frequency caps, audience membership, purchase history) and returns only the packages that pass. The publisher does not need to know why a package was excluded — just which packages are eligible. 
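A minimal receiver-side check of this field table can be sketched as follows (hypothetical helper, not part of any shipped SDK):

```python
def validate_identity_match_response(msg: dict) -> list:
    """Return a list of validation errors per the field table (sketch)."""
    errors = []
    if msg.get("type") != "identity_match_response":
        errors.append("type must be 'identity_match_response'")
    if not isinstance(msg.get("request_id"), str):
        errors.append("request_id is required")
    if not isinstance(msg.get("eligible_package_ids"), list):
        errors.append("eligible_package_ids is required")
    sw = msg.get("serve_window_sec")
    if not isinstance(sw, int) or not (1 <= sw <= 300):
        errors.append("serve_window_sec must be an integer in 1..300")
    # tmpx is optional; when present it is opaque pass-through data.
    return errors
```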
-The `ttl_sec` field is a caching contract. The buyer is saying: "Cache this for N seconds." The router caches the `eligible_package_ids` list and returns it for subsequent requests during the window — it does not track which packages have been served. The publisher enforces allocation rules (at most one ad per package, competitive separation, pod composition) using the cached eligibility as input. This eliminates the need for pod-specific or batch-specific protocol semantics — the router has cached eligibility and the publisher allocates across whatever placements exist during the TTL window (a CTV ad pod, a web page with 20 slots, a single pre-roll). The buyer doesn't need to know the allocation details. +The `serve_window_sec` field is a **per-package single-shot fcap**, not a router cache TTL. The buyer is saying: "After you serve the user one impression on each eligible package, re-query me before serving from those packages again." The router MAY still cache the response for an internal deduplication/cost-saving window, but the binding contract on the publisher side is "one impression per eligible package per window." Multi-impression frequency caps (5 per day per campaign, 100 per month per advertiser, etc.) live in buyer-side state and are updated out-of-band via TMPX impression callbacks regardless of `serve_window_sec`. + +The publisher enforces allocation rules (competitive separation, pod composition) using the eligibility list as input. This eliminates the need for pod-specific or batch-specific protocol semantics — the publisher allocates across whatever placements exist during the serve window (a CTV ad pod, a web page with 20 slots, a single pre-roll), honoring the one-impression-per-package contract. 
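Publisher-side, the contract reduces to per-(user, package) bookkeeping. A sketch with hypothetical names:

```python
import time

class ServeWindowTracker:
    """Tracks the one-impression-per-eligible-package contract."""

    def __init__(self):
        self._exhausted_until = {}  # (user_key, package_id) -> window end ts

    def can_serve(self, user_key, package_id, now=None):
        now = time.time() if now is None else now
        until = self._exhausted_until.get((user_key, package_id))
        return until is None or now >= until

    def record_serve(self, user_key, package_id, serve_window_sec, now=None):
        # One impression served: package is exhausted for the rest of the window.
        now = time.time() if now is None else now
        self._exhausted_until[(user_key, package_id)] = now + serve_window_sec
```

After the window ends, `can_serve` returning true means "eligible again after a fresh Identity Match query", not "serve from the cached list".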
+ +#### Conformance invariants for IdentityMatch eligibility + +A conformant IdentityMatch service MUST compute `eligible_package_ids` such that, for each `package_id ∈ request.package_ids`, the package is included in `eligible_package_ids` if and only if **all** of the following hold: + +1. **Audience eligibility.** Either the package has no audience requirement, OR there exists at least one audience identifier `a` such that `a` is in the package's required audience set AND `a` is in the audience-membership of at least one identity `i ∈ request.identities` (the union across the user's resolved identities intersects the package's required audiences). +2. **Frequency cap eligibility.** For every fcap policy `k` declared on the package, the count of distinct impressions for `request.identities` against `k` within `[now - window_sec, now]` is strictly less than the policy's `max_count`. The "distinct impressions" count deduplicates by `impression_id` across the user's resolved identities — an impression that was resolved with multiple identity tokens counts once. The `impression_id` MUST be **globally unique across all sellers, sources, and time**, generated by the buyer's impression handler at TMPX decode (not seller-supplied, not the TMPX nonce, not publisher-supplied) — collisions on `impression_id` across sellers would silently merge distinct impressions and under-count caps. See [Identity handling](/docs/trusted-match/identity-match-implementation#identity-handling-and-cross-identity-dedup) and [`impression_id` generation rules](/docs/trusted-match/identity-match-implementation#impression_id-generation-rules) in the implementation guide. +3. **Active state.** Packages or policies marked inactive MUST be treated as if absent. +4. **Audience freshness.** If the buyer's audience pipeline publishes a freshness deadline and the current time is past it, that audience-membership entry MUST NOT contribute to (1). 
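The four invariants compose into a single per-package predicate. A backend-agnostic Python sketch with illustrative data shapes (the normative statement is the list above, not this code):

```python
def eligible(pkg, identities, audiences, exposures, policies, now):
    """Return True iff pkg passes invariants 1-4 for this user.
    audiences: identity -> {segment: freshness_deadline or None}
    exposures: identity -> [{"impression_id", "fcap_keys", "ts"}, ...]
    policies:  fcap_key -> {"active", "window_sec", "max_count"}"""
    # (3) active state: inactive packages are treated as absent.
    if not pkg["active"]:
        return False
    # (1) audience eligibility + (4) freshness: the union of fresh
    # memberships across identities must intersect the required audiences.
    if pkg["target_segments"]:
        fresh = set()
        for ident in identities:
            for seg, deadline in audiences.get(ident, {}).items():
                if deadline is None or now <= deadline:
                    fresh.add(seg)
        if not fresh & set(pkg["target_segments"]):
            return False
    # (2) frequency caps: distinct impression_ids per policy key, in window.
    for key in pkg["fcap_keys"]:
        pol = policies.get(key)
        if pol is None or not pol["active"]:
            continue  # inactive policy treated as absent
        seen = set()
        for ident in identities:
            for e in exposures.get(ident, []):
                if key in e["fcap_keys"] and e["ts"] >= now - pol["window_sec"]:
                    seen.add(e["impression_id"])  # cross-identity dedup
        if len(seen) >= pol["max_count"]:
            return False
    return True
```

Two implementations over different stores that agree with this predicate for the same inputs return the same `eligible_package_ids`.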
+ +The TMPX returned with the response MUST encode the resolved identities so the out-of-band impression handler can update exposure state atomically — see § TMPX tokens. + +Storage backend (valkey, Aerospike, DynamoDB, in-memory, anything) is an implementation detail. Two services with different storage backends that satisfy these invariants for the same inputs MUST return the same eligibility output. See the [buyer guide](/docs/trusted-match/buyer-guide) for a reference valkey-backed data model and SDK primitives that satisfy the invariants. #### Consent @@ -592,9 +607,9 @@ The 8-byte random nonce enables deduplication at the master. The master stores n ### Caching behavior -The TMPX token is generated once per Identity Match evaluation and cached alongside the eligibility response for `ttl_sec` seconds. All impressions within the TTL window share the same TMPX value (same nonce, same tokens). +The TMPX token is generated once per Identity Match evaluation and accompanies the eligibility response for the `serve_window_sec` window. All impressions on eligible packages within that window share the same TMPX value (same nonce, same tokens). -The buyer's master MUST NOT deduplicate by TMPX value or nonce within a TTL window — each pixel fire is one impression. Multiple ads served to the same user in a CTV pod or a web page with multiple ad units all produce distinct pixel fires with the same TMPX token. The nonce deduplication only prevents replay of the same TMPX token *after* the TTL window expires — if the same nonce appears outside its original TTL window, it is a replay and MUST be rejected. +The buyer's master MUST NOT deduplicate by TMPX value or nonce within a serve window — each pixel fire is one impression. Multiple ads served to the same user in a CTV pod or a web page with multiple ad units all produce distinct pixel fires with the same TMPX token.
The nonce deduplication only prevents replay of the same TMPX token *after* the serve window expires — if the same nonce appears outside its original window, it is a replay and MUST be rejected. ### Publisher obligations @@ -641,9 +656,9 @@ Context Match responses are cacheable because the same packages are evaluated fo - Routers SHOULD cache Context Match responses with a TTL of **5 minutes**. - Providers MAY include a `cache_ttl` field (integer, seconds) in Context Match responses to override the default. Routers MUST respect this value when present. -- Identity Match responses are cached per the `ttl_sec` value in the response. Cache key: `{identities_hash, provider_id, package_ids_hash, consent_hash}`, where `identities_hash` is the SHA-256 of the canonical `identities` bytes defined in [Identity Match signed fields](#identity-match-signed-fields) (computed over the per-provider filtered subset); `package_ids_hash` is SHA-256 over the JCS serialization of the sorted `package_ids` array; `consent_hash` is SHA-256 over the JCS serialization of the request's `consent` object (or JCS `null` when the field is absent — this distinguishes "consent unknown" from an explicit-empty consent object). JCS framing prevents delimiter-injection: raw consent strings or package IDs containing `|`, `,`, or `\n` cannot collide two distinct inputs. Including the identity set ensures that adding or removing tokens produces a distinct cache entry. Including the package list hash ensures cached responses are invalidated when the active package set changes (e.g., a new media buy activates). Including the consent hash prevents eligibility decisions taken under one consent state from being served under another. -- When a provider's targeting configuration changes (new packages, updated targeting rules), the provider SHOULD return `"cache_ttl": 0` until the change has propagated, then resume normal caching. 
-- Both `ttl_sec` and `cache_ttl` have a schema-enforced maximum of 86400 seconds (24 hours). Routers SHOULD clamp buyer-provided values to a configured maximum (recommended: 3600 seconds) to limit the blast radius of stale caches. +- Identity Match responses are bound by `serve_window_sec` (per-package single-shot fcap, max 300s, default 60s). Routers MAY apply an internal deduplication cache keyed on `{identities_hash, provider_id, package_ids_hash, consent_hash}`, where `identities_hash` is the SHA-256 of the canonical `identities` bytes defined in [Identity Match signed fields](#identity-match-signed-fields) (computed over the per-provider filtered subset); `package_ids_hash` is SHA-256 over the JCS serialization of the sorted `package_ids` array; `consent_hash` is SHA-256 over the JCS serialization of the request's `consent` object (or JCS `null` when the field is absent — this distinguishes "consent unknown" from an explicit-empty consent object). JCS framing prevents delimiter-injection: raw consent strings or package IDs containing `|`, `,`, or `\n` cannot collide two distinct inputs. Including the identity set ensures that adding or removing tokens produces a distinct cache entry. Including the package list hash ensures cached responses are invalidated when the active package set changes (e.g., a new media buy activates). Including the consent hash prevents eligibility decisions taken under one consent state from being served under another. The publisher's binding contract is the serve-window throttle, not the router's internal cache window. +- When a provider's targeting configuration changes (new packages, updated targeting rules), the provider SHOULD return `"cache_ttl": 0` (Context Match) or `"serve_window_sec": 1` (Identity Match) until the change has propagated, then resume normal values. +- `cache_ttl` (Context Match) has a schema-enforced maximum of 86400 seconds. 
`serve_window_sec` is bounded at 300 seconds — longer windows make per-package fcap too coarse for typical campaigns, and windows shorter than the IdentityMatch round-trip waste the throttle. ## Conformance Levels diff --git a/specs/identitymatch-fcap-architecture.md b/specs/identitymatch-fcap-architecture.md new file mode 100644 index 0000000000..0b2c1c315e --- /dev/null +++ b/specs/identitymatch-fcap-architecture.md @@ -0,0 +1,142 @@ +# IdentityMatch & Frequency Capping — Architecture Spec + +**Status**: landed (architecture decisions). Implementation guidance promoted to `docs/trusted-match/`. +**Target release**: 3.0.1 (additive wire change). +**Branch**: `bokelley/idmatch-design` +**PR**: [#3359](https://github.com/adcontextprotocol/adcp/pull/3359) + +This spec captures the architecture decisions behind the buyer-side IdentityMatch surface in TMP. It is a **design-history document**, not an implementation reference — the authoritative implementation guidance lives in: + +- [`docs/trusted-match/specification.mdx`](../docs/trusted-match/specification.mdx) — wire spec (normative): `serve_window_sec` field, conformance invariants for IdentityMatch eligibility, TMPX binary format. +- [`docs/trusted-match/identity-match-implementation.mdx`](../docs/trusted-match/identity-match-implementation.mdx) — implementation guidance (non-normative): `fcap_keys` label model, reference valkey data model, merge rules, SDK primitives, pluggable store interfaces, production topology, conformance scenarios. +- [`docs/trusted-match/buyer-guide.mdx`](../docs/trusted-match/buyer-guide.mdx) — buyer-agent integration walkthrough; updated for `serve_window_sec` semantic. +- [`docs/trusted-match/migration-from-axe.mdx`](../docs/trusted-match/migration-from-axe.mdx) — adds OpenRTB 2.6 `User.eids` cross-walk for buyers bridging from OpenRTB-shaped pipelines. + +Read this doc when you want to understand **why** the design landed where it did.
Read the docs above when you want to **implement** against it. + +## Problem + +The TMP IdentityMatch wire spec defines what flows on the wire: identity tokens in, eligible package IDs and an HPKE-encrypted exposure token (`tmpx`) out. It did not previously define: + +1. **Buyer-side data model** — what records the buyer maintains to compute eligibility (audiences, exposures, fcap policy), and how those records are keyed. +2. **Frequency-cap semantics** — what dimensions a cap can apply to (campaign, advertiser, group, …) and how multi-identity users are handled. +3. **Cross-language SDK scope** — what primitives ship across `@adcp/client`, `adcp-go`, and `adcp` (Python), and how HPKE key management slots into existing AdCP key plumbing. +4. **Audience freshness vs. response throttle** — `ttl_sec` was documented as a router cache TTL but operationally functioned as a per-package serve throttle, conflating two distinct concerns. +5. **Conformance** — how a third party validates that an IdentityMatch implementation is correct. + +Without these decisions, the open-source IdentityMatch reference impl risked shipping with Go-shaped assumptions baked into wire-adjacent surfaces. + +## Architectural decisions + +### 1. Three layers, with explicit normative status + +| Layer | Status | What it covers | +|---|---|---| +| **Wire spec** | Normative | HTTP JSON, `serve_window_sec` semantic, TMPX binary format. Anything crossing an agent boundary. | +| **Conformance invariants** | Normative | The eligibility logic an IdentityMatch service MUST compute, expressed in terms of inputs (identities, packages, audiences, policies, exposures) and outputs (eligible_package_ids). Storage-agnostic. | +| **Reference data model** | Non-normative | Scope3's valkey-backed implementation choice. Buyers running Aerospike, DynamoDB, or anything else are conformant if their service satisfies the invariants. 
| + +The protocol describes **what** the service must compute, not **how** it stores the data. SDK exposes pluggable store interfaces; valkey is the reference connector. + +### 2. `fcap_keys[]` as a label model, not hierarchy + +`dimension:value` (e.g. `campaign:42`, `advertiser:13`). Multi-tenant operators adopt a tenant prefix as a deployment convention (e.g. `buyer-acme:campaign:42`) — not a protocol requirement. Charset constraint `[a-zA-Z0-9_-]+` per segment for unambiguous parsing. Buyers choose dimensions; the protocol does not enumerate them. See [implementation guide § fcap_keys label model](../docs/trusted-match/identity-match-implementation.mdx#fcap_keys-label-model). + +### 3. Cross-identity dedup via `impression_id`, not merge rules + +Records are keyed by `(uid_type, user_token)`. Buyers running their own identity graph can canonicalize before write/read; the protocol stays agnostic. Multi-identity dedup is handled at eligibility-check time by deduplicating exposure-log entries by `impression_id` — a single impression resolved to multiple identity tokens has the same `impression_id` written to all identity logs, and the read-time union recovers the count exactly. + +This approach is correct by construction for **graphless and graph-canonicalizing operators alike**, with no merge-rule policy needed. Earlier drafts of this design proposed counter-based exposure tracking with a `merge_rule` (MAX/OR/SUM) policy field; that approach under-counts when identity resolution toggles across impressions (a real concern given Scope3 is graphless). The `adcp-go/targeting/` reference impl already uses log-based dedup; this spec aligns with the existing impl rather than the abandoned counter design. See [implementation guide § Identity handling and cross-identity dedup](../docs/trusted-match/identity-match-implementation.mdx#identity-handling-and-cross-identity-dedup). + +### 4. 
`serve_window_sec` replaces `ttl_sec` + +The original `ttl_sec` field was documented as a router cache TTL but operationally functioned as a per-package single-shot fcap. Two distinct concerns sharing one knob meant tuning for cost (long cache) silently broke fcap, and tuning for fcap (short cache) wasted IdentityMatch round-trips. + +Replacement: `serve_window_sec` (1–300, default 60) with the corrected semantic — *after serving the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again.* + +`ttl_sec` is removed. No deprecation window: TMP is pre-launch (experimental, pre-3.0.0 GA) and not subject to deprecation cycles. The field is not present in the 3.0.1 schema. + +### 5. Two composable SDK primitives for impression handling, not one + +Per Slack alignment with Baiyu (Scope3 impression-tracker owner): + +``` +decodeTmpx(raw_tmpx) -> ExposureLog // pure crypto + parse +writeExposure(log, store_context) -> { ok } // pure store interaction +``` + +Production topology is `pixel → tracking endpoint → pub/sub → frequency_writer → valkey`. A bundled `recordImpression()` would force synchronous topology and break the buffering pattern. Two composable functions let any topology compose them. + +The same two primitives ship in `adcp-go`, `@adcp/client` (TypeScript), and `adcp` (Python). The spec and SDK are language-neutral; the implementer picks the language that fits their infra. + +### 6. TMP IdentityMatch service is a downstream read replica + +The TMP server reads valkey on each `/identity` call. Writes go through the SDK directly to valkey (production management plane). No new wire endpoints for fcap policies, package CRUD, or impressions — all SDK-side. TMP server stays minimal. + +### 7. `sync_audiences` is the audience on-ramp + +The existing wire `sync_audiences` task has `add[]`/`remove[]` deltas of audience-member objects — exactly the CRUD shape the IdentityMatch backend needs.
No schema extension required. + +## Open questions + +1. **`fcap_keys` generalization in `adcp-go/targeting`.** The reference impl currently uses scalar `package_id` and `campaign_id`; the spec defines arbitrary `fcap_keys` (advertiser, creative, line-item, etc.). Generalizing the reference impl is an in-flight refactor. +2. **Atomic exposure-log append.** Reference impl uses read-modify-write per identity, which is not atomic. Comment in `engine.go:478` explicitly accepts under-counting under contention as benign. Atomic append via Lua or a `Store.Append` method is a deferred optimization. +3. **Cap on policies per fcap_key.** One policy per key for v1; cross-cutting caps (per-day AND per-hour) are expressed as multiple keys. +4. **Identity-graph plug-point.** Pre-write/pre-read interceptors in the SDK. Default: identity passthrough. +5. **Pluggable store interface signatures.** Modeled on `adcp-go/targeting/store.go`. Specific TS/Python signatures pinned to `adcp-client#1005`. +6. **Where do fcap policies live on the wire (if anywhere)?** Currently SDK-only. Could embed in `create_media_buy` packages or add a new wire task. Decide before SDK ships. +7. **Audience strength scores.** Reference impl already supports per-segment scores in `UserProfile.Segments`. SDK should expose the strength floor at eligibility time. +8. **Production-deployment perf benchmarks.** Mock-store numbers cover the in-process eligibility path: realistic Scope3-shape load (1000 pkg × 1000 log × 3 ids) is ~7.5 ms CPU/request — comfortable. Pathological tail (1000 pkg × 10K log × 3 ids) is ~58 ms CPU/request — outside the 30 ms p95 budget. Network round-trip to real co-located valkey, cluster sharding, and tail-latency under load all need real benchmarks. Tracked as a rollout-plan deliverable. +9. **Pre-aggregate-per-fcap_key optimization** ([adcp-go#103](https://github.com/adcontextprotocol/adcp-go/pull/103) — landed as in-flight upstream PR). 
Pre-buckets the exposure log by filter hash once per request; per-package check walks only the matching bucket instead of re-scanning the full log. Heuristic-gated at `numPackages > 50` so small-package requests stay on the naive path (avoids a measured ~3× regression on small requests with heavy logs). Measured speedups: 1000 pkg × 1000 log × 3 ids: ~26×; 1000 pkg × 10K log × 3 ids: ~38× (pathological tail drops from ~58 ms to ~1.5 ms, well within the latency budget). + +## Deferred security & privacy issues (follow-up) + +These came out of pre-merge review. Each warrants a focused follow-up rather than blocking this design from landing. + +1. **TMPX harvest → competitor-suppression attack.** TMPX in publisher creative URLs is harvestable. Without per-impression binding (creative_id, slot_id, ts) inside the AEAD AAD, an attacker fires harvested tokens at the buyer's impression endpoint to inflate fcap counts and starve a target user out of a campaign. Mitigation: bind TMPX to per-impression context, or rate-limit per token at the impression handler. +2. **Eligibility-as-audience-membership oracle.** A malicious publisher submits honeypot `package_ids` and observes which return eligible to reconstruct the user's audience profile. The "publishers don't see audience records" privacy claim is wire-correct but functionally false. Mitigation: package-ownership check at IdentityMatch ingress, or k-anonymity floor on eligibility responses. +3. **Consent revocation between IdentityMatch and impression.** TMPX has no consent fingerprint; if consent is revoked during the cache window, the impression handler still writes an exposure record. GDPR/TCF problem. +4. **Side-channel via eligibility deltas.** A router observing two responses for the same user 30s apart sees `eligible_package_ids` shrink as caps trip — fingerprinting fcap state per-user. +5. 
**`hashed_email` in TMPX widens identity-leak surface.** Putting unsalted SHA-256 email inside a creative URL macro re-identifies on token leak. Either prohibit `hashed_email` in TMPX plaintext or require salting. +6. **DoS amplification via large `package_ids[]`.** Per-IdentityMatch valkey reads scale `O(|identities| × |candidate_packages| × |fcap_keys_per_package|)` — at 25k packages from a busy publisher, this is an amplification primitive. Cap `candidate_packages` at IdentityMatch ingress. +7. **Rollout work-plan ownership gaps.** No named owner for the eligibility-evaluator hot path, observability/SLO, key-rotation drill, or load testing. Address before the SDK ships. + +## Rollout plan + +### What this PR landed + +- Wire spec change (additive): `serve_window_sec` field on `identity-match-response.json`. `ttl_sec` removed (pre-launch, no deprecation cycle needed). +- Doc updates to `docs/trusted-match/specification.mdx`, `buyer-guide.mdx`, `migration-from-axe.mdx`. +- New page: `docs/trusted-match/identity-match-implementation.mdx` (implementation guide). +- This architecture-rationale doc. + +### Next workstreams (not in this PR) + +1. **`@adcp/client` V6 (TS)** — tracked under `adcp-client#1005`. Implements `decodeTmpx` / `writeExposure` / `upsertAudience` / `upsertPackage` / `upsertFcapPolicy` / `inspectExposure`. Pluggable store interfaces. Valkey reference connector. HPKE encrypt/decrypt. +2. **`adcp-go` and `adcp` (Python) parity** — same primitive surface as the TS SDK. +3. **`adcp-go/identitymatch` reference TMP server** — open-source read replica for `POST /identity`. Reads via the SDK's pluggable store interfaces. +4. **Scope3 hosted IdentityMatch** — public deployment for buyers who don't want to host their own service. +5. **Training agent integration** — hosts both AdCP MCP/A2A and TMP `/identity` surfaces, sharing valkey internally. End-to-end IdentityMatch demo. +6. 
**Conformance harness** — runner script that uses the SDK to seed state and asserts behavior, plus calls the TMP server's `/identity` to validate eligibility responses. Lives as integration tests inside `@adcp/client` and `adcp-go`. The five conformance scenarios in the [implementation guide](../docs/trusted-match/identity-match-implementation.mdx#conformance-scenarios) map directly onto runnable test cases. +7. **TMP graduation (target: 3.1.0)** — TMP enters `supported_protocols` (currently in `experimental_features` as `trusted_match.core`). At that point AdCP storyboards can wrap the SDK-driven harness if cross-protocol integration testing becomes useful. + +## Threads consolidated from Slack 2026-04-26 + +- **Thread 1 (exposure struct location):** resolved by the three-layer model. Cross-language interop is at the Redis-operation level (`HINCRBY`, `SADD`); no proto, no JSON Schema for buyer-internal records. TMPX wire format stays as published in `docs/trusted-match/specification.mdx`. +- **Thread 2 (campaign isn't AdCP):** resolved by the `fcap_keys[]` label model. No fixed dimensions; customers choose. Tenant prefix remains a deployment convention, not a protocol requirement. Seller agent + package_id remains the seller-side identifier per `core/seller-agent-ref.json`. +- **Thread 3 (campaign logic in IdentityMatch):** resolved by the conformance invariants — backend-agnostic eligibility logic in the wire spec. +- **Thread 4 (campaign sync via Cerberus):** resolved — direct CRUD writethrough via SDK; no Cerberus. + +## Threads consolidated from Slack 2026-04-30 (impression handling) + +Per discussion with @bhuo (Scope3 impression-tracker owner) and Brian: + +- The SDK ships impression handling as **two composable functions**, not a single bundled call. `decodeTmpx` (pure crypto + parse) and `writeExposure` (pure store interaction). Production deployments separate decode at intake (synchronous) from write downstream (asynchronous, behind a queue) for buffering. 
Bundling forces synchronous topology and breaks the pattern. +- "JS for writers, Go for reader" framing was wrong — Brian's "JS" was shorthand for "the language the impression tracker runs in," currently Go at Scope3. Spec/SDK is language-neutral; the same two primitives ship in `adcp-go`, `adcp-ts`, `adcp-py`. +- Pub/sub buffering, retries, dedup, observability, abuse protection are deployment concerns, not protocol concerns. SDK ships the building blocks; topology is the implementer's choice. + +## Threads consolidated from PR #3359 review + +- **@oleksandr's normative/reference layering question:** the original spec called the buyer-side valkey schema "normative" while leaving an open question for a pluggable FrequencyStore interface. Inconsistent. Resolved by the three-layer model — wire spec + conformance invariants are normative; reference data model is Scope3's implementation choice, swappable. +- **Brian: counters can't dedup across identities, what about an exposure log keyed per-identity with imp_id-based dedup?** Direct comparison led to walking through correctness (counter+MAX under-counts when identity resolution toggles, log+imp_id is exact), then perf math (counter pipelined ~10-30ms vs log ~3-10ms — log structurally faster). Surveyed `adcp-go/targeting/`: the log approach is **already implemented and shipping**. Spec was speculating about an architecture the codebase had already chosen. Pivot: spec rewritten to match the existing reference impl (per-identity binary exposure log with `impression_id` dedup, single MGet read pattern, sliding window via timestamp filter, prune-on-write). All the merge-rule, FIXED/SLIDING, counter-comparison content removed. Real perf numbers from `targeting/scale_test.go` substituted for envelope math. +- **`fcap_keys` generalization** (Brian's call: "B is what we want"): spec defines the label model (`tenant:dimension:value`) as the design direction. 
The current reference impl uses scalar `package_id`+`campaign_id`; generalizing it to arbitrary fcap_keys is an in-flight refactor in `adcp-go/targeting`. New buyer impls SHOULD build against the label model directly. diff --git a/static/schemas/source/index.json b/static/schemas/source/index.json index 646acbb773..e96f178bf2 100644 --- a/static/schemas/source/index.json +++ b/static/schemas/source/index.json @@ -1555,7 +1555,8 @@ "description": "Per-package eligibility — boolean eligible plus optional intent score" } } - } + }, + "implementation-guidance": "Conformance invariants and a reference (non-normative) valkey-backed buyer-side data model are documented in specs/identitymatch-fcap-architecture.md. Storage backend is an implementation choice; conformant services may use any store that satisfies the invariants." }, "brand-protocol": { "description": "Brand protocol for identity retrieval, rights discovery, acquisition, and lifecycle management", diff --git a/static/schemas/source/tmp/identity-match-response.json b/static/schemas/source/tmp/identity-match-response.json index 39e83c6946..0bb95e6433 100644 --- a/static/schemas/source/tmp/identity-match-response.json +++ b/static/schemas/source/tmp/identity-match-response.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/tmp/identity-match-response.json", "title": "Identity Match Response", - "description": "Response indicating which packages the user is eligible for. The ttl_sec field defines a caching contract: the router caches this response and returns cached eligibility without re-querying the buyer during the TTL window. Extension fields (ext, context) are intentionally omitted to prevent data leakage across the identity privacy boundary.", + "description": "Response indicating which packages the user is eligible for. 
The serve_window_sec field defines a per-package single-shot fcap: after serving the user one impression on each eligible package, the publisher MUST re-query Identity Match before serving from those packages again. Extension fields (ext, context) are intentionally omitted to prevent data leakage across the identity privacy boundary.", "x-status": "experimental", "type": "object", "properties": { @@ -22,11 +22,12 @@ "type": "string" } }, - "ttl_sec": { + "serve_window_sec": { "type": "integer", - "description": "How long the router should cache this response, in seconds. The router returns cached eligibility without re-querying the buyer during this window. A value of 0 means do not cache.", - "minimum": 0, - "maximum": 86400 + "description": "Per-package single-shot fcap window, in seconds. After serving the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again. This is NOT a router response cache TTL — it is a buyer-asserted serve throttle. Multi-impression frequency caps are handled separately by buyer-side exposure records and policies, updated out-of-band via TMPX impression callbacks. Default 60. Maximum 300 — longer windows reduce IdentityMatch load but coarsen fcap granularity below what most campaigns require.", + "minimum": 1, + "maximum": 300, + "default": 60 }, "tmpx": { "type": "string", @@ -37,7 +38,7 @@ "type", "request_id", "eligible_package_ids", - "ttl_sec" + "serve_window_sec" ], "additionalProperties": true }