
feat(signing): v3-identity Tier 1 — BrandJsonJwksResolver + CapabilityCache (port from JS)#345

Merged
bokelley merged 2 commits into main from bokelley/v3-tier1-jwks-port
May 2, 2026

Conversation

@bokelley
Contributor

@bokelley bokelley commented May 2, 2026

Summary

Port of three JS-side modules that close Python's gap on the JWKS-resolution layer of the v3 trust chain. The verifier machinery already exists in `adcp.signing/` (verify_starlette_request, AsyncCachingJwksResolver, ReplayStore, etc.); this connects the brand-side discovery to the existing verifier.

  • `BrandJsonJwksResolver` — implements `AsyncJwksResolver`. Walks brand.json, follows `authoritative_location`/`house` redirects, picks agent by `(type, id?, brand_id?)`, falls back to origin-bound `/.well-known/jwks.json` (security: prevents cross-origin trust pivot). Composes with existing `AsyncCachingJwksResolver`.
  • `CapabilityCache` — TTL-based per-agent cache for `request_signing` capability blocks. Key format byte-identical to JS for cross-language Redis interop.
  • `ensure_capability_loaded` — async primer with fail-open negative-cache TTL (60s), in-flight dedup via `asyncio.Future`, MCP/A2A transport unwrapping.
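The TTL and negative-cache behaviour described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `CapabilityCache` API: `TtlCapabilityCache`, its method names, and the 300-second positive TTL are assumptions; only the 60-second negative TTL comes from the description.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

NEGATIVE_TTL_SECONDS = 60.0   # from the PR description
POSITIVE_TTL_SECONDS = 300.0  # assumed value, for illustration only


@dataclass
class CacheEntry:
    capability: Optional[dict]  # None marks a negative (fetch-failed) entry
    expires_at: float


class TtlCapabilityCache:
    """Hypothetical per-agent TTL cache with an injectable clock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, CacheEntry] = {}

    def put(self, key: str, capability: dict) -> None:
        # Successful discovery: keep the capability block for the positive TTL.
        self._entries[key] = CacheEntry(capability, self._clock() + POSITIVE_TTL_SECONDS)

    def put_negative(self, key: str) -> None:
        # Failed discovery: remember the failure briefly so callers fail open
        # without re-fetching on every request.
        self._entries[key] = CacheEntry(None, self._clock() + NEGATIVE_TTL_SECONDS)

    def get(self, key: str) -> Optional[CacheEntry]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return entry
```

Injecting the clock keeps expiry testable without sleeping; a caller distinguishes "no entry" (`None`) from "known failure" (`entry.capability is None`).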

10 typed error codes match JS for cross-language conformance.

Why this is a port, not new design

JS already has `brand-jwks.ts` (577 LOC), `capability-cache.ts` (119 LOC), and `capability-priming.ts` (159 LOC). Python's first-draft RFC proposed designing these from scratch; a code audit of both sides showed that the actual gap is just porting the JS work. The cache key format and error code list are byte-equal, so a future shared cache works cross-language.

Composes with

  • Future Tier 2 — `BuyerAgentRegistry` per adcp-client#1269. The spine of round-2 commercial-layer identity. Lands without #3690.
  • Future Tier 3 — `BrandAuthorizationResolver` per adcp#3690. Per-request authz check (eTLD+1 binding, `authorized_operators[]`). Gated on the spec.

Full design: `docs/proposals/v3-identity-bundle-design.md`.

Test plan

  • `pytest tests/test_brand_jwks.py tests/test_capability_cache.py` — 61 new tests pass
  • Full suite — 2,975 pass, 0 regressions
  • `ruff check` — clean
  • `mypy` — clean
  • Protocol conformance — `isinstance(resolver, AsyncJwksResolver)` passes
  • Cross-language cache key — `build_capability_cache_key` byte-equal to JS `buildCapabilityCacheKey`
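For readers unfamiliar with the Protocol conformance check in the test plan, here is a small sketch of how a runtime-checkable Protocol behaves. `AsyncJwksResolverLike` is a stand-in defined for this example; the real `AsyncJwksResolver` signature may differ, and `StaticResolver` is hypothetical.

```python
import asyncio
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class AsyncJwksResolverLike(Protocol):
    """Stand-in for AsyncJwksResolver; the real method signature is assumed."""

    async def resolve(self, kid: str) -> Optional[Dict[str, Any]]:
        ...


class StaticResolver:
    """Hypothetical resolver backed by an in-memory dict of JWKs."""

    def __init__(self, keys: Dict[str, Dict[str, Any]]) -> None:
        self._keys = keys

    async def resolve(self, kid: str) -> Optional[Dict[str, Any]]:
        return self._keys.get(kid)


class NotAResolver:
    pass
```

Note that `isinstance` against a runtime-checkable Protocol only verifies that the named methods exist; it does not check argument types or that `resolve` is actually a coroutine function, so static checking with `mypy` remains the stronger guarantee.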

Security checks

  • SSRF on brand.json fetch — uses `httpx.AsyncClient` with default port restrictions; `allow_private_destinations` opt-in for dev
  • Bare-hostname validation on `house` redirect — exact regex from `schemas/cache/brand.json`
  • Cross-origin trust pivot prevention on well-known JWKS fallback (origin must match `brand.json` origin)
  • Userinfo rejection in URL canonicalization
  • Redirect loop + depth-cap enforcement
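The redirect-loop and depth-cap check from the list above might look roughly like this. It is a sketch under assumptions: `walk_brand_json`, `BrandWalkError`, the error code strings, and the cap of 3 are illustrative, only `authoritative_location` redirects are followed, and URLs are assumed to be canonicalized before they reach the walker.

```python
from typing import Callable, Set

MAX_REDIRECT_DEPTH = 3  # assumed cap; the PR does not state the real value


class BrandWalkError(Exception):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str, url: str) -> None:
        super().__init__(f"{code}: {url}")
        self.code = code


def walk_brand_json(
    start_url: str,
    fetch: Callable[[str], dict],
    max_depth: int = MAX_REDIRECT_DEPTH,
) -> dict:
    """Follow authoritative_location redirects until a terminal document."""
    seen: Set[str] = set()
    url = start_url
    # One initial fetch plus at most max_depth redirects.
    for _ in range(max_depth + 1):
        if url in seen:
            raise BrandWalkError("redirect_loop", url)
        seen.add(url)
        doc = fetch(url)
        target = doc.get("authoritative_location")
        if not isinstance(target, str):
            return doc  # terminal document: no further redirect
        url = target
    raise BrandWalkError("redirect_depth_exceeded", url)
```

Passing `fetch` as a parameter keeps the walker free of network code, so the loop and depth logic can be tested against an in-memory dict.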

Files

| File | Lines |
| --- | --- |
| `src/adcp/signing/brand_jwks.py` | 673 |
| `src/adcp/signing/capability_cache.py` | 177 |
| `src/adcp/signing/capability_priming.py` | 202 |
| `tests/test_brand_jwks.py` | 645 |
| `tests/test_capability_cache.py` | 331 |
| `src/adcp/signing/__init__.py` (exports) | +28 |
| `docs/proposals/v3-identity-bundle-design.md` | 659 |

Cross-links

  • Design: `docs/proposals/v3-identity-bundle-design.md`
  • JS source we're porting: `adcp-client/src/lib/signing/brand-jwks.ts`, `capability-cache.ts`, `capability-priming.ts`
  • ADCP #3690 — formalizes verifier chain (eTLD+1, `authorized_operators[]`)
  • adcp-client#1269 — `BuyerAgentRegistry` design (Tier 2 follow-up)
  • adcp-client#1249 — JS-side parity gaps (supervisor, audit, subdomain routing)

🤖 Generated with Claude Code

…ity priming from JS

Tier 1 of the v3-identity bundle (RFC at
docs/proposals/v3-identity-bundle-design.md).

Direct port of three JS-side modules:

* BrandJsonJwksResolver — implements adcp.signing.AsyncJwksResolver.
  Walks brand.json/agents[], follows authoritative_location/house
  redirects with bare-host validation, picks agent by
  (type, id?, brand_id?), falls back to origin-bound
  /.well-known/jwks.json (security: prevents cross-origin trust pivot),
  honors Cache-Control + ETag, cascade refresh on unknown kid.
  10 typed error codes match JS for cross-language conformance.
  Composes with AsyncCachingJwksResolver for inner JWK caching —
  does NOT reinvent that layer.

* CapabilityCache + build_capability_cache_key — TTL-based per-agent
  cache for request_signing capability blocks. Cache-key format
  byte-identical to JS so a future shared Redis cache works
  cross-language.

* ensure_capability_loaded — async primer with fail-open negative-
  cache TTL (60s) on fetch failures, in-flight dedup via
  asyncio.Future, MCP/A2A transport unwrapping (structuredContent,
  content[].text, result.artifacts[].parts[].data,
  result.parts[].data).
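The MCP/A2A transport unwrapping listed above can be pictured with a small sketch. The four payload paths come from the commit message; the function name `unwrap_payload` and the order in which the shapes are tried are assumptions for illustration.

```python
import json
from typing import Any, List, Optional


def _as_list(value: Any) -> List[Any]:
    """Return value if it is a list, else an empty list."""
    return value if isinstance(value, list) else []


def unwrap_payload(response: dict) -> Optional[dict]:
    """Extract the first dict payload from an MCP- or A2A-shaped response."""
    # MCP: structuredContent carries the payload directly.
    structured = response.get("structuredContent")
    if isinstance(structured, dict):
        return structured

    # MCP: content[].text may carry the payload as a JSON string.
    for item in _as_list(response.get("content")):
        text = item.get("text") if isinstance(item, dict) else None
        if isinstance(text, str):
            try:
                parsed = json.loads(text)
            except json.JSONDecodeError:
                continue
            if isinstance(parsed, dict):
                return parsed

    # A2A: result.artifacts[].parts[].data, then result.parts[].data.
    result = response.get("result")
    if isinstance(result, dict):
        part_lists = [
            _as_list(artifact.get("parts"))
            for artifact in _as_list(result.get("artifacts"))
            if isinstance(artifact, dict)
        ]
        part_lists.append(_as_list(result.get("parts")))
        for parts in part_lists:
            for part in parts:
                data = part.get("data") if isinstance(part, dict) else None
                if isinstance(data, dict):
                    return data
    return None
```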

This is the JWKS-resolution layer of the v3 trust chain. The
verifier machinery already exists in adcp.signing/ (verify_starlette_request,
AsyncCachingJwksResolver, ReplayStore, etc.); this connects the
brand-side discovery to the existing verifier.

Composes with future Tier 2 (BuyerAgentRegistry per
adcontextprotocol/adcp-client#1269) and Tier 3
(BrandAuthorizationResolver, gated on adcontextprotocol/adcp#3690
for eTLD+1 binding + authorized_operators[]).

Tests: 61 new (34 brand_jwks, 27 capability_cache + priming),
2,975 total pass. ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

bokelley commented May 2, 2026

Issue #346 requests renaming the brand-operator authorization Protocol from AdagentsResolver to BrandAuthorizationResolver in docs/proposals/v3-identity-bundle-design.md. Since that file is in this PR's diff, consider folding the rename here before merging — the AdCP spec (#3831) now carries a normative recommendation for BrandAuthorizationResolver, and aligning the Python proposal doc with that name before this PR lands avoids a separate rename PR and keeps the cross-SDK naming consistent from day one.

(Triage-managed comment from issue #346.)


Generated by Claude Code

…canon

Six findings from the four expert reviews on Tier 1 JWKS port:

**Critical (security):**

* SSRF on brand.json fetch — brand.json walker now uses
  ``build_async_ip_pinned_transport`` per hop with ``trust_env=False``,
  matching the JWKS fetcher's posture. Without this, an
  attacker-controlled ``authoritative_location`` could redirect the
  fetch chain to private IPs / cloud-metadata endpoints. The brand.json
  walker is the verifier's trust root — its SSRF posture must be at
  least as strict as JWKS.
* No response-body size cap — counterparty serving a multi-megabyte
  brand.json would OOM the verifier. Added ``DEFAULT_MAX_BRAND_JSON_BYTES``
  (256 KiB) cap, rejected before parse with ``invalid_body``.
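A rough sketch of enforcing the body-size cap before parsing, as described in the second bullet. `DEFAULT_MAX_BRAND_JSON_BYTES` and the `invalid_body` code come from the commit message; `read_capped_json`, `BrandJsonError`, and the choice to reuse `invalid_body` for malformed JSON are assumptions.

```python
import json
from typing import Iterable

DEFAULT_MAX_BRAND_JSON_BYTES = 256 * 1024  # 256 KiB, per the commit message


class BrandJsonError(Exception):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def read_capped_json(
    chunks: Iterable[bytes],
    max_bytes: int = DEFAULT_MAX_BRAND_JSON_BYTES,
) -> dict:
    """Accumulate streamed chunks, aborting as soon as the cap is exceeded."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            # Stop reading immediately; never hand an oversized body to the parser.
            raise BrandJsonError("invalid_body", f"body exceeds {max_bytes} bytes")
    try:
        doc = json.loads(bytes(buf))
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError subclass ValueError
        raise BrandJsonError("invalid_body", "body is not valid JSON") from exc
    if not isinstance(doc, dict):
        raise BrandJsonError("invalid_body", "top-level value is not an object")
    return doc
```

Checking the running total per chunk, rather than the final length, bounds memory at roughly the cap plus one chunk even when the peer streams indefinitely.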

**Critical (correctness):**

* URL canonicalization gap — ``urlsplit`` lowercases scheme but NOT
  host, and does not strip default ports. JS ``new URL()`` does both.
  Without these, the redirect-loop detector saw ``https://X.example/``
  and ``https://x.example/`` as distinct entries; the well-known
  fallback origin check spuriously rejected ``https://x:443`` vs
  ``https://x``. Fixed by normalizing host to lowercase and stripping
  default ports (443 for https, 80 for http).
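The canonicalization fix described above can be sketched with the standard library alone. `canonicalize_url` is a hypothetical name, and dropping the fragment and defaulting an empty path to `/` are assumptions made to mirror what JS `new URL()` serializes.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"https": 443, "http": 80}


def canonicalize_url(url: str) -> str:
    """Lowercase scheme and host, strip default ports, reject userinfo."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme but not netloc
    if parts.username is not None or parts.password is not None:
        raise ValueError("userinfo is not allowed in URL")
    host = parts.hostname  # .hostname is lowercased, unlike .netloc
    if not host:
        raise ValueError("URL has no host")
    if ":" in host:
        host = f"[{host}]"  # .hostname strips IPv6 brackets; restore them
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # Fragment is dropped (assumption): it is never sent to the server.
    return urlunsplit((parts.scheme, netloc, parts.path or "/", parts.query, ""))
```

With this in place, `https://X.example/` and `https://x.example:443/` both map to the same string, so a visited-set keyed on the canonical form catches case-aliased redirect loops.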

**Real bugs (concurrency):**

* Refresh single-flighting was serializing — replaced ``asyncio.Lock``
  with a Future-based dedup, mirroring the JS pattern. N concurrent
  ``resolve()`` calls on a cold cache now share ONE brand.json fetch
  instead of doing N fetches in series.
* ``Future`` lifecycle on ``BaseException`` in ``ensure_capability_loaded``
  — the bare ``except Exception`` previously swallowed most errors but
  let ``BaseException`` (including ``CancelledError``) propagate without
  resolving the in-flight future, leaving joined waiters hung forever.
  Now narrowed to expected discovery errors; BaseException propagates
  with the future explicitly excepted.
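Both concurrency fixes above follow one pattern: a Future-based single-flight in which the first caller fetches and later callers join the same future. The sketch below is illustrative only; `SingleFlight` and its `run` method are hypothetical names, not the PR's code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SingleFlight:
    """Hypothetical helper: concurrent callers for one key share one fetch."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Future[Any]"] = {}

    async def run(self, key: str, fetch: Callable[[], Awaitable[Any]]) -> Any:
        existing = self._inflight.get(key)
        if existing is not None:
            # Joiner: shield so that cancelling one waiter does not cancel
            # the shared future for everyone else.
            return await asyncio.shield(existing)

        fut: "asyncio.Future[Any]" = asyncio.get_running_loop().create_future()
        self._inflight[key] = fut  # registered before the first await
        try:
            result = await fetch()
        except asyncio.CancelledError:
            fut.cancel()  # wake joined waiters instead of leaving them hung
            raise
        except BaseException as exc:
            fut.set_exception(exc)
            fut.exception()  # mark as retrieved even if nobody joined
            raise
        else:
            fut.set_result(result)
            return result
        finally:
            self._inflight.pop(key, None)
```

Unlike an `asyncio.Lock`, which would let each waiter run its own fetch one after another, joiners here never call `fetch` at all. Every exit path resolves the future, which is what keeps joined waiters from hanging.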

**Real bugs (DX):**

* Capability-priming silent error swallow — bare ``except Exception``
  meant operators had no signal when a counterparty's discovery
  endpoint was poisoning their negative cache. Added WARNING-level
  logging on every fail-open path.

**Tests:**

* Tests now use a ``client_factory`` injection seam instead of
  monkey-patching ``httpx.AsyncClient.__init__`` globally — only
  resolvers built in the test scope pick up the mock; concurrent
  AsyncClient construction for unrelated reasons is unaffected.
* Added 7 new tests covering: host case lowering, default port
  stripping, scheme lowering, non-default port preservation,
  oversized-body rejection, case-aliasing redirect-loop detection,
  and N-concurrent-resolve fetch dedup.

68 tests pass (up from 61), 2960 total. ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

bokelley commented May 2, 2026

Expert review feedback addressed (commit fb65e12)

Four reviews ran (code-reviewer, security-reviewer, ad-tech-protocol-expert, python-expert). Six findings folded in.

Critical (security):

  • SSRF on brand.json fetch — every hop now uses build_async_ip_pinned_transport with trust_env=False, matching the JWKS fetcher's posture. Closes the DNS-rebinding TOCTOU and rejects RFC1918 / link-local / cloud-metadata destinations. The brand.json walker is the trust root; it cannot have a weaker SSRF posture than the JWKS endpoint it composes.
  • No body-size cap: DEFAULT_MAX_BRAND_JSON_BYTES = 256 KiB ceiling enforced before parse. A counterparty serving an adversarial multi-MB body is now rejected with invalid_body.
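To illustrate the destination classes the SSRF bullet refers to, here is a sketch of the address classification step using the standard `ipaddress` module. It is not the `build_async_ip_pinned_transport` implementation, which also pins the resolved IP for the connection; `is_public_destination` is a hypothetical helper.

```python
import ipaddress


def is_public_destination(ip_text: str) -> bool:
    """Return True only for addresses that are safe to fetch from."""
    addr = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it is judged as IPv4.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped
    return not (
        addr.is_private         # RFC 1918 and similar ranges
        or addr.is_loopback     # 127.0.0.0/8, ::1
        or addr.is_link_local   # 169.254.0.0/16 (cloud metadata), fe80::/10
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified  # 0.0.0.0, ::
    )
```

The check only closes the DNS-rebinding window if it runs on the same resolved address the socket then connects to, which is why the PR pairs it with IP pinning per hop.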

Critical (correctness):

  • URL canonicalization gap — host now lowercased, default ports stripped (443 for https, 80 for http). Fixes both the redirect-loop bypass (X.example vs x.example) and the spurious jwks_origin_mismatch rejection (https://x vs https://x:443). JS gets this for free via new URL(); Python had to wire it.
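The origin comparison behind the jwks_origin_mismatch fix can be sketched as below, treating an omitted port as the scheme default. `origin_of` and `check_jwks_origin` are hypothetical names; only the jwks_origin_mismatch code comes from the comment above.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"https": 443, "http": 80}


def origin_of(url: str) -> Tuple[str, str, Optional[int]]:
    """Return (scheme, lowercased host, effective port) for a URL."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    port = parts.port if parts.port is not None else _DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def check_jwks_origin(brand_json_url: str, jwks_url: str) -> None:
    """Reject a well-known JWKS fallback that leaves the brand.json origin."""
    if origin_of(brand_json_url) != origin_of(jwks_url):
        raise ValueError("jwks_origin_mismatch")
```

Comparing effective ports rather than raw strings is what makes `https://x` and `https://x:443` equal while still rejecting a different host or a non-default port.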

Real bugs (concurrency):

  • Refresh single-flighting was serializing — replaced asyncio.Lock with Future-based dedup, mirroring the JS pattern. N concurrent verifiers on a cold cache now share ONE brand.json fetch.
  • Future lifecycle on BaseException in ensure_capability_loaded — narrowed except to expected discovery errors (httpx.HTTPError, ValueError, TypeError, JSONDecodeError, OSError). BaseException (incl. CancelledError) propagates with the future explicitly excepted so joined waiters don't hang.

Real bug (DX):

  • Capability-priming silent error swallow — added WARNING-level logging on every fail-open path so operators see when a counterparty is poisoning their negative cache.
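A minimal sketch of a fail-open path that logs instead of swallowing silently. `prime_fail_open` and the logger name are hypothetical; the exception tuple uses only standard-library types, whereas the PR's list also includes `httpx.HTTPError`.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("capability_priming_sketch")  # hypothetical logger name

# JSONDecodeError subclasses ValueError, so it is covered here. The PR's
# real list also names httpx.HTTPError, omitted to keep this stdlib-only.
EXPECTED_DISCOVERY_ERRORS = (ValueError, TypeError, OSError)


async def prime_fail_open(
    agent_url: str,
    fetch: Callable[[str], Awaitable[dict]],
) -> Optional[dict]:
    """Return the capability block, or None (fail open) on expected errors."""
    try:
        return await fetch(agent_url)
    except EXPECTED_DISCOVERY_ERRORS as exc:
        # Fail open, but leave a trail so operators can spot a counterparty
        # that keeps landing in the negative cache.
        logger.warning(
            "capability discovery failed for %s; failing open: %r", agent_url, exc
        )
        return None
```

Anything outside the tuple, including `asyncio.CancelledError`, propagates to the caller rather than being converted into a silent fail-open.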

Tests:

  • Replaced global httpx.AsyncClient.__init__ monkey-patch with a client_factory injection seam. Only resolvers built in the test scope pick up the mock.
  • Added 7 new tests: host case lowering, default port stripping, scheme lowering, non-default port preservation, oversized-body rejection, case-aliasing redirect-loop detection, N-concurrent-resolve fetch dedup.

Verification:

  • 68 brand_jwks/capability_cache tests pass (up from 61).
  • 2,960 total pass, 0 regressions.
  • ruff + mypy clean.

Findings deferred (low/info, not merge-blockers):

  • Filing as separate issue / Tier 2: spec-fidelity question on id-required-per-schema (would change agent_ambiguous to schema_invalid for duplicate-type). JS has the same looseness; coordinating with the #1269 author.
  • Documenting the same-origin constraint on jwks_uri fallback (stricter than literal schema text) — already noted in code comment; could file an issue against the schema description for clarification.
  • Per-call httpx.AsyncClient churn — fine given cooldown (~30s) gates fetch frequency. Refactor to shared client is an optional future optimization.

@bokelley
Contributor Author

bokelley commented May 2, 2026

Thanks for the thorough write-up, @bokelley. The six findings look well-addressed: the SSRF hardening on the brand.json walker and the Future-based dedup for concurrent resolvers are the right calls. Deferring the id-required question (agent_ambiguous vs. schema_invalid) until #1269 coordinates cross-language is also sensible; that's not a merge blocker.

No action needed from triage. This is in good shape for human review.


Generated by Claude Code
