feat(decisioning): WebhookDeliverySupervisor + SQLAlchemy A2A stores example#348
feat(decisioning): WebhookDeliverySupervisor + SQLAlchemy A2A stores example#348
Conversation
…res example Round-2 P1 reliability layer for F12 sync-completion webhooks. The existing WebhookSender is the transport (HTTP-Signatures POST, one attempt, no retry); production sellers wrap it with retry, circuit breaker, and per-attempt audit. This adds the SDK-side seam. Tracks adopter feedback that salesagent's webhook reliability layer (~1,041 LOC across webhook_delivery_service.py + protocol_webhook_service.py) has no SDK home. After this PR, those LOC compose against the Protocol seam instead of being adopter-rolled. Components in src/adcp/webhook_supervisor.py: * WebhookDeliverySupervisor Protocol — async send_mcp surface, Protocol-conformant; adopters with infra-side retry (Celery, Kafka, durable outbox) implement against their queue. * InMemoryWebhookDeliverySupervisor reference impl — wraps a WebhookSender with per-endpoint CircuitBreaker (CLOSED/OPEN/HALF_OPEN state machine, 5-failure threshold, 60s recovery), exponential- backoff RetryPolicy with jitter, per-sequence_key monotonic counter for delivery-report sequence numbers, optional DeliveryLogSink Protocol for BYO persistence. * DeliveryAttempt frozen dataclass — one record per attempt (success / failure / circuit_open) for audit-log persistence. * DeliveryLogSink Protocol — adopters wire sinks to their existing webhook_delivery_log tables; sink failures swallowed (broken sink must not cascade into delivery loss). Wire-through: * PlatformHandler accepts webhook_supervisor=; F12 auto-emit routes through supervisor when configured, falls back to bare sender otherwise. Backward-compat: existing webhook_sender= path unchanged. * serve() and create_adcp_server_from_platform() forward the new param. Boot-time validate_webhook_sender_for_platform now accepts either a sender or a supervisor (changed missing-sender error code from "webhook_sender" to "webhook_sender_or_supervisor"). A2A stores example: examples/a2a_sqlalchemy_tasks.py. Companion to the SQLite reference (examples/a2a_db_tasks.py). Same Protocol surface (TaskStore + PushNotificationConfigStore) backed by SQLAlchemy ORM so the same code runs against any backend SQLA supports — SQLite for the demo, Postgres / MySQL in production. Salesagent and other SQLAlchemy-based sellers wrap their existing models behind the Protocols. Tests: 22 new in test_webhook_supervisor.py (retry math, circuit breaker state machine, supervisor success/retry/exception paths, sink failure isolation, F12 wire-through, boot-validation). Existing F12 + serve tests updated for the supervisor parameter. 79 affected tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aker_key, sink timeout
Ten findings from four expert reviews on the WebhookDeliverySupervisor:
**Critical (spec compliance):**
- Idempotency-key reuse on retry — spec requires reusing the same key
on every retry. Refactored to use sender.resend(last_result) for
attempts 2+; only attempt 1 (or after exception with no result) calls
send_mcp fresh.
**High:**
- Cross-tenant breaker collision — added optional breaker_key parameter;
multi-tenant sellers scope via f"{tenant_id}:{url}".
- Sink unbounded on hot path — wrapped in asyncio.wait_for with
configurable RetryPolicy.sink_timeout_seconds (default 5s).
- Sequence number burned on circuit-open — allocated only after
can_attempt() returns True.
- InMemoryWebhookDeliverySupervisor.__init__ now raises ValueError when
sender is None (preserves F12 boot fail-fast).
- UTC = timezone.utc moved below all imports for ruff isort compliance.
**Medium:**
- response_time_ms switched from datetime deltas to time.monotonic().
- record_success while OPEN now warns + transitions to HALF_OPEN with
success_count=1 (was silently flipping to CLOSED).
- DeliveryAttempt.notification_type new optional field for delivery
reports parity with salesagent's WebhookDeliveryLog.
- sequence_key docstring clarified per-stream recommendation
(f"{media_buy_id}:{url}").
Tests: 30 (up from 22). 8 new tests cover idempotency-key reuse via
resend, breaker_key tenant isolation, sequence-number no-burn on
circuit-open, sink timeout, notification_type passthrough, monotonic
clock, init-time None-sender rejection, sender-only backward-compat.
2,990 total tests pass. ruff + mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expert review feedback addressed (commit d267fb3)Four reviews ran (code-reviewer, security-reviewer, ad-tech-protocol-expert, python-expert). Ten findings folded in. Critical (spec compliance):
High:
Medium:
Tests: 30 (up from 22). 8 new covering: idempotency-key reuse via Verification:
Findings deferred (low/info, not merge-blockers):
|
|
Thanks for the thorough write-up. All 10 findings addressed, test count up from 22 → 30, 2,990 total passing, ruff + mypy clean — looks ready for human review. Generated by Claude Code |
|
Acknowledged — good to see all 10 findings resolved and test coverage up to 30 in this suite (2,990 total passing). Handing off to human review. Triaged by Claude Code. Session: https://claude.ai/code/session_012oVqtFW56c8kWh7HvU8SzY Generated by Claude Code |
Summary
Round-2 P1: Protocol seam for webhook delivery reliability. The existing `WebhookSender` is the transport (HTTP-Signatures POST, one attempt, no retry); production sellers wrap it with retry / circuit breaker / per-attempt audit. This adds the SDK-side seam so adopters compose existing primitives instead of writing ~1,000 LOC of reliability themselves.
Plus a SQLAlchemy A2A stores example (companion to the existing SQLite reference) so adopters with existing SQLAlchemy schemas wrap their models behind `TaskStore` + `PushNotificationConfigStore`.
What's new
`src/adcp/webhook_supervisor.py` (522 LOC):
Wire-through
SQLAlchemy A2A stores example
`examples/a2a_sqlalchemy_tasks.py` (450 LOC) — companion to `examples/a2a_db_tasks.py` (raw-SQLite reference). Same Protocol surface backed by SQLAlchemy ORM so the same code runs against any backend SQLA supports — SQLite for the demo, Postgres / MySQL in production. Salesagent and other SQLAlchemy-based sellers wrap their existing models behind the Protocols.
Salesagent impact
Test plan
Cross-links
🤖 Generated with Claude Code