Make the SDK fast for 1-agent use; adapt to multi-agent automatically #34

The SDK was built for two-cluster handoff (A records HTTP, B verifies via proofs). Real usage today — customer-support-sdk-demo, anything Claude-Code-shaped — is one agent doing everything, and that flow pays multi-agent ceremony on every tool call. This proposal removes the ceremony and speeds up multi-agent at the same time. Same code path, no mode flag, no detection logic.

---

**3 tool calls, 3 claims (two of them hit the same intercept row):**

```
NOW  —  every backend round-trip is sequential

  agent  ─► tool 1 ──► [ preprocess poll ]
  agent  ─► tool 2 ──► [ preprocess poll ]
  agent  ─► tool 3 ──► [ preprocess poll ]
  build  ─► QR claim 1
  build  ─► QR claim 2   (duplicates QR1 — same intercept row)
  build  ─► QR claim 3
  eval   ─► fetch claim 1
  eval   ─► fetch claim 2
  eval   ─► fetch claim 3
  eval   ─► verify QR1
  eval   ─► verify QR2
  eval   ─► verify QR3
  ──────────────────────────────────────────────────────────►  time
```

```
PROPOSED  —  preprocess coalesced, claims deduped, the rest parallel

  agent  ─► tool 1, tool 2, tool 3                (no preprocess wait)
  worker ─►   [ 1 preprocess, debounced ]
  build  ─►   [ QR1, QR2  in parallel ]           (3 claims → 2 records)
  eval   ─►   [ fetch1, fetch2, fetch3  in parallel ]
  eval   ─►   [ verify1, verify2  in parallel ]
  ──────────────────────────────────────────────────────────►  time
```

For **K** claims over **M** unique intercept rows:

| Stage                | Now                 | Proposed                      |
|----------------------|---------------------|-------------------------------|
| Tool-call latency    | K × preprocess wait | **~0** (worker is async)      |
| Preprocess runs      | K (one per call)    | **1** (debounced)             |
| Query records made   | K (one per claim)   | **M** (deduped by row)        |
| Evaluator fetches    | K sequential        | K **parallel** (≤8 at once)   |
| Evaluator verifies   | K sequential        | M **parallel** (≤8 at once)   |

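K=10, M=5 → 10 preprocesses become 1, 10 query records become 5, and 20 sequential evaluator round-trips (10 fetches + 10 verifies, one verify per query record) become 15 round-trips run as 2 parallel stages (10 fetches, then 5 verifies, each capped at 8 in flight).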

---

## What's slow today

Walking through the same 3-tool-call, 3-claim run:

1. **Each tool call blocks while preprocess runs end-to-end.** `_storage.py:170` synchronously kicks preprocess after every intercept and polls until done. 3 calls = 3 sequential preprocess runs blocking the agent.
2. **Each claim's query record is created serially.** `payload_builder.py:141-162` loops claims doing POST /query → POST /generate_proof → poll, one at a time. K claims = K sequential round-trips, even when claims duplicate.
3. **The evaluator does the same.** `evaluator.py:113-133` fetches each claim's record serially; `:183-209` verifies each unique `query_record_id` serially.
4. **`set_interceptor_context` is mandatory and easy to get wrong.** The interceptor default `"unknown"` (`interceptor.py:238`) doesn't match the payload-builder default `"fetch_and_claim"` (`payload_builder.py:30`). Forget the wrap and the payload builder's lookup finds no intercepts, silently producing an empty payload.
5. **Bootstrap always runs preprocess.** `client.py:50` runs it even on padding-only tables.
6. **Polling floors are too high.** `_preprocess.py:102/117` poll every 0.3s / 0.1s, so any preprocess that finishes sooner still waits out the rest of the interval.

---

## The fix

**One worker thread coalesces preprocess.** Replace the synchronous per-intercept call in `_storage.py:170` with a "dirty" flag picked up by a debounced background worker. The worker runs preprocess at most once per 50 ms window, however many intercepts arrive in it. `_build_claims` and `evaluate_handoff` take a snapshot (`SELECT MAX(id) FROM provably_intercepts`) and block on a condition variable until the proof has caught up to it.

This is the whole single-vs-multi-agent story in one mechanism:

- 1 agent, 10 sequential calls → **1** preprocess (was 10)
- N agents interleaving → still 1 worker; each agent's evaluate waits for its own snapshot
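
A minimal sketch of the dirty-flag worker and snapshot fence described above, to make the coalescing concrete. The class name `PreprocessWorker`, its methods, and the `run_preprocess(upto_id)` callback are hypothetical stand-ins, not the SDK's actual API; only the 50 ms debounce and the condition-variable wait come from this proposal.

```python
import threading
import time


class PreprocessWorker:
    """Coalesces many mark_dirty() calls into few preprocess runs (sketch)."""

    def __init__(self, run_preprocess, debounce_s=0.05):
        # run_preprocess(upto_id) is a hypothetical stand-in for the real call.
        self._run = run_preprocess
        self._debounce_s = debounce_s
        self._cond = threading.Condition()
        self._latest_id = 0   # highest intercept id recorded so far
        self._proved_id = 0   # highest id covered by a finished preprocess
        self._stopped = False
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def mark_dirty(self, intercept_id):
        """Called from the intercept path; returns immediately."""
        with self._cond:
            self._latest_id = max(self._latest_id, intercept_id)
            self._cond.notify_all()

    def wait_for(self, snapshot_id, timeout=None):
        """Block until a preprocess covering snapshot_id has finished."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._proved_id >= snapshot_id, timeout=timeout
            )

    def shutdown(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        self._thread.join()

    def _loop(self):
        while True:
            with self._cond:
                self._cond.wait_for(
                    lambda: self._stopped or self._latest_id > self._proved_id
                )
                if self._stopped:
                    return
            time.sleep(self._debounce_s)  # let further intercepts pile up
            with self._cond:
                target = self._latest_id
            self._run(target)             # runs outside the lock
            with self._cond:
                self._proved_id = max(self._proved_id, target)
                self._cond.notify_all()
```

In this sketch a caller would take its snapshot id first and then call `wait_for(snapshot_id)`; intercepts that arrive after the snapshot do not extend the wait. Start and stop are explicit here only to keep the example short; the lifecycle question is still open.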

**Dedupe per-intercept query records.** Today `payload_builder.py:141-162` creates a query record per claim, even when several claims target the same intercept row. Group claims by **SQL signature** (`row_id` when the interceptor recorded one, else the fallback `WHERE action_name = '...'` at `_query_records.py:83-88`) before creating; share the resulting `query_record_id` across the group. K claims with M unique signatures → **M** query records instead of K. `evaluator.py:183-209` already dedupes by `query_record_id`, so this falls out for free downstream.
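
The grouping step could look roughly like the sketch below. The claim shape (a dict with `row_id` and `action_name`) and the `create_query_record` callback are assumptions made for illustration; the real signature logic lives in `_query_records.py`.

```python
from collections import defaultdict


def sql_signature(claim):
    """Key identifying the intercept row a claim targets (assumed claim shape)."""
    if claim.get("row_id") is not None:
        return ("row_id", claim["row_id"])
    # Fallback mirrors the WHERE action_name = '...' query.
    return ("action_name", claim["action_name"])


def create_deduped_query_records(claims, create_query_record):
    """Create one query record per unique signature and share its id."""
    groups = defaultdict(list)
    for index, claim in enumerate(claims):
        groups[sql_signature(claim)].append(index)

    record_ids = [None] * len(claims)
    for signature, indexes in groups.items():
        record_id = create_query_record(signature)  # one backend call per group
        for index in indexes:
            record_ids[index] = record_id
    return record_ids
```

The returned list is aligned with `claims`, so each claim can still carry its own `query_record_id` field even when several share one record.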

**Parallelize the per-claim loops.** Three places, all bounded `ThreadPoolExecutor(max_workers=8)`:

- Query-record creation (`payload_builder.py:141`) — over the deduped set
- Evaluator fetch (`evaluator.py:113`)
- Evaluator verify (`evaluator.py:183`)
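
All three loops could share one bounded helper along these lines. `map_bounded` is a hypothetical name; the only parts taken from this proposal are `ThreadPoolExecutor` and the cap of 8.

```python
from concurrent.futures import ThreadPoolExecutor


def map_bounded(fn, items, max_workers=8):
    """Apply fn to every item with at most max_workers calls in flight.

    Results come back in input order; an exception raised by fn
    propagates to the caller.
    """
    items = list(items)
    if not items:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(items))) as pool:
        return list(pool.map(fn, items))
```

Keeping input order means the call sites can zip results back onto claims exactly as the serial loops do today.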

**Make `set_interceptor_context` optional.** Align interceptor + payload-builder defaults to `"_default"`. Single-agent users skip the wrap entirely. Multi-agent users keep labeling agents the way they always have — no behavior change for them.
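
A toy illustration of why the defaults have to match. The dict-backed store and both function names are hypothetical; only the three default strings come from the code referenced above.

```python
DEFAULT_AGENT_ID = "_default"  # the single shared default proposed here


def record_intercept(store, row, agent_id=DEFAULT_AGENT_ID):
    """Interceptor side: file the row under the active agent id."""
    store.setdefault(agent_id, []).append(row)


def lookup_intercepts(store, agent_id=DEFAULT_AGENT_ID):
    """Payload-builder side: fetch rows for the agent id it was given."""
    return store.get(agent_id, [])


# Today's mismatch, reproduced with explicit ids: the lookup misses silently.
store = {}
record_intercept(store, {"id": 1}, agent_id="unknown")
missed = lookup_intercepts(store, agent_id="fetch_and_claim")  # []

# Aligned defaults: no wrap needed for a single agent.
record_intercept(store, {"id": 2})
found = lookup_intercepts(store)  # [{"id": 2}]
```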

**Two small wins.** Skip startup preprocess on padding-only tables (`client.py:50`). Drop polling floors to 0.05s with exponential ramp.
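
One way the adaptive polling could be shaped, as a sketch only. The 0.05s floor is from this proposal; the doubling factor, the 0.3s ceiling, the timeout, and the name `poll_until` are assumptions. `sleep` and `clock` are injectable purely so the schedule can be tested without waiting.

```python
import time


def poll_until(is_done, floor_s=0.05, ceiling_s=0.3, factor=2.0,
               timeout_s=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() with delays floor_s, floor_s*factor, ... capped at ceiling_s.

    Returns True once is_done() is truthy, False if timeout_s elapses first.
    """
    deadline = clock() + timeout_s
    delay = floor_s
    while True:
        if is_done():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))
        delay = min(delay * factor, ceiling_s)
```

With these assumed defaults a fast preprocess is noticed within about 50 ms, while a slow one settles at the old 0.3s cadence instead of being polled at 20 Hz for its whole duration.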

**One sugar.** `provably.verify(claims)` — a one-call wrapper around `build_handoff_payload` + `evaluate_handoff`. Old two-step API stays.
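
The wrapper itself is small. The sketch below takes the two existing steps as parameters so it stays self-contained; `make_verify` and the keyword-override behavior are assumptions, and whether `verify()` should accept evaluator kwargs at all is still an open question.

```python
def make_verify(build_handoff_payload, evaluate_handoff, **default_eval_kwargs):
    """Return a verify(claims) callable that chains the two existing steps."""

    def verify(claims, **overrides):
        payload = build_handoff_payload(claims)
        # Per-call overrides win over the defaults captured at construction.
        return evaluate_handoff(payload, **{**default_eval_kwargs, **overrides})

    return verify
```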

---

## Code: now vs proposed for a single-agent user

```python
# Now
provably.configure_indexing(enable_indexing=True)

set_interceptor_context(agent_id="demo", action_name="get_weather")  # mandatory
requests.get(...)

payload = provably.build_handoff_payload(claims)
verdict = provably.evaluate_handoff(
    payload, provably_base_url=..., postgres_url=..., org_id_fallback=...,
)
```

```python
# Proposed
provably.configure_indexing(enable_indexing=True)

requests.get(...)  # no wrap

verdict = provably.verify(claims)
```

---

## Files touched

- `_preprocess.py` — worker thread, cond-var sync, adaptive polling
- `_storage.py:170` — `mark_dirty()` instead of sync preprocess
- `payload_builder.py` — snapshot fence; dedupe claims by intercept row; parallel query-record creation; default `intercept_agent_id="_default"`
- `evaluator.py` — parallel Phase 1+2 fetch and Phase 3 verify
- `interceptor.py:238` — default agent_id `"_default"`
- `client.py:50` — skip bootstrap preprocess on padding-only table
- `__init__.py` — export `verify()`

No deletions. No breaking imports. No new required public surface.

---

## How we verify

- `pytest tests/unit/`, `tests/e2e/test_interceptor_e2e.py`, `tests/e2e/test_post_handoff_e2e.py` pass unchanged
- `time python examples/openai_agents/agent_run.py` before vs after
- Run customer-support-sdk-demo end-to-end: `evaluate` wall-clock time should drop substantially on multi-claim runs with no code changes
- New concurrency test: two threads insert intercepts while a third calls `_build_claims`; verify the claims reflect the highest committed `id`

---

## Open for discussion

- Is the 50 ms debounce the right default, or should it be tunable?
- Is `max_workers=8` safe against Rust BE rate limits?
- Should `verify()` accept the same kwargs as `evaluate_handoff` (timeout, etc.) or stay minimal?
- **Worker thread lifecycle**: when does it start (first `mark_dirty()`? import-time?) and how does it stop (`atexit`? explicit `shutdown()`?). Needs to be nailed down in the PR.
- **Polling floor of 0.05s**: chosen without measuring the actual preprocess-completion distribution from the Rust BE. If most preprocesses finish in 80–150 ms, 0.05s costs ~3× more polls than 0.3s with little payoff. Worth benchmarking before locking in.

