
feat: {id}/{path:path} patterns in url-allowlist + json_path array indexing #21

Merged
SimoneBottoni merged 2 commits into main from feat/allowlist-patterns on May 11, 2026

@aural-psynapse aural-psynapse commented May 6, 2026

Closes #20 and #23.

Two related improvements bundled per request — both follow-ups to the trusted-endpoint pattern work in #19.

1. set_intercept_url_allowlist placeholders (closes #20)

Same FastAPI/Express-style placeholders as the trusted-endpoint registry, single shared matcher (_matches_registered from provably.trusted_endpoints):

| Placeholder | Matches | Example |
| --- | --- | --- |
| {name} | exactly one path segment | /customers/{id} matches /customers/42 but not /customers/42/orders |
| {name:path} | any subtree | /customers/{rest:path} matches both above |

Plain URLs without { keep exact-match semantics — zero migration. Consumers that already register patterns in trusted_endpoints no longer need to enumerate concrete URLs separately for the allowlist.
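
To make the placeholder semantics concrete, here is a minimal sketch of how such patterns could be translated into anchored regexes. It is not the actual `_matches_registered` implementation, which this PR does not show; `compile_pattern` and `matches` are hypothetical names, and treating `{name:path}` as "one or more characters, slashes included" is an assumption.

```python
from __future__ import annotations

import re

# Matches {name} and {name:path}; group 1 is set only for the :path form.
_PLACEHOLDER = re.compile(r"\{\w+(:path)?\}")


def compile_pattern(pattern: str) -> re.Pattern[str]:
    """Translate a placeholder URL pattern into a regex (hypothetical helper)."""
    parts: list[str] = []
    pos = 0
    for m in _PLACEHOLDER.finditer(pattern):
        # Literal text between placeholders is escaped verbatim.
        parts.append(re.escape(pattern[pos:m.start()]))
        # {name:path} spans slashes; {name} stops at the next slash.
        parts.append(".+" if m.group(1) else "[^/]+")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def matches(pattern: str, url: str) -> bool:
    """True if the whole URL matches the pattern (no prefix matching)."""
    return compile_pattern(pattern).fullmatch(url) is not None
```

Because the sketch uses `fullmatch`, a pattern with no placeholders behaves as an exact string comparison, which mirrors the "plain URLs keep exact-match semantics" rule above.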

_url_in_allowlist(nurl) does exact set membership first (O(1)), then iterates only pattern entries on a miss. Plain-URL allowlists pay no per-request iteration cost. The two callsites in _record_and_maybe_tamper (recording gate + tamper-hook gate) collapse into a single in_allowlist boolean — the membership was being computed twice.
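
The exact-first, pattern-second lookup described above can be sketched as follows. The `Allowlist` class, its `contains` method, and the `_to_regex` helper are hypothetical stand-ins, not the project's code; precompiling the pattern entries at construction time is also an assumption, since the PR only states that entries containing `{` are iterated on a miss.

```python
from __future__ import annotations

import re

_PLACEHOLDER = re.compile(r"\{\w+(:path)?\}")


def _to_regex(pattern: str) -> re.Pattern[str]:
    """Hypothetical stand-in for the shared matcher."""
    out: list[str] = []
    pos = 0
    for m in _PLACEHOLDER.finditer(pattern):
        out.append(re.escape(pattern[pos:m.start()]))
        out.append(".+" if m.group(1) else "[^/]+")
        pos = m.end()
    out.append(re.escape(pattern[pos:]))
    return re.compile("".join(out))


class Allowlist:
    def __init__(self, entries: list[str]) -> None:
        # All entries stay in one set, so plain URLs hit the O(1) path.
        self._exact = set(entries)
        # Only entries containing "{" are treated as patterns.
        self._patterns = [_to_regex(e) for e in self._exact if "{" in e]

    def contains(self, nurl: str) -> bool:
        if nurl in self._exact:  # fast path: exact set membership
            return True
        # Slow path runs only on a miss and only over pattern entries.
        return any(p.fullmatch(nurl) for p in self._patterns)
```

With an allowlist made only of plain URLs, `_patterns` is empty, so a miss costs one set lookup and no iteration.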

2. json_path array indexing (closes #23)

_get_by_json_path previously walked dot-separated dict keys only, so any list segment raised "expected dict at segment X, got list". This made field_extraction / schema_type / range_threshold unusable against any tool response containing an array.

Now supports two equivalent surfaces:

  • Bracket form: items[0].subject, [0].status, matrix[1][0].v
  • Numeric-segment fallback: items.0.subject (easier shape for naive LLMs)

Implementation:

  • _normalize_json_path lifts bracket indices into their own dot segments via regex: items[0].subject → items.[0].subject. The empty-segment filter swallows leading double-dots when paths start with a bracket.
  • New _step_into(cursor, segment) handles three cases: bracket against list, numeric against list, key against dict. Type mismatch raises KeyError, out-of-range raises IndexError.
  • evaluate_claim's except clause now also catches IndexError so out-of-range indices surface as CAUGHT with the underlying message in detail.

Pure-dict paths are unchanged.
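
The normalization and stepping logic above can be sketched roughly like this. The function names drop the leading underscore to signal that this is an illustration rather than the merged code, and the exact error messages are assumptions.

```python
import re

_BRACKET = re.compile(r"\[(\d+)\]")


def normalize_json_path(path: str) -> str:
    """Lift bracket indices into dot segments: items[0].x -> items.[0].x"""
    return _BRACKET.sub(r".[\1]", path)


def _index(cursor: list, index: int, segment: str):
    if index >= len(cursor):
        raise IndexError(f"index {index} out of range at segment {segment!r}")
    return cursor[index]


def step_into(cursor, segment: str):
    """Advance one segment: bracket vs list, numeric vs list, key vs dict."""
    m = _BRACKET.fullmatch(segment)
    if m:  # bracket form requires a list
        if not isinstance(cursor, list):
            raise KeyError(f"expected list at segment {segment!r}")
        return _index(cursor, int(m.group(1)), segment)
    if isinstance(cursor, list):  # numeric-segment fallback
        if segment.isdecimal():
            return _index(cursor, int(segment), segment)
        raise KeyError(f"expected dict at segment {segment!r}, got list")
    if isinstance(cursor, dict):
        return cursor[segment]  # missing key raises KeyError
    raise KeyError(f"cannot step into {type(cursor).__name__} at {segment!r}")


def get_by_json_path(data, path: str):
    cursor = data
    for segment in normalize_json_path(path).split("."):
        if segment:  # empty segments come from a leading bracket
            cursor = step_into(cursor, segment)
    return cursor
```

In this sketch a numeric segment is only treated as an index when the cursor is a list, so a dict with a key such as "0" is still reachable by the plain key path.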

Tests

15 new (5 for allowlist patterns, 10 for json_path array indexing). Coverage:

Allowlist patterns: pattern matches concrete URL, single-segment placeholder rejects extra segments, {rest:path} covers subtree, mixed exact+pattern allowlist preserves per-entry semantics, plain URLs still exact-only.

json_path arrays: normalization rewrites, bracket at root, bracket inside dict, numeric fallback, nested lists, out-of-range, dict-step against list still raises KeyError, existing pure-dict paths still pass, end-to-end field_extraction PASS and out-of-range CAUGHT.

131/131 tests pass (was 116 on main, +15). Ruff clean.

…rl_allowlist

Parity with the trust gate from PR #19. Reuses the `_matches_registered`
helper from `trusted_endpoints.py` instead of duplicating regex logic, so
both code paths share one matcher.

Implementation:

- `set_intercept_url_allowlist` keeps storing entries as a normalized set
  (no schema change). Pattern detection is by URL content; auto-applied.
- New private `_url_in_allowlist(nurl)` does exact set membership first
  (O(1)), then iterates only the pattern entries (those containing `{`)
  on a miss. Plain-URL allowlists pay no per-request iteration cost.
- The two callsites in `_record_and_maybe_tamper` (recording gate +
  tamper-hook gate) collapse into one `in_allowlist` boolean to avoid
  computing the membership twice.

Closes #20.

5 new unit tests covering: pattern matches concrete URL, single-segment
placeholder rejects extra segments, `{rest:path}` covers subtree, mixed
exact+pattern allowlist retains per-entry semantics, plain URLs still
exact-only (no accidental prefix match).

121/121 tests pass (was 116, +5).

@aural-psynapse self-assigned this May 6, 2026
SimoneBottoni previously approved these changes May 6, 2026

`_get_by_json_path` previously walked dot-separated dict keys only — any
list segment raised "expected dict at segment X, got list". This made
`field_extraction` / `schema_type` / `range_threshold` unusable against
any tool response containing an array.

Now supports two equivalent surfaces:

- Bracket form: `items[0].subject`, `[0].status`, `matrix[1][0].v`
- Numeric-segment fallback: `items.0.subject` (easier shape for naive LLMs)

Implementation:

- `_normalize_json_path` lifts bracket indices into their own dot
  segments via regex sub: `items[0].subject` → `items.[0].subject`.
  The empty-segment filter in `_get_by_json_path` swallows leading
  double-dots when the path starts with a bracket.
- New `_step_into(cursor, segment)` helper handles the three cases:
  bracket against list, numeric against list, key against dict. Anything
  else raises (KeyError for type mismatch, IndexError for out-of-range).
- `evaluate_claim`'s except clause now also catches IndexError so
  out-of-range indices surface as CAUGHT with the underlying message
  in the detail.

10 new tests covering: normalization rewrites, bracket at root,
bracket inside dict, numeric fallback, nested lists (matrix[i][j].k),
out-of-range, dict-step against list still raises KeyError, existing
pure-dict paths still pass, end-to-end `field_extraction` PASS and
out-of-range CAUGHT.

131/131 tests pass (was 121, +10). Closes #23.

@aural-psynapse changed the title from "feat(intercept): {id}/{path:path} patterns in set_intercept_url_allowlist" to "feat: {id}/{path:path} patterns in url-allowlist + json_path array indexing" May 6, 2026
@aural-psynapse added the enhancement label May 6, 2026
@SimoneBottoni merged commit fb08757 into main May 11, 2026
4 checks passed
