Skip to content

feat(redact): add OpenAI Privacy Filter as optional 8th detection layer#1214

Open
peyton-alt wants to merge 6 commits into
mainfrom
feat/openai-privacy-filter
Open

feat(redact): add OpenAI Privacy Filter as optional 8th detection layer#1214
peyton-alt wants to merge 6 commits into
mainfrom
feat/openai-privacy-filter

Conversation

@peyton-alt
Copy link
Copy Markdown
Contributor

@peyton-alt peyton-alt commented May 14, 2026

Summary

Opt-in 8th redaction layer that shells out to the user-installed OpenAI Privacy Filter (opf) binary. Runs only at condensation + export boundaries; per-turn writes stay on the existing 7-layer pipeline. Default-off; users opt in via redaction.openai_privacy_filter.enabled = true.

This PR was rewritten from main on 2026-05-14. The first iteration accumulated review-fix scope across multiple bot review passes and reached ~3000 insertions. The rewrite keeps every real correctness/privacy fix found during that cycle while skipping speculative scope (~1547 insertions, 4 commits, 19 files). The original work is preserved at feat/openai-privacy-filter-v1-backup for reference.

What's included

  • Two public entry points (StringWithPrivacyFilter, JSONLContentWithPrivacyFilter) plus thin Bytes wrappers — four plain entry points unchanged so per-turn temp writes never invoke OPF
  • \x1e-joined batched inference: one shell-out per scope amortizes the OPF cold-start across the whole transcript
  • Process-scoped circuit breaker: first OPF failure disables it for the rest of the invocation (closes OPF: process-scoped circuit breaker after first failure #1218)
  • Split createRedactedBlobFromFile so per-turn temp writes (addDirectoryToChanges) stay on the plain pipeline while copyMetadataDir uses the OPF-enabled variant
  • PromptsRedactedContent field on Write/UpdateCommittedOptions: pre-compute the joined-prompt redaction once in finalizeAllTurnCheckpoints and condenseSingleCheckpoint so multi-checkpoint commits don't re-run OPF N×
  • Settings layer rejects unknown category names at parse time — silent zero-detection of a privacy category is effectively a correctness bug
  • Sanitized stdout/stderr in shellout errors (OPF could echo input content; bytes-only diagnostics)
  • charToByteOffset correct for multibyte UTF-8 (no off-by-one at end)
  • Span docs reflect byte offsets (shellout adapter converts from OPF's character offsets)
  • Inline shell-out runtime, label map, and progress UX in redact/opf.go — no redact/opf_runtime/ subpackage, no separate progress writer

What's intentionally NOT included (deferred to follow-ups)

  • Summary-field batching via a StringsWithPrivacyFilter API — summaries are opt-in (only when IsSummarizeEnabled), have 5–10 short fields, and add ~10s worst case. Acceptable until someone complains.
  • doctor_bundle per-entry batching — diagnostic command, runs rarely, worst case slow not broken.
  • on_failure settings field — dropped entirely. DisallowUnknownFields rejects any user who tries to set it. Warn is the only supported mode today; if block-mode runtime wiring lands later, the field comes back in lockstep.

Test plan

  • mise run check (fmt + lint + test:ci + Vogon canary + roger-roger external-agent canary) — all green on 672044c2f
  • Real-agent smoke test on a fresh repo with private_person enabled — confirmed:
    • No OpenAI Privacy Filter: scanning transcript during the agent turn (per-turn writes don't invoke OPF)
    • → scanning transcript… ✓ done (29.4s) then (2.3s) at commit time (transcript + joined-prompts passes)
    • [REDACTED_PERSON] appears in entire checkpoint explain HEAD for both the prompt and the assistant transcript

🤖 Generated with Claude Code

@peyton-alt peyton-alt requested a review from a team as a code owner May 14, 2026 18:00
Copilot AI review requested due to automatic review settings May 14, 2026 18:00
Comment thread cmd/entire/cli/checkpoint/committed.go Outdated
Comment thread redact/opf_progress.go Outdated
Comment thread redact/opf_runtime/shellout.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (2)

redact/opf_runtime/shellout.go:199

  • charToByteOffset accepts an offset that is one rune past the end: for a 3-rune string, charOff == 4 reaches byteOff == len(s) and returns len(s) instead of -1. That lets malformed OPF spans at EOF be treated as valid and can over-redact the last input in a batch; only charOff == runeCount should map to len(s).
	for i := range charOff {
		if byteOff >= len(s) {
			if i == charOff-1 && byteOff == len(s) {
				return byteOff
			}

cmd/entire/cli/checkpoint/committed.go:1778

  • createRedactedBlobFromFile is shared by the temporary shadow-branch metadata writers (temporary.go:987 and temporary.go:1046). Switching this shared helper to the OPF-enabled redactors makes per-turn temporary metadata writes invoke OPF, contrary to the new design that keeps OPF at condensation/export boundaries and potentially adding OPF latency to every turn. Split the helper or pass a redaction mode so temporary paths continue using the plain redactors.
		redacted, jsonlErr := redact.JSONLBytesWithPrivacyFilter(ctx, content)
		if jsonlErr != nil {
			content = redact.BytesWithPrivacyFilter(ctx, content)

Comment thread redact/opf_runtime/shellout.go Outdated
Comment thread redact/opf_runtime/runtime.go Outdated
Comment thread cmd/entire/cli/settings/settings.go Outdated
Comment thread redact/opf.go Outdated
Comment thread cmd/entire/cli/checkpoint/committed.go Outdated
Comment thread cmd/entire/cli/strategy/manual_commit_hooks.go Outdated
peyton-alt added a commit that referenced this pull request May 14, 2026
Cursor Bugbot HIGH: split createRedactedBlobFromFile so per-turn temporary
writes use the plain 7-layer pipeline while committed writes use the full
8-layer pipeline (including OPF). The shared helper had silently leaked
OPF into per-turn writes via addDirectoryToChanges/addDirectoryToEntries.

Copilot Critical:
- on_failure enum validation now runs on every settings load path
  (LoadFromBytes + loadFromFile), not only the merge path.
- Parse-error path in shellOut no longer embeds stdout.String() in the
  returned error so transcript fragments don't leak to logs or TTY.

Copilot Important:
- Hoist joined-prompt redaction out of finalizeAllTurnCheckpoints' per-
  checkpoint loop and per-prompt loop; pre-compute once and pass through
  via PromptsRedactedContent. Drops OPF calls on prompts from
  len(prompts) + 2N (N=checkpoint count) to 1 per finalize.
- Same one-shot pre-redaction applied to condenseSingleCheckpoint so v1
  and v2 writers reuse a single OPF result per checkpoint.
- Span doc corrected: boundaries are byte offsets (shellout adapter
  translates from OPF's character offsets before returning Spans).

Cursor Bugbot Low: charToByteOffset no longer returns len(s) for
charOff == runeCount+1; tests pin the end-of-string and past-end cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: eed66017e863
@peyton-alt peyton-alt requested a review from Copilot May 14, 2026 20:17
@peyton-alt
Copy link
Copy Markdown
Contributor Author

@BugBot review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 1012600. Configure here.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 10 comments.

Comment thread redact/opf_runtime/shellout.go Outdated
Comment thread cmd/entire/cli/settings/settings.go Outdated
Comment thread redact/opf.go Outdated
Comment thread cmd/entire/cli/checkpoint/committed.go Outdated
Comment thread cmd/entire/cli/doctor_bundle.go Outdated
Comment thread cmd/entire/cli/settings/settings.go Outdated
Comment thread docs/security-and-privacy.md Outdated
Comment thread redact/opf.go Outdated
Comment thread cmd/entire/cli/checkpoint/temporary.go Outdated
Comment thread redact/opf.go Outdated
peyton-alt added a commit that referenced this pull request May 14, 2026
Privacy / safety:
- shellout: sanitize the cmd-run exit-error path (line 146) — stderr can
  contain echoed input from a misconfigured opf wrapper, so embedding it
  verbatim would leak transcript content via .entire/logs and the TTY.
  Report exit failure + stderr byte count instead. Mirrors the existing
  parse-error treatment.

Validation / UX:
- settings: reject unknown opf categories (e.g. "private_peerson") at
  parse time via redact.IsKnownOPFCategory. Previously a typo left OPF
  "enabled" but with zero detections and no feedback. Closed against the
  canonical label map in redact/opf.go.

Reliability:
- redact: process-scoped circuit breaker (opfBreakerTripped atomic.Bool).
  First detectOPF failure trips it; subsequent calls short-circuit before
  shelling out. One broken OPF install used to mean N × 30s timeouts
  per commit/bundle — now it's one warning plus graceful fallback.
  Reconfigure / ResetOPFConfigForTest clear the breaker so a fresh
  process retries.

Perf:
- redact: new public StringsWithPrivacyFilter([]string) []string that
  batches N inputs into a single RedactBatch call. Mirrors the
  JSONLContentWithPrivacyFilter design (has-space filter + dedupe + one
  inference pass + per-input span distribution).
- checkpoint.redactSummary: flatten Intent/Outcome/Friction/OpenItems/
  Learnings.{Repo,Workflow,Code.Finding} into one batched call. A summary
  with several Friction or Code entries used to pay the cold-start once
  per field; now once total. Preserves nil-vs-empty slice shape.

Doc / comment hygiene:
- security-and-privacy.md cost note updated to "~25–30s on CPU" (was
  "a few seconds"), now matches realistic commit behavior; mentions the
  circuit breaker.
- redact/opf.go detectOPF perf comment updated — no longer references
  the per-leaf model; references RedactBatch instead.
- handleOPFFailure TODO updated — block-mode is now rejected at settings
  parse time; relaxing that and wiring block-mode propagation must happen
  in lockstep.
- temporary.go broken doc-link replaced with a pointer to
  security-and-privacy.md (the design spec was never committed).

Tests:
- TestShellOut_ExitError_DoesNotLeakStderr pins the new stderr policy.
- TestShellOut_NonZeroExit asserts the sanitized contract (no passthrough).
- TestLoadFromBytes_RejectsUnknownCategory table-tests typo rejection.
- TestDetectOPF_CircuitBreakerSkipsAfterFirstFailure pins breaker.
- TestStringsWithPrivacyFilter_{BatchesSingleOPFCall,FallsBackOnBatchError}
  pin the batched-strings contract.
- TestRedactSummary_PreservesNilVsEmptySliceShape replaces the removed
  per-helper tests with a behavior-level assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: cbbcb124ec2e
peyton-alt and others added 4 commits May 14, 2026 18:23
Adds OPF as an opt-in 8th region producer in the redaction pipeline.
Two public entry points (StringWithPrivacyFilter,
JSONLContentWithPrivacyFilter) gate the OPF call; the four plain entry
points (String, Bytes, JSONLBytes, JSONLContent) are unchanged so
per-turn temp writes never invoke OPF.

Single inference pass per scope via \x1e-joined batching — opf otherwise
runs a fresh inference pass per newline-delimited input, defeating the
batch. Process-scoped atomic circuit breaker disables OPF after the
first runtime failure so a broken opf install costs one warning instead
of N×30s timeouts.

Settings layer (redaction.openai_privacy_filter) accepts enabled +
categories + command + timeout_seconds. The on_failure field is
intentionally absent: warn-only is the only mode the runtime supports
today, and DisallowUnknownFields rejects users who try to opt into a
fail-closed mode that doesn't exist. Category names are validated
against the canonical map at parse time — silent zero-detection of a
privacy category is effectively a correctness bug.

Shell-out runtime, progress UX, and label mapping are all inlined in
redact/opf.go (no separate subpackage, no separate progress writer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits createRedactedBlobFromFile so per-turn temp writes (shared via
addDirectoryToChanges) stay on the plain 7-layer pipeline, and committed
writes use the full 8-layer pipeline that includes OPF. Without the
split, OPF would leak into the agent loop and add the OPF cold-start
to every per-turn write.

Adds PromptsRedactedContent to Write/UpdateCommittedOptions so the
finalize hook + single-checkpoint condense pre-compute the joined-prompt
redaction once and pass it through. Without this, each checkpoint
within a turn re-runs StringWithPrivacyFilter over identical input
(N×OPF on a turn with N checkpoints), and the v1+v2 dual-write doubles
that to 2N.

The transcript redaction in finalizeAllTurnCheckpoints and
condenseSingleCheckpoint moves to JSONLBytesWithPrivacyFilter; the
existing redactSessionJSONLBytes test seam gains a context argument so
tests can still swap a deterministic stub.

Wires settings.OpenAIPrivacyFilter into redact.ConfigurePrivacyFilter
from EnsureRedactionConfigured at startup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an "Optional OpenAI Privacy Filter (opf)" section to
docs/security-and-privacy.md parallel to the existing "Optional PII
redaction" section: prerequisites (pip install opf), enable example,
per-category replacement-token table, full settings reference, failure
behavior (warn + circuit breaker), realistic cost (~25-30s on CPU), and
a "Verifying it's working" recipe.

Also updates the layer-count summary in the intro to mention the new
opt-in eighth pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Leftover from when assert.Contains replaced an earlier strings.Contains
call. The test build failed in test:ci with "strings" imported and not
used; this change is just removing the dead import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@peyton-alt peyton-alt force-pushed the feat/openai-privacy-filter branch from ea9b4f5 to 672044c Compare May 14, 2026 22:52
peyton-alt and others added 2 commits May 15, 2026 11:35
Move GetOPFConfigForTest and ResetOPFConfigForTest from opf.go to
redact/export_test.go so the redact package's public API no longer
exposes mutators for the global OPF config.

Introduce redact.RedactedJoinedPrompts as a typed wrapper around the
pre-redacted joined-prompts blob written to checkpoint prompt.txt.
Construct only via redact.JoinedPrompts (runs the full 8-layer pipeline
on the joined input) or AlreadyRedactedJoinedPrompts (trusted-source
escape hatch). Rename WriteCommittedOptions.PromptsRedactedContent
(string) and the matching field on UpdateCommittedOptions to
PromptsRedacted (RedactedJoinedPrompts) so the "this content was
produced by the redaction pipeline" claim becomes a compile-time
invariant: callers cannot assign an arbitrary string. The raw
Prompts []string field gets a docstring warning that it must be
consumed only via redactJoinedPrompts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three tests in manifest_test.go hardcoded `started := time.Date(2026, 5, 8, ...)`
as the session.State.StartedAt anchor. session.StateStore.Load auto-deletes
sessions whose StartedAt is older than StaleSessionThreshold (7 days) and
returns nil, so the hardcoded date silently rots: tests pass while the
calendar is inside the 7-day window, then fail forever once it crosses.

CI on PR #964 caught this — same SHA passed yesterday (6 days after the
hardcoded date) and failed today (7 days after). Unrelated to the
streaming/diagnostic work in this PR; the manifest_test.go file isn't
touched by any other commit on this branch.

Switch all three tests to `time.Now().UTC().Add(-time.Hour)` so the
session is always one hour old at test time. Still exercises the
5-second jitter check inside matchReviewSessionState; stays well inside
the staleness window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

OPF: process-scoped circuit breaker after first failure

2 participants