feat(redact): add OpenAI Privacy Filter as optional 8th detection layer by peyton-alt · Pull Request #1214 · entireio/cli

peyton-alt · 2026-05-14T18:00:13Z

Summary

Opt-in 8th redaction layer that shells out to the user-installed OpenAI Privacy Filter (opf) binary. Runs only at condensation + export boundaries; per-turn writes stay on the existing 7-layer pipeline. Default-off; users opt in via redaction.openai_privacy_filter.enabled = true.

This PR was rewritten from main on 2026-05-14. The first iteration accumulated review-fix scope across multiple bot review passes and reached ~3000 insertions. The rewrite keeps every real correctness/privacy fix found during that cycle while skipping speculative scope (~1547 insertions, 4 commits, 19 files). The original work is preserved at feat/openai-privacy-filter-v1-backup for reference.

What's included

Two public entry points (StringWithPrivacyFilter, JSONLContentWithPrivacyFilter) plus thin Bytes wrappers — four plain entry points unchanged so per-turn temp writes never invoke OPF
\x1e-joined batched inference: one shell-out per scope amortizes the OPF cold-start across the whole transcript
Process-scoped circuit breaker: first OPF failure disables it for the rest of the invocation (closes OPF: process-scoped circuit breaker after first failure #1218)
Split createRedactedBlobFromFile so per-turn temp writes (addDirectoryToChanges) stay on the plain pipeline while copyMetadataDir uses the OPF-enabled variant
PromptsRedactedContent field on Write/UpdateCommittedOptions: pre-compute the joined-prompt redaction once in finalizeAllTurnCheckpoints and condenseSingleCheckpoint so multi-checkpoint commits don't re-run OPF N×
Settings layer rejects unknown category names at parse time — silent zero-detection of a privacy category is effectively a correctness bug
Sanitized stdout/stderr in shellout errors (OPF could echo input content; bytes-only diagnostics)
charToByteOffset correct for multibyte UTF-8 (no off-by-one at end)
Span docs reflect byte offsets (shellout adapter converts from OPF's character offsets)
Inline shell-out runtime, label map, and progress UX in redact/opf.go — no redact/opf_runtime/ subpackage, no separate progress writer

What's intentionally NOT included (deferred to follow-ups)

Summary-field batching via a StringsWithPrivacyFilter API — summaries are opt-in (only when IsSummarizeEnabled), have 5–10 short fields, and add ~10s worst case. Acceptable until someone complains.
doctor_bundle per-entry batching — diagnostic command, runs rarely, worst case slow not broken.
on_failure settings field — dropped entirely. DisallowUnknownFields rejects any user who tries to set it. Warn is the only supported mode today; if block-mode runtime wiring lands later, the field comes back in lockstep.

Test plan

mise run check (fmt + lint + test:ci + Vogon canary + roger-roger external-agent canary) — all green on 672044c2f
Real-agent smoke test on a fresh repo with private_person enabled — confirmed:
- No OpenAI Privacy Filter: scanning transcript during the agent turn (per-turn writes don't invoke OPF)
- → scanning transcript… ✓ done (29.4s) then (2.3s) at commit time (transcript + joined-prompts passes)
- [REDACTED_PERSON] appears in entire checkpoint explain HEAD for both the prompt and the assistant transcript

🤖 Generated with Claude Code

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (2)

redact/opf_runtime/shellout.go:199

charToByteOffset accepts an offset that is one rune past the end: for a 3-rune string, charOff == 4 reaches byteOff == len(s) and returns len(s) instead of -1. That lets malformed OPF spans at EOF be treated as valid and can over-redact the last input in a batch; only charOff == runeCount should map to len(s).

	for i := range charOff {
		if byteOff >= len(s) {
			if i == charOff-1 && byteOff == len(s) {
				return byteOff
			}

cmd/entire/cli/checkpoint/committed.go:1778

createRedactedBlobFromFile is shared by the temporary shadow-branch metadata writers (temporary.go:987 and temporary.go:1046). Switching this shared helper to the OPF-enabled redactors makes per-turn temporary metadata writes invoke OPF, contrary to the new design that keeps OPF at condensation/export boundaries and potentially adding OPF latency to every turn. Split the helper or pass a redaction mode so temporary paths continue using the plain redactors.

		redacted, jsonlErr := redact.JSONLBytesWithPrivacyFilter(ctx, content)
		if jsonlErr != nil {
			content = redact.BytesWithPrivacyFilter(ctx, content)

Cursor Bugbot HIGH: split createRedactedBlobFromFile so per-turn temporary writes use the plain 7-layer pipeline while committed writes use the full 8-layer pipeline (including OPF). The shared helper had silently leaked OPF into per-turn writes via addDirectoryToChanges/addDirectoryToEntries. Copilot Critical: - on_failure enum validation now runs on every settings load path (LoadFromBytes + loadFromFile), not only the merge path. - Parse-error path in shellOut no longer embeds stdout.String() in the returned error so transcript fragments don't leak to logs or TTY. Copilot Important: - Hoist joined-prompt redaction out of finalizeAllTurnCheckpoints' per- checkpoint loop and per-prompt loop; pre-compute once and pass through via PromptsRedactedContent. Drops OPF calls on prompts from len(prompts) + 2N (N=checkpoint count) to 1 per finalize. - Same one-shot pre-redaction applied to condenseSingleCheckpoint so v1 and v2 writers reuse a single OPF result per checkpoint. - Span doc corrected: boundaries are byte offsets (shellout adapter translates from OPF's character offsets before returning Spans). Cursor Bugbot Low: charToByteOffset no longer returns len(s) for charOff == runeCount+1; tests pin the end-of-string and past-end cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: eed66017e863

peyton-alt · 2026-05-14T20:17:37Z

@BugBot review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit 1012600. Configure here.}

Copilot

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 10 comments.

Privacy / safety: - shellout: sanitize the cmd-run exit-error path (line 146) — stderr can contain echoed input from a misconfigured opf wrapper, so embedding it verbatim would leak transcript content via .entire/logs and the TTY. Report exit failure + stderr byte count instead. Mirrors the existing parse-error treatment. Validation / UX: - settings: reject unknown opf categories (e.g. "private_peerson") at parse time via redact.IsKnownOPFCategory. Previously a typo left OPF "enabled" but with zero detections and no feedback. Closed against the canonical label map in redact/opf.go. Reliability: - redact: process-scoped circuit breaker (opfBreakerTripped atomic.Bool). First detectOPF failure trips it; subsequent calls short-circuit before shelling out. One broken OPF install used to mean N × 30s timeouts per commit/bundle — now it's one warning plus graceful fallback. Reconfigure / ResetOPFConfigForTest clear the breaker so a fresh process retries. Perf: - redact: new public StringsWithPrivacyFilter([]string) []string that batches N inputs into a single RedactBatch call. Mirrors the JSONLContentWithPrivacyFilter design (has-space filter + dedupe + one inference pass + per-input span distribution). - checkpoint.redactSummary: flatten Intent/Outcome/Friction/OpenItems/ Learnings.{Repo,Workflow,Code.Finding} into one batched call. A summary with several Friction or Code entries used to pay the cold-start once per field; now once total. Preserves nil-vs-empty slice shape. Doc / comment hygiene: - security-and-privacy.md cost note updated to "~25–30s on CPU" (was "a few seconds"), now matches realistic commit behavior; mentions the circuit breaker. - redact/opf.go detectOPF perf comment updated — no longer references the per-leaf model; references RedactBatch instead. - handleOPFFailure TODO updated — block-mode is now rejected at settings parse time; relaxing that and wiring block-mode propagation must happen in lockstep. - temporary.go broken doc-link replaced with a pointer to security-and-privacy.md (the design spec was never committed). Tests: - TestShellOut_ExitError_DoesNotLeakStderr pins the new stderr policy. - TestShellOut_NonZeroExit asserts the sanitized contract (no passthrough). - TestLoadFromBytes_RejectsUnknownCategory table-tests typo rejection. - TestDetectOPF_CircuitBreakerSkipsAfterFirstFailure pins breaker. - TestStringsWithPrivacyFilter_{BatchesSingleOPFCall,FallsBackOnBatchError} pin the batched-strings contract. - TestRedactSummary_PreservesNilVsEmptySliceShape replaces the removed per-helper tests with a behavior-level assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: cbbcb124ec2e

Adds OPF as an opt-in 8th region producer in the redaction pipeline. Two public entry points (StringWithPrivacyFilter, JSONLContentWithPrivacyFilter) gate the OPF call; the four plain entry points (String, Bytes, JSONLBytes, JSONLContent) are unchanged so per-turn temp writes never invoke OPF. Single inference pass per scope via \x1e-joined batching — opf otherwise runs a fresh inference pass per newline-delimited input, defeating the batch. Process-scoped atomic circuit breaker disables OPF after the first runtime failure so a broken opf install costs one warning instead of N×30s timeouts. Settings layer (redaction.openai_privacy_filter) accepts enabled + categories + command + timeout_seconds. The on_failure field is intentionally absent: warn-only is the only mode the runtime supports today, and DisallowUnknownFields rejects users who try to opt into a fail-closed mode that doesn't exist. Category names are validated against the canonical map at parse time — silent zero-detection of a privacy category is effectively a correctness bug. Shell-out runtime, progress UX, and label mapping are all inlined in redact/opf.go (no separate subpackage, no separate progress writer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Splits createRedactedBlobFromFile so per-turn temp writes (shared via addDirectoryToChanges) stay on the plain 7-layer pipeline, and committed writes use the full 8-layer pipeline that includes OPF. Without the split, OPF would leak into the agent loop and add the OPF cold-start to every per-turn write. Adds PromptsRedactedContent to Write/UpdateCommittedOptions so the finalize hook + single-checkpoint condense pre-compute the joined-prompt redaction once and pass it through. Without this, each checkpoint within a turn re-runs StringWithPrivacyFilter over identical input (N×OPF on a turn with N checkpoints), and the v1+v2 dual-write doubles that to 2N. The transcript redaction in finalizeAllTurnCheckpoints and condenseSingleCheckpoint moves to JSONLBytesWithPrivacyFilter; the existing redactSessionJSONLBytes test seam gains a context argument so tests can still swap a deterministic stub. Wires settings.OpenAIPrivacyFilter into redact.ConfigurePrivacyFilter from EnsureRedactionConfigured at startup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds an "Optional OpenAI Privacy Filter (opf)" section to docs/security-and-privacy.md parallel to the existing "Optional PII redaction" section: prerequisites (pip install opf), enable example, per-category replacement-token table, full settings reference, failure behavior (warn + circuit breaker), realistic cost (~25-30s on CPU), and a "Verifying it's working" recipe. Also updates the layer-count summary in the intro to mention the new opt-in eighth pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Leftover from when assert.Contains replaced an earlier strings.Contains call. The test build failed in test:ci with "strings" imported and not used; this change is just removing the dead import. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Move GetOPFConfigForTest and ResetOPFConfigForTest from opf.go to redact/export_test.go so the redact package's public API no longer exposes mutators for the global OPF config. Introduce redact.RedactedJoinedPrompts as a typed wrapper around the pre-redacted joined-prompts blob written to checkpoint prompt.txt. Construct only via redact.JoinedPrompts (runs the full 8-layer pipeline on the joined input) or AlreadyRedactedJoinedPrompts (trusted-source escape hatch). Rename WriteCommittedOptions.PromptsRedactedContent (string) and the matching field on UpdateCommittedOptions to PromptsRedacted (RedactedJoinedPrompts) so the "this content was produced by the redaction pipeline" claim becomes a compile-time invariant: callers cannot assign an arbitrary string. The raw Prompts []string field gets a docstring warning that it must be consumed only via redactJoinedPrompts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three tests in manifest_test.go hardcoded `started := time.Date(2026, 5, 8, ...)` as the session.State.StartedAt anchor. session.StateStore.Load auto-deletes sessions whose StartedAt is older than StaleSessionThreshold (7 days) and returns nil, so the hardcoded date silently rots: tests pass while the calendar is inside the 7-day window, then fail forever once it crosses. CI on PR #964 caught this — same SHA passed yesterday (6 days after the hardcoded date) and failed today (7 days after). Unrelated to the streaming/diagnostic work in this PR; the manifest_test.go file isn't touched by any other commit on this branch. Switch all three tests to `time.Now().UTC().Add(-time.Hour)` so the session is always one hour old at test time. Still exercises the 5-second jitter check inside matchReviewSessionState; stays well inside the staleness window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

peyton-alt requested a review from a team as a code owner May 14, 2026 18:00

Copilot AI review requested due to automatic review settings May 14, 2026 18:00

Copilot started reviewing on behalf of peyton-alt May 14, 2026 18:00 View session

cursor Bot reviewed May 14, 2026

View reviewed changes

Comment thread cmd/entire/cli/checkpoint/committed.go Outdated

Comment thread redact/opf_progress.go Outdated

Comment thread redact/opf_runtime/shellout.go Outdated

Copilot AI reviewed May 14, 2026

View reviewed changes

peyton-alt requested a review from Copilot May 14, 2026 18:57

Copilot started reviewing on behalf of peyton-alt May 14, 2026 18:58 View session

Copilot AI reviewed May 14, 2026

View reviewed changes

peyton-alt requested a review from Copilot May 14, 2026 20:17

Copilot started reviewing on behalf of peyton-alt May 14, 2026 20:18 View session

cursor Bot reviewed May 14, 2026

View reviewed changes

Copilot AI reviewed May 14, 2026

View reviewed changes

peyton-alt mentioned this pull request May 14, 2026

OPF: process-scoped circuit breaker after first failure #1218

Closed

peyton-alt and others added 4 commits May 14, 2026 18:23

peyton-alt force-pushed the feat/openai-privacy-filter branch from ea9b4f5 to 672044c Compare May 14, 2026 22:52

peyton-alt and others added 2 commits May 15, 2026 11:35

Conversation

peyton-alt commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's included

What's intentionally NOT included (deferred to follow-ups)

Test plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

peyton-alt commented May 14, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

peyton-alt commented May 14, 2026 •

edited

Loading