feat(redact): add OpenAI Privacy Filter as optional 8th detection layer #1214
Open
peyton-alt wants to merge 6 commits into
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (2)
redact/opf_runtime/shellout.go:199
`charToByteOffset` accepts an offset that is one rune past the end: for a 3-rune string, `charOff == 4` reaches `byteOff == len(s)` and returns `len(s)` instead of `-1`. That lets malformed OPF spans at EOF be treated as valid and can over-redact the last input in a batch; only `charOff == runeCount` should map to `len(s)`.
for i := range charOff {
if byteOff >= len(s) {
if i == charOff-1 && byteOff == len(s) {
return byteOff
}
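The boundary condition this comment describes can be sketched as below. This is a minimal illustration, not the code in `redact/opf_runtime/shellout.go`: the function body and the `-1` sentinel for out-of-range offsets are assumptions drawn from the comment's wording.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charToByteOffset maps a rune (character) offset in s to a byte offset.
// charOff == rune count maps to len(s), the end-of-string position.
// A negative offset, or one greater than the rune count, yields -1.
func charToByteOffset(s string, charOff int) int {
	if charOff < 0 {
		return -1
	}
	byteOff := 0
	for i := 0; i < charOff; i++ {
		if byteOff >= len(s) {
			// Input ran out before charOff runes were consumed.
			return -1
		}
		_, size := utf8.DecodeRuneInString(s[byteOff:])
		byteOff += size
	}
	return byteOff
}

func main() {
	s := "h\u00e9\u00e9" // 3 runes, 5 bytes
	fmt.Println(charToByteOffset(s, 3)) // end of string
	fmt.Println(charToByteOffset(s, 4)) // one rune past the end
}
```

Checking `byteOff >= len(s)` before decoding each rune means a span ending exactly at end-of-string is accepted, while anything further is rejected instead of being clamped.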
cmd/entire/cli/checkpoint/committed.go:1778
`createRedactedBlobFromFile` is shared by the temporary shadow-branch metadata writers (`temporary.go:987` and `temporary.go:1046`). Switching this shared helper to the OPF-enabled redactors makes per-turn temporary metadata writes invoke OPF, contrary to the new design that keeps OPF at condensation/export boundaries, and potentially adds OPF latency to every turn. Split the helper or pass a redaction mode so temporary paths continue using the plain redactors.
redacted, jsonlErr := redact.JSONLBytesWithPrivacyFilter(ctx, content)
if jsonlErr != nil {
content = redact.BytesWithPrivacyFilter(ctx, content)
peyton-alt added a commit that referenced this pull request May 14, 2026
Cursor Bugbot HIGH: split createRedactedBlobFromFile so per-turn temporary writes use the plain 7-layer pipeline while committed writes use the full 8-layer pipeline (including OPF). The shared helper had silently leaked OPF into per-turn writes via addDirectoryToChanges/addDirectoryToEntries.
Copilot Critical:
- on_failure enum validation now runs on every settings load path (LoadFromBytes + loadFromFile), not only the merge path.
- Parse-error path in shellOut no longer embeds stdout.String() in the returned error so transcript fragments don't leak to logs or TTY.
Copilot Important:
- Hoist joined-prompt redaction out of finalizeAllTurnCheckpoints' per-checkpoint loop and per-prompt loop; pre-compute once and pass through via PromptsRedactedContent. Drops OPF calls on prompts from len(prompts) + 2N (N=checkpoint count) to 1 per finalize.
- Same one-shot pre-redaction applied to condenseSingleCheckpoint so v1 and v2 writers reuse a single OPF result per checkpoint.
- Span doc corrected: boundaries are byte offsets (shellout adapter translates from OPF's character offsets before returning Spans).
Cursor Bugbot Low: charToByteOffset no longer returns len(s) for charOff == runeCount+1; tests pin the end-of-string and past-end cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: eed66017e863
@BugBot review
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 1012600.
peyton-alt added a commit that referenced this pull request May 14, 2026
Privacy / safety:
- shellout: sanitize the cmd-run exit-error path (line 146) — stderr can
contain echoed input from a misconfigured opf wrapper, so embedding it
verbatim would leak transcript content via .entire/logs and the TTY.
Report exit failure + stderr byte count instead. Mirrors the existing
parse-error treatment.
Validation / UX:
- settings: reject unknown opf categories (e.g. "private_peerson") at
parse time via redact.IsKnownOPFCategory. Previously a typo left OPF
"enabled" but with zero detections and no feedback. Closed against the
canonical label map in redact/opf.go.
Reliability:
- redact: process-scoped circuit breaker (opfBreakerTripped atomic.Bool).
First detectOPF failure trips it; subsequent calls short-circuit before
shelling out. One broken OPF install used to mean N × 30s timeouts
per commit/bundle — now it's one warning plus graceful fallback.
Reconfigure / ResetOPFConfigForTest clear the breaker so a fresh
process retries.
Perf:
- redact: new public StringsWithPrivacyFilter([]string) []string that
batches N inputs into a single RedactBatch call. Mirrors the
JSONLContentWithPrivacyFilter design (has-space filter + dedupe + one
inference pass + per-input span distribution).
- checkpoint.redactSummary: flatten Intent/Outcome/Friction/OpenItems/
Learnings.{Repo,Workflow,Code.Finding} into one batched call. A summary
with several Friction or Code entries used to pay the cold-start once
per field; now once total. Preserves nil-vs-empty slice shape.
Doc / comment hygiene:
- security-and-privacy.md cost note updated to "~25–30s on CPU" (was
"a few seconds"), now matches realistic commit behavior; mentions the
circuit breaker.
- redact/opf.go detectOPF perf comment updated — no longer references
the per-leaf model; references RedactBatch instead.
- handleOPFFailure TODO updated — block-mode is now rejected at settings
parse time; relaxing that and wiring block-mode propagation must happen
in lockstep.
- temporary.go broken doc-link replaced with a pointer to
security-and-privacy.md (the design spec was never committed).
Tests:
- TestShellOut_ExitError_DoesNotLeakStderr pins the new stderr policy.
- TestShellOut_NonZeroExit asserts the sanitized contract (no passthrough).
- TestLoadFromBytes_RejectsUnknownCategory table-tests typo rejection.
- TestDetectOPF_CircuitBreakerSkipsAfterFirstFailure pins breaker.
- TestStringsWithPrivacyFilter_{BatchesSingleOPFCall,FallsBackOnBatchError}
pin the batched-strings contract.
- TestRedactSummary_PreservesNilVsEmptySliceShape replaces the removed
per-helper tests with a behavior-level assertion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: cbbcb124ec2e
Adds OPF as an opt-in 8th region producer in the redaction pipeline. Two public entry points (StringWithPrivacyFilter, JSONLContentWithPrivacyFilter) gate the OPF call; the four plain entry points (String, Bytes, JSONLBytes, JSONLContent) are unchanged so per-turn temp writes never invoke OPF.
Single inference pass per scope via \x1e-joined batching; opf otherwise runs a fresh inference pass per newline-delimited input, defeating the batch. Process-scoped atomic circuit breaker disables OPF after the first runtime failure so a broken opf install costs one warning instead of N×30s timeouts.
Settings layer (redaction.openai_privacy_filter) accepts enabled + categories + command + timeout_seconds. The on_failure field is intentionally absent: warn-only is the only mode the runtime supports today, and DisallowUnknownFields rejects users who try to opt into a fail-closed mode that doesn't exist. Category names are validated against the canonical map at parse time, because silent zero-detection of a privacy category is effectively a correctness bug.
Shell-out runtime, progress UX, and label mapping are all inlined in redact/opf.go (no separate subpackage, no separate progress writer).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
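The `\x1e`-joined batching mentioned in this commit can be illustrated with the sketch below. It is an assumption-laden outline rather than the PR's code: `Span`, `batchDetect`, and the toy `findName` detector are hypothetical, and the sketch assumes no input contains the `\x1e` record separator itself.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a half-open byte range [Start, End) within a string.
type Span struct{ Start, End int }

const recordSep = "\x1e" // ASCII record separator

// batchDetect joins inputs with recordSep, runs detect once over the joined
// text, then maps each span back to offsets relative to its own input.
func batchDetect(inputs []string, detect func(string) []Span) [][]Span {
	spans := detect(strings.Join(inputs, recordSep))
	out := make([][]Span, len(inputs))
	base := 0
	for i, in := range inputs {
		end := base + len(in)
		for _, sp := range spans {
			// Keep spans fully inside this input; drop any that straddle a separator.
			if sp.Start >= base && sp.End <= end && sp.Start < sp.End {
				out[i] = append(out[i], Span{sp.Start - base, sp.End - base})
			}
		}
		base = end + len(recordSep)
	}
	return out
}

// findName is a toy detector standing in for the real inference pass.
func findName(text, name string) []Span {
	var spans []Span
	for off := 0; ; {
		i := strings.Index(text[off:], name)
		if i < 0 {
			return spans
		}
		spans = append(spans, Span{off + i, off + i + len(name)})
		off += i + len(name)
	}
}

func main() {
	calls := 0
	detect := func(text string) []Span {
		calls++
		return findName(text, "Alice")
	}
	perInput := batchDetect([]string{"hi Alice", "no one", "Alice!"}, detect)
	fmt.Println(perInput, "detector calls:", calls)
}
```

The detector runs once regardless of how many inputs are batched, which is what amortizes a slow cold start across a whole transcript.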
Splits createRedactedBlobFromFile so per-turn temp writes (shared via addDirectoryToChanges) stay on the plain 7-layer pipeline, and committed writes use the full 8-layer pipeline that includes OPF. Without the split, OPF would leak into the agent loop and add the OPF cold-start to every per-turn write.
Adds PromptsRedactedContent to Write/UpdateCommittedOptions so the finalize hook + single-checkpoint condense pre-compute the joined-prompt redaction once and pass it through. Without this, each checkpoint within a turn re-runs StringWithPrivacyFilter over identical input (N×OPF on a turn with N checkpoints), and the v1+v2 dual-write doubles that to 2N.
The transcript redaction in finalizeAllTurnCheckpoints and condenseSingleCheckpoint moves to JSONLBytesWithPrivacyFilter; the existing redactSessionJSONLBytes test seam gains a context argument so tests can still swap a deterministic stub.
Wires settings.OpenAIPrivacyFilter into redact.ConfigurePrivacyFilter from EnsureRedactionConfigured at startup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an "Optional OpenAI Privacy Filter (opf)" section to docs/security-and-privacy.md parallel to the existing "Optional PII redaction" section: prerequisites (pip install opf), enable example, per-category replacement-token table, full settings reference, failure behavior (warn + circuit breaker), realistic cost (~25-30s on CPU), and a "Verifying it's working" recipe. Also updates the layer-count summary in the intro to mention the new opt-in eighth pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Leftover from when assert.Contains replaced an earlier strings.Contains call. The test build failed in test:ci with "strings" imported and not used; this change just removes the dead import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
force-pushed from ea9b4f5 to 672044c
Move GetOPFConfigForTest and ResetOPFConfigForTest from opf.go to redact/export_test.go so the redact package's public API no longer exposes mutators for the global OPF config.
Introduce redact.RedactedJoinedPrompts as a typed wrapper around the pre-redacted joined-prompts blob written to checkpoint prompt.txt. Construct only via redact.JoinedPrompts (runs the full 8-layer pipeline on the joined input) or AlreadyRedactedJoinedPrompts (trusted-source escape hatch).
Rename WriteCommittedOptions.PromptsRedactedContent (string) and the matching field on UpdateCommittedOptions to PromptsRedacted (RedactedJoinedPrompts) so the "this content was produced by the redaction pipeline" claim becomes a compile-time invariant: callers cannot assign an arbitrary string. The raw Prompts []string field gets a docstring warning that it must be consumed only via redactJoinedPrompts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three tests in manifest_test.go hardcoded `started := time.Date(2026, 5, 8, ...)` as the session.State.StartedAt anchor. session.StateStore.Load auto-deletes sessions whose StartedAt is older than StaleSessionThreshold (7 days) and returns nil, so the hardcoded date silently rots: tests pass while the calendar is inside the 7-day window, then fail forever once it crosses. CI on PR #964 caught this: the same SHA passed yesterday (6 days after the hardcoded date) and failed today (7 days after). Unrelated to the streaming/diagnostic work in this PR; the manifest_test.go file isn't touched by any other commit on this branch.
Switch all three tests to `time.Now().UTC().Add(-time.Hour)` so the session is always one hour old at test time. Still exercises the 5-second jitter check inside matchReviewSessionState; stays well inside the staleness window.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Opt-in 8th redaction layer that shells out to the user-installed OpenAI Privacy Filter (`opf`) binary. Runs only at condensation + export boundaries; per-turn writes stay on the existing 7-layer pipeline. Default-off; users opt in via `redaction.openai_privacy_filter.enabled = true`.
This PR was rewritten from `main` on 2026-05-14. The first iteration accumulated review-fix scope across multiple bot review passes and reached ~3000 insertions. The rewrite keeps every real correctness/privacy fix found during that cycle while skipping speculative scope (~1547 insertions, 4 commits, 19 files). The original work is preserved at `feat/openai-privacy-filter-v1-backup` for reference.

What's included
- Two public entry points (`StringWithPrivacyFilter`, `JSONLContentWithPrivacyFilter`) plus thin Bytes wrappers; the four plain entry points are unchanged so per-turn temp writes never invoke OPF
- `\x1e`-joined batched inference: one shell-out per scope amortizes the OPF cold-start across the whole transcript
- Split `createRedactedBlobFromFile` so per-turn temp writes (`addDirectoryToChanges`) stay on the plain pipeline while `copyMetadataDir` uses the OPF-enabled variant
- `PromptsRedactedContent` field on Write/UpdateCommittedOptions: pre-compute the joined-prompt redaction once in `finalizeAllTurnCheckpoints` and `condenseSingleCheckpoint` so multi-checkpoint commits don't re-run OPF N×
- `charToByteOffset` correct for multibyte UTF-8 (no off-by-one at end)
- `Span` docs reflect byte offsets (shellout adapter converts from OPF's character offsets)
- Shell-out runtime, progress UX, and label mapping inlined in `redact/opf.go`: no `redact/opf_runtime/` subpackage, no separate progress writer

What's intentionally NOT included (deferred to follow-ups)
- `StringsWithPrivacyFilter` API: summaries are opt-in (only when `IsSummarizeEnabled`), have 5–10 short fields, and add ~10s worst case. Acceptable until someone complains.
- `doctor_bundle` per-entry batching: diagnostic command, runs rarely, worst case slow not broken.
- `on_failure` settings field: dropped entirely. `DisallowUnknownFields` rejects any user who tries to set it. Warn is the only supported mode today; if block-mode runtime wiring lands later, the field comes back in lockstep.

Test plan
- `mise run check` (fmt + lint + test:ci + Vogon canary + roger-roger external-agent canary): all green on `672044c2f`
- With `private_person` enabled, confirmed:
  - No `OpenAI Privacy Filter: scanning transcript` during the agent turn (per-turn writes don't invoke OPF)
  - `→ scanning transcript… ✓ done (29.4s)` then `(2.3s)` at commit time (transcript + joined-prompts passes)
  - `[REDACTED_PERSON]` appears in `entire checkpoint explain HEAD` for both the prompt and the assistant transcript

🤖 Generated with Claude Code
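The settings behavior described above (unknown fields such as `on_failure` rejected, category names validated at parse time) can be sketched as follows. This assumes the settings are JSON and uses a hypothetical `opfSettings` struct and `parseOPFSettings` function; the one-entry `knownCategories` map is a placeholder for the canonical label map, of which only `private_person` is named in this PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// opfSettings mirrors the four fields the PR says the settings layer accepts.
type opfSettings struct {
	Enabled        bool     `json:"enabled"`
	Categories     []string `json:"categories"`
	Command        string   `json:"command"`
	TimeoutSeconds int      `json:"timeout_seconds"`
}

// knownCategories is a placeholder; the real canonical map has more entries.
var knownCategories = map[string]bool{"private_person": true}

// parseOPFSettings decodes strictly: unknown fields and unknown categories
// are errors rather than being silently ignored.
func parseOPFSettings(data []byte) (opfSettings, error) {
	var s opfSettings
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&s); err != nil {
		return opfSettings{}, err
	}
	for _, c := range s.Categories {
		if !knownCategories[c] {
			return opfSettings{}, fmt.Errorf("unknown opf category %q", c)
		}
	}
	return s, nil
}

func main() {
	ok := []byte(`{"enabled": true, "categories": ["private_person"], "timeout_seconds": 30}`)
	s, err := parseOPFSettings(ok)
	fmt.Println(s.Enabled, s.Categories, err)

	_, err = parseOPFSettings([]byte(`{"enabled": true, "on_failure": "block"}`))
	fmt.Println(err)

	_, err = parseOPFSettings([]byte(`{"enabled": true, "categories": ["private_peerson"]}`))
	fmt.Println(err)
}
```

Failing at parse time turns a typo into immediate feedback, instead of a filter that is nominally enabled but detects nothing.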