Redact client secrets and shell-stdin literal transcripts#1203
Redact client secrets and shell-stdin literal transcripts#1203peyton-alt wants to merge 4 commits into
Conversation
Adds three structural redaction rules to close transcript leak gaps: - CLIENT_SECRET / API_KEY / TOKEN env-shape values, in both raw text and JSONL structured fields (e.g. GITHUB_CLIENT_SECRET=...). - Shell-stdin literals piped into secret-management commands such as `printf 'value' | examplectl secret put EXAMPLE_API_KEY`, which are often human-ish and below the entropy threshold. - Identifier env-var shapes (`*_id`, `*_account`) are preserved when the value is not independently high-entropy, preventing benign IDs like `GOOGLE_ADSENSE_ACCOUNT=pub-...` from being collapsed to `REDACTED` by the entropy detector. Shell variable references like `\$EXAMPLE_API_KEY` continue to be treated as placeholders rather than secret values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 7daa370e8203
There was a problem hiding this comment.
Pull request overview
This PR strengthens transcript redaction in the redact package to close three structural gaps: env-shape secret keys (*_CLIENT_SECRET, *_API_KEY, *_TOKEN) in raw text and JSONL fields, shell-stdin literals piped into secret-management CLIs (e.g. printf 'value' | examplectl secret put ...), and an over-redaction guard so *_id/*_account identifiers with non-high-entropy values aren't collapsed by the entropy detector. $VAR references continue to be treated as placeholders.
Changes:
- Add
secretValueKeyShape/secretValuePattern/secretJSONKeyRegexand ashellStdinSecretPatternregex; wire them intoString/detectCredentialValues/detectShellStdinSecretsand into the JSON key sensitivity check. - Add
isNonSecretIdentifierAssignment+isHighEntropySecretCandidateto preserve*_id/*_accountenv assignments with non-entropic values, andisShellVariableReferenceto treat$VARas a placeholder. - Extend unit tests covering
GITHUB_CLIENT_SECRET,GOOGLE_ADSENSE_ACCOUNT, structured JSONL credential fields, and shell-stdinprintf | secret putredaction plus over-redaction guards.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| redact/redact.go | New regexes and helpers to detect env-shape secrets and shell-stdin literals, plus an identifier-assignment guard against entropy over-redaction. |
| redact/redact_test.go | New tests for client-secret env vars, identifier preservation, structured JSONL credential fields, and shell-stdin `printf |
|
@BugBot review |
Per the existing pattern documented on dbPasswordKeyShape — the same key list should compose both the env-var regex and the JSON-key regex so the vendor list stays in one place. The hand-written duplicate in secretJSONKeyRegex would have silently diverged if a new key type were added to one side but not the other. Behavior is unchanged: the regex runs on normalized keys, where [_-]+ and _+ match identically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 20953e233268
|
@BugBot review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 4018d6a. Configure here.
Following review feedback on this PR, four scoped changes narrow the
new patterns introduced earlier so they stop flat-redacting prose,
debug identifiers, and account fields:
- secretValuePattern now uses `=` only (not `[:=]`). The JSON-aware
path handles `key: value` shapes via secretJSONKeyRegex, so dropping
`:` from the raw-text regex prevents prose like "Got the token: foo"
from over-redacting the next word.
- secretValueKeyShape splits into two alternation branches so bare
`secret`/`token` keywords require at least one prefix segment (e.g.
csrf_token, auth_secret). Standalone `{"token":"x"}` /
`{"secret":"x"}` JSON fields are no longer flat-redacted; entropy
and Betterleaks still catch real high-entropy values.
- New nonSecretTokenPrefixes allowlist (cancel, continuation, cursor,
idempotency, next, page, pagination, prev, previous, sync) is
checked in isCredentialJSONSecretKey so debug/pagination tokens like
`{"cancel_token":"abc-debug"}` and `{"pagination_token":"..."}` are
preserved in transcripts. Real credential-shaped values still get
caught by the per-value String() call.
- shouldSkipJSONLField now also skips keys ending in `account`, so
identifier fields like `{"google_adsense_account":"pub-..."}` are
preserved by design rather than by entropy coincidence.
Regression tests cover the three previously-leaking prose cases, the
seven JSON token-key cases, and a high-entropy account value to prove
the protection is structural rather than entropy-dependent. The PR's
original coverage of `printf 'value' | examplectl secret put KEY` and
structured credential field redaction is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 02c170aecfae
Three tests in manifest_test.go hardcoded `started := time.Date(2026, 5, 8, ...)` as the session.State.StartedAt anchor. session.StateStore.Load auto-deletes sessions whose StartedAt is older than StaleSessionThreshold (7 days) and returns nil, so the hardcoded date silently rots: tests pass while the calendar is inside the 7-day window, then fail forever once it crosses. CI on PR #964 caught this — same SHA passed yesterday (6 days after the hardcoded date) and failed today (7 days after). Unrelated to the streaming/diagnostic work in this PR; the manifest_test.go file isn't touched by any other commit on this branch. Switch all three tests to `time.Now().UTC().Add(-time.Hour)` so the session is always one hour old at test time. Still exercises the 5-second jitter check inside matchReviewSessionState; stays well inside the staleness window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
https://entire.io/gh/entireio/cli/trails/372
Summary
Closes structural redaction gaps in transcript content:
*_CLIENT_SECRET,*_API_KEY,*_TOKENenv-shape redaction in both raw text and JSONL structured fields (e.g.GITHUB_CLIENT_SECRET=…was previously leaking).printf 'value' | examplectl secret put EXAMPLE_API_KEYis now redacted. These secrets are often human-ish (correct-horse-client) and below the entropy threshold, so entropy + Betterleaks couldn't catch them — the structural shape is the signal.*_id/*_accountkeys with non-high-entropy values are preserved (soGOOGLE_ADSENSE_ACCOUNT=pub-1234567890123456no longer collapses toREDACTEDby the entropy detector). Values that independently look secret-like still redact.\$VARreferences in secret-put commands continue to be treated as placeholders, not values.The regex tier stays the deterministic, structural layer — important for layered composition with future ML-based redaction (e.g. OpenAI Privacy Filter), which is strong on free-text PII but lists "missed secrets for novel credential formats" and "over-redaction of benign high-entropy strings" as known failure modes.
Test plan
go test ./redact -count=1— full package, all greenmise run fmt && mise run lint— 0 issuesmise run test:ci— unit + integration + Vogon canary (58/58) + roger-roger canary (4/4), exit 0github_client_secret_env_var,google_adsense_account_is_preserved_as_identifier,TestJSONLContent_StructuredCredentialFieldsRedacted(extended withGITHUB_CLIENT_SECRET),TestJSONLContent_ShellStdinSecretCommandRedactsPrintfLiteral,TestJSONLContent_ShellStdinSecretCommandOverRedactionGuards🤖 Generated with Claude Code
Note
Medium Risk
Expands and reorders redaction heuristics with new regexes and guard logic, which could introduce false positives/negatives in transcript redaction. Changes are localized to redaction behavior but affect all logged/serialized content.
Overview
Extends deterministic redaction beyond DB-password keys to also redact secret-shaped keys (e.g.
*_CLIENT_SECRET,*_API_KEY,*_TOKEN) in both raw textkey=value/key:valueforms and structured JSON fields.Adds a new detector for shell transcripts that pipe a literal into secret-management commands (e.g.
printf '...' | ... secrets put ...), redacting only the stdin literal while preserving command context and treating$VARreferences as placeholders.Tightens entropy-based redaction by skipping low-entropy identifier assignments like
*_id/*_accountunless the value itself looks like a high-entropy secret. Updates/extends tests to cover the new secret shapes, shell-stdin redaction, and the new over-redaction guard.Reviewed by Cursor Bugbot for commit 4018d6a. Configure here.