fix: add standalone-primary pickle migration mode#902
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Performance BenchmarksCompared
|
1221963 to
f26b178
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f26b1781aa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
aa60a2d to
8ac54be
Compare
There was a problem hiding this comment.
💡 Codex Review
modelaudit/modelaudit/scanners/pickle_scanner.py
Lines 4214 to 4219 in 8ac54be
Guard the legacy-exception path behind use_standalone_pickle_primary. In _scan_pickle_stream_with_package_engine, any non-parse exception from legacy fallback always records pickle_scan_runtime_failed and returns success=False before the standalone-primary branch runs. This lets a clean standalone scan fail solely due to legacy compatibility code, breaking standalone-primary ownership semantics.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
8ac54be to
bb028cd
Compare
There was a problem hiding this comment.
💡 Codex Review
modelaudit/modelaudit/scanners/pickle_scanner.py
Lines 4214 to 4219 in bb028cd
In _scan_pickle_stream_with_package_engine, any non-parse exception from the legacy fallback unconditionally records an operational error on package_result and returns failure. That makes use_standalone_pickle_primary=True still fail because of legacy fallback runtime errors, contradicting standalone-primary ownership semantics and potentially forcing exit code 2 on otherwise complete package scans. Fresh evidence: this branch executes before _merge_missing_pickle_checks(..., propagate_fallback_state=False).
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
* fix: narrow legacy pickle stdlib policy Remove broad CRITICAL treatment for noisy stdlib modules in the root pickle scanner while preserving exact dangerous helper coverage and warning-level suspicious refs. * fix: preserve exact stdlib helper detections Keep logging configuration loaders and uuid subprocess getnode helpers as exact dangerous callables in both root and standalone pickle policies. * fix: flag uuid getnode pickle calls Keep uuid.getnode in the exact dangerous-call policy after narrowing broad uuid module severity, since it can dispatch to platform helper probes that invoke subprocess-backed paths. Cover both root PickleScanner and standalone picklescan policy regressions. * fix: route misnamed compressed wrappers by header (#904) * fix: route misnamed compressed wrappers by header Detect gzip, bzip2, xz, lz4, and zlib wrappers from magic bytes even when the filename uses a misleading extension. Preserve existing joblib and R serialized scanner precedence, and add positive/negative routing regressions. * fix: reject compressed wrapper raw trailers (#905) * fix: reject compressed wrapper raw trailers Use bounded low-level bzip2 and xz decompression so valid first members cannot hide raw unscanned trailing bytes. Preserve concatenated valid-member support and add fail-closed regressions. * fix: bound compressed decompression probes Cap bzip2/xz per-call decompression max_length to a chunk-sized probe so a single compressed chunk cannot allocate up to the full remaining scan budget before limit checks run. * fix: bound zlib decompression probes Use the same chunk-sized decompression probe for zlib streams so per-call output is bounded before size and ratio limit checks run. * fix: harden lz4 wrapper decoding (#906) * fix: harden lz4 wrapper decoding Use the lz4 frame decompressor API through the bounded concatenated-stream loop so raw trailers fail closed and per-call output probes stay chunk-sized. Add fake-frame regressions for optional lz4 coverage. * fix: support legacy lz4 chunk decoding Fall back to lz4.frame decompression contexts when LZ4FrameDecompressor is unavailable while keeping bounded output probes and raw-trailer rejection. Add fallback-specific lz4 regressions for bounded reads, raw trailers, and concatenated frames.
* fix: harden standalone pickle scanner * fix: fail closed on truncated pickle literal scans * fix: fail closed on oversized encoded pickle literals * fix: deep-freeze standalone pickle reports * fix: mark incomplete nested pickle scans inconclusive * fix: narrow standalone pickle wildcard globals * fix: preserve deep nested pickle findings * fix: bound nested pickle recursion resources * fix: add standalone-primary pickle migration mode (#902) * fix: add standalone-primary pickle migration mode * fix: isolate standalone pickle primary fallback state * fix: narrow legacy pickle stdlib policy (#903) * fix: narrow legacy pickle stdlib policy Remove broad CRITICAL treatment for noisy stdlib modules in the root pickle scanner while preserving exact dangerous helper coverage and warning-level suspicious refs. * fix: preserve exact stdlib helper detections Keep logging configuration loaders and uuid subprocess getnode helpers as exact dangerous callables in both root and standalone pickle policies. * fix: flag uuid getnode pickle calls Keep uuid.getnode in the exact dangerous-call policy after narrowing broad uuid module severity, since it can dispatch to platform helper probes that invoke subprocess-backed paths. Cover both root PickleScanner and standalone picklescan policy regressions. * fix: route misnamed compressed wrappers by header (#904) * fix: route misnamed compressed wrappers by header Detect gzip, bzip2, xz, lz4, and zlib wrappers from magic bytes even when the filename uses a misleading extension. Preserve existing joblib and R serialized scanner precedence, and add positive/negative routing regressions. * fix: reject compressed wrapper raw trailers (#905) * fix: reject compressed wrapper raw trailers Use bounded low-level bzip2 and xz decompression so valid first members cannot hide raw unscanned trailing bytes. Preserve concatenated valid-member support and add fail-closed regressions. * fix: bound compressed decompression probes Cap bzip2/xz per-call decompression max_length to a chunk-sized probe so a single compressed chunk cannot allocate up to the full remaining scan budget before limit checks run. * fix: bound zlib decompression probes Use the same chunk-sized decompression probe for zlib streams so per-call output is bounded before size and ratio limit checks run. * fix: harden lz4 wrapper decoding (#906) * fix: harden lz4 wrapper decoding Use the lz4 frame decompressor API through the bounded concatenated-stream loop so raw trailers fail closed and per-call output probes stay chunk-sized. Add fake-frame regressions for optional lz4 coverage. * fix: support legacy lz4 chunk decoding Fall back to lz4.frame decompression contexts when LZ4FrameDecompressor is unavailable while keeping bounded output probes and raw-trailer rejection. Add fallback-specific lz4 regressions for bounded reads, raw trailers, and concatenated frames. * docs: normalize unreleased changelog * fix: address pickle scanner review feedback * fix: surface nested pickle incomplete notices
Summary
This PR continues the picklescan audit by tightening the root
PickleScannermigration boundary.The split-package docs said the root scanner should treat the standalone
modelaudit-picklescanengine as primary and merge legacy-only compatibility checks into it. A direct default flip is not safe yet: the focused scanner suite produced 58 compatibility failures because callers and tests still depend on legacy-primary result shape. This change makes that future path explicit and testable without breaking current behavior.Changes:
use_standalone_pickle_primary=Truescanner config to exercise the standalone-primary merge path.pickle_primary_enginemetadata so results show whether the root scanner returned the legacy or standalone-primary result.scripts/compare_pickle_scanners.pywith--include-rootand--root-standalone-primaryto compare the actual root scanner alongside the pure legacy/package/adapter baselines.Validation
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/uv run pytest tests/scanners/test_pickle_scanner.py tests/scanners/test_picklescan_adapter.py tests/scripts/test_compare_pickle_scanners.py -q --tb=shortuv run pytest -n auto -m "not slow and not integration" --maxfail=1env PYTHONPATH=packages/modelaudit-picklescan/src uv run python scripts/compare_pickle_scanners.py --root-standalone-primaryNotes
This is intentionally stacked on top of #901 /
mdangelo/codex/picklescan-audit-hardening; it should be reviewed as the migration-boundary layer after the standalone picklescan hardening work.The remaining root drift is rule-shape/accounting drift, not verdict or exit-status drift. The next audit should close those remaining standalone-primary rule drifts so the default can eventually flip safely.