fix: mark corrupt NumPy object payloads inconclusive#912

Open
mldangelo-oai wants to merge 2 commits into main from mdangelo/codex/numpy-pickle-boundary-audit

Conversation

@mldangelo-oai
Contributor

@mldangelo-oai mldangelo-oai commented Apr 10, 2026

Summary

  • filter embedded pickle parse-incomplete noise when NumPy object-array payloads include trailing bytes
  • mark corrupt non-malicious object payloads as inconclusive with scan_outcome metadata so core/CLI returns exit 2
  • preserve real embedded pickle security findings and add direct/core regression coverage
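
The trailing-bytes scenario can be reproduced directly: an object-dtype `.npy` file stores its data segment as an embedded pickle, so any bytes appended after the pickle's STOP opcode become unaccounted-for trailing data. A minimal sketch of that corruption (the array contents and the appended `b"\x00garbage"` bytes are illustrative, not from the repository):

```python
# Reproduce the corrupt-payload case: save an object-dtype array (whose
# data segment is an embedded pickle), then append junk bytes after it.
import io
import pickle

import numpy as np

buf = io.BytesIO()
np.save(buf, np.array([{"weights": [1, 2, 3]}], dtype=object), allow_pickle=True)
payload = buf.getvalue() + b"\x00garbage"  # simulated corruption

# Replay the embedded pickle to find where it really ends.
stream = io.BytesIO(payload)
stream.seek(8)  # skip magic (6 bytes) + format version (2 bytes)
header_len = int.from_bytes(stream.read(2), "little")  # v1.0 header length
stream.seek(10 + header_len)  # data segment begins here
pickle.load(stream)  # consumes exactly one pickled object
trailing = payload[stream.tell():]
print(f"{len(trailing)} trailing bytes after pickle payload")
```

Before this PR, the scanner surfaced the resulting parse-incomplete diagnostics as findings; now the file is reported as corrupt-but-inconclusive instead.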

Validation

  • uv run pytest tests/scanners/test_numpy_scanner.py -k "trailing_bytes or object_dtype_triggers_cve or malicious_exit1"
  • uv run pytest tests/scanners/test_numpy_scanner.py
  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run pytest -n auto -m "not slow and not integration" --maxfail=1
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • git diff --check

Summary by CodeRabbit

  • Bug Fixes
    • Improved NumPy file analysis to classify trailing bytes after pickle payloads as inconclusive findings rather than security issues, reducing false positives.

@mldangelo-oai mldangelo-oai enabled auto-merge (squash) April 10, 2026 18:26
@coderabbitai
Contributor

coderabbitai bot commented Apr 10, 2026

Warning

Rate limit exceeded

@mldangelo-oai has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 51 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 11 minutes and 51 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ffbb2ab6-235f-4cdf-8619-e867d1b74ebf

📥 Commits

Reviewing files that changed from the base of the PR and between be1e0bf and d6e82a7.

📒 Files selected for processing (2)
  • modelaudit/scanners/numpy_scanner.py
  • tests/scanners/test_numpy_scanner.py

Walkthrough

The changes modify the NumPy scanner to handle trailing bytes after embedded pickle payloads in object-dtype arrays. A new helper method identifies pickle diagnostics that become irrelevant when trailing bytes are detected. The scan logic now filters these superseded diagnostics and marks results as inconclusive rather than escalating to security findings.

Changes

Cohort / File(s) Summary

  • Documentation (CHANGELOG.md): Added a bug-fix entry documenting that trailing bytes after NumPy object-array pickle payloads are now treated as inconclusive rather than escalated to security findings.
  • Scanner Logic (modelaudit/scanners/numpy_scanner.py): Added the _is_trailing_pickle_parse_noise() static helper to identify superseded pickle diagnostics. Modified scan() to detect trailing bytes, filter affected issues/checks, append an integrity check failure, and set metadata fields (analysis_incomplete, scan_outcome, scan_outcome_reasons) before returning early.
  • Test Coverage (tests/scanners/test_numpy_scanner.py): Updated imports and strengthened assertions in test_object_dtype_numpy_trailing_bytes_fail_integrity() to verify error absence, inconclusive scan outcome, and absence of critical issues. Added test_object_dtype_numpy_trailing_bytes_exit2_not_security_finding() to validate exit code 2 with no critical findings.
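
The filter-and-mark flow described above can be sketched as follows. This is a hypothetical simplification, not the actual numpy_scanner.py code: the marker strings and the dict-based result shape are assumptions.

```python
# Hypothetical sketch of the superseded-diagnostic filter; the real
# _is_trailing_pickle_parse_noise() in modelaudit/scanners/numpy_scanner.py
# may match on different strings and operate on a richer result object.
PARSE_NOISE_MARKERS = ("parse_incomplete", "stream was fully consumed")


def is_trailing_pickle_parse_noise(message: str) -> bool:
    """True for pickle diagnostics made irrelevant by trailing bytes."""
    lowered = message.lower()
    return any(marker in lowered for marker in PARSE_NOISE_MARKERS)


def finalize_trailing_bytes(result: dict) -> dict:
    """Drop superseded diagnostics and mark the scan inconclusive."""
    result["issues"] = [
        msg for msg in result["issues"] if not is_trailing_pickle_parse_noise(msg)
    ]
    result.setdefault("metadata", {}).update(
        analysis_incomplete=True,
        scan_outcome="inconclusive",  # core/CLI maps this to exit code 2
        scan_outcome_reasons=["numpy_object_pickle_trailing_bytes"],
    )
    return result
```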

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop, hop, trailing bytes no more—
What once was critical, now inconclusive lore,
The pickle stream ends, but noise remains,
We filter the static, let clarity reign,
Integrity intact, no false alarms soar! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title "fix: mark corrupt NumPy object payloads inconclusive" directly and clearly summarizes the main change: handling corrupt NumPy object payloads by marking them inconclusive instead of escalating to security findings.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 6 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 2 improved, 4 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 757.94ms -> 686.68ms (-9.4%).

Top improvements:

  • tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle -70.0% (94.63ms -> 28.41ms, safe_model.pkl, size=49.4 KiB, files=1)
  • tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle -36.6% (198.5us -> 125.9us, safe_model.pkl, size=49.4 KiB, files=1)
All benchmarks live in tests/benchmarks/test_scan_benchmarks.py.

| Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| test_scan_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 94.63ms | 28.41ms | -70.0% | improved |
| test_detect_file_format_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 198.5us | 125.9us | -36.6% | improved |
| test_validate_file_type_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 43.2us | 42.1us | -2.5% | stable |
| test_scan_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 35.50ms | 35.17ms | -0.9% | stable |
| test_scan_mixed_directory | mixed-corpus | 1.7 MiB | 54 | 137.61ms | 136.55ms | -0.8% | stable |
| test_scan_duplicate_directory | duplicate-corpus | 840.0 KiB | 81 | 489.96ms | 486.39ms | -0.7% | stable |
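
The Change and Status columns follow from a simple median comparison against the 15% regression threshold. A sketch of the arithmetic (function names are illustrative; the workflow's actual implementation may differ):

```python
# Percent change of current vs. baseline median, flagged against the
# 15% regression threshold used by the benchmark comparison.
def percent_change(baseline: float, current: float) -> float:
    return (current - baseline) / baseline * 100.0


def status(baseline: float, current: float, threshold: float = 15.0) -> str:
    change = percent_change(baseline, current)
    if change > threshold:
        return "regressed"
    if change < -threshold:
        return "improved"
    return "stable"


# test_scan_safe_pickle: 94.63ms -> 28.41ms
print(round(percent_change(94.63, 28.41), 1), status(94.63, 28.41))
```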


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/scanners/test_numpy_scanner.py (1)

348-359: 🧹 Nitpick | 🔵 Trivial

Assert the superseded parse-noise is gone, not just the new outcome.

This regression still passes if the embedded parse_incomplete / “stream was fully consumed” notices leak back alongside the integrity check. Add a negative assertion for those diagnostics so _is_trailing_pickle_parse_noise() is actually covered.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/scanners/test_numpy_scanner.py` around lines 348 - 359, Add a negative
assertion that the parse-noise diagnostics produced by
_is_trailing_pickle_parse_noise() are not present alongside the trailing-bytes
integrity failure: in the test (around the existing asserts using result,
result.checks, and result.metadata) assert that no check or metadata reason
contains the parse-noise indicators (e.g., no check.message or check.name
mentions "stream was fully consumed" or "parse_incomplete", and
"parse_incomplete" is not present in result.metadata["scan_outcome_reasons"]).
This ensures the parse-noise is fully suppressed rather than merely overshadowed
by the new integrity outcome.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/numpy_scanner.py`:
- Around line 369-375: Replace the open-coded inconclusive branch (where you set
result.metadata keys, call result.finish(False) and return) with a call to the
shared inconclusive finalizer used by modelaudit/scanners/pickle_scanner.py (the
BaseScanner-level helper); invoke that helper with the current result and the
reason "numpy_object_pickle_trailing_bytes" so the centralized routine sets
metadata, finishes the result and handles any CLI/cache bookkeeping
consistently, then exit after calling it.

In `@tests/scanners/test_numpy_scanner.py`:
- Around line 362-371: The test currently only asserts exit code 2 and absence
of CRITICAL issues; update it to also assert the aggregated scan result remains
unsuccessful and uncachable so the inconclusive path is explicit: after calling
scan_model_directory_or_file(str(path)) assert result.success is False (or the
appropriate failure flag on the returned result) and assert result.cacheable is
False (or the equivalent cache/core-path indicator) in addition to the existing
assertions; then add a second variant using the same trailing-bytes technique
but with a malicious payload so that scanning yields at least one
IssueSeverity.CRITICAL and assert determine_exit_code(result) != 2 and that a
CRITICAL issue exists to ensure the unconditional scan_outcome=INCONCLUSIVE
branch cannot downgrade real findings.
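
The precedence this comment asks to verify (real critical findings must never be downgraded by the inconclusive outcome) can be sketched with an illustrative exit-code function. determine_exit_code and the result shape here are assumptions taken from the prompt above, not the project's actual API.

```python
# Illustrative exit-code precedence: critical findings (exit 1) must not
# be downgraded by an inconclusive scan outcome (exit 2).
def determine_exit_code(result: dict) -> int:
    if any(issue["severity"] == "CRITICAL" for issue in result["issues"]):
        return 1  # real security finding always wins
    if result.get("scan_outcome") == "inconclusive":
        return 2  # corrupt but non-malicious payload
    return 0


corrupt = {"issues": [], "scan_outcome": "inconclusive"}
malicious = {"issues": [{"severity": "CRITICAL"}], "scan_outcome": "inconclusive"}
assert determine_exit_code(corrupt) == 2
assert determine_exit_code(malicious) == 1  # inconclusive cannot mask it
```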

---

Outside diff comments:
In `@tests/scanners/test_numpy_scanner.py`:
- Around line 348-359: Add a negative assertion that the parse-noise diagnostics
produced by _is_trailing_pickle_parse_noise() are not present alongside the
trailing-bytes integrity failure: in the test (around the existing asserts using
result, result.checks, and result.metadata) assert that no check or metadata
reason contains the parse-noise indicators (e.g., no check.message or check.name
mentions "stream was fully consumed" or "parse_incomplete", and
"parse_incomplete" is not present in result.metadata["scan_outcome_reasons"]).
This ensures the parse-noise is fully suppressed rather than merely overshadowed
by the new integrity outcome.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2a5480d3-cd1c-428c-95e1-f965627cb412

📥 Commits

Reviewing files that changed from the base of the PR and between f285a05 and be1e0bf.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • modelaudit/scanners/numpy_scanner.py
  • tests/scanners/test_numpy_scanner.py

