fix: mark corrupt NumPy object payloads inconclusive#912

Open
mldangelo-oai wants to merge 2 commits into main from mdangelo/codex/numpy-pickle-boundary-audit

Conversation

@mldangelo-oai
Contributor

@mldangelo-oai mldangelo-oai commented Apr 10, 2026

Summary

  • filter embedded pickle parse-incomplete noise when NumPy object-array payloads include trailing bytes
  • mark corrupt non-malicious object payloads as inconclusive with scan_outcome metadata so core/CLI returns exit 2
  • preserve real embedded pickle security findings and add direct/core regression coverage
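
The trailing-bytes scenario can be reproduced directly: an object-dtype `.npy` file stores its data segment as an embedded pickle, so any bytes appended after the pickle's STOP opcode become unaccounted-for trailing data. A minimal sketch of that corruption (the array contents and the appended `b"\x00garbage"` bytes are illustrative, not from the repository):

```python
# Reproduce the corrupt-payload case: save an object-dtype array (whose
# data segment is an embedded pickle), then append junk bytes after it.
import io
import pickle

import numpy as np

buf = io.BytesIO()
np.save(buf, np.array([{"weights": [1, 2, 3]}], dtype=object), allow_pickle=True)
payload = buf.getvalue() + b"\x00garbage"  # simulated corruption

# Replay the embedded pickle to find where it really ends.
stream = io.BytesIO(payload)
stream.seek(8)  # skip magic (6 bytes) + format version (2 bytes)
header_len = int.from_bytes(stream.read(2), "little")  # v1.0 header length
stream.seek(10 + header_len)  # data segment begins here
pickle.load(stream)  # consumes exactly one pickled object
trailing = payload[stream.tell():]
print(f"{len(trailing)} trailing bytes after pickle payload")
```

Before this PR, the scanner surfaced the resulting parse-incomplete diagnostics as findings; now the file is reported as corrupt-but-inconclusive instead.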

Validation

  • uv run pytest tests/scanners/test_numpy_scanner.py -k "trailing_bytes or object_dtype_triggers_cve or malicious_exit1"
  • uv run pytest tests/scanners/test_numpy_scanner.py
  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run pytest -n auto -m "not slow and not integration" --maxfail=1
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • git diff --check

Summary by CodeRabbit

  • Bug Fixes
    • Improved NumPy file analysis to classify trailing bytes after pickle payloads as inconclusive findings rather than security issues, reducing false positives.

@mldangelo-oai mldangelo-oai enabled auto-merge (squash) April 10, 2026 18:26
@coderabbitai
Contributor

coderabbitai bot commented Apr 10, 2026

Warning

Rate limit exceeded

@mldangelo-oai has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 51 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 11 minutes and 51 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ffbb2ab6-235f-4cdf-8619-e867d1b74ebf

📥 Commits

Reviewing files that changed from the base of the PR and between be1e0bf and d6e82a7.

📒 Files selected for processing (2)
  • modelaudit/scanners/numpy_scanner.py
  • tests/scanners/test_numpy_scanner.py

Walkthrough

The changes modify the NumPy scanner to handle trailing bytes after embedded pickle payloads in object-dtype arrays. A new helper method identifies pickle diagnostics that become irrelevant when trailing bytes are detected. The scan logic now filters these superseded diagnostics and marks results as inconclusive rather than escalating to security findings.

Changes

Cohort / File(s) Summary

  • Documentation (CHANGELOG.md): Added a bug-fix entry documenting that trailing bytes after NumPy object-array pickle payloads are now treated as inconclusive rather than escalated to security findings.
  • Scanner Logic (modelaudit/scanners/numpy_scanner.py): Added the _is_trailing_pickle_parse_noise() static helper to identify superseded pickle diagnostics. Modified scan() to detect trailing bytes, filter affected issues/checks, append an integrity check failure, and set metadata fields (analysis_incomplete, scan_outcome, scan_outcome_reasons) before returning early.
  • Test Coverage (tests/scanners/test_numpy_scanner.py): Updated imports and strengthened assertions in test_object_dtype_numpy_trailing_bytes_fail_integrity() to verify error absence, inconclusive scan outcome, and absence of critical issues. Added test_object_dtype_numpy_trailing_bytes_exit2_not_security_finding() to validate exit code 2 with no critical findings.
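
The filter-and-mark flow described above can be sketched as follows. This is a hypothetical simplification, not the actual numpy_scanner.py code: the marker strings and the dict-based result shape are assumptions.

```python
# Hypothetical sketch of the superseded-diagnostic filter; the real
# _is_trailing_pickle_parse_noise() in modelaudit/scanners/numpy_scanner.py
# may match on different strings and operate on a richer result object.
PARSE_NOISE_MARKERS = ("parse_incomplete", "stream was fully consumed")


def is_trailing_pickle_parse_noise(message: str) -> bool:
    """True for pickle diagnostics made irrelevant by trailing bytes."""
    lowered = message.lower()
    return any(marker in lowered for marker in PARSE_NOISE_MARKERS)


def finalize_trailing_bytes(result: dict) -> dict:
    """Drop superseded diagnostics and mark the scan inconclusive."""
    result["issues"] = [
        msg for msg in result["issues"] if not is_trailing_pickle_parse_noise(msg)
    ]
    result.setdefault("metadata", {}).update(
        analysis_incomplete=True,
        scan_outcome="inconclusive",  # core/CLI maps this to exit code 2
        scan_outcome_reasons=["numpy_object_pickle_trailing_bytes"],
    )
    return result
```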

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop, hop, trailing bytes no more—
What once was critical, now inconclusive lore,
The pickle stream ends, but noise remains,
We filter the static, let clarity reign,
Integrity intact, no false alarms soar! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title "fix: mark corrupt NumPy object payloads inconclusive" directly and clearly summarizes the main change: handling corrupt NumPy object payloads by marking them inconclusive instead of escalating to security findings.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 6 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 2 improved, 4 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 757.94ms -> 686.68ms (-9.4%).

Top improvements:

  • tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle -70.0% (94.63ms -> 28.41ms, safe_model.pkl, size=49.4 KiB, files=1)
  • tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle -36.6% (198.5us -> 125.9us, safe_model.pkl, size=49.4 KiB, files=1)
All benchmarks live in tests/benchmarks/test_scan_benchmarks.py.

| Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| test_scan_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 94.63ms | 28.41ms | -70.0% | improved |
| test_detect_file_format_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 198.5us | 125.9us | -36.6% | improved |
| test_validate_file_type_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 43.2us | 42.1us | -2.5% | stable |
| test_scan_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 35.50ms | 35.17ms | -0.9% | stable |
| test_scan_mixed_directory | mixed-corpus | 1.7 MiB | 54 | 137.61ms | 136.55ms | -0.8% | stable |
| test_scan_duplicate_directory | duplicate-corpus | 840.0 KiB | 81 | 489.96ms | 486.39ms | -0.7% | stable |
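
The Change and Status columns follow from a simple median comparison against the 15% regression threshold. A sketch of the arithmetic (function names are illustrative; the workflow's actual implementation may differ):

```python
# Percent change of current vs. baseline median, flagged against the
# 15% regression threshold used by the benchmark comparison.
def percent_change(baseline: float, current: float) -> float:
    return (current - baseline) / baseline * 100.0


def status(baseline: float, current: float, threshold: float = 15.0) -> str:
    change = percent_change(baseline, current)
    if change > threshold:
        return "regressed"
    if change < -threshold:
        return "improved"
    return "stable"


# test_scan_safe_pickle: 94.63ms -> 28.41ms
print(round(percent_change(94.63, 28.41), 1), status(94.63, 28.41))
```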


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/scanners/test_numpy_scanner.py (1)

348-359: 🧹 Nitpick | 🔵 Trivial

Assert the superseded parse-noise is gone, not just the new outcome.

This regression still passes if the embedded parse_incomplete / “stream was fully consumed” notices leak back alongside the integrity check. Add a negative assertion for those diagnostics so _is_trailing_pickle_parse_noise() is actually covered.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/scanners/test_numpy_scanner.py` around lines 348 - 359, Add a negative
assertion that the parse-noise diagnostics produced by
_is_trailing_pickle_parse_noise() are not present alongside the trailing-bytes
integrity failure: in the test (around the existing asserts using result,
result.checks, and result.metadata) assert that no check or metadata reason
contains the parse-noise indicators (e.g., no check.message or check.name
mentions "stream was fully consumed" or "parse_incomplete", and
"parse_incomplete" is not present in result.metadata["scan_outcome_reasons"]).
This ensures the parse-noise is fully suppressed rather than merely overshadowed
by the new integrity outcome.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/numpy_scanner.py`:
- Around line 369-375: Replace the open-coded inconclusive branch (where you set
result.metadata keys, call result.finish(False) and return) with a call to the
shared inconclusive finalizer used by modelaudit/scanners/pickle_scanner.py (the
BaseScanner-level helper); invoke that helper with the current result and the
reason "numpy_object_pickle_trailing_bytes" so the centralized routine sets
metadata, finishes the result and handles any CLI/cache bookkeeping
consistently, then exit after calling it.

In `@tests/scanners/test_numpy_scanner.py`:
- Around line 362-371: The test currently only asserts exit code 2 and absence
of CRITICAL issues; update it to also assert the aggregated scan result remains
unsuccessful and uncachable so the inconclusive path is explicit: after calling
scan_model_directory_or_file(str(path)) assert result.success is False (or the
appropriate failure flag on the returned result) and assert result.cacheable is
False (or the equivalent cache/core-path indicator) in addition to the existing
assertions; then add a second variant using the same trailing-bytes technique
but with a malicious payload so that scanning yields at least one
IssueSeverity.CRITICAL and assert determine_exit_code(result) != 2 and that a
CRITICAL issue exists to ensure the unconditional scan_outcome=INCONCLUSIVE
branch cannot downgrade real findings.
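
The precedence this comment asks to verify (real critical findings must never be downgraded by the inconclusive outcome) can be sketched with an illustrative exit-code function. determine_exit_code and the result shape here are assumptions taken from the prompt above, not the project's actual API.

```python
# Illustrative exit-code precedence: critical findings (exit 1) must not
# be downgraded by an inconclusive scan outcome (exit 2).
def determine_exit_code(result: dict) -> int:
    if any(issue["severity"] == "CRITICAL" for issue in result["issues"]):
        return 1  # real security finding always wins
    if result.get("scan_outcome") == "inconclusive":
        return 2  # corrupt but non-malicious payload
    return 0


corrupt = {"issues": [], "scan_outcome": "inconclusive"}
malicious = {"issues": [{"severity": "CRITICAL"}], "scan_outcome": "inconclusive"}
assert determine_exit_code(corrupt) == 2
assert determine_exit_code(malicious) == 1  # inconclusive cannot mask it
```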

---

Outside diff comments:
In `@tests/scanners/test_numpy_scanner.py`:
- Around line 348-359: Add a negative assertion that the parse-noise diagnostics
produced by _is_trailing_pickle_parse_noise() are not present alongside the
trailing-bytes integrity failure: in the test (around the existing asserts using
result, result.checks, and result.metadata) assert that no check or metadata
reason contains the parse-noise indicators (e.g., no check.message or check.name
mentions "stream was fully consumed" or "parse_incomplete", and
"parse_incomplete" is not present in result.metadata["scan_outcome_reasons"]).
This ensures the parse-noise is fully suppressed rather than merely overshadowed
by the new integrity outcome.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2a5480d3-cd1c-428c-95e1-f965627cb412

📥 Commits

Reviewing files that changed from the base of the PR and between f285a05 and be1e0bf.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • modelaudit/scanners/numpy_scanner.py
  • tests/scanners/test_numpy_scanner.py

