fix: mark corrupt NumPy object payloads inconclusive #912
mldangelo-oai wants to merge 2 commits into main from
Conversation
Walkthrough
The changes modify the NumPy scanner to handle trailing bytes after embedded pickle payloads in object-dtype arrays. A new helper method identifies pickle diagnostics that become irrelevant when trailing bytes are detected. The scan logic now filters these superseded diagnostics and marks results as inconclusive rather than escalating to security findings.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Performance Benchmarks
Top improvements:
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/scanners/test_numpy_scanner.py (1)
348-359: 🧹 Nitpick | 🔵 Trivial
Assert the superseded parse-noise is gone, not just the new outcome.
This regression test still passes if the embedded `parse_incomplete` / "stream was fully consumed" notices leak back alongside the integrity check. Add a negative assertion for those diagnostics so `_is_trailing_pickle_parse_noise()` is actually covered.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scanners/test_numpy_scanner.py` around lines 348-359, add a negative assertion that the parse-noise diagnostics produced by `_is_trailing_pickle_parse_noise()` are not present alongside the trailing-bytes integrity failure: in the test (around the existing asserts using `result`, `result.checks`, and `result.metadata`), assert that no check or metadata reason contains the parse-noise indicators (e.g., no `check.message` or `check.name` mentions "stream was fully consumed" or "parse_incomplete", and "parse_incomplete" is not present in `result.metadata["scan_outcome_reasons"]`). This ensures the parse-noise is fully suppressed rather than merely overshadowed by the new integrity outcome.
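The suggested negative assertion could look roughly like this. The `Check` and `ScanResult` stand-ins below are hypothetical stubs to make the example self-contained; the real result objects in modelaudit may use different field names.

```python
from dataclasses import dataclass, field


@dataclass
class Check:
    """Minimal stand-in for a scanner check record (field names assumed)."""
    name: str
    message: str = ""


@dataclass
class ScanResult:
    """Minimal stand-in for the scanner's aggregated result (assumed shape)."""
    checks: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def assert_parse_noise_suppressed(result: ScanResult) -> None:
    """Fail if superseded pickle parse-noise diagnostics leaked through."""
    for check in result.checks:
        assert "stream was fully consumed" not in check.message
        assert "parse_incomplete" not in check.name
    reasons = result.metadata.get("scan_outcome_reasons", [])
    assert "parse_incomplete" not in reasons
```

Calling this helper alongside the existing positive assertions pins down that the noise is suppressed, not merely outranked by the new integrity finding.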
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/numpy_scanner.py`:
- Around lines 369-375: Replace the open-coded inconclusive branch (where you set
result.metadata keys, call result.finish(False) and return) with a call to the
shared inconclusive finalizer used by modelaudit/scanners/pickle_scanner.py (the
BaseScanner-level helper); invoke that helper with the current result and the
reason "numpy_object_pickle_trailing_bytes" so the centralized routine sets
metadata, finishes the result and handles any CLI/cache bookkeeping
consistently, then exit after calling it.
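A shared finalizer of the kind this comment proposes might be sketched as below. Everything here is an assumption for illustration: the attribute names (`metadata`, `cacheable`, `success`), the `finish()` signature, and the `finalize_inconclusive` name are not taken from the actual BaseScanner API.

```python
class ScanResult:
    """Hypothetical stand-in for the scanner result object."""

    def __init__(self):
        self.metadata = {}
        self.success = None
        self.cacheable = True

    def finish(self, success: bool) -> None:
        self.success = success


def finalize_inconclusive(result: ScanResult, reason: str) -> None:
    """One place to record an inconclusive outcome for every scanner."""
    result.metadata["scan_outcome"] = "INCONCLUSIVE"
    result.metadata.setdefault("scan_outcome_reasons", []).append(reason)
    result.cacheable = False  # an inconclusive scan must not be cached
    result.finish(False)
```

Centralizing this means the NumPy scanner's call site shrinks to a single `finalize_inconclusive(result, "numpy_object_pickle_trailing_bytes")`, and any future bookkeeping (CLI reporting, cache invalidation) only has to change in one routine.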
In `@tests/scanners/test_numpy_scanner.py`:
- Around lines 362-371: The test currently only asserts exit code 2 and absence
of CRITICAL issues; update it to also assert the aggregated scan result remains
unsuccessful and uncachable so the inconclusive path is explicit: after calling
scan_model_directory_or_file(str(path)) assert result.success is False (or the
appropriate failure flag on the returned result) and assert result.cacheable is
False (or the equivalent cache/core-path indicator) in addition to the existing
assertions; then add a second variant using the same trailing-bytes technique
but with a malicious payload so that scanning yields at least one
IssueSeverity.CRITICAL and assert determine_exit_code(result) != 2 and that a
CRITICAL issue exists to ensure the unconditional scan_outcome=INCONCLUSIVE
branch cannot downgrade real findings.
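The exit-code precedence that the second test variant is meant to pin down can be sketched as follows. `determine_exit_code` is named in the review, but this signature and the severity/outcome representation are assumptions; in the real project it presumably takes the scan result object.

```python
CRITICAL = "CRITICAL"


def determine_exit_code(severities, scan_outcome):
    """Real findings (exit 1) must always beat an INCONCLUSIVE outcome (exit 2)."""
    if CRITICAL in severities:
        return 1  # a CRITICAL issue is never downgraded to inconclusive
    if scan_outcome == "INCONCLUSIVE":
        return 2
    return 0
```

The malicious-payload variant exists to catch exactly the regression where an unconditional `scan_outcome = INCONCLUSIVE` assignment masks a real finding: with a CRITICAL issue present, the exit code must not be 2.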
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2a5480d3-cd1c-428c-95e1-f965627cb412
📒 Files selected for processing (3)
CHANGELOG.md
modelaudit/scanners/numpy_scanner.py
tests/scanners/test_numpy_scanner.py