
fix: mark malformed Keras H5 configs inconclusive#917

Open
mldangelo-oai wants to merge 2 commits into main from mdangelo/codex/keras-h5-boundary-audit

Conversation

Contributor

@mldangelo-oai mldangelo-oai commented Apr 10, 2026

Summary

  • mark malformed or structurally invalid Keras H5 model/training config metadata as scan_outcome=inconclusive
  • preserve security precedence so real warning/critical findings still produce security exits
  • add direct, aggregate exit-code, cache-rerun, null JSON, and security-precedence regression coverage
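The precedence rule in the second bullet can be sketched as follows. This is an illustrative sketch only: the function name and exit-code values are assumptions, not modelaudit's actual API.

```python
# Hypothetical sketch of security precedence over inconclusive outcomes.
# Exit-code values are assumed for illustration, not modelaudit's real codes.
EXIT_OK, EXIT_SECURITY, EXIT_INCONCLUSIVE = 0, 1, 2

def determine_exit(has_security_findings: bool, scan_outcome: str) -> int:
    if has_security_findings:
        # Real warning/critical findings always win: security exit first.
        return EXIT_SECURITY
    if scan_outcome == "inconclusive":
        # Malformed configs fail closed rather than scanning clean.
        return EXIT_INCONCLUSIVE
    return EXIT_OK
```

The ordering matters: checking findings before the outcome flag is what keeps a malformed-but-malicious file from downgrading a security exit to a mere inconclusive one.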

Validation

  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run pytest tests/scanners/test_keras_h5_scanner.py
  • uv run pytest -n auto -m "not slow and not integration" --maxfail=1
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • git diff --check

Summary by CodeRabbit

  • Bug Fixes

    • Malformed Keras H5 configurations are now classified as inconclusive and produce fail-closed scan outcomes, while still preserving security precedence for critical findings.
  • Tests

    • Added comprehensive tests covering malformed Keras configs, inconclusive handling, exit-code behavior, cache persistence, and related test helpers.

Contributor

coderabbitai bot commented Apr 10, 2026

Walkthrough

Keras H5 scanner now treats malformed or unparseable model_config/training_config as inconclusive rather than clean or security-complete. Parsing failures and structural validation issues add inconclusive INFO checks, set metadata flags, and influence final success/failure logic; tests updated to exercise these fail-closed outcomes and cache behavior.

Changes

  • Changelog — CHANGELOG.md: Added a bug fix entry stating malformed Keras H5 configurations are classified as inconclusive scans.
  • Scanner Implementation — modelaudit/scanners/keras_h5_scanner.py: Introduced INCONCLUSIVE_SCAN_OUTCOME usage and a _JSON_ATTRIBUTE_PARSE_FAILED sentinel. Replaced direct json.loads with _load_json_attribute(...); added _finish_scan_result(...), _mark_inconclusive_scan_result(...), and _scan_result_has_security_findings(...). Added _MODEL_CONTAINER_CLASSES. Enhanced _scan_model_config to validate config, layers, layer config types, nested-model recursion, and wrapper-owned nested-layer payloads; failures add INFO checks and mark scans inconclusive.
  • Tests — tests/scanners/test_keras_h5_scanner.py: Added create_raw_config_h5_file(...) to write raw (possibly malformed) H5 attributes; loosened fixture typing to accept non-dict inputs. Added _assert_inconclusive_keras_h5_scan(...), integrated higher-level audit helpers (scan_model_directory_or_file, determine_exit_code), reset the cache between runs, and added tests covering malformed model/training config parsing, structural validation, security-precedence behavior, and cache persistence of inconclusive results.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble at JSON, a curious bite,

malformed bits now shine inconclusive and bright.
Not broken nor safe — just a thoughtful pause,
checks record reasons, respecting the laws.
Hop, hop, the scanner learns with gentle applause 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title clearly and concisely describes the main change, marking malformed Keras H5 configs as inconclusive, which is the primary objective of the PR.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; skipping the check.



@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 6 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 6 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 679.36ms -> 676.90ms (-0.4%).

| Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
|---|---|---|---|---|---|---|---|
| tests/benchmarks/test_scan_benchmarks.py::test_validate_file_type_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 46.9us | 50.0us | +6.5% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 165.8us | 172.7us | +4.2% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 28.06ms | 27.84ms | -0.8% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 33.73ms | 33.57ms | -0.5% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_directory | duplicate-corpus | 840.0 KiB | 81 | 481.11ms | 479.18ms | -0.4% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_mixed_directory | mixed-corpus | 1.7 MiB | 54 | 136.25ms | 136.09ms | -0.1% | stable |

@mldangelo-oai mldangelo-oai enabled auto-merge (squash) April 10, 2026 20:09
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/keras_h5_scanner.py`:
- Around line 523-537: The code only validates layer_config type when
layer_class is in self.suspicious_layer_types, which lets malformed wrappers
(e.g., TimeDistributed with config as a string) bypass checks; move or duplicate
the dict-type validation for layer_config so every layer runs the check before
any further inspection: for each layer obtained in the scanner, call
layer_config = layer.get("config", {}), validate isinstance(layer_config, dict)
and on failure call self._mark_inconclusive_scan_result(result,
"keras_h5_layer_config_invalid_type") and result.add_check(...) (same payload
currently used), then set layer_config = {} and continue; keep subsequent logic
that inspects wrapped/inner layers (and uses self.suspicious_layer_types)
unchanged but rely on the earlier universal validation to prevent silent skips
(reference symbols: layer_config, layer.get, self.suspicious_layer_types,
_mark_inconclusive_scan_result, result.add_check, self.current_file_path).
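The universal validation this comment proposes could be sketched as below. The helper callables stand in for the scanner's real methods (`_mark_inconclusive_scan_result`, `result.add_check`); the function itself is hypothetical.

```python
# Hypothetical sketch: type-check every layer's config before any
# class-specific inspection, so malformed wrappers (e.g. TimeDistributed
# with a string config) cannot bypass the check.
def validate_layers(layers, mark_inconclusive, add_check):
    validated = []
    for layer in layers:
        layer_config = layer.get("config", {})
        if not isinstance(layer_config, dict):
            mark_inconclusive("keras_h5_layer_config_invalid_type")
            add_check(
                name="Layer Config Type Validation",
                passed=False,
                details={"layer_class": layer.get("class_name")},
            )
            layer_config = {}  # continue the scan with an empty config
        validated.append((layer, layer_config))
    return validated
```

Running the type check unconditionally, before consulting `suspicious_layer_types`, is what closes the silent-skip path the reviewer flags.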

In `@tests/scanners/test_keras_h5_scanner.py`:
- Around line 294-309: The helper _assert_inconclusive_keras_h5_scan currently
only asserts the check name and status; update it to also assert the specific
"fail-closed" message appears where expected (e.g., in result.checks'
message/text field and in metadata["scan_outcome_reasons"] or
audit_result.file_metadata[str(model_path)]), so that the test verifies the
operational fail-closed message alongside success/exit-code/cache behavior;
locate checks via KerasH5Scanner().scan(...) and audit_result from
scan_model_directory_or_file(...) and add assertions that the check with name ==
expected_check_name has the expected fail-closed message and that the same
message is present in the metadata scan_outcome_reasons.
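A strengthened helper along the lines this comment describes might look like the sketch below. The flat result-dict shape is a simplifying assumption; the real `_assert_inconclusive_keras_h5_scan` operates on the scanner's actual result objects.

```python
# Hypothetical sketch of the enhanced assertion helper: verify the
# fail-closed message appears both in the matching check entry and in
# the scan_outcome_reasons metadata, not just the check name and status.
def assert_inconclusive(result, expected_check_name, expected_message):
    matching = [c for c in result["checks"] if c["name"] == expected_check_name]
    assert matching, f"no check named {expected_check_name!r}"
    assert any(expected_message in c["message"] for c in matching), (
        "fail-closed message missing from the matching check"
    )
    reasons = result["metadata"].get("scan_outcome_reasons", [])
    assert any(expected_message in r for r in reasons), (
        "fail-closed message missing from scan_outcome_reasons"
    )
```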

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9678ece1-c35d-47d1-a591-8a2b54199d64

📥 Commits

Reviewing files that changed from the base of the PR and between f285a05 and 9efa7eb.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • modelaudit/scanners/keras_h5_scanner.py
  • tests/scanners/test_keras_h5_scanner.py

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/keras_h5_scanner.py`:
- Around line 734-749: The nested-container branch only runs when "config" in
layer, so layers with a container class (self._MODEL_CONTAINER_CLASSES) but
missing "config" (e.g. {"class_name":"Functional"}) silently pass; update the
logic in the block that handles layer_class in self._MODEL_CONTAINER_CLASSES to
explicitly handle the case where "config" is absent by calling
self._mark_inconclusive_scan_result(result,
"keras_h5_nested_model_layers_missing") and adding the same
result.add_check(...) entry (use the same name "Nested Model Layers Presence
Validation", passed=False, rule_code="S902", severity=IssueSeverity.INFO,
location=self.current_file_path and details={"layer_class": layer_class,
"expected_key":"config.layers"}) so malformed nested containers are marked
inconclusive rather than scanning clean.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 521d20e8-8aeb-4d40-9e8c-06da88e9e6fe

📥 Commits

Reviewing files that changed from the base of the PR and between 9efa7eb and aebbe94.

📒 Files selected for processing (2)
  • modelaudit/scanners/keras_h5_scanner.py
  • tests/scanners/test_keras_h5_scanner.py

Comment on lines +734 to 749

```python
if layer_class in self._MODEL_CONTAINER_CLASSES and "config" in layer:
    nested_config = layer.get("config")
    if isinstance(nested_config, dict) and "layers" in nested_config:
        self._scan_model_config(layer, result)
    elif isinstance(nested_config, dict):
        self._mark_inconclusive_scan_result(result, "keras_h5_nested_model_layers_missing")
        result.add_check(
            name="Nested Model Layers Presence Validation",
            passed=False,
            message=f"Nested model config for {layer_class} is missing required layers list",
            rule_code="S902",
            severity=IssueSeverity.INFO,
            location=self.current_file_path,
            details={"layer_class": layer_class, "expected_key": "config.layers"},
        )
```

Contributor


⚠️ Potential issue | 🟠 Major

Nested container layers without config can still scan clean.

At Line 734, nested container validation only runs when "config" in layer. A malformed nested layer like {"class_name": "Functional"} skips this block and does not get marked inconclusive, which reopens a false-clean path.

🔧 Suggested fix

```diff
-            if layer_class in self._MODEL_CONTAINER_CLASSES and "config" in layer:
-                nested_config = layer.get("config")
-                if isinstance(nested_config, dict) and "layers" in nested_config:
-                    self._scan_model_config(layer, result)
-                elif isinstance(nested_config, dict):
+            if layer_class in self._MODEL_CONTAINER_CLASSES:
+                nested_config = layer.get("config")
+                if isinstance(nested_config, dict) and "layers" in nested_config:
+                    self._scan_model_config(layer, result)
+                else:
                     self._mark_inconclusive_scan_result(result, "keras_h5_nested_model_layers_missing")
                     result.add_check(
                         name="Nested Model Layers Presence Validation",
                         passed=False,
                         message=f"Nested model config for {layer_class} is missing required layers list",
                         rule_code="S902",
                         severity=IssueSeverity.INFO,
                         location=self.current_file_path,
                         details={"layer_class": layer_class, "expected_key": "config.layers"},
                     )
```

Based on learnings: "When a scan intentionally fails closed, make behavior operationally explicit and test the message plus success, exit-code, and cache semantics".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelaudit/scanners/keras_h5_scanner.py` around lines 734 - 749, The
nested-container branch only runs when "config" in layer, so layers with a
container class (self._MODEL_CONTAINER_CLASSES) but missing "config" (e.g.
{"class_name":"Functional"}) silently pass; update the logic in the block that
handles layer_class in self._MODEL_CONTAINER_CLASSES to explicitly handle the
case where "config" is absent by calling
self._mark_inconclusive_scan_result(result,
"keras_h5_nested_model_layers_missing") and adding the same
result.add_check(...) entry (use the same name "Nested Model Layers Presence
Validation", passed=False, rule_code="S902", severity=IssueSeverity.INFO,
location=self.current_file_path and details={"layer_class": layer_class,
"expected_key":"config.layers"}) so malformed nested containers are marked
inconclusive rather than scanning clean.
