docs: add repo correctness audit ledger by mldangelo-oai · Pull Request #921 · promptfoo/modelaudit

mldangelo-oai · 2026-04-10T20:39:35Z

Summary

- add a repo-wide correctness audit ledger with explicit proof obligations
- inventory every scanner and cross-cutting layer with current evidence levels
- record current boundary-hardening findings and the next high-risk audit backlog

Validation

uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run pytest -n auto -m "not slow and not integration" --maxfail=1
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
git diff --check

Summary by CodeRabbit

Documentation
- Added comprehensive audit documentation outlining quality standards and verification processes for the codebase.

coderabbitai · 2026-04-10T20:39:44Z

Walkthrough

A new documentation file establishes a repo-wide correctness audit ledger defining proof obligations across routing, parsing, security, and resource usage. It specifies evidence levels (E0–E4), audit scope coverage, scanner inventory tracking, and an iterative audit workflow with a current findings table and high-risk backlog.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Correctness Audit Ledger `docs/agents/repo-correctness-audit.md` | New documentation defining correctness standards with proof obligations, evidence levels, audit scope coverage, scanner inventory, workflow procedures, PR ledger with findings, high-risk items, and notes log. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Hop along, dear code, with standards so clear,
A correctness ledger now holds repo dear,
With proof obligations mapped out with care,
And audit workflows floating through the air,
We'll chase those bugs and fix them all—cheer!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'docs: add repo correctness audit ledger' directly and accurately summarizes the main change: adding a new documentation file that establishes a repo-wide correctness audit ledger with explicit proof obligations and audit tracking. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/agents/repo-correctness-audit.md`:
- Around line 91-137: The docs table in docs/agents/repo-correctness-audit.md
will drift from the canonical scanner registry; update the workflow to derive
this table from the source scanner_registry_metadata.py instead of manual edits:
add a script (e.g., generate_scanner_inventory_doc) that reads SCANNER_REGISTRY
(or the module-level registry/metadata in scanner_registry_metadata.py), emits
the markdown table and a generated-at timestamp + scanner count, and wire that
script into CI (or commit its output) so docs are regenerated automatically;
update the README/table header to note it is autogenerated from
scanner_registry_metadata.py.
- Around line 154-156: Run Prettier to fix the markdown lint errors (MD013
line-length violations and MD018) in this document: execute the recommended
command to install dev deps and reformat the file (npm ci --ignore-scripts &&
npx prettier --write docs/agents/repo-correctness-audit.md), review the
resulting changes around the “Earlier open PRs from the same boundary-hardening
campaign include `#901` and `#907` through `#916`” paragraph to confirm MD018 is
resolved, and commit the formatted file.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0a228a09-7e3f-4bdb-ad26-660bd07834e2

📥 Commits

Reviewing files that changed from the base of the PR and between f285a05 and 1591c1e.

📒 Files selected for processing (1)

docs/agents/repo-correctness-audit.md

coderabbitai · 2026-04-11T04:58:40Z

docs/agents/repo-correctness-audit.md

+| Scanner               | Primary files/formats                                      | Current evidence | Next proof target                                               |
+| --------------------- | ---------------------------------------------------------- | ---------------- | --------------------------------------------------------------- |
+| `pickle`              | `.pkl`, `.pickle`, `.dill`, `.bin`, `.pt`, `.pth`, `.ckpt` | E3               | post-budget and malformed opcode corpus parity                  |
+| `picklescan_adapter`  | standalone picklescan bridge                               | E3               | adapter/cache equivalence for inconclusive reports              |
+| `pytorch_zip`         | ZIP-backed PyTorch checkpoints                             | E3               | ZIP metadata parse boundaries and nested pickle cache semantics |
+| `pytorch_binary`      | raw `.bin` PyTorch-like blobs                              | E1               | bounded binary fallback and benign weight near-matches          |
+| `joblib`              | `.joblib`, compressed/raw pickle wrappers                  | E3               | codec failure semantics and cache preservation                  |
+| `jax_checkpoint`      | JAX/Orbax/checkpoint pickles                               | E1               | index/metadata structure failures and nested pickle routing     |
+| `flax_msgpack`        | `.msgpack`, `.flax`, `.orbax`, `.jax`                      | E1               | msgpack extension types, depth, and partial unpack coverage     |
+| `numpy`               | `.npy`, `.npz`                                             | E3               | object-array pickle failures and `.npz` member routing          |
+| `safetensors`         | `.safetensors`                                             | E3               | malformed header/schema and dtype consistency                   |
+| `keras_h5`            | HDF5 Keras models                                          | E3, PR #917      | cache and aggregate semantics after malformed config fixes      |
+| `keras_zip`           | `.keras` ZIP models                                        | E3, PR #918      | metadata/weights alias ambiguity after malformed config fixes   |
+| `tf_savedmodel`       | SavedModel dirs, `.pb`                                     | E1               | protobuf parse budgets and function library edges               |
+| `tf_metagraph`        | `.meta`                                                    | E1               | protobuf parse budgets and attr truncation semantics            |
+| `tflite`              | `.tflite`, routed `.bin`                                   | E3, PR #916      | flatbuffer table bounds and custom-op recovery                  |
+| `onnx`                | `.onnx`                                                    | E3, PR #915      | external data path policy and dtype coverage                    |
+| `coreml`              | `.mlmodel`                                                 | E3               | protobuf truncation, linked model paths, custom layer strings   |
+| `openvino`            | `.xml` IR                                                  | E3               | XML parse failures, entity/DOCTYPE boundaries, companion `.bin` |
+| `gguf`                | `.gguf`, `.ggml`, related                                  | E3, PR #914      | metadata value type matrix and tensor offset checks             |
+| `xgboost`             | `.bst`, `.model`, `.json`, `.ubj`                          | E1               | JSON/UBJSON malformed root, subprocess isolation                |
+| `lightgbm`            | `.model`, `.txt`, `.lgb`, `.lightgbm`                      | E1               | text parser bounds and native-library indicators                |
+| `catboost`            | `.cbm`                                                     | E3, PR #924      | binary marker bounds and metadata strings                       |
+| `mxnet`               | `*-symbol.json`, `*-NNNN.params`                           | E3, PR #923      | graph reference traversal and metadata payload recovery         |
+| `nemo`                | `.nemo` tar archives                                       | E3, PR #919      | multi-config precedence and malformed member combinations       |
+| `jinja2_template`     | tokenizer configs, YAML, templates, GGUF metadata          | E3, PR #920      | cache preservation and GGUF metadata extraction failures        |
+| `skops`               | `.skops` ZIP archives                                      | E3               | JSON schema variations and duplicate member precedence          |
+| `torchserve_mar`      | `.mar` archives                                            | E3               | manifest schema roots and handler AST edge cases                |
+| `oci_layer`           | OCI `.manifest`                                            | E3               | manifest schema roots, local-vs-remote layer resolution         |
+| `zip`                 | generic ZIP/NPZ/MAR fallback                               | E3               | unsupported member failure semantics and cleanup                |
+| `tar`                 | tar families                                               | E3               | unsupported member failure semantics and cleanup                |
+| `sevenzip`            | `.7z`                                                      | E3               | nested routing parity with ZIP/TAR                              |
+| `compressed`          | `.gz`, `.bz2`, `.xz`, `.lz4`, `.zlib`                      | E3               | wrapper extension inference and temporary cleanup               |
+| `manifest`            | model/config manifests                                     | E3, PR #922      | JSON/YAML/TOML malformed roots and nested scanning              |
+| `metadata`            | model cards/docs/text                                      | E1               | secret/security pattern false positives and truncation          |
+| `text`                | general text docs                                          | E0               | duplicate responsibility with metadata/manifest                 |
+| `pmml`                | `.pmml`                                                    | E3               | XML parse boundaries and extension payload recovery             |
+| `paddle`              | `.pdmodel`, `.pdiparams`                                   | E3, PR #925      | protobuf/op descriptor parse failures                           |
+| `cntk`                | `.dnn`, `.cmf`                                             | E3               | split reference tracking and malformed binary handling          |
+| `rknn`                | `.rknn`                                                    | E1               | marker and string extraction bounds                             |
+| `torch7`              | `.t7`, `.th`, `.net`                                       | E1               | legacy serialization parse failures                             |
+| `r_serialized`        | `.rds`, `.rda`, `.rdata`                                   | E1               | format header variants and string extraction bounds             |
+| `executorch`          | `.ptl`, `.pte`                                             | E1               | archive/table parse failures and nested payloads                |
+| `tensorrt`            | `.engine`, `.plan`, `.trt`                                 | E3               | plugin marker matrix and binary truncation                      |
+| `llamafile`           | `.llamafile`, `.exe`, extensionless                        | E1               | executable header routing and model payload boundaries          |
+| `weight_distribution` | optional secondary analysis                                | E0               | optional dependency isolation and non-security failure behavior |
+


🧹 Nitpick | 🔵 Trivial

Reduce scanner-inventory drift against registry metadata.

This table is high-value but manually curated. It will drift from modelaudit/scanner_registry_metadata.py unless you pin a source snapshot (e.g., scanner count + generated-at note, or scripted generation).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/agents/repo-correctness-audit.md` around lines 91 - 137, The docs table in docs/agents/repo-correctness-audit.md will drift from the canonical scanner registry; update the workflow to derive this table from the source scanner_registry_metadata.py instead of manual edits: add a script (e.g., generate_scanner_inventory_doc) that reads SCANNER_REGISTRY (or the module-level registry/metadata in scanner_registry_metadata.py), emits the markdown table and a generated-at timestamp + scanner count, and wire that script into CI (or commit its output) so docs are regenerated automatically; update the README/table header to note it is autogenerated from scanner_registry_metadata.py.
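The scripted generation suggested above could look roughly like the following sketch. The layout of `modelaudit/scanner_registry_metadata.py` is not shown in this PR, so `EXAMPLE_REGISTRY`, its `formats` and `evidence` keys, and the `render_inventory` helper are all hypothetical stand-ins rather than the project's actual API. The point is only to show a table emitted together with a scanner count and a generated-at note.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the registry: the real structure of
# modelaudit/scanner_registry_metadata.py is not shown in this PR.
EXAMPLE_REGISTRY = {
    "pickle": {"formats": "`.pkl`, `.pickle`", "evidence": "E3"},
    "text": {"formats": "general text docs", "evidence": "E0"},
}


def render_inventory(registry, generated_at=None):
    """Render a markdown scanner table preceded by a count/timestamp note."""
    moment = generated_at or datetime.now(timezone.utc)
    stamp = moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        f"<!-- autogenerated: {len(registry)} scanners, generated at {stamp} -->",
        "| Scanner | Primary files/formats | Current evidence |",
        "| --- | --- | --- |",
    ]
    # Sort by scanner name so regenerated output is stable across runs.
    for name in sorted(registry):
        entry = registry[name]
        lines.append(f"| `{name}` | {entry['formats']} | {entry['evidence']} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_inventory(EXAMPLE_REGISTRY))
```

A CI job could compare this output against the committed table and fail on any difference. That wiring is likewise an assumption, not something this PR implements.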

coderabbitai · 2026-04-11T04:58:40Z

docs/agents/repo-correctness-audit.md

+Earlier open PRs from the same boundary-hardening campaign include #901 and
+#907 through #916. All open PR entries remain provisional until CI and review
+complete; treat them as evidence of audited findings, not landed behavior.


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify markdown lint/format status for this file
fd -i "repo-correctness-audit.md" docs -x sed -n '150,160p' {}
npx markdownlint-cli2 "docs/agents/repo-correctness-audit.md"

Repository: promptfoo/modelaudit

Length of output: 8509

Run prettier to fix markdown formatting violations.

The file has 68 markdown linting errors. Line 155 has the MD018 error you mentioned, but the primary issues are MD013 (line-length) violations across lines 79–87, 91–136, and 142–152. Per the coding guideline, format the file using:

npm ci --ignore-scripts && npx prettier --write docs/agents/repo-correctness-audit.md

This will resolve all linting errors, including the MD018 issue on line 155.

🧰 Tools

🪛 markdownlint-cli2 (0.22.0)

[warning] 155-155: No space after hash on atx style heading

(MD018, no-missing-space-atx)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/agents/repo-correctness-audit.md` around lines 154 - 156, Run Prettier to fix the markdown lint errors (MD013 line-length violations and MD018) in this document: execute the recommended command to install dev deps and reformat the file (npm ci --ignore-scripts && npx prettier --write docs/agents/repo-correctness-audit.md), review the resulting changes around the “Earlier open PRs from the same boundary-hardening campaign include `#901` and `#907` through `#916`” paragraph to confirm MD018 is resolved, and commit the formatted file.
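The MD018 warning presumably fires because the wrapped paragraph has a line that begins with `#907`, which a linter can read as an ATX heading with no space after the hash. The sketch below is a simplified approximation of that check, not markdownlint's actual implementation. The `MD018_LIKE` pattern and the `find_md018_like` helper are illustrative names.

```python
import re

# Simplified approximation of MD018: one to six '#' characters at the start
# of a line, followed directly by a character that is neither '#' nor space.
MD018_LIKE = re.compile(r"^#{1,6}[^#\s]")


def find_md018_like(text):
    """Return 1-based line numbers that look like headings missing a space."""
    hits = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip fenced code blocks, where a leading '#' is usually a comment.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if not in_fence and MD018_LIKE.match(line):
            hits.append(number)
    return hits


SAMPLE = (
    "Earlier open PRs from the same boundary-hardening campaign include #901 and\n"
    "#907 through #916. All open PR entries remain provisional until CI and review\n"
    "complete; treat them as evidence of audited findings, not landed behavior.\n"
)

if __name__ == "__main__":
    print(find_md018_like(SAMPLE))  # [2]
```

Wrapping the issue references in backticks, as the prompt above quotes them, means the line no longer starts with a bare `#`, so this check would not flag it.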

docs: add repo correctness audit ledger

51b47c1

mldangelo-oai enabled auto-merge (squash) April 10, 2026 20:39

mldangelo-oai added 2 commits April 10, 2026 15:45

docs: update correctness audit ledger

1f4244c

docs: format correctness audit ledger

1591c1e

coderabbitai bot reviewed Apr 11, 2026

mldangelo-oai merged commit 06be0b6 into main Apr 11, 2026
6 checks passed

mldangelo-oai deleted the mdangelo/codex/repo-correctness-audit-ledger branch April 11, 2026 05:33

github-actions bot mentioned this pull request Apr 11, 2026

chore(main): release 0.2.35 #929

Merged

mldangelo-oai merged 3 commits into main from mdangelo/codex/repo-correctness-audit-ledger
