docs: add TEST-NEEDS.md and PROOF-NEEDS.md from audit

hyperpolymath · claude · hyperpolymath · commit c4681aeed22e · 2026-03-30T13:23:00.000+01:00
Documents testing and proof gaps identified during batch audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/PROOF-NEEDS.md b/PROOF-NEEDS.md
@@ -0,0 +1,49 @@
+# PROOF-NEEDS.md
+<!-- SPDX-License-Identifier: PMPL-1.0-or-later -->
+
+## Current State
+
+- **LOC**: ~9,400
+- **Languages**: Haskell, Idris2, Zig
+- **Existing ABI proofs**: `src/abi/*.idr` (template-level)
+- **Dangerous patterns**: None detected in Haskell source
+
+## What Needs Proving
+
+### Taint Analysis (src/Sanctify/Analysis/Taint.hs)
+- Tracks tainted data flow through PHP code
+- Prove: taint propagation is sound (no tainted value reaches a sink without sanitization)
+- This is the security-critical core of the tool
+
+### Security Analysis (src/Sanctify/Analysis/Security.hs)
+- Detects security vulnerabilities in PHP
+- Prove: the analysis has no false negatives for the declared vulnerability classes
+
+### Dead Code Analysis (src/Sanctify/Analysis/DeadCode.hs)
+- Prove: reported dead code is genuinely unreachable
+
+### Type Checker (src/Sanctify/Analysis/Types.hs)
+- PHP type inference
+- Prove: type inference is sound with respect to PHP runtime semantics
+
+### Parser Correctness (src/Sanctify/Parser/)
+- `Parser.hs`, `Lexer.hs`, `Token.hs`
+- Prove: the parser accepts valid PHP and rejects invalid PHP (or, at minimum, is conservative: it rejects input it cannot parse rather than guessing)
+
+### Transform Soundness (src/Sanctify/Transform/)
+- `Sanitize.hs`, `Strict.hs`, `StrictTypes.hs`, `TypeHints.hs`
+- Prove: code transformations preserve program semantics
+- Prove: sanitization transforms eliminate the security vulnerabilities they claim to fix
+
+### WordPress-Specific Rules (src/Sanctify/WordPress/)
+- `Constraints.hs`, `Hooks.hs`, `Security.hs`
+- Prove: WordPress hook analysis correctly models WordPress execution order
+
+## Recommended Prover
+
+- **Agda** or **Lean4** for the analysis, parser, and transform proofs; both are dependently typed proof assistants well suited to modelling pure Haskell functions
+- **Idris2** for ABI contracts
+
+## Priority
+
+**HIGH**: this is a security analysis tool. If the taint analysis is unsound, users will trust code that is actually vulnerable. False negatives in a security tool are worse than no tool at all, because they create false confidence.
diff --git a/TEST-NEEDS.md b/TEST-NEEDS.md
@@ -0,0 +1,45 @@
+# TEST-NEEDS: sanctify-php
+
+## Current State
+
+| Category | Count | Details |
+|----------|-------|---------|
+| **Source modules** | 20 | Haskell: AST, Parser (Lexer, Token), Analysis (Advanced, DeadCode, Security, Taint, Types), Transform (Sanitize, Strict, StrictTypes, TypeHints), WordPress (Constraints, Hooks, Security), Config, Emit, Report, Ruleset |
+| **Unit tests** | ~67 | SecuritySpec.hs (~30), TransformSpec.hs (~37) |
+| **Integration tests** | ~69 | Main.hs test harness |
+| **E2E tests** | 0 | No end-to-end tests that run actual PHP files through the full pipeline |
+| **Test fixtures** | 9 | PHP fixture files for SQL injection, XSS, WordPress, dead code, etc. |
+| **Benchmarks** | 0 | None |
+
+## What's Missing
+
+### E2E Tests
+- [ ] No test that runs sanctify-php as a binary on a PHP codebase
+- [ ] No test that validates transformed PHP output is syntactically valid
+
+### Aspect Tests
+- [ ] **Security**: SecuritySpec exists but has only ~30 tests, far too few for a security analysis tool. Needs 200+.
+- [ ] **Performance**: No tests for large PHP codebases (1000+ files)
+- [ ] **Concurrency**: No parallel analysis tests
+- [ ] **Error handling**: No tests for malformed PHP, encoding issues, huge files
+
+### Benchmarks Needed
+- [ ] Parsing throughput (lines/second on real WordPress codebases)
+- [ ] Taint analysis scaling with codebase size
+- [ ] Memory usage on large projects
+
+### Self-Tests
+- [ ] No self-diagnostic mode
+
+## FLAGGED ISSUES
+- **A security analysis tool with ~30 security tests** is embarrassing. This needs close to an order of magnitude more (200+).
+- **Taint analysis module has 0 dedicated tests**: the most critical analysis capability is untested
+- **Dead code detection has 0 dedicated tests** (only fixture files exist)
+
+## Priority: P1 (HIGH)
+
+## FAKE-FUZZ ALERT
+
+- `tests/fuzz/placeholder.txt` is a scorecard placeholder inherited from rsr-template-repo; it does NOT provide real fuzz testing
+- Replace with an actual fuzz harness (see rsr-template-repo/tests/fuzz/README.adoc) or remove the file
+- Priority: P2 (the placeholder creates a false impression of fuzz coverage)