57 commits
6c17021
test: Add no-SCIP review test, markdown entropy threshold
SimplyLiz Mar 22, 2026
f5e3535
fix: Address 3 issues from windup PR analysis
SimplyLiz Mar 22, 2026
68e3131
Merge pull request #170 from SimplyLiz/fix/windup-salvage
SimplyLiz Mar 22, 2026
434a37c
feat: Detect languages in subdirectories, support monorepo indexing
SimplyLiz Mar 22, 2026
a4c3468
test: Add monorepo language detection regression tests
SimplyLiz Mar 22, 2026
4d28b07
feat: Detect languages in subdirectories, support monorepo indexing
SimplyLiz Mar 22, 2026
29edba8
test: Add monorepo language detection regression tests
SimplyLiz Mar 22, 2026
54c5107
Merge pull request #172 from SimplyLiz/feature/monorepo-detection
SimplyLiz Mar 23, 2026
d505dd5
feat: Address 5 items from external technical review
SimplyLiz Mar 23, 2026
5eb84eb
Merge pull request #173 from SimplyLiz/fix/review-feedback
SimplyLiz Mar 23, 2026
16fad6f
Fix false positives in review checks (#174)
SimplyLiz Mar 24, 2026
c4261c8
fix: generated file detection, check summary reconciliation, glob mat…
SimplyLiz Mar 24, 2026
139e9a9
feat: Add regulatory compliance audit (GDPR, EU AI Act, ISO 27001, IS…
SimplyLiz Mar 24, 2026
22647b0
feat: token-optimized review skill with early exit and targeted reads
SimplyLiz Mar 24, 2026
79fb890
feat: Expand compliance audit to 20 frameworks with cross-framework m…
SimplyLiz Mar 24, 2026
df92ecf
fix: tighten review skill early-exit criteria and add blind spots sec…
SimplyLiz Mar 24, 2026
ec2a538
docs: Add compliance audit documentation and CI/CD examples
SimplyLiz Mar 24, 2026
010ee7f
feat: add /ckb-audit skill for token-optimized compliance auditing
SimplyLiz Mar 24, 2026
d363614
fix: Reduce false positives in review bug-patterns (29 → 2)
SimplyLiz Mar 24, 2026
32031e0
fix: file handle leaks, concurrency limit, err shadow, crossmap dedup
SimplyLiz Mar 25, 2026
c319ef8
fix: improve /ckb-review skill from dogfood findings
SimplyLiz Mar 25, 2026
317b888
fix: reduce compliance audit false positives by 53%
SimplyLiz Mar 25, 2026
0cb8ae1
fix: sync review skill with CKB output fields and correct check count
SimplyLiz Mar 25, 2026
4d525c9
fix: sync both skills with correct framework IDs and enriched output …
SimplyLiz Mar 25, 2026
cc00eda
fix: eliminate self-detection FPs and improve SQL injection heuristic
SimplyLiz Mar 25, 2026
6a0bb00
fix: round 4 — SQL injection regex precision and weak-crypto test skips
SimplyLiz Mar 25, 2026
452e63f
fix: round 5 — eliminate all remaining FPs, achieve 100/100 on self-a…
SimplyLiz Mar 25, 2026
cd8595f
fix: dogfood review — eliminate FPs across bug-patterns, secrets, cou…
SimplyLiz Mar 25, 2026
5323df2
fix: correct 4 wrong OWASP ASVS article references
SimplyLiz Mar 25, 2026
79effd5
docs: regenerate checks.md from source code with correct check IDs
SimplyLiz Mar 25, 2026
e3938a2
feat: add 5 new OWASP ASVS checks (8 → 13 total)
SimplyLiz Mar 25, 2026
72e9bde
docs: update check count to 131 and add 5 new OWASP ASVS checks
SimplyLiz Mar 25, 2026
ab7356d
docs: add compliance audit to CLAUDE.md
SimplyLiz Mar 25, 2026
0d782fe
feat: add 5 review-relevant tools to review preset (28 → 33 tools)
SimplyLiz Mar 25, 2026
7ddd201
feat: add 22 orphaned tools to presets, zero tools uncovered
SimplyLiz Mar 25, 2026
1a5f195
feat: surface startLine, endLine, lines in searchSymbols and explore …
SimplyLiz Mar 25, 2026
4de2d99
feat: auditCompliance MCP tool, per-symbol complexity, preset and sco…
SimplyLiz Mar 25, 2026
f00fedf
fix: improve MCP tool descriptions for auditCompliance and reviewPR
SimplyLiz Mar 25, 2026
f9b2ef8
fix: explore keySymbols now returns functions with complexity, not ju…
SimplyLiz Mar 26, 2026
b8a70fc
fix: eliminate 9 bug-pattern FPs found by dogfood review
SimplyLiz Mar 26, 2026
e632355
fix: zero bug-pattern findings — fix all 33 remaining from dogfood re…
SimplyLiz Mar 26, 2026
e0def08
fix: compliance audit crash — tree-sitter thread safety in IEC 61508 …
SimplyLiz Mar 26, 2026
4d88882
fix: reduce compliance audit noise by 92% (11,356 → 886 findings)
SimplyLiz Mar 26, 2026
5f08098
feat: --recommend flag for compliance audit + SQL injection precision
SimplyLiz Mar 26, 2026
6c36025
fix: insecure-random crypto/rand FP, eval-injection .github skip, SQL…
SimplyLiz Mar 26, 2026
5732242
fix: compliance audit score 48→70 — eliminate FPs across 10 check cat…
SimplyLiz Mar 26, 2026
1ac12e0
fix: add 'when to use' hints to 15 decision-critical MCP tool descrip…
SimplyLiz Mar 26, 2026
024f344
fix: compliance audit score 70→90 — address all remaining findings
SimplyLiz Mar 26, 2026
a23d2a4
fix: compliance audit score 90→95 — resolve TODOs, SBOM, provenance
SimplyLiz Mar 26, 2026
4966b7e
feat: implement all daemon stubs and query engine stubs — score 48→97
SimplyLiz Mar 26, 2026
49dfa7a
feat: listSymbols, getSymbolGraph, searchSymbols complexity, index wa…
SimplyLiz Mar 26, 2026
d31f427
feat: searchSymbols server-side filtering + batchGet reference counts
SimplyLiz Mar 26, 2026
3cbc76e
fix: listSymbols/searchSymbols returning 0 on MCP — two root causes
SimplyLiz Mar 26, 2026
90179f0
fix: listSymbols excludes struct fields (#) by default, filters anony…
SimplyLiz Mar 26, 2026
c55538b
fix: class body ranges in listSymbols + complexity in getSymbolGraph
SimplyLiz Mar 26, 2026
00c8f54
fix: coupling check FP for Flutter l10n files (fixes #185)
SimplyLiz Mar 27, 2026
7b66bab
feat: comprehensive generated file detection across ecosystems
SimplyLiz Mar 27, 2026
117 changes: 117 additions & 0 deletions .claude/commands/audit.md
@@ -0,0 +1,117 @@
Run a CKB-augmented compliance audit optimized for minimal token usage.

## Input
$ARGUMENTS - Optional: framework(s) to audit (default: auto-detect from repo context). Examples: "gdpr", "gdpr,pci-dss,hipaa", "all"

## Philosophy

CKB already ran deterministic checks across 20 regulatory frameworks, mapped every finding
to a specific regulation article, and assigned confidence scores. The LLM's job is ONLY what
CKB can't do: assess whether findings are real compliance risks or false positives given the
repo's actual purpose, and prioritize remediation by business impact.

### Available frameworks (20 total)

**Privacy:** gdpr, ccpa, iso27701
**AI:** eu-ai-act
**Security:** iso27001, nist-800-53, owasp-asvs, soc2, hipaa
**Industry:** pci-dss, dora, nis2, fda-21cfr11, eu-cra
**Supply chain:** sbom-slsa
**Safety:** iec61508, iso26262, do-178c
**Coding:** misra, iec62443

### CKB's blind spots (what the LLM must catch)

CKB maps code patterns to regulation articles using AST + regex + tree-sitter. It is
structurally correct but contextually blind:

- **Business context**: CKB flags PII patterns in a healthcare app and a game engine equally
- **Architecture awareness**: a finding in dead/test code vs production code has different weight
- **Compensating controls**: CKB can't see infrastructure-level encryption, WAFs, or IAM policies
- **Regulatory applicability**: CKB flags HIPAA in a repo that doesn't handle PHI
- **Risk prioritization**: 50 findings need ordering by actual business/legal exposure
- **Cross-reference noise**: the same hardcoded credential maps to 6 frameworks — that's 1 fix, not 6

## Phase 1: Structural scan (~2k tokens into context)

```bash
ckb audit compliance --framework=$ARGUMENTS --format=json --min-confidence=0.7 2>/dev/null
```

For large repos, scope to a specific path to reduce noise:
```bash
ckb audit compliance --framework=$ARGUMENTS --scope=src/api --format=json --min-confidence=0.7 2>/dev/null
```

If no framework specified, pick based on repo context:
- Has health/patient/medical code → `hipaa,gdpr`
- Has payment/billing/card code → `pci-dss,soc2`
- EU company or processes EU data → `gdpr,dora,nis2`
- AI/ML code → `eu-ai-act`
- Safety-critical/embedded → `iec61508,iso26262,misra`
- General SaaS → `iso27001,soc2,owasp-asvs`
- If unsure → `iso27001,owasp-asvs` (broadest applicability)
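The branching above amounts to a small first-match lookup. A sketch of that heuristic follows; the keyword sets and the `pick_frameworks` helper are illustrative assumptions, not CKB's actual detection logic:

```python
# Hypothetical framework auto-detection: first rule whose keywords appear
# in the repo context wins; fall back to the broadest frameworks.
DETECTION_RULES = [
    ({"health", "patient", "medical"}, "hipaa,gdpr"),
    ({"payment", "billing", "card"}, "pci-dss,soc2"),
    ({"model", "inference", "training"}, "eu-ai-act"),
    ({"rtos", "embedded", "safety"}, "iec61508,iso26262,misra"),
]
FALLBACK = "iso27001,owasp-asvs"  # broadest applicability

def pick_frameworks(repo_terms: set[str]) -> str:
    """Return the first framework set whose keywords overlap the repo terms."""
    for keywords, frameworks in DETECTION_RULES:
        if keywords & repo_terms:
            return frameworks
    return FALLBACK

print(pick_frameworks({"payment", "api"}))  # pci-dss,soc2
print(pick_frameworks({"cli", "parser"}))   # iso27001,owasp-asvs
```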

From the JSON output, extract:
- `score`, `verdict` (pass/warn/fail)
- `coverage[]` — per-framework scores with passed/warned/failed/skipped check counts
- `findings[]` — with check, severity, file, startLine, message, suggestion, confidence, CWE
- `checks[]` — per-check status and summary
- `summary` — total findings by severity, files scanned
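A minimal parse of that shape might look like this. The sample JSON below is fabricated to match the field names listed above; it is not real CKB output:

```python
import json

# Fabricated sample matching the documented fields (score, verdict, findings[]).
raw = '''{
  "score": 72, "verdict": "warn",
  "findings": [
    {"check": "hardcoded-secret", "severity": "error", "file": "config.go",
     "startLine": 12, "confidence": 0.95, "message": "credential in source"},
    {"check": "sql-injection", "severity": "warning", "file": "db.go",
     "startLine": 40, "confidence": 0.71, "message": "string-built query"}
  ]
}'''

report = json.loads(raw)
# Errors are the priority; warnings and info can wait (see Phase 2).
errors = [f for f in report["findings"] if f["severity"] == "error"]
for f in errors:
    print(f'{f["file"]}:{f["startLine"]} [{f["check"]}] conf={f["confidence"]}')
# config.go:12 [hardcoded-secret] conf=0.95
```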

Note:
- **Per-framework scores**: which frameworks are clean vs problematic
- **Finding count by severity**: errors are your priority
- **CWE references**: cross-reference with known vulnerability databases
- **Confidence scores**: anything below the 0.7 cutoff was already filtered out by `--min-confidence`; treat findings just above the cutoff with extra skepticism

**Early exit**: If verdict=pass and all framework scores ≥ 90, write a one-line summary and stop.

## Phase 2: Triage findings (targeted reads only)

Do NOT read every flagged file. Group findings by root cause first:

1. **Deduplicate cross-framework findings** — a hardcoded secret flagged by GDPR, PCI DSS, HIPAA, and ISO 27001 is one fix
2. **Check for dominant category** — if > 50% of findings are one category (e.g., "sql-injection"), investigate that category systemically (is the pattern matching too broad?) rather than checking each file individually
3. **Check applicability** — does this repo actually fall under the flagged framework? (e.g., HIPAA findings in a non-healthcare repo)
4. **Read only error-severity files** — warnings and info can wait
5. **For each error finding**, read just the flagged lines (not the whole file) and assess:
- Is this a real compliance risk or a pattern false positive?
- Are there compensating controls elsewhere? (check imports, config, middleware)
- What's the remediation effort: one-liner fix vs architectural change?
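Step 1 (cross-framework deduplication) reduces to grouping findings by code location. A sketch with fabricated findings:

```python
from collections import defaultdict

# Hypothetical findings: the same hardcoded secret flagged by four frameworks.
findings = [
    {"framework": "gdpr",       "file": "config.go", "startLine": 12},
    {"framework": "pci-dss",    "file": "config.go", "startLine": 12},
    {"framework": "hipaa",      "file": "config.go", "startLine": 12},
    {"framework": "iso27001",   "file": "config.go", "startLine": 12},
    {"framework": "owasp-asvs", "file": "db.go",     "startLine": 40},
]

# One root cause per code location, regardless of how many frameworks flag it.
root_causes = defaultdict(list)
for f in findings:
    root_causes[(f["file"], f["startLine"])].append(f["framework"])

print(f"{len(findings)} findings -> {len(root_causes)} root causes")
# 5 findings -> 2 root causes
```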

## Phase 3: Write the audit summary (be terse)

```markdown
## [COMPLIANT|NEEDS REMEDIATION|NON-COMPLIANT] — CKB score: [N]/100

[One sentence: what frameworks were audited and overall posture]

### Critical findings (must remediate)
1. **[framework]** `file:line` Art. [X] — [issue + remediation in one sentence]
2. ...

### Not applicable (false positives from context)
[List findings CKB flagged but that don't apply to this repo, with one-line reason]

### Cross-framework deduplication
[N findings deduplicated to M root causes]

### Framework scores
| Framework | Score | Status | Checks |
|-----------|-------|--------|--------|
| [name] | [N] | [pass/warn/fail] | [passed]/[total] |
```

If fully compliant: just the header + framework scores. Nothing else.

## Anti-patterns (token waste)

- Reading every flagged file → waste (group by root cause, read only errors)
- Treating cross-framework duplicates as separate issues → waste (1 code fix = 1 issue)
- Explaining what each regulation requires → waste (CKB already mapped articles)
- Re-checking frameworks CKB scored at 100 → waste
- Auditing frameworks that don't apply to this repo → waste
- Reading sub-0.7-confidence findings (only present if you lowered `--min-confidence`) → waste (likely false positives)
- Suggesting infrastructure controls for code-level findings → out of scope
- Using wrong framework IDs (use pci-dss not pcidss, owasp-asvs not owaspasvs) → CKB error
186 changes: 113 additions & 73 deletions .claude/commands/review.md
@@ -1,98 +1,138 @@
Run a CKB-augmented code review optimized for minimal token usage.

## Input
$ARGUMENTS - Optional: base branch (default: main), or "staged" for staged changes, or a PR number

## Philosophy

CKB already answered the structural questions (secrets? breaking? dead code? test gaps?).
The LLM's job is ONLY what CKB can't do: semantic reasoning about correctness, design,
and intent. Every source line you read costs tokens — read only what CKB says is risky.

### CKB's blind spots (what the LLM must catch)

CKB runs 15 deterministic checks with AST rules, the SCIP index, and git history.
It is structurally sound but semantically blind:

- **Logic errors**: wrong conditions (`>` vs `>=`), off-by-one, incorrect algorithm
- **Business logic**: domain-specific mistakes CKB has no context for
- **Design fitness**: wrong abstraction, leaky interface, coupling that metrics miss
- **Input validation**: missing bounds checks, nil guards outside AST patterns
- **Race conditions**: concurrency issues, mutex ordering, shared state
- **Resource leaks**: file handles, goroutines, connections not closed on all paths
- **Incomplete refactoring**: callers missed across module boundaries
- **Domain edge cases**: error paths, boundary conditions tests don't cover

CKB's scoring uses per-check caps (max -20) and per-rule caps (max -10), so a score
of 85 can still hide multiple capped warnings. HoldTheLine only flags changed lines,
so pre-existing issues interacting with new code won't surface.
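The effect of those caps can be shown with a toy model. The cap values come from the paragraph above, but the formula itself is an assumption, not CKB's actual scoring code:

```python
# Toy model: per-rule penalties capped at 10, per-check totals capped at 20.
def capped_score(penalties_by_check: dict[str, list[int]]) -> int:
    total = 0
    for rule_penalties in penalties_by_check.values():
        check_total = sum(min(p, 10) for p in rule_penalties)  # per-rule cap
        total += min(check_total, 20)                          # per-check cap
    return max(0, 100 - total)

# One severe finding (raw penalty 25) plus a minor one: the capped score
# reads 85 even though the uncapped penalties would give 70.
penalties = {"bug-patterns": [25], "complexity": [5]}
print(capped_score(penalties))                        # 85
print(100 - sum(sum(v) for v in penalties.values()))  # 70
```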

## Phase 1: Structural scan (~1k tokens into context)

```bash
ckb review --base=main --format=json 2>/dev/null
```

If a PR number was given:
```bash
BASE=$(gh pr view $ARGUMENTS --json baseRefName -q .baseRefName)
ckb review --base=$BASE --format=json 2>/dev/null
```

If "staged" was given:
```bash
ckb review --staged --format=json 2>/dev/null
```

Parse the JSON output to extract:
- `score`, `verdict` — overall quality
- `checks[]` — status + summary per check (15 checks: breaking, secrets, tests, complexity,
coupling, hotspots, risk, health, dead-code, test-gaps, blast-radius, comment-drift,
format-consistency, bug-patterns, split)
- `findings[]` — severity + file + message + ruleId (top-level, separate from check details)
- `narrative` — CKB AI-generated summary (if available)
- `prTier` — small/medium/large
- `reviewEffort` — estimated hours + complexity
- `reviewers[]` — suggested reviewers with expertise areas
- `healthReport` — degraded/improved file counts

From checks, build three lists:
- **SKIP**: passed checks — don't touch these files or topics
- **INVESTIGATE**: warned/failed checks — these are your review scope
- **READ**: files with warn/fail findings — the only files you'll read

**Early exit**: skip the semantic review ONLY when ALL conditions are met:
1. Score ≥ 90 (not 80 — per-check caps hide warnings at 80)
2. Zero warn/fail checks
3. Small change (< 100 lines of diff)
4. No new files (CKB has no SCIP history for them)

If ANY condition fails, proceed to Phase 2 — CKB's structural pass does NOT mean
the code is semantically correct.
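The four-condition gate can be written as a single predicate. The field names here are assumptions based on the Phase 1 output, plus a diff line count and new-file list you would compute from git:

```python
# Sketch of the early-exit gate above; checks use the status field from
# the review JSON, diff_lines and new_files come from `git diff --stat`.
def can_skip_semantic_review(score: int, checks: list[dict],
                             diff_lines: int, new_files: list[str]) -> bool:
    all_pass = all(c["status"] == "pass" for c in checks)
    return score >= 90 and all_pass and diff_lines < 100 and not new_files

checks = [{"name": "secrets", "status": "pass"},
          {"name": "tests", "status": "warn"}]
print(can_skip_semantic_review(92, checks, 40, []))  # False: a check warned
```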

## Phase 2: Targeted source reading (the only token-expensive step)

Do NOT read the full diff. Do NOT read every changed file.

**For files CKB flagged (INVESTIGATE list):**
Read only the changed hunks via `git diff main...HEAD -- <file>`.

**For new files** (CKB has no history — these are your biggest blind spot):
- If it's a new package/module: read the entry point and types/interfaces first,
then follow references to understand the architecture before reading individual files
- If < 500 lines: read the file
- If > 500 lines: read the first 100 lines (types/imports) + functions CKB flagged
- Skip generated files, test files for existing tests, and config/CI/docs files
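Those rules amount to a small decision function. The 500-line threshold mirrors the bullets above; the suffix list for generated/config files is illustrative only:

```python
# Hypothetical read-strategy selector for new files in the diff.
def read_strategy(path: str, line_count: int, flagged: bool) -> str:
    skip_suffixes = (".pb.go", "_generated.go", ".lock", ".md", ".yml")
    if path.endswith(skip_suffixes):
        return "skip"                    # generated/config/docs: don't read
    if line_count < 500:
        return "read-whole-file"
    # Large file: read head (types/imports) plus any CKB-flagged functions.
    return "read-head-plus-flagged" if flagged else "read-head"

print(read_strategy("api/server.go", 1200, True))   # read-head-plus-flagged
print(read_strategy("gen/api.pb.go", 3000, False))  # skip
```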

**For each file you read, look for exactly:**
- Logic errors (wrong condition, off-by-one, nil deref, race condition)
- Resource leaks (file handles, connections, goroutines not closed on error paths)
- Security issues (injection, auth bypass, secrets CKB's patterns missed)
- Design problems (wrong abstraction, leaky interface, coupling metrics don't catch)
- Missing edge cases the tests don't cover
- Incomplete refactoring (callers that should have changed but didn't)

Do NOT look for: style, naming, formatting, documentation, test coverage —
CKB already checked these structurally.

## Phase 3: Write the review (be terse)

```markdown
## [APPROVE|REQUEST CHANGES|DISCUSS] — CKB score: [N]/100

[One sentence: what the PR does]

[If CKB provided narrative, include it here]

**PR tier:** [small/medium/large] | **Review effort:** [N]h ([complexity])
**Health:** [N] degraded, [N] improved

### Issues
1. **[must-fix|should-fix]** `file:line` — [issue in one sentence]
2. ...

### CKB passed (no review needed)
[comma-separated list of passed checks]

### CKB flagged (verified above)
[for each warn/fail finding: confirmed/false-positive + one-line reason]

### Suggested reviewers
[reviewer — expertise area]
```

If no issues found: just the header line + CKB passed list. Nothing else.

## Anti-patterns (token waste)

- Reading files CKB marked as pass → waste
- Reading generated files → waste
- Summarizing what the PR does in detail → waste (git log exists, CKB has narrative)
- Explaining why passed checks passed → waste
- Running MCP drill-down tools when CLI already gave enough signal → waste
- Reading test files to "verify test quality" → waste unless CKB flagged test-gaps
- Reading hotspot-only files with no findings → high churn ≠ needs review right now
- Trusting score >= 80 as "safe to skip" → dangerous (per-check caps hide warnings)
- Skipping new files because CKB didn't flag them → CKB has no SCIP data for new files
- Reading every new file in a large new package → read entry point + types first, then follow refs
- Ignoring reviewEffort/prTier → these tell you how thorough to be