feat: add static analysis pipeline with Semgrep and Qlty integration #18

Open. amar-zhuri wants to merge 24 commits into main from feat/static-analysis

@amar-zhuri (Collaborator)

## What this does

Runs Semgrep and Qlty automatically on the files changed in a commit during evaluation. Semgrep catches security issues; Qlty catches code quality problems and code smells. Findings are fed into agent prompts so agents can cite real static analysis data in their evaluations.

## How it works

  1. Git diff gives us the changed files
  2. Files get filtered — we skip anything irrelevant (165+ patterns: lock files, binaries, generated code, vendor dirs, docs, etc.)
  3. Semgrep runs in parallel with a serialized Qlty pipeline on the remaining files
  4. Findings are filtered to only the lines that actually changed in the commit
  5. Results are deduplicated, sorted by severity, and injected into agent prompts
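
Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Finding` shape, the function names, and the dedup key (file, line, rule) are all assumptions made for the example.

```typescript
// Hypothetical finding shape; the PR's unified finding type is not shown in this description.
interface Finding {
  file: string;
  line: number;
  ruleId: string;
  severity: "error" | "warning" | "info";
}

// Collect the new-side line numbers of added lines from one file's unified diff.
function changedLines(fileDiff: string): Set<number> {
  const changed = new Set<number>();
  const hunkHeader = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/;
  let newLine = 0;
  let inHunk = false;
  for (const line of fileDiff.split("\n")) {
    if (line.startsWith("diff --git ")) {
      inHunk = false; // file headers such as "+++ b/x" are not added lines
      continue;
    }
    const header = hunkHeader.exec(line);
    if (header) {
      newLine = Number(header[1]);
      inHunk = true;
      continue;
    }
    if (!inHunk) continue;
    if (line.startsWith("+")) {
      changed.add(newLine);
      newLine += 1;
    } else if (line.startsWith(" ")) {
      newLine += 1; // context line: advances the new side but is not a change
    }
    // "-" lines exist only on the old side, so the counter stays put
  }
  return changed;
}

const SEVERITY_RANK = { error: 0, warning: 1, info: 2 };

// Keep findings on changed lines, drop exact repeats, order most severe first.
function scopeFindings(findings: Finding[], changed: Set<number>): Finding[] {
  const seen = new Set<string>();
  return findings
    .filter((f) => changed.has(f.line))
    .filter((f) => {
      const key = `${f.file}:${f.line}:${f.ruleId}`; // assumed dedup key
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    })
    .sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}
```

Tracking only `+` lines (rather than whole hunk ranges) keeps findings on unchanged context lines out of the prompt.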

## Key decisions

  • Tool installation happens at config --init time, not during evaluation, so there are no surprise downloads mid-run
  • User exclusion patterns are additive: adding "excludedPaths": ["legacy/**"] to the config extends the defaults instead of wiping them
  • Graceful degradation: if Semgrep or Qlty isn't installed, evaluation continues without it and logs a warning
  • Config file stays clean: config --init no longer writes 165 exclusion patterns into .codewave.config.json; defaults live in code
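
The additive merge can be pictured like this. It is a sketch under assumptions: `DEFAULT_EXCLUDED_PATHS` holds three made-up patterns standing in for the real defaults, and `resolveExcludedPaths` is a hypothetical name, not the config loader's actual function.

```typescript
// Hypothetical stand-ins for the defaults that live in code.
const DEFAULT_EXCLUDED_PATHS: string[] = [
  "**/node_modules/**",
  "**/*.lock",
  "**/dist/**",
];

interface UserConfig {
  excludedPaths?: string[];
}

function resolveExcludedPaths(user: UserConfig): string[] {
  // Defaults come first, user patterns are appended, and the Set drops
  // exact duplicates while preserving first-seen order.
  const merged = DEFAULT_EXCLUDED_PATHS.concat(user.excludedPaths ?? []);
  return Array.from(new Set(merged));
}
```

With this shape, a config containing only `"excludedPaths": ["legacy/**"]` yields the defaults plus `legacy/**`, and an empty config yields the defaults unchanged.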

## Tests

137 tests across 13 test files covering runners, parsers, scope resolution, config merging, tool installation, and service orchestration.

Includes:
- MCP server core implementation
- Documentation updates
- Configuration changes
- pass commitDiff into static-analysis flow and filter findings by changed line ranges
- keep semgrep and qlty execution hybrid (semgrep parallel with serialized qlty pipeline)
- add broad default excludes for docs and multi-ecosystem lockfiles
- improve raw artifact readability by storing parsed stdout/stderr structures
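
One way to read the hybrid execution bullet above: the Qlty steps run one after another, while Semgrep runs concurrently with that whole sequence. The sketch below assumes a hypothetical `Runner` type and `runHybrid` helper; the real runners in this PR are not shown here.

```typescript
// Hypothetical runner signature: resolves to a list of finding messages.
type Runner = () => Promise<string[]>;

async function runHybrid(semgrep: Runner, qltySteps: Runner[]): Promise<string[]> {
  // Qlty steps are awaited one at a time, so they never overlap each other.
  const qltyPipeline = async (): Promise<string[]> => {
    const collected: string[] = [];
    for (const step of qltySteps) {
      collected.push(...(await step()));
    }
    return collected;
  };
  // Semgrep starts immediately and runs alongside the serialized Qlty pipeline.
  const [semgrepFindings, qltyFindings] = await Promise.all([semgrep(), qltyPipeline()]);
  return [...semgrepFindings, ...qltyFindings];
}
```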
…non-null assertions

- Extract requiresQltyInit() and qltyInitCompleted() into qlty-init-helper.ts,
  eliminating duplicated code from qlty-runner and qlty-smells-runner
- Remove ! non-null assertions in tool-runner-registry by capturing
  executable into a local const before the runnable boolean check
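
For illustration, the local-const pattern looks roughly like this. `ToolEntry` and `versionCommand` are hypothetical names; the actual tool-runner-registry types are not part of this description.

```typescript
// Hypothetical registry entry shape.
interface ToolEntry {
  name: string;
  executable?: string; // undefined when the tool was not found
}

function versionCommand(entry: ToolEntry, enabled: boolean): string[] | null {
  // Copy the optional property into a const first. A local const keeps its
  // narrowed type, so after this check it is a plain string and no
  // `entry.executable!` assertion is needed.
  const executable = entry.executable;
  if (!enabled || executable === undefined) return null;
  return [executable, "--version"];
}
```

TypeScript 4.4 and later can also narrow through a `const` boolean alias of such a check, which is closer to the "runnable boolean" wording in the commit message.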
Semgrep was timing out at 60s when running in parallel with qlty tools.
Doubling the timeout to 120s gives semgrep enough headroom under concurrent load.
- extractFilesFromDiff now captures the b/ (new/renamed) path instead of
  a/, so renamed files are no longer silently dropped as missing on disk
- Add a quoted-path branch for git C-quoted filenames (spaces, non-ASCII)
- Add unquoteGitPath to decode octal byte sequences and simple escapes
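
A simplified picture of the b/ path capture, assuming unquoted headers whose paths contain no spaces. `extractNewPaths` is a hypothetical stand-in for extractFilesFromDiff, and the quoted-path branch is deliberately left out.

```typescript
// Return the new-side (b/) path for every "diff --git" header in a diff.
// Sketch only: quoted headers and paths with spaces are not handled here.
function extractNewPaths(diff: string): string[] {
  const header = /^diff --git a\/(.+) b\/(.+)$/;
  const paths: string[] = [];
  for (const line of diff.split("\n")) {
    const match = header.exec(line);
    if (match) {
      paths.push(match[2]); // group 2 is the b/ side, i.e. the name after a rename
    }
  }
  return paths;
}
```

For a rename from `src/old.ts` to `src/new.ts`, this returns `src/new.ts`, the path that exists on disk after the commit.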
…rruption

- Parse quoted diff --git headers: diff --git "a/..." "b/..."
- Move unquoting before backslash normalization in normalizeDiffPath so
  octal escape sequences (\303\251) are not corrupted by the \ -> / pass
- Add unquoteGitPath helper for octal byte sequences and simple escapes
- Add test covering non-ASCII filename (é) round-trip through diff parsing
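
The unquoting step might look something like the sketch below. It is an illustration, not the PR's unquoteGitPath: the name `unquoteGitPathSketch`, the escape table, and the use of `decodeURIComponent` for UTF-8 decoding are choices made for this example.

```typescript
// Single-character escapes used by git's C-style quoting.
const SIMPLE_ESCAPES: Record<string, string> = {
  a: "\x07", b: "\b", t: "\t", n: "\n", v: "\v", f: "\f", r: "\r",
  '"': '"', "\\": "\\",
};

// Decode a run of UTF-8 bytes by percent-encoding them first.
// Throws URIError if the bytes are not valid UTF-8.
function decodeUtf8(bytes: number[]): string {
  const encoded = bytes.map((b) => "%" + (b < 16 ? "0" : "") + b.toString(16)).join("");
  return decodeURIComponent(encoded);
}

function unquoteGitPathSketch(quoted: string): string {
  const isQuoted = quoted.length >= 2 && quoted[0] === '"' && quoted[quoted.length - 1] === '"';
  if (!isQuoted) return quoted; // plain paths pass through unchanged
  const inner = quoted.slice(1, -1);
  let out = "";
  let pending: number[] = []; // octal bytes waiting to be decoded together
  const flush = (): void => {
    if (pending.length > 0) {
      out += decodeUtf8(pending);
      pending = [];
    }
  };
  let i = 0;
  while (i < inner.length) {
    const ch = inner[i];
    if (ch !== "\\") {
      flush();
      out += ch;
      i += 1;
      continue;
    }
    const octal = /^[0-7]{3}/.exec(inner.slice(i + 1, i + 4));
    if (octal) {
      pending.push(parseInt(octal[0], 8)); // e.g. \303 is byte 0xC3
      i += 4;
      continue;
    }
    flush();
    const simple = SIMPLE_ESCAPES[inner.charAt(i + 1)];
    if (simple !== undefined) {
      out += simple;
      i += 2;
    } else {
      out += ch; // unknown escape: keep the backslash as-is
      i += 1;
    }
  }
  flush();
  return out;
}
```

The ordering point from the commit message follows directly: if `\` were rewritten to `/` before this function ran, `\303\251` would become `/303/251` and the octal bytes could no longer be recovered.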
…n transient failures

- Split cache into per-tool maps (cachedSemgrepByMode, cachedQltyByMode)
- Only cache a tool when it is successfully available, so a transient
  failure (network down, install timeout) does not lock it out for the
  process lifetime while the other tool remains available
- Update cache test to toStrictEqual since the cache now reconstructs
  the ToolAvailability object rather than returning the same reference
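
A rough sketch of the cache-on-success idea. The map names come from the commit message, but everything else is assumed: the synchronous `Probe` type, the `createAvailabilityResolver` factory, and the two-field `ToolAvailability` shape.

```typescript
// Assumed shape; the real ToolAvailability type may carry more fields.
interface ToolAvailability {
  semgrep: boolean;
  qlty: boolean;
}

// Hypothetical probe: returns true when the tool is usable right now.
type Probe = () => boolean;

function createAvailabilityResolver(probeSemgrep: Probe, probeQlty: Probe) {
  // One map per tool, keyed by mode.
  const cachedSemgrepByMode = new Map<string, true>();
  const cachedQltyByMode = new Map<string, true>();

  const check = (cache: Map<string, true>, mode: string, probe: Probe): boolean => {
    if (cache.has(mode)) return true;
    const ok = probe();
    if (ok) cache.set(mode, true); // failures are never cached, so the next call retries
    return ok;
  };

  // A fresh object is built on every call, which is why an equality test
  // would use toStrictEqual rather than reference identity.
  return (mode: string): ToolAvailability => ({
    semgrep: check(cachedSemgrepByMode, mode, probeSemgrep),
    qlty: check(cachedQltyByMode, mode, probeQlty),
  });
}
```

If Qlty fails once because the network is down, Semgrep's success is still cached and Qlty is simply probed again on the next call.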
Add finding-formatter module that routes static analysis results to
agents based on category expertise. Each agent receives a filtered,
formatted view of findings relevant to their role:

- Category routing: security→architect, quality/style/bug→reviewer, etc.
- Primary agents see all severities; secondary agents see error+warning only
- Round 1 gets full findings; Round 2+ gets condensed error-only reference
- Safety cap (200 findings) prevents prompt bloat in pathological cases

Wire summary through LangGraph state → AgentContext → agent prompts.
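
The routing rules above could be sketched as below. Only the two routes named in the commit message are included; the type names, `findingsForAgent`, and the assumption that every non-primary agent counts as secondary and that Round 2+ is error-only for all agents are mine, not confirmed by the PR.

```typescript
type Severity = "error" | "warning" | "info";
type Category = "security" | "quality" | "style" | "bug";

interface RoutedFinding {
  category: Category;
  severity: Severity;
  message: string;
}

// Routes named in the commit message; other routes are elided there.
const PRIMARY_AGENT: Record<Category, string> = {
  security: "architect",
  quality: "reviewer",
  style: "reviewer",
  bug: "reviewer",
};

const MAX_FINDINGS = 200; // safety cap from the commit message

function findingsForAgent(all: RoutedFinding[], agent: string, round: number): RoutedFinding[] {
  const visible = all.filter((f) => {
    if (round >= 2) return f.severity === "error"; // condensed error-only view
    const isPrimary = PRIMARY_AGENT[f.category] === agent;
    return isPrimary || f.severity !== "info"; // secondary agents skip info
  });
  return visible.slice(0, MAX_FINDINGS);
}
```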
…ll docs

Add production documentation for the static analysis feature across 7 files:
- ARCHITECTURE.md: full pipeline section with ASCII diagram, runners table,
  unified finding type, changed-lines scoping, category routing, agent injection
- ADVANCED_FEATURES.md: user-facing section covering routing, risk levels,
  round behavior, graceful degradation, and prompt format
- AGENTS.md: per-agent "static analysis categories received" blocks
- CONFIGURATION.md: tool installation subsection with caching and re-install
- CHANGELOG.md: v0.0.6 entry with features and fixes
- INDEX.md: navigation entries, feature coverage row, updated stats
- README.md: feature bullet and expanded quick start
…edPaths additive

- Add ~80 new default exclusion patterns covering vendor dirs, IDE configs,
  generated code, minified/bundled files, binary assets, build outputs,
  language caches, and CI/CD configs
- Fix config loader to merge user excludedPaths on top of defaults instead
  of silently replacing them
- Remove excludedPaths from config --init output so the config file stays
  clean — defaults live in code, users only add their own patterns
- Add tests for merge, deduplication, and preservation of defaults

@amar-zhuri requested a review from rqirici on February 27, 2026 13:33
@amar-zhuri self-assigned this on Feb 27, 2026
@amar-zhuri added the documentation and enhancement labels on Feb 27, 2026