
Add tree-sitter C/C++ reachability review #199

Open: asotic04 wants to merge 37 commits into arm:main from asotic04:reachability-initial-testing


@asotic04

Summary

This PR adds tree-sitter based C/C++ reachability review support to Metis.

The new pipeline uses tree-sitter to build a C/C++ function/call graph, traces source-rooted paths through it, and uses the graph and those paths to drive focused LLM review.
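To make "source-rooted paths" concrete, here is a minimal sketch of tracing call paths from source functions to sink functions with a depth-first search. It assumes the call graph is a plain dict mapping each caller to the set of callees it invokes; the function name `trace_source_paths` and the `max_depth` limit are hypothetical and not taken from the PR.

```python
def trace_source_paths(graph, sources, sinks, max_depth=8):
    """Enumerate acyclic call paths that start at a source and end at a sink.

    graph: hypothetical call graph, caller name -> set of callee names.
    sources/sinks: sets of function names (e.g. input handlers, memcpy).
    """
    paths = []

    def dfs(node, path):
        if node in sinks:
            paths.append(list(path))  # record source -> ... -> sink
            return
        if len(path) >= max_depth:
            return  # bound the search so large graphs stay tractable
        for callee in sorted(graph.get(node, ())):
            if callee in path:
                continue  # skip recursion/cycles to keep paths simple
            path.append(callee)
            dfs(callee, path)
            path.pop()

    for src in sorted(sources):
        dfs(src, [src])
    return paths


graph = {
    "handle_input": {"parse_header", "log_event"},
    "parse_header": {"copy_field"},
    "copy_field": {"memcpy"},
    "log_event": {"fprintf"},
}
print(trace_source_paths(graph, {"handle_input"}, {"memcpy"}))
```

Only the `handle_input -> parse_header -> copy_field -> memcpy` chain is returned, so a reviewer (or LLM) would only be shown the code along that path rather than the whole repository.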

Initially, the idea was to build the graph with an LLM and then scan the paths that included a source. This worked well, but graph creation was expensive in tokens and less deterministic than the tree-sitter method. In addition, while the LLM was building graphs, I was also prompting it to take notes on each file, including any potentially vulnerable patterns it found. Later LLM calls, whose sole task was finding security issues, treated these notes as ground truth, which led to over-reporting of issues that the graph-creation worker had only flagged in passing.

Changes

  • Added modular C/C++ reachability services for graph construction, path tracing, file-focused context, supplementary audits, finding annotation, and deduplication.
  • Integrated reachability into review_code, review_file, and reachability.
  • Preserved legacy/plugin review for non-C/C++ files, so Python and other supported languages in mixed-language repositories are still reviewed. The plan is to add similar tree-sitter reachability analysis for other languages soon.
  • Updated reachability review output to match the existing SaaS review schema, including numeric confidence values.
  • Preserved reachability-specific metadata in findings, including path, primary file/function/line, analysis type, canonical key, and connected-function reasoning.
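As an illustration of how the preserved metadata and canonical key could support deduplication, here is a minimal sketch. The `ReachabilityFinding` field names, the 0.0 to 1.0 confidence scale, and the choice of (file, function, line, lowercased title) as the canonical key are all assumptions for this example, not the PR's actual schema.

```python
from dataclasses import dataclass


@dataclass
class ReachabilityFinding:
    # Hypothetical fields modelled on the metadata listed above.
    title: str
    confidence: float          # numeric confidence, assumed 0.0-1.0
    path: list[str]            # call path from a source to the flagged code
    primary_file: str
    primary_function: str
    primary_line: int
    analysis_type: str         # e.g. "path" or "supplementary" (assumed values)
    connected_reasoning: str = ""

    @property
    def canonical_key(self) -> tuple[str, str, int, str]:
        # Assumed key: same location and same issue title means same finding.
        return (self.primary_file, self.primary_function,
                self.primary_line, self.title.lower())


def deduplicate(findings):
    """Keep one finding per canonical key, preferring the highest confidence."""
    best = {}
    for finding in findings:
        key = finding.canonical_key
        if key not in best or finding.confidence > best[key].confidence:
            best[key] = finding
    return list(best.values())
```

Keying on location plus title lets the same issue reported by both a path-based pass and a supplementary pass collapse into a single finding.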

C/C++ reachability flow

For C/C++ files, Metis now uses tree-sitter to extract functions, callsites, globals, source-like entry points, and sink-like operations. It resolves calls where possible, traces paths through the resulting graph, and runs focused confirmation and supplementary semantic passes over graph-selected code context.
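"Resolves calls where possible" implies that some calls stay unresolved. A minimal sketch of one plausible heuristic is shown below; it assumes functions and callsites have already been extracted (for example from tree-sitter nodes), and the `FunctionDef`/`CallSite` types and the resolution order are hypothetical, not necessarily what the PR implements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionDef:
    name: str
    file: str
    is_static: bool  # C `static` linkage: only visible inside its own file


@dataclass(frozen=True)
class CallSite:
    caller_file: str
    callee_name: str


def resolve_call(call, defs_by_name):
    """Return the FunctionDef a call most likely targets, or None if unresolved."""
    candidates = defs_by_name.get(call.callee_name, [])
    same_file = [d for d in candidates if d.file == call.caller_file]
    if same_file:
        # A definition in the caller's file wins; several (e.g. C++ overloads) stay ambiguous.
        return same_file[0] if len(same_file) == 1 else None
    # Static functions defined in other files cannot be linked from here.
    visible = [d for d in candidates if not d.is_static]
    if len(visible) == 1:
        return visible[0]
    # No candidates (external, e.g. libc) or several (ambiguous): leave unresolved.
    return None
```

Leaving ambiguous calls unresolved, rather than guessing, keeps the graph deterministic at the cost of missing some edges.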

For review_code, C/C++ reachability runs first. If the repository also contains non-C/C++ files, those continue through the existing plugin review path.
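The ordering described above can be sketched as a simple partition of repository files by extension. The extension set and the `split_review_targets`/`review_code_order` helpers are assumptions for illustration; the PR may classify files differently (for example, treating `.h` headers with more care).

```python
from pathlib import Path

# Assumed C/C++ extensions; the PR may recognise a different set.
C_CPP_EXTENSIONS = {".c", ".h", ".cc", ".cpp", ".cxx", ".hh", ".hpp", ".hxx"}


def split_review_targets(files):
    """Partition files into (C/C++ reachability, plugin review) groups."""
    reachability, plugin = [], []
    for f in files:
        suffix = Path(f).suffix.lower()
        (reachability if suffix in C_CPP_EXTENSIONS else plugin).append(f)
    return reachability, plugin


def review_code_order(files):
    """Hypothetical driver: C/C++ reachability first, then the plugin path."""
    reach, rest = split_review_targets(files)
    steps = []
    if reach:
        steps.append(("reachability", reach))
    if rest:
        steps.append(("plugin", rest))
    return steps
```

For a mixed repository, the C/C++ files go through reachability and everything else (e.g. Python) still reaches the existing plugin review.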

For review_file, C/C++ files use scoped reachability context around the reviewed file, including inbound, outbound, shared, lifecycle, and callback-related context.
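A minimal sketch of three of those context buckets (inbound, outbound, and shared) is shown below. It assumes call edges are (caller, callee) name pairs and that "shared" means globals touched both inside and outside the reviewed file; that interpretation and the `file_scoped_context` helper are hypothetical, and lifecycle and callback context are not modelled here.

```python
def file_scoped_context(calls, globals_used, functions_in_file):
    """Collect context around the reviewed file from a call graph.

    calls: iterable of (caller, callee) function-name pairs.
    globals_used: function name -> set of global variable names it references.
    functions_in_file: names of functions defined in the reviewed file.
    """
    local = set(functions_in_file)
    inbound, outbound = set(), set()
    for caller, callee in calls:
        if callee in local and caller not in local:
            inbound.add(caller)    # outside code that calls into this file
        elif caller in local and callee not in local:
            outbound.add(callee)   # outside code this file calls
    local_globals = set().union(*(globals_used.get(f, set()) for f in local))
    remote_globals = set().union(*(g for f, g in globals_used.items() if f not in local))
    shared = local_globals & remote_globals  # state shared with the rest of the program
    return {"inbound": sorted(inbound), "outbound": sorted(outbound), "shared": sorted(shared)}
```

Scoping context this way keeps the prompt focused on the reviewed file while still showing how data enters and leaves it.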

Review comment on src/metis/cli/entry.py (outdated), lines +522 to +527:

```python
parser.add_argument(
    "--reachability-extraction-model",
    type=str,
    default=None,
    help="Deprecated; graph extraction now uses deterministic tree-sitter.",
)
```
Contributor:

Sounds like this option can be removed, since I think it was added and deprecated only in this branch.

Author:

Done.
