
Commit b9f3c0f

rahlk and claude committed
Release 0.1.14: call graph + CodeQL integration
Adds a real call graph to PyApplication, with Jedi and CodeQL as collaborating backends. See CHANGELOG.md for the full feature/fix list.

Highlights:
- PyCallEdge schema; call_graph: List[PyCallEdge] on PyApplication
- call_graph module: networkx adapters with ghost-node support for RPC / third-party / framework endpoints; jedi/codeql edge derivation; heuristic constructor fallback; provenance-aware merge
- CodeQL pipeline against codeql/python-all 7.x: CLI auto-downloaded per-project, pack dependencies installed once, queries colocated in the prepared qlpack; augment_call_sites backfills resolutions
- Jedi-side fixes: _call_sites now respects scope boundaries; constructor calls resolve to <class>.__init__

BREAKING: removed --analysis-level. Call graph is built unconditionally; use --codeql/--no-codeql to control CodeQL participation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
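A minimal sketch of what a networkx adapter with ghost nodes could look like, assuming edges are keyed by `PyCallable.signature` strings and that the symbol table can be reduced to a set of known signatures. `CallEdge`, the `ghost` node attribute, and both function bodies are hypothetical stand-ins for `PyCallEdge`, `to_digraph`, and `from_digraph`, not the package's code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

import networkx as nx


# Hypothetical stand-in for PyCallEdge; fields follow the CHANGELOG description.
@dataclass(frozen=True)
class CallEdge:
    source: str  # caller PyCallable.signature
    target: str  # callee PyCallable.signature
    weight: int = 1
    provenance: str = "jedi"  # "jedi", "codeql" or "joern"


def to_digraph(edges: Iterable[CallEdge], known_signatures: Set[str]) -> nx.DiGraph:
    """Build a DiGraph; endpoints absent from the symbol table become ghost nodes."""
    graph = nx.DiGraph()
    for signature in known_signatures:
        graph.add_node(signature, ghost=False)
    for edge in edges:
        for endpoint in (edge.source, edge.target):
            if endpoint not in graph:
                # RPC, third-party or framework target: keep it instead of dropping the edge.
                graph.add_node(endpoint, ghost=True)
        graph.add_edge(edge.source, edge.target, weight=edge.weight, provenance=edge.provenance)
    return graph


def from_digraph(graph: nx.DiGraph) -> List[CallEdge]:
    """Flatten the graph back into edge records, ghost endpoints included."""
    return [
        CallEdge(u, v, data.get("weight", 1), data.get("provenance", "jedi"))
        for u, v, data in graph.edges(data=True)
    ]
```

Ghost endpoints can then be listed with `[n for n, is_ghost in graph.nodes(data="ghost") if is_ghost]`, which keeps external targets visible without pretending they have symbol-table entries.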
1 parent 1d59ffc commit b9f3c0f

14 files changed

Lines changed: 879 additions & 183 deletions


CHANGELOG.md

Lines changed: 22 additions & 0 deletions
@@ -5,6 +5,28 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.1.14] - 2026-05-13
+
+### Added
+- **Call graph in analysis output**: `PyApplication.call_graph: List[PyCallEdge]`. Every run now produces a call graph in addition to the symbol table. Edges carry `source`, `target` (both `PyCallable.signature`), `weight`, and `provenance` (`jedi` / `codeql` / `joern`).
+- **`call_graph` module** (`codeanalyzer.semantic_analysis.call_graph`) with `to_digraph` / `from_digraph` networkx adapters, `jedi_call_graph_edges`, and `merge_edges`. Endpoints absent from the symbol table become ghost nodes so RPC / third-party / framework edges are preserved.
+- **CodeQL Python query** rewritten against the CodeQL Python library (was Java idioms before). Resolves direct calls and constructor calls via `ClassValue.lookup("__init__")`, using the modern `Value.getACall()` predicate (CodeQL Python 7.x).
+- **`augment_call_sites`**: when `--codeql` is enabled, CodeQL backfills `PyCallsite.callee_signature` entries Jedi left unresolved.
+- **`resolve_unresolved_constructors`**: heuristic fallback that walks the symbol table by class short-name and scope to fill in constructor sites neither Jedi nor CodeQL resolved (common for classes nested inside functions/methods). Synthesizes `<class>.__init__` signatures.
+- **`iter_classes_in_symbol_table`**: full recursive walker over classes — including inner classes, classes nested in functions, and classes nested in class methods.
+
+### Changed
+- **BREAKING**: Removed `--analysis-level` / `analysis_level`. The call graph is built unconditionally; use `--codeql/--no-codeql` to control CodeQL participation. Jedi-derived edges are always available.
+- **Jedi constructor calls now resolve to `<class>.__init__`** (was: bare `<class>`). When `script.infer()` returns a class, the qualified name is rewritten to point at the constructor — matching where method `PyCallable`s actually live in the symbol table. `PyCallsite.is_constructor_call` now reflects Jedi's type inference (was: `method_name == "__init__"`, only true for explicit `obj.__init__()` calls).
+- **`_call_sites` scope correctness**: replaced naive `ast.walk` with `_iter_calls_in_scope`, which stops at nested `FunctionDef` / `AsyncFunctionDef` / `ClassDef` bodies (those have their own `PyCallable.call_sites`). Decorators, default arguments, return annotations, base classes and class keyword args are still walked since they execute in the enclosing scope. Previously, outer functions over-attributed every call from every nested definition.
+- CodeQL CLI binary is now downloaded into `<cache_dir>/codeql/bin/` (per-project, respecting `--cache-dir`) and discovered before any CodeQL operation — including when the database cache is reused. The downloaded archive is removed after extraction.
+- `CodeQLQueryRunner` now accepts the resolved binary path instead of relying on `PATH`. The temporary `.ql` file is written **inside** a per-project qlpack (`<cache_dir>/codeql/qlpack/`) whose `codeql/python-all` dependency is resolved once via `codeql pack install`, eliminating the lockfile / search-path gymnastics.
+
+### Fixed
+- **`zipfile` extraction dropped Unix permissions** on the CodeQL CLI launcher, causing `PermissionError` on first query run. Entries are now extracted with their stored `external_attr` mode applied, plus a defensive `chmod +x` on the resolved binary.
+- **`rglob("codeql")` matched the bundled `codeql/codeql/` directory** before the launcher file, returning a directory instead of an executable. Both `CodeQLLoader` and `_ensure_codeql_bin` now filter to `is_file()`.
+- **`CodeQLQueryRunner` crashed on subprocess errors** with `'NoneType' object has no attribute 'stderr'` because `stderr=None` returns `None` from `communicate()`. Now captures `stderr=PIPE` and decodes bytes safely.
+
 ## [0.1.13] - 2025-07-22
 
 ### Improved

README.md

Lines changed: 10 additions & 32 deletions
@@ -80,7 +80,6 @@ To view the available options and commands, run `codeanalyzer --help`. You shoul
 * --input -i PATH Path to the project root directory. [default: None] [required] │
 │ --output -o PATH Output directory for artifacts. [default: None] │
 │ --format -f [json|msgpack] Output format: json or msgpack. [default: json] │
-│ --analysis-level -a INTEGER 1: symbol table, 2: call graph. [default: 1] │
 │ --codeql --no-codeql Enable CodeQL-based analysis. [default: no-codeql] │
 │ --eager --lazy Enable eager or lazy analysis. Defaults to lazy. [default: lazy] │
 │ --cache-dir -c PATH Directory to store analysis cache. [default: None] │
@@ -112,33 +111,23 @@ To view the available options and commands, run `codeanalyzer --help`. You shoul
 
 This will save the analysis results in `analysis.msgpack` in the specified directory.
 
-3. **Toggle analysis levels with `--analysis-level`:**
-```bash
-codeanalyzer --input ./my-python-project --analysis-level 1 # Symbol table only
-```
-Call graph analysis can be enabled by setting the level to `2`:
-```bash
-codeanalyzer --input ./my-python-project --analysis-level 2 # Symbol table + Call graph
-```
-***Note: The `--analysis-level=2` is not yet implemented in this version.***
-
-4. **Analysis with CodeQL enabled:**
+3. **Analysis with CodeQL enabled:**
 ```bash
 codeanalyzer --input ./my-python-project --codeql
 ```
-This will perform CodeQL-based analysis in addition to the standard symbol table generation.
+Every run produces a symbol table **and** a call graph. By default, edges come from Jedi's lexical analysis. Adding `--codeql` resolves additional edges (including RPC / third-party / dynamically-dispatched targets) and merges them with the Jedi-derived edges. CodeQL also backfills resolved callees on Jedi-emitted call sites where Jedi couldn't resolve them.
 
-***Note: Not yet fully implemented. Please refrain from using this option until further notice.***
+***Note: CodeQL integration is experimental. The CLI is downloaded into `<cache_dir>/codeql/` on first use and reused thereafter.***
 
-5. **Eager analysis with custom cache directory:**
+4. **Eager analysis with custom cache directory:**
 ```bash
 codeanalyzer --input ./my-python-project --eager --cache-dir /path/to/custom-cache
 ```
 This will rebuild the analysis cache at every run and store it in `/path/to/custom-cache/.codeanalyzer`. The cache will be cleared by default after analysis unless you specify `--keep-cache`.
 
 If you provide --cache-dir, the cache will be stored in that directory. If not specified, it defaults to `.codeanalyzer` in the current working directory (`$PWD`).
 
-6. **Quiet mode (minimal output):**
+5. **Quiet mode (minimal output):**
 ```bash
 codeanalyzer --input /path/to/my-python-project --quiet
 ```
@@ -236,7 +225,6 @@ To view the available options and commands, run `codeanalyzer --help`. You shoul
 * --input -i PATH Path to the project root directory. [default: None] [required] │
 │ --output -o PATH Output directory for artifacts. [default: None] │
 │ --format -f [json|msgpack] Output format: json or msgpack. [default: json]. │
-│ --analysis-level -a INTEGER 1: symbol table, 2: call graph. [default: 1] │
 │ --codeql --no-codeql Enable CodeQL-based analysis. [default: no-codeql] │
 │ --eager --lazy Enable eager or lazy analysis. Defaults to lazy. [default: lazy] │
 │ --cache-dir -c PATH Directory to store analysis cache. [default: None] │
@@ -261,33 +249,23 @@ To view the available options and commands, run `codeanalyzer --help`. You shoul
 
 Now, you can find the analysis results in `analysis.json` in the specified directory.
 
-2. **Toggle analysis levels with `--analysis-level`:**
-```bash
-codeanalyzer --input ./my-python-project --analysis-level 1 # Symbol table only
-```
-Call graph analysis can be enabled by setting the level to `2`:
-```bash
-codeanalyzer --input ./my-python-project --analysis-level 2 # Symbol table + Call graph
-```
-***Note: The `--analysis-level=2` is not yet implemented in this version.***
-
-3. **Analysis with CodeQL enabled:**
+2. **Analysis with CodeQL enabled:**
 ```bash
 codeanalyzer --input ./my-python-project --codeql
 ```
-This will perform CodeQL-based analysis in addition to the standard symbol table generation.
+Every run produces a symbol table **and** a call graph. By default, edges come from Jedi's lexical analysis. Adding `--codeql` resolves additional edges (including RPC / third-party / dynamically-dispatched targets) and merges them with the Jedi-derived edges. CodeQL also backfills resolved callees on Jedi-emitted call sites where Jedi couldn't resolve them.
 
-***Note: Not yet fully implemented. Please refrain from using this option until further notice.***
+***Note: CodeQL integration is experimental. The CLI is downloaded into `<cache_dir>/codeql/` on first use and reused thereafter.***
 
-4. **Eager analysis with custom cache directory:**
+3. **Eager analysis with custom cache directory:**
 ```bash
 codeanalyzer --input ./my-python-project --eager --cache-dir /path/to/custom-cache
 ```
 This will rebuild the analysis cache at every run and store it in `/path/to/custom-cache/.codeanalyzer`. The cache will be cleared by default after analysis unless you specify `--keep-cache`.
 
 If you provide --cache-dir, the cache will be stored in that directory. If not specified, it defaults to `.codeanalyzer` in the current working directory (`$PWD`).
 
-5. **Save output in msgpack format:**
+4. **Save output in msgpack format:**
 ```bash
 codeanalyzer --input ./my-python-project --output /path/to/analysis-results --format msgpack
 ```
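The CLI cache described in the README note, together with the two extraction fixes listed in the CHANGELOG, might look roughly like this. `extract_with_modes` and `find_codeql_launcher` are hypothetical names; the sketch assumes the downloaded archive is a zip whose entries carry Unix mode bits in `external_attr` and that it is unpacked under `<cache_dir>/codeql/bin/`. The download step itself is omitted.

```python
import os
import stat
import zipfile
from pathlib import Path
from typing import Optional


def extract_with_modes(archive: Path, dest: Path) -> None:
    """Extract `archive` into `dest`, re-applying Unix mode bits from external_attr."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            extracted = zf.extract(info, dest)
            # The high 16 bits of external_attr hold st_mode when the zip was made on Unix.
            mode = (info.external_attr >> 16) & 0o777
            if mode and not info.is_dir():
                os.chmod(extracted, mode)


def find_codeql_launcher(bin_dir: Path) -> Optional[Path]:
    """Return the first regular file named 'codeql' below bin_dir, or None."""
    for candidate in sorted(bin_dir.rglob("codeql")):
        # rglob also matches the bundled codeql/ directory, so require a file.
        if candidate.is_file():
            # Defensive chmod +x in case the archive carried no mode bits.
            executable = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
            candidate.chmod(candidate.stat().st_mode | executable)
            return candidate
    return None
```

`ZipFile.extract` does not restore permission bits by itself, which is why the mode is re-applied per entry; the `is_file()` filter is what keeps a directory named `codeql` from being returned as the launcher.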

codeanalyzer/__main__.py

Lines changed: 0 additions & 5 deletions
@@ -27,10 +27,6 @@ def main(
 case_sensitive=False,
 ),
 ] = OutputFormat.JSON,
-analysis_level: Annotated[
-int,
-typer.Option("-a", "--analysis-level", help="1: symbol table, 2: call graph."),
-] = 1,
 using_codeql: Annotated[
 bool, typer.Option("--codeql/--no-codeql", help="Enable CodeQL-based analysis.")
 ] = False,
@@ -82,7 +78,6 @@ def main(
 input=input,
 output=output,
 format=format,
-analysis_level=analysis_level,
 using_codeql=using_codeql,
 using_ray=using_ray,
 rebuild_analysis=rebuild_analysis,
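With `analysis_level` removed, `using_codeql` only decides whether CodeQL participates; Jedi-derived edges are always produced. A provenance-aware merge of the two edge sets could be sketched as follows. The deduplication key `(source, target)` and the CodeQL-over-Jedi precedence are assumptions, and `CallEdge` / `merge_edges` here are hypothetical stand-ins rather than the package's `merge_edges`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence, Tuple


# Hypothetical stand-in for PyCallEdge.
@dataclass(frozen=True)
class CallEdge:
    source: str
    target: str
    weight: int
    provenance: str  # e.g. "jedi" or "codeql"


def merge_edges(
    *edge_lists: Iterable[CallEdge],
    priority: Sequence[str] = ("codeql", "jedi"),
) -> List[CallEdge]:
    """Deduplicate on (source, target); on a clash keep the higher-priority backend."""
    rank = {backend: i for i, backend in enumerate(priority)}
    unknown = len(priority)  # backends not listed rank last
    merged: Dict[Tuple[str, str], CallEdge] = {}
    for edges in edge_lists:
        for edge in edges:
            key = (edge.source, edge.target)
            kept = merged.get(key)
            if kept is None or rank.get(edge.provenance, unknown) < rank.get(kept.provenance, unknown):
                merged[key] = edge
    # Sort for deterministic output regardless of input order.
    return sorted(merged.values(), key=lambda e: (e.source, e.target))
```

As long as each backend reports a given edge at most once, `merge_edges(jedi_edges, codeql_edges)` and `merge_edges(codeql_edges, jedi_edges)` return the same list, so the order in which backends run does not affect the output.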
