How It Works

GraphOptim treats code as a graph problem. This page explains the full pipeline.

Pipeline Overview

Source Code → AST → Control Flow Graph (CFG) → Weighted DiGraph → Analysis → Optimization

Step 1: Parse

Python source code is parsed into an Abstract Syntax Tree (AST) using Python's built-in ast module.

Step 2: Build CFG

The AST is transformed into a Control Flow Graph — a directed graph where:

Nodes = Basic blocks (sequences of statements with no branches)
Edges = Control flow transitions (if/else branches, loop entries/exits, etc.)
Edge weights = Estimated computational cost of the transition

┌─────────────┐
│ Entry Block  │
│  x = input() │
└──────┬──────┘
       │
┌──────▼──────┐
│  if x > 0   │──── False ────┐
└──────┬──────┘               │
       │ True                  │
┌──────▼──────┐        ┌──────▼──────┐
│  return x   │        │  return -x  │
└──────┬──────┘        └──────┬──────┘
       │                      │
┌──────▼──────────────────────▼──────┐
│              Exit Block             │
└────────────────────────────────────┘

Step 3: Extract Metrics

Seven graph-theoretic metrics are extracted from the CFG:

#	Metric	Formula	What It Detects
1	Cyclomatic Complexity	`E - N + 2P`	Overall branching density
2	Dead Nodes	`in_degree=0 and not entry`	Unreachable code blocks
3	CFG Diameter	Longest shortest path	Deep execution chains
4	Avg Shortest Path	Mean over all pairs	Execution efficiency
5	Max Betweenness Centrality	Node bottleneck score	Over-centralized functions
6	Node/Edge Ratio	`E/N`	Branching density
7	Lines of Code	Non-blank lines	Verbosity

Step 4: Detect Patterns

Four pattern detectors run on the CFG:

Dead Nodes — Unreachable code after return/raise/break/continue
Redundant Paths — Duplicate conditional branches (structurally equal via AST hashing)
Bottleneck Nodes — High betweenness centrality (>0.7) indicating over-centralized code
Deep Chains — Execution paths deeper than threshold (default: 8)

Step 5: Select Passes (Knapsack)

Available optimization passes are evaluated. Each has:

cost — Risk of introducing bugs (0.0-1.0)
benefit — Expected improvement (0.0-1.0)

A 0/1 Knapsack dynamic programming solver selects the optimal subset within the user's risk budget.

Step 6: Optimize

Selected passes transform the AST:

Apply each pass sequentially
Validate output with ast.parse() after each pass
Skip any pass that produces invalid code
Post-process for readability

Data Flow Graph (DFG)

In addition to the CFG, GraphOptim builds a Data Flow Graph tracking variable definitions and usages. This enables:

Detection of unused variables
Understanding data dependencies between operations
Future dead data flow elimination

Architecture

graphoptim/
├── parser/          ← Source → Graph
│   ├── cfg_builder  ← Control Flow Graph
│   ├── dfg_builder  ← Data Flow Graph
│   └── ast_utils    ← AST helpers
├── analyzer/        ← Graph → Insights
│   ├── metrics      ← 7 graph metrics
│   ├── patterns     ← Pattern detectors
│   └── reporter     ← Scored reports
├── optimizer/       ← Insights → Better Code
│   ├── passes/      ← Individual optimizations
│   │   ├── dead_code
│   │   ├── path_shortener
│   │   ├── centrality
│   │   └── knapsack  ← Pass selection
│   └── rewriter     ← Orchestration
├── benchmark/       ← Empirical validation
├── cli.py           ← Command-line interface
└── config.py        ← Configuration

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How It Works

How It Works

Pipeline Overview

Step 1: Parse

Step 2: Build CFG

Step 3: Extract Metrics

Step 4: Detect Patterns

Step 5: Select Passes (Knapsack)

Step 6: Optimize

Data Flow Graph (DFG)

Architecture

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally