Skip to content

How It Works

Dianel edited this page Apr 9, 2026 · 1 revision

How It Works

GraphOptim treats code as a graph problem. This page explains the full pipeline.

Pipeline Overview

Source Code → AST → Control Flow Graph (CFG) → Weighted DiGraph → Analysis → Optimization

Step 1: Parse

Python source code is parsed into an Abstract Syntax Tree (AST) using Python's built-in ast module.

Step 2: Build CFG

The AST is transformed into a Control Flow Graph — a directed graph where:

  • Nodes = Basic blocks (sequences of statements with no branches)
  • Edges = Control flow transitions (if/else branches, loop entries/exits, etc.)
  • Edge weights = Estimated computational cost of the transition
┌─────────────┐
│ Entry Block  │
│  x = input() │
└──────┬──────┘
       │
┌──────▼──────┐
│  if x > 0   │──── False ────┐
└──────┬──────┘               │
       │ True                  │
┌──────▼──────┐        ┌──────▼──────┐
│  return x   │        │  return -x  │
└──────┬──────┘        └──────┬──────┘
       │                      │
┌──────▼──────────────────────▼──────┐
│              Exit Block             │
└────────────────────────────────────┘

Step 3: Extract Metrics

Seven graph-theoretic metrics are extracted from the CFG:

# Metric Formula What It Detects
1 Cyclomatic Complexity E - N + 2P Overall branching density
2 Dead Nodes in_degree=0 and not entry Unreachable code blocks
3 CFG Diameter Longest shortest path Deep execution chains
4 Avg Shortest Path Mean over all pairs Execution efficiency
5 Max Betweenness Centrality Node bottleneck score Over-centralized functions
6 Node/Edge Ratio E/N Branching density
7 Lines of Code Non-blank lines Verbosity

Step 4: Detect Patterns

Four pattern detectors run on the CFG:

  1. Dead Nodes — Unreachable code after return/raise/break/continue
  2. Redundant Paths — Duplicate conditional branches (structurally equal via AST hashing)
  3. Bottleneck Nodes — High betweenness centrality (>0.7) indicating over-centralized code
  4. Deep Chains — Execution paths deeper than threshold (default: 8)

Step 5: Select Passes (Knapsack)

Available optimization passes are evaluated. Each has:

  • cost — Risk of introducing bugs (0.0-1.0)
  • benefit — Expected improvement (0.0-1.0)

A 0/1 Knapsack dynamic programming solver selects the optimal subset within the user's risk budget.

Step 6: Optimize

Selected passes transform the AST:

  1. Apply each pass sequentially
  2. Validate output with ast.parse() after each pass
  3. Skip any pass that produces invalid code
  4. Post-process for readability

Data Flow Graph (DFG)

In addition to the CFG, GraphOptim builds a Data Flow Graph tracking variable definitions and usages. This enables:

  • Detection of unused variables
  • Understanding data dependencies between operations
  • Future dead data flow elimination

Architecture

graphoptim/
├── parser/          ← Source → Graph
│   ├── cfg_builder  ← Control Flow Graph
│   ├── dfg_builder  ← Data Flow Graph
│   └── ast_utils    ← AST helpers
├── analyzer/        ← Graph → Insights
│   ├── metrics      ← 7 graph metrics
│   ├── patterns     ← Pattern detectors
│   └── reporter     ← Scored reports
├── optimizer/       ← Insights → Better Code
│   ├── passes/      ← Individual optimizations
│   │   ├── dead_code
│   │   ├── path_shortener
│   │   ├── centrality
│   │   └── knapsack  ← Pass selection
│   └── rewriter     ← Orchestration
├── benchmark/       ← Empirical validation
├── cli.py           ← Command-line interface
└── config.py        ← Configuration

Clone this wiki locally