⚡️ Speed up function find_node_with_highest_degree by 5,727% #284

Open
codeflash-ai[bot] wants to merge 1 commit into python-only from codeflash/optimize-find_node_with_highest_degree-mlumboqb

@codeflash-ai codeflash-ai bot commented Feb 20, 2026

📄 5,727% (57.27x) speedup for find_node_with_highest_degree in src/algorithms/graph.py

⏱️ Runtime: 160 milliseconds → 2.74 milliseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 5,727% speedup (from 160ms to 2.74ms) by eliminating a critical algorithmic inefficiency in how incoming connections are counted.

Key Optimization:

The original code uses a nested loop structure that scans all connections for every node to count incoming edges:

for node in nodes:                           # O(N) nodes
    for src, targets in connections.items(): # O(E) edges - repeated N times!
        if node in targets:                  # O(T) target list check

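This creates O(N × E × T) complexity, where N is the number of nodes, E is the number of source entries in connections, and T is the length of a typical target list. The line profiler shows these two nested loops consuming 99.1% of total runtime (48.3% + 50.8%).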
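For orientation, the nested-loop structure described above can be written out as a complete function. This is a hypothetical reconstruction based on the snippet and on the behavior the generated tests describe (outgoing degree from `len`, incoming counted once per source, first node wins ties). The name `find_node_with_highest_degree_naive` is an assumption, and the actual code in src/algorithms/graph.py may differ.

```python
from typing import Optional


def find_node_with_highest_degree_naive(
    nodes: list[str], connections: dict[str, list[str]]
) -> Optional[str]:
    """Hypothetical reconstruction of the pre-optimization logic (not the repo's code)."""
    max_degree = -1
    max_degree_node = None
    for node in nodes:
        # Outgoing degree: length of the node's own target list, duplicates included.
        degree = len(connections.get(node, []))
        # Incoming degree: rescan every source's target list for this node.
        for src, targets in connections.items():
            if node in targets:  # linear scan; adds at most 1 per source
                degree += 1
        # Strict '>' keeps the first node encountered when degrees tie.
        if degree > max_degree:
            max_degree = degree
            max_degree_node = node
    return max_degree_node
```

The inner `for` loop runs once per node, which is where the multiplicative cost comes from.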

The optimized code precomputes incoming connection counts in a single upfront pass:

incoming_counts: dict[str, int] = {}
for src, targets in connections.items():     # O(E) - done once
    for target in set(targets):              # O(T) per source
        incoming_counts[target] = incoming_counts.get(target, 0) + 1

Then during the node iteration, incoming degree lookup becomes O(1):

degree += incoming_counts.get(node, 0)  # Simple dict lookup
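Putting the precomputation and the lookup together gives roughly the following shape. This is a minimal sketch assembled from the two snippets above, not a copy of the merged code; the function name `find_node_with_highest_degree_fast` and the tie-breaking loop are assumptions inferred from the test descriptions.

```python
from typing import Optional


def find_node_with_highest_degree_fast(
    nodes: list[str], connections: dict[str, list[str]]
) -> Optional[str]:
    """Sketch of the optimized approach: count incoming edges once, then look them up."""
    # Single pass over all sources; set() makes each source count at most once per target.
    incoming_counts: dict[str, int] = {}
    for src, targets in connections.items():
        for target in set(targets):
            incoming_counts[target] = incoming_counts.get(target, 0) + 1

    max_degree = -1
    max_degree_node = None
    for node in nodes:
        # Outgoing degree still uses the raw list length, duplicates included.
        degree = len(connections.get(node, []))
        # Incoming degree is now a constant-time dictionary lookup.
        degree += incoming_counts.get(node, 0)
        if degree > max_degree:  # strict '>' keeps the earliest node on ties
            max_degree = degree
            max_degree_node = node
    return max_degree_node
```

Note that the precomputation runs even when `nodes` is empty, which is consistent with the slower timings reported for the empty-input tests.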

Why This Works:

  • Algorithmic improvement: Changes from O(N × E × T) to O(E × T + N), which is dramatically faster when graphs have many nodes
  • Single-pass aggregation: Incoming connections are counted once and cached, rather than recomputed for each node
  • Deduplication: Using set(targets) ensures duplicate targets in a source's list are counted only once per source (matching the original behavior where if node in targets would only increment once per source)
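The deduplication point is easy to check in isolation. The sketch below compares counting once per source (what `set(targets)` gives) with counting every list entry; both helper names are hypothetical and exist only for this illustration.

```python
def incoming_once_per_source(connections: dict[str, list[str]]) -> dict[str, int]:
    """Count each (source, target) pair once, mirroring `if node in targets`."""
    counts: dict[str, int] = {}
    for targets in connections.values():
        for target in set(targets):
            counts[target] = counts.get(target, 0) + 1
    return counts


def incoming_per_entry(connections: dict[str, list[str]]) -> dict[str, int]:
    """Count every list entry, so duplicates inflate the incoming degree."""
    counts: dict[str, int] = {}
    for targets in connections.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


connections = {"A": ["B", "B", "C"], "D": ["B"]}
once = incoming_once_per_source(connections)  # B is counted for A and for D
per_entry = incoming_per_entry(connections)   # B is counted twice for A, once for D
```

Here `once["B"]` is 2 while `per_entry["B"]` is 3, so dropping `set()` would change results for inputs with duplicate targets.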

Impact on Workloads:

The test results show the optimization excels with larger graphs:

  • Small graphs (1-4 nodes): mostly 4-29% slower due to setup overhead
  • Medium graphs (50-500 nodes): 258-4524% faster
  • Large graphs (1000 nodes): 8150-11713% faster

The performance benefit scales with graph size because the precomputation cost (O(E × T)) is amortized across all nodes, while the original O(N × E × T) cost grows multiplicatively with the number of nodes being analyzed.
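As a rough worked example of that scaling, the two cost formulas can be evaluated for the shape used in test_very_large_single_component (1000 nodes, 1000 source entries, 3 targets each). The helper `estimated_steps` is a hypothetical illustration. It counts worst-case elementary steps only and ignores constant factors and early exits from the `in` scan, so it should not be read as a prediction of the measured speedup.

```python
def estimated_steps(
    n_nodes: int, n_sources: int, targets_per_source: int
) -> tuple[int, int]:
    """Return worst-case step counts as (nested_loop, precomputed)."""
    # Nested loops: every node rescans every source's target list.
    nested_loop = n_nodes * n_sources * targets_per_source
    # Precomputed: one pass over all target entries, then one lookup per node.
    precomputed = n_sources * targets_per_source + n_nodes
    return nested_loop, precomputed


nested_loop, precomputed = estimated_steps(1000, 1000, 3)
ratio = nested_loop / precomputed
```

For these inputs the estimate is 3,000,000 steps versus 4,000. The measured gap for that test (47.1ms versus 571μs) is smaller than the step ratio, which is expected given the simplifications above.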

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 43 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
import pytest  # used for our unit tests
from src.algorithms.graph import find_node_with_highest_degree

def test_simple_tie_returns_first_node():
    # three nodes with equal total degree (each has degree 2).
    nodes = ["A", "B", "C"]
    connections = {
        "A": ["B", "C"],  # A -> B, A -> C  (outgoing 2)
        "B": ["C"],       # B -> C          (outgoing 1)
        "C": []           # C has no outgoing
    }
    # Degrees computed by the function:
    # A: outgoing 2, incoming 0 => 2
    # B: outgoing 1, incoming 1 (from A) => 2
    # C: outgoing 0, incoming 2 (from A and B) => 2
    # All tie at 2, so the function should return the first node in 'nodes' list: "A"
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 2.25μs -> 2.46μs (8.46% slower)

def test_clear_winner_is_identified():
    # Construct a small directed graph where 'b' clearly has highest degree.
    nodes = ["a", "b", "c"]
    connections = {
        "a": ["b"],        # a -> b
        "b": ["a", "c"],   # b -> a, b -> c
        "c": ["b"]         # c -> b
    }
    # Degrees:
    # a: outgoing 1 + incoming 1 (from b) = 2
    # b: outgoing 2 + incoming 2 (from a and c) = 4  <-- highest
    # c: outgoing 1 + incoming 1 (from b) = 2
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.83μs -> 2.50μs (26.7% slower)

def test_empty_nodes_returns_none():
    # No nodes to inspect, function should return None (max_degree_node initialized to None).
    nodes = []
    connections = {"some": ["thing"]}
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 334ns -> 1.21μs (72.4% slower)

def test_empty_connections_counts_zero_and_returns_first():
    # When connections is empty, every node has degree 0 (no incoming or outgoing)
    nodes = ["x", "y", "z"]
    connections = {}
    # All degrees equal (0) so the first node should be returned.
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.42μs -> 1.29μs (9.60% faster)

def test_unlisted_nodes_in_connections_are_handled_for_incoming_edges():
    # connections contains nodes not in the nodes list; incoming edges from those sources
    # should still be counted for nodes that are present in 'nodes'.
    nodes = ["A", "B"]
    connections = {
        "A": ["X", "B"],  # A points to X and B (outgoing counts both)
        "X": ["A"]        # X (not in nodes list) points to A -> contributes to A's incoming
    }
    # Degrees:
    # A: outgoing 2 (X and B) + incoming 1 (from X) = 3
    # B: outgoing 0 + incoming 1 (from A) = 1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.67μs -> 2.12μs (21.6% slower)

def test_self_loop_counts_as_outgoing_and_incoming():
    # Self-loop should be counted in outgoing (len list) and in incoming (found in targets iteration).
    nodes = ["S", "A"]
    connections = {
        "S": ["S", "A"],  # self-loop S->S and S->A
        "A": []
    }
    # Degrees:
    # S: outgoing 2, incoming 1 (from itself) => total 3
    # A: outgoing 0, incoming 1 (from S) => total 1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 1.50μs -> 1.96μs (23.4% slower)

def test_duplicate_outgoing_entries_affect_outgoing_but_incoming_counts_once_per_source():
    # If outgoing target list has duplicates, outgoing degree uses len(list) and so counts duplicates,
    # but incoming is computed with "if node in targets" which counts at most once per source.
    nodes = ["A", "B", "C"]
    connections = {
        "A": ["B", "B", "C"],  # A has duplicate entries for B
        "B": [],
        "C": []
    }
    # Degrees:
    # A: outgoing 3 (B,B,C) + incoming 0 => 3
    # B: outgoing 0 + incoming 1 (from A) => 1  (duplicates in A's list counted only once for incoming)
    # C: outgoing 0 + incoming 1 (from A) => 1
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 2.04μs -> 2.29μs (10.9% slower)

def test_connections_none_raises_attribute_error():
    # Passing None for connections will cause an AttributeError when function tries to call .get
    nodes = ["a"]
    connections = None
    with pytest.raises(AttributeError):
        find_node_with_highest_degree(nodes, connections) # 1.67μs -> 875ns (90.4% faster)

def test_nodes_none_raises_type_error():
    # Passing None for nodes will cause a TypeError when trying to iterate over nodes
    nodes = None
    connections = {}
    with pytest.raises(TypeError):
        find_node_with_highest_degree(nodes, connections) # 1.21μs -> 1.00μs (20.8% faster)

def test_large_scale_winner_with_many_incoming_edges():
    # Build 1000 nodes and construct connections so that 'n500' receives an incoming edge
    # from almost every other node. This ensures 'n500' has the highest degree.
    N = 1000
    nodes = [f"n{i}" for i in range(N)]
    connections = {}
    # Basic outgoing pattern: each node points to the next two nodes (wrap-around)
    for i in range(N):
        connections[nodes[i]] = [nodes[(i + 1) % N], nodes[(i + 2) % N]]

    # Now make nearly every node also point to n500 to give it many incoming edges.
    target = "n500"
    for i in range(N):
        if nodes[i] == target:
            continue
        # append an extra outgoing edge to n500; this increases incoming count for n500 by 1 per source
        connections[nodes[i]].append(target)

    # Now n500 should be the node with highest degree (N-1 incoming + its own 2 outgoing)
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 49.9ms -> 456μs (10826% faster)

def test_large_scale_tie_breaker_prefers_earlier_node():
    # Create 1000 nodes where nodes[0] and nodes[1] have the exact same highest degree,
    # and every other node points to both of them. The function should return nodes[0]
    # because when degrees tie it keeps the first encountered node with max degree.
    N = 1000
    nodes = [f"t{i}" for i in range(N)]
    connections = {}

    # Ensure keys exist for t0 and t1 (with empty outgoing)
    connections["t0"] = []
    connections["t1"] = []

    # For every other node, point to both t0 and t1 so they get equal incoming degree
    for i in range(2, N):
        connections[nodes[i]] = ["t0", "t1"]

    # Both t0 and t1 receive N-2 incoming edges, others have smaller degree.
    # Tie should be resolved by returning "t0" because it appears earlier in 'nodes'.
    codeflash_output = find_node_with_highest_degree(nodes, connections) # 35.6ms -> 301μs (11713% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest
from src.algorithms.graph import find_node_with_highest_degree

def test_single_node_no_connections():
    """Test with a single node that has no connections."""
    nodes = ["A"]
    connections = {}
    # Node A has degree 0 (no outgoing or incoming connections)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 916ns -> 958ns (4.38% slower)

def test_two_nodes_one_connection():
    """Test with two nodes where one has a single outgoing connection."""
    nodes = ["A", "B"]
    connections = {"A": ["B"]}
    # Node A has degree 1 (one outgoing to B)
    # Node B has degree 1 (one incoming from A)
    # A comes first in nodes list, so it should be returned
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.46μs -> 1.79μs (18.6% slower)

def test_three_nodes_simple_chain():
    """Test with three nodes in a simple chain A -> B -> C."""
    nodes = ["A", "B", "C"]
    connections = {"A": ["B"], "B": ["C"]}
    # A: degree 1 (one outgoing)
    # B: degree 2 (one outgoing to C, one incoming from A)
    # C: degree 1 (one incoming)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.92μs -> 2.17μs (11.6% slower)

def test_hub_node_multiple_connections():
    """Test with a hub node connected to multiple others."""
    nodes = ["hub", "node1", "node2", "node3"]
    connections = {"hub": ["node1", "node2", "node3"]}
    # hub: degree 3 (three outgoing connections)
    # node1, node2, node3: each degree 1 (one incoming from hub)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 2.04μs -> 2.38μs (14.0% slower)

def test_bidirectional_connection():
    """Test with bidirectional connections between two nodes."""
    nodes = ["A", "B"]
    connections = {"A": ["B"], "B": ["A"]}
    # A: degree 2 (one outgoing to B, one incoming from B)
    # B: degree 2 (one outgoing to A, one incoming from A)
    # A comes first in nodes list, so it should be returned
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.54μs -> 1.83μs (16.0% slower)

def test_self_loop():
    """Test with a node that connects to itself."""
    nodes = ["A", "B"]
    connections = {"A": ["A"]}
    # A: degree 2 (one outgoing self-loop, one incoming self-loop)
    # B: degree 0
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.38μs -> 1.79μs (23.3% slower)

def test_all_nodes_equal_degree():
    """Test when all nodes have equal degree."""
    nodes = ["A", "B", "C"]
    connections = {"A": ["B"], "B": ["C"], "C": ["A"]}
    # A: degree 2 (one outgoing to B, one incoming from C)
    # B: degree 2 (one outgoing to C, one incoming from A)
    # C: degree 2 (one outgoing to A, one incoming from B)
    # A comes first in nodes list
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.96μs -> 2.29μs (14.5% slower)

def test_node_with_multiple_incoming_same_source():
    """Test when a node appears multiple times in another node's connection list."""
    nodes = ["A", "B"]
    connections = {"A": ["B", "B", "B"]}
    # A: degree 3 (three outgoing entries for B; duplicates count toward len)
    # B: degree 1 (incoming from A is counted once per source)
    # A has the highest degree
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.50μs -> 1.79μs (16.3% slower)

def test_empty_nodes_list():
    """Test with an empty nodes list."""
    nodes = []
    connections = {}
    # No nodes to evaluate, should return None
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 333ns -> 583ns (42.9% slower)

def test_nodes_not_in_connections():
    """Test when nodes list contains entries not present in connections dict."""
    nodes = ["A", "B", "C"]
    connections = {}
    # All nodes have degree 0
    # A comes first in nodes list
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.33μs -> 1.25μs (6.64% faster)

def test_connections_reference_nodes_not_in_list():
    """Test when connections reference nodes not in the nodes list."""
    nodes = ["A", "B"]
    connections = {"X": ["A", "B"], "A": ["Y"]}
    # A: degree 1 (one outgoing to Y) + 1 (one incoming from X) = 2
    # B: degree 0 (no outgoing) + 1 (one incoming from X) = 1
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.54μs -> 2.12μs (27.5% slower)

def test_node_name_with_special_characters():
    """Test with node names containing special characters."""
    nodes = ["node-1", "node_2", "node.3"]
    connections = {"node-1": ["node_2", "node.3"]}
    # node-1: degree 2
    # node_2: degree 1
    # node.3: degree 1
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 2.04μs -> 2.21μs (7.52% slower)

def test_node_name_case_sensitivity():
    """Test that node names are case-sensitive."""
    nodes = ["A", "a"]
    connections = {"A": ["a"], "a": ["A"]}
    # A: degree 2 (one outgoing to a, one incoming from a)
    # a: degree 2 (one outgoing to A, one incoming from A)
    # A comes first in nodes list
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.46μs -> 1.96μs (25.5% slower)

def test_numeric_string_node_names():
    """Test with numeric string node names."""
    nodes = ["1", "2", "3"]
    connections = {"1": ["2", "3"], "2": ["3"]}
    # 1: degree 2
    # 2: degree 2 (one outgoing to 3, one incoming from 1)
    # 3: degree 2 (two incoming from 1 and 2)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.96μs -> 2.33μs (16.1% slower)

def test_single_node_with_self_loop_only():
    """Test with a single node that only has a self-loop."""
    nodes = ["A"]
    connections = {"A": ["A"]}
    # A: degree 2 (one outgoing self-loop, one incoming self-loop)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.04μs -> 1.46μs (28.5% slower)

def test_duplicate_nodes_in_list():
    """Test when nodes list contains duplicate entries."""
    nodes = ["A", "B", "A"]
    connections = {"A": ["B"]}
    # First A: degree 1
    # B: degree 1
    # Second A is a duplicate in the list but will be evaluated again
    # The first occurrence of A will be returned as it's found first
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.71μs -> 1.92μs (10.9% slower)

def test_node_with_empty_connection_list():
    """Test with explicit empty connection list for a node."""
    nodes = ["A", "B"]
    connections = {"A": [], "B": ["A"]}
    # A: degree 1 (one incoming from B)
    # B: degree 1 (one outgoing to A)
    # A comes first in nodes list
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.46μs -> 1.92μs (23.9% slower)

def test_very_long_node_names():
    """Test with very long node names."""
    long_name_1 = "a" * 1000
    long_name_2 = "b" * 1000
    nodes = [long_name_1, long_name_2]
    connections = {long_name_1: [long_name_2]}
    # long_name_1: degree 1
    # long_name_2: degree 1
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.38μs -> 1.79μs (23.3% slower)

def test_unicode_node_names():
    """Test with unicode characters in node names."""
    nodes = ["α", "β", "γ"]
    connections = {"α": ["β", "γ"], "β": ["γ"]}
    # α: degree 2
    # β: degree 2 (one outgoing to γ, one incoming from α)
    # γ: degree 2 (one incoming from β, one incoming from α)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 2.25μs -> 2.46μs (8.46% slower)

def test_whitespace_in_node_names():
    """Test with whitespace in node names."""
    nodes = ["node A", "node B", "node C"]
    connections = {"node A": ["node B", "node C"]}
    # node A: degree 2
    # node B: degree 1
    # node C: degree 1
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.88μs -> 2.08μs (9.99% slower)

def test_large_graph_star_topology():
    """Test with a large star graph (one hub connected to many nodes)."""
    hub = "hub"
    # Create 500 peripheral nodes
    nodes = [hub] + [f"node_{i}" for i in range(500)]
    # Hub connects to all other nodes
    connections = {hub: [f"node_{i}" for i in range(500)]}
    # hub: degree 500 (outgoing connections)
    # All other nodes: degree 1 (incoming from hub)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.12ms -> 109μs (925% faster)

def test_large_graph_chain_topology():
    """Test with a large chain graph (linear sequence of connections)."""
    num_nodes = 500
    nodes = [f"node_{i}" for i in range(num_nodes)]
    # Create connections: node_0 -> node_1 -> node_2 -> ... -> node_499
    connections = {}
    for i in range(num_nodes - 1):
        connections[f"node_{i}"] = [f"node_{i+1}"]
    # First node: degree 1 (one outgoing)
    # Middle nodes: degree 2 (one incoming, one outgoing)
    # Last node: degree 1 (one incoming)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 7.74ms -> 167μs (4524% faster)

def test_large_graph_complete_subgraph():
    """Test with a large complete graph (every node connects to every other)."""
    num_nodes = 50
    nodes = [f"node_{i}" for i in range(num_nodes)]
    connections = {}
    # Each node connects to all other nodes
    for i in range(num_nodes):
        connections[f"node_{i}"] = [f"node_{j}" for j in range(num_nodes) if i != j]
    # Each node: degree = (num_nodes - 1) outgoing + (num_nodes - 1) incoming = 2*(num_nodes - 1)
    # All nodes have equal degree; node_0 comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 784μs -> 219μs (258% faster)

def test_large_graph_sparse_many_disconnected():
    """Test with a large sparse graph with many disconnected components."""
    num_nodes = 500
    nodes = [f"node_{i}" for i in range(num_nodes)]
    connections = {}
    # Create many small disconnected pairs: 0->1, 2->3, 4->5, etc.
    for i in range(0, num_nodes - 1, 2):
        connections[f"node_{i}"] = [f"node_{i+1}"]
    # Every node belongs to exactly one pair, so all nodes have degree 1
    # node_0 comes first in the nodes list
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 3.86ms -> 122μs (3048% faster)

def test_large_graph_highly_connected_node():
    """Test with a large graph where one node has very high degree."""
    num_nodes = 600
    highly_connected_node = "super_hub"
    nodes = [highly_connected_node] + [f"node_{i}" for i in range(num_nodes)]
    # super_hub connects to all other nodes and has many self-loops
    connections = {
        highly_connected_node: [f"node_{i}" for i in range(num_nodes)] + [highly_connected_node] * 100
    }
    # super_hub: outgoing 700 (600 edges + 100 self-loop entries) + incoming 1 (self-loop counted once per source) = 701
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 1.60ms -> 129μs (1133% faster)

def test_large_graph_two_hubs():
    """Test with a large graph containing two competing hub nodes."""
    num_nodes = 300
    hub1 = "hub_1"
    hub2 = "hub_2"
    periphery = [f"node_{i}" for i in range(num_nodes)]
    nodes = [hub1, hub2] + periphery
    # hub1 connects to first half of periphery
    connections = {
        hub1: periphery[:150],
        hub2: periphery[150:],
    }
    # hub1: degree 150 (outgoing)
    # hub2: degree 150 (outgoing)
    # hub1 comes first in nodes list
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 597μs -> 63.2μs (845% faster)

def test_large_graph_ring_topology():
    """Test with a large ring topology (circular connections)."""
    num_nodes = 400
    nodes = [f"node_{i}" for i in range(num_nodes)]
    connections = {}
    # Create ring: each node connects to next node (last connects to first)
    for i in range(num_nodes):
        next_node = (i + 1) % num_nodes
        connections[f"node_{i}"] = [f"node_{next_node}"]
    # All nodes have degree 2 (one outgoing, one incoming)
    # node_0 comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 4.96ms -> 132μs (3644% faster)

def test_large_graph_multiple_edges_same_target():
    """Test with a large graph where nodes have multiple edges to the same target."""
    num_nodes = 100
    nodes = [f"node_{i}" for i in range(num_nodes)]
    connections = {}
    # Each node (except last) connects to next node multiple times
    for i in range(num_nodes - 1):
        # Create 5 duplicate edges to the next node
        connections[f"node_{i}"] = [f"node_{i+1}"] * 5
    # node_0: degree 5 (5 outgoing entries)
    # node_1 to node_{n-2}: degree 6 (5 outgoing entries + 1 incoming, counted once per source)
    # node_{n-1}: degree 1 (1 incoming)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 762μs -> 38.6μs (1877% faster)

def test_large_graph_dense_middle_section():
    """Test with a large graph where a middle section is densely connected."""
    num_nodes = 200
    nodes = [f"node_{i}" for i in range(num_nodes)]
    connections = {}
    # Create dense connections in middle 20 nodes
    middle_start = 90
    middle_end = 110
    for i in range(middle_start, middle_end):
        # Each middle node connects to all other middle nodes
        connections[f"node_{i}"] = [f"node_{j}" for j in range(middle_start, middle_end) if i != j]
    # Middle nodes have very high degree (degree 38 each)
    # node_90 (first middle node) comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 763μs -> 65.8μs (1061% faster)

def test_large_graph_random_distribution():
    """Test with a large graph with semi-random connection distribution."""
    num_nodes = 250
    nodes = [f"node_{i}" for i in range(num_nodes)]
    connections = {}
    # Create connections where each node connects to the next five nodes (deterministic, not random)
    for i in range(num_nodes):
        # Node i connects to nodes (i+1) through (i+5) mod num_nodes
        targets = [(i + j) % num_nodes for j in range(1, 6)]
        connections[f"node_{i}"] = [f"node_{t}" for t in targets if t != i]
    # Every node has degree 10 (5 outgoing + 5 incoming)
    # All degrees tie, so node_0 is returned because it comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 3.94ms -> 215μs (1725% faster)

def test_very_large_single_component():
    """Test with a very large single connected component."""
    num_nodes = 1000
    nodes = [f"node_{i}" for i in range(num_nodes)]
    connections = {}
    # Create a connected component where each node connects to several others
    for i in range(num_nodes):
        targets = []
        # Connect to next 3 nodes cyclically
        for j in range(1, 4):
            targets.append(f"node_{(i + j) % num_nodes}")
        connections[f"node_{i}"] = targets
    # Each node has exactly 3 outgoing connections and 3 incoming connections
    # All degrees tie at 6; node_0 comes first
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 47.1ms -> 571μs (8150% faster)

def test_large_graph_power_law_distribution():
    """Test with a graph following a power-law distribution (scale-free)."""
    num_nodes = 300
    nodes = [f"node_{i}" for i in range(num_nodes)]
    connections = {}
    # First node connects to all others (preferential attachment effect)
    connections["node_0"] = [f"node_{i}" for i in range(1, num_nodes)]
    # Second node connects to first half (less preferred)
    connections["node_1"] = [f"node_{i}" for i in range(2, num_nodes // 2)]
    # Other nodes have minimal connections
    for i in range(2, 10):
        connections[f"node_{i}"] = [f"node_{i+1}"]
    # node_0 has highest degree (num_nodes - 1 = 299)
    codeflash_output = find_node_with_highest_degree(nodes, connections); result = codeflash_output # 789μs -> 92.3μs (755% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-find_node_with_highest_degree-mlumboqb` and push.

@codeflash-ai codeflash-ai bot requested a review from KRRT7 February 20, 2026 08:18
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Feb 20, 2026