
⚡️ Speed up function find_last_node by 13,074% #271

Closed
codeflash-ai[bot] wants to merge 1 commit into optimize from codeflash/optimize-find_last_node-mldg7yhf

codeflash-ai bot commented Feb 8, 2026

📄 13,074% (130.74x) speedup for find_last_node in src/algorithms/graph.py

⏱️ Runtime: 25.8 milliseconds → 196 microseconds (best of 250 runs)
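The header pairs 13,074% with 130.74x, which suggests "% faster" is computed as (old − new) / new. A quick worked check with the rounded runtimes shown above (a sketch, not Codeflash's own calculation) lands within rounding of the reported figure:

```python
# Displayed (rounded) runtimes from the report, in seconds.
original_s = 25.8e-3
optimized_s = 196e-6

# Ratio: how many times faster the optimized run is.
ratio = original_s / optimized_s
# "% faster" taken as (old - new) / new, the reading under which 13,074% maps to 130.74x.
percent_faster = (original_s - optimized_s) / optimized_s * 100

print(f"{ratio:.1f}x faster in ratio terms, {percent_faster:,.0f}% faster")
# 131.6x faster in ratio terms, 13,063% faster
```

The small gap to 13,074% is what you would expect from runtimes rounded to three significant figures.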

📝 Explanation and details

The optimized code achieves a roughly 131x speedup (13,074% faster) by restructuring the algorithm to eliminate nested iteration. The original implementation uses a nested generator expression that is O(n×m) in the worst case for n nodes and m edges, because for each candidate node it scans the edge list. The optimized version reduces this to O(n+m) by precomputing a set of source IDs once.
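For reference, here is a hypothetical reconstruction of the original nested lookup, built from the all(...) expression quoted below and the tests' expectation that an empty node list yields None. The function name find_last_node_nested and the type aliases are illustrative assumptions; the actual source is not shown on this page.

```python
from typing import Any, Iterable, Optional

Node = dict[str, Any]
Edge = dict[str, Any]


def find_last_node_nested(nodes: Iterable[Node], edges: list[Edge]) -> Optional[Node]:
    """Hypothetical reconstruction of the original, nested O(n*m) lookup.

    For each candidate node, all(...) walks the edge list until it finds an
    edge whose source is that node; a node with no such edge is terminal.
    """
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
print(find_last_node_nested(nodes, edges))  # {'id': 'C'}
```

edges is typed as a list because all(...) re-iterates it for every node; a one-shot generator would be partly or fully consumed by the first node and give wrong answers for later ones.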

Key optimization techniques:

  1. Set-based lookup instead of nested iteration: The original code evaluates all(e["source"] != n["id"] for e in edges) for each node, rescanning the edge list until it finds an edge whose source is that node. The optimized version builds sources = {e["source"] for e in edges} once up front, a hash set that gives average O(1) membership checks via n["id"] not in sources.

  2. Early return for empty edges: When there are no edges, the optimized code immediately returns the first node (or None if there are no nodes) without reading any node's ID. This matches the original, where all(...) over an empty edge list is True without ever evaluating n["id"].

  3. Explicit loop instead of a generator expression: Replacing the generator expression with a straightforward for-loop avoids the per-item overhead of generator machinery and makes the control flow easier to follow.
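The three techniques above can be combined into a sketch like the following. It is not the PR's exact diff (which this page does not show); the name find_last_node_set and the empty-set check used for the early return are assumptions.

```python
from typing import Any, Iterable, Optional

Node = dict[str, Any]
Edge = dict[str, Any]


def find_last_node_set(nodes: Iterable[Node], edges: Iterable[Edge]) -> Optional[Node]:
    """Sketch of the set-based approach; names and structure are assumptions."""
    # 1. One pass over the edges builds a hash set of source ids: O(m).
    #    Raises KeyError if an edge has no "source", like the original.
    sources = {e["source"] for e in edges}

    # 2. No edges means every node is terminal, so return the first node
    #    without reading n["id"] (mirrors all(...) over an empty edge list).
    if not sources:
        return next(iter(nodes), None)

    # 3. Plain for-loop with an average O(1) membership test per node: O(n).
    for n in nodes:
        if n["id"] not in sources:
            return n
    return None


nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
print(find_last_node_set(nodes, edges))  # {'id': 3}
```

Checking the built set rather than len(edges) lets the sketch accept any iterable of edges; the set is empty exactly when there are no edges.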

Performance across test cases:

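The optimized version was faster in every generated test, but the size of the gain varies widely:

  • Small graphs (1-5 nodes): 70-154% faster (14.4% for the empty-node-list cases)
  • Medium graphs (100-300 nodes): 102-6,038% faster
  • Large graphs (500+ nodes): 74-23,607% faster

The speedup depends less on graph size than on how many nodes the original must scan before reaching a terminal node, and how many edges it checks for each. It is largest when the terminal node comes last (linear chains, grids) and drops to about 2x or less when an early node is already terminal (star, wide branching, sparse edges). For example: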

  • test_large_scale_flow_find_last_node_performance_and_correctness (500 nodes): 4.50ms → 27.2μs (16,450% faster)
  • test_large_scale_grid_graph (900 nodes): 15.0ms → 63.5μs (23,607% faster)
  • test_large_scale_density_edge_case (100 nodes, dense): 874μs → 14.2μs (6,038% faster)
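One rough way to see why the gains differ is to count the edge comparisons the nested version makes. This sketch reconstructs the nested scan from the all(...) expression quoted above (an assumption about the original's exact shape; count_nested_comparisons is a hypothetical helper) and compares a 500-node linear chain with a 201-node star, mirroring two of the generated tests:

```python
from typing import Any

Node = dict[str, Any]
Edge = dict[str, Any]


def count_nested_comparisons(nodes: list[Node], edges: list[Edge]) -> int:
    """Count the e["source"] != n["id"] checks the nested approach performs.

    Mirrors next(n for n in nodes if all(...)): all() stops at the first edge
    whose source equals the node id, and the outer scan stops at the first
    terminal node.
    """
    comparisons = 0
    for n in nodes:
        terminal = True
        for e in edges:
            comparisons += 1
            if not (e["source"] != n["id"]):  # all(...) short-circuits here
                terminal = False
                break
        if terminal:
            break
    return comparisons


# Linear chain like test_large_scale_linear_chain: the only terminal node is last.
chain_nodes = [{"id": i} for i in range(500)]
chain_edges = [{"source": i, "target": i + 1} for i in range(499)]

# Star like test_large_scale_star_topology: node 1 is already terminal.
star_nodes = [{"id": i} for i in range(201)]
star_edges = [{"source": 0, "target": i} for i in range(1, 201)]

print(count_nested_comparisons(chain_nodes, chain_edges))  # 125249
print(count_nested_comparisons(star_nodes, star_edges))    # 201
```

The set-based version does roughly one insert per edge plus one lookup per scanned node (about 1,000 operations for the chain, about 200 for the star), so the chain sees a roughly 125x cut in work while the star sees almost none. That is consistent with the reported 16,474% and 103% speedups; the star's remaining ~2x plausibly comes from avoiding generator overhead.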

Why this matters:

This optimization is particularly valuable for graph algorithms that need to identify terminal nodes in flows or DAGs. The O(n+m) complexity keeps performance predictable as graphs grow, whereas the original O(n×m) worst case grows with the product of node and edge counts. All 38 generated regression tests, covering empty inputs, cycles, and type mismatches, return the same results as the original.
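The generated tests use string and integer IDs, which are hashable. One semantic difference between the two approaches that they do not exercise: a set requires hashable IDs, whereas the nested != check does not. A small check, assuming list-valued IDs (hypothetical input, not taken from the tests):

```python
# Hypothetical input: list-valued ids support != but cannot be hashed.
nodes = [{"id": [1, 2]}, {"id": [3]}]
edges = [{"source": [1, 2], "target": [3]}]

# Nested check: only needs !=, so the terminal node [3] is found.
nested_result = next(
    (n for n in nodes if all(e["source"] != n["id"] for e in edges)), None
)
print(nested_result)  # {'id': [3]}

# Set-based check: building the set hashes each source and fails for lists.
try:
    sources = {e["source"] for e in edges}
    set_based_raised = False
except TypeError:
    set_based_raised = True
print(set_based_raised)  # True
```

If IDs are always strings or integers in practice, this difference does not arise.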

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 38 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node


def test_basic_single_last_node():
    # Simple flow: A -> B. B has no outgoing edges and should be returned.
    a = {"id": "A"}
    b = {"id": "B"}
    nodes = [a, b]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.62μs -> 792ns (105% faster)


def test_empty_edges_returns_first_node():
    # When edges is empty, the implementation treats every node as having no outgoing edges,
    # so the first node in the nodes iterable should be returned.
    n1 = {"id": 1}
    n2 = {"id": 2}
    nodes = [n1, n2]
    edges = []  # no edges at all
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.00μs -> 583ns (71.5% faster)


def test_empty_nodes_returns_none():
    # If there are no nodes, nothing can be the "last node" and the function should return None.
    nodes = []
    edges = [{"source": "any"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 667ns -> 583ns (14.4% faster)


def test_multiple_last_nodes_returns_first_candidate():
    # If multiple nodes have no outgoing edges, the function should return the first matching one.
    # Here nodes order is important: node 'X' appears before 'Y' but both have no outgoing edges.
    x = {"id": "X"}
    y = {"id": "Y"}
    z = {"id": "Z"}
    nodes = [x, y, z]
    # Only Z has an outgoing edge in this flow; X and Y are both terminal candidates.
    edges = [{"source": "Z", "target": "something"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.25μs -> 667ns (87.4% faster)


def test_all_nodes_have_outgoing_returns_none():
    # If every node appears as an edge source, there is no terminal node and the result should be None.
    n1 = {"id": "a"}
    n2 = {"id": "b"}
    n3 = {"id": "c"}
    nodes = [n1, n2, n3]
    edges = [{"source": "a"}, {"source": "b"}, {"source": "c"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.00μs -> 959ns (109% faster)


def test_missing_edge_source_key_raises_keyerror():
    # If an edge dictionary lacks the 'source' key, accessing e["source"] should raise a KeyError.
    nodes = [{"id": "only"}]
    edges = [{"target": "only"}]  # missing 'source'
    with pytest.raises(KeyError):
        # Accessing e["source"] raises KeyError (in all(...) originally, in the set comprehension after optimization).
        find_last_node(nodes, edges)  # 1.71μs -> 917ns (86.3% faster)


def test_type_mismatch_between_ids_and_sources_returns_first_node():
    # Equality is type-sensitive. If node ids are ints and edge sources are strings,
    # they will not compare equal; therefore, all(...) will be True and the first node is returned.
    n1 = {"id": 1}
    n2 = {"id": 2}
    nodes = [n1, n2]
    edges = [{"source": "1"}]  # string "1" is not equal to integer 1
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.42μs -> 750ns (88.8% faster)


def test_duplicate_node_ids_choose_first_occurrence():
    # When two nodes share the same id and edges are empty, the function should return
    # the first node in the list (first occurrence), not the second.
    first = {"id": "dup", "meta": "first"}
    second = {"id": "dup", "meta": "second"}
    nodes = [first, second]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.00μs -> 542ns (84.5% faster)


def test_edges_with_additional_keys_ignored_and_last_identified():
    # Edges may contain extra keys; only 'source' is relevant for the function.
    a = {"id": "A"}
    b = {"id": "B"}
    c = {"id": "C"}
    nodes = [a, b, c]
    # Edges contain many keys; only 'source' matters. A -> B, B -> C so C is terminal.
    edges = [
        {"source": "A", "weight": 1, "meta": None},
        {"source": "B", "label": "toC", "reversible": False},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.00μs -> 917ns (118% faster)


def test_large_scale_flow_find_last_node_performance_and_correctness():
    # Large-scale scenario with many nodes but within the allowed limit (under 1000).
    # We create N nodes and edges that make node with id N-1 the only terminal node.
    N = 500  # well under the 1000-element constraint
    nodes = [{"id": i} for i in range(N)]
    # Create edges for all nodes except the last one so the last node is terminal.
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 4.50ms -> 27.2μs (16450% faster)


def test_ordering_matters_when_multiple_candidates_present():
    # If two nodes are both terminal candidates, we must return the first one in the nodes list.
    node0 = {"id": "0"}
    node1 = {"id": "1"}
    node2 = {"id": "2"}
    # Make node0 and node2 terminal by only giving an outgoing edge from node1.
    nodes = [node0, node1, node2]
    edges = [{"source": "1"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.25μs -> 666ns (87.7% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest
from src.algorithms.graph import find_last_node


def test_basic_single_node_no_edges():
    """Test with a single node and no edges - should return that node."""
    nodes = [{"id": 1, "name": "node1"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.04μs -> 542ns (92.3% faster)


def test_basic_linear_flow_two_nodes():
    """Test with two nodes in a linear flow (one edge)."""
    nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 792ns (89.4% faster)


def test_basic_linear_flow_three_nodes():
    """Test with three nodes in a linear chain."""
    nodes = [
        {"id": 1, "name": "start"},
        {"id": 2, "name": "middle"},
        {"id": 3, "name": "end"},
    ]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.92μs -> 875ns (119% faster)


def test_basic_branching_flow():
    """Test with a branching flow where node 1 has multiple outgoing edges."""
    nodes = [
        {"id": 1, "name": "start"},
        {"id": 2, "name": "branch1"},
        {"id": 3, "name": "branch2"},
    ]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 833ns (85.0% faster)


def test_basic_diamond_flow():
    """Test with a diamond-shaped flow structure."""
    nodes = [
        {"id": 1, "name": "start"},
        {"id": 2, "name": "left"},
        {"id": 3, "name": "right"},
        {"id": 4, "name": "end"},
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 1, "target": 3},
        {"source": 2, "target": 4},
        {"source": 3, "target": 4},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.29μs -> 1.00μs (129% faster)


def test_edge_empty_nodes_list():
    """Test with an empty nodes list - should return None."""
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 667ns -> 583ns (14.4% faster)


def test_edge_multiple_nodes_no_edges():
    """Test with multiple nodes but no edges - all are leaf nodes."""
    nodes = [
        {"id": 1, "name": "node1"},
        {"id": 2, "name": "node2"},
        {"id": 3, "name": "node3"},
    ]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.04μs -> 542ns (92.3% faster)


def test_edge_single_self_loop():
    """Test with a single node that has a self-loop edge."""
    nodes = [{"id": 1, "name": "node1"}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 709ns (70.4% faster)


def test_edge_cycle_in_flow():
    """Test with a cycle that includes all nodes."""
    nodes = [
        {"id": 1, "name": "node1"},
        {"id": 2, "name": "node2"},
        {"id": 3, "name": "node3"},
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.92μs -> 875ns (119% faster)


def test_edge_cycle_with_leaf():
    """Test with a cycle plus a leaf node outside the cycle."""
    nodes = [
        {"id": 1, "name": "node1"},
        {"id": 2, "name": "node2"},
        {"id": 3, "name": "leaf"},
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 1},
        {"source": 1, "target": 3},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.92μs -> 917ns (109% faster)


def test_edge_duplicate_edges():
    """Test with duplicate edges between same nodes."""
    nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 833ns (90.0% faster)


def test_edge_edge_with_missing_target():
    """Test edge cases where edge might not reference existing nodes."""
    nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
    edges = [{"source": 1, "target": 999}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 750ns (100% faster)


def test_edge_node_with_extra_attributes():
    """Test nodes with various attributes beyond id and name."""
    nodes = [
        {"id": 1, "name": "start", "type": "input", "data": {"x": 10}},
        {"id": 2, "name": "end", "type": "output", "data": {"x": 20}, "extra": "value"},
    ]
    edges = [{"source": 1, "target": 2, "weight": 5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 750ns (100% faster)


def test_edge_node_ids_as_strings():
    """Test with string node IDs instead of integers."""
    nodes = [{"id": "a", "name": "node_a"}, {"id": "b", "name": "node_b"}]
    edges = [{"source": "a", "target": "b"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 791ns (100% faster)


def test_edge_node_ids_mixed_types():
    """Test with mixed type node IDs (integers and strings)."""
    nodes = [{"id": 1, "name": "node1"}, {"id": "b", "name": "node_b"}]
    edges = [{"source": 1, "target": "b"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.62μs -> 750ns (117% faster)


def test_edge_complex_graph_structure():
    """Test with a complex graph with multiple paths and convergence."""
    nodes = [
        {"id": 1, "name": "start"},
        {"id": 2, "name": "path1_a"},
        {"id": 3, "name": "path1_b"},
        {"id": 4, "name": "path2_a"},
        {"id": 5, "name": "end"},
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 1, "target": 4},
        {"source": 2, "target": 3},
        {"source": 3, "target": 5},
        {"source": 4, "target": 5},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.75μs -> 1.08μs (154% faster)


def test_edge_multiple_leaf_nodes_order():
    """Test that with multiple leaf nodes, the first one is returned."""
    nodes = [
        {"id": 2, "name": "leaf2"},
        {"id": 1, "name": "node1"},
        {"id": 3, "name": "leaf3"},
    ]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 667ns (81.3% faster)


def test_edge_nodes_with_none_values():
    """Test nodes that contain None values in fields."""
    nodes = [{"id": 1, "name": None}, {"id": 2, "name": "node2"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 750ns (100% faster)


def test_large_scale_linear_chain():
    """Test with a large linear chain of 500 nodes."""
    num_nodes = 500
    nodes = [{"id": i, "name": f"node_{i}"} for i in range(num_nodes)]
    edges = [{"source": i, "target": i + 1} for i in range(num_nodes - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 4.49ms -> 27.1μs (16474% faster)


def test_large_scale_star_topology():
    """Test with a star topology: one central node connected to 200 leaf nodes."""
    nodes = [{"id": 0, "name": "center"}]
    for i in range(1, 201):
        nodes.append({"id": i, "name": f"leaf_{i}"})

    edges = [{"source": 0, "target": i} for i in range(1, 201)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 9.12μs -> 4.50μs (103% faster)


def test_large_scale_complete_binary_tree():
    """Test with a complete binary tree structure (127 nodes, 7 levels)."""
    # Create a binary tree with depth 6
    nodes = [{"id": i, "name": f"node_{i}"} for i in range(127)]
    edges = []

    # Create parent-child relationships
    for i in range(63):
        left_child = 2 * i + 1
        right_child = 2 * i + 2
        edges.append({"source": i, "target": left_child})
        edges.append({"source": i, "target": right_child})

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 159μs -> 5.71μs (2699% faster)


def test_large_scale_wide_branching():
    """Test with wide branching: 1 root node branches to 300 children."""
    nodes = [{"id": 0, "name": "root"}]
    for i in range(1, 301):
        nodes.append({"id": i, "name": f"child_{i}"})

    edges = [{"source": 0, "target": i} for i in range(1, 301)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 12.7μs -> 6.29μs (102% faster)


def test_large_scale_grid_graph():
    """Test with a 30x30 grid graph structure."""
    size = 30
    nodes = []
    node_id = 0

    # Create grid nodes
    for i in range(size):
        for j in range(size):
            nodes.append({"id": node_id, "name": f"node_{i}_{j}"})
            node_id += 1

    edges = []
    # Connect horizontally
    for i in range(size):
        for j in range(size - 1):
            source = i * size + j
            target = i * size + (j + 1)
            edges.append({"source": source, "target": target})

    # Connect vertically
    for i in range(size - 1):
        for j in range(size):
            source = i * size + j
            target = (i + 1) * size + j
            edges.append({"source": source, "target": target})

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 15.0ms -> 63.5μs (23607% faster)


def test_large_scale_multiple_disconnected_components():
    """Test with multiple disconnected graph components."""
    nodes = []
    edges = []

    # Create 10 disconnected linear chains
    node_id = 0
    for chain in range(10):
        chain_length = 50
        for i in range(chain_length):
            nodes.append({"id": node_id, "name": f"chain_{chain}_node_{i}"})
            if i > 0:
                edges.append({"source": node_id - 1, "target": node_id})
            node_id += 1

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 73.5μs -> 14.6μs (404% faster)


def test_large_scale_density_edge_case():
    """Test with a dense graph (many edges relative to nodes)."""
    num_nodes = 100
    nodes = [{"id": i, "name": f"node_{i}"} for i in range(num_nodes)]
    edges = []

    # Create edges from each node to several forward nodes
    for i in range(num_nodes):
        for j in range(i + 1, min(i + 6, num_nodes)):
            edges.append({"source": i, "target": j})

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 874μs -> 14.2μs (6038% faster)


def test_large_scale_sparse_edges():
    """Test with a sparse graph (few edges for many nodes)."""
    num_nodes = 500
    nodes = [{"id": i, "name": f"node_{i}"} for i in range(num_nodes)]
    # Create a very sparse set of edges
    edges = [
        {"source": 0, "target": 100},
        {"source": 100, "target": 200},
        {"source": 200, "target": 300},
    ]

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.67μs -> 959ns (73.8% faster)


def test_large_scale_performance():
    """Test performance with 250 nodes and 250 edges."""
    num_nodes = 250
    nodes = [{"id": i, "name": f"node_{i}"} for i in range(num_nodes)]

    # Create a deterministic, forward-only structure (no cycles)
    edges = []
    for i in range(num_nodes // 2):
        # Each of the first 125 nodes connects to the next 2 nodes
        target1 = (i + 1) % num_nodes
        target2 = (i + 2) % num_nodes
        if target1 != target2:
            edges.append({"source": i, "target": target1})
            edges.append({"source": i, "target": target2})

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 577μs -> 10.3μs (5516% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-find_last_node-mldg7yhf and push.

@codeflash-ai codeflash-ai bot requested a review from KRRT7 February 8, 2026 07:55
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Feb 8, 2026
@KRRT7 KRRT7 closed this Feb 9, 2026