⚡️ Speed up function find_last_node by 19,810% #286

Open
codeflash-ai[bot] wants to merge 1 commit into python-only from codeflash/optimize-find_last_node-mlurhqw4


codeflash-ai bot commented Feb 20, 2026

📄 19,810% (198.10x) speedup for find_last_node in src/algorithms/graph.py

⏱️ Runtime: 130 milliseconds → 655 microseconds (best of 155 runs)

📝 Explanation and details

The optimized code achieves a 198x runtime speedup (from 130ms to 655μs) by replacing an O(N×M) nested iteration with an O(N+M) set-based lookup algorithm.

Key Optimization:

The original implementation uses a nested generator expression that checks `all(e["source"] != n["id"] for e in edges)` for each node. For every node, it may have to iterate through all edges to verify that none of them has that node as a source. With N nodes and M edges, this is O(N×M) comparisons in the worst case.
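For illustration, the nested scan described above might look like the following. This is a reconstruction from the description, not the code in src/algorithms/graph.py; the name `find_last_node_nested` and the `None` default are assumptions (the generated tests do expect `None` when no node qualifies).

```python
def find_last_node_nested(nodes, edges):
    """Hypothetical reconstruction of the nested O(N*M) scan.

    Returns the first node whose id never appears as an edge source,
    or None when every node is a source (or there are no nodes).
    """
    return next(
        # all(...) re-scans the edges for each candidate node
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


chain_nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
chain_edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
print(find_last_node_nested(chain_nodes, chain_edges))  # {'id': 'C'}
```

Both `all` and `next` short-circuit, so the full N×M cost is only paid when the qualifying node sits near the end of `nodes`, as in the chain tests.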

The optimized version first detects whether `edges` is re-iterable (like a list) or a single-pass iterator (like a generator). For re-iterable collections, the common case, it precomputes a set of all source IDs in O(M) time using `sources = {e["source"] for e in edges}`. It then checks each node's ID against this set in O(1) average time, making the overall complexity O(N+M).
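A minimal sketch of the set-based path for re-iterable inputs, assuming the node and edge dictionaries used in the tests. Only the `sources = {...}` line is quoted from the description; the function name `find_last_node_with_set` and the surrounding loop are assumptions.

```python
def find_last_node_with_set(nodes, edges):
    """Hypothetical O(N+M) variant for re-iterable `edges` (e.g. a list)."""
    # One pass over the edges collects every id that has an outgoing edge.
    sources = {e["source"] for e in edges}
    # One pass over the nodes; each membership test is a hash lookup.
    for n in nodes:
        if n["id"] not in sources:
            return n
    return None


diamond_nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
diamond_edges = [
    {"source": 1, "target": 2},
    {"source": 1, "target": 3},
    {"source": 2, "target": 4},
    {"source": 3, "target": 4},
]
print(find_last_node_with_set(diamond_nodes, diamond_edges))  # {'id': 4}
```

One design consequence of this sketch: building the set requires hashable source ids and reads every edge up front, whereas the nested form only touches edges while a node is being examined.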

Why This Works:

Python sets provide O(1) average-case membership testing via hash lookups. By converting the M edge sources into a set once, each of the N node checks becomes a fast hash lookup instead of iterating through all M edges. The performance gain is dramatic when both N and M are large.
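To make the difference concrete, the sketch below counts operations instead of measuring time, using the same shape as the 1000-node chain tests. The helper names are hypothetical, and the counts model the two strategies as described rather than instrumenting the real function.

```python
def count_nested_comparisons(nodes, edges):
    """Count the comparisons the nested scan performs before it returns."""
    comparisons = 0
    for n in nodes:
        is_last = True
        for e in edges:
            comparisons += 1
            if e["source"] == n["id"]:
                is_last = False
                break  # all() short-circuits on the first matching edge
        if is_last:
            break  # next() stops at the first qualifying node
    return comparisons


def count_set_operations(nodes, edges):
    """Count set insertions plus membership tests for the set-based scan."""
    sources = {e["source"] for e in edges}
    operations = len(edges)  # one insertion attempt per edge
    for n in nodes:
        operations += 1  # one hash lookup per examined node
        if n["id"] not in sources:
            break
    return operations


nodes = [{"id": i} for i in range(1000)]
edges = [{"source": i, "target": i + 1} for i in range(999)]
nested = count_nested_comparisons(nodes, edges)
hashed = count_set_operations(nodes, edges)
print(nested, hashed)  # 500499 1999
```

For this chain the nested scan needs roughly N×M/2 comparisons because each node matches the edge at its own position, while the set-based scan needs M insertions plus N lookups.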

Test Case Performance:

The speedup is most pronounced in tests with large inputs:

  • test_large_chain_of_1000_nodes_returns_last_node: 34.8ms → 114μs (303x faster)
  • test_very_large_graph_1000_nodes_complete_chain: 26.7ms → 46.4μs (573x faster)
  • test_large_linear_chain_500_nodes: 6.80ms → 25.0μs (270x faster)

Even small inputs benefit substantially (2-4x speedup) due to reduced constant-factor overhead from avoiding nested iteration.

Iterator Preservation:

The optimization preserves behavior for single-pass iterators by detecting them with an identity check (`if it is edges`, where `it` is presumably the result of `iter(edges)`) and falling back to sequential consumption that mirrors the original's iterator advancement pattern, so results match the original for both re-iterable and single-pass inputs.
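The detection step relies on the iterator protocol: calling `iter()` on an iterator returns the same object, while calling it on a list returns a new iterator. A hedged sketch of how the two paths could be combined follows; the function name `find_last_node_any_iterable` and the exact fallback are assumptions, not the merged code.

```python
def find_last_node_any_iterable(nodes, edges):
    """Hypothetical combination of the set-based path and the iterator fallback."""
    it = iter(edges)
    if it is edges:
        # Single-pass iterator: keep the nested form so edges are consumed
        # in the same order, and to the same point, as the original scan.
        return next(
            (n for n in nodes if all(e["source"] != n["id"] for e in it)),
            None,
        )
    # Re-iterable collection: precompute the sources once.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


two_nodes = [{"id": "A"}, {"id": "B"}]
two_edges = [{"source": "B"}, {"source": "A"}]
print(find_last_node_any_iterable(two_nodes, two_edges))        # None
print(find_last_node_any_iterable(two_nodes, iter(two_edges)))  # {'id': 'B'}
```

The second call returns a node because the iterator is exhausted while node A is being checked, so node B sees no remaining edges. That is the behavior of the nested scan on a single-pass input, which is why a set cannot simply be built from an iterator without changing results.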

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 1054 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
from copy import deepcopy  # to check immutability of inputs

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

def test_single_node_no_edges_returns_that_node():
    # one node and no edges -> that node should be returned
    node = {"id": "A"}
    nodes = [node]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.29μs -> 667ns (244% faster)

def test_multiple_nodes_edges_make_last_unique():
    # three nodes where only the last node has no outgoing edges
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    # A and B are sources in edges, C is not -> C should be returned
    edges = [{"source": "A", "target": "X"}, {"source": "B", "target": "Y"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 4.04μs -> 1.00μs (304% faster)

def test_edges_empty_returns_first_node():
    # when there are no edges, the first node in nodes should be returned
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.00μs -> 708ns (182% faster)

def test_multiple_last_candidates_returns_first_in_order():
    # If multiple nodes have no outgoing edges, the function should return the first such node encountered
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    # Only A is a source; B and C are both candidates; B comes before C -> B expected
    edges = [{"source": "A", "target": "Z"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.04μs -> 834ns (265% faster)

def test_empty_nodes_returns_none():
    # no nodes -> there is nothing to return, expect None
    nodes = []
    edges = [{"source": "anything", "target": "nothing"}]
    codeflash_output = find_last_node(nodes, edges) # 1.33μs -> 625ns (113% faster)

def test_all_nodes_are_sources_returns_none():
    # every node appears as a source in at least one edge -> no last node -> None
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A"}, {"source": "B"}, {"source": "A"}]
    codeflash_output = find_last_node(nodes, edges) # 3.00μs -> 959ns (213% faster)

def test_edge_missing_source_key_raises_keyerror():
    # If an edge dict lacks 'source', accessing e['source'] will raise KeyError
    nodes = [{"id": "A"}]
    edges = [{"target": "A"}]  # missing 'source'
    with pytest.raises(KeyError):
        find_last_node(nodes, edges) # 4.00μs -> 1.08μs (269% faster)

def test_edges_reference_unknown_ids_still_finds_last_node():
    # Edges may reference sources not present in nodes; they should not prevent finding a node without outgoing edges
    nodes = [{"id": "X"}, {"id": "Y"}]
    edges = [{"source": "Z"}]  # Z not in nodes
    # Since none of the nodes are sources in edges, the first node should be returned
    codeflash_output = find_last_node(nodes, edges) # 2.67μs -> 833ns (220% faster)

def test_special_character_and_none_ids():
    # Use a mix of id types (empty string, special chars, None) — comparisons should work via equality
    nodes = [{"id": ""}, {"id": "Ω"}, {"id": None}]
    # Make the empty string and Ω be sources so the first candidate is the None id node
    edges = [{"source": ""}, {"source": "Ω"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 4.29μs -> 1.08μs (296% faster)

def test_non_string_ids_and_numeric_ids():
    # numeric ids should be supported equally (they are compared with ==)
    nodes = [{"id": 0}, {"id": 1}, {"id": 2}]
    edges = [{"source": 0}, {"source": 1}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.62μs -> 1.04μs (248% faster)

def test_function_does_not_mutate_inputs():
    # ensure inputs are not modified by the function
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A"}]
    nodes_copy = deepcopy(nodes)
    edges_copy = deepcopy(edges)
    codeflash_output = find_last_node(nodes, edges); _ = codeflash_output # 3.21μs -> 833ns (285% faster)
    # compare against the deep copies so the immutability check is enforced
    assert nodes == nodes_copy
    assert edges == edges_copy

def test_large_chain_of_1000_nodes_returns_last_node():
    # create 1000 nodes named n0..n999
    nodes = [{"id": f"n{i}"} for i in range(1000)]
    # create edges where every node except the last appears as a source
    edges = [{"source": f"n{i}", "target": f"n{i+1}"} for i in range(999)]
    # The only node without outgoing edges should be the last node n999
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 34.8ms -> 114μs (30352% faster)

def test_repeat_calls_consistent_over_1000_iterations():
    # call the function many times to check determinism and acceptable performance
    nodes = [{"id": f"n{i}"} for i in range(1000)]
    # no edges -> should always return the first node in the same way
    edges = []
    for _ in range(1000):
        codeflash_output = find_last_node(nodes, edges); res = codeflash_output # 871μs -> 214μs (307% faster)

def test_large_number_of_edges_with_duplicates():
    # create many nodes and many edges (some duplicated) to stress the all(...) check
    nodes = [{"id": f"id{i}"} for i in range(1000)]
    # make only id999 not be a source; include many duplicate edges to increase work
    edges = [{"source": f"id{i}"} for i in range(999)] + [{"source": "unknown"}] * 500
    # add some duplicates of the existing sources to increase iteration cost
    edges += [{"source": "id0"}] * 200
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 33.5ms -> 132μs (25131% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest
from src.algorithms.graph import find_last_node

def test_single_node_no_edges():
    """Test finding last node when there is a single node with no edges."""
    nodes = [{"id": 1, "label": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.33μs -> 625ns (273% faster)

def test_linear_chain_two_nodes():
    """Test finding last node in a simple linear chain of two nodes."""
    nodes = [{"id": 1, "label": "A"}, {"id": 2, "label": "B"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.21μs -> 916ns (250% faster)

def test_linear_chain_three_nodes():
    """Test finding last node in a linear chain of three nodes."""
    nodes = [
        {"id": 1, "label": "Start"},
        {"id": 2, "label": "Middle"},
        {"id": 3, "label": "End"}
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.58μs -> 1.04μs (244% faster)

def test_branching_graph_one_last_node():
    """Test finding last node in a branching graph that converges to one node."""
    nodes = [
        {"id": 1, "label": "A"},
        {"id": 2, "label": "B"},
        {"id": 3, "label": "C"},
        {"id": 4, "label": "D"}
    ]
    edges = [
        {"source": 1, "target": 3},
        {"source": 2, "target": 3},
        {"source": 3, "target": 4}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.92μs -> 1.08μs (262% faster)

def test_multiple_last_nodes_returns_first():
    """Test that when multiple nodes have no outgoing edges, the first is returned."""
    nodes = [
        {"id": 1, "label": "A"},
        {"id": 2, "label": "B"},
        {"id": 3, "label": "C"}
    ]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.75μs -> 833ns (230% faster)

def test_node_with_self_loop_is_not_last():
    """Test that a node with a self-loop is not considered a last node."""
    nodes = [
        {"id": 1, "label": "A"},
        {"id": 2, "label": "B"}
    ]
    edges = [
        {"source": 1, "target": 1},
        {"source": 1, "target": 2}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.92μs -> 958ns (204% faster)

def test_complex_graph_structure():
    """Test finding last node in a more complex graph structure."""
    nodes = [
        {"id": "start", "label": "Start"},
        {"id": "process_a", "label": "Process A"},
        {"id": "process_b", "label": "Process B"},
        {"id": "merge", "label": "Merge"},
        {"id": "end", "label": "End"}
    ]
    edges = [
        {"source": "start", "target": "process_a"},
        {"source": "start", "target": "process_b"},
        {"source": "process_a", "target": "merge"},
        {"source": "process_b", "target": "merge"},
        {"source": "merge", "target": "end"}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 5.50μs -> 1.42μs (288% faster)

def test_empty_nodes_list():
    """Test behavior when nodes list is empty."""
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 1.21μs -> 583ns (107% faster)

def test_empty_edges_list_multiple_nodes():
    """Test behavior when edges list is empty but nodes exist."""
    nodes = [
        {"id": 1, "label": "A"},
        {"id": 2, "label": "B"},
        {"id": 3, "label": "C"}
    ]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.04μs -> 708ns (188% faster)

def test_all_nodes_have_outgoing_edges():
    """Test behavior when all nodes have at least one outgoing edge."""
    nodes = [
        {"id": 1, "label": "A"},
        {"id": 2, "label": "B"},
        {"id": 3, "label": "C"}
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1}  # Creates a cycle
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.46μs -> 1.00μs (246% faster)

def test_nodes_with_numeric_id():
    """Test with nodes that have numeric IDs."""
    nodes = [
        {"id": 100, "label": "X"},
        {"id": 200, "label": "Y"},
        {"id": 300, "label": "Z"}
    ]
    edges = [
        {"source": 100, "target": 200},
        {"source": 200, "target": 300}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.33μs -> 1.00μs (233% faster)

def test_nodes_with_string_id():
    """Test with nodes that have string IDs."""
    nodes = [
        {"id": "node_a", "label": "A"},
        {"id": "node_b", "label": "B"},
        {"id": "node_c", "label": "C"}
    ]
    edges = [
        {"source": "node_a", "target": "node_b"},
        {"source": "node_b", "target": "node_c"}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.79μs -> 959ns (295% faster)

def test_nodes_with_mixed_id_types():
    """Test with nodes that have mixed numeric and string IDs."""
    nodes = [
        {"id": 1, "label": "First"},
        {"id": "second", "label": "Second"},
        {"id": 3.0, "label": "Third"}
    ]
    edges = [
        {"source": 1, "target": "second"},
        {"source": "second", "target": 3.0}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 4.17μs -> 1.12μs (270% faster)

def test_node_with_extra_attributes():
    """Test that nodes with additional attributes are handled correctly."""
    nodes = [
        {"id": 1, "label": "A", "type": "input", "color": "red"},
        {"id": 2, "label": "B", "type": "process", "weight": 5},
        {"id": 3, "label": "C", "type": "output", "size": "large"}
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.29μs -> 958ns (244% faster)

def test_edge_with_extra_attributes():
    """Test that edges with additional attributes are handled correctly."""
    nodes = [
        {"id": 1, "label": "A"},
        {"id": 2, "label": "B"}
    ]
    edges = [
        {"source": 1, "target": 2, "weight": 10, "label": "connection"}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.75μs -> 833ns (230% faster)

def test_large_id_values():
    """Test with nodes that have very large ID values."""
    nodes = [
        {"id": 999999999999, "label": "A"},
        {"id": 1000000000000, "label": "B"}
    ]
    edges = [{"source": 999999999999, "target": 1000000000000}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.92μs -> 917ns (218% faster)

def test_negative_id_values():
    """Test with nodes that have negative ID values."""
    nodes = [
        {"id": -1, "label": "A"},
        {"id": -2, "label": "B"},
        {"id": -3, "label": "C"}
    ]
    edges = [
        {"source": -1, "target": -2},
        {"source": -2, "target": -3}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.33μs -> 1.46μs (129% faster)

def test_zero_id_value():
    """Test with a node that has ID zero."""
    nodes = [
        {"id": 0, "label": "Zero"},
        {"id": 1, "label": "One"}
    ]
    edges = [{"source": 0, "target": 1}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.79μs -> 875ns (219% faster)

def test_none_id_value():
    """Test with a node that has None as ID."""
    nodes = [
        {"id": None, "label": "None Node"},
        {"id": 1, "label": "Regular Node"}
    ]
    edges = [{"source": 1, "target": None}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.42μs -> 750ns (222% faster)

def test_boolean_id_value():
    """Test with nodes that have boolean ID values."""
    nodes = [
        {"id": True, "label": "True Node"},
        {"id": False, "label": "False Node"}
    ]
    edges = [{"source": False, "target": True}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.25μs -> 750ns (200% faster)

def test_special_characters_in_labels():
    """Test with nodes that have special characters in labels."""
    nodes = [
        {"id": 1, "label": "Node-A!@#$%"},
        {"id": 2, "label": "Node\nB\t\t"},
        {"id": 3, "label": "Node 'C' \"quoted\""}
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.67μs -> 958ns (283% faster)

def test_unicode_characters_in_labels():
    """Test with nodes that have unicode characters in labels."""
    nodes = [
        {"id": 1, "label": "開始"},
        {"id": 2, "label": "処理"},
        {"id": 3, "label": "終了"}
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.50μs -> 958ns (265% faster)

def test_empty_string_label():
    """Test with nodes that have empty string labels."""
    nodes = [
        {"id": 1, "label": ""},
        {"id": 2, "label": "B"},
        {"id": 3, "label": ""}
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.38μs -> 916ns (268% faster)

def test_duplicate_node_ids_in_different_positions():
    """Test behavior when nodes list contains entries with duplicate IDs."""
    nodes = [
        {"id": 1, "label": "A"},
        {"id": 2, "label": "B"},
        {"id": 2, "label": "B_duplicate"}
    ]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.71μs -> 792ns (242% faster)

def test_single_node_with_self_loop():
    """Test a single node that has a self-loop."""
    nodes = [{"id": 1, "label": "A"}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.08μs -> 750ns (178% faster)

def test_single_node_no_self_loop():
    """Test a single node without any edges."""
    nodes = [{"id": 1, "label": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 1.79μs -> 667ns (169% faster)

def test_multiple_edges_from_same_source():
    """Test when one node has multiple outgoing edges."""
    nodes = [
        {"id": 1, "label": "A"},
        {"id": 2, "label": "B"},
        {"id": 3, "label": "C"},
        {"id": 4, "label": "D"}
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 1, "target": 3},
        {"source": 1, "target": 4}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.04μs -> 958ns (217% faster)

def test_multiple_edges_to_same_target():
    """Test when multiple edges point to the same target."""
    nodes = [
        {"id": 1, "label": "A"},
        {"id": 2, "label": "B"},
        {"id": 3, "label": "C"}
    ]
    edges = [
        {"source": 1, "target": 3},
        {"source": 2, "target": 3}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 3.38μs -> 917ns (268% faster)

def test_diamond_shaped_graph():
    """Test a diamond-shaped graph structure."""
    nodes = [
        {"id": 1, "label": "Top"},
        {"id": 2, "label": "Left"},
        {"id": 3, "label": "Right"},
        {"id": 4, "label": "Bottom"}
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 1, "target": 3},
        {"source": 2, "target": 4},
        {"source": 3, "target": 4}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 4.21μs -> 1.08μs (288% faster)

def test_large_linear_chain_100_nodes():
    """Test finding last node in a large linear chain of 100 nodes."""
    # Create 100 nodes with numeric IDs
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 101)]
    # Create edges forming a linear chain
    edges = [{"source": i, "target": i + 1} for i in range(1, 100)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 316μs -> 6.50μs (4765% faster)

def test_large_linear_chain_500_nodes():
    """Test finding last node in a large linear chain of 500 nodes."""
    # Create 500 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 501)]
    # Create edges forming a linear chain
    edges = [{"source": i, "target": i + 1} for i in range(1, 500)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 6.80ms -> 25.0μs (27042% faster)

def test_large_linear_chain_1000_nodes():
    """Test finding last node in a very large linear chain of 1000 nodes."""
    # Create 1000 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 1001)]
    # Create edges forming a linear chain
    edges = [{"source": i, "target": i + 1} for i in range(1, 1000)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 26.1ms -> 46.3μs (56231% faster)

def test_large_branching_graph_100_nodes():
    """Test finding last node in a large branching graph with 100 nodes."""
    # Create a graph where node 1 branches to nodes 2-50, and all of them
    # converge to node 51, which then branches to nodes 52-100
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 101)]
    edges = []
    # Node 1 connects to nodes 2-50
    for i in range(2, 51):
        edges.append({"source": 1, "target": i})
    # Nodes 2-50 all connect to node 51
    for i in range(2, 51):
        edges.append({"source": i, "target": 51})
    # Node 51 connects to nodes 52-100
    for i in range(52, 101):
        edges.append({"source": 51, "target": i})
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 231μs -> 5.88μs (3832% faster)

def test_large_sparse_graph_100_nodes():
    """Test finding last node in a large sparse graph with 100 nodes and few edges."""
    # Create 100 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 101)]
    # Create a sparse set of edges (only connect certain nodes)
    edges = []
    for i in range(1, 11):
        edges.append({"source": i, "target": i + 50})
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 9.75μs -> 1.46μs (568% faster)

def test_large_fully_connected_sources():
    """Test when many nodes have outgoing edges (dense graph)."""
    # Create 100 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 101)]
    # Nodes 1-99 each connect to node 100
    edges = [{"source": i, "target": 100} for i in range(1, 100)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 313μs -> 6.21μs (4953% faster)

def test_many_nodes_no_edges_returns_first():
    """Test with many nodes and no edges; should return the first node."""
    # Create 500 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 501)]
    # No edges
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.08μs -> 708ns (194% faster)

def test_large_multi_branch_convergence():
    """Test a large graph with multiple branches that converge to a single node."""
    # Create 100 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 101)]
    edges = []
    # Create 10 branches, each with nodes converging to node 95
    for branch in range(10):
        start = 1 + branch
        for i in range(3):
            current_node = start + i * 10
            next_node = start + (i + 1) * 10 if i < 2 else 95
            edges.append({"source": current_node, "target": next_node})
    # Node 95 connects to node 100
    edges.append({"source": 95, "target": 100})
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 39.5μs -> 2.88μs (1275% faster)

def test_large_graph_first_node_elimination():
    """Test that node search correctly eliminates nodes with any outgoing edge."""
    # Create 100 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 101)]
    edges = []
    # Create a complex edge pattern where many nodes have at least one edge
    for i in range(1, 99):
        edges.append({"source": i, "target": i + 1})
    # Now add an edge from node 99 back to node 1 (creating a cycle)
    edges.append({"source": 99, "target": 1})
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 308μs -> 6.25μs (4834% faster)

def test_performance_with_many_edges_from_single_node():
    """Test performance when a single node has many outgoing edges."""
    # Create 100 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 101)]
    edges = []
    # Node 1 connects to all other nodes
    for i in range(2, 101):
        edges.append({"source": 1, "target": i})
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 8.83μs -> 3.08μs (186% faster)

def test_very_large_graph_1000_nodes_complete_chain():
    """Test with 1000 nodes in a complete linear chain."""
    # Create 1000 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1000)]
    # Create a complete linear chain from node 0 to node 999
    edges = [{"source": i, "target": i + 1} for i in range(999)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 26.7ms -> 46.4μs (57319% faster)

def test_alternating_pattern_large_graph():
    """Test a large graph with alternating connection pattern."""
    # Create 100 nodes
    nodes = [{"id": i, "label": f"Node_{i}"} for i in range(1, 101)]
    edges = []
    # Even nodes connect to the next even node
    for i in range(2, 100, 2):
        edges.append({"source": i, "target": i + 2})
    # Odd nodes (except the last) connect to the next odd node
    for i in range(1, 99, 2):
        edges.append({"source": i, "target": i + 2})
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 358μs -> 6.46μs (5456% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-find_last_node-mlurhqw4` and push.

codeflash-ai bot requested a review from KRRT7 on February 20, 2026 10:43
codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) on Feb 20, 2026