⚡️ Speed up function find_last_node by 16,136% #277

Closed
codeflash-ai[bot] wants to merge 1 commit into optimize from codeflash/optimize-find_last_node-mlfeu145

codeflash-ai bot commented Feb 9, 2026

📄 16,136% (161.36x) speedup for find_last_node in src/algorithms/graph.py

⏱️ Runtime: 57.0 milliseconds → 351 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a roughly 162x speedup (from 57ms to 351μs) by eliminating a nested-loop anti-pattern that rescanned the edge list for every node, which cost O(n × m) time.

Key Algorithmic Improvement:

The original implementation uses a nested comprehension: for each node, it scans through all edges to check if that node is a source. This creates O(n × m) complexity where n = number of nodes and m = number of edges.

The optimized version preprocesses edges once into a set of source IDs (sources = {e["source"] for e in edges}), then performs an average-case O(1) set membership check for each node. This reduces complexity to O(n + m).
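The description quotes only one line of the change, so the following is a minimal sketch under assumptions rather than the actual diff. `find_last_node_nested` is a guess at what a nested-comprehension original could look like, and `find_last_node_set` wraps the quoted `sources` line in one plausible way; both function names are hypothetical.

```python
def find_last_node_nested(nodes, edges):
    """Assumed shape of the original: rescan the edge list for every node."""
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_set(nodes, edges):
    """Assumed shape of the optimized version: one pass over edges, then set lookups."""
    sources = {e["source"] for e in edges}  # this line is quoted in the PR description
    return next((n for n in nodes if n["id"] not in sources), None)


# Linear chain 0 -> 1 -> 2 -> 3: only node 3 has no outgoing edge.
nodes = [{"id": i} for i in range(4)]
edges = [{"source": i, "target": i + 1} for i in range(3)]

chain_nested = find_last_node_nested(nodes, edges)
chain_set = find_last_node_set(nodes, edges)
```

Both sketches return the original node object for the sink, or `None` when every node is the source of some edge.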

Why This Matters:

The generated regression tests show the largest improvements on larger graphs:

  • Linear chains (1000 nodes): 17.5ms → 47.6μs (367x faster) - The original code can perform up to ~1 million edge checks (1000 nodes × 999 edges); the optimized version builds the set once and then does at most 1000 set lookups
  • Dense graphs (100 nodes, 4950 edges): 11.1ms → 85.8μs (129x faster) - Avoids up to 495,000 edge comparisons (100 × 4950)
  • Early termination cases: When the sink appears early in the node list, both versions stop after a few nodes, so the gain is smaller (roughly 60-84% faster in the generated tests); the optimized version still avoids rescanning edges for each node it checks
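To make the comparison counts above concrete, here is a small counting sketch for the 1000-node chain. It assumes the original scan short-circuits (stops at the first matching edge and at the first sink), which the PR does not state; `count_nested_comparisons` is a hypothetical helper written for this illustration.

```python
def count_nested_comparisons(nodes, edges):
    """Count source-vs-id checks made by a short-circuiting nested scan."""
    comparisons = 0
    for n in nodes:
        is_source = False
        for e in edges:
            comparisons += 1
            if e["source"] == n["id"]:
                is_source = True
                break  # mirrors all(...) stopping at the first match
        if not is_source:
            break  # mirrors next(...) returning the first sink
    return comparisons


size = 1000
nodes = [{"id": i} for i in range(size)]
edges = [{"source": i} for i in range(size - 1)]

nested_comparisons = count_nested_comparisons(nodes, edges)
upper_bound = len(nodes) * len(edges)  # n x m, if nothing short-circuited
set_operations = len(edges) + len(nodes)  # one insert per edge, at most one lookup per node
```

Under these assumptions the nested scan performs 500,499 comparisons, about half the 999,000 n × m bound, while the set-based approach needs at most 1,999 hash operations.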

Performance Characteristics:

The optimization excels when:

  • Edge count is large (more comparisons avoided)
  • The target node appears late in the list or there are many nodes

Small inputs benefit too: the generated tests with four or fewer nodes run roughly 33% to 144% faster, since a single set lookup replaces a repeated scan of the edge list.

The single upfront cost of building the set (typically <50μs for 1000 edges) is amortized across all node checks, and every generated test case ran faster with the optimized version.
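The set-building cost and the overall gap can be checked locally with `timeit`. This is a rough sketch, not the harness Codeflash uses; both function bodies are assumed shapes of the original and optimized code, and absolute timings will vary by machine.

```python
import timeit


def find_last_node_nested(nodes, edges):
    # Assumed original: rescans edges for every node.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_set(nodes, edges):
    # Assumed optimized version: build the source set once.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


size = 1000
nodes = [{"id": i} for i in range(size)]
edges = [{"source": i} for i in range(size - 1)]


def best_seconds(func, number, repeat):
    """Best per-call time over several repeats, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


set_build_s = best_seconds(lambda: {e["source"] for e in edges}, number=100, repeat=5)
nested_s = best_seconds(lambda: find_last_node_nested(nodes, edges), number=3, repeat=3)
set_s = best_seconds(lambda: find_last_node_set(nodes, edges), number=100, repeat=5)

print(f"set build only: {set_build_s * 1e6:.1f} us")
print(f"nested scan:    {nested_s * 1e6:.1f} us")
print(f"set lookup:     {set_s * 1e6:.1f} us")
```

Taking the minimum over repeats follows the usual `timeit` advice, since higher readings mostly reflect interference from other processes.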

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 53 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
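The report compares outputs of the original and optimized code. A hedged sketch of that kind of differential check follows; it is not Codeflash's own harness, the two function bodies are assumed shapes, and the random graph generator is invented for illustration. It also probes two inputs where these assumed versions would diverge, which a reviewer may want to rule out for the real code.

```python
import random


def find_last_node_nested(nodes, edges):
    # Assumed original: nested scan with != comparisons.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_set(nodes, edges):
    # Assumed optimized version: set of source IDs.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


rng = random.Random(0)  # fixed seed so the check is repeatable
mismatches = 0
for _ in range(200):
    n_nodes = rng.randint(0, 30)
    nodes = [{"id": i} for i in range(n_nodes)]
    edges = [{"source": rng.randrange(n_nodes + 5)} for _ in range(rng.randint(0, 60))]
    if find_last_node_nested(nodes, edges) is not find_last_node_set(nodes, edges):
        mismatches += 1

# Divergence 1: a node without "id" and no edges. The nested scan never
# reads n["id"] when edges is empty; the set version always does.
no_id_nested = find_last_node_nested([{"name": "x"}], [])
try:
    find_last_node_set([{"name": "x"}], [])
    no_id_set_raises = False
except KeyError:
    no_id_set_raises = True

# Divergence 2: an unhashable source value. != accepts it, a set does not.
list_source_nested = find_last_node_nested([{"id": 1}], [{"source": [1]}])
try:
    find_last_node_set([{"id": 1}], [{"source": [1]}])
    list_source_set_raises = False
except TypeError:
    list_source_set_raises = True
```

If node IDs are always present and hashable, as in every generated test here, neither divergence applies.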
🌀 Generated Regression Tests
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node


def test_returns_none_when_nodes_empty():
    # When there are no nodes, there is nothing to return; expected result is None.
    nodes = []
    edges = [{"source": 1}, {"source": 2}]
    codeflash_output = find_last_node(nodes, edges)  # 667ns -> 500ns (33.4% faster)


def test_returns_first_node_when_edges_empty():
    # With no edges, every node has no outgoing edges; the function should return
    # the first node in the provided nodes iterable.
    a = {"id": "A"}
    b = {"id": "B"}
    nodes = [a, b]
    edges = []
    # Identity is preserved because the function returns the original node object.
    codeflash_output = find_last_node(nodes, edges)  # 1.04μs -> 542ns (92.3% faster)


def test_returns_unique_sink_in_chain():
    # For a linear chain 0->1->2->3, only node 3 has no outgoing edge; it should be returned.
    nodes = [{"id": i} for i in range(4)]
    edges = [{"source": 0}, {"source": 1}, {"source": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.21μs -> 916ns (141% faster)


def test_returns_first_sink_when_multiple_sinks_exist():
    # If multiple nodes have no outgoing edges, the function should return the first such node
    # according to the nodes iterable order.
    nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
    # Only 'a' is a source, so 'b' and 'c' are sinks; 'b' is first sink in nodes order.
    edges = [{"source": "a"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 708ns (124% faster)


def test_type_strict_equality_between_ids_and_sources():
    # Equality is strict: integer 1 is not equal to string "1".
    nodes = [{"id": 1}, {"id": "1"}]
    edges = [{"source": "1"}]
    # The edge source "1" matches only the second node, so the first node (id: 1) has no outgoing edges.
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.29μs -> 667ns (93.6% faster)


def test_missing_source_key_in_edges_raises_keyerror():
    # If an edge dict lacks the "source" key, the function should attempt to access it and raise KeyError.
    nodes = [{"id": 1}]
    edges = [{"src": 1}]  # wrong key name
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)  # 1.42μs -> 750ns (88.9% faster)


def test_missing_id_key_in_node_raises_keyerror_when_edges_nonempty():
    # If a node dict lacks the "id" key and edges is non-empty, comparing e["source"] with n["id"]
    # will access n["id"] and raise KeyError.
    nodes = [{"no_id": "x"}]
    edges = [{"source": "x"}]
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)  # 1.46μs -> 791ns (84.5% faster)


def test_handles_none_as_id_and_source_matching():
    # If node id is None and there is an edge with source None, that node should be considered a source
    # and thus not returned. If there is no such edge, the None-id node can be returned.
    node_none = {"id": None}
    node_ok = {"id": "ok"}
    nodes = [node_none, node_ok]
    edges = [{"source": None}]
    # node_none has an outgoing edge (source None), so it is not returned; next candidate is node_ok.
    codeflash_output = find_last_node(nodes, edges)  # 1.71μs -> 708ns (141% faster)

    # If no edge has source None, the node with id None is a sink and should be returned.
    edges_without_none = [{"source": "something"}]
    codeflash_output = find_last_node(
        [node_none], edges_without_none
    )  # 625ns -> 375ns (66.7% faster)


def test_large_chain_of_1000_nodes_returns_the_last_node():
    # Create 1000 nodes in a simple chain 0->1->2->...->998->999.
    size = 1000
    nodes = [{"id": i} for i in range(size)]
    # edges: sources are 0..998
    edges = [{"source": i} for i in range(size - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 17.5ms -> 47.6μs (36691% faster)


def test_large_with_multiple_sinks_returns_first_sink_quickly():
    # Create many nodes but only the later half have outgoing edges; the first sink should be found
    # near the start and returned without scanning unnecessarily.
    size = 1000
    # Make nodes 0..999
    nodes = [{"id": i} for i in range(size)]
    # Create edges only for nodes starting at 500 (i.e., nodes 500..999 are sources).
    edges = [{"source": i} for i in range(500, size)]
    # The first node with no outgoing edge is node 0, so it must be returned immediately.
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 19.0μs -> 11.7μs (62.5% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest
from src.algorithms.graph import find_last_node


def test_basic_single_node_no_edges():
    """Test with a single node and no edges."""
    nodes = [{"id": 1, "name": "node1"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.04μs -> 541ns (92.4% faster)


def test_basic_single_node_single_edge():
    """Test with a single node that is a source in an edge."""
    nodes = [{"id": 1, "name": "node1"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.17μs -> 625ns (86.7% faster)


def test_basic_two_nodes_linear_flow():
    """Test with two nodes in a linear flow (node1 -> node2)."""
    nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 708ns (112% faster)


def test_basic_three_nodes_linear_flow():
    """Test with three nodes in a linear flow (1 -> 2 -> 3)."""
    nodes = [
        {"id": 1, "name": "node1"},
        {"id": 2, "name": "node2"},
        {"id": 3, "name": "node3"},
    ]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 792ns (131% faster)


def test_basic_multiple_sources_one_sink():
    """Test with multiple source nodes converging to one sink node."""
    nodes = [
        {"id": 1, "name": "node1"},
        {"id": 2, "name": "node2"},
        {"id": 3, "name": "node3"},
    ]
    edges = [{"source": 1, "target": 3}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 750ns (144% faster)


def test_basic_node_with_string_ids():
    """Test with string IDs instead of integers."""
    nodes = [{"id": "start", "name": "node1"}, {"id": "end", "name": "node2"}]
    edges = [{"source": "start", "target": "end"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 708ns (118% faster)


def test_basic_returns_first_last_node_when_multiple_exist():
    """Test that the function returns the first node in the list that has no outgoing edges."""
    nodes = [
        {"id": 1, "name": "node1"},
        {"id": 2, "name": "node2"},
        {"id": 3, "name": "node3"},
    ]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.46μs -> 708ns (106% faster)


def test_edge_empty_nodes_list():
    """Test with an empty nodes list."""
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 667ns -> 375ns (77.9% faster)


def test_edge_empty_edges_list():
    """Test with empty edges list (all nodes are potential last nodes)."""
    nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.04μs -> 541ns (92.6% faster)


def test_edge_both_empty():
    """Test with both nodes and edges empty."""
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 625ns -> 417ns (49.9% faster)


def test_edge_single_node_with_self_loop():
    """Test with a single node that has a self-loop edge."""
    nodes = [{"id": 1, "name": "node1"}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.12μs -> 625ns (80.0% faster)


def test_edge_node_with_extra_fields():
    """Test nodes with additional fields beyond id and name."""
    nodes = [
        {"id": 1, "name": "node1", "type": "start", "color": "red"},
        {"id": 2, "name": "node2", "type": "end", "color": "green"},
    ]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.46μs -> 709ns (106% faster)


def test_edge_edge_with_extra_fields():
    """Test edges with additional fields beyond source and target."""
    nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
    edges = [{"source": 1, "target": 2, "weight": 5, "label": "connection"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 750ns (100% faster)


def test_edge_zero_id():
    """Test with node IDs of 0."""
    nodes = [{"id": 0, "name": "node0"}, {"id": 1, "name": "node1"}]
    edges = [{"source": 0, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 750ns (100% faster)


def test_edge_negative_ids():
    """Test with negative integer IDs."""
    nodes = [{"id": -1, "name": "node_neg1"}, {"id": -2, "name": "node_neg2"}]
    edges = [{"source": -1, "target": -2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 834ns (79.9% faster)


def test_edge_float_ids():
    """Test with float IDs."""
    nodes = [{"id": 1.5, "name": "node1"}, {"id": 2.5, "name": "node2"}]
    edges = [{"source": 1.5, "target": 2.5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 834ns (89.8% faster)


def test_edge_tuple_ids():
    """Test with tuple IDs."""
    nodes = [{"id": (0, 0), "name": "node1"}, {"id": (1, 1), "name": "node2"}]
    edges = [{"source": (0, 0), "target": (1, 1)}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.62μs -> 792ns (105% faster)


def test_edge_all_nodes_are_sources():
    """Test where all nodes are sources in some edge."""
    nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 750ns (111% faster)


def test_edge_cyclic_graph():
    """Test with a cyclic graph (1 -> 2 -> 3 -> 1)."""
    nodes = [
        {"id": 1, "name": "node1"},
        {"id": 2, "name": "node2"},
        {"id": 3, "name": "node3"},
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.88μs -> 792ns (137% faster)


def test_edge_disconnected_components():
    """Test with disconnected graph components."""
    nodes = [
        {"id": 1, "name": "node1"},
        {"id": 2, "name": "node2"},
        {"id": 3, "name": "node3"},
        {"id": 4, "name": "node4"},
    ]
    edges = [{"source": 1, "target": 2}, {"source": 3, "target": 4}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 792ns (99.9% faster)


def test_edge_special_characters_in_string_ids():
    """Test with special characters in string IDs."""
    nodes = [{"id": "node@1", "name": "node1"}, {"id": "node#2", "name": "node2"}]
    edges = [{"source": "node@1", "target": "node#2"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 750ns (111% faster)


def test_edge_unicode_ids():
    """Test with unicode string IDs."""
    nodes = [{"id": "节点1", "name": "node1"}, {"id": "节点2", "name": "node2"}]
    edges = [{"source": "节点1", "target": "节点2"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 750ns (106% faster)


def test_edge_empty_string_id():
    """Test with empty string as node ID."""
    nodes = [{"id": "", "name": "node1"}, {"id": "node2", "name": "node2"}]
    edges = [{"source": "", "target": "node2"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 667ns (131% faster)


def test_edge_whitespace_string_id():
    """Test with whitespace string as node ID."""
    nodes = [{"id": " ", "name": "node1"}, {"id": "node2", "name": "node2"}]
    edges = [{"source": " ", "target": "node2"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 708ns (118% faster)


def test_edge_very_large_ids():
    """Test with very large integer IDs."""
    nodes = [{"id": 10**18, "name": "node1"}, {"id": 10**18 + 1, "name": "node2"}]
    edges = [{"source": 10**18, "target": 10**18 + 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 750ns (106% faster)


def test_edge_duplicate_nodes_with_same_id():
    """Test behavior with duplicate nodes having the same ID."""
    nodes = [
        {"id": 1, "name": "node1"},
        {"id": 1, "name": "node1_duplicate"},
        {"id": 2, "name": "node2"},
    ]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.62μs -> 791ns (105% faster)


def test_edge_node_missing_id_field():
    """Test with a node missing the 'id' field (should cause KeyError)."""
    nodes = [{"name": "node1"}, {"id": 2, "name": "node2"}]
    edges = [{"source": 2, "target": 1}]
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)  # 1.58μs -> 792ns (99.9% faster)


def test_edge_edge_missing_source_field():
    """Test with an edge missing the 'source' field (should cause KeyError)."""
    nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
    edges = [{"target": 2}]
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)  # 1.46μs -> 709ns (106% faster)


def test_large_scale_100_nodes_linear_chain():
    """Test with 100 nodes in a linear chain."""
    nodes = [{"id": i, "name": f"node{i}"} for i in range(100)]
    edges = [{"source": i, "target": i + 1} for i in range(99)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 201μs -> 6.38μs (3063% faster)


def test_large_scale_1000_nodes_linear_chain():
    """Test with 1000 nodes in a linear chain."""
    nodes = [{"id": i, "name": f"node{i}"} for i in range(1000)]
    edges = [{"source": i, "target": i + 1} for i in range(999)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 17.6ms -> 48.0μs (36627% faster)


def test_large_scale_1000_nodes_star_topology():
    """Test with 1000 nodes converging to a single sink (star topology)."""
    sink_id = 0
    nodes = [{"id": sink_id, "name": f"node{sink_id}"}]
    nodes.extend([{"id": i, "name": f"node{i}"} for i in range(1, 1001)])
    edges = [{"source": i, "target": sink_id} for i in range(1, 1001)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 36.0μs -> 19.6μs (83.6% faster)


def test_large_scale_many_sources_one_sink():
    """Test with 500 source nodes all pointing to one sink."""
    nodes = [{"id": i, "name": f"node{i}"} for i in range(501)]
    sink_id = 500
    edges = [{"source": i, "target": sink_id} for i in range(500)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 4.41ms -> 25.4μs (17289% faster)


def test_large_scale_branching_tree_structure():
    """Test with a binary tree structure (powers of 2 nodes)."""
    # Create a binary tree with 127 nodes (7 levels, 2^7 - 1 nodes)
    nodes = [{"id": i, "name": f"node{i}"} for i in range(127)]
    edges = []
    for i in range(63):  # Internal nodes
        left_child = 2 * i + 1
        right_child = 2 * i + 2
        edges.append({"source": i, "target": left_child})
        edges.append({"source": i, "target": right_child})

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 159μs -> 5.58μs (2764% faster)


def test_large_scale_diamond_topology_repeated():
    """Test with repeated diamond topology patterns."""
    # Create 10 diamond patterns in sequence
    nodes = [{"id": i, "name": f"node{i}"} for i in range(40)]
    edges = []
    for d in range(10):
        offset = d * 4
        top = offset
        left = offset + 1
        right = offset + 2
        bottom = offset + 3

        edges.append({"source": top, "target": left})
        edges.append({"source": top, "target": right})
        edges.append({"source": left, "target": bottom})
        edges.append({"source": right, "target": bottom})

        if d < 9:
            edges.append({"source": bottom, "target": offset + 4})

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 45.8μs -> 3.42μs (1241% faster)


def test_large_scale_sparse_graph_100_nodes():
    """Test with a sparse graph of 100 nodes with few edges."""
    nodes = [{"id": i, "name": f"node{i}"} for i in range(100)]
    # Only connect a few nodes, leaving most as potential last nodes
    edges = [{"source": 0, "target": 50}, {"source": 25, "target": 75}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.62μs -> 833ns (95.1% faster)


def test_large_scale_dense_graph_100_nodes():
    """Test with a dense graph of 100 nodes."""
    nodes = [{"id": i, "name": f"node{i}"} for i in range(100)]
    edges = []
    # Create edges from every node to every other node with higher ID
    for i in range(100):
        for j in range(i + 1, 100):
            edges.append({"source": i, "target": j})

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 11.1ms -> 85.8μs (12867% faster)


def test_large_scale_500_nodes_multiple_sinks():
    """Test with 500 nodes with multiple disconnected components each having a sink."""
    nodes = [{"id": i, "name": f"node{i}"} for i in range(500)]
    edges = []

    # Create 5 disconnected linear chains of 100 nodes each
    for chain in range(5):
        offset = chain * 100
        for i in range(99):
            edges.append({"source": offset + i, "target": offset + i + 1})

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 213μs -> 15.1μs (1318% faster)


def test_large_scale_long_chain_with_lateral_edges():
    """Test with a long linear chain plus additional lateral edges."""
    nodes = [{"id": i, "name": f"node{i}"} for i in range(200)]
    edges = []

    # Main linear chain
    for i in range(199):
        edges.append({"source": i, "target": i + 1})

    # Add lateral edges from each even node to the node two steps ahead
    for i in range(0, 198, 2):
        edges.append({"source": i, "target": i + 2})

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 748μs -> 12.3μs (5990% faster)


def test_large_scale_nodes_with_many_fields():
    """Test with 100 nodes, each having 10 additional fields."""
    nodes = [
        {
            "id": i,
            "name": f"node{i}",
            "field1": f"value1_{i}",
            "field2": f"value2_{i}",
            "field3": f"value3_{i}",
            "field4": f"value4_{i}",
            "field5": f"value5_{i}",
            "field6": f"value6_{i}",
            "field7": f"value7_{i}",
            "field8": f"value8_{i}",
            "field9": f"value9_{i}",
            "field10": f"value10_{i}",
        }
        for i in range(100)
    ]
    edges = [{"source": i, "target": i + 1} for i in range(99)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 204μs -> 6.92μs (2862% faster)


def test_large_scale_edges_with_many_fields():
    """Test with 100 nodes and edges having 8 additional fields each."""
    nodes = [{"id": i, "name": f"node{i}"} for i in range(100)]
    edges = [
        {
            "source": i,
            "target": i + 1,
            "weight": i * 10,
            "label": f"edge_{i}",
            "color": "red" if i % 2 == 0 else "blue",
            "style": "solid",
            "width": i % 5,
            "data": {"x": i},
            "extra": f"extra_{i}",
        }
        for i in range(99)
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 204μs -> 6.38μs (3106% faster)


def test_large_scale_worst_case_last_node_is_first():
    """Test scenario where the last node is first in the list."""
    # Create a graph where node 0 is the only one without outgoing edges
    nodes = [{"id": 0, "name": "node0"}]
    nodes.extend([{"id": i, "name": f"node{i}"} for i in range(1, 100)])

    edges = [{"source": i, "target": i - 1} for i in range(1, 100)]

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 5.08μs -> 3.17μs (60.5% faster)


def test_large_scale_worst_case_last_node_is_last():
    """Test scenario where the last node is last in the list (linear chain)."""
    nodes = [{"id": i, "name": f"node{i}"} for i in range(500)]
    edges = [{"source": i, "target": i + 1} for i in range(499)]

    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 4.47ms -> 25.9μs (17148% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-find_last_node-mlfeu145 and push.

codeflash-ai bot requested a review from KRRT7 February 9, 2026 16:52
codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Feb 9, 2026
KRRT7 closed this Feb 9, 2026