⚡️ Speed up function find_last_node by 3,916% #264

Closed
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-find_last_node-mkq33xmo
codeflash-ai bot commented Jan 22, 2026

📄 3,916% (39.16x) speedup for find_last_node in src/algorithms/graph.py

⏱️ Runtime: 7.62 milliseconds → 190 microseconds (best of 162 runs)

📝 Explanation and details

The optimized code cuts the benchmark runtime from 7.62 ms to 190 μs (about 40x) by replacing the O(N×M) nested iteration pattern with an O(N+M) algorithm that uses set-based lookups.

Key optimization: The original code uses a generator expression with a nested `all()` that, for each of the N nodes, scans the M edges for one whose source is that node. `all()` stops at the first such edge, but the worst case is still O(N×M): on a chain whose sink is the last node, the scan performs about N²/2 comparisons.
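A minimal sketch of the nested-scan shape described above. It assumes the original is a `next(...)` over a generator whose condition is an inner `all()`; the PR page does not show the source, and `find_last_node_nested` and `chain_comparisons` are hypothetical names. The counter replays the same early-exit logic on an N-node chain to make the quadratic cost visible.

```python
from typing import Any, Dict, Iterable, Optional

Node = Dict[str, Any]
Edge = Dict[str, Any]


def find_last_node_nested(nodes: Iterable[Node], edges: Iterable[Edge]) -> Optional[Node]:
    """Return the first node that is not the source of any edge (assumed original shape)."""
    # For every node, all() walks the edges until one has this node as its source.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def chain_comparisons(size: int) -> int:
    """Count source/id comparisons the nested scan makes on the chain 0 -> 1 -> ... -> size-1."""
    nodes = [{"id": i} for i in range(size)]
    edges = [{"source": i, "target": i + 1} for i in range(size - 1)]
    comparisons = 0
    for n in nodes:
        for e in edges:
            comparisons += 1
            if e["source"] == n["id"]:
                break  # all() would stop here: the node has an outgoing edge
        else:
            return comparisons  # no outgoing edge found: this node is the answer
    return comparisons


# (size - 1) * (size + 2) // 2 comparisons: 125249 for the 500-node chain.
print(chain_comparisons(500))
```

Node i in the chain finds its outgoing edge at position i, so the scans cost 1 + 2 + ... + (N - 1) plus one final full pass, which is where the roughly N²/2 figure comes from.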

The optimized version builds a set of all source IDs in a single pass over the edges (O(M)), then checks each node against that set in O(1) average time (O(N) total). This reduces the overall complexity from O(N×M) to O(N+M).
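The set-based idea can be sketched as follows. This is an illustration of the technique, not the PR's code: `find_last_node_set` is a hypothetical name, and the sketch leaves out the edge-case guards the PR describes.

```python
from typing import Any, Dict, Iterable, Optional, Sequence

Node = Dict[str, Any]
Edge = Dict[str, Any]


def find_last_node_set(nodes: Iterable[Node], edges: Sequence[Edge]) -> Optional[Node]:
    """Return the first node whose id never appears as an edge source."""
    # O(M): one pass over the edges to collect every source id.
    sources = {e["source"] for e in edges}
    # O(N): average O(1) membership test per node, stopping at the first sink.
    for n in nodes:
        if n["id"] not in sources:
            return n
    return None


nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
print(find_last_node_set(nodes, edges))  # {'id': 'C'}
```

Unlike the nested scan, which only needs `!=`, this version requires hashable source IDs (a list used as an ID raises `TypeError`). On its own it also reads `n["id"]` even when `edges` is empty, which is why the PR adds a separate guard for that case.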

Performance characteristics based on test results:

  1. Small graphs (few nodes/edges): Results are mixed, ranging from about 48% slower (empty `nodes`) to about 46% faster, with absolute differences of around a microsecond or less. The slowdowns come from the fixed cost of building the set and of the isinstance check, which the original's lazy evaluation avoids when it can exit early.

  2. Large graphs: Large speedups, measured as old runtime divided by new runtime:

    • 500-node chain: about 139x (4.42ms → 31.9μs)
    • 100-node chain: about 22x (200μs → 9.08μs)
    • Dense 50-node DAG with 1,225 edges: about 49x (1.44ms → 29.4μs)
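  3. Sparse graphs with early matches: Moderate speedups (about 23-45%). When a node with no outgoing edges appears near the front of `nodes`, the original returns after roughly one pass over `edges`, so there is little quadratic work for the set lookup to remove.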
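The percentages in the per-test comments appear to follow (old - new) / new * 100, which matches the displayed timings to within rounding (for example, 200μs → 9.08μs gives about 2,103% against the reported 2,102%). Under that reading, a percentage converts to a runtime ratio as 1 + percent / 100, so "3,916%" corresponds to about 40x less runtime. A small sketch of the conversion; the helper names are hypothetical:

```python
def percent_faster(old: float, new: float) -> float:
    """Relative gain as used in the test comments: (old - new) / new * 100 (negative = slower)."""
    return (old - new) / new * 100.0


def runtime_ratio(old: float, new: float) -> float:
    """Old runtime divided by new runtime; equals 1 + percent_faster(old, new) / 100."""
    return old / new


# Timings reported in this PR, in seconds (already rounded for display).
reported = {
    "overall benchmark": (7.62e-3, 190e-6),
    "500-node chain": (4.42e-3, 31.9e-6),
    "100-node chain": (200e-6, 9.08e-6),
    "dense 50-node DAG": (1.44e-3, 29.4e-6),
}

for name, (old, new) in reported.items():
    print(f"{name}: {percent_faster(old, new):,.0f}% faster, {runtime_ratio(old, new):.1f}x less runtime")
```

Because the displayed timings are rounded, recomputed percentages can differ slightly from the ones in the test comments.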

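Special handling: The code preserves the original's semantics for two edge cases:

  • When `edges` is empty, it returns the first node without reading `n["id"]`, so nodes that lack an "id" key do not raise KeyError (the original never evaluates `n["id"]` when there are no edges).
  • When `edges` is a single-pass Iterator, it falls back to the original nested scan. The original consumes such an iterator incrementally, with each node's `all()` resuming where the previous one stopped, so building a set up front could change which node is returned.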
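A sketch of how the two guards could be wired around the set lookup. The PR text mentions an isinstance check but does not show which type it tests, so checking `collections.abc.Iterator` is an assumption, and `find_last_node_guarded` is a hypothetical name.

```python
from collections.abc import Iterator
from typing import Any, Dict, Iterable, Optional

Node = Dict[str, Any]
Edge = Dict[str, Any]


def find_last_node_guarded(nodes: Iterable[Node], edges: Iterable[Edge]) -> Optional[Node]:
    """Set-based lookup plus the two edge-case guards described above (hypothetical sketch)."""
    if isinstance(edges, Iterator):
        # Single-pass input: keep the nested scan so the iterator is consumed
        # incrementally across nodes, exactly as the nested version does.
        return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)

    # Re-iterable input (list, tuple, ...): collect the sources once.
    sources = {e["source"] for e in edges}
    if not sources:
        # No edges: every node qualifies, so return the first one without reading n["id"].
        return next(iter(nodes), None)
    for n in nodes:
        if n["id"] not in sources:
            return n
    return None


nodes = [{"id": "B"}, {"id": "A"}]
edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "A"}]  # a 2-cycle
print(find_last_node_guarded(nodes, edges))        # None: both nodes have outgoing edges
print(find_last_node_guarded(nodes, iter(edges)))  # {'id': 'A'}: the first all() consumed both edges
```

The second call shows why the fallback matters: with a one-shot iterator, the first node's `all()` drains both edges, so the second node sees no edges at all. A set built up front would instead return `None`.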

Impact: Because the savings grow with the original's N×M cost, this optimization is most valuable where `find_last_node` runs on graphs with hundreds of nodes or edges, for example when it is called repeatedly in a graph analysis pipeline.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 47 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node


def test_single_node_no_edges_returns_that_node():
    # Single node and no edges -> that node is the last node
    nodes = [{"id": 1, "label": "only"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.00μs -> 1.38μs (27.3% slower)


def test_two_nodes_one_edge_from_first_to_second_returns_second():
    # Two nodes with an edge from first -> second. The second has no outgoing edges.
    n1 = {"id": "A"}
    n2 = {"id": "B"}
    nodes = [n1, n2]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 1.50μs (5.60% faster)


def test_multiple_last_candidates_returns_first_in_nodes_order():
    # If multiple nodes have no outgoing edges, function returns the first such node
    n1 = {"id": "x"}  # has outgoing edge
    n2 = {"id": "y"}  # no outgoing edges
    n3 = {"id": "z"}  # no outgoing edges
    nodes = [n1, n2, n3]
    edges = [{"source": "x", "target": "y"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 1.42μs (11.7% faster)


def test_nodes_as_tuple_iterable_works_the_same_as_list():
    # nodes can be any iterable; tuples should work as well
    n1 = {"id": 1}
    n2 = {"id": 2}
    nodes = (n1, n2)  # tuple instead of list
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.71μs -> 1.71μs (0.000% faster)


def test_no_nodes_returns_none():
    # Empty nodes iterable -> nothing to return
    nodes = []
    edges = [{"source": "irrelevant"}]
    codeflash_output = find_last_node(nodes, edges)  # 667ns -> 1.17μs (42.8% slower)


def test_no_edges_returns_first_node():
    # If there are no edges, every node has no outgoing edges -> should return the first node
    n1 = {"id": "first"}
    n2 = {"id": "second"}
    nodes = [n1, n2]
    edges = []
    codeflash_output = find_last_node(nodes, edges)  # 1.04μs -> 1.25μs (16.7% slower)


def test_type_sensitivity_between_ids_and_edge_sources():
    # The function compares equality directly; different types are not equal.
    # If node id is int 1 but edge source is string "1", they are considered different.
    node = {"id": 1}
    nodes = [node]
    edges = [{"source": "1", "target": 2}]
    # Because "1" != 1, the node is considered to have no outgoing edges and should be returned.
    codeflash_output = find_last_node(nodes, edges)  # 1.38μs -> 1.38μs (0.000% faster)


def test_missing_source_key_in_edge_raises_keyerror():
    # If an edge dict lacks the "source" key, reading e["source"] raises KeyError (both in the nested scan and during set construction).
    nodes = [{"id": "a"}]
    edges = [{"target": "a"}]  # missing "source"
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)  # 1.67μs -> 1.92μs (13.0% slower)


def test_duplicate_node_ids_behavior_and_edge_exclusion():
    # When multiple nodes share the same id, a single outgoing-edge for that id excludes all nodes with that id.
    shared_id = "dupe"
    n1 = {"id": shared_id, "index": 1}
    n2 = {"id": shared_id, "index": 2}
    n3 = {"id": "other", "index": 3}
    nodes = [n1, n2, n3]
    edges = [{"source": shared_id, "target": "other"}]
    # Both n1 and n2 are considered as having outgoing edges (same id), so the last node should be n3.
    codeflash_output = find_last_node(nodes, edges)  # 1.83μs -> 1.79μs (2.29% faster)


def test_large_chain_structure_last_node_at_end():
    # Create a chain of 500 nodes where each node i has an edge to i+1.
    # The last node (id 499) should be the only node without outgoing edges.
    size = 500  # well under 1000
    nodes = [{"id": i} for i in range(size)]
    edges = [{"source": i, "target": i + 1} for i in range(size - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 4.42ms -> 31.9μs (13761% faster)


def test_large_many_candidates_returns_first_candidate_early_in_list():
    # Create 600 nodes. Attach outgoing edges only to nodes in the latter half.
    # The first node in the first half should be returned since it has no outgoing edges.
    size = 600
    nodes = [{"id": i} for i in range(size)]
    # edges only from nodes 300..599 pointing to something (they have outgoing edges)
    edges = [{"source": i, "target": (i + 1) % size} for i in range(300, size)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 12.2μs -> 9.83μs (23.7% faster)


def test_inputs_not_mutated_by_function():
    # Ensure that find_last_node does not mutate the provided nodes or edges.
    nodes = [{"id": "a"}, {"id": "b"}]
    edges = [{"source": "a", "target": "b"}]
    # Keep copies for comparison
    nodes_copy = [n.copy() for n in nodes]
    edges_copy = [e.copy() for e in edges]
    codeflash_output = find_last_node(nodes, edges)
    _ = codeflash_output  # 1.54μs -> 1.46μs (5.76% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest
from src.algorithms.graph import find_last_node


def test_single_node_no_edges():
    """Test with a single node and no edges - should return that node."""
    nodes = [{"id": 1, "name": "NodeA"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.04μs -> 1.25μs (16.7% slower)


def test_linear_chain_two_nodes():
    """Test with two nodes in a linear chain - should return the last node."""
    nodes = [{"id": 1, "name": "NodeA"}, {"id": 2, "name": "NodeB"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 1.50μs (0.000% faster)


def test_linear_chain_three_nodes():
    """Test with three nodes in a linear chain - should return the last node."""
    nodes = [
        {"id": 1, "name": "NodeA"},
        {"id": 2, "name": "NodeB"},
        {"id": 3, "name": "NodeC"},
    ]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.79μs -> 1.50μs (19.5% faster)


def test_multiple_sources_single_sink():
    """Test with multiple nodes having edges pointing to a single sink node."""
    nodes = [
        {"id": 1, "name": "NodeA"},
        {"id": 2, "name": "NodeB"},
        {"id": 3, "name": "NodeC"},
    ]
    edges = [{"source": 1, "target": 3}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.79μs -> 1.58μs (13.2% faster)


def test_node_with_additional_attributes():
    """Test that the function returns the complete node object with all attributes."""
    nodes = [
        {"id": 1, "name": "NodeA", "value": 100, "type": "start"},
        {"id": 2, "name": "NodeB", "value": 200, "type": "end"},
    ]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.42μs -> 1.46μs (2.81% slower)


def test_first_node_is_last_node():
    """Test when the first node in the list has no outgoing edges."""
    nodes = [{"id": 5, "name": "NodeA"}, {"id": 2, "name": "NodeB"}]
    edges = [{"source": 2, "target": 5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.25μs -> 1.38μs (9.09% slower)


def test_empty_nodes_list():
    """Test with an empty nodes list - should return None."""
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 667ns -> 1.29μs (48.3% slower)


def test_empty_edges_list():
    """Test when all nodes have no outgoing edges - should return the first node."""
    nodes = [{"id": 1, "name": "NodeA"}, {"id": 2, "name": "NodeB"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.04μs -> 1.25μs (16.7% slower)


def test_all_nodes_have_outgoing_edges():
    """Test when every node has at least one outgoing edge - should return None (no sink)."""
    nodes = [
        {"id": 1, "name": "NodeA"},
        {"id": 2, "name": "NodeB"},
        {"id": 3, "name": "NodeC"},
    ]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},  # Creates a cycle
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.88μs -> 1.71μs (9.78% faster)


def test_node_with_only_id_attribute():
    """Test with minimal node structure containing only id."""
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 1.42μs (5.93% faster)


def test_string_node_ids():
    """Test with string-type node IDs instead of integers."""
    nodes = [{"id": "nodeA", "name": "First"}, {"id": "nodeB", "name": "Last"}]
    edges = [{"source": "nodeA", "target": "nodeB"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 1.58μs (0.000% faster)


def test_mixed_type_node_ids():
    """Test with different types of node IDs (string and integer)."""
    nodes = [{"id": 1, "name": "First"}, {"id": "last", "name": "Second"}]
    edges = [{"source": 1, "target": "last"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.58μs -> 1.58μs (0.000% faster)


def test_node_id_zero():
    """Test with zero as a node ID (edge case for truthy/falsy checks)."""
    nodes = [{"id": 0, "name": "NodeA"}, {"id": 1, "name": "NodeB"}]
    edges = [{"source": 0, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.46μs -> 1.50μs (2.80% slower)


def test_node_id_negative():
    """Test with negative node IDs."""
    nodes = [{"id": -5, "name": "NodeA"}, {"id": -1, "name": "NodeB"}]
    edges = [{"source": -5, "target": -1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.46μs -> 1.42μs (2.89% faster)


def test_self_loop_edge():
    """Test with a self-loop edge where a node points to itself."""
    nodes = [{"id": 1, "name": "NodeA"}, {"id": 2, "name": "NodeB"}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.42μs -> 1.42μs (0.071% faster)


def test_duplicate_edges():
    """Test with duplicate edges (same source and target)."""
    nodes = [{"id": 1, "name": "NodeA"}, {"id": 2, "name": "NodeB"}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 1.50μs (2.80% faster)


def test_edge_with_extra_attributes():
    """Test that edges with extra attributes don't affect the result."""
    nodes = [{"id": 1, "name": "NodeA"}, {"id": 2, "name": "NodeB"}]
    edges = [{"source": 1, "target": 2, "weight": 5, "label": "connects"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.46μs -> 1.42μs (2.89% faster)


def test_unreachable_nodes():
    """Test with nodes that have no incoming or outgoing edges."""
    nodes = [
        {"id": 1, "name": "NodeA"},
        {"id": 2, "name": "NodeB"},
        {"id": 3, "name": "NodeC"},
    ]
    edges = [{"source": 1, "target": 2}]
    # Node 3 is unreachable, but it has no outgoing edges
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.46μs -> 1.38μs (6.04% faster)


def test_nodes_with_none_values():
    """Test with nodes containing None as attribute values."""
    nodes = [{"id": 1, "name": None}, {"id": 2, "name": "NodeB"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.46μs -> 1.33μs (9.30% faster)


def test_empty_string_as_node_id():
    """Test with empty string as a node ID."""
    nodes = [{"id": "", "name": "Empty"}, {"id": "last", "name": "Last"}]
    edges = [{"source": "", "target": "last"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 1.50μs (2.80% faster)


def test_boolean_node_id():
    """Test with boolean values as node IDs."""
    nodes = [{"id": True, "name": "NodeA"}, {"id": False, "name": "NodeB"}]
    edges = [{"source": True, "target": False}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 1.46μs (2.88% faster)


def test_float_node_id():
    """Test with float values as node IDs."""
    nodes = [{"id": 1.5, "name": "NodeA"}, {"id": 2.5, "name": "NodeB"}]
    edges = [{"source": 1.5, "target": 2.5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.62μs -> 2.12μs (23.5% slower)


def test_multiple_last_nodes_returns_first():
    """Test scenario where multiple nodes have no outgoing edges."""
    nodes = [
        {"id": 1, "name": "NodeA"},
        {"id": 2, "name": "NodeB"},
        {"id": 3, "name": "NodeC"},
    ]
    edges = [{"source": 1, "target": 2}]
    # Both Node 2 and Node 3 have no outgoing edges
    # The function should return the first one found
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.42μs -> 1.50μs (5.53% slower)


def test_large_linear_chain():
    """Test with a large linear chain of nodes (100 nodes)."""
    num_nodes = 100
    nodes = [{"id": i, "name": f"Node{i}"} for i in range(num_nodes)]
    edges = [{"source": i, "target": i + 1} for i in range(num_nodes - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 200μs -> 9.08μs (2102% faster)


def test_large_fan_in_topology():
    """Test with large fan-in topology (many sources to one sink)."""
    num_sources = 100
    sink_id = 1000
    nodes = [{"id": i, "name": f"Node{i}"} for i in range(num_sources)] + [
        {"id": sink_id, "name": "Sink"}
    ]
    edges = [{"source": i, "target": sink_id} for i in range(num_sources)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 201μs -> 9.04μs (2131% faster)


def test_large_complete_graph_no_sink():
    """Test with a dense DAG (an edge i -> j for every i < j); only the highest-ID node has no outgoing edges."""
    num_nodes = 50
    nodes = [{"id": i, "name": f"Node{i}"} for i in range(num_nodes)]
    edges = []
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            edges.append({"source": i, "target": j})
    # Node 49 (the highest ID) is the only node without outgoing edges, so it is returned
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.44ms -> 29.4μs (4788% faster)


def test_large_sparse_graph():
    """Test with a large sparse graph (many nodes, few edges)."""
    num_nodes = 500
    nodes = [{"id": i, "name": f"Node{i}"} for i in range(num_nodes)]
    edges = [
        {"source": 0, "target": 100},
        {"source": 100, "target": 250},
        {"source": 250, "target": 499},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.71μs -> 1.75μs (2.40% slower)


def test_large_nodes_with_complex_attributes():
    """Test with many nodes, each containing multiple attributes."""
    num_nodes = 100
    nodes = []
    for i in range(num_nodes):
        nodes.append(
            {
                "id": i,
                "name": f"Node{i}",
                "value": i * 100,
                "type": "process",
                "metadata": {"level": i % 10, "status": "active"},
            }
        )
    edges = [{"source": i, "target": i + 1} for i in range(num_nodes - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 198μs -> 8.42μs (2260% faster)


def test_large_edges_with_many_connections():
    """Test with a large number of edges connecting a moderate number of nodes."""
    num_nodes = 50
    nodes = [{"id": i, "name": f"Node{i}"} for i in range(num_nodes)]
    edges = []
    # Create edges: each node i connects to nodes i+1 through i+5 (cyclic)
    for i in range(num_nodes):
        for j in range(1, 6):
            target = (i + j) % num_nodes
            if target != i:  # Avoid self-loops in this test
                edges.append({"source": i, "target": target})
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 225μs -> 9.83μs (2192% faster)


def test_wide_fan_out_topology():
    """Test with a single source node connecting to many target nodes."""
    num_targets = 100
    nodes = [{"id": 0, "name": "Source"}] + [
        {"id": i, "name": f"Node{i}"} for i in range(1, num_targets + 1)
    ]
    edges = [{"source": 0, "target": i} for i in range(1, num_targets + 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 5.38μs -> 3.71μs (45.0% faster)


def test_deep_nested_chain():
    """Test with a very deep linear chain (200 nodes)."""
    num_nodes = 200
    nodes = [{"id": i, "name": f"Node{i}"} for i in range(num_nodes)]
    edges = [{"source": i, "target": i + 1} for i in range(num_nodes - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 736μs -> 13.7μs (5289% faster)


def test_large_number_of_unreachable_nodes():
    """Test with many unreachable nodes (disconnected from main graph)."""
    connected_nodes = 20
    unreachable_nodes = 80
    nodes = [
        {"id": i, "name": f"Node{i}"}
        for i in range(connected_nodes + unreachable_nodes)
    ]
    edges = [{"source": i, "target": i + 1} for i in range(connected_nodes - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 12.1μs -> 3.25μs (272% faster)


def test_graph_with_mixed_edge_types():
    """Test with edges containing various attribute combinations."""
    num_nodes = 75
    nodes = [{"id": i, "name": f"Node{i}"} for i in range(num_nodes)]
    edges = []
    for i in range(num_nodes - 1):
        if i % 3 == 0:
            edges.append({"source": i, "target": i + 1, "weight": 1})
        elif i % 3 == 1:
            edges.append({"source": i, "target": i + 1, "weight": 2, "label": "edge"})
        else:
            edges.append(
                {
                    "source": i,
                    "target": i + 1,
                    "weight": 3,
                    "label": "edge",
                    "color": "red",
                }
            )
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 117μs -> 6.79μs (1627% faster)


def test_large_ids_numeric_range():
    """Test with very large numeric IDs."""
    nodes = [
        {"id": 1000000, "name": "NodeA"},
        {"id": 2000000, "name": "NodeB"},
        {"id": 3000000, "name": "NodeC"},
    ]
    edges = [
        {"source": 1000000, "target": 2000000},
        {"source": 2000000, "target": 3000000},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.88μs -> 1.92μs (2.14% slower)


def test_complex_dag_structure():
    """Test with a directed acyclic graph (DAG) with multiple paths."""
    nodes = [{"id": i, "name": f"Node{i}"} for i in range(10)]
    edges = [
        {"source": 0, "target": 1},
        {"source": 0, "target": 2},
        {"source": 1, "target": 3},
        {"source": 2, "target": 3},
        {"source": 1, "target": 4},
        {"source": 3, "target": 5},
        {"source": 4, "target": 5},
    ]
    # Nodes 5, 6, 7, 8 and 9 have no outgoing edges; node 5 comes first, so it is returned
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 3.17μs -> 2.17μs (46.1% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-find_last_node-mkq33xmo` and push.

codeflash-ai bot requested a review from KRRT7 on January 22, 2026 23:30
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Jan 22, 2026
KRRT7 closed this on Jan 25, 2026
KRRT7 deleted the codeflash/optimize-find_last_node-mkq33xmo branch on January 25, 2026 09:03