⚡️ Speed up function `find_last_node` by 9,219% by codeflash-ai[bot] · Pull Request #269 · codeflash-ai/optimize-me

codeflash-ai · 2026-02-08T06:58:35Z

📄 9,219% (92.19x) speedup for `find_last_node` in `src/algorithms/graph.py`

⏱️ Runtime : 27.1 milliseconds → 291 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 92x speedup (9,219%) by eliminating redundant comparisons through preprocessing. The key optimization turns an O(n*m) nested loop into O(n+m) work, where n is the number of nodes and m is the number of edges.

Primary Optimization:
Instead of checking all(e["source"] != n["id"] for e in edges) for every node (which iterates through all edges for each node), the optimized code:

Pre-builds a set of source IDs: sources = {e["source"] for e in edges} (O(m) operation)
Uses set membership testing: n["id"] not in sources (average-case O(1) lookup per node)
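The function bodies are not reproduced on this page, so the following is only a minimal sketch of the two approaches described above. It assumes the original wraps the quoted `all(...)` check in a `next(...)` over `nodes`; the names `find_last_node_nested` and `find_last_node_set` are hypothetical and do not come from the PR.

```python
def find_last_node_nested(nodes, edges):
    # Nested scan (assumed shape of the original): for every node,
    # walk the edge list until an edge with a matching source is found.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_set(nodes, edges):
    # Preprocess once: collect every id that appears as an edge source (O(m)).
    sources = {e["source"] for e in edges}
    # Each node then needs a single average-case O(1) membership test.
    return next((n for n in nodes if n["id"] not in sources), None)


nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
result_nested = find_last_node_nested(nodes, edges)
result_set = find_last_node_set(nodes, edges)
```

Both variants return the first node, in input order, whose id never appears as an edge source, or `None` when there is no such node. This simplified set version omits the error-handling fallbacks the PR describes.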

This is particularly effective for graphs with many edges, as demonstrated by the test results:

Large-scale test (1000 nodes, 999 edges): 18.0ms → 89.5μs (20000% faster)
Sparse edges test (500 nodes, 250 edges): 11.4μs → 8.25μs (38.4% faster)
Small graphs show minor regression due to preprocessing overhead: empty cases are 36-48% slower
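The percentages above appear to follow the formula (original / optimized − 1) × 100, with a negative result reported as "slower". This formula is inferred from the published timings and is not stated on the page; the helper name `percent_change` is hypothetical. A short sketch of the arithmetic:

```python
def percent_change(original_seconds, optimized_seconds):
    # Positive result: the optimized code is that many percent faster.
    # Negative result: it is that many percent slower.
    return (original_seconds / optimized_seconds - 1) * 100


# Large-scale test: 18.0 ms -> 89.5 us
large_scale = percent_change(18.0e-3, 89.5e-6)

# Empty-nodes test: 625 ns -> 1.21 us
empty_nodes = percent_change(625e-9, 1.21e-6)
```

Under this reading, "48.3% slower" means the optimized call takes roughly twice as long (625 ns versus 1.21 μs), not 48% longer.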

Why Set-Based Lookup is Faster:

Set construction and membership testing in Python are highly optimized hash table operations
The nested loop approach requires m comparisons per node, totaling n*m comparisons
Set approach does m hash insertions once, then n constant-time lookups
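To make the operation counts concrete, the sketch below counts the comparisons a nested scan performs on the large-scale test input (1000 nodes, 999 edges) and compares that with the m insertions plus n lookups of the set approach. The helper `count_nested_comparisons` is hypothetical, and it assumes the nested check short-circuits on the first matching edge, as `all(...)` does.

```python
def count_nested_comparisons(nodes, edges):
    # Count the source/id comparisons the nested scan performs
    # before it reaches the first node with no outgoing edge.
    comparisons = 0
    for n in nodes:
        for e in edges:
            comparisons += 1
            if e["source"] == n["id"]:
                break  # n has an outgoing edge; move on to the next node
        else:
            return comparisons  # no edge starts at n, so the scan stops here
    return comparisons


size = 1000
nodes = [{"id": i} for i in range(size)]
edges = [{"source": i} for i in range(size - 1)]

nested_ops = count_nested_comparisons(nodes, edges)  # 500499
set_ops = len(edges) + len(nodes)  # 1999: m insertions plus at most n lookups
```

Because of short-circuiting, n*m is an upper bound; on this input the nested scan does about half of that, which is still roughly 250 times the set approach's operation count.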

Error Handling Preservation:
The optimized code carefully preserves the original's lazy evaluation semantics:

Falls back to the original nested check if sources are unhashable (TypeError during set construction)
Falls back for individual nodes if accessing n["id"] fails (KeyError/TypeError)
This ensures the function only raises exceptions when the original code would, maintaining backward compatibility
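The fallback behavior described above could look roughly like the sketch below. This is an assumption about the structure, not the PR's actual code, and the name `find_last_node_with_fallback` is hypothetical. It covers only the two fallbacks listed and does not claim to reproduce every lazy-evaluation corner case of the original.

```python
def find_last_node_with_fallback(nodes, edges):
    try:
        sources = {e["source"] for e in edges}
    except TypeError:
        # An unhashable source (for example a list) cannot go in a set,
        # so fall back to the nested comparison for the whole call.
        return next(
            (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
            None,
        )

    for n in nodes:
        try:
            is_source = n["id"] in sources
        except (KeyError, TypeError):
            # Missing or unhashable id: rerun the nested check for this node,
            # so an exception surfaces only where the nested scan would raise.
            if all(e["source"] != n["id"] for e in edges):
                return n
            continue
        if not is_source:
            return n
    return None
```

For example, a node without an `"id"` key is still returned when `edges` is empty, because the nested check never evaluates `n["id"]` in that case, while the same node raises `KeyError` as soon as there is at least one edge to compare against.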

Trade-offs:
The optimization pays off when the edge list is long or many nodes must be scanned before a match is found. Small graphs (< 10 elements) see slowdowns of 5-21%, and the empty-node cases are 36-48% slower, because of set construction overhead. In absolute terms these regressions are well under a microsecond per call, which is negligible next to the gains on larger graphs.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 42 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests

from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node


def test_returns_node_with_no_outgoing_edges():
    # Basic scenario: one node has an outgoing edge, the other doesn't.
    nodes = [{"id": "n1"}, {"id": "n2"}]  # two nodes in order
    edges = [{"source": "n1"}]  # n1 has an outgoing edge, n2 does not
    # Expect the function to return the node object that has no outgoing edges.
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.67μs -> 1.58μs (5.31% faster)


def test_no_edges_returns_first_node():
    # If there are no edges, every node has "no outgoing edges", so the first node
    # in the provided nodes list should be returned.
    nodes = [{"id": "first"}, {"id": "second"}]
    edges = []  # empty edge list
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.04μs -> 1.25μs (16.6% slower)


def test_empty_nodes_returns_none():
    # If there are no nodes at all, function should return None regardless of edges.
    nodes = []
    edges = [{"source": "anything"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 625ns -> 1.21μs (48.3% slower)


def test_nodes_with_duplicate_ids_returns_first_occurrence():
    # When multiple nodes share the same id and none of them have outgoing edges,
    # the implementation should return the first matching node.
    node_a = {"id": 1}
    node_b = {"id": 1}  # duplicate id
    nodes = [node_a, node_b]
    edges = [{"source": 2}]  # unrelated edge; both nodes have no outgoing edges
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 1.38μs (12.1% slower)


def test_edges_missing_source_raises_key_error():
    # If an edge dict is missing the 'source' key, accessing e["source"] should
    # raise a KeyError during evaluation.
    nodes = [{"id": "a"}]
    edges = [{}]  # missing 'source'
    with pytest.raises(KeyError):
        codeflash_output = find_last_node(nodes, edges)
        _ = codeflash_output  # 1.62μs -> 1.17μs (39.2% faster)


def test_edges_reference_unknown_nodes_returns_first_node_without_outgoing():
    # If edges reference node ids that are not present among nodes, that does not
    # create outgoing links for existing nodes. The first node without outgoing
    # edges should be returned.
    nodes = [{"id": "x"}, {"id": "y"}]
    edges = [{"source": "unknown"}]  # references no node in `nodes`
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.33μs -> 1.50μs (11.1% slower)


def test_multiple_candidates_returns_first_candidate():
    # If more than one node has no outgoing edges, the function should return
    # the first such node in the order provided.
    nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
    edges = [{"source": "b"}]  # only 'b' has an outgoing edge
    # Both 'a' and 'c' are valid last nodes; the implementation should pick 'a'.
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.25μs -> 1.38μs (9.09% slower)


def test_large_scale_identifies_last_node_in_1000_nodes():
    # Build 1000 nodes with integer ids 0..999.
    # Make edges such that nodes 0..998 have outgoing edges (source == their id).
    # Node 999 will be the only node without outgoing edges and should be returned.
    size = 1000  # at the upper bound allowed by the prompt
    nodes = [{"id": i} for i in range(size)]
    # Create edges for sources 0 through 998 (so node 999 has no outgoing)
    edges = [{"source": i} for i in range(size - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 18.0ms -> 89.5μs (20000% faster)


def test_large_scale_with_sparse_edges_returns_first_non_source_node():
    # Another large-ish case: create 500 nodes and make only some nodes sources.
    # Ensure that the first node that is not a source is returned.
    size = 500
    nodes = [{"id": i} for i in range(size)]
    # Make every even node a source (0,2,4,... become outgoing), odd nodes are candidates.
    edges = [{"source": i} for i in range(0, size, 2)]
    # The first node not in edges' sources is node with id 1 (the first odd).
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 11.4μs -> 8.25μs (38.4% faster)


def test_ordering_sensitivity_and_identity():
    # Two nodes without outgoing edges; ordering must determine which is returned.
    node_first = {"id": "first_candidate"}
    node_second = {"id": "second_candidate"}
    nodes = [node_first, node_second]
    # Provide an edge from some other id so neither node has outgoing edges.
    edges = [{"source": "other"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 1.42μs (14.6% slower)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest
from src.algorithms.graph import find_last_node


class TestFindLastNodeBasic:
    """Basic test cases for find_last_node function."""

    def test_single_node_no_edges(self):
        """Test with a single node and no edges."""
        nodes = [{"id": 1, "name": "node1"}]
        edges = []
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.04μs -> 1.33μs (21.9% slower)

    def test_single_node_with_outgoing_edge(self):
        """Test with a single node that has an outgoing edge."""
        nodes = [{"id": 1, "name": "node1"}]
        edges = [{"source": 1, "target": 2}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.21μs -> 1.46μs (17.1% slower)

    def test_two_nodes_linear_chain(self):
        """Test with two nodes in a linear chain."""
        nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
        edges = [{"source": 1, "target": 2}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.54μs -> 1.46μs (5.76% faster)

    def test_three_nodes_linear_chain(self):
        """Test with three nodes in a linear chain."""
        nodes = [
            {"id": 1, "name": "node1"},
            {"id": 2, "name": "node2"},
            {"id": 3, "name": "node3"},
        ]
        edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.88μs -> 1.62μs (15.4% faster)

    def test_multiple_nodes_single_last_node(self):
        """Test identifying the single last node among multiple nodes."""
        nodes = [
            {"id": "A", "label": "Start"},
            {"id": "B", "label": "Middle"},
            {"id": "C", "label": "End"},
        ]
        edges = [
            {"source": "A", "target": "B"},
            {"source": "B", "target": "C"},
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.04μs -> 1.67μs (22.5% faster)


class TestFindLastNodeEdgeCases:
    """Edge case test cases for find_last_node function."""

    def test_empty_nodes_list(self):
        """Test with an empty nodes list."""
        nodes = []
        edges = []
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 666ns -> 1.04μs (36.1% slower)

    def test_empty_nodes_with_edges(self):
        """Test with empty nodes list but edges present."""
        nodes = []
        edges = [{"source": 1, "target": 2}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 625ns -> 1.12μs (44.4% slower)

    def test_all_nodes_have_outgoing_edges(self):
        """Test where all nodes have outgoing edges (cyclic graph)."""
        nodes = [
            {"id": 1, "name": "node1"},
            {"id": 2, "name": "node2"},
            {"id": 3, "name": "node3"},
        ]
        edges = [
            {"source": 1, "target": 2},
            {"source": 2, "target": 3},
            {"source": 3, "target": 1},
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.96μs -> 1.75μs (11.9% faster)

    def test_multiple_last_nodes_returns_first(self):
        """Test when multiple nodes have no outgoing edges (should return first)."""
        nodes = [
            {"id": 1, "name": "node1"},
            {"id": 2, "name": "node2"},
            {"id": 3, "name": "node3"},
        ]
        edges = [{"source": 1, "target": 2}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.54μs -> 1.62μs (5.11% slower)

    def test_node_with_self_loop(self):
        """Test with a node that has a self-loop edge."""
        nodes = [{"id": 1, "name": "node1"}, {"id": 2, "name": "node2"}]
        edges = [{"source": 1, "target": 1}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.50μs -> 1.46μs (2.81% faster)

    def test_node_with_self_loop_only_node(self):
        """Test with a single node that has a self-loop."""
        nodes = [{"id": 1, "name": "node1"}]
        edges = [{"source": 1, "target": 1}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.12μs -> 1.42μs (20.6% slower)

    def test_nodes_with_extra_attributes(self):
        """Test nodes with various additional attributes."""
        nodes = [
            {"id": 1, "name": "start", "type": "input", "value": 100},
            {"id": 2, "name": "end", "type": "output", "value": 200},
        ]
        edges = [{"source": 1, "target": 2}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.50μs -> 1.50μs (0.000% faster)

    def test_string_node_ids(self):
        """Test with string-based node IDs."""
        nodes = [
            {"id": "start_node", "label": "Start"},
            {"id": "end_node", "label": "End"},
        ]
        edges = [{"source": "start_node", "target": "end_node"}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.58μs -> 1.54μs (2.79% faster)

    def test_mixed_type_node_ids(self):
        """Test with mixed type node IDs (strings and integers)."""
        nodes = [
            {"id": 1, "name": "node1"},
            {"id": "node2", "name": "node2"},
            {"id": 3, "name": "node3"},
        ]
        edges = [{"source": 1, "target": "node2"}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.62μs -> 1.46μs (11.4% faster)

    def test_diamond_graph_structure(self):
        """Test with a diamond-shaped graph (multiple paths to one node)."""
        nodes = [
            {"id": 1, "name": "node1"},
            {"id": 2, "name": "node2"},
            {"id": 3, "name": "node3"},
            {"id": 4, "name": "node4"},
        ]
        edges = [
            {"source": 1, "target": 2},
            {"source": 1, "target": 3},
            {"source": 2, "target": 4},
            {"source": 3, "target": 4},
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.29μs -> 1.83μs (24.9% faster)

    def test_multiple_disconnected_components(self):
        """Test with multiple disconnected graph components."""
        nodes = [
            {"id": 1, "name": "node1"},
            {"id": 2, "name": "node2"},
            {"id": 3, "name": "node3"},
            {"id": 4, "name": "node4"},
        ]
        edges = [
            {"source": 1, "target": 2},
            {"source": 3, "target": 4},
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.62μs -> 1.58μs (2.65% faster)

To edit these changes, run `git checkout codeflash/optimize-find_last_node-mlde6cg3` and push.


codeflash-ai bot requested a review from KRRT7 February 8, 2026 06:58

codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: Medium" (Optimization Quality according to Codeflash) labels Feb 8, 2026

KRRT7 closed this Feb 9, 2026

codeflash-ai[bot] wants to merge 1 commit into `optimize` from `codeflash/optimize-find_last_node-mlde6cg3`
