⚡️ Speed up function find_last_node by 13,461% #281
Open
codeflash-ai[bot] wants to merge 1 commit into optimize from codeflash/optimize-find_last_node-mlrz2na9
Conversation
📄 13,461% (134.61x) speedup for `find_last_node` in `src/algorithms/graph.py`

⏱️ Runtime: 43.3 milliseconds → 320 microseconds (best of 250 runs)

📝 Explanation and details
The optimized code achieves a 135x speedup (from 43.3ms to 320μs) by fundamentally changing the algorithmic complexity from O(N×M) to O(N+M), where N is the number of nodes and M is the number of edges.
**Key Optimization:**

The original code uses nested iteration: for each node, it scans through *all* edges to check whether that node appears as a source. This results in O(N×M) comparisons, which is catastrophic for larger graphs.
The optimized version pre-builds a set of all source IDs in O(M) time, then checks each node's membership in O(1) time, yielding O(N+M) total complexity. The set lookup (`n["id"] not in sources`) is orders of magnitude faster than repeatedly iterating through the edges list.
**Why This Works:**

1. **Set-based membership testing**: Python's hash-based sets provide O(1) average-case lookup versus O(M) linear search through edges (a micro-benchmark sketch follows below)
2. **Single-pass edge traversal**: Instead of scanning edges N times (once per node), we scan them once to build the set
3. **Early termination preserved**: The function still returns immediately upon finding the first terminal node

**Performance Characteristics:**

- **Small graphs** (few nodes/edges): 47-95% faster due to reduced overhead
- **Linear chains** (N nodes, N-1 edges): 2,840-37,284% faster; the difference grows dramatically with scale
- **Dense graphs**: 2,054-5,541% faster when many edges exist
- **Sparse graphs** (many terminal nodes): Returns quickly as the first terminal is found early
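As a rough illustration of the set-versus-list point, here is a small, hypothetical `timeit` comparison; the 500-edge size, `"target"` key, and probe value are made up for the example and are not taken from the PR's benchmarks:

```python
import timeit

# Synthetic data: 500 edges whose sources form a linear chain.
edges = [{"source": f"node-{i}", "target": f"node-{i + 1}"} for i in range(500)]
source_list = [e["source"] for e in edges]
source_set = set(source_list)

probe = "node-499"  # worst case for the list scan: the last element

list_time = timeit.timeit(lambda: probe in source_list, number=10_000)
set_time = timeit.timeit(lambda: probe in source_set, number=10_000)
print(f"list membership: {list_time:.4f}s, set membership: {set_time:.4f}s")
```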
The optimization includes a fallback branch for single-pass iterators (preserving original semantics), though the fast path handles typical use cases with re-iterable collections like lists, which is why the test results show such dramatic improvements.
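The fallback branch itself is not shown in this description, so the following is only one plausible shape for it, assuming the fast path is gated on re-iterable collections such as lists and tuples:

```python
def find_last_node(nodes, edges):
    """Return the first node whose id never appears as an edge source, else None."""
    if isinstance(edges, (list, tuple)):
        # Fast path: edges can be iterated once up front to build the set.
        sources = {e["source"] for e in edges}
        return next((n for n in nodes if n["id"] not in sources), None)
    # Fallback: a single-pass iterator is consumed exactly as the original
    # nested scan would consume it, preserving the original semantics.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)), None
    )
```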
This optimization is especially valuable when `find_last_node` is called repeatedly or on graphs with hundreds of nodes/edges, transforming an O(N×M) bottleneck into a nearly linear-time operation.

✅ Correctness verification report:
🌀 Click to see Generated Regression Tests
To edit these changes, `git checkout codeflash/optimize-find_last_node-mlrz2na9` and push.