⚡️ Speed up method TreeSitterAnalyzer.has_return_statement by 81% in PR #1561 (add/support_react) #1615
This optimization achieves an **80% speedup** (965μs → 534μs) by replacing recursive tree traversal with an iterative stack-based approach and removing unnecessary operations.
## Key Optimizations
**1. Iterative Stack-Based Traversal (Primary Speedup)**
The original `_node_has_return` used recursive calls with Python's call stack, which is expensive due to:
- Function call overhead (frame creation/destruction)
- Parameter passing on each recursive call
- Generator expressions with `any()` creating iterator overhead
The optimized version uses an explicit stack (`stack = [node]`) to traverse the AST iteratively. This eliminates:
- ~2000+ recursive function calls in typical runs (line profiler shows 2037 hits on the recursive version)
- Generator allocation overhead from `any(self._node_has_return(child) for child in node.children)`
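The shape of the change can be sketched with a hypothetical minimal stand-in for a tree-sitter node (the real method also special-cases function-type nodes, traversing only their body children; that rule is omitted here for brevity):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node, for illustration only."""
    type: str
    children: list = field(default_factory=list)

def has_return_recursive(node: Node) -> bool:
    # Original pattern: one Python frame per visited node, plus a
    # generator allocated for any() at every level.
    if node.type == "return_statement":
        return True
    return any(has_return_recursive(child) for child in node.children)

def has_return_iterative(node: Node) -> bool:
    # Optimized pattern: an explicit list used as a stack, so the whole
    # traversal runs in a single frame with no generator allocations.
    stack = [node]
    while stack:
        current = stack.pop()
        if current.type == "return_statement":
            return True
        stack.extend(current.children)
    return False

body = Node("statement_block", [
    Node("expression_statement"),
    Node("if_statement", [Node("return_statement")]),
])
assert has_return_recursive(body) == has_return_iterative(body) == True
```

Note that the iterative version visits children in reverse order relative to the recursion, which is irrelevant here because the predicate is order-independent: it only asks whether *any* `return_statement` exists.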
**2. Removed Unused `source.encode("utf8")` Call**
The original code encoded the source string to bytes but never used `source_bytes`. This operation cost ~47μs per call (0.6% of total time) and was completely unnecessary.
**3. Performance Characteristics by Test Case**
- **Large bodies (1000+ nodes)**: ~195% faster — iterative approach shines with deep/wide trees by avoiding stack frame overhead
- **Simple cases**: 9-34% faster — reduced overhead even for shallow trees
- **Trade-off cases**: 15-25% slower on trivial 2-3 node trees — stack setup overhead marginally exceeds recursive call cost for extremely small inputs
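The large-body numbers above can be sanity-checked with a small `timeit` microbenchmark over the same kind of hypothetical node stand-in. Absolute timings will differ by machine; the point is the relative gap on a wide, return-free body, which forces a full traversal:

```python
import timeit

class Node:
    __slots__ = ("type", "children")
    def __init__(self, type_: str, children=None):
        self.type = type_
        self.children = children or []

def has_return_recursive(node: Node) -> bool:
    if node.type == "return_statement":
        return True
    return any(has_return_recursive(c) for c in node.children)

def has_return_iterative(node: Node) -> bool:
    stack = [node]
    while stack:
        n = stack.pop()
        if n.type == "return_statement":
            return True
        stack.extend(n.children)
    return False

# A wide 1000-statement body with no return: worst case, every node visited.
body = Node("statement_block", [Node("expression_statement") for _ in range(1000)])

t_rec = timeit.timeit(lambda: has_return_recursive(body), number=200)
t_it = timeit.timeit(lambda: has_return_iterative(body), number=200)
print(f"recursive: {t_rec:.4f}s  iterative: {t_it:.4f}s")
```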
The optimization is particularly effective for real-world JavaScript/TypeScript code, which often contains large function bodies with many statements; the ~195% speedup on large bodies demonstrates its practical value. The minor regression on trivial 2-3 node trees is negligible, since production code rarely contains such tiny functions, and the overall 80% speedup confirms that typical workloads benefit.
The iterative approach also provides more predictable performance and avoids potential stack overflow issues with extremely deep nesting, making it more robust for production use.
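The stack-overflow point can be demonstrated concretely with a hypothetical minimal node stand-in: a chain nested deeper than CPython's default recursion limit (~1000 frames) would blow the call stack under naive recursion, while the explicit-stack traversal runs at constant frame depth:

```python
import sys

class Node:
    __slots__ = ("type", "children")
    def __init__(self, type_: str, children=None):
        self.type = type_
        self.children = children or []

def has_return_iterative(node: Node) -> bool:
    # Explicit stack: pending nodes live on the heap, not the C call
    # stack, so depth is bounded only by available memory.
    stack = [node]
    while stack:
        current = stack.pop()
        if current.type == "return_statement":
            return True
        stack.extend(current.children)
    return False

# Build a chain twice as deep as the interpreter's recursion limit.
deep = Node("return_statement")
for _ in range(sys.getrecursionlimit() * 2):
    deep = Node("statement_block", [deep])

assert has_return_iterative(deep) is True  # completes without RecursionError
```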
⚡️ Codeflash found optimizations for this PR 📄 14% (0.14x) speedup for …
**PR Review Summary**

**Prek Checks**
Fixed 1 issue:
- mypy: No type errors found.

**Code Review**
No critical issues found. The optimization is straightforward and correct: the logic for handling function-type nodes (traversing only their body children) is preserved identically in the iterative version.

**Test Coverage**
- Overall project coverage: 79% (unchanged by this PR, since the modified file has no test coverage on either branch).

Last updated: 2026-02-20
⚡️ This pull request contains optimizations for PR #1561
If you approve this dependent PR, these changes will be merged into the original PR branch `add/support_react`.

📄 81% (0.81x) speedup for `TreeSitterAnalyzer.has_return_statement` in `codeflash/languages/javascript/treesitter_utils.py`

⏱️ Runtime: 965 microseconds → 534 microseconds (best of 12 runs)
To edit these changes, run `git checkout codeflash/optimize-pr1561-2026-02-20T17.17.41` and push.