⚡️ Speed up method JavaScriptSupport._find_and_extract_body by 12% in PR #1561 (add/support_react) #1610

Open: codeflash-ai[bot] wants to merge 1 commit into add/support_react from codeflash/optimize-pr1561-2026-02-20T14.01.06
@codeflash-ai codeflash-ai bot commented Feb 20, 2026

⚡️ This pull request contains optimizations for PR #1561

If you approve this dependent PR, these changes will be merged into the original PR branch add/support_react.

This PR will be automatically closed if the original PR is merged.


📄 12% (0.12x) speedup for JavaScriptSupport._find_and_extract_body in codeflash/languages/javascript/support.py

⏱️ Runtime: 4.84 milliseconds → 4.31 milliseconds (best of 122 runs)

📝 Explanation and details

I replaced the recursive node search and repeated string decodes with an iterative DFS that compares raw byte slices to a single pre-encoded target_bytes. This eliminates many temporary str allocations and function-call overhead from recursion, reducing memory churn and CPU time while preserving exact behavior and return values. I also kept all signatures, comments (updated one to match the iterative approach), and coding style constraints intact.
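
The approach described above can be illustrated with a minimal sketch. It is not the code from this PR: the `Node` dataclass is a hypothetical stand-in for a tree-sitter node, the `name` attribute replaces `child_by_field_name("name")`, and `FUNCTION_TYPES` and `make_function` are assumptions made for the example. The sketch also pushes children in reverse so that the leftmost child is visited first, which is a choice of this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node; not the real API."""

    type: str
    start_byte: int = 0
    end_byte: int = 0
    children: list[Node] = field(default_factory=list)
    name: Node | None = None  # simplified stand-in for child_by_field_name("name")


# Assumed subset of node types treated as function definitions.
FUNCTION_TYPES = frozenset({"function_declaration", "method_definition"})


def find_function_node(root: Node, source_bytes: bytes, target_name: str) -> Node | None:
    target_bytes = target_name.encode("utf8")  # encoded once, outside the loop
    stack = [root]
    while stack:
        n = stack.pop()
        if n.type in FUNCTION_TYPES and n.name is not None:
            # Compare the raw byte slice; no per-node decode to str.
            if source_bytes[n.name.start_byte : n.name.end_byte] == target_bytes:
                return n
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(n.children))
    return None


def make_function(source_bytes: bytes, name: str) -> Node:
    """Build a function_declaration node whose name span points into source_bytes."""
    start = source_bytes.index(name.encode("utf8"))
    ident = Node("identifier", start, start + len(name.encode("utf8")))
    return Node("function_declaration", start, ident.end_byte, [ident], ident)


source_bytes = "function alpha() {}\nfunction beta() {}".encode("utf8")
alpha = make_function(source_bytes, "alpha")
beta = make_function(source_bytes, "beta")
root = Node("program", 0, len(source_bytes), [alpha, beta])

found = find_function_node(root, source_bytes, "beta")
missing = find_function_node(root, source_bytes, "gamma")
```

Because the explicit stack lives on the heap, very deep trees do not run into Python's recursion limit, and the target name is encoded a single time instead of decoding every candidate name span.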

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 104 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 86.7% |
🌀 Generated Regression Tests
from typing import Any

# imports
import pytest  # used for our unit tests
from codeflash.languages.javascript.support import JavaScriptSupport

# Helper test-only classes to simulate tree-sitter Tree/Node structures.
# These are minimal, focused on the attributes/access pattern used by
# JavaScriptSupport._find_and_extract_body. They are defined here only
# to provide controlled, deterministic parse results for the tests.
class _Node:
    def __init__(self, type: str, children: list["_Node"] | None = None, field_map: dict[str, "_Node"] | None = None, start_byte: int = 0, end_byte: int = 0):
        # Node type string (e.g. "method_definition", "function_declaration", etc.)
        self.type = type
        # List of child nodes (order matters for recursion in tests)
        self.children = children or []
        # Map of field name to node (used by child_by_field_name)
        self._field_map = field_map or {}
        # Byte offsets into the source bytes for this node.
        self.start_byte = start_byte
        self.end_byte = end_byte

    def child_by_field_name(self, name: str) -> Any:
        # Return the node mapped to the given field name, or None.
        return self._field_map.get(name)

class _Tree:
    def __init__(self, root_node: _Node):
        # tree.root_node is used by the function under test
        self.root_node = root_node

class FakeAnalyzer:
    """A minimal analyzer replacement that exposes parse(source_bytes) -> Tree."""

    def __init__(self, tree: _Tree):
        # store the tree that parse() will return
        self._tree = tree

    def parse(self, source_bytes: bytes) -> _Tree:
        # Return the prepared tree regardless of input. Tests set node
        # start/end bytes relative to the source bytes they pass.
        return self._tree

# Utility to compute byte-span for a substring in a bytes object.
def _span_of(sub: str, source_bytes: bytes):
    # find the first occurrence of the substring in the source bytes
    start = source_bytes.find(sub.encode("utf8"))
    if start == -1:
        raise ValueError(f"substring {sub!r} not found in source")
    end = start + len(sub.encode("utf8"))
    return start, end

def test_finds_method_definition_body_basic():
    # Create a simple class with a method named "foo" and ensure the body is returned.
    source = "class C { foo() { return 1; } }"
    source_bytes = source.encode("utf8")

    # compute spans for the name "foo" and the method body "{ return 1; }"
    name_start, name_end = _span_of("foo", source_bytes)
    body_start, body_end = _span_of("{ return 1; }", source_bytes)

    # construct nodes: name node, body node, method_definition node, and root
    name_node = _Node(type="identifier", start_byte=name_start, end_byte=name_end)
    body_node = _Node(type="statement_block", start_byte=body_start, end_byte=body_end)
    method_node = _Node(
        type="method_definition",
        children=[name_node, body_node],
        field_map={"name": name_node, "body": body_node},
        start_byte=name_start,
        end_byte=body_end,
    )
    root = _Node(type="program", children=[method_node])

    tree = _Tree(root)
    analyzer = FakeAnalyzer(tree)

    js = JavaScriptSupport()  # real instance of the class under test

    # Call the internal function to extract the body of "foo"
    codeflash_output = js._find_and_extract_body(source, "foo", analyzer); result = codeflash_output # 4.24μs -> 4.14μs (2.44% faster)

def test_finds_function_declaration_body_basic():
    # Standard function declaration should be found and its body returned.
    source = "function bar() { console.log('x'); }"
    source_bytes = source.encode("utf8")

    # compute spans for "bar" and its body
    name_start, name_end = _span_of("bar", source_bytes)
    body_start, body_end = _span_of("{ console.log('x'); }", source_bytes)

    # build nodes similar to above but with type "function_declaration"
    name_node = _Node(type="identifier", start_byte=name_start, end_byte=name_end)
    body_node = _Node(type="statement_block", start_byte=body_start, end_byte=body_end)
    func_node = _Node(
        type="function_declaration",
        children=[name_node, body_node],
        field_map={"name": name_node, "body": body_node},
        start_byte=name_start,
        end_byte=body_end,
    )
    root = _Node(type="program", children=[func_node])
    tree = _Tree(root)
    analyzer = FakeAnalyzer(tree)

    js = JavaScriptSupport()
    codeflash_output = js._find_and_extract_body(source, "bar", analyzer); result = codeflash_output # 3.64μs -> 3.68μs (1.09% slower)

def test_finds_arrow_function_assigned_to_variable():
    # Arrow function assigned to a variable via lexical_declaration should be found.
    source = "const baz = (x) => { return x; }"
    source_bytes = source.encode("utf8")

    # compute spans for variable name "baz" and arrow function body
    name_start, name_end = _span_of("baz", source_bytes)
    body_start, body_end = _span_of("{ return x; }", source_bytes)

    # name node for the variable declarator
    name_node = _Node(type="identifier", start_byte=name_start, end_byte=name_end)
    # body of the arrow function
    body_node = _Node(type="statement_block", start_byte=body_start, end_byte=body_end)
    # arrow function node; when the variable declarator's value_node.type == "arrow_function"
    arrow_node = _Node(
        type="arrow_function",
        children=[body_node],
        field_map={"body": body_node},
        start_byte=body_start,
        end_byte=body_end,
    )
    # variable_declarator node that holds name and value
    var_decl = _Node(
        type="variable_declarator",
        children=[name_node, arrow_node],
        field_map={"name": name_node, "value": arrow_node},
    )
    # lexical_declaration containing the variable_declarator
    lexical = _Node(type="lexical_declaration", children=[var_decl])
    root = _Node(type="program", children=[lexical])
    tree = _Tree(root)
    analyzer = FakeAnalyzer(tree)

    js = JavaScriptSupport()
    codeflash_output = js._find_and_extract_body(source, "baz", analyzer); result = codeflash_output # 3.93μs -> 4.10μs (4.12% slower)

def test_returns_none_when_function_not_present():
    # If the target function name does not exist, the result must be None.
    source = "function somethingElse() { }"
    # build a simple tree with non-matching function
    source_bytes = source.encode("utf8")
    name_start, name_end = _span_of("somethingElse", source_bytes)
    body_start, body_end = _span_of("{ }", source_bytes)
    name_node = _Node(type="identifier", start_byte=name_start, end_byte=name_end)
    body_node = _Node(type="statement_block", start_byte=body_start, end_byte=body_end)
    func_node = _Node(type="function_declaration", children=[name_node, body_node], field_map={"name": name_node, "body": body_node})
    root = _Node(type="program", children=[func_node])
    tree = _Tree(root)
    analyzer = FakeAnalyzer(tree)

    js = JavaScriptSupport()
    # search for an absent name -> expect None
    codeflash_output = js._find_and_extract_body(source, "absent", analyzer) # 3.93μs -> 4.16μs (5.56% slower)

def test_returns_none_when_body_missing():
    # If the function node exists but has no body node, function should return None.
    source = "function lonely() /* no body here */"
    source_bytes = source.encode("utf8")
    name_start, name_end = _span_of("lonely", source_bytes)

    # create a function node with a name but no body and no statement_block child
    name_node = _Node(type="identifier", start_byte=name_start, end_byte=name_end)
    func_node = _Node(type="function_declaration", children=[name_node], field_map={"name": name_node})
    root = _Node(type="program", children=[func_node])
    tree = _Tree(root)
    analyzer = FakeAnalyzer(tree)

    js = JavaScriptSupport()
    # Even though the function is present, there's no body to extract -> None
    codeflash_output = js._find_and_extract_body(source, "lonely", analyzer) # 3.28μs -> 3.40μs (3.53% slower)

def test_exact_name_matching_not_substring():
    # Ensure exact name matching: searching for "a" does not match "aa".
    source = "function aa() { return 2; }"
    source_bytes = source.encode("utf8")
    name_start, name_end = _span_of("aa", source_bytes)
    body_start, body_end = _span_of("{ return 2; }", source_bytes)
    name_node = _Node(type="identifier", start_byte=name_start, end_byte=name_end)
    body_node = _Node(type="statement_block", start_byte=body_start, end_byte=body_end)
    func_node = _Node(type="function_declaration", children=[name_node, body_node], field_map={"name": name_node, "body": body_node})
    root = _Node(type="program", children=[func_node])
    tree = _Tree(root)
    analyzer = FakeAnalyzer(tree)

    js = JavaScriptSupport()
    # Searching for "a" must not match "aa"
    codeflash_output = js._find_and_extract_body(source, "a", analyzer) # 3.62μs -> 3.93μs (7.89% slower)
    # Searching for "aa" must match
    codeflash_output = js._find_and_extract_body(source, "aa", analyzer) # 2.01μs -> 2.36μs (14.8% slower)

def test_function_name_with_special_characters():
    # Function names can include underscores/dollar signs in JS; ensure handled properly.
    source = "function $weird_name_123() { /* body */ }"
    source_bytes = source.encode("utf8")
    name_start, name_end = _span_of("$weird_name_123", source_bytes)
    body_start, body_end = _span_of("{ /* body */ }", source_bytes)
    name_node = _Node(type="identifier", start_byte=name_start, end_byte=name_end)
    body_node = _Node(type="statement_block", start_byte=body_start, end_byte=body_end)
    func_node = _Node(type="function_declaration", children=[name_node, body_node], field_map={"name": name_node, "body": body_node})
    root = _Node(type="program", children=[func_node])
    tree = _Tree(root)
    analyzer = FakeAnalyzer(tree)

    js = JavaScriptSupport()
    codeflash_output = js._find_and_extract_body(source, "$weird_name_123", analyzer) # 3.38μs -> 3.47μs (2.62% slower)

def test_large_chain_of_nested_nodes_with_target_at_end():
    # Build a deeply nested structure (1000 nodes) where the target function is
    # present at the very end to test recursion/stacking behavior.
    target_name = "deepFunc"
    # Create a body at the end
    body_text = "{ /* deep body */ }"
    # Build a source string that contains a lot of filler and the final function body
    filler = "var x = 0;\n" * 200  # repetitive filler
    source = filler + f"function {target_name}() {body_text}"
    source_bytes = source.encode("utf8")

    # compute spans for name and body in the source
    name_start, name_end = _span_of(target_name, source_bytes)
    body_start, body_end = _span_of(body_text, source_bytes)

    name_node = _Node(type="identifier", start_byte=name_start, end_byte=name_end)
    body_node = _Node(type="statement_block", start_byte=body_start, end_byte=body_end)
    func_node = _Node(type="function_declaration", children=[name_node, body_node], field_map={"name": name_node, "body": body_node})

    # create 1000 wrapper nodes that do not match any target type; the last wrapper holds the function
    depth = 1000
    current = func_node
    for i in range(depth):
        wrapper = _Node(type="wrapper_node", children=[current])
        current = wrapper
    root = _Node(type="program", children=[current])
    tree = _Tree(root)
    analyzer = FakeAnalyzer(tree)

    js = JavaScriptSupport()
    codeflash_output = js._find_and_extract_body(source, target_name, analyzer); result = codeflash_output

def test_large_source_and_indices_slice_correctly():
    # Build a large source (~9k characters) and ensure slicing using start/end bytes works.
    large_prefix = "/* big comment */\n" * 500  # 9,000 characters before the function text is appended
    body_text = "{ return 'ok'; }"
    source = large_prefix + "function bigName() " + body_text + "\n" + "/* trailing */"
    source_bytes = source.encode("utf8")

    name_start, name_end = _span_of("bigName", source_bytes)
    body_start, body_end = _span_of(body_text, source_bytes)

    name_node = _Node(type="identifier", start_byte=name_start, end_byte=name_end)
    body_node = _Node(type="statement_block", start_byte=body_start, end_byte=body_end)
    func_node = _Node(type="function_declaration", children=[name_node, body_node], field_map={"name": name_node, "body": body_node})
    root = _Node(type="program", children=[func_node])
    tree = _Tree(root)
    analyzer = FakeAnalyzer(tree)

    js = JavaScriptSupport()
    codeflash_output = js._find_and_extract_body(source, "bigName", analyzer); result = codeflash_output # 6.29μs -> 4.21μs (49.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-pr1561-2026-02-20T14.01.06, commit your edits, and push.

codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) on Feb 20, 2026

# Recurse into children (stack push)
for child in n.children:
    stack.append(child)

Note: The conversion from recursion to iterative stack.pop() + stack.extend(n.children) changes traversal order from left-to-right DFS to right-to-left DFS. The original recursive version finds the first (leftmost) matching function in source order, while this iterative version may find a different one if there are duplicate function names.

In practice this is unlikely to cause issues since function names are typically unique within a file, but it is worth being aware of. If strict left-to-right order is needed, reverse the children before pushing: stack.extend(reversed(n.children)). Note that stack.pop(0) is not a substitute: it turns the traversal into BFS, which visits nodes level by level and therefore does not reproduce the original pre-order DFS.
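
The ordering difference can be checked on a toy tree. The sketch below is illustrative only: `Node`, the labels, and the helper function names are hypothetical and unrelated to the tree-sitter types used in support.py. It compares recursive pre-order DFS, a stack that pushes children in order, a stack that pushes them reversed, and a queue-based BFS.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical labelled tree node used only for this demonstration."""

    label: str
    children: list[Node] = field(default_factory=list)


def recursive_order(n: Node) -> list[str]:
    # Pre-order DFS: visit the node, then each child left to right.
    order = [n.label]
    for child in n.children:
        order.extend(recursive_order(child))
    return order


def stack_order(root: Node, reverse_children: bool) -> list[str]:
    # Iterative DFS with an explicit stack (last in, first out).
    order: list[str] = []
    stack = [root]
    while stack:
        n = stack.pop()
        order.append(n.label)
        stack.extend(reversed(n.children) if reverse_children else n.children)
    return order


def queue_order(root: Node) -> list[str]:
    # Popping from the front makes this breadth-first, not depth-first.
    order: list[str] = []
    queue = deque([root])
    while queue:
        n = queue.popleft()
        order.append(n.label)
        queue.extend(n.children)
    return order


tree = Node("root", [
    Node("a", [Node("a1"), Node("a2")]),
    Node("b", [Node("b1")]),
])

print(recursive_order(tree))                      # ['root', 'a', 'a1', 'a2', 'b', 'b1']
print(stack_order(tree, reverse_children=False))  # ['root', 'b', 'b1', 'a', 'a2', 'a1']
print(stack_order(tree, reverse_children=True))   # ['root', 'a', 'a1', 'a2', 'b', 'b1']
print(queue_order(tree))                          # ['root', 'a', 'b', 'a1', 'a2', 'b1']
```

Only the reversed push reproduces the recursive visiting order; the queue variant reaches shallow nodes first, so with duplicate names it could return a match that appears later in the source.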

Comment on lines +1334 to +1335
for child in n.children:
    stack.append(child)

Ruff PERF402: This for child in n.children: stack.append(child) loop should use stack.extend(n.children) instead.

Suggested change
- for child in n.children:
-     stack.append(child)
+ stack.extend(n.children)

claude bot commented Feb 20, 2026

PR Review Summary

Prek Checks

1 issue found and fix staged (pending commit):

  • PERF402 in codeflash/languages/javascript/support.py:1334-1335: for child in n.children: stack.append(child) should be stack.extend(n.children). Fix applied locally but requires commit approval.
  • Ruff format also flagged a trailing blank line, auto-fixed.

Mypy

  • 2 pre-existing arg-type errors in support.py (lines 50, 2493) related to register_language decorator — not introduced by this PR.

Code Review

This PR converts find_function_node from recursive DFS to iterative stack-based traversal and switches from string comparison to bytes comparison.

1 minor concern:

  • Traversal order change: stack.pop() + stack.extend(n.children) gives right-to-left DFS instead of the original left-to-right recursive DFS. If there are duplicate function names, this could return a different match. Low risk in practice since function names are typically unique, but can be fixed with stack.extend(reversed(n.children)) if needed.

No critical bugs, security issues, or breaking API changes found.

Test Coverage

| File | Stmts | Miss | Cover |
| --- | --- | --- | --- |
| codeflash/languages/javascript/support.py | 1058 | 323 | 69% |

Changed lines (1302-1336): 100% covered — all 25 executable statements in the modified find_function_node function are exercised by existing tests.

8 pre-existing test failures in tests/test_tracer.py (unrelated to this PR — Tracer API mismatch).


Last updated: 2026-02-20
