⚡️ Speed up function extract_init_stub_from_class by 70% in PR #1524 (fixes-for-core-unstructured-experimental) #1529

Merged
KRRT7 merged 2 commits into fixes-for-core-unstructured-experimental from codeflash/optimize-pr1524-2026-02-18T14.38.26
Feb 18, 2026

@codeflash-ai codeflash-ai bot commented Feb 18, 2026

⚡️ This pull request contains optimizations for PR #1524

If you approve this dependent PR, these changes will be merged into the original PR branch fixes-for-core-unstructured-experimental.

This PR will be automatically closed if the original PR is merged.


📄 70% (0.70x) speedup for extract_init_stub_from_class in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime : 7.02 milliseconds → 4.13 milliseconds (best of 41 runs)
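
The headline percentage appears to be computed as old runtime divided by new runtime, minus one, which is consistent with the two runtimes reported here. A quick check under that assumption (the helper names `speedup` and `time_saved` are illustrative, not Codeflash's):

```python
def speedup(old_runtime: float, new_runtime: float) -> float:
    """Relative speedup: extra throughput gained, as a fraction of the new runtime."""
    return old_runtime / new_runtime - 1


def time_saved(old_runtime: float, new_runtime: float) -> float:
    """Fraction of the original wall-clock time that was eliminated."""
    return 1 - new_runtime / old_runtime


# Runtimes in milliseconds, taken from the runtime line of this comment.
headline = speedup(7.02, 4.13)
saved = time_saved(7.02, 4.13)
print(f"speedup: {headline:.0%}, wall-clock time saved: {saved:.0%}")
```

Under this definition, a 70% speedup corresponds to roughly 41% less wall-clock time rather than 70% less.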

📝 Explanation and details

The optimized code achieves a 70% runtime speedup (from 7.02ms to 4.13ms) through three key improvements:

1. Faster Class Discovery via Deque-Based BFS (Primary Speedup)

The original code uses ast.walk(), which traverses the entire AST even after the target class has been found. The line profiler shows this taking 20.5ms (71% of time).

The optimized version replaces this with an explicit BFS using collections.deque, which stops immediately upon finding the target class. The profiler shows this reduces traversal time to 9.95ms, cutting the search overhead by more than 50%.

This is especially impactful when:

  • The target class appears early in the module (eliminates unnecessary traversal)
  • The module contains many classes (tests show 7-10% faster on modules with 100-1000 classes)
  • The function is called frequently (shown by the 108% speedup on 1000 repeated calls)
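
The early-exit search described above might look roughly like the following. This is a minimal sketch, not the PR's actual code: `find_class_bfs` is a hypothetical helper name, and the real function does further work once the class has been located.

```python
import ast
from collections import deque
from typing import Optional


def find_class_bfs(module: ast.Module, class_name: str) -> Optional[ast.ClassDef]:
    """Breadth-first search that returns as soon as the named class is found.

    Hypothetical helper for illustration only.
    """
    queue: deque[ast.AST] = deque([module])
    while queue:
        node = queue.popleft()
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return node  # early exit: nodes still in the queue are never visited
        queue.extend(ast.iter_child_nodes(node))
    return None


source = """
class Outer:
    class Inner:
        def __init__(self):
            self.x = 1

class Other:
    pass
"""
tree = ast.parse(source)
inner = find_class_bfs(tree, "Inner")      # nested classes are still reachable
missing = find_class_bfs(tree, "Missing")  # None when no class matches
```

Because the search is breadth-first, a top-level class is reached before a nested class of the same name, and among same-level duplicates the first one in source order is returned.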

2. Explicit Loops Replace Generator Overhead

The original code uses any() with a generator expression and min() with a generator to check decorators and find minimum line numbers. These create function call and generator overhead.

The optimized version uses explicit for loops with early breaks:

  • Decorator checking: Directly iterates and breaks on first match
  • Min line number: Uses explicit comparison instead of min() generator

The profiler shows decorator processing time reduced from ~1.4ms to ~0.3ms, and min line calculation from 69μs to 28μs.
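
The two loop rewrites could be sketched as follows. The helper names are hypothetical, and the decorator test is an assumption based on the generated tests, which treat both `@property` and `@something.property` as property decorators.

```python
import ast
from typing import Union

FunctionNode = Union[ast.FunctionDef, ast.AsyncFunctionDef]


def has_property_decorator(func: FunctionNode) -> bool:
    """Explicit loop in place of any(<generator>); returns on the first match."""
    for dec in func.decorator_list:
        if isinstance(dec, ast.Name) and dec.id == "property":
            return True
        if isinstance(dec, ast.Attribute) and dec.attr == "property":
            return True
    return False


def first_line(func: FunctionNode) -> int:
    """Explicit comparison in place of min(<generator>) over decorator lines."""
    start = func.lineno  # line of the `def` keyword, not of the first decorator
    for dec in func.decorator_list:
        if dec.lineno < start:
            start = dec.lineno
    return start


tree = ast.parse(
    "class PropAttr:\n"
    "    @foo.property\n"
    "    def value(self):\n"
    "        return 2\n"
)
method = tree.body[0].body[0]
is_property = has_property_decorator(method)
start_line = first_line(method)
```

Here `method.lineno` is 3 (the `def` line) while `start_line` is 2, which is why the extracted snippet can begin at the decorator.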

3. Conditional Flag Pattern for Relevance Checking

Instead of evaluating both conditions in a compound expression, the optimized version uses an is_relevant flag with early exits, reducing redundant checks.
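
A flag-based relevance check along these lines might look like the sketch below. It assumes, based on the generated tests, that a method is relevant when it is named `__init__` or `__post_init__` or carries a property decorator; `is_relevant_method` is a hypothetical name.

```python
import ast

# Names treated as relevant; an assumption drawn from the generated tests.
RELEVANT_NAMES = ("__init__", "__post_init__")


def is_relevant_method(node: ast.AST) -> bool:
    # Cheapest rejection first: anything that is not a (possibly async) function.
    if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        return False
    is_relevant = node.name in RELEVANT_NAMES
    if not is_relevant:
        # Decorators are inspected only when the name check did not settle it.
        for dec in node.decorator_list:
            if (isinstance(dec, ast.Name) and dec.id == "property") or (
                isinstance(dec, ast.Attribute) and dec.attr == "property"
            ):
                is_relevant = True
                break
    return is_relevant


source = '''class MyClass:
    class_var = 42

    def __init__(self):
        self._x = 1

    @property
    def x(self):
        return self._x

    def other(self):
        return 2
'''
class_node = ast.parse(source).body[0]
relevant = [n.name for n in class_node.body if is_relevant_method(n)]
print(relevant)
```

The class attribute and `other` are skipped, so only `__init__` and `x` are collected.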

Impact on Workloads

Based on function_references, this function is called from:

  • enrich_testgen_context: Used in test generation workflows where it may process many classes
  • Benchmark tests: Indicates this is in a performance-critical path

The optimization particularly benefits:

  • Large codebases: 89-90% faster on classes with 100+ methods or 50+ properties
  • Repeated calls: 108% faster when called 1000 times in sequence
  • Early matches: Up to 88% faster when target class is found quickly
  • Deep nesting: 57% faster for nested classes

The annotated tests show speedups of roughly 54-116% across most scenarios, about 24-26% when the requested class is not present, and minimal gains (6-10%) only when processing very large files where string slicing dominates runtime.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 1052 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
import ast  # used to parse source into an AST.Module for the function under test
import textwrap  # used to build neatly-indented multi-line source strings

import pytest  # used for our unit tests
from codeflash.languages.python.context.code_context_extractor import \
    extract_init_stub_from_class

def test_returns_none_when_class_missing():
    # Given a module with a different class name
    src = textwrap.dedent(
        """
        class NotTarget:
            def __init__(self):
                self.x = 1
        """
    )
    # Parse into a real ast.Module (no mocks)
    module_tree = ast.parse(src)
    # When asked for a class name that does not exist
    codeflash_output = extract_init_stub_from_class("Target", src, module_tree); result = codeflash_output # 20.2μs -> 16.2μs (24.4% faster)

def test_extracts_simple_init_method():
    # Basic class with a simple __init__; we verify exact snippet returned
    src = textwrap.dedent(
        """\
        class A:
            def __init__(self, x):
                self.x = x

            def other(self):
                pass
        """
    )
    module_tree = ast.parse(src)
    # Call the function under test
    codeflash_output = extract_init_stub_from_class("A", src, module_tree); result = codeflash_output # 14.1μs -> 7.51μs (88.1% faster)
    # The expected snippet should include the class header and only the init block lines
    expected = "class A:\n" + "    def __init__(self, x):\n" + "        self.x = x"

def test_extracts_decorated_property_name_and_attribute_forms():
    # Two classes demonstrating the two decorator AST patterns:
    # 1) decorator is a Name (property)
    # 2) decorator is an Attribute (something.property)
    src = textwrap.dedent(
        """\
        class PropName:
            @property
            def value(self):
                return 1

        class PropAttr:
            @foo.property
            def value(self):
                return 2
        """
    )
    module_tree = ast.parse(src)
    # For class PropName, the returned snippet should start at the decorator line
    codeflash_output = extract_init_stub_from_class("PropName", src, module_tree); res1 = codeflash_output # 16.3μs -> 7.80μs (108% faster)
    expected1 = "class PropName:\n" + "    @property\n" + "    def value(self):\n" + "        return 1"

    # For class PropAttr, the decorator is an Attribute and should also be captured
    codeflash_output = extract_init_stub_from_class("PropAttr", src, module_tree); res2 = codeflash_output # 10.7μs -> 6.48μs (64.6% faster)
    expected2 = "class PropAttr:\n" + "    @foo.property\n" + "    def value(self):\n" + "        return 2"

def test_empty_module_returns_none():
    # An empty source string should parse to a Module with no classes -> None
    src = ""
    module_tree = ast.parse(src)
    codeflash_output = extract_init_stub_from_class("AnyClass", src, module_tree) # 5.38μs -> 3.05μs (76.6% faster)

def test_class_without_relevant_methods_returns_none():
    # A class that has no __init__, __post_init__ or @property methods should yield None
    src = textwrap.dedent(
        """\
        class Empty:
            x = 1

            def some_method(self):
                return 2
        """
    )
    module_tree = ast.parse(src)
    codeflash_output = extract_init_stub_from_class("Empty", src, module_tree) # 11.4μs -> 5.54μs (105% faster)

def test_multiple_classes_same_name_uses_first_occurrence():
    # Create two top-level classes with the same name; ensure the function picks the first one.
    src = textwrap.dedent(
        """\
        class Dup:
            def __init__(self):
                self.x = 'first'

        # later redefinition of the same class name
        class Dup:
            def __init__(self):
                self.x = 'second'
                self.y = 2
        """
    )
    module_tree = ast.parse(src)
    # The implementation stops at the first matching ClassDef encountered in ast.walk.
    # We expect the snippet from the first (simpler) definition.
    codeflash_output = extract_init_stub_from_class("Dup", src, module_tree); res = codeflash_output # 13.2μs -> 7.68μs (71.3% faster)
    expected = "class Dup:\n" + "    def __init__(self):\n" + "        self.x = 'first'"

def test_decorated_method_with_multiple_decorators_uses_top_decorator_line():
    # Multiple decorators; the snippet should start at the earliest decorator line
    src = textwrap.dedent(
        """\
        class MultiDec:
            @dec_one
            @dec_two
            def some_prop(self):
                return 42
        """
    )
    module_tree = ast.parse(src)
    codeflash_output = extract_init_stub_from_class("MultiDec", src, module_tree); res = codeflash_output # 12.5μs -> 5.88μs (113% faster)
    expected = (
        "class MultiDec:\n"
        + "    @dec_one\n"
        + "    @dec_two\n"
        + "    def some_prop(self):\n"
        + "        return 42"
    )

def test_async_init_is_recognized_and_extracted():
    # An async function named __init__ should be recognized (AsyncFunctionDef is checked)
    src = textwrap.dedent(
        """\
        class AsyncInit:
            async def __init__(self):
                await do_something()
        """
    )
    module_tree = ast.parse(src)
    codeflash_output = extract_init_stub_from_class("AsyncInit", src, module_tree); res = codeflash_output # 11.8μs -> 6.52μs (80.3% faster)
    expected = "class AsyncInit:\n" + "    async def __init__(self):\n" + "        await do_something()"

def test_large_number_of_classes_extracts_target_quickly():
    # Create a module with 1000 classes named C0..C999, each with a simple __init__.
    N = 1000
    lines = []
    for i in range(N):
        # build each class block; keep it small to keep this test fast but large enough to simulate scale
        lines.append(f"class C{i}:")
        lines.append("    def __init__(self):")
        lines.append(f"        self.x = {i}")
        lines.append("")  # blank line between classes
    src = "\n".join(lines)
    module_tree = ast.parse(src)

    # Pick a middle class to ensure traversal through many nodes
    target = "C512"
    codeflash_output = extract_init_stub_from_class(target, src, module_tree); res = codeflash_output # 883μs -> 823μs (7.33% faster)
    # Expected snippet for the target class
    expected = f"class {target}:\n" + "    def __init__(self):\n" + "        self.x = 512"

def test_large_number_of_methods_within_class_extracts_init_correctly():
    # Build a single class with many methods to ensure slicing of lines works across many lines.
    method_count = 1000
    block = ["class ManyMethods:"]
    # put the __init__ somewhere in the middle to ensure indexing is correct
    block.append("    def first(self):")
    block.append("        pass")
    block.append("")
    block.append("    def __init__(self):")
    block.append("        self.value = 123")
    block.append("")
    # add many other methods to increase body size
    for i in range(method_count):
        block.append(f"    def m_{i}(self):")
        block.append("        return None")
        block.append("")
    src = "\n".join(block)
    module_tree = ast.parse(src)

    codeflash_output = extract_init_stub_from_class("ManyMethods", src, module_tree); res = codeflash_output # 528μs -> 277μs (90.5% faster)
    expected = "class ManyMethods:\n" + "    def __init__(self):\n" + "        self.value = 123"
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import ast

# imports
import pytest
from codeflash.languages.python.context.code_context_extractor import \
    extract_init_stub_from_class

def test_simple_class_with_init():
    """Test extraction of __init__ from a simple class."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.4μs -> 7.08μs (88.7% faster)

def test_class_with_init_and_post_init():
    """Test extraction when class has both __init__ and __post_init__."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
    
    def __post_init__(self):
        self.y = 2
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 14.0μs -> 7.74μs (80.2% faster)

def test_class_with_property_decorator():
    """Test extraction of methods decorated with @property."""
    source = """class MyClass:
    def __init__(self):
        self._x = 1
    
    @property
    def x(self):
        return self._x
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 16.4μs -> 8.38μs (95.3% faster)

def test_class_with_only_property():
    """Test extraction when class has only property methods and no __init__."""
    source = """class MyClass:
    @property
    def x(self):
        return 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 14.3μs -> 7.21μs (97.8% faster)

def test_nonexistent_class_returns_none():
    """Test that requesting a nonexistent class returns None."""
    source = """class MyClass:
    def __init__(self):
        pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("NonexistentClass", source, tree); result = codeflash_output # 14.7μs -> 11.7μs (24.8% faster)

def test_class_without_init_or_property_returns_none():
    """Test that class without __init__, __post_init__, or @property returns None."""
    source = """class MyClass:
    def some_method(self):
        return 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.0μs -> 5.09μs (116% faster)

def test_class_name_in_result():
    """Test that the class name appears in the result."""
    source = """class TestClass:
    def __init__(self):
        pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("TestClass", source, tree); result = codeflash_output # 11.6μs -> 6.30μs (84.1% faster)

def test_multiple_properties():
    """Test extraction of class with multiple property decorators."""
    source = """class MyClass:
    @property
    def x(self):
        return self._x
    
    @property
    def y(self):
        return self._y
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 16.4μs -> 8.64μs (90.4% faster)

def test_init_with_parameters():
    """Test extraction of __init__ with multiple parameters."""
    source = """class MyClass:
    def __init__(self, a, b, c=None):
        self.a = a
        self.b = b
        self.c = c
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 12.0μs -> 6.71μs (79.0% faster)

def test_init_with_multiline_body():
    """Test extraction of __init__ with multiline body."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
        self.y = 2
        self.z = 3
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.6μs -> 6.54μs (76.6% faster)

def test_attribute_property_decorator():
    """Test extraction with attribute-style property decorator."""
    source = """class MyClass:
    @some_module.property
    def x(self):
        return 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 14.2μs -> 7.07μs (101% faster)

def test_init_with_decorator():
    """Test extraction of __init__ that itself has a decorator."""
    source = """class MyClass:
    @some_decorator
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 12.5μs -> 6.45μs (93.8% faster)

def test_result_format():
    """Test that result follows expected format."""
    source = """class MyClass:
    def __init__(self):
        pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.3μs -> 6.20μs (81.4% faster)
    lines = result.split('\n')

def test_empty_class_body():
    """Test handling of class with empty body (pass statement)."""
    source = """class MyClass:
    pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 9.52μs -> 4.58μs (108% faster)

def test_class_case_sensitive_name():
    """Test that class name matching is case-sensitive."""
    source = """class MyClass:
    def __init__(self):
        pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("myclass", source, tree); result = codeflash_output # 14.1μs -> 11.2μs (25.7% faster)

def test_nested_class():
    """Test extraction from nested classes."""
    source = """class Outer:
    class Inner:
        def __init__(self):
            self.x = 1
"""
    tree = ast.parse(source)
    # Should find Inner class (ast.walk finds all classes)
    codeflash_output = extract_init_stub_from_class("Inner", source, tree); result = codeflash_output # 13.1μs -> 8.32μs (57.0% faster)

def test_multiple_classes_same_module():
    """Test extraction when module has multiple classes."""
    source = """class FirstClass:
    def some_method(self):
        pass

class SecondClass:
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("SecondClass", source, tree); result = codeflash_output # 13.6μs -> 8.87μs (53.8% faster)

def test_class_with_init_and_other_methods():
    """Test that other methods are excluded from extraction."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
    
    def other_method(self):
        return self.x * 2
    
    def another_method(self):
        return self.x + 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.8μs -> 7.62μs (81.0% faster)

def test_post_init_without_init():
    """Test extraction of __post_init__ when there's no __init__."""
    source = """class MyClass:
    def __post_init__(self):
        self.y = 2
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.2μs -> 6.32μs (76.7% faster)

def test_async_init():
    """Test extraction of async __init__ if present."""
    source = """class MyClass:
    async def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.4μs -> 6.20μs (83.7% faster)

def test_property_with_setter():
    """Test extraction of property with setter (both have @property-like decorators)."""
    source = """class MyClass:
    @property
    def x(self):
        return self._x
    
    @x.setter
    def x(self, value):
        self._x = value
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 15.1μs -> 7.99μs (89.2% faster)

def test_init_with_string_multiline():
    """Test extraction of __init__ containing multiline strings."""
    source = '''class MyClass:
    def __init__(self):
        """
        This is a docstring
        """
        self.x = 1
'''
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.8μs -> 6.57μs (79.6% faster)

def test_class_with_class_attributes():
    """Test that class attributes (not methods) don't affect extraction."""
    source = """class MyClass:
    class_var = 42
    
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.6μs -> 6.67μs (74.0% faster)

def test_multiple_decorators_on_init():
    """Test __init__ with multiple decorators."""
    source = """class MyClass:
    @decorator1
    @decorator2
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 12.7μs -> 6.96μs (82.7% faster)

def test_class_with_staticmethod():
    """Test that staticmethod is excluded from extraction."""
    source = """class MyClass:
    @staticmethod
    def static_method():
        return 1
    
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.3μs -> 7.33μs (80.9% faster)

def test_class_with_classmethod():
    """Test that classmethod is excluded from extraction."""
    source = """class MyClass:
    @classmethod
    def class_method(cls):
        return cls()
    
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.3μs -> 7.30μs (82.0% faster)

def test_init_with_annotations():
    """Test extraction of __init__ with type annotations."""
    source = """class MyClass:
    def __init__(self, x: int, y: str) -> None:
        self.x = x
        self.y = y
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.6μs -> 6.63μs (74.8% faster)

def test_property_with_annotations():
    """Test extraction of property with return type annotations."""
    source = """class MyClass:
    @property
    def x(self) -> int:
        return self._x
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.9μs -> 6.94μs (101% faster)

def test_whitespace_preservation():
    """Test that indentation and whitespace are preserved in extracted stub."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
        self.y = 2
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.3μs -> 6.30μs (79.3% faster)

def test_special_characters_in_init_body():
    """Test extraction of __init__ with special characters in body."""
    source = r"""class MyClass:
    def __init__(self):
        self.pattern = r"\d+"
        self.escaped = "test\\nvalue"
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.5μs -> 6.50μs (76.6% faster)

def test_init_with_default_none():
    """Test __init__ with None as default parameter."""
    source = """class MyClass:
    def __init__(self, value=None):
        self.value = value
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.3μs -> 6.50μs (73.7% faster)

def test_init_with_list_default():
    """Test __init__ with list as default parameter."""
    source = """class MyClass:
    def __init__(self, items=[]):
        self.items = items
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.1μs -> 6.20μs (79.2% faster)

def test_property_on_async_method():
    """Test property decorator on async method (if supported by AST)."""
    source = """class MyClass:
    @property
    async def x(self):
        return 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 14.0μs -> 6.93μs (102% faster)

def test_init_with_varargs():
    """Test __init__ with *args."""
    source = """class MyClass:
    def __init__(self, *args):
        self.args = args
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.2μs -> 6.05μs (84.3% faster)

def test_init_with_kwargs():
    """Test __init__ with **kwargs."""
    source = """class MyClass:
    def __init__(self, **kwargs):
        self.kwargs = kwargs
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.4μs -> 6.28μs (81.0% faster)

def test_init_with_args_and_kwargs():
    """Test __init__ with both *args and **kwargs."""
    source = """class MyClass:
    def __init__(self, a, *args, **kwargs):
        self.a = a
        self.args = args
        self.kwargs = kwargs
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.3μs -> 6.29μs (80.0% faster)

def test_large_class_with_many_methods():
    """Test extraction from class with many non-relevant methods."""
    # Create a class with 100 regular methods and 1 __init__
    methods = "\n    ".join([f"def method_{i}(self):\n        return {i}" for i in range(100)])
    source = f"""class MyClass:
    def __init__(self):
        self.x = 1
    
    {methods}
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 66.8μs -> 35.4μs (89.0% faster)

def test_class_with_many_properties():
    """Test extraction from class with many property methods."""
    # Create multiple property decorators
    properties = "\n    ".join([
        f"@property\n    def prop_{i}(self):\n        return {i}"
        for i in range(50)
    ])
    source = f"""class MyClass:
    {properties}
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 92.5μs -> 48.9μs (89.2% faster)

def test_module_with_many_classes():
    """Test extraction when module contains many classes."""
    # Create 100 classes, extract from one in the middle
    classes = "\n".join([
        f"""class Class{i}:
    def method(self):
        pass
"""
        for i in range(50)
    ])
    target_class = """class TargetClass:
    def __init__(self):
        self.x = 1
"""
    rest_classes = "\n".join([
        f"""class Class{i+50}:
    def method(self):
        pass
"""
        for i in range(50)
    ])
    source = f"{classes}\n{target_class}\n{rest_classes}"
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("TargetClass", source, tree); result = codeflash_output # 101μs -> 91.7μs (10.5% faster)

def test_init_with_long_body():
    """Test extraction of __init__ with very long body (1000 lines)."""
    body_lines = "\n        ".join([f"self.attr_{i} = {i}" for i in range(1000)])
    source = f"""class MyClass:
    def __init__(self):
        {body_lines}
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 63.0μs -> 57.3μs (10.0% faster)

def test_extraction_performance_many_calls():
    """Test extraction performance with multiple repeated calls."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    # Call extraction 1000 times
    results = []
    for _ in range(1000):
        codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 4.28ms -> 2.06ms (108% faster)
        results.append(result)

def test_large_source_file():
    """Test extraction from a very large source file."""
    # Create a large source file with 500 classes
    classes = []
    for i in range(500):
        if i == 250:  # Target class at middle
            classes.append("""class TargetClass:
    def __init__(self, param1, param2):
        self.p1 = param1
        self.p2 = param2
""")
        else:
            classes.append(f"""class Class{i}:
    def method(self):
        return {i}
""")
    source = "\n".join(classes)
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("TargetClass", source, tree); result = codeflash_output # 438μs -> 410μs (6.70% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr1524-2026-02-18T14.38.26 and push.

@codeflash-ai codeflash-ai bot added the labels ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) on Feb 18, 2026
@KRRT7 KRRT7 merged commit 2364096 into fixes-for-core-unstructured-experimental Feb 18, 2026
26 of 27 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1524-2026-02-18T14.38.26 branch February 18, 2026 14:44

claude bot commented Feb 18, 2026

PR Review Summary

Prek Checks

Fixed — 2 issues auto-fixed and committed:

  • I001 (unsorted-imports): Sorted deque, defaultdict → defaultdict, deque
  • PLR1730 (if-stmt-min-max): Replaced manual if d.lineno < m: m = d.lineno with m = min(m, d.lineno)

Additionally fixed 6 mypy type errors in extract_init_stub_from_class:

  • Typed BFS deque as deque[ast.AST] (was inferred as deque[Module])
  • Renamed loop variable to avoid shadowing causing false attribute errors

All prek and mypy checks pass after fixes.
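
The deque typing fix can be illustrated with a small sketch. The function `count_class_defs` is hypothetical and only demonstrates the annotation; without it, mypy infers `deque[Module]` from the initial element and rejects child nodes of other types.

```python
import ast
from collections import deque


def count_class_defs(module: ast.Module) -> int:
    """Count every ClassDef in a module, including nested ones."""
    # Annotating with the base class lets the queue hold any node type.
    queue: deque[ast.AST] = deque([module])
    count = 0
    while queue:
        # A distinct name for the popped node avoids shadowing `module`.
        current = queue.popleft()
        if isinstance(current, ast.ClassDef):
            count += 1
        queue.extend(ast.iter_child_nodes(current))
    return count


tree = ast.parse("class A:\n    class B:\n        pass\n\nclass C:\n    pass\n")
total = count_class_defs(tree)
```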

Code Review

No critical bugs or security vulnerabilities found.

Notable observations:

  • The Path import was moved to TYPE_CHECKING block — this is safe because the file uses from __future__ import annotations
  • build_testgen_context has a new optional function_to_optimize parameter — backward compatible
  • All removed functions (safe_relative_to, is_project_path, extract_init_stub, resolve_transitive_type_deps, etc.) are only used internally in this file; no external callers found
  • The enrich_testgen_context behavior changed: removed runtime introspection of external libraries (Steps 2 & 3 using importlib/inspect), replaced with AST-based approach via extract_parameter_type_constructors. This is a deliberate architectural change.
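
The first observation relies on postponed evaluation of annotations. A minimal sketch of the pattern, with a hypothetical `describe` function standing in for the real code:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; this import never runs at runtime.
    from pathlib import Path


def describe(path: Path) -> str:
    # With postponed evaluation the annotation is stored as the string "Path",
    # so the missing runtime import does not raise NameError.
    return f"file: {path}"


label = describe("example.py")
stored_annotation = describe.__annotations__["path"]
```

The pattern stops being safe if the name is used at runtime, for example in an `isinstance` check or when calling `Path(...)` directly.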

Test Coverage

| File | Stmts (main) | Stmts (PR) | Coverage (main) | Coverage (PR) | Delta |
|---|---|---|---|---|---|
| code_context_extractor.py | 620 | 554 | 85% | 91% | +6% |
| test_code_context_extractor.py | 1044 | 1052 | 98% | 98% | 0% |
| Total | 1664 | 1606 | 93% | 95% | +2% |

Coverage improved from 85% → 91% for the main source file. Tests were updated to match the new API (removed functions replaced with new test cases). 8 pre-existing test failures in test_tracer.py are unrelated to this PR.


Last updated: 2026-02-18T14:57Z
