⚡️ Speed up function `extract_init_stub_from_class` by 70% in PR #1524 (`fixes-for-core-unstructured-experimental`) by codeflash-ai[bot] · Pull Request #1529 · codeflash-ai/codeflash

codeflash-ai · 2026-02-18T14:38:32Z

⚡️ This pull request contains optimizations for PR #1524

If you approve this dependent PR, these changes will be merged into the original PR branch fixes-for-core-unstructured-experimental.

This PR will be automatically closed if the original PR is merged.

📄 70% speedup (1.70x) for `extract_init_stub_from_class` in `codeflash/languages/python/context/code_context_extractor.py`

⏱️ Runtime: 7.02 milliseconds → 4.13 milliseconds (best of 41 runs)
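A "best of N runs" figure is conventionally the minimum over repeated timings, since the fastest run is the one least disturbed by background noise. The sketch below shows that kind of measurement with the standard-library `timeit` module. `workload` and `best_of` are hypothetical helpers, and this is not necessarily how Codeflash's own harness measures runtime.

```python
import ast
import timeit

SOURCE = "class MyClass:\n    def __init__(self):\n        self.x = 1\n"
TREE = ast.parse(SOURCE)


def workload() -> bool:
    # Hypothetical stand-in for the function being timed: look up one class by name.
    return any(
        isinstance(node, ast.ClassDef) and node.name == "MyClass"
        for node in ast.walk(TREE)
    )


def best_of(func, runs: int = 41, number: int = 100) -> float:
    """Return the fastest per-call time in seconds across `runs` repetitions."""
    # timeit.repeat returns one total per repetition, each covering `number` calls.
    totals = timeit.repeat(func, number=number, repeat=runs)
    # Taking the minimum filters out runs slowed by scheduler noise or other load.
    return min(totals) / number


if __name__ == "__main__":
    print(f"best of 41: {best_of(workload) * 1e6:.2f} µs per call")
```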

📝 Explanation and details

The optimized code achieves a 70% runtime speedup (from 7.02ms to 4.13ms, about 41% less runtime) through three key improvements:

1. Faster Class Discovery via Deque-Based BFS (Primary Speedup)

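The original code iterates over ast.walk(), a generator that performs a breadth-first traversal of the AST (CPython implements it with a collections.deque internally), and the loop stops at the first matching class. The line profiler attributes 20.5ms (71% of profiled time) to this search.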

The optimized version inlines that traversal as an explicit BFS loop over a collections.deque and returns as soon as the target class is found, so the per-node cost of resuming a generator goes away. The profiler shows traversal time dropping to 9.95ms, cutting the search overhead by more than 50%.

The effect varies with the workload:

Early matches (target class near the top of the module) see the largest gains
Frequent calls compound the saving (108% faster across 1000 repeated calls)
Modules with many classes gain the least (7-10% on modules with 100-1000 classes)
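A minimal sketch of the two lookup styles follows. It is not the PR's code: `find_class_walk` and `find_class_bfs` are hypothetical names, and the sketch assumes only what the regression tests state, namely that the first matching `ClassDef` in traversal order wins. Because `ast.walk()` is itself a deque-based BFS, both functions visit nodes in the same order; the inlined loop just avoids resuming a generator for every node.

```python
from __future__ import annotations

import ast
from collections import deque

# A nested class B and a later top-level class B with the same name.
SAMPLE = "class A:\n    class B:\n        pass\nclass B:\n    pass\n"


def find_class_walk(tree: ast.AST, name: str) -> ast.ClassDef | None:
    # Generator style: ast.walk() yields nodes in BFS order; returning stops the walk.
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == name:
            return node
    return None


def find_class_bfs(tree: ast.AST, name: str) -> ast.ClassDef | None:
    # Inlined style: the same breadth-first order, driven by an explicit queue.
    queue: deque[ast.AST] = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, ast.ClassDef) and node.name == name:
            return node  # first match in BFS order
        queue.extend(ast.iter_child_nodes(node))
    return None
```

For `SAMPLE`, both functions return the module-level `B` on line 4 rather than the nested one on line 2, because breadth-first order reaches top-level classes before nested ones.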

2. Explicit Loops Replace Generator Overhead

The original code uses any() over a generator expression to check decorators and min() over a generator to find the minimum line number. Each call builds a generator object and pays a resume cost for every item it yields.

The optimized version uses explicit for loops with early breaks:

Decorator checking: Directly iterates and breaks on first match
Min line number: Uses explicit comparison instead of min() generator

The profiler shows decorator processing time reduced from ~1.4ms to ~0.3ms, and min line calculation from 69μs to 28μs.
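The two styles can be compared side by side. In this sketch, `_is_property_like` is a hypothetical predicate inferred from the generated tests (`@property` as an `ast.Name`, `@foo.property` as an `ast.Attribute`); the real decorator check may differ. The early `break` only pays off for the `any()` replacement, because finding the minimum line still needs a full pass.

```python
from __future__ import annotations

import ast

# Two decorators above `def x`, so the earliest relevant line is 2.
SAMPLE = "class C:\n    @dec_one\n    @property\n    def x(self):\n        return 1\n"


def _is_property_like(dec: ast.expr) -> bool:
    # Hypothetical predicate: `@property` (ast.Name) or `@something.property` (ast.Attribute).
    if isinstance(dec, ast.Name):
        return dec.id == "property"
    return isinstance(dec, ast.Attribute) and dec.attr == "property"


def check_generators(fn: ast.FunctionDef) -> tuple[bool, int]:
    # Generator style: any() and min() each drive a generator expression.
    has_prop = any(_is_property_like(d) for d in fn.decorator_list)
    start = min((d.lineno for d in fn.decorator_list), default=fn.lineno)
    return has_prop, start


def check_loops(fn: ast.FunctionDef) -> tuple[bool, int]:
    # Loop style: stop scanning decorators at the first property-like one.
    has_prop = False
    for d in fn.decorator_list:
        if _is_property_like(d):
            has_prop = True
            break
    # The minimum still needs a full pass. Start from the `def` line, which
    # (on Python 3.8+) comes after every decorator line.
    start = fn.lineno
    for d in fn.decorator_list:
        if d.lineno < start:
            start = d.lineno
    return has_prop, start
```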

3. Conditional Flag Pattern for Relevance Checking

Instead of evaluating the two relevance conditions (method name and decorators) together in one compound expression, the optimized version sets an is_relevant flag and exits early once the answer is known, which avoids redundant checks.
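A sketch of the flag pattern, under two assumptions not shown in the PR: that "relevant" means named `__init__`/`__post_init__` or carrying a property-like decorator (as the generated tests suggest), and that the original computed both tests before combining them. `INIT_NAMES`, `_is_property_like`, and the predicate functions are hypothetical. If the original instead used a plain short-circuiting `or`, the saving would come only from dropping the `any()` generator.

```python
from __future__ import annotations

import ast
from typing import Callable

# Assumed set of method names that always count as relevant.
INIT_NAMES = frozenset({"__init__", "__post_init__"})

SAMPLE = (
    "class C:\n"
    "    def __init__(self):\n"
    "        self._x = 1\n"
    "\n"
    "    @property\n"
    "    def x(self):\n"
    "        return self._x\n"
    "\n"
    "    @x.setter\n"
    "    def x(self, value):\n"
    "        self._x = value\n"
    "\n"
    "    def m(self):\n"
    "        pass\n"
)


def _is_property_like(dec: ast.expr) -> bool:
    # Hypothetical: `@property` or `@something.property`; `@x.setter` does not match.
    if isinstance(dec, ast.Name):
        return dec.id == "property"
    return isinstance(dec, ast.Attribute) and dec.attr == "property"


def is_relevant_eager(fn: ast.FunctionDef | ast.AsyncFunctionDef) -> bool:
    # Both tests are computed before combining, so the decorator scan always runs.
    is_init = fn.name in INIT_NAMES
    is_prop = any(_is_property_like(d) for d in fn.decorator_list)
    return is_init or is_prop


def is_relevant_flag(fn: ast.FunctionDef | ast.AsyncFunctionDef) -> bool:
    # Flag style: scan decorators only if the name check fails, and stop at the first hit.
    is_relevant = fn.name in INIT_NAMES
    if not is_relevant:
        for d in fn.decorator_list:
            if _is_property_like(d):
                is_relevant = True
                break
    return is_relevant


def relevant_methods(cls: ast.ClassDef, predicate: Callable[..., bool]) -> list[str]:
    # Names of (async) methods in the class body accepted by the predicate.
    return [
        item.name
        for item in cls.body
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)) and predicate(item)
    ]
```

With this hypothetical predicate, the `@x.setter` method is not matched, so only `__init__` and the getter `x` are reported.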

Impact on Workloads

Based on function_references, this function is called from:

enrich_testgen_context: Used in test generation workflows where it may process many classes
Benchmark tests: indicating that this is on a performance-critical path

The optimization particularly benefits:

Large classes: 89-90% faster on classes with 100+ methods or 50+ properties
Repeated calls: 108% faster when called 1000 times in sequence
Early matches: roughly 70-116% faster when the target class is the first class in a small module
Nested classes: 57% faster when the target class is nested one level deep

Across the annotated tests, most scenarios show 50-116% speedups. Gains are smaller when the target class does not exist (about 25%) and when processing very large files (roughly 7-10%), where string slicing dominates runtime.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 1052 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests

import ast  # used to parse source into an AST.Module for the function under test
import textwrap  # used to build neatly-indented multi-line source strings

import pytest  # used for our unit tests
from codeflash.languages.python.context.code_context_extractor import \
    extract_init_stub_from_class

def test_returns_none_when_class_missing():
    # Given a module with a different class name
    src = textwrap.dedent(
        """
        class NotTarget:
            def __init__(self):
                self.x = 1
        """
    )
    # Parse into a real ast.Module (no mocks)
    module_tree = ast.parse(src)
    # When asked for a class name that does not exist
    codeflash_output = extract_init_stub_from_class("Target", src, module_tree); result = codeflash_output # 20.2μs -> 16.2μs (24.4% faster)

def test_extracts_simple_init_method():
    # Basic class with a simple __init__; we verify exact snippet returned
    src = textwrap.dedent(
        """\
        class A:
            def __init__(self, x):
                self.x = x

            def other(self):
                pass
        """
    )
    module_tree = ast.parse(src)
    # Call the function under test
    codeflash_output = extract_init_stub_from_class("A", src, module_tree); result = codeflash_output # 14.1μs -> 7.51μs (88.1% faster)
    # The expected snippet should include the class header and only the init block lines
    expected = "class A:\n" + "    def __init__(self, x):\n" + "        self.x = x"

def test_extracts_decorated_property_name_and_attribute_forms():
    # Two classes demonstrating the two decorator AST patterns:
    # 1) decorator is a Name (property)
    # 2) decorator is an Attribute (something.property)
    src = textwrap.dedent(
        """\
        class PropName:
            @property
            def value(self):
                return 1

        class PropAttr:
            @foo.property
            def value(self):
                return 2
        """
    )
    module_tree = ast.parse(src)
    # For class PropName, the returned snippet should start at the decorator line
    codeflash_output = extract_init_stub_from_class("PropName", src, module_tree); res1 = codeflash_output # 16.3μs -> 7.80μs (108% faster)
    expected1 = "class PropName:\n" + "    @property\n" + "    def value(self):\n" + "        return 1"

    # For class PropAttr, the decorator is an Attribute and should also be captured
    codeflash_output = extract_init_stub_from_class("PropAttr", src, module_tree); res2 = codeflash_output # 10.7μs -> 6.48μs (64.6% faster)
    expected2 = "class PropAttr:\n" + "    @foo.property\n" + "    def value(self):\n" + "        return 2"

def test_empty_module_returns_none():
    # An empty source string should parse to a Module with no classes -> None
    src = ""
    module_tree = ast.parse(src)
    codeflash_output = extract_init_stub_from_class("AnyClass", src, module_tree) # 5.38μs -> 3.05μs (76.6% faster)

def test_class_without_relevant_methods_returns_none():
    # A class that has no __init__, __post_init__ or @property methods should yield None
    src = textwrap.dedent(
        """\
        class Empty:
            x = 1

            def some_method(self):
                return 2
        """
    )
    module_tree = ast.parse(src)
    codeflash_output = extract_init_stub_from_class("Empty", src, module_tree) # 11.4μs -> 5.54μs (105% faster)

def test_multiple_classes_same_name_uses_first_occurrence():
    # Create two top-level classes with the same name; ensure the function picks the first one.
    src = textwrap.dedent(
        """\
        class Dup:
            def __init__(self):
                self.x = 'first'

        # later redefinition of the same class name
        class Dup:
            def __init__(self):
                self.x = 'second'
                self.y = 2
        """
    )
    module_tree = ast.parse(src)
    # The implementation stops at the first matching ClassDef in breadth-first (ast.walk) order.
    # We expect the snippet from the first (simpler) definition.
    codeflash_output = extract_init_stub_from_class("Dup", src, module_tree); res = codeflash_output # 13.2μs -> 7.68μs (71.3% faster)
    expected = "class Dup:\n" + "    def __init__(self):\n" + "        self.x = 'first'"

def test_decorated_method_with_multiple_decorators_uses_top_decorator_line():
    # Multiple decorators; the snippet should start at the earliest decorator line
    src = textwrap.dedent(
        """\
        class MultiDec:
            @dec_one
            @dec_two
            def some_prop(self):
                return 42
        """
    )
    module_tree = ast.parse(src)
    codeflash_output = extract_init_stub_from_class("MultiDec", src, module_tree); res = codeflash_output # 12.5μs -> 5.88μs (113% faster)
    expected = (
        "class MultiDec:\n"
        + "    @dec_one\n"
        + "    @dec_two\n"
        + "    def some_prop(self):\n"
        + "        return 42"
    )

def test_async_init_is_recognized_and_extracted():
    # An async function named __init__ should be recognized (AsyncFunctionDef is checked)
    src = textwrap.dedent(
        """\
        class AsyncInit:
            async def __init__(self):
                await do_something()
        """
    )
    module_tree = ast.parse(src)
    codeflash_output = extract_init_stub_from_class("AsyncInit", src, module_tree); res = codeflash_output # 11.8μs -> 6.52μs (80.3% faster)
    expected = "class AsyncInit:\n" + "    async def __init__(self):\n" + "        await do_something()"

def test_large_number_of_classes_extracts_target_quickly():
    # Create a module with 1000 classes named C0..C999, each with a simple __init__.
    N = 1000
    lines = []
    for i in range(N):
        # build each class block; keep it small to keep this test fast but large enough to simulate scale
        lines.append(f"class C{i}:")
        lines.append("    def __init__(self):")
        lines.append(f"        self.x = {i}")
        lines.append("")  # blank line between classes
    src = "\n".join(lines)
    module_tree = ast.parse(src)

    # Pick a middle class to ensure traversal through many nodes
    target = "C512"
    codeflash_output = extract_init_stub_from_class(target, src, module_tree); res = codeflash_output # 883μs -> 823μs (7.33% faster)
    # Expected snippet for the target class
    expected = f"class {target}:\n" + "    def __init__(self):\n" + "        self.x = 512"

def test_large_number_of_methods_within_class_extracts_init_correctly():
    # Build a single class with many methods to ensure slicing of lines works across many lines.
    method_count = 1000
    block = ["class ManyMethods:"]
    # put the __init__ after another method (not first) so the line slicing must use the right offsets
    block.append("    def first(self):")
    block.append("        pass")
    block.append("")
    block.append("    def __init__(self):")
    block.append("        self.value = 123")
    block.append("")
    # add many other methods to increase body size
    for i in range(method_count):
        block.append(f"    def m_{i}(self):")
        block.append("        return None")
        block.append("")
    src = "\n".join(block)
    module_tree = ast.parse(src)

    codeflash_output = extract_init_stub_from_class("ManyMethods", src, module_tree); res = codeflash_output # 528μs -> 277μs (90.5% faster)
    expected = "class ManyMethods:\n" + "    def __init__(self):\n" + "        self.value = 123"
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import ast

# imports
import pytest
from codeflash.languages.python.context.code_context_extractor import \
    extract_init_stub_from_class

def test_simple_class_with_init():
    """Test extraction of __init__ from a simple class."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.4μs -> 7.08μs (88.7% faster)

def test_class_with_init_and_post_init():
    """Test extraction when class has both __init__ and __post_init__."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
    
    def __post_init__(self):
        self.y = 2
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 14.0μs -> 7.74μs (80.2% faster)

def test_class_with_property_decorator():
    """Test extraction of methods decorated with @property."""
    source = """class MyClass:
    def __init__(self):
        self._x = 1
    
    @property
    def x(self):
        return self._x
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 16.4μs -> 8.38μs (95.3% faster)

def test_class_with_only_property():
    """Test extraction when class has only property methods and no __init__."""
    source = """class MyClass:
    @property
    def x(self):
        return 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 14.3μs -> 7.21μs (97.8% faster)

def test_nonexistent_class_returns_none():
    """Test that requesting a nonexistent class returns None."""
    source = """class MyClass:
    def __init__(self):
        pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("NonexistentClass", source, tree); result = codeflash_output # 14.7μs -> 11.7μs (24.8% faster)

def test_class_without_init_or_property_returns_none():
    """Test that class without __init__, __post_init__, or @property returns None."""
    source = """class MyClass:
    def some_method(self):
        return 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.0μs -> 5.09μs (116% faster)

def test_class_name_in_result():
    """Test that the class name appears in the result."""
    source = """class TestClass:
    def __init__(self):
        pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("TestClass", source, tree); result = codeflash_output # 11.6μs -> 6.30μs (84.1% faster)

def test_multiple_properties():
    """Test extraction of class with multiple property decorators."""
    source = """class MyClass:
    @property
    def x(self):
        return self._x
    
    @property
    def y(self):
        return self._y
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 16.4μs -> 8.64μs (90.4% faster)

def test_init_with_parameters():
    """Test extraction of __init__ with multiple parameters."""
    source = """class MyClass:
    def __init__(self, a, b, c=None):
        self.a = a
        self.b = b
        self.c = c
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 12.0μs -> 6.71μs (79.0% faster)

def test_init_with_multiline_body():
    """Test extraction of __init__ with multiline body."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
        self.y = 2
        self.z = 3
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.6μs -> 6.54μs (76.6% faster)

def test_attribute_property_decorator():
    """Test extraction with attribute-style property decorator."""
    source = """class MyClass:
    @some_module.property
    def x(self):
        return 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 14.2μs -> 7.07μs (101% faster)

def test_init_with_decorator():
    """Test extraction of __init__ that itself has a decorator."""
    source = """class MyClass:
    @some_decorator
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 12.5μs -> 6.45μs (93.8% faster)

def test_result_format():
    """Test that result follows expected format."""
    source = """class MyClass:
    def __init__(self):
        pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.3μs -> 6.20μs (81.4% faster)
    lines = result.split('\n')

def test_empty_class_body():
    """Test handling of class with empty body (pass statement)."""
    source = """class MyClass:
    pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 9.52μs -> 4.58μs (108% faster)

def test_class_case_sensitive_name():
    """Test that class name matching is case-sensitive."""
    source = """class MyClass:
    def __init__(self):
        pass
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("myclass", source, tree); result = codeflash_output # 14.1μs -> 11.2μs (25.7% faster)

def test_nested_class():
    """Test extraction from nested classes."""
    source = """class Outer:
    class Inner:
        def __init__(self):
            self.x = 1
"""
    tree = ast.parse(source)
    # Should find Inner class (ast.walk finds all classes)
    codeflash_output = extract_init_stub_from_class("Inner", source, tree); result = codeflash_output # 13.1μs -> 8.32μs (57.0% faster)

def test_multiple_classes_same_module():
    """Test extraction when module has multiple classes."""
    source = """class FirstClass:
    def some_method(self):
        pass

class SecondClass:
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("SecondClass", source, tree); result = codeflash_output # 13.6μs -> 8.87μs (53.8% faster)

def test_class_with_init_and_other_methods():
    """Test that other methods are excluded from extraction."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
    
    def other_method(self):
        return self.x * 2
    
    def another_method(self):
        return self.x + 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.8μs -> 7.62μs (81.0% faster)

def test_post_init_without_init():
    """Test extraction of __post_init__ when there's no __init__."""
    source = """class MyClass:
    def __post_init__(self):
        self.y = 2
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.2μs -> 6.32μs (76.7% faster)

def test_async_init():
    """Test extraction of async __init__ if present."""
    source = """class MyClass:
    async def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.4μs -> 6.20μs (83.7% faster)

def test_property_with_setter():
    """Test extraction of property with setter (both have @property-like decorators)."""
    source = """class MyClass:
    @property
    def x(self):
        return self._x
    
    @x.setter
    def x(self, value):
        self._x = value
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 15.1μs -> 7.99μs (89.2% faster)

def test_init_with_string_multiline():
    """Test extraction of __init__ containing multiline strings."""
    source = '''class MyClass:
    def __init__(self):
        """
        This is a docstring
        """
        self.x = 1
'''
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.8μs -> 6.57μs (79.6% faster)

def test_class_with_class_attributes():
    """Test that class attributes (not methods) don't affect extraction."""
    source = """class MyClass:
    class_var = 42
    
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.6μs -> 6.67μs (74.0% faster)

def test_multiple_decorators_on_init():
    """Test __init__ with multiple decorators."""
    source = """class MyClass:
    @decorator1
    @decorator2
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 12.7μs -> 6.96μs (82.7% faster)

def test_class_with_staticmethod():
    """Test that staticmethod is excluded from extraction."""
    source = """class MyClass:
    @staticmethod
    def static_method():
        return 1
    
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.3μs -> 7.33μs (80.9% faster)

def test_class_with_classmethod():
    """Test that classmethod is excluded from extraction."""
    source = """class MyClass:
    @classmethod
    def class_method(cls):
        return cls()
    
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.3μs -> 7.30μs (82.0% faster)

def test_init_with_annotations():
    """Test extraction of __init__ with type annotations."""
    source = """class MyClass:
    def __init__(self, x: int, y: str) -> None:
        self.x = x
        self.y = y
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.6μs -> 6.63μs (74.8% faster)

def test_property_with_annotations():
    """Test extraction of property with return type annotations."""
    source = """class MyClass:
    @property
    def x(self) -> int:
        return self._x
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 13.9μs -> 6.94μs (101% faster)

def test_whitespace_preservation():
    """Test that indentation and whitespace are preserved in extracted stub."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
        self.y = 2
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.3μs -> 6.30μs (79.3% faster)

def test_special_characters_in_init_body():
    """Test extraction of __init__ with special characters in body."""
    source = r"""class MyClass:
    def __init__(self):
        self.pattern = r"\d+"
        self.escaped = "test\\nvalue"
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.5μs -> 6.50μs (76.6% faster)

def test_init_with_default_none():
    """Test __init__ with None as default parameter."""
    source = """class MyClass:
    def __init__(self, value=None):
        self.value = value
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.3μs -> 6.50μs (73.7% faster)

def test_init_with_list_default():
    """Test __init__ with list as default parameter."""
    source = """class MyClass:
    def __init__(self, items=[]):
        self.items = items
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.1μs -> 6.20μs (79.2% faster)

def test_property_on_async_method():
    """Test property decorator on async method (if supported by AST)."""
    source = """class MyClass:
    @property
    async def x(self):
        return 1
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 14.0μs -> 6.93μs (102% faster)

def test_init_with_varargs():
    """Test __init__ with *args."""
    source = """class MyClass:
    def __init__(self, *args):
        self.args = args
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.2μs -> 6.05μs (84.3% faster)

def test_init_with_kwargs():
    """Test __init__ with **kwargs."""
    source = """class MyClass:
    def __init__(self, **kwargs):
        self.kwargs = kwargs
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.4μs -> 6.28μs (81.0% faster)

def test_init_with_args_and_kwargs():
    """Test __init__ with both *args and **kwargs."""
    source = """class MyClass:
    def __init__(self, a, *args, **kwargs):
        self.a = a
        self.args = args
        self.kwargs = kwargs
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 11.3μs -> 6.29μs (80.0% faster)

def test_large_class_with_many_methods():
    """Test extraction from class with many non-relevant methods."""
    # Create a class with 100 regular methods and 1 __init__
    methods = "\n    ".join([f"def method_{i}(self):\n        return {i}" for i in range(100)])
    source = f"""class MyClass:
    def __init__(self):
        self.x = 1
    
    {methods}
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 66.8μs -> 35.4μs (89.0% faster)

def test_class_with_many_properties():
    """Test extraction from class with many property methods."""
    # Create multiple property decorators
    properties = "\n    ".join([
        f"@property\n    def prop_{i}(self):\n        return {i}"
        for i in range(50)
    ])
    source = f"""class MyClass:
    {properties}
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 92.5μs -> 48.9μs (89.2% faster)

def test_module_with_many_classes():
    """Test extraction when module contains many classes."""
    # Create 100 filler classes (50 before and 50 after the target class)
    classes = "\n".join([
        f"""class Class{i}:
    def method(self):
        pass
"""
        for i in range(50)
    ])
    target_class = """class TargetClass:
    def __init__(self):
        self.x = 1
"""
    rest_classes = "\n".join([
        f"""class Class{i+50}:
    def method(self):
        pass
"""
        for i in range(50)
    ])
    source = f"{classes}\n{target_class}\n{rest_classes}"
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("TargetClass", source, tree); result = codeflash_output # 101μs -> 91.7μs (10.5% faster)

def test_init_with_long_body():
    """Test extraction of __init__ with very long body (1000 lines)."""
    body_lines = "\n        ".join([f"self.attr_{i} = {i}" for i in range(1000)])
    source = f"""class MyClass:
    def __init__(self):
        {body_lines}
"""
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 63.0μs -> 57.3μs (10.0% faster)

def test_extraction_performance_many_calls():
    """Test extraction performance with multiple repeated calls."""
    source = """class MyClass:
    def __init__(self):
        self.x = 1
"""
    tree = ast.parse(source)
    # Call extraction 1000 times
    results = []
    for _ in range(1000):
        codeflash_output = extract_init_stub_from_class("MyClass", source, tree); result = codeflash_output # 4.28ms -> 2.06ms (108% faster)
        results.append(result)

def test_large_source_file():
    """Test extraction from a very large source file."""
    # Create a large source file with 500 classes
    classes = []
    for i in range(500):
        if i == 250:  # Target class at middle
            classes.append("""class TargetClass:
    def __init__(self, param1, param2):
        self.p1 = param1
        self.p2 = param2
""")
        else:
            classes.append(f"""class Class{i}:
    def method(self):
        return {i}
""")
    source = "\n".join(classes)
    tree = ast.parse(source)
    codeflash_output = extract_init_stub_from_class("TargetClass", source, tree); result = codeflash_output # 438μs -> 410μs (6.70% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-pr1524-2026-02-18T14.38.26` and push.


claude · 2026-02-18T14:55:55Z

PR Review Summary

Prek Checks

✅ Fixed — 2 issues auto-fixed and committed:

I001 (unsorted-imports): Sorted deque, defaultdict → defaultdict, deque
PLR1730 (if-stmt-min-max): Replaced the manual `if d.lineno < m: m = d.lineno` with `m = min(m, d.lineno)`

Additionally fixed 6 mypy type errors in extract_init_stub_from_class:

Typed BFS deque as deque[ast.AST] (was inferred as deque[Module])
Renamed loop variable to avoid shadowing causing false attribute errors
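mypy infers a container's type parameter from its initializer, so `deque([module_tree])` becomes `deque[Module]`, and a later `extend()` with the generic `ast.AST` children returned by `ast.iter_child_nodes()` is rejected. A minimal sketch of the annotated form follows; `class_names_bfs` is a hypothetical function, and the distinct loop-variable name `current` avoids reusing a variable that mypy has already typed differently.

```python
from __future__ import annotations

import ast
from collections import deque


def class_names_bfs(module_tree: ast.Module) -> list[str]:
    # Explicit element type: without it, mypy infers deque[Module] from the
    # initializer and rejects extend() calls that push arbitrary ast.AST nodes.
    queue: deque[ast.AST] = deque([module_tree])
    names: list[str] = []
    while queue:
        current = queue.popleft()
        if isinstance(current, ast.ClassDef):
            names.append(current.name)
        queue.extend(ast.iter_child_nodes(current))
    return names
```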

All prek and mypy checks pass after fixes.

Code Review

No critical bugs or security vulnerabilities found.

Notable observations:

The Path import was moved to TYPE_CHECKING block — this is safe because the file uses from __future__ import annotations
build_testgen_context has a new optional function_to_optimize parameter — backward compatible
All removed functions (safe_relative_to, is_project_path, extract_init_stub, resolve_transitive_type_deps, etc.) are only used internally in this file; no external callers found
The enrich_testgen_context behavior changed: removed runtime introspection of external libraries (Steps 2 & 3 using importlib/inspect), replaced with AST-based approach via extract_parameter_type_constructors. This is a deliberate architectural change.
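The pattern behind the `Path` change, sketched with a hypothetical `describe` function: under `from __future__ import annotations`, annotations are stored as strings and never evaluated at runtime, so a name used only in annotations can be imported under `TYPE_CHECKING`. The move stays safe only while `Path` appears solely in annotations; a runtime use such as `Path(...)` or `isinstance(x, Path)` would raise `NameError`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; this import never executes at runtime.
    from pathlib import Path


def describe(path: Path) -> str:
    # The annotation is kept as the string "Path", so no runtime import is needed
    # as long as the body only uses attributes of the argument.
    return f"{path.name} ({len(path.parts)} parts)"
```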

Test Coverage

File	Stmts (main)	Stmts (PR)	Coverage (main)	Coverage (PR)	Delta
`code_context_extractor.py`	620	554	85%	91%	+6% ✅
`test_code_context_extractor.py`	1044	1052	98%	98%	0%
Total	1664	1606	93%	95%	+2% ✅

Coverage improved from 85% → 91% for the main source file. Tests were updated to match the new API (removed functions replaced with new test cases). 8 pre-existing test failures in test_tracer.py are unrelated to this PR.

Last updated: 2026-02-18T14:57Z

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Feb 18, 2026

codeflash-ai bot mentioned this pull request Feb 18, 2026

fixes-for-core-unstructured-experimental #1524

Merged


style: auto-fix linting issues and resolve mypy type errors

ae740d9

KRRT7 merged commit 2364096 into fixes-for-core-unstructured-experimental Feb 18, 2026
26 of 27 checks passed

KRRT7 deleted the codeflash/optimize-pr1524-2026-02-18T14.38.26 branch February 18, 2026 14:44

KRRT7 merged 2 commits into fixes-for-core-unstructured-experimental from codeflash/optimize-pr1524-2026-02-18T14.38.26
