⚡️ Speed up function discover_functions_from_source by 12% in PR #1199 (omni-java) #1293

Closed
codeflash-ai[bot] wants to merge 1 commit into omni-java from codeflash/optimize-pr1199-2026-02-03T07.34.56

@codeflash-ai codeflash-ai bot commented Feb 3, 2026

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.


📄 12% (0.12x) speedup for discover_functions_from_source in codeflash/languages/java/discovery.py

⏱️ Runtime : 21.3 milliseconds → 19.0 milliseconds (best of 19 runs)

📝 Explanation and details

This optimization achieves an 11% reduction in runtime (from 21.3ms to 19.0ms, which corresponds to the 12% speedup reported above) through three targeted changes to the Java code discovery pipeline:
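The 12% and 11% figures describe the same measurement under two definitions. A minimal sketch of the arithmetic, assuming the reported speedup is the original time divided by the optimized time, minus one (the function names are illustrative and not part of Codeflash's API):

```python
def runtime_reduction(original: float, optimized: float) -> float:
    """Fraction of the original runtime that was removed."""
    return (original - optimized) / original


def speedup(original: float, optimized: float) -> float:
    """Relative gain measured against the optimized runtime (assumed definition)."""
    return original / optimized - 1


print(f"{runtime_reduction(21.3, 19.0):.1%}")  # 10.8%
print(f"{speedup(21.3, 19.0):.1%}")            # 12.1%
print(f"{speedup(2.65, 7.18):.1%}")            # -63.1%
```

Under the same assumed definition, a test that goes from 2.65μs to 7.18μs yields -63.1%, which matches the "63.1% slower" annotation in the generated tests.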

Key Optimizations

1. Module-Level Import Hoisting

The fnmatch module is now imported once at the module level instead of being imported inside _should_include_method each time a pattern check runs. Python caches a module after its first import, but every later execution of the import statement still pays for a module-cache lookup and a local name binding. Hoisting the import removes that per-call cost when filtering methods with include/exclude patterns; the line profiler showed pattern matching checks consuming ~5-11% of total time in the original code.
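The difference between the two import placements can be sketched as follows. The helper names are hypothetical and are not the functions in discovery.py; both variants return the same result and differ only in where the import statement runs:

```python
import fnmatch  # hoisted: bound once when the module is loaded


def matches_any_hoisted(name: str, patterns: list[str]) -> bool:
    """Return True if `name` matches at least one glob pattern."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def matches_any_local_import(name: str, patterns: list[str]) -> bool:
    """Same result, but the import statement is re-executed on every call."""
    import fnmatch as _fnmatch  # module-cache lookup + local binding per call

    return any(_fnmatch.fnmatch(name, pattern) for pattern in patterns)
```

The per-call cost of the local import is small, so it only becomes visible when the function sits in a hot loop.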

2. Default Path Pre-computation

The fallback Path("unknown.java") is now computed once before the loop (default_file_path = file_path or Path("unknown.java")) rather than 1,224 times inside the loop. The line profiler shows this change reduced time spent on the file_path assignment from 12.2% to 0.3% of total function time - a critical improvement since this line was the second-most expensive operation in the original code.
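A minimal sketch of the pre-computation, using hypothetical helper names rather than the actual loop in discover_functions_from_source:

```python
from pathlib import Path


def assign_paths_in_loop(names, file_path=None):
    # Builds a new fallback Path for every item when file_path is None.
    return [(name, file_path or Path("unknown.java")) for name in names]


def assign_paths_precomputed(names, file_path=None):
    # Resolve the fallback once; every item then shares the same Path object.
    default_file_path = file_path or Path("unknown.java")
    return [(name, default_file_path) for name in names]
```

One trade-off of the second form: the fallback is constructed even when `names` is empty, so the loop body never gets a chance to amortize it.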

3. Early Exit Reordering in Filters

The include_methods check is moved earlier in _should_include_method, before the more expensive pattern matching operations. This allows the function to exit early for methods that should be excluded due to being class methods, avoiding unnecessary fnmatch calls. The line count calculation is also made conditional - only computed when min_lines or max_lines criteria are actually set, reducing unnecessary arithmetic for 1,022 out of 1,341 invocations.
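The ordering described above can be sketched with simplified stand-ins. `Criteria` and `Method` here are hypothetical reductions of FunctionFilterCriteria and JavaMethodNode, and the function is an illustration of the check order, not the code in discovery.py:

```python
import fnmatch
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Criteria:  # hypothetical stand-in for FunctionFilterCriteria
    include_patterns: list = field(default_factory=list)
    exclude_patterns: list = field(default_factory=list)
    include_methods: bool = True
    min_lines: Optional[int] = None
    max_lines: Optional[int] = None


@dataclass
class Method:  # hypothetical stand-in for JavaMethodNode
    name: str
    class_name: Optional[str]
    start_line: int
    end_line: int


def should_include(method: Method, criteria: Criteria) -> bool:
    # Cheap boolean check first, so class methods exit before any fnmatch call.
    if not criteria.include_methods and method.class_name is not None:
        return False
    if criteria.include_patterns and not any(
        fnmatch.fnmatch(method.name, p) for p in criteria.include_patterns
    ):
        return False
    if any(fnmatch.fnmatch(method.name, p) for p in criteria.exclude_patterns):
        return False
    # Compute the line count only when a bound is actually configured.
    if criteria.min_lines is not None or criteria.max_lines is not None:
        method_lines = method.end_line - method.start_line + 1
        if criteria.min_lines is not None and method_lines < criteria.min_lines:
            return False
        if criteria.max_lines is not None and method_lines > criteria.max_lines:
            return False
    return True
```

With default criteria, neither the pattern loops nor the line arithmetic runs, which is the common case the reordering targets.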

Performance Impact by Test Case

The optimizations particularly benefit scenarios with:

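  • Multiple methods, with or without patterns: 9-31% faster (e.g., the include-pattern case in test_include_and_exclude_patterns_filtering is 9.91% faster, and test_large_scale_many_methods_under_limit, which sets no patterns, shows a 29.9% improvement)
  • File path handling: Tests that provide explicit paths see 3-9% improvements (3.09% and 9.17% in the generated tests)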
  • Line count filtering: 18.6% faster when min/max line criteria are active

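Several tests ran slower, and not slightly: the generated tests in which every method is filtered out went from roughly 1-3μs to 3-7μs (reported as 58-67% slower, e.g. 2.65μs -> 7.18μs), and the empty-source and no-method tests were 11-17% slower. These are edge cases with at most one or two methods, so the absolute cost is a few microseconds per call. Part of this is the overhead of the additional conditional check (if criteria.min_lines is not None or criteria.max_lines is not None), which exceeds the savings when there is so little work to amortize it over. A plausible additional contributor, not confirmed by the profiler figures quoted here, is that the fallback Path is now built before the loop even when no method survives filtering.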
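Small-input overheads like these can be probed directly with a micro-benchmark. The sketch below uses the standard timeit module; the helper name and the two measured statements are illustrative choices, and the numbers it prints depend on the machine, so no expected output is shown:

```python
import timeit
from pathlib import Path


def cost_per_call_us(stmt: str, setup: str = "pass", number: int = 20_000) -> float:
    """Average cost of one execution of `stmt`, in microseconds."""
    total_seconds = timeit.timeit(
        stmt, setup=setup, number=number, globals={"Path": Path}
    )
    return total_seconds / number * 1e6


# Cost of the added None check when neither bound is configured.
check_us = cost_per_call_us(
    "min_lines is not None or max_lines is not None",
    setup="min_lines = None; max_lines = None",
)
# Cost of constructing the fallback path once.
fallback_us = cost_per_call_us('Path("unknown.java")')

print(f"None check: {check_us:.3f} us, Path construction: {fallback_us:.3f} us")
```

Comparing the two figures against the few-microsecond deltas in the regressed tests indicates which fixed cost dominates when the loop has little or nothing to process.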

Why This Matters

While individual micro-optimizations are small, they compound significantly in the hot loop that processes all discovered methods. With 1,650+ method invocations in typical runs, eliminating repeated imports, reducing object allocations, and enabling early exits creates measurable aggregate savings. The 11% runtime improvement demonstrates how loop-level optimizations scale effectively for Java codebases with many methods.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 18 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.languages.base import FunctionFilterCriteria
from codeflash.languages.java.discovery import discover_functions_from_source
from codeflash.languages.java.parser import JavaAnalyzer, JavaMethodNode
from codeflash.models.function_types import FunctionParent

# Helper to create a JavaMethodNode with minimal required fields.
def make_method(
    name: str,
    class_name: str | None = None,
    return_type: str | None = "int",
    is_abstract: bool = False,
    javadoc_start_line: int | None = None,
    start_line: int = 1,
    end_line: int = 1,
    start_col: int = 0,
    end_col: int = 0,
) -> JavaMethodNode:
    """
    Create a JavaMethodNode used in tests.

    We pass a plain object() for the `node` parameter because the real tree-sitter
    Node is not required for the logic inside discover_functions_from_source;
    the tests that rely on analyzer.has_return_statement will provide a stubbed
    has_return_statement implementation.
    """
    return JavaMethodNode(
        name=name,
        node=object(),  # Node is typed but not runtime-enforced
        start_line=start_line,
        end_line=end_line,
        start_col=start_col,
        end_col=end_col,
        is_static=False,
        is_public=True,
        is_private=False,
        is_protected=False,
        is_abstract=is_abstract,
        is_synchronized=False,
        return_type=return_type,
        class_name=class_name,
        source_text=f"// source for {name}",
        javadoc_start_line=javadoc_start_line,
    )

def attach_methods_to_analyzer(analyzer: JavaAnalyzer, methods: list[JavaMethodNode], return_map: dict[int | str, bool] | None = None):
    """
    Attach fake find_methods and has_return_statement implementations to a real
    JavaAnalyzer instance.

    - find_methods must accept the same signature used by discover_functions_from_source.
    - has_return_statement should return True/False according to return_map when provided,
      otherwise default to True for non-void return types, False for void.
    """
    # Provide find_methods with the expected signature.
    def fake_find_methods(source, include_private=True, include_static=True):
        # ignore parameters, just return our prepared list
        return methods

    # Prepare has_return_statement behavior
    def fake_has_return_statement(method, source):
        # If a mapping provided, allow keys by method name or start_line for lookup.
        if return_map:
            if method.name in return_map:
                return bool(return_map[method.name])
            if method.start_line in return_map:
                return bool(return_map[method.start_line])
        # Default behavior: non-void => True, void => False
        return method.return_type != "void"

    # Attach to the instance
    analyzer.find_methods = fake_find_methods  # type: ignore[assignment]
    analyzer.has_return_statement = fake_has_return_statement  # type: ignore[assignment]

def test_basic_inclusion_creates_function_to_optimize_and_parents():
    # Basic: a normal method with a non-void return and a class name should be discovered.

    method = make_method(
        name="compute",
        class_name="Calculator",
        return_type="int",
        is_abstract=False,
        javadoc_start_line=10,
        start_line=20,
        end_line=25,
        start_col=4,
        end_col=20,
    )

    analyzer = JavaAnalyzer()
    attach_methods_to_analyzer(analyzer, [method])

    # Provide an explicit file_path to ensure the returned object's file_path is preserved.
    path = Path("src/Calculator.java")
    codeflash_output = discover_functions_from_source("dummy source", file_path=path, analyzer=analyzer); results = codeflash_output # 19.5μs -> 18.9μs (3.09% faster)

    func = results[0]

    # Parents should be a list containing a FunctionParent that stringifies to 'ClassDef:Calculator'
    parents = func.parents

def test_abstract_methods_are_excluded():
    # Edge: abstract methods should be skipped by discover_functions_from_source.

    abstract_method = make_method(name="doWork", class_name="Worker", is_abstract=True)
    analyzer = JavaAnalyzer()
    attach_methods_to_analyzer(analyzer, [abstract_method])

    codeflash_output = discover_functions_from_source("dummy", analyzer=analyzer); results = codeflash_output # 2.65μs -> 7.18μs (63.1% slower)

def test_constructor_methods_are_excluded_when_name_matches_class():
    # Edge: constructors (method.name == class_name) are skipped.

    constructor = make_method(name="Widget", class_name="Widget", return_type="void")
    analyzer = JavaAnalyzer()
    attach_methods_to_analyzer(analyzer, [constructor])

    codeflash_output = discover_functions_from_source("dummy", analyzer=analyzer); results = codeflash_output # 2.64μs -> 7.05μs (62.6% slower)

def test_include_and_exclude_patterns_filtering():
    # Patterns: include_patterns must match method name to include; exclude_patterns must filter out.

    m1 = make_method(name="alpha", class_name="A")
    m2 = make_method(name="beta", class_name="B")
    analyzer = JavaAnalyzer()
    attach_methods_to_analyzer(analyzer, [m1, m2])

    # If we include only 'alpha', we should get only that function
    include_criteria = FunctionFilterCriteria(include_patterns=["alpha"])
    codeflash_output = discover_functions_from_source("dummy", filter_criteria=include_criteria, analyzer=analyzer); results_inc = codeflash_output # 30.1μs -> 27.4μs (9.91% faster)

    # If we exclude 'beta', we should only see 'alpha'
    exclude_criteria = FunctionFilterCriteria(exclude_patterns=["beta"])
    codeflash_output = discover_functions_from_source("dummy", filter_criteria=exclude_criteria, analyzer=analyzer); results_exc = codeflash_output # 15.1μs -> 14.7μs (2.83% faster)

def test_require_return_filters_void_and_missing_return():
    # require_return True should exclude void return types and those without actual return statements.

    # Method with void return -> excluded
    void_method = make_method(name="log", class_name="L", return_type="void")
    # Method with non-void but analyzer reports no return -> excluded
    no_return_method = make_method(name="maybe", class_name="M", return_type="int")
    analyzer = JavaAnalyzer()
    # Provide return_map so that 'maybe' has no actual return
    attach_methods_to_analyzer(analyzer, [void_method, no_return_method], return_map={"maybe": False})

    criteria = FunctionFilterCriteria(require_return=True)
    codeflash_output = discover_functions_from_source("dummy", filter_criteria=criteria, analyzer=analyzer); results = codeflash_output # 2.85μs -> 6.84μs (58.3% slower)

    # If require_return is False, the non-void method that lacks a return should be included,
    # and the void method should also be included, because the criteria do not forbid void when require_return is False.
    criteria_no_req = FunctionFilterCriteria(require_return=False)
    codeflash_output = discover_functions_from_source("dummy", filter_criteria=criteria_no_req, analyzer=analyzer); results_no_req = codeflash_output # 27.6μs -> 21.0μs (31.3% faster)
    names = sorted(getattr(f, "function_name") for f in results_no_req)

def test_include_methods_flag_filters_out_class_methods():
    # If include_methods is False, methods that have a class_name should be excluded.

    m = make_method(name="helper", class_name="Util", return_type="int")
    analyzer = JavaAnalyzer()
    attach_methods_to_analyzer(analyzer, [m])

    criteria = FunctionFilterCriteria(include_methods=False)
    codeflash_output = discover_functions_from_source("dummy", filter_criteria=criteria, analyzer=analyzer); results = codeflash_output # 2.11μs -> 5.78μs (63.5% slower)

def test_line_count_filters_min_and_max_lines():
    # Test min_lines and max_lines behavior using method_lines = end_line - start_line + 1.

    # This method has 5 lines (10..14 inclusive)
    method = make_method(name="longOne", class_name="C", start_line=10, end_line=14, return_type="int")
    analyzer = JavaAnalyzer()
    attach_methods_to_analyzer(analyzer, [method])

    # min_lines greater than actual should exclude it
    criteria_min = FunctionFilterCriteria(min_lines=6)
    codeflash_output = discover_functions_from_source("dummy", filter_criteria=criteria_min, analyzer=analyzer); res_min = codeflash_output # 2.17μs -> 6.28μs (65.5% slower)

    # max_lines smaller than actual should exclude it
    criteria_max = FunctionFilterCriteria(max_lines=4)
    codeflash_output = discover_functions_from_source("dummy", filter_criteria=criteria_max, analyzer=analyzer); res_max = codeflash_output # 1.15μs -> 3.44μs (66.6% slower)

    # When within bounds it should be included
    criteria_ok = FunctionFilterCriteria(min_lines=1, max_lines=10)
    codeflash_output = discover_functions_from_source("dummy", filter_criteria=criteria_ok, analyzer=analyzer); res_ok = codeflash_output # 18.8μs -> 15.9μs (18.6% faster)

def test_javadoc_and_provided_file_path_propagated():
    # Ensure javadoc_start_line and provided file_path are propagated in the returned FunctionToOptimize.

    method = make_method(name="doced", class_name="D", javadoc_start_line=42, start_line=100, end_line=102)
    analyzer = JavaAnalyzer()
    attach_methods_to_analyzer(analyzer, [method])

    file_path = Path("project/src/D.java")
    codeflash_output = discover_functions_from_source("dummy", file_path=file_path, analyzer=analyzer); results = codeflash_output # 15.8μs -> 14.4μs (9.17% faster)
    func = results[0]

def test_large_scale_many_methods_under_limit():
    # Large-scale: generate a number of methods to ensure discover_functions_from_source scales.
    # Keep the count under the specified 1000-element guideline; use 200 as a representative large test.

    count = 200
    methods = []
    for i in range(count):
        # Create methods with unique names; all non-abstract, non-void, and with class_name
        m = make_method(
            name=f"m{i}",
            class_name=f"C{i % 5}",  # many methods share some classes
            return_type="int",
            is_abstract=False,
            start_line=1 + i * 10,
            end_line=2 + i * 10,
        )
        methods.append(m)

    analyzer = JavaAnalyzer()
    # All methods are guaranteed to have return statements in this test
    attach_methods_to_analyzer(analyzer, methods, return_map=None)

    codeflash_output = discover_functions_from_source("dummy large source", analyzer=analyzer); results = codeflash_output # 1.50ms -> 1.15ms (29.9% faster)

    # Spot check first, middle, last items to ensure ordering and correct bookkeeping.
    first = results[0]
    middle = results[count // 2]
    last = results[-1]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from pathlib import Path
from unittest.mock import MagicMock, Mock, patch

# imports
import pytest
from codeflash.languages.base import FunctionFilterCriteria
from codeflash.languages.java.discovery import (_should_include_method,
                                                discover_functions_from_source)
from codeflash.languages.java.parser import JavaAnalyzer, JavaMethodNode
from codeflash.models.function_types import FunctionParent
from tree_sitter import Node

def test_discover_functions_from_source_empty_source():
    """Test with empty source code - should return empty list."""
    codeflash_output = discover_functions_from_source(""); result = codeflash_output # 23.5μs -> 28.4μs (17.4% slower)

def test_discover_functions_from_source_no_methods():
    """Test with source code containing no methods - should return empty list."""
    source = "public class EmptyClass { }"
    codeflash_output = discover_functions_from_source(source); result = codeflash_output # 37.2μs -> 42.0μs (11.3% slower)

def test_should_include_method_with_all_criteria():
    """Test _should_include_method with comprehensive criteria."""
    mock_node = Mock(spec=Node)
    mock_analyzer = Mock(spec=JavaAnalyzer)
    
    method = JavaMethodNode(
        name="getValue",
        node=mock_node,
        start_line=2,
        end_line=5,
        start_col=4,
        end_col=6,
        is_static=False,
        is_public=True,
        is_private=False,
        is_protected=False,
        is_abstract=False,
        is_synchronized=False,
        return_type="int",
        class_name="TestClass",
        source_text="public int getValue() { return 42; }",
        javadoc_start_line=None
    )
    
    criteria = FunctionFilterCriteria(
        include_patterns=["get*"],
        exclude_patterns=["*Helper"],
        require_return=True,
        include_methods=True,
        min_lines=3,
        max_lines=10
    )
    
    mock_analyzer.has_return_statement.return_value = True
    source = "source code"
    
    result = _should_include_method(method, criteria, source, mock_analyzer)

def test_should_include_method_abstract_excluded():
    """Test _should_include_method excludes abstract methods."""
    mock_node = Mock(spec=Node)
    mock_analyzer = Mock(spec=JavaAnalyzer)
    
    method = JavaMethodNode(
        name="abstractMethod",
        node=mock_node,
        start_line=2,
        end_line=2,
        start_col=4,
        end_col=6,
        is_static=False,
        is_public=True,
        is_private=False,
        is_protected=False,
        is_abstract=True,
        is_synchronized=False,
        return_type="void",
        class_name="TestClass",
        source_text="abstract void abstractMethod();",
        javadoc_start_line=None
    )
    
    criteria = FunctionFilterCriteria()
    source = "source code"
    
    result = _should_include_method(method, criteria, source, mock_analyzer)

def test_should_include_method_constructor_excluded():
    """Test _should_include_method excludes constructors."""
    mock_node = Mock(spec=Node)
    mock_analyzer = Mock(spec=JavaAnalyzer)
    
    method = JavaMethodNode(
        name="TestClass",
        node=mock_node,
        start_line=2,
        end_line=4,
        start_col=4,
        end_col=6,
        is_static=False,
        is_public=True,
        is_private=False,
        is_protected=False,
        is_abstract=False,
        is_synchronized=False,
        return_type=None,
        class_name="TestClass",
        source_text="public TestClass() { }",
        javadoc_start_line=None
    )
    
    criteria = FunctionFilterCriteria()
    source = "source code"
    
    result = _should_include_method(method, criteria, source, mock_analyzer)

To edit these changes, run git checkout codeflash/optimize-pr1199-2026-02-03T07.34.56 and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Feb 3, 2026
@codeflash-ai codeflash-ai bot mentioned this pull request Feb 3, 2026
KRRT7 commented Feb 19, 2026

Closing stale bot PR.

@KRRT7 KRRT7 closed this Feb 19, 2026
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1199-2026-02-03T07.34.56 branch February 19, 2026 13:02