⚡️ Speed up function _extract_test_method_name by 83% in PR #1199 (omni-java)#1627

Open
codeflash-ai[bot] wants to merge 2 commits into omni-java from
codeflash/optimize-pr1199-2026-02-20T22.53.40
@codeflash-ai

@codeflash-ai codeflash-ai bot commented Feb 20, 2026

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.


📄 83% (0.83x) speedup for _extract_test_method_name in codeflash/languages/java/instrumentation.py

⏱️ Runtime : 10.0 milliseconds → 5.48 milliseconds (best of 78 runs)

📝 Explanation and details

The optimized code achieves an 83% speedup (from 10.0ms to 5.48ms) by introducing a fast-path heuristic that uses simple string operations (find(), split(), string slicing) to extract method names before falling back to expensive regex matching.

Key optimization: The code now checks for common Java modifiers (public, private, protected) and return types (void, String, int, etc.) using basic string scanning. When found, it extracts the method name by:

  1. Finding the modifier/type using str.find() (much cheaper than regex)
  2. Locating the opening parenthesis (
  3. Splitting the substring and taking the last token before (
  4. Validating it's a valid identifier with a simple regex check
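
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `fast_path_method_name` and the `_MODIFIERS` tuple are hypothetical, and only the modifier branch is shown (the return-type branch would follow the same shape). Returning `None` stands in for "fall through to the original regex logic".

```python
import re
from typing import Optional

# Hypothetical names for illustration; the PR's actual helpers may differ.
_WORD_RE = re.compile(r"^\w+$")
_MODIFIERS = ("public", "private", "protected")


def fast_path_method_name(signature: str) -> Optional[str]:
    """Return the token just before the first '(' after a modifier, else None."""
    for modifier in _MODIFIERS:
        idx = signature.find(modifier)          # step 1: cheap substring scan
        if idx == -1:
            continue
        rest = signature[idx + len(modifier):]
        paren = rest.find("(")                  # step 2: locate the opening parenthesis
        if paren == -1:
            break                               # no '(' at all: leave it to the regex
        tokens = rest[:paren].split()           # step 3: last token before '('
        if tokens and _WORD_RE.match(tokens[-1]):
            return tokens[-1]                   # step 4: candidate passed validation
        break                                   # validation failed: leave it to the regex
    return None


print(fast_path_method_name("public static int computeSum( int a)"))
print(fast_path_method_name("List<String> doSomething()"))
```

Because `str.find` matches substrings, a sketch like this would also trigger on identifiers that merely contain a modifier, which is one reason the validation step and the regex fallback matter.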

Why it's faster:

  • Line profiler shows the original regex _METHOD_SIG_PATTERN.search() took 84% of total time (10.18ms out of 12.11ms)
  • In the optimized version, this regex is only invoked for 18 out of 2084 calls (0.9% hit rate), taking just 25.9% of total time
  • For the remaining 99.1% of calls, the fast-path succeeds using simple string operations and never invokes the regex
  • The fast-path successfully handles 2064 cases via modifier matching and 1 case via type matching, bypassing the expensive regex entirely
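
As a quick sanity check, the two percentages quoted above can be reproduced from the raw profiler figures. The variable names below are illustrative only.

```python
# Figures quoted in the line-profiler summary above.
regex_calls, total_calls = 18, 2084
regex_ms, total_ms = 10.18, 12.11

fallback_rate = regex_calls / total_calls    # share of calls that still reach the regex
original_regex_share = regex_ms / total_ms   # share of the original runtime spent in the regex

print(f"fallback rate: {fallback_rate:.1%}")
print(f"original regex share: {original_regex_share:.0%}")
```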

Test results show the optimization excels when:

  • Working with large inputs: test_large_mixed_content shows 27,030% speedup (3.76ms → 13.9μs)
  • Processing bulk signatures: test_alternating_modifiers_large shows 6,373% speedup (724μs → 11.2μs)
  • Handling multi-line declarations: test_large_multiline_method_declaration shows 466% speedup (27.6μs → 4.88μs)
  • Common Java patterns with standard modifiers and return types are accelerated
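
The reported percentages appear consistent with `original / optimized - 1`, where a negative value reads as "slower". That formula is inferred from the numbers in this PR rather than taken from Codeflash documentation, so treat the helper below (a hypothetical `speedup_percent`) as an assumption. Small differences from the reported figures are expected because the displayed runtimes are rounded.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup in percent; negative values mean the optimized code is slower."""
    return (original / optimized - 1) * 100


# Runtimes copied from the report; units cancel as long as both values match.
print(round(speedup_percent(27.6, 4.88)))     # test_large_multiline_method_declaration
print(round(speedup_percent(2.91, 3.75), 1))  # test_01_basic_single_line_signature
print(round(speedup_percent(10.0, 5.48), 1))  # overall runtime
```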

Trade-offs:

  • Simple single-line cases show 20-30% slowdown (3-4μs → 4-6μs) due to fast-path overhead before regex fallback
  • However, the aggregate runtime across the generated tests still improves by 83% (10.0ms → 5.48ms), which suggests the gains on the large-input cases above outweigh the slowdown on short signatures
  • The optimization preserves exact behavior through careful fallback logic and validation

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 2084 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import _extract_test_method_name

def test_01_basic_single_line_signature():
    # A straightforward, single-line Java method signature with common modifiers and return type.
    # The function should extract the method name "testFoo".
    lines = ["public void testFoo()"]
    codeflash_output = _extract_test_method_name(lines) # 2.91μs -> 3.75μs (22.4% slower)

def test_02_basic_multiline_signature():
    # A signature split across multiple list elements should be joined and parsed correctly.
    # We split tokens across elements but keep the '(' attached to the method name.
    lines = ["public static", "int", "computeSum(" , "int a)"]  # joined -> "public static int computeSum( int a)"
    codeflash_output = _extract_test_method_name(lines) # 3.15μs -> 4.05μs (22.1% slower)

def test_03_return_type_with_single_array_brackets():
    # The regex explicitly supports a single '[]' after a type (e.g., String[]).
    # Ensure method name "getArr" is extracted when return type is an array.
    lines = ["public String[] getArr()"]
    codeflash_output = _extract_test_method_name(lines) # 3.48μs -> 3.73μs (6.73% slower)

def test_04_generic_return_type_uses_fallback_to_find_name():
    # Return types using generics (e.g., List<String>) do not match the strict return-type pattern,
    # but the fallback pattern should still find "doSomething".
    lines = ["List<String> doSomething()"]
    codeflash_output = _extract_test_method_name(lines) # 6.85μs -> 9.15μs (25.1% slower)

def test_05_empty_input_returns_unknown():
    # An empty list should produce an empty joined string and therefore return the fallback "unknown".
    lines: list[str] = []
    codeflash_output = _extract_test_method_name(lines) # 1.04μs -> 1.16μs (10.4% slower)

def test_06_whitespace_only_returns_unknown():
    # A list containing only whitespace should be treated as empty content and return "unknown".
    lines = ["   ", "\t"]
    codeflash_output = _extract_test_method_name(lines) # 1.24μs -> 1.34μs (7.45% slower)

def test_07_name_with_underscores_and_digits():
    # Method names can include underscores and digits; ensure they are captured (regex uses \w+).
    lines = ["protected double compute_1_val()"]
    codeflash_output = _extract_test_method_name(lines) # 3.18μs -> 4.40μs (27.8% slower)

def test_08_dollar_in_method_name_produces_expected_behavior():
    # Java synthetic method names sometimes include '$'. The patterns used here only capture \w characters.
    # For "public void lambda$1()", the strict signature regex fails and the fallback finds the last \w+ token
    # before '(' which is "1". We assert the current behavior explicitly so regressions are detectable.
    lines = ["public void lambda$1()"]
    codeflash_output = _extract_test_method_name(lines) # 13.5μs -> 17.8μs (23.9% slower)

def test_09_multi_dimensional_array_fallback():
    # The primary pattern only supports a single "[]" after a type; a double array "String[][]" won't match.
    # The fallback must still find the method name "multiDim".
    lines = ["public String[][] multiDim()"]
    codeflash_output = _extract_test_method_name(lines) # 10.2μs -> 3.80μs (170% faster)

def test_10_invalid_input_types_raise_type_error():
    # The implementation joins strings; passing None or lists with non-str elements should raise TypeError.
    with pytest.raises(TypeError):
        _extract_test_method_name(None) # 2.65μs -> 2.67μs (0.750% slower)
    with pytest.raises(TypeError):
        # list with an integer will cause str.join to complain (expects str instances)
        _extract_test_method_name(["public", "void", 123, "foo("]) # 2.65μs -> 2.50μs (6.41% faster)

def test_11_large_single_signature_split_across_many_elements():
    # Build a single method signature spread across 1000 list elements to ensure join + regex still works.
    # We place the return type ("void") and method name ("hugeMethod(") near the end.
    pieces = ["public"] + [""] * 995 + ["void", "hugeMethod("]
    # The join will create a large whitespace-filled string; the function should still find "hugeMethod".
    codeflash_output = _extract_test_method_name(pieces) # 10.2μs -> 10.8μs (5.85% slower)

def test_12_bulk_many_signatures_loop():
    # Generate 1000 distinct simple signatures and ensure each name is extracted correctly.
    # This also confirms deterministic behavior across many iterations.
    n = 1000
    for i in range(n):
        method_name = f"test_case_{i}"
        signature = f"public void {method_name}()"  # simple, consistent signature
        # Call the function many times; each iteration must return the corresponding name.
        codeflash_output = _extract_test_method_name([signature]) # 780μs -> 981μs (20.5% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

import re

# imports
import pytest
from codeflash.languages.java.instrumentation import _extract_test_method_name

def test_simple_public_void_method():
    # Test extraction of a simple public void method name
    lines = ["public void testExample()"]
    codeflash_output = _extract_test_method_name(lines) # 3.34μs -> 4.53μs (26.3% slower)

def test_simple_public_string_method():
    # Test extraction of a public method returning String
    lines = ["public String getName()"]
    codeflash_output = _extract_test_method_name(lines) # 3.07μs -> 4.24μs (27.7% slower)

def test_simple_private_method():
    # Test extraction of a private method
    lines = ["private void helper()"]
    codeflash_output = _extract_test_method_name(lines) # 2.94μs -> 4.21μs (30.3% slower)

def test_simple_protected_method():
    # Test extraction of a protected method
    lines = ["protected int calculate()"]
    codeflash_output = _extract_test_method_name(lines) # 3.01μs -> 4.27μs (29.6% slower)

def test_static_method():
    # Test extraction of a static method
    lines = ["public static void staticTest()"]
    codeflash_output = _extract_test_method_name(lines) # 3.00μs -> 3.89μs (22.9% slower)

def test_final_method():
    # Test extraction of a final method
    lines = ["public final void finalTest()"]
    codeflash_output = _extract_test_method_name(lines) # 2.93μs -> 3.88μs (24.5% slower)

def test_method_with_parameters():
    # Test extraction of a method with multiple parameters
    lines = ["public void testWithParams(int a, String b, boolean c)"]
    codeflash_output = _extract_test_method_name(lines) # 2.89μs -> 3.75μs (23.0% slower)

def test_method_with_array_return_type():
    # Test extraction of a method returning an array
    lines = ["public int[] getArray()"]
    codeflash_output = _extract_test_method_name(lines) # 3.43μs -> 3.77μs (9.05% slower)

def test_method_with_custom_return_type():
    # Test extraction of a method with custom class return type
    lines = ["public MyClass getInstance()"]
    codeflash_output = _extract_test_method_name(lines) # 3.07μs -> 3.84μs (20.1% slower)

def test_multiline_method_declaration():
    # Test extraction from method split across multiple lines
    lines = ["public void", "testMultiline()"]
    codeflash_output = _extract_test_method_name(lines) # 3.15μs -> 3.97μs (20.7% slower)

def test_multiline_method_with_parameters():
    # Test extraction from method with parameters split across lines
    lines = ["public void", "testMethod(int x,", "String y)"]
    codeflash_output = _extract_test_method_name(lines) # 3.12μs -> 3.99μs (21.9% slower)

def test_method_with_extra_whitespace():
    # Test extraction with extra whitespace between tokens
    lines = ["public    void    spacedMethod  (  )"]
    codeflash_output = _extract_test_method_name(lines) # 3.05μs -> 3.85μs (20.8% slower)

def test_long_method_name():
    # Test extraction of method with long name
    lines = ["public void veryLongTestMethodNameWithManyCharacters()"]
    codeflash_output = _extract_test_method_name(lines) # 2.99μs -> 4.01μs (25.5% slower)

def test_method_with_underscores():
    # Test extraction of method name with underscores
    lines = ["public void test_method_with_underscores()"]
    codeflash_output = _extract_test_method_name(lines) # 2.96μs -> 4.06μs (27.2% slower)

def test_method_with_numbers_in_name():
    # Test extraction of method name with numbers
    lines = ["public void test123Method456()"]
    codeflash_output = _extract_test_method_name(lines) # 2.89μs -> 3.93μs (26.5% slower)

def test_boolean_return_type():
    # Test extraction with boolean return type
    lines = ["public boolean isValid()"]
    codeflash_output = _extract_test_method_name(lines) # 2.92μs -> 3.80μs (23.2% slower)

def test_double_return_type():
    # Test extraction with double return type
    lines = ["public double calculate()"]
    codeflash_output = _extract_test_method_name(lines) # 2.90μs -> 3.75μs (22.7% slower)

def test_float_return_type():
    # Test extraction with float return type
    lines = ["public float getValue()"]
    codeflash_output = _extract_test_method_name(lines) # 2.99μs -> 3.79μs (21.2% slower)

def test_char_return_type():
    # Test extraction with char return type
    lines = ["public char getChar()"]
    codeflash_output = _extract_test_method_name(lines) # 2.96μs -> 3.61μs (18.0% slower)

def test_byte_return_type():
    # Test extraction with byte return type
    lines = ["public byte getByte()"]
    codeflash_output = _extract_test_method_name(lines) # 3.04μs -> 3.76μs (19.2% slower)

def test_short_return_type():
    # Test extraction with short return type
    lines = ["public short getShort()"]
    codeflash_output = _extract_test_method_name(lines) # 2.94μs -> 3.75μs (21.6% slower)

def test_long_return_type():
    # Test extraction with long return type
    lines = ["public long getId()"]
    codeflash_output = _extract_test_method_name(lines) # 2.93μs -> 3.50μs (16.3% slower)

def test_combination_public_static_final():
    # Test extraction with all modifiers combined
    lines = ["public static final void allModifiers()"]
    codeflash_output = _extract_test_method_name(lines) # 2.90μs -> 3.94μs (26.5% slower)

def test_string_array_return_type():
    # Test extraction with String array return type
    lines = ["public String[] getStrings()"]
    codeflash_output = _extract_test_method_name(lines) # 3.41μs -> 3.81μs (10.5% slower)

def test_empty_list():
    # Test with empty method lines list
    lines = []
    codeflash_output = _extract_test_method_name(lines) # 1.12μs -> 1.17μs (4.27% slower)

def test_list_with_empty_strings():
    # Test with list containing only empty strings
    lines = ["", "", ""]
    codeflash_output = _extract_test_method_name(lines) # 1.38μs -> 1.32μs (4.54% faster)

def test_list_with_whitespace_only():
    # Test with list containing only whitespace
    lines = ["   ", "\t", "  \n  "]
    codeflash_output = _extract_test_method_name(lines) # 1.31μs -> 1.34μs (2.31% slower)

def test_no_method_signature():
    # Test with text that contains no method signature
    lines = ["this is just random text"]
    codeflash_output = _extract_test_method_name(lines) # 13.2μs -> 15.7μs (15.5% slower)

def test_fallback_pattern_simple():
    # Test fallback pattern with simple function-like text
    lines = ["someFunction()"]
    codeflash_output = _extract_test_method_name(lines) # 4.21μs -> 6.23μs (32.5% slower)

def test_fallback_pattern_with_arguments():
    # Test fallback pattern with function and arguments
    lines = ["myFunc(arg1, arg2)"]
    codeflash_output = _extract_test_method_name(lines) # 5.34μs -> 7.32μs (27.1% slower)

def test_fallback_pattern_with_spaces():
    # Test fallback pattern with spaces before parenthesis
    lines = ["myFunction  ()"]
    codeflash_output = _extract_test_method_name(lines) # 5.35μs -> 7.23μs (26.0% slower)

def test_method_with_no_visibility_modifier():
    # Test method without public/private/protected modifier
    lines = ["void packagePrivateMethod()"]
    codeflash_output = _extract_test_method_name(lines) # 2.94μs -> 4.65μs (36.8% slower)

def test_method_with_no_return_type_uses_fallback():
    # Test text that doesn't match primary pattern but matches fallback
    lines = ["testNoReturnType()"]
    codeflash_output = _extract_test_method_name(lines) # 4.36μs -> 6.22μs (30.0% slower)

def test_single_character_method_name():
    # Test extraction of single character method name
    lines = ["public void a()"]
    codeflash_output = _extract_test_method_name(lines) # 3.03μs -> 3.84μs (21.2% slower)

def test_single_line_with_newlines():
    # Test single line containing newline characters
    lines = ["public void test\nMethod()"]
    codeflash_output = _extract_test_method_name(lines) # 9.49μs -> 3.89μs (144% faster)

def test_tabs_as_whitespace():
    # Test extraction with tabs as whitespace
    lines = ["public\tvoid\ttabbedMethod()"]
    codeflash_output = _extract_test_method_name(lines) # 3.04μs -> 5.22μs (41.8% slower)

def test_method_with_java_generics_in_param():
    # Test method with generic parameters (may not match primary pattern)
    lines = ["public void testGenerics()"]
    codeflash_output = _extract_test_method_name(lines) # 2.92μs -> 4.04μs (27.8% slower)

def test_method_multiline_split_at_return_type():
    # Test method split with return type on separate line
    lines = ["public", "void", "methodName()"]
    codeflash_output = _extract_test_method_name(lines) # 3.20μs -> 4.03μs (20.6% slower)

def test_method_multiline_split_at_name():
    # Test method split with name on separate line
    lines = ["public void", "methodName()"]
    codeflash_output = _extract_test_method_name(lines) # 3.06μs -> 4.08μs (24.8% slower)

def test_multiple_parentheses():
    # Test text with multiple parentheses pairs
    lines = ["public void method() { test(); }"]
    # Should match the first method declaration
    codeflash_output = _extract_test_method_name(lines) # 3.02μs -> 3.78μs (20.2% slower)

def test_nested_parentheses_in_parameters():
    # Test method with complex parameter types
    lines = ["public void method(java.util.List list)"]
    codeflash_output = _extract_test_method_name(lines) # 2.96μs -> 3.72μs (20.2% slower)

def test_method_starting_with_number_not_matched():
    # Test that method names starting with numbers don't match in fallback
    lines = ["public void 1invalidName()"]
    # Regex requires method name to start with letter or underscore
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 2.94μs -> 3.81μs (22.6% slower)

def test_underscore_only_method_name():
    # Test method name that is only underscore
    lines = ["public void _()"]
    codeflash_output = _extract_test_method_name(lines) # 2.90μs -> 3.64μs (20.4% slower)

def test_mixed_case_method_name():
    # Test method name with mixed case
    lines = ["public void tEsT_MiXeD_CaSe()"]
    codeflash_output = _extract_test_method_name(lines) # 3.02μs -> 3.76μs (19.7% slower)

def test_method_with_deprecated_annotation_on_same_line():
    # Test when annotation is on same line as method
    lines = ["@Deprecated public void oldMethod()"]
    # Regex should still find the method name
    codeflash_output = _extract_test_method_name(lines) # 6.87μs -> 3.89μs (76.8% faster)

def test_method_declaration_with_throws_clause():
    # Test method with throws clause
    lines = ["public void testMethod() throws IOException"]
    codeflash_output = _extract_test_method_name(lines) # 3.03μs -> 3.75μs (19.2% slower)

def test_very_long_parameter_list():
    # Test method with many parameters
    lines = ["public void test(int a, int b, int c, int d, int e, int f, int g)"]
    codeflash_output = _extract_test_method_name(lines) # 2.94μs -> 3.70μs (20.3% slower)

def test_method_with_spaces_in_lines():
    # Test list where each element has leading/trailing spaces
    lines = ["  public void  ", "  spacedMethod()  "]
    codeflash_output = _extract_test_method_name(lines) # 3.28μs -> 4.11μs (20.3% slower)

def test_only_opening_parenthesis():
    # Test text with only opening parenthesis
    lines = ["method("]
    codeflash_output = _extract_test_method_name(lines) # 3.88μs -> 5.85μs (33.7% slower)

def test_only_closing_parenthesis():
    # Test text with only closing parenthesis - won't match
    lines = [")"]
    codeflash_output = _extract_test_method_name(lines) # 1.09μs -> 2.75μs (60.4% slower)

def test_method_name_with_all_digits():
    # Test a method name that ends in digits (still a valid identifier)
    lines = ["public void test123()"]
    codeflash_output = _extract_test_method_name(lines) # 3.10μs -> 3.93μs (20.9% slower)

def test_common_java_primitive_arrays():
    # Test various array types as return values
    lines = ["public boolean[] getBooleans()"]
    codeflash_output = _extract_test_method_name(lines) # 3.56μs -> 3.91μs (8.98% slower)

def test_double_array_return():
    # Test double array return type
    lines = ["public double[] getDoubles()"]
    codeflash_output = _extract_test_method_name(lines) # 3.36μs -> 3.84μs (12.5% slower)

def test_float_array_return():
    # Test float array return type
    lines = ["public float[] getFloats()"]
    codeflash_output = _extract_test_method_name(lines) # 3.44μs -> 3.94μs (12.7% slower)

def test_char_array_return():
    # Test char array return type
    lines = ["public char[] getChars()"]
    codeflash_output = _extract_test_method_name(lines) # 3.35μs -> 3.69μs (9.22% slower)

def test_byte_array_return():
    # Test byte array return type
    lines = ["public byte[] getBytes()"]
    codeflash_output = _extract_test_method_name(lines) # 3.35μs -> 3.72μs (9.98% slower)

def test_short_array_return():
    # Test short array return type
    lines = ["public short[] getShorts()"]
    codeflash_output = _extract_test_method_name(lines) # 3.34μs -> 3.78μs (11.7% slower)

def test_long_array_return():
    # Test long array return type
    lines = ["public long[] getLongs()"]
    codeflash_output = _extract_test_method_name(lines) # 3.33μs -> 3.66μs (9.05% slower)

def test_custom_class_array_return():
    # Test custom class array return type
    lines = ["public MyClass[] getInstances()"]
    codeflash_output = _extract_test_method_name(lines) # 3.30μs -> 3.71μs (11.1% slower)

def test_many_lines_simple():
    # Test with 100 lines, each contributing to the method signature
    lines = ["public void"] + ["test" + str(i) for i in range(100)]
    # Since joined with spaces, the signature becomes complex
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 257μs -> 257μs (0.019% slower)

def test_large_multiline_method_declaration():
    # Test method declaration split across 50 lines
    lines = ["public"] + ["void"] * 10 + ["largeMethod"] + ["(int param,"] * 20 + ["String end)"]
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 27.6μs -> 4.88μs (466% faster)

def test_many_parameter_lines():
    # Test method with parameter list spanning many lines
    lines = ["public void testMethod("]
    for i in range(500):
        lines.append(f"int param{i},")
    lines.append("String lastParam)")
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 7.34μs -> 8.38μs (12.3% slower)

def test_very_long_method_name():
    # Test with extremely long method name
    long_name = "test" + "a" * 500 + "Method"
    lines = [f"public void {long_name}()"]
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 4.32μs -> 5.67μs (23.8% slower)

def test_1000_element_list():
    # Test with very large list of lines
    lines = ["public void"] + ["arg" + str(i) for i in range(1000)]
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 2.50ms -> 2.49ms (0.285% faster)

def test_deeply_nested_signatures_fallback():
    # Test fallback performance with many method-like patterns
    lines = [
        "func1() func2() func3() func4() func5() " * 100
        + "targetMethod()"
    ]
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 314μs -> 346μs (9.13% slower)

def test_alternating_modifiers_large():
    # Test with many modifier combinations
    lines = []
    for i in range(100):
        modifiers = ["public" if i % 2 == 0 else "private"]
        modifiers.append("static" if i % 3 == 0 else "")
        modifiers.append("final" if i % 5 == 0 else "")
        lines.append(" ".join(filter(None, modifiers)))
    lines.append("void testMethod()")
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 724μs -> 11.2μs (6373% faster)

def test_performance_many_calls():
    # Test performance by calling function many times with same input
    lines = ["public void performanceTest()"]
    for _ in range(1000):
        codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 779μs -> 991μs (21.3% slower)

def test_large_mixed_content():
    # Test with 500 lines of mixed content before method
    lines = ["random text line " + str(i) for i in range(500)]
    lines.append("public void actualMethod()")
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 3.76ms -> 13.9μs (27030% faster)

def test_all_return_types_in_sequence():
    # Test with all primitive types in a large list
    primitive_types = [
        "void", "String", "int", "long", "boolean", "double", "float",
        "char", "byte", "short"
    ]
    lines = []
    for i in range(100):
        return_type = primitive_types[i % len(primitive_types)]
        lines.append(f"public {return_type} method{i}()")
    
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 4.25μs -> 5.15μs (17.5% slower)

def test_unicode_in_comments_large():
    # Test performance with unicode characters in content (non-matching parts)
    lines = ["/* comment with unicode: éàù */ " * 100]
    lines.append("public void testUnicode()")
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 577μs -> 6.44μs (8867% faster)

def test_consecutive_method_declarations():
    # Test with multiple method declarations in sequence
    lines = [
        "public void first()",
        "public void second()",
        "public void third()"
    ]
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 3.24μs -> 3.94μs (17.8% slower)

def test_very_large_single_line():
    # Test with one extremely long single line
    long_line = "public void " + "x" * 10000 + "()"
    lines = [long_line]
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 28.0μs -> 34.1μs (17.9% slower)

def test_stress_regex_engine():
    # Test that could stress the regex engine
    lines = ["public void method"] + ["(param" * 100 + ")"]
    codeflash_output = _extract_test_method_name(lines); result = codeflash_output # 3.24μs -> 4.11μs (21.2% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr1199-2026-02-20T22.53.40 and push.


@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Feb 20, 2026
@codeflash-ai codeflash-ai bot mentioned this pull request Feb 20, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude bot commented Feb 20, 2026

PR Review Summary

Prek Checks

  • Formatting fix applied: Fixed slice formatting in instrumentation.py line 84 (s[idx + len(typ):] → s[idx + len(typ) :]). Committed and pushed as 9b06e149.
  • Pre-existing issue (not from this PR): codeflash/languages/registry.py has 3× F401 (unused-import) warnings. These imports are side-effect imports required for language registration — removing them would break functionality. This is a false positive from ruff and predates this PR.
  • mypy: No issues found in codeflash/languages/java/instrumentation.py.

Code Review

No critical issues found. The optimization is sound:

  • Adds a fast-path heuristic using str.find() and str.split() before falling back to the original regex-based extraction
  • Validates candidates with _WORD_RE (^\w+$) before returning
  • Falls through to the original regex logic for edge cases not handled by the fast path
  • The break statements correctly prevent false matches from subsequent modifier/type tokens
  • No security vulnerabilities, no breaking API changes
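
To make the validation step concrete, the snippet below applies the `^\w+$` pattern quoted above to a few candidate tokens taken from the generated tests. The helper name `is_word` is hypothetical. Note that `\w` also matches digits, so this check is looser than a strict Java identifier check: a token with a leading digit passes.

```python
import re

_WORD_RE = re.compile(r"^\w+$")  # pattern as quoted in the review


def is_word(candidate: str) -> bool:
    """True when the candidate consists only of letters, digits, and underscores."""
    return _WORD_RE.match(candidate) is not None


for candidate in ("testFoo", "compute_1_val", "lambda$1", "1invalidName", ""):
    print(repr(candidate), is_word(candidate))
```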

Test Coverage

| File | Stmts | Miss | Coverage | Status |
|---|---|---|---|---|
| codeflash/languages/java/instrumentation.py | 556 | 100 | 82% | New file (not on main) |

New code analysis (lines 57-102, the optimization):

  • 26 lines covered by tests: Fast-path modifier detection (lines 62-76), type detection (lines 81-92)
  • 2 lines uncovered: break fallback paths (lines 77, 93) — reached only when fast-path finds a modifier/type but fails validation
  • 7 lines uncovered: Original regex fallback (lines 96-102) — now bypassed by the fast path for most test inputs
  • Coverage is above 75% threshold for new files
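
The file-level figure follows directly from the statement counts in the table; a short check, with illustrative variable names:

```python
stmts, miss = 556, 100   # from the coverage table above
threshold = 0.75         # threshold for new files, as stated above

coverage = (stmts - miss) / stmts
print(f"{coverage:.0%}", coverage >= threshold)
```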

Note: Lines 96-102 (regex fallback) being uncovered is expected behavior — the fast path handles the common cases that tests exercise, which is the whole point of the optimization.

Optimization PRs

Checked 16 open codeflash-ai[bot] optimization PRs. None have fully passing CI (common failures: js-* e2e tests, code/snyk limits, some with prek/windows failures). No PRs merged.


Last updated: 2026-02-20T23:15:00Z
