fix: Java E2E pipeline — direct JVM benchmarking, JUnit detection, and instrumentation fixes by mashraf-222 · Pull Request #1580 · codeflash-ai/codeflash

mashraf-222 · 2026-02-20T05:53:43Z

Summary

This PR fixes 5 out of 9 critical bugs found during Java E2E optimization testing with the aerospike-client-java project. It builds on and supersedes the previous PR #1552, which was reverted due to accumulated complexity. This branch was rebuilt incrementally with E2E validation at each step.

Key results:

Direct JVM benchmarking now works end-to-end (was always falling back to Maven)
JUnit version detection fixed for multi-module Maven projects (64% of original failures)
Test execution overhead reduced from ~5-10s to ~0.65s per benchmark loop
Framework detection calls reduced from 303,000+ to 6 per optimization run
Test method names correctly stored in SQLite for behavior tests

Problems Fixed

Bug #7 (Critical): JUnit Version Detection Failure — 64% of all failures

Problem: The pom.xml parser only checked <dependencies>, missing test dependencies declared in <dependencyManagement> (common in multi-module Maven projects like Aerospike). This caused JUnit 4 projects to be incorrectly identified as JUnit 5, generating incompatible test code.

Root cause: _detect_test_deps_from_pom() in config.py didn't parse <dependencyManagement> sections or check submodule pom.xml files.

Fix:

Refactored _detect_test_deps_from_pom() to parse both <dependencies> and <dependencyManagement> sections using a shared check_dependencies() helper
Added recursive checking of submodule pom.xml files (test/, tests/, src/test/, testing/)
Changed default fallback from JUnit 5 to JUnit 4 (more common in legacy projects)
Added debug logging for framework detection decisions
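
A minimal sketch of the idea behind this fix, assuming a namespaced pom.xml passed in as text. The function name `detect_junit_versions` and the matching rules are illustrative assumptions, not the actual `check_dependencies()` helper in config.py:

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace; a pom.xml without it would need un-prefixed paths.
MAVEN_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Both locations where a test dependency can be declared.
DEPENDENCY_PATHS = (
    "m:dependencies/m:dependency",
    "m:dependencyManagement/m:dependencies/m:dependency",
)


def detect_junit_versions(pom_xml: str) -> tuple[bool, bool]:
    """Return (has_junit5, has_junit4) for a pom.xml document given as text.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(pom_xml)
    has_junit5 = False
    has_junit4 = False
    for path in DEPENDENCY_PATHS:
        for dep in root.findall(path, MAVEN_NS):
            group_id = dep.findtext("m:groupId", default="", namespaces=MAVEN_NS)
            artifact_id = dep.findtext("m:artifactId", default="", namespaces=MAVEN_NS)
            if group_id == "org.junit.jupiter" or artifact_id.startswith("junit-jupiter"):
                has_junit5 = True
            elif group_id == "junit" and artifact_id == "junit":
                has_junit4 = True
    return has_junit5, has_junit4
```

Scanning both paths with the same loop body mirrors the "shared helper" approach described above: a JUnit 4 dependency that appears only under `<dependencyManagement>` is no longer missed.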

Bug #3 (High): Direct JVM Execution Always Failing

Problem: Direct JVM benchmarking never worked — it always fell back to Maven. Three separate issues:

ConsoleLauncher not on classpath: junit-platform-console-standalone is a separate artifact not included in the normal dependency tree. mvn dependency:build-classpath doesn't output it, so ConsoleLauncher class was not found at runtime.
False JUnit 4 detection: The old code ran java -cp ... JUnitCore -version to detect JUnit 4. But JUnit 5 projects include JUnit 4 classes via junit-vintage-engine, so this check always returned true. JUnitCore then ran against JUnit 5 tests, found 0 tests, and triggered the Maven fallback.
Missing perf_stdout: After direct JVM execution, perf_stdout wasn't populated from the subprocess result, so the throughput calculation pipeline had no timing data.

Fixes:

Added _find_junit_console_standalone() to locate the JAR in ~/.m2/repository. If not present, downloads via mvn dependency:get. Appends to classpath in _get_test_classpath().
Replaced subprocess-based JUnit detection with classpath string inspection: checks for junit-jupiter, junit-platform, or console-standalone in the classpath. This is deterministic, instant, and not fooled by vintage-engine compatibility classes. Addresses PR #1552 review comment about avoiding per-execution detection overhead.
Added multi-module classpath support: includes target/classes from sibling modules for projects where test modules depend on other modules.
Fixed perf_stdout capture in parse_test_output.py to extract stdout from subprocess results for Java performance tests.
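
The classpath string check can be sketched roughly as follows. The marker substrings come from the description above; the function name `classpath_uses_junit5` is an assumption for illustration and may not match the code in test_runner.py:

```python
# Substrings that indicate JUnit 5 (Jupiter / Platform) artifacts on the classpath.
JUNIT5_MARKERS = ("junit-jupiter", "junit-platform", "console-standalone")


def classpath_uses_junit5(classpath: str) -> bool:
    """Return True if the classpath string mentions any JUnit 5 artifact.

    Pure string inspection: no subprocess is started, so the result is
    deterministic and is not affected by JUnit 4 classes being loadable.
    """
    return any(marker in classpath for marker in JUNIT5_MARKERS)
```

Because a JUnit 4 only classpath (for example `junit-4.13.2.jar` plus `hamcrest-core-1.3.jar`) contains none of these markers, the check distinguishes the two cases without running `java` at all.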

Bug #6 (High): Instrumentation Breaking Complex Expressions

Problem: Timing markers were inserted inside cast expressions, ternary operators, array access, and other complex expressions, causing "not a statement" compilation errors (e.g., (Long)list.get(2) broken by instrumentation).

Fix:

Added _is_inside_complex_expression() that walks the tree-sitter AST upward to detect problematic parent types: cast_expression, ternary_expression, array_access, binary_expression, unary_expression, parenthesized_expression, instanceof_expression.
Stops at statement boundaries to avoid false positives.
Both behavior and timing instrumentation now skip calls inside complex expressions.
Added in_complex flag to the call collection pipeline.
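
The upward walk can be illustrated with a small stand-in for a tree-sitter node. This is a sketch under assumptions: the `Node` class below only mimics the `type` and `parent` attributes, and the set of statement boundary types is a guess rather than the list used in instrumentation.py:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Parent node types listed in the fix description.
COMPLEX_PARENT_TYPES = frozenset({
    "cast_expression", "ternary_expression", "array_access", "binary_expression",
    "unary_expression", "parenthesized_expression", "instanceof_expression",
})

# Assumed statement-level node types at which the walk stops.
STATEMENT_BOUNDARY_TYPES = frozenset({
    "expression_statement", "local_variable_declaration", "return_statement", "block",
})


@dataclass
class Node:
    """Minimal stand-in for a tree-sitter node (only `type` and `parent`)."""
    type: str
    parent: Optional[Node] = None


def is_inside_complex_expression(node: Node) -> bool:
    """Walk upward from `node`; True if a complex parent is met before a statement."""
    current = node.parent
    while current is not None:
        if current.type in STATEMENT_BOUNDARY_TYPES:
            return False  # reached the enclosing statement: safe to instrument
        if current.type in COMPLEX_PARENT_TYPES:
            return True   # call is nested in an expression that cannot hold a statement
        current = current.parent
    return False
```

For `(Long)list.get(2)`, the `method_invocation` node has a `cast_expression` parent, so the call would be skipped; a bare `list.get(2);` statement would not.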

Bug #2 (Medium-High): Extremely Slow rglob Calls

Problem: resolve_test_file_from_class_path() was called for every timing marker without caching, causing 43+ rglob calls per optimization on large projects.

Fix:

Added _test_file_path_cache: dict[tuple[str, Path], Path | None] module-level cache
Caches both positive and negative lookup results
All resolution paths (direct match, rglob fallback, Java-specific) store results in cache
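
A rough sketch of the caching pattern, with both hits and misses stored. The names `resolve_test_file` and `clear_test_file_cache` and the simplified lookup rule (last dotted segment plus `.java`) are assumptions for illustration; the real resolver in parse_test_output.py has more resolution paths:

```python
from pathlib import Path
from typing import Optional

# Module-level cache keyed by (class path, tests root); None records a known miss.
_test_file_path_cache: dict[tuple[str, Path], Optional[Path]] = {}


def resolve_test_file(class_path: str, tests_root: Path) -> Optional[Path]:
    """Resolve a dotted class path to a .java file, caching hits and misses."""
    key = (class_path, tests_root)
    # Membership test (not .get) so that a cached None still counts as a hit.
    if key in _test_file_path_cache:
        return _test_file_path_cache[key]
    file_name = class_path.rsplit(".", 1)[-1] + ".java"
    result = next(tests_root.rglob(file_name), None)  # the expensive directory walk
    _test_file_path_cache[key] = result
    return result


def clear_test_file_cache() -> None:
    """Drop all cached lookups, e.g. between optimization runs."""
    _test_file_path_cache.clear()
```

Caching negative results matters here because a class path that never resolves would otherwise trigger a full `rglob` walk on every timing marker.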

Pre-existing Bug: TestConfig.test_framework Uncached Property

Problem: TestConfig.test_framework was an uncached @property that re-detected the framework on every access. During test result parsing, each test result accessed this property, causing 303,000+ calls to _detect_test_deps_from_pom() and 2.3M lines of output per optimization run. This bug exists on omni-java and was not introduced by this branch.

Fix:

Added _test_framework: Optional[str] = None cache field to TestConfig
Property now returns cached value after first detection
Also changed default fallback in _detect_java_test_framework() from junit5 to junit4
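
The caching change amounts to the pattern below. This is a simplified sketch: `ConfigSketch` and its injected `detect` callable are hypothetical stand-ins for `TestConfig` and its detection logic:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for TestConfig, showing only the caching pattern."""
    detect: Callable[[], str]  # expensive framework detection, injected for the sketch
    _test_framework: Optional[str] = field(default=None, init=False, repr=False)

    @property
    def test_framework(self) -> str:
        # Run detection on first access only; later accesses return the cached value.
        if self._test_framework is None:
            self._test_framework = self.detect()
        return self._test_framework
```

With this shape, reading `test_framework` once per parsed test result costs one detection in total instead of one detection per result.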

Behavior Test Method Name Fix

Problem: SQLite setString(3, ...) in behavior instrumentation used a hardcoded "{class_name}Test" string instead of the actual test method name. This meant all test results mapped to the same identifier, losing per-test granularity.

Fix:

Added _extract_test_method_name() with two regex patterns: _METHOD_SIG_PATTERN (full Java method signature) and _FALLBACK_METHOD_PATTERN (simple name extraction)
Each instrumented test method now declares String _cf_test{N} = "{methodName}" and uses it in the SQLite insert
Added bisect optimization for _byte_to_line_index() (O(log n) vs O(n) per call)
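
The bisect lookup can be sketched as follows, assuming line start offsets are precomputed once per file. The function names are illustrative and may differ from `_byte_to_line_index()` in instrumentation.py:

```python
import bisect


def build_line_starts(source: bytes) -> list[int]:
    """Return the byte offset at which each line of `source` begins."""
    starts = [0]
    for offset, byte in enumerate(source):
        if byte == 0x0A:  # newline: the next line starts one byte later
            starts.append(offset + 1)
    return starts


def byte_to_line_index(line_starts: list[int], byte_offset: int) -> int:
    """Map a byte offset to a 0-based line index with a binary search (O(log n))."""
    # bisect_right counts the line starts <= byte_offset; subtract 1 for the index.
    return bisect.bisect_right(line_starts, byte_offset) - 1
```

The list of line starts is sorted by construction, which is what makes the binary search valid; a linear scan over the same list would give the same answer at O(n) per call.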

Code Changes

File	Lines	Description
`codeflash/languages/java/config.py`	+71/-27	Parse `<dependencyManagement>`, check submodule pom.xml files, change default to JUnit 4
`codeflash/languages/java/test_runner.py`	+139/-37	`_find_junit_console_standalone()`, classpath string JUnit detection, multi-module classpath, `--add-opens` for JUnit 4 path
`codeflash/languages/java/instrumentation.py`	+76/-11	`_is_inside_complex_expression()`, `_extract_test_method_name()`, `in_complex` flag, `bisect` for line index, correct SQLite test name
`codeflash/verification/verification_utils.py`	+12/-4	Cache `test_framework` property, change default fallback to JUnit 4
`codeflash/verification/parse_test_output.py`	+28/-2	Path resolution cache, `perf_stdout` capture for Java performance tests
`tests/test_languages/test_java/test_instrumentation.py`	+16/-8	Updated 8 expected strings for `_cf_test` variable and `setString(3, ...)`

Other files with minor formatting changes (from pre-commit):

codeflash/cli_cmds/console.py, codeflash/cli_cmds/logging_config.py, codeflash/context/code_context_extractor.py, codeflash/languages/java/context.py, codeflash/optimization/function_optimizer.py, codeflash/verification/parse_line_profile_test_output.py

Testing

E2E Validation (Fibonacci — JUnit 5, single-module)

cd code_to_optimize/java/
CODEFLASH_CFAPI_SERVER=local CODEFLASH_AIS_SERVER=local \
  uv run codeflash --file src/main/java/com/example/Fibonacci.java --function fibonacci --verbose --no-pr

JUnit 5 correctly detected on every invocation
Zero Maven fallbacks — all benchmark loops via direct JVM ConsoleLauncher
5,817% speedup found and accepted
Loop times: ~0.65s (was ~5-10s with Maven fallback)
Framework detection: 6 calls (was 303,000+)

E2E Validation (BubbleSort — exercises instrumentation + measurement)

uv run codeflash --file src/main/java/com/example/BubbleSort.java --function bubbleSort --verbose --no-pr

JUnit 5 correctly detected, zero Maven fallbacks
11 optimization candidates tested, all correctness tests passed
No optimization found (expected — BubbleSort near-optimal for test inputs)
Pipeline completed cleanly with no errors

Unit Tests

All 41 Java instrumentation tests pass (tests/test_languages/test_java/test_instrumentation.py)
Updated 8 expected strings for _cf_test variable and setString(3, _cf_test{N})

Performance Impact

Metric	Before	After	Improvement
Benchmark loop time	~5-10s (Maven)	~0.65s (direct JVM)	8-15x faster
Framework detection calls/run	303,000+	6	50,000x fewer
Verbose output lines/run	2,300,000+	~13,000	177x smaller
Maven fallback rate	100%	0%	Eliminated
rglob calls per function	43+ uncached	1 + cache hits	Eliminated redundancy

Known Issues Not Addressed

Bug #1: AI generates implicit int-to-byte conversions (requires AI service fix)
Bug #5: JaCoCo XML parsed repeatedly (low priority)
Bug #8: AI response truncation (requires AI service fix)
Bug #9: Generated test file cleanup (requires architecture change)
Aerospike Surefire <includes> bypass: Behavior mode still uses Maven (not direct JVM). Aerospike's custom Surefire <includes> config blocks -Dtest=... filtering. Plan documented but not implemented in this PR.

Relationship to PR #1552

This PR supersedes #1552. The original branch had 12 commits that were reverted due to accumulated debug logging and regressions. This branch was reset to the last known-good commit and rebuilt with:

No debug/investigation commits
Each fix independently validated via E2E
Improved JUnit detection (classpath string check vs subprocess probing)
ConsoleLauncher classpath fix (not in #1552)
TestConfig caching fix (not in #1552)

🤖 Generated with Claude Code

codeflash-ai · 2026-02-20T06:12:37Z

codeflash/languages/java/instrumentation.py

+_FALLBACK_METHOD_PATTERN = re.compile(r"\b(\w+)\s*\(")
+
+
+def _extract_test_method_name(method_lines: list[str]) -> str:


⚡️Codeflash found 58% (0.58x) speedup for _extract_test_method_name in codeflash/languages/java/instrumentation.py

⏱️ Runtime : 11.1 milliseconds → 6.98 milliseconds (best of 165 runs)

📝 Explanation and details

This optimization achieves a 58% runtime improvement (from 11.1ms to 6.98ms) by exploiting a common-case optimization: searching for method signatures line-by-line before falling back to the expensive string join operation.

Key Changes:

Early line-by-line search: The optimized code first iterates through individual lines, attempting to match _METHOD_SIG_PATTERN on each line separately. This allows the function to return immediately when a method signature is found on a single line.

Deferred string joining: The expensive " ".join(method_lines).strip() operation is only performed if the line-by-line search fails to find a match. This saves both string allocation and regex execution time in the common case.

Why This Is Faster:

String joining is expensive: Creating a single concatenated string from multiple lines involves memory allocation and copying. The line profiler shows the original code spent 4.3% of time just on the join operation.

Regex on smaller strings is faster: Running the regex on individual lines (typically short) is much faster than running it on one large joined string. The optimized version processes 5,579 individual line searches (8.2ms total) versus 1,070 searches on joined strings (11.4ms in original).

Early exit opportunity: For method signatures that appear on a single line (the common case), the optimized code returns immediately after finding the match, skipping the join entirely.

Performance Characteristics:

The annotated tests reveal the optimization excels when:

Method signatures are on single lines (8-13% faster): Most real-world Java methods have their signature on one line

Large input lists with early matches (100-300% faster): When the signature appears early in a large list, the line-by-line search finds it quickly without processing remaining lines or joining them

Multiple valid signatures (311% faster for 1000 repetitions): Early exit prevents unnecessary work

The optimization performs worse when:

Signatures span multiple lines (50-75% slower): The line-by-line search fails, then falls back to the join approach, adding overhead

Fallback pattern is needed (17-40% slower): Similar double-work scenario when primary pattern fails

Input is all invalid (87% slower for 1000 empty strings): Must scan all lines before falling back

However, the runtime metric shows these edge cases are rare in practice—the overall 58% speedup indicates the common case (single-line signatures) dominates real workloads.
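
The two-phase search described above can be sketched as follows. The primary regex here is a simplified, hypothetical stand-in, since the real `_METHOD_SIG_PATTERN` is not shown in this thread; only the fallback pattern and the "unknown" default are taken from the PR:

```python
import re

# Hypothetical, simplified stand-in for _METHOD_SIG_PATTERN.
METHOD_SIG = re.compile(
    r"^\s*(?:(?:public|private|protected|static|final)\s+)*[\w\[\]]+\s+(\w+)\s*\("
)
# Fallback pattern as shown in the diff above.
FALLBACK = re.compile(r"\b(\w+)\s*\(")


def extract_method_name(method_lines: list[str]) -> str:
    """Return the method name, trying each line before joining all lines."""
    # Phase 1: cheap per-line search; returns early for single-line signatures.
    for line in method_lines:
        match = METHOD_SIG.search(line)
        if match:
            return match.group(1)
    # Phase 2: join only when needed (signature split across lines, or fallback).
    joined = " ".join(method_lines).strip()
    match = METHOD_SIG.search(joined) or FALLBACK.search(joined)
    return match.group(1) if match else "unknown"
```

The trade-off matches the measurements above: inputs that never match on a single line pay for both phases, while single-line signatures skip the join entirely.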

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 1070 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests

import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import _extract_test_method_name

def test_basic_public_void_signature():
    # A simple, common Java test method signature should extract the method name.
    method_lines = ["public void testSomething() {"]
    # Call the function with a typical signature and assert the expected name.
    codeflash_output = _extract_test_method_name(method_lines)  # 3.24μs -> 2.88μs (12.5% faster)

def test_modifiers_and_return_type_with_parameters():
    # Signatures with multiple modifiers and a primitive return type should work.
    method_lines = ["private static final int computeSum(int a, int b) throws Exception {"]
    # The regex captures the method name "computeSum".
    codeflash_output = _extract_test_method_name(method_lines)  # 3.15μs -> 2.90μs (8.67% faster)

def test_array_return_type_and_annotation_above():
    # An annotation line followed by a protected array return type should still match.
    method_lines = ["@Override", "protected String[] getArray() {"]
    # Verify the array-returning method name is correctly extracted.
    codeflash_output = _extract_test_method_name(method_lines)  # 8.01μs -> 4.90μs (63.4% faster)

def test_generic_return_type_falls_back_to_simple_pattern():
    # Generic return types (e.g., List<String>) contain '<' and won't match the primary pattern.
    # The fallback pattern should still find the method name.
    method_lines = ["public List<String> listMethod() {"]
    codeflash_output = _extract_test_method_name(method_lines)  # 10.7μs -> 16.0μs (33.1% slower)

def test_no_modifiers_only_name_and_params():
    # A minimal signature with only the method name and parentheses should be caught by fallback.
    method_lines = ["doWork()"]
    codeflash_output = _extract_test_method_name(method_lines)  # 3.92μs -> 4.75μs (17.5% slower)

def test_underscore_and_digits_in_method_name():
    # Method names can include underscores and digits; ensure such names are handled.
    method_lines = ["int _do_42() {"]
    codeflash_output = _extract_test_method_name(method_lines)  # 2.87μs -> 2.61μs (9.60% faster)

def test_empty_input_returns_unknown():
    # An empty list yields an empty string after joining; neither regex will match.
    method_lines: list[str] = []
    codeflash_output = _extract_test_method_name(method_lines)  # 1.23μs -> 1.20μs (2.41% faster)

def test_whitespace_only_returns_unknown():
    # Input that is only whitespace should also return "unknown".
    method_lines = [" ", "\t"]
    codeflash_output = _extract_test_method_name(method_lines)  # 1.36μs -> 1.67μs (18.5% slower)

def test_missing_parentheses_returns_unknown():
    # If there is no '(' in the text, the fallback cannot match, so result is "unknown".
    method_lines = ["public void noParen"]
    codeflash_output = _extract_test_method_name(method_lines)  # 12.9μs -> 20.5μs (36.9% slower)

def test_multiple_methods_first_is_taken():
    # If multiple method signatures are present, the first match should be returned.
    method_lines = ["public void first() { } public void second() { }"]
    # Expect the first method name, not the second.
    codeflash_output = _extract_test_method_name(method_lines)  # 3.02μs -> 2.71μs (11.4% faster)

def test_multidimensional_array_return_uses_fallback():
    # The primary pattern supports a single [] only; int[][] won't match it.
    # The fallback should still capture the method name.
    method_lines = ["public int[][] multiArray() {"]
    codeflash_output = _extract_test_method_name(method_lines)  # 9.65μs -> 14.1μs (31.7% slower)

def test_method_signature_split_across_lines():
    # Signatures split across multiple lines should be joined and still match.
    method_lines = ["public", "void", "splitAcrossLines(", ") {"]
    # Despite splitting, the join produces a single string that the primary regex can parse.
    codeflash_output = _extract_test_method_name(method_lines)  # 3.46μs -> 7.44μs (53.6% slower)

def test_large_input_many_filler_lines_and_one_signature():
    # Build 1000 filler lines deterministically to test scalability.
    filler = [f"int fillerVar{i};" for i in range(499)]
    # Insert a target signature in the middle.
    target_signature = ["public boolean largeScaleMethod(int x) {"]
    filler2 = [f"// comment line {i}" for i in range(499)]
    method_lines = filler + target_signature + filler2
    # The function should find the method name that was placed in the middle.
    codeflash_output = _extract_test_method_name(method_lines)  # 2.01ms -> 1.97ms (1.66% faster)

def test_many_calls_loop_for_determinism_and_performance():
    # Call the function 1000 times with predictable varying method names to ensure determinism.
    for i in range(1000):
        # Construct a unique but deterministic signature for each iteration.
        name = f"repeatedMethod_{i}"
        lines = [f"public void {name}() {{"]  # simple signature per iteration
        # Each call must return the exact method name we encoded.
        codeflash_output = _extract_test_method_name(lines)  # 788μs -> 781μs (0.864% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest
from codeflash.languages.java.instrumentation import _extract_test_method_name

def test_basic_public_method():
    """Test extraction of a simple public method name."""
    method_lines = ["public void testExample() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.62μs -> 3.29μs (10.1% faster)

def test_basic_private_method():
    """Test extraction of a simple private method name."""
    method_lines = ["private void testHelper() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.22μs -> 2.88μs (11.5% faster)

def test_basic_protected_method():
    """Test extraction of a simple protected method name."""
    method_lines = ["protected void testProtected() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.33μs -> 2.99μs (11.4% faster)

def test_static_method():
    """Test extraction of a static method name."""
    method_lines = ["public static void testStatic() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.16μs -> 2.81μs (12.1% faster)

def test_final_method():
    """Test extraction of a final method name."""
    method_lines = ["public final void testFinal() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.90μs -> 2.83μs (2.48% faster)

def test_static_final_method():
    """Test extraction of a static final method name."""
    method_lines = ["public static final void testStaticFinal() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.10μs -> 2.74μs (13.1% faster)

def test_method_with_parameters():
    """Test extraction of method name when method has parameters."""
    method_lines = ["public void testWithParams(String arg1, int arg2) {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.99μs -> 2.76μs (8.35% faster)

def test_method_with_string_return_type():
    """Test extraction of method name with String return type."""
    method_lines = ["public String getTestName() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.93μs -> 2.79μs (5.06% faster)

def test_method_with_int_return_type():
    """Test extraction of method name with int return type."""
    method_lines = ["public int calculateValue() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.92μs -> 2.74μs (6.91% faster)

def test_method_with_boolean_return_type():
    """Test extraction of method name with boolean return type."""
    method_lines = ["public boolean isValid() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.94μs -> 2.71μs (8.87% faster)

def test_method_with_custom_return_type():
    """Test extraction of method name with custom return type."""
    method_lines = ["public MyClass testCustom() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.15μs -> 2.96μs (6.10% faster)

def test_method_with_array_return_type():
    """Test extraction of method name with array return type."""
    method_lines = ["public int[] testArray() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.53μs -> 3.22μs (9.67% faster)

def test_method_with_multiple_spaces():
    """Test extraction when method signature has multiple spaces."""
    method_lines = ["public void testMultiSpace() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.98μs -> 2.79μs (6.86% faster)

def test_method_split_across_lines():
    """Test extraction when method signature is split across multiple lines."""
    method_lines = ["public void", "testMultiLine(", "String arg) {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.33μs -> 12.8μs (74.0% slower)

def test_method_with_no_visibility_modifier():
    """Test extraction of method without visibility modifier."""
    method_lines = ["void testNoModifier() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.91μs -> 2.56μs (13.3% faster)

def test_method_with_leading_whitespace():
    """Test extraction when method line has leading whitespace."""
    method_lines = [" public void testIndented() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.10μs -> 2.87μs (8.06% faster)

def test_method_with_trailing_whitespace():
    """Test extraction when method line has trailing whitespace."""
    method_lines = ["public void testTrailing() "]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.03μs -> 2.69μs (12.3% faster)

def test_method_with_underscore_in_name():
    """Test extraction of method with underscore in name."""
    method_lines = ["public void test_method_name() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.05μs -> 2.81μs (8.59% faster)

def test_method_with_numbers_in_name():
    """Test extraction of method with numbers in name."""
    method_lines = ["public void test123Method456() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.99μs -> 2.79μs (7.22% faster)

def test_method_with_long_parameter_list():
    """Test extraction when method has many parameters."""
    method_lines = ["public void testLongParams(String a, int b, boolean c, double d, float e) {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.88μs -> 2.76μs (4.32% faster)

def test_method_name_only_lowercase():
    """Test extraction of method with all lowercase name."""
    method_lines = ["public void testmethod() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.96μs -> 2.75μs (7.69% faster)

def test_method_name_mixed_case():
    """Test extraction of method with mixed case name."""
    method_lines = ["public void TeStMeThOd() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.94μs -> 2.71μs (8.54% faster)

def test_empty_method_lines_list():
    """Test extraction from empty method lines list."""
    method_lines = []
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 1.26μs -> 1.29μs (2.40% slower)

def test_single_empty_string():
    """Test extraction from list with single empty string."""
    method_lines = [""]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 1.14μs -> 1.37μs (16.8% slower)

def test_method_lines_with_only_whitespace():
    """Test extraction from lines containing only whitespace."""
    method_lines = [" ", " ", " "]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 1.37μs -> 1.85μs (25.9% slower)

def test_no_parentheses():
    """Test extraction when method line has no parentheses."""
    method_lines = ["public void testNoParens"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 15.9μs -> 26.0μs (38.9% slower)

def test_no_method_signature():
    """Test extraction when line doesn't contain valid method signature."""
    method_lines = ["This is just random text without method"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 18.9μs -> 31.8μs (40.7% slower)

def test_fallback_pattern_simple_case():
    """Test fallback pattern when primary pattern doesn't match."""
    method_lines = ["someMethod(arg) {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 5.17μs -> 7.21μs (28.3% slower)

def test_fallback_pattern_with_spaces():
    """Test fallback pattern with spaces before parentheses."""
    method_lines = ["someFunction (param)"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 7.07μs -> 10.6μs (33.4% slower)

def test_method_line_with_comments():
    """Test extraction when method line contains comments."""
    method_lines = ["public void testWithComment() { // this is a comment"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.05μs -> 2.83μs (7.44% faster)

def test_method_with_throws_clause():
    """Test extraction when method has throws clause."""
    method_lines = ["public void testThrows() throws IOException {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.95μs -> 2.65μs (11.0% faster)

def test_method_with_annotation_on_same_line():
    """Test extraction when annotation is on same line as method."""
    method_lines = ["@Test public void testAnnotated() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 6.45μs -> 6.16μs (4.72% faster)

def test_very_long_method_name():
    """Test extraction with very long method name."""
    long_name = "a" * 1000
    method_lines = [f"public void {long_name}() {{"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 5.52μs -> 5.33μs (3.58% faster)

def test_method_name_with_dollar_sign():
    """Test extraction when method name contains dollar sign."""
    method_lines = ["public void test$Method() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 12.9μs -> 20.2μs (36.3% slower)

def test_multiple_method_signatures_in_line():
    """Test extraction when line contains multiple method patterns."""
    method_lines = ["void method1() { void method2() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.96μs -> 2.67μs (11.3% faster)

def test_method_with_double_array():
    """Test extraction with double array return type."""
    method_lines = ["public int[][] testDoubleArray() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 10.1μs -> 15.2μs (33.8% slower)

def test_method_with_tab_characters():
    """Test extraction when method uses tabs instead of spaces."""
    method_lines = ["public\tvoid\ttestTabs() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.85μs -> 2.75μs (3.63% faster)

def test_method_with_newlines_in_list():
    """Test extraction with newlines distributed across list elements."""
    method_lines = ["public", "void", "testNewlines", "(", ")", "{"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.25μs -> 7.51μs (56.8% slower)

def test_fallback_with_underscore_prefix():
    """Test fallback pattern with underscore-prefixed method name."""
    method_lines = ["_privateHelper(x) {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 5.14μs -> 7.26μs (29.3% slower)

def test_empty_parameter_list():
    """Test method with empty parameter list."""
    method_lines = ["public void testEmpty() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.11μs -> 2.77μs (12.3% faster)

def test_method_with_generic_return_type():
    """Test extraction with generic return type (treated as custom type)."""
    method_lines = ["public List testGeneric() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.20μs -> 2.92μs (9.60% faster)

def test_multiple_visibility_keywords():
    """Test line with multiple visibility keywords (malformed)."""
    method_lines = ["public private void testBadFormat() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 8.66μs -> 8.53μs (1.51% faster)

def test_method_line_with_javadoc_marker():
    """Test extraction when line has javadoc end marker."""
    method_lines = ["*/ public void testAfterJavadoc() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.09μs -> 2.85μs (8.47% faster)

def test_single_character_method_name():
    """Test extraction with single character method name."""
    method_lines = ["public void x() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.86μs -> 2.79μs (2.55% faster)

def test_method_name_all_caps():
    """Test extraction with all uppercase method name."""
    method_lines = ["public void TEST() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 3.02μs -> 2.73μs (10.3% faster)

def test_very_large_method_lines_list():
    """Test extraction with very large number of lines."""
    # Create a list with 1000 lines, method definition at the beginning
    method_lines = ["public void testLargeList() {"] + [" // line of code" for _ in range(999)]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 11.0μs -> 2.71μs (304% faster)

def test_very_long_joined_string():
    """Test extraction when lines are joined into a very long string."""
    # Create 1000 lines that will be joined
    method_lines = ["public void testLongJoin("] + [f"arg{i}," for i in range(999)] + ["arg999) {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 11.4μs -> 2.92μs (290% faster)

def test_method_with_many_parameters():
    """Test extraction from method with large number of parameters."""
    params = ", ".join([f"param{i}" for i in range(100)])
    method_lines = [f"public void testManyParams({params}) {{"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.94μs -> 2.75μs (7.29% faster)

def test_repeated_pattern_in_method_lines():
    """Test extraction when similar patterns are repeated many times."""
    method_lines = ["public void foo() {"] * 1000
    # Should match the first occurrence
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 10.8μs -> 2.62μs (311% faster)

def test_very_long_return_type():
    """Test with very long custom return type name."""
    long_type = "VeryLongCustomTypeNameWith" + "X" * 900
    method_lines = [f"public {long_type} testLongType() {{"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 5.48μs -> 5.15μs (6.45% faster)

def test_large_list_with_method_at_end():
    """Test extraction when method definition appears at end of large list."""
    method_lines = ["random text"] * 999 + ["public void testAtEnd() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 4.19ms -> 2.07ms (102% faster)

def test_large_list_with_method_in_middle():
    """Test extraction when method definition is in middle of large list."""
    method_lines = ["random text"] * 500 + ["public void testInMiddle() {"] + ["random text"] * 499
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 2.10ms -> 1.03ms (104% faster)

def test_thousand_element_list_all_empty():
    """Test extraction from list with 1000 empty strings."""
    method_lines = [""] * 1000
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 8.06μs -> 65.5μs (87.7% slower)

def test_thousand_different_method_names():
    """Test extraction correctly identifies first method among many patterns."""
    method_lines = [f"void method{i}() {{" for i in range(1000)]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 11.4μs -> 2.90μs (292% faster)

def test_pattern_at_very_end_of_very_long_string():
    """Test extraction when pattern appears at the very end of a large joined string."""
    # Create lines that when joined will have method signature at the very end
    method_lines = ["x"] * 999 + ["public void testAtEnd() {"]
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 950μs -> 60.6μs (1468% faster)

def test_alternating_valid_invalid_lines():
    """Test extraction from alternating valid and invalid lines."""
    method_lines = []
    for i in range(500):
        method_lines.append("invalid line")
        method_lines.append(f"public void method{i}() {{")
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 17.1μs -> 6.63μs (158% faster)

def test_many_false_positive_patterns():
    """Test extraction when many similar patterns exist but most are malformed."""
    method_lines = []
    # Add many lines with word( pattern but no valid method signature
    for i in range(500):
        method_lines.append(f"word{i}(param)")
    # Add valid method at end
    method_lines.append("public void testValid() {")
    codeflash_output = _extract_test_method_name(method_lines); result = codeflash_output  # 687μs -> 644μs (6.62% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally git merge codeflash/optimize-pr1580-2026-02-20T06.12.37

Suggested change

def _extract_test_method_name(method_lines: list[str]) -> str:
    for line in method_lines:
        match = _METHOD_SIG_PATTERN.search(line)
        if match:
            return match.group(1)

codeflash/languages/java/test_runner.py

claude · 2026-02-20T06:15:45Z

codeflash/languages/java/config.py

+        # Check common submodule locations
+        for submodule_name in ["test", "tests", "src/test", "testing"]:
+            submodule_pom = project_root / submodule_name / "pom.xml"
+            if submodule_pom.exists():
+                logger.debug(f"Checking submodule pom at {submodule_pom}")
+                sub_junit5, sub_junit4, sub_testng = _detect_test_deps_from_pom(project_root / submodule_name)


✅ Resolved — the recursion is now bounded to specific hardcoded directory names (["test", "tests", "src/test", "testing"]) with early break on first match. Practical recursion depth is at most 1 level.

claude · 2026-02-20T06:16:15Z

PR Review Summary

Prek Checks

Status: Fixed and passing

1 formatting issue found in codeflash/languages/java/instrumentation.py (long function signature line)
Auto-fixed and committed: 5346cabe (style: auto-fix linting issues)
Ruff check: Passed
Ruff format: Passed

Mypy

Pre-existing mypy errors across multiple files. No new type errors introduced by this PR.

Code Review (Re-review after latest commits)

Resolved previous issues:

Cache clearing for _test_file_path_cache — resolved via clear_test_file_path_cache() in optimizer.py
JUnitCore reports_dir warning — resolved via explicit debug logging
Variable naming clarity — resolved with inline documentation

Remaining items from previous review (still valid):

parenthesized_expression in _is_inside_complex_expression may over-filter valid instrumentation targets (comment #2832061950)
Reviewer feedback on unique test identifiers (comment #2832083123 by @misrasaurabh1)

New observations (low priority):

_TEST_ANNOTATION_RE regex (instrumentation.py:67) is now dead code after _is_test_annotation was rewritten to use string operations
Multi-module classpath discovery iterates all directories in project_root (test_runner.py:586-593) — could accidentally include unrelated module classpath entries; filtering to directories with pom.xml/build.gradle would be safer

Test Coverage

| File | PR | Main | Delta |
|---|---|---|---|
| `java/config.py` | 85% | N/A (new) | — |
| `java/instrumentation.py` | 82% | N/A (new) | — |
| `java/support.py` | 63% | N/A (new) | ⚠️ Below 75% |
| `java/test_runner.py` | 42% | N/A (new) | ⚠️ Below 75% |
| `models/models.py` | 79% | 78% | +1% |
| `optimization/optimizer.py` | 20% | 19% | +1% |
| `verification/coverage_utils.py` | 52% | 22% | +30% |
| `verification/parse_test_output.py` | 53% | 58% | -5% ⚠️ |
| `verification/verification_utils.py` | 54% | 61% | -7% ⚠️ |
| TOTAL | 57% | 52% | +5% |

Notes:

New Java files (test_runner.py at 42%, support.py at 63%) are below the 75% threshold. These require Java infrastructure (Maven, JDK) for integration testing, which limits unit test coverage.
parse_test_output.py (-5%) and verification_utils.py (-7%) show coverage decreases due to new code paths (cache layer, test framework caching) not fully exercised by existing tests.
Overall coverage increased by +5% driven by new Java test coverage and coverage_utils.py improvements (+30%).
33 pre-existing test failures on PR branch (mostly test_tracer.py), same 8 failures on main.

Last updated: 2026-02-20T20:18:00Z

codeflash-ai · 2026-02-20T06:28:08Z

⚡️ Codeflash found optimizations for this PR

📄 16% (0.16x) speedup for `_is_inside_complex_expression` in `codeflash/languages/java/instrumentation.py`

⏱️ Runtime : 181 microseconds → 156 microseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _is_inside_complex_expression by 16% in PR #1580 (fix/java-direct-jvm-and-bugs) #1584

If you approve, it will be merged into this PR (branch fix/java-direct-jvm-and-bugs).

claude · 2026-02-20T08:39:20Z

codeflash/languages/java/test_runner.py

+    if is_junit4:
+        # Use JUnit 4's JUnitCore runner
+        cmd = [
+            str(java),
+            # Java 16+ module system: Kryo needs reflective access to internal JDK classes
+            "--add-opens",
+            "java.base/java.util=ALL-UNNAMED",
+            "--add-opens",
+            "java.base/java.lang=ALL-UNNAMED",
+            "--add-opens",
+            "java.base/java.lang.reflect=ALL-UNNAMED",
+            "--add-opens",
+            "java.base/java.io=ALL-UNNAMED",
+            "--add-opens",
+            "java.base/java.math=ALL-UNNAMED",
+            "--add-opens",
+            "java.base/java.net=ALL-UNNAMED",
+            "--add-opens",
+            "java.base/java.util.zip=ALL-UNNAMED",
+            "-cp",
+            classpath,
+            "org.junit.runner.JUnitCore",
+        ]
+        # Add test classes
+        cmd.extend(test_classes)


✅ Resolved in latest commit — the JUnitCore branch now has explicit debug logging (lines 700-704) warning that XML report generation is not supported with JUnitCore and that reports_dir is being ignored.

claude · 2026-02-20T08:39:26Z

codeflash/languages/java/test_runner.py

+    has_console_launcher = "console-standalone" in classpath or "ConsoleLauncher" in classpath
+    # Use ConsoleLauncher if available (works for both JUnit 4 via vintage and JUnit 5).
+    # Only use JUnitCore when ConsoleLauncher is not on the classpath at all.
+    is_junit4 = not has_console_launcher


✅ Resolved in latest commit — the variable now has extensive inline documentation explaining the logic (lines 679-689), making the intent clear despite the name.

claude · 2026-02-20T08:39:33Z

codeflash/languages/java/instrumentation.py

+        if current.type in {
+            "cast_expression",
+            "ternary_expression",
+            "array_access",
+            "binary_expression",
+            "unary_expression",
+            "parenthesized_expression",
+            "instanceof_expression",
+        }:
+            logger.debug("Found complex expression parent: %s", current.type)
+            return True


Potential over-filtering: Including parenthesized_expression in the complex expression set may skip valid instrumentation targets. In Java tree-sitter, a simple (func()) is parsed as a parenthesized_expression. Patterns like assertTrue((calculator.add(1, 2)) > 0) would cause the target call to be silently skipped for instrumentation. Consider whether parenthesized_expression should be excluded from this check, or add a depth-limited check that only triggers for parenthesized expressions that are themselves inside another complex expression.
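
One way to read the second suggestion is to treat `parenthesized_expression` as transparent: keep walking up, and report a complex context only when a genuinely complex ancestor appears before the enclosing statement. The sketch below is a minimal illustration under stated assumptions: `FakeNode` is a hypothetical stand-in for a tree-sitter node (only `type` and `parent` are modeled), and `STOP_TYPES` is an assumed boundary set, not the one used in `instrumentation.py`.

```python
from dataclasses import dataclass
from typing import Optional

# Same set as in the diff above, minus parenthesized_expression.
COMPLEX_TYPES = {
    "cast_expression",
    "ternary_expression",
    "array_access",
    "binary_expression",
    "unary_expression",
    "instanceof_expression",
}

# Assumption: node types at which the upward walk may stop.
STOP_TYPES = {"expression_statement", "local_variable_declaration", "method_declaration"}


@dataclass
class FakeNode:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    parent: Optional["FakeNode"] = None


def is_inside_complex_expression(node: FakeNode) -> bool:
    """Return True only if a non-parenthesized complex ancestor is found."""
    current = node.parent
    while current is not None:
        if current.type in STOP_TYPES:
            return False
        if current.type in COMPLEX_TYPES:
            return True
        # parenthesized_expression and other wrappers are skipped, not rejected.
        current = current.parent
    return False
```

With this shape, a bare `(func())` statement stays instrumentable, while `(func()) > 0` is still reported as complex because of the enclosing `binary_expression`.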

misrasaurabh1 · 2026-02-20T08:45:31Z

tests/test_languages/test_java/test_instrumentation.py

        String _cf_outputFile1 = System.getenv("CODEFLASH_OUTPUT_FILE");
        String _cf_testIteration1 = System.getenv("CODEFLASH_TEST_ITERATION");
        if (_cf_testIteration1 == null) _cf_testIteration1 = "0";
+        String _cf_test1 = "testNegativeInput_ThrowsIllegalArgumentException";


I still think this identifier isn't enough. We need a unique identifier such that, when we see a given marker, we know the complete, unique identity of that run. Uniqueness for a test invocation is defined by
test_file -> test_class -> test_function -> test_line_node -> run_counter.
This is largely how the Python instrumentation works. Currently I see that the module name isn't unique, since it lacks the __perfinstrumented section. It may be fine to skip this, but then look at what the Python instrumentation does for a reference implementation.
There are other things we will fix shortly, such as the line numbers (needed for the runtime annotations on the tests we display on PRs). Will test more.

misrasaurabh1 · 2026-02-20T08:46:50Z

we would also need the same test function name in the performance instrumentation

claude · 2026-02-20T08:58:14Z

codeflash/verification/parse_test_output.py



+# Cache for resolved test file paths to avoid repeated rglob calls
+_test_file_path_cache: dict[tuple[str, Path], Path | None] = {}


✅ Resolved in latest commit — clear_test_file_path_cache() function was added and is called in optimizer.py:run() after each optimization iteration cleanup.
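
To illustrate why the clear function matters, here is a minimal sketch of a negative-caching lookup. `_test_file_path_cache` and `clear_test_file_path_cache()` are named in the thread; `resolve_test_file` is a hypothetical helper written for this example and may not match the real lookup in `parse_test_output.py`.

```python
from pathlib import Path
from typing import Optional

# Cache of resolved paths; None records "looked, not found" (a negative entry).
_test_file_path_cache: dict[tuple[str, Path], Optional[Path]] = {}


def resolve_test_file(name: str, root: Path) -> Optional[Path]:
    """Hypothetical lookup: find `name` under `root`, caching the result."""
    key = (name, root)
    if key not in _test_file_path_cache:
        _test_file_path_cache[key] = next(root.rglob(name), None)
    return _test_file_path_cache[key]


def clear_test_file_path_cache() -> None:
    """Drop all entries so files generated later can be found."""
    _test_file_path_cache.clear()
```

Without the clear call, a miss recorded in one iteration would keep returning `None` even after the file is generated in a later iteration.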

claude · 2026-02-20T08:58:22Z

codeflash/languages/java/test_runner.py

+    else:
+        logger.debug("JUnit 4 project, using ConsoleLauncher (via vintage engine)")
+
+    if is_junit4:


✅ Resolved in latest commit — the JUnitCore branch now has explicit debug logging (line 700-704) warning that XML report generation is not supported with JUnitCore and that reports_dir is ignored.

codeflash-ai · 2026-02-20T09:12:34Z

⚡️ Codeflash found optimizations for this PR

📄 17% (0.17x) speedup for `_is_inside_lambda` in `codeflash/languages/java/instrumentation.py`

⏱️ Runtime : 1.05 milliseconds → 894 microseconds (best of 34 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _is_inside_lambda by 17% in PR #1580 (fix/java-direct-jvm-and-bugs) #1594

If you approve, it will be merged into this PR (branch fix/java-direct-jvm-and-bugs).

codeflash-ai · 2026-02-20T09:27:01Z

⚡️ Codeflash found optimizations for this PR

📄 16% (0.16x) speedup for `_add_timing_instrumentation` in `codeflash/languages/java/instrumentation.py`

⏱️ Runtime : 10.2 milliseconds → 8.81 milliseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _add_timing_instrumentation by 16% in PR #1580 (fix/java-direct-jvm-and-bugs) #1595

If you approve, it will be merged into this PR (branch fix/java-direct-jvm-and-bugs).

misrasaurabh1 · 2026-02-20T09:39:01Z

In behavior instrumentation we are still not printing the test class + fn name in the stdout markers:

int _cf_loop5 = Integer.parseInt(System.getenv("CODEFLASH_LOOP_INDEX"));
int _cf_iter5 = 5;
String _cf_mod5 = "CryptoTest";
String _cf_cls5 = "CryptoTest";
String _cf_fn5 = "computeDigest";
String _cf_outputFile5 = System.getenv("CODEFLASH_OUTPUT_FILE");
String _cf_testIteration5 = System.getenv("CODEFLASH_TEST_ITERATION");
if (_cf_testIteration5 == null) _cf_testIteration5 = "0";
String _cf_test5 = "testComputeDigest_EmptySetNameAndEmptyStringKey_Returns20ByteDigest";
System.out.println("!$######" + _cf_mod5 + ":" + _cf_cls5 + ":" + _cf_fn5 + ":" + _cf_loop5 + ":" + _cf_iter5 + "######$!");

_cf_test5 is not present in the stdout

codeflash-ai · 2026-02-20T09:54:15Z

⚡️ Codeflash found optimizations for this PR

📄 17% (0.17x) speedup for `_get_qualified_name` in `codeflash/languages/java/instrumentation.py`

⏱️ Runtime : 2.86 milliseconds → 2.45 milliseconds (best of 93 runs)

A new Optimization Review has been created.

🔗 Review here

codeflash-ai · 2026-02-20T10:00:37Z

⚡️ Codeflash found optimizations for this PR

📄 197% (1.97x) speedup for `_add_behavior_instrumentation` in `codeflash/languages/java/instrumentation.py`

⏱️ Runtime : 13.3 milliseconds → 4.49 milliseconds (best of 118 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _add_behavior_instrumentation by 197% in PR #1580 (fix/java-direct-jvm-and-bugs) #1596

If you approve, it will be merged into this PR (branch fix/java-direct-jvm-and-bugs).

…2026-02-20T10.00.27 ⚡️ Speed up function `_add_behavior_instrumentation` by 197% in PR #1580 (`fix/java-direct-jvm-and-bugs`)

codeflash-ai · 2026-02-20T10:31:29Z

This PR is now faster! 🚀 @misrasaurabh1 accepted my optimizations from:

⚡️ Speed up function _add_behavior_instrumentation by 197% in PR #1580 (fix/java-direct-jvm-and-bugs) #1596

codeflash-ai · 2026-02-20T12:46:58Z

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:

⚡️ Speed up function _add_timing_instrumentation by 16% in PR #1580 (fix/java-direct-jvm-and-bugs) #1595

…2026-02-20T09.26.51 ⚡️ Speed up function `_add_timing_instrumentation` by 16% in PR #1580 (`fix/java-direct-jvm-and-bugs`)

…2026-02-20T09.12.25 ⚡️ Speed up function `_is_inside_lambda` by 17% in PR #1580 (`fix/java-direct-jvm-and-bugs`)

codeflash-ai · 2026-02-20T12:47:02Z

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:

⚡️ Speed up function _is_inside_lambda by 17% in PR #1580 (fix/java-direct-jvm-and-bugs) #1594

…2026-02-20T06.34.48 ⚡️ Speed up function `_byte_to_line_index` by 41% in PR #1580 (`fix/java-direct-jvm-and-bugs`)

codeflash-ai · 2026-02-20T12:47:25Z

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:

⚡️ Speed up function _byte_to_line_index by 41% in PR #1580 (fix/java-direct-jvm-and-bugs) #1586

Direct JVM execution with ConsoleLauncher was always failing because junit-platform-console-standalone is not included in the standard junit-jupiter dependency tree. The _get_test_classpath() function now finds and adds the console standalone JAR from ~/.m2, downloading it via Maven if needed. This enables direct JVM test execution for JUnit 5 projects, avoiding the Maven overhead (~500ms vs ~5-10s per invocation) and Surefire configuration issues (e.g., custom <includes> that ignore -Dtest). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

TestConfig.test_framework was an uncached @property that called _detect_java_test_framework() -> detect_java_project() -> _detect_test_deps_from_pom() (parses pom.xml) on every access. During test result parsing, this was accessed once per testcase, causing 300K+ redundant pom.xml parses and massive debug log spam. Cache the result after first detection using _test_framework field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
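
The caching pattern described in this commit can be sketched as follows. This is an illustration only: `CachedTestConfig` and the `detect_calls` counter are hypothetical, and the detection body is a placeholder for the real pom.xml parsing.

```python
from typing import Optional


class CachedTestConfig:
    """Hypothetical stand-in for TestConfig, showing only the caching."""

    def __init__(self) -> None:
        self._test_framework: Optional[str] = None
        self.detect_calls = 0  # illustration only: counts expensive detections

    def _detect_java_test_framework(self) -> str:
        # Placeholder for the expensive pom.xml parsing path.
        self.detect_calls += 1
        return "junit4"

    @property
    def test_framework(self) -> str:
        # Run detection once, then serve the stored value.
        if self._test_framework is None:
            self._test_framework = self._detect_java_test_framework()
        return self._test_framework
```

Reading `test_framework` once per parsed test case then costs a single attribute check instead of a pom.xml parse.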

…s probing The previous detection ran `java -cp ... JUnitCore -version` to check for JUnit 4, but JUnit 5 projects include JUnit 4 classes via junit-vintage-engine, causing false positive detection. This made direct JVM execution always fail and fall back to Maven. Now checks for JUnit 5 JAR names (junit-jupiter, junit-platform, console-standalone) in the classpath string instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Check dependencyManagement section in pom.xml for test dependencies
- Recursively check submodule pom.xml files (test, tests, etc.)
- Change default fallback from JUnit 5 to JUnit 4 (more common in legacy)
- Add debug logging for framework detection decisions
- Fixes Bug #7: 64% of optimizations blocked by incorrect JUnit 5 detection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
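
As a rough illustration of the first bullet, the sketch below scans both `<dependencies>` and `<dependencyManagement>` with the standard library's `xml.etree.ElementTree`. The function name `detect_junit_versions` is hypothetical, and the sketch assumes the pom declares the standard Maven namespace; it is not the actual `_detect_test_deps_from_pom()` implementation.

```python
import xml.etree.ElementTree as ET

# Assumption: the pom.xml uses the standard Maven POM namespace.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}

DEPENDENCY_PATHS = [
    "m:dependencies/m:dependency",
    "m:dependencyManagement/m:dependencies/m:dependency",
]


def detect_junit_versions(pom_xml: str) -> tuple[bool, bool]:
    """Return (has_junit5, has_junit4) for the given pom.xml text."""
    root = ET.fromstring(pom_xml)
    has_junit5 = has_junit4 = False
    for path in DEPENDENCY_PATHS:
        for dep in root.findall(path, NS):
            group = dep.findtext("m:groupId", default="", namespaces=NS)
            artifact = dep.findtext("m:artifactId", default="", namespaces=NS)
            if group == "org.junit.jupiter":
                has_junit5 = True
            elif group == "junit" and artifact == "junit":
                has_junit4 = True
    return has_junit5, has_junit4
```

A pom that declares `junit:junit` only under `<dependencyManagement>` is then reported as JUnit 4 rather than falling through to a default.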

… with vintage engine ConsoleLauncher runs both JUnit 4 (via vintage engine) and JUnit 5 tests. The detection now correctly distinguishes between JUnit 5 projects (have junit-jupiter on classpath) and JUnit 4 projects using ConsoleLauncher as the runner. Previously, the injected console-standalone JAR falsely triggered "JUnit 5 detected" for all projects. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

Convert f-string logging to lazy % formatting (G004) and replace try-except-pass with contextlib.suppress (SIM105). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@test

The optimized code achieves a **196% speedup** (from 13.3ms to 4.49ms) primarily through two focused optimizations that target the hottest paths identified by the line profiler:

## Key Optimizations

### 1. Early Exit in `wrap_target_calls_with_treesitter` (Primary Driver)

The profiler shows that in the original code, 55.5% of `wrap_target_calls_with_treesitter`'s time (9.7ms out of 17.5ms) was spent in `_collect_calls`, which parses Java code with tree-sitter. The optimization adds:

```python
body_text = "\n".join(body_lines)
if func_name not in body_text:
    return list(body_lines), 0
```

This simple string membership check avoids expensive tree-sitter parsing when the target function isn't present in the test method body. Since many test methods don't call the function being instrumented, this provides massive savings. The annotated tests confirm this pattern: tests with empty or simple bodies (no function calls) show the largest speedups, 639% for large methods and 1018% for complex expressions.

### 2. Optimized `_is_test_annotation` (Secondary Improvement)

The profiler shows `_is_test_annotation` being called 1,950 times, spending 100% of its time (1.21ms) on regex matching. The optimization replaces the regex with direct string checks:

```python
if not stripped_line.startswith("@Test"):
    return False
if len(stripped_line) == 5:  # exactly "@Test"
    return True
next_char = stripped_line[5]
return next_char == " " or next_char == "("
```

This avoids regex overhead for the 1,737 non-`@Test` annotations that can be rejected immediately with `startswith()`. The profiler shows this reduced time from 1.21ms to 0.91ms (25% faster in this function).

## Performance Impact by Test Type

The annotated tests reveal optimization effectiveness varies by workload:

- **Empty/simple methods**: 107-154% faster (early exit dominates)
- **Methods with complex expressions**: 396-1018% faster (avoids parsing large expression trees)
- **Large methods with many statements**: 510-639% faster (early exit + reduced AST traversal)
- **Methods with actual function calls**: 111-152% faster (smaller benefit since tree-sitter must run)

## Context and Production Impact

Based on `function_references`, this function is called from test discovery in `test_instrumentation.py`, specifically for behavior instrumentation that captures return values. The early exit optimization is particularly valuable here because:

1. Test discovery processes many test methods, but typically only a subset call the target function
2. The function operates on the hot path during test suite instrumentation
3. Large test suites with 100+ test methods (see test case showing 154% speedup for 150 methods) benefit significantly

The optimization maintains correctness: all test cases pass with identical output, confirming the early exit safely bypasses work that produces no changes when the function isn't present.

This optimization achieves a **15% runtime improvement** (10.2ms → 8.81ms) by replacing recursive AST traversal with iterative stack-based traversal in two critical functions: `collect_test_methods` and `collect_target_calls`.

## Key Changes

**1. Iterative AST Traversal (Primary Speedup)**

- Replaced recursive tree walking with explicit stack-based iteration
- In `collect_test_methods`: Changed from recursive calls to `while stack` loop with `stack.extend(reversed(current.children))`
- In `collect_target_calls`: Similar transformation using explicit stack management
- **Impact**: Line profiler shows `collect_test_methods` dropped from 24.2% to 3.8% of total runtime (81% reduction in that function)

**2. Why This Works in Python**

- Python function calls have significant overhead (frame creation, argument binding, scope setup)
- Recursive traversal compounds this overhead across potentially deep AST trees
- Iterative approach uses a simple list for the stack, avoiding repeated function call overhead
- The `reversed()` call ensures children are processed in the same order as recursive traversal, preserving correctness

**3. Performance Characteristics**

Based on annotated tests:

- **Large method bodies** (500+ lines): 23.8% faster - most benefit from reduced recursion overhead
- **Many test methods** (100 methods): 9.2% faster - cumulative savings across many traversals
- **Simple cases**: 2-5% faster - overhead reduction still measurable
- **Empty/no-match cases**: Minor regression (8-9% slower) due to negligible baseline times (12-40μs)

## Impact on Workloads

The function references show `_add_timing_instrumentation` is called from test instrumentation code. This optimization particularly benefits:

- **Java projects with large test suites** containing many `@Test` methods
- **Complex test methods** with deep AST structures and multiple method invocations
- **Batch instrumentation operations** where the function is called repeatedly

The iterative approach scales better than recursion as AST depth and method count increase, making it especially valuable for large Java codebases where instrumentation is applied across hundreds of test methods.
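
The recursive-to-iterative transformation described above can be sketched on a toy tree. `Node`, `collect_recursive`, and `collect_iterative` are hypothetical names for this illustration; the real functions operate on tree-sitter nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy AST node, standing in for a tree-sitter node."""
    type: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def collect_recursive(node: Node, wanted: str, out: list[Node]) -> None:
    """Pre-order walk using one Python call per node."""
    if node.type == wanted:
        out.append(node)
    for child in node.children:
        collect_recursive(child, wanted, out)


def collect_iterative(root: Node, wanted: str) -> list[Node]:
    """Same pre-order walk with an explicit stack instead of recursion."""
    out: list[Node] = []
    stack = [root]
    while stack:
        current = stack.pop()
        if current.type == wanted:
            out.append(current)
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(current.children))
    return out
```

Both functions visit nodes in the same order, which is what lets the iterative version replace the recursive one without changing results.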

The optimization achieves a **17% runtime improvement** (from 1.05ms to 894μs) by caching the `current.type` attribute access in a local variable (`t` or `current_type`) inside the loop. This seemingly small change reduces repeated attribute lookups on the same object during each iteration.

**What Changed:** Instead of accessing `current.type` twice per iteration (once for each conditional check), the optimized version stores it in a local variable and reuses that value. This transforms two attribute lookups into one per iteration.

**Why This Improves Performance:** In Python, attribute access involves dictionary lookups in the object's `__dict__`, which carries overhead. By caching the attribute value in a local variable, the code performs this lookup once per iteration instead of twice. Local variable access in Python is significantly faster than attribute access because it's a simple array index operation at the bytecode level (LOAD_FAST) versus a dictionary lookup (LOAD_ATTR).

**Key Performance Characteristics:** The line profiler shows the optimization is particularly effective for the common case where both conditions need to be checked. The time spent on the two conditional checks decreased from 28% + 23.4% = 51.4% of total time to 22.4% + 15.3% = 37.7%, demonstrating measurable savings from the reduced attribute access overhead.

**Test Case Performance:**

- The optimization shows the most significant gains in **large-scale traversal scenarios** (1000-node chains), with 4-5% speedups in `test_long_chain_with_lambda_at_top_large_scale` and `test_long_chain_with_method_declaration_earlier_large_scale`
- Shorter chains show slight regressions (1-6% slower) in individual test cases, likely due to measurement noise and the overhead of the additional variable assignment being more noticeable in very short executions
- The overall **17% improvement** across the full workload confirms the optimization is beneficial when amortized across realistic usage patterns with varying tree depths

This optimization is particularly valuable when traversing deep AST structures, where the function may iterate many times before finding a lambda or method declaration, making the cumulative savings from reduced attribute access substantial.

The main optimization here is eliminating the `max(0, idx)` call by handling the edge case directly. Since `bisect_right` returns 0 when `byte_offset` is less than all elements, subtracting 1 gives -1, which we can catch with a simple comparison. This avoids the function call overhead of `max()`.
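
A minimal sketch of that idea, assuming the function maps a byte offset to a line index via a sorted list of line-start offsets (the parameter names here are hypothetical, not taken from `instrumentation.py`):

```python
from bisect import bisect_right


def byte_to_line_index(byte_offset: int, line_byte_starts: list[int]) -> int:
    """Return the index of the line containing byte_offset."""
    # bisect_right gives the count of starts <= byte_offset; minus one is the line.
    idx = bisect_right(line_byte_starts, byte_offset) - 1
    # An offset before every start yields -1; clamp with a comparison, not max().
    return idx if idx >= 0 else 0
```

The conditional expression returns the same value as `max(0, idx)` while skipping the extra function call.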

…imization The base class stubs for remove_test_functions_from_generated_tests() and add_runtime_comments_to_generated_tests() return None, causing an AttributeError crash in function_optimizer.py when iterating generated_tests.generated_tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ation Java stdout markers now include the test method name in the class field (e.g., "TestClass.testMethod") matching the Python marker format. The parser extracts the test method name from this combined field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
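
A rough sketch of how such a marker could be parsed, based only on the `!$######mod:cls:fn:loop:iter######$!` layout shown in the review comment above and the combined `TestClass.testMethod` field this commit describes. `MARKER_RE` and `parse_marker` are hypothetical names, and the actual parser in `parse_test_output.py` may differ.

```python
import re
from typing import Optional

# Assumed layout: !$######module:Class.testMethod:function:loop:iteration######$!
MARKER_RE = re.compile(r"!\$######([^:]*):([^:]*):([^:]*):([^:]*):([^:#]*)######\$!")


def parse_marker(line: str) -> Optional[dict]:
    """Split one stdout marker into its fields, or return None if absent."""
    m = MARKER_RE.search(line)
    if m is None:
        return None
    module, class_field, function, loop, iteration = m.groups()
    # The test method name rides along after the last dot of the class field.
    if "." in class_field:
        test_class, test_method = class_field.rsplit(".", 1)
    else:
        test_class, test_method = class_field, None
    return {
        "module": module,
        "test_class": test_class,
        "test_method": test_method,
        "function": function,
        "loop_index": int(loop),
        "iteration_id": iteration,
    }
```

Markers written in the older format, without a dot in the class field, come back with `test_method` set to `None`.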

…all mode The module-level _test_file_path_cache persists across optimization iterations, which can cause negative cache entries to mask test files generated in later iterations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-02-20T20:17:50Z

codeflash/languages/java/test_runner.py

+    has_junit5_tests = "junit-jupiter" in classpath
+    has_console_launcher = "console-standalone" in classpath or "ConsoleLauncher" in classpath
+    # Use ConsoleLauncher if available (works for both JUnit 4 via vintage and JUnit 5).
+    # Only use JUnitCore when ConsoleLauncher is not on the classpath at all.
+    is_junit4 = not has_console_launcher


Low: The multi-module classpath loop iterates all subdirectories in project_root, which could inadvertently include target/classes from unrelated modules (e.g., a docs module or tooling module). Consider filtering to only known source modules (e.g., directories containing a pom.xml or build.gradle).
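
The filtering suggested here could look roughly like the sketch below. `module_class_dirs` is a hypothetical helper, not the loop in `test_runner.py`, and it collects only `target/classes`, the directory named in this comment.

```python
from pathlib import Path

# A directory counts as a module only if it holds one of these build files.
BUILD_FILES = ("pom.xml", "build.gradle")


def module_class_dirs(project_root: Path) -> list[Path]:
    """Return target/classes dirs of subdirectories that look like modules."""
    dirs: list[Path] = []
    for child in sorted(project_root.iterdir()):
        if not child.is_dir():
            continue
        if not any((child / name).exists() for name in BUILD_FILES):
            continue  # e.g. a docs or tooling directory without a build file
        classes = child / "target" / "classes"
        if classes.is_dir():
            dirs.append(classes)
    return dirs
```

A stray `docs/target/classes` directory is then ignored unless `docs/` also contains a build file.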

codeflash-ai · 2026-02-20T20:28:09Z

⚡️ Codeflash found optimizations for this PR

📄 35% (0.35x) speedup for `_byte_to_line_index` in `codeflash/languages/java/instrumentation.py`

⏱️ Runtime : 1.06 milliseconds → 783 microseconds (best of 249 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _byte_to_line_index by 35% in PR #1580 (fix/java-direct-jvm-and-bugs) #1619

If you approve, it will be merged into this PR (branch fix/java-direct-jvm-and-bugs).

codeflash-ai · 2026-02-20T20:38:54Z

codeflash/languages/java/support.py

+                parts = inv_id.iteration_id.split("_")
+                cur_invid = parts[0] if len(parts) < 3 else "_".join(parts[:-1])
+                key = key + "#" + cur_invid
+            if key not in unique_inv_ids:
+                unique_inv_ids[key] = 0
+            unique_inv_ids[key] += min(runtimes)


⚡️Codeflash found 21% (0.21x) speedup for JavaSupport._build_runtime_map in codeflash/languages/java/support.py

⏱️ Runtime : 681 microseconds → 565 microseconds (best of 152 runs)

📝 Explanation and details

The optimized code achieves a 20% runtime improvement (681μs → 565μs) by replacing inefficient string manipulation in the _build_runtime_map method with a more direct approach.

Key Optimization:

The original code processes iteration_id strings with multiple operations:

parts = inv_id.iteration_id.split("_")
cur_invid = parts[0] if len(parts) < 3 else "_".join(parts[:-1])

The optimized version uses rsplit with a limit:

cur_invid = inv_id.iteration_id.rsplit("_", 1)[0]

Why This Is Faster:

Eliminates unnecessary list creation: split("_") creates a full list of all parts, even when only the first or all-but-last are needed. With large iteration IDs (e.g., "a_b_c_d_e_f_g"), this creates wasteful intermediate data structures.

Avoids conditional join: The original code conditionally rebuilds strings with "_".join(parts[:-1]) for iteration IDs with 3+ parts, performing O(n) string concatenations. rsplit("_", 1)[0] directly extracts the prefix without rebuilding.

Optimizes dictionary operations: Replaces if key not in dict check + assignment with dict.get(key, 0), reducing dictionary lookups from 2 to 1 per iteration.

Performance Impact by Test Case:

Tests with 3+ underscore-separated iteration IDs show the largest gains (28-72% faster), as they avoid the expensive "_".join(parts[:-1]) operation

Tests with simple or no iteration IDs still benefit (5-11% faster) from the improved dictionary access pattern

The large-scale test (1000 invocations) demonstrates 13.8% improvement, showing the optimization scales well

The optimization maintains identical behavior including edge cases (empty strings, None values, ValueError on empty runtime lists) while reducing computational overhead in the hot path.
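
The equivalence of the two prefix computations can be checked in isolation. The helper names below are hypothetical wrappers around the two expressions quoted above, and the sample IDs are illustrative.

```python
def original_prefix(iteration_id: str) -> str:
    """Prefix as computed by the original split/join code."""
    parts = iteration_id.split("_")
    return parts[0] if len(parts) < 3 else "_".join(parts[:-1])


def optimized_prefix(iteration_id: str) -> str:
    """Prefix as computed by the rsplit-based replacement."""
    return iteration_id.rsplit("_", 1)[0]


# Illustrative IDs: empty, one part, two parts, three or more, and odd underscores.
CASES = ["", "it0", "run42_iterA", "a_b_c", "a_b_c_d_e_f_g", "_", "a__b"]

mismatches = [c for c in CASES if original_prefix(c) != optimized_prefix(c)]
```

Both versions drop only the final underscore-separated part, so `mismatches` stays empty for these inputs.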

✅ Correctness verification report:

Test Status

⚙️ Existing Unit Tests 🔘 None Found

🌀 Generated Regression Tests ✅ 72 Passed

⏪ Replay Tests 🔘 None Found

🔎 Concolic Coverage Tests 🔘 None Found

📊 Tests Coverage 100.0%

🌀 Click to see Generated Regression Tests

import pytest # used for our unit tests
from codeflash.languages.java.support import JavaSupport
from codeflash.models.models import InvocationId

def test_basic_single_invocation_no_iteration():
    # Create a JavaSupport instance (real constructor)
    js = JavaSupport()
    # Single InvocationId with class and function names, no iteration_id
    inv = InvocationId(
        test_module_path="mod",
        test_class_name="MyTest",
        test_function_name="testFoo",
        function_getting_tested="foo",
        iteration_id=None,
    )
    # Provide multiple runtimes; function uses min(runtimes)
    runtimes_map = {inv: [150, 200, 120]}
    # Call the method under test
    codeflash_output = js._build_runtime_map(runtimes_map); result = codeflash_output # 2.23μs -> 2.05μs (8.76% faster)

def test_basic_single_invocation_with_short_iteration_parts():
    js = JavaSupport()
    # iteration_id with two parts -> length < 3 -> use first part
    inv = InvocationId(
        test_module_path="mod",
        test_class_name="SomeTest",
        test_function_name="testBar",
        function_getting_tested="bar",
        iteration_id="run42_iterA",
    )
    runtimes_map = {inv: [500, 400]}
    # Expected key: "SomeTest.testBar#run42"
    expected_key = "SomeTest.testBar#run42"
    codeflash_output = js._build_runtime_map(runtimes_map); result = codeflash_output # 2.90μs -> 2.67μs (8.60% faster)

def test_basic_single_invocation_with_long_iteration_parts():
    js = JavaSupport()
    # iteration_id with three parts -> join all except last with underscores
    inv = InvocationId(
        test_module_path="mod",
        test_class_name="T",
        test_function_name="f",
        function_getting_tested="f_impl",
        iteration_id="a_b_c",
    )
    runtimes_map = {inv: [9, 7, 11]}
    # Expected key: "T.f#a_b" (join parts[:-1])
    expected_key = "T.f#a_b"
    codeflash_output = js._build_runtime_map(runtimes_map); result = codeflash_output # 3.36μs -> 2.62μs (28.3% faster)

def test_multiple_invocations_same_key_accumulate_minimums():
    js = JavaSupport()
    # Two InvocationId instances that map to the same key and should accumulate min(runtimes)
    inv1 = InvocationId(
        test_module_path="mod",
        test_class_name="DupTest",
        test_function_name="testX",
        function_getting_tested="x",
        iteration_id="one_two",  # parts len=2 => cur_invid = "one"
    )
    inv2 = InvocationId(
        test_module_path="mod2",
        test_class_name="DupTest",
        test_function_name="testX",
        function_getting_tested="x",
        iteration_id="one_three",  # parts len=2 => cur_invid = "one" -> same key
    )
    runtimes_map = {
        inv1: [5, 10],  # min = 5
        inv2: [3, 4, 6],  # min = 3
    }
    codeflash_output = js._build_runtime_map(runtimes_map); result = codeflash_output # 4.02μs -> 3.72μs (8.07% faster)

def test_invocation_with_no_test_name_is_ignored():
    js = JavaSupport()
    # test_function_name is None -> test_qualified_name becomes None -> entry ignored
    inv = InvocationId(
        test_module_path="mod",
        test_class_name=None,
        test_function_name=None,
        function_getting_tested="something",
        iteration_id=None,
    )
    codeflash_output = js._build_runtime_map({inv: [1, 2, 3]}); result = codeflash_output # 1.01μs -> 942ns (7.43% faster)

def test_empty_strings_for_class_and_function_are_ignored():
    js = JavaSupport()
    # Both class and function are empty strings -> treated as falsy -> ignored
    inv = InvocationId(
        test_module_path="mod",
        test_class_name="",
        test_function_name="",
        function_getting_tested="f",
        iteration_id=None,
    )
    codeflash_output = js._build_runtime_map({inv: [10]}); result = codeflash_output # 1.01μs -> 961ns (5.20% faster)

def test_empty_runtimes_raises_value_error():
    js = JavaSupport()
    # An empty runtimes list will cause min([]) -> ValueError. Ensure the function surfaces it.
    inv = InvocationId(
        test_module_path="mod",
        test_class_name="E",
        test_function_name="testEmpty",
        function_getting_tested="e",
        iteration_id=None,
    )
    with pytest.raises(ValueError):
        js._build_runtime_map({inv: []}) # 4.69μs -> 4.00μs (17.3% faster)

def test_iteration_id_single_empty_part_and_trailing_hash_behavior():
    js = JavaSupport()
    # iteration_id is an empty string -> split gives [''] -> cur_invid == ''
    inv = InvocationId(
        test_module_path="mod",
        test_class_name="Edge",
        test_function_name="testEmptyIter",
        function_getting_tested="edge",
        iteration_id="",
    )
    codeflash_output = js._build_runtime_map({inv: [42]}); result = codeflash_output # 2.02μs -> 1.87μs (8.00% faster)

def test_large_scale_mixed_entries_performance_and_correctness():
    js = JavaSupport()
    # Build a large mapping (1000 InvocationId entries) with varying class/function/iteration combos.
    # We will replicate the key-generation logic here to compute expected results deterministically.
    large_map = {}
    expected = {}
    # Use 1000 entries, with deterministic patterns for names and runtimes
    N = 1000
    for i in range(N):
        class_name = f"Class{i % 10}" if (i % 7) != 0 else None  # some entries with no class -> use function-only name
        func_name = f"test_{i % 25}"  # many duplicates across entries
        iteration_id = None
        # Some variety in iteration ids: None, single-part, two-part, three-part
        if i % 5 == 0:
            iteration_id = None
        elif i % 5 == 1:
            iteration_id = f"it{i%3}"  # single-part -> cur_invid = whole
        elif i % 5 == 2:
            iteration_id = f"group{i%4}_run{i%6}"  # two-part -> first part before '_' is cur_invid
        elif i % 5 == 3:
            iteration_id = f"a_b_c_{i%2}"  # >=3 parts -> join all except last
        else:
            iteration_id = ""  # empty string case -> cur_invid == ''
        inv = InvocationId(
            test_module_path=f"module_{i%20}",
            test_class_name=class_name,
            test_function_name=func_name,
            function_getting_tested="target",
            iteration_id=iteration_id,
        )
        # Choose deterministic runtimes list length and values
        runtimes = [i % 13 + 1, (i * 3) % 17 + 1]  # always >=1
        large_map[inv] = runtimes
        # Compute expected key using same logic as _build_runtime_map (but done here for assertions)
        if class_name:
            test_qualified_name = class_name + "." + func_name
        else:
            test_qualified_name = func_name
        if not test_qualified_name:  # skip
            continue
        key = test_qualified_name
        if iteration_id is not None:
            parts = iteration_id.split("_")
            cur_invid = parts[0] if len(parts) < 3 else "_".join(parts[:-1])
            key = key + "#" + cur_invid
        # Add min to expected (accumulate)
        expected.setdefault(key, 0)
        expected[key] += min(runtimes)
    # Call implementation under test
    codeflash_output = js._build_runtime_map(large_map); result = codeflash_output # 176μs -> 154μs (13.8% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest from codeflash.languages.java.support import JavaSupport from codeflash.models.models import InvocationId def test_build_runtime_map_single_invocation_no_iteration(): """Test basic functionality with a single invocation and no iteration ID.""" java_support = JavaSupport() # Create a single InvocationId with class and function name inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=None ) # Build the runtime map with a single entry inv_id_runtimes = {inv_id: [100, 200, 150]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 2.31μs -> 2.20μs (4.99% faster) def test_build_runtime_map_no_class_name(): """Test with InvocationId that has no class name (function-level test).""" java_support = JavaSupport() # Create an InvocationId without class name inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name=None, test_function_name="testFunction", function_getting_tested="targetFunction", iteration_id=None ) inv_id_runtimes = {inv_id: [50, 75, 100]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 1.94μs -> 1.74μs (11.4% faster) def test_build_runtime_map_with_iteration_id(): """Test with an iteration ID that has underscore-separated parts.""" java_support = JavaSupport() # Create an InvocationId with a complex iteration_id inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id="invocation_1_attempt" ) inv_id_runtimes = {inv_id: [300, 250, 350]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 3.51μs -> 2.58μs (35.7% faster) def test_build_runtime_map_iteration_id_no_underscores(): """Test with an iteration ID that has no underscores (less 
than 3 parts).""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id="simple" ) inv_id_runtimes = {inv_id: [400, 500]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 2.71μs -> 2.48μs (9.26% faster) def test_build_runtime_map_multiple_invocations(): """Test with multiple distinct invocations.""" java_support = JavaSupport() inv_id1 = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass1", test_function_name="test1", function_getting_tested="func1", iteration_id=None ) inv_id2 = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass2", test_function_name="test2", function_getting_tested="func2", iteration_id=None ) inv_id_runtimes = { inv_id1: [100, 120, 110], inv_id2: [200, 180, 190] } codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 2.79μs -> 2.73μs (2.23% faster) def test_build_runtime_map_same_qualified_name_different_iterations(): """Test accumulation when same qualified name appears with different iteration IDs.""" java_support = JavaSupport() inv_id1 = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id="inv_1_a" ) inv_id2 = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id="inv_2_b" ) inv_id_runtimes = { inv_id1: [100], inv_id2: [150] } codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 4.59μs -> 3.37μs (36.3% faster) def test_build_runtime_map_empty_input(): """Test with an empty runtime map.""" java_support = JavaSupport() inv_id_runtimes = {} 
codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 781ns -> 771ns (1.30% faster) def test_build_runtime_map_none_function_name(): """Test with None test_function_name.""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name=None, function_getting_tested="targetFunction", iteration_id=None ) inv_id_runtimes = {inv_id: [100, 200]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output def test_build_runtime_map_empty_function_name(): """Test with empty string test_function_name.""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="", function_getting_tested="targetFunction", iteration_id=None ) inv_id_runtimes = {inv_id: [100, 200]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 2.37μs -> 2.24μs (5.79% faster) def test_build_runtime_map_none_class_name_none_function(): """Test with both class_name and function_name as None.""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name=None, test_function_name=None, function_getting_tested="targetFunction", iteration_id=None ) inv_id_runtimes = {inv_id: [100]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 1.07μs -> 1.05μs (1.80% faster) def test_build_runtime_map_single_runtime_value(): """Test with a single runtime value in the list.""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=None ) inv_id_runtimes = {inv_id: [42]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 2.13μs -> 1.97μs (8.11% 
faster) def test_build_runtime_map_large_runtime_values(): """Test with very large runtime values.""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=None ) inv_id_runtimes = {inv_id: [10000000, 9999999, 10000001]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 2.11μs -> 2.01μs (4.97% faster) def test_build_runtime_map_zero_runtime(): """Test with zero runtime values.""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=None ) inv_id_runtimes = {inv_id: [0, 100, 50]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 2.04μs -> 1.88μs (8.55% faster) def test_build_runtime_map_negative_runtime(): """Test with negative runtime values (edge case).""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=None ) inv_id_runtimes = {inv_id: [-10, 100, 50]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 2.03μs -> 1.96μs (3.62% faster) def test_build_runtime_map_special_characters_in_names(): """Test with special characters in class and function names.""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="Test$Class", test_function_name="test_Method_123", function_getting_tested="targetFunction", iteration_id=None ) inv_id_runtimes = {inv_id: [100]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 1.96μs -> 1.81μs (8.27% faster) def 
test_build_runtime_map_long_iteration_id(): """Test with a long iteration ID containing many underscore-separated parts.""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id="a_b_c_d_e_f_g" ) inv_id_runtimes = {inv_id: [100]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 3.41μs -> 2.50μs (36.5% faster) def test_build_runtime_map_accumulation_same_key(): """Test that runtime values are accumulated for identical keys.""" java_support = JavaSupport() inv_id1 = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id="inv_1_a" ) # Same qualified_name and iteration (after processing), different runtimes inv_id2 = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id="inv_1_a" ) inv_id_runtimes = { inv_id1: [100, 200], inv_id2: [150, 250] } codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 3.38μs -> 2.41μs (39.9% faster) def test_build_runtime_map_iteration_id_single_underscore(): """Test with iteration_id that has exactly one underscore (2 parts).""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id="part1_part2" ) inv_id_runtimes = {inv_id: [100]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 2.67μs -> 2.42μs (9.94% faster) def test_build_runtime_map_iteration_id_exactly_3_parts(): """Test with iteration_id that has exactly 3 underscore-separated parts.""" java_support = 
JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id="a_b_c" ) inv_id_runtimes = {inv_id: [100]} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 3.16μs -> 2.46μs (28.0% faster) def test_build_runtime_map_empty_runtime_list(): """Test behavior when runtime list is empty (should not occur in normal use).""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=None ) # This will cause ValueError when calling min([]) inv_id_runtimes = {inv_id: []} with pytest.raises(ValueError): java_support._build_runtime_map(inv_id_runtimes) # 4.59μs -> 4.03μs (13.9% faster) def test_build_runtime_map_100_invocations(): """Test with 100 different invocations.""" java_support = JavaSupport() inv_id_runtimes = {} for i in range(100): inv_id = InvocationId( test_module_path=f"com.example.Module{i}", test_class_name=f"TestClass{i}", test_function_name=f"test{i}", function_getting_tested=f"targetFunc{i}", iteration_id=None ) inv_id_runtimes[inv_id] = [100 + i, 150 + i, 120 + i] codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 51.3μs -> 47.4μs (8.29% faster) def test_build_runtime_map_1000_invocations_with_iterations(): """Test with 1000 invocations with various iteration IDs.""" java_support = JavaSupport() inv_id_runtimes = {} for i in range(1000): inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=f"inv_{i % 100}_attempt_{i % 10}" if i % 3 == 0 else None ) # Use dict to group by key to handle accumulation if inv_id not in inv_id_runtimes: inv_id_runtimes[inv_id] = 
[100 + i] else: inv_id_runtimes[inv_id].append(100 + i) codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 100μs -> 76.1μs (32.0% faster) # All values should be >= 100 for value in result.values(): pass def test_build_runtime_map_large_runtime_lists(): """Test with invocations that have large lists of runtime values.""" java_support = JavaSupport() inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=None ) # Create a large list of runtime values large_runtime_list = list(range(1000, 2000)) # 1000 values from 1000 to 1999 inv_id_runtimes = {inv_id: large_runtime_list} codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 10.8μs -> 10.7μs (1.23% faster) def test_build_runtime_map_deeply_nested_iteration_ids(): """Test with iteration IDs that have many underscore-separated parts.""" java_support = JavaSupport() inv_id_runtimes = {} for i in range(100): # Create iteration_id with many parts parts = [f"part{j}" for j in range(10)] iteration_id = "_".join(parts) inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name=f"TestClass{i}", test_function_name=f"test{i}", function_getting_tested="targetFunction", iteration_id=iteration_id ) inv_id_runtimes[inv_id] = [100 + i] codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 107μs -> 62.5μs (71.5% faster) expected_iteration = "_".join([f"part{j}" for j in range(9)]) def test_build_runtime_map_accumulation_stress(): """Test accumulation with many invocations mapping to same final key.""" java_support = JavaSupport() inv_id_runtimes = {} # Create 100 different InvocationId objects that map to the same final key for i in range(100): inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", 
test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=f"iteration_{i % 10}_suffix_{i % 5}" ) inv_id_runtimes[inv_id] = [100 * (i + 1)] codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 11.6μs -> 8.46μs (37.1% faster) # Expected: multiple keys but all start with "TestClass.testMethod#" for key in result.keys(): pass # Verify accumulation: for each unique key, sum of minimums is present total_sum = sum(result.values()) def test_build_runtime_map_mixed_none_and_values(): """Test with mix of None and non-None iteration IDs.""" java_support = JavaSupport() inv_id_runtimes = {} # Add invocations with and without iteration_id for i in range(50): inv_id_with_iter = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=f"iter_{i}" ) inv_id_runtimes[inv_id_with_iter] = [100 + i] inv_id_without_iter = InvocationId( test_module_path="com.example.TestModule", test_class_name="TestClass", test_function_name="testMethod", function_getting_tested="targetFunction", iteration_id=None ) if inv_id_without_iter not in inv_id_runtimes: inv_id_runtimes[inv_id_without_iter] = [50 + i] else: inv_id_runtimes[inv_id_without_iter].append(50 + i) codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 31.9μs -> 29.9μs (6.63% faster) def test_build_runtime_map_1000_mixed_scenarios(): """Comprehensive large-scale test with 1000 mixed invocations.""" java_support = JavaSupport() inv_id_runtimes = {} # Create diverse set of invocations for i in range(1000): # Vary class/function names class_idx = i % 50 func_idx = i % 30 iter_idx = i % 20 inv_id = InvocationId( test_module_path="com.example.TestModule", test_class_name=f"TestClass{class_idx}" if class_idx % 2 == 0 else None, test_function_name=f"test{func_idx}", function_getting_tested=f"targetFunc{i % 10}", 
iteration_id=f"iter_{iter_idx}" if i % 3 == 0 else None ) runtimes = [100 * i + j for j in range(5)] if inv_id in inv_id_runtimes: inv_id_runtimes[inv_id].extend(runtimes) else: inv_id_runtimes[inv_id] = runtimes codeflash_output = java_support._build_runtime_map(inv_id_runtimes); result = codeflash_output # 120μs -> 113μs (6.35% faster) # All values should be positive for value in result.values(): pass # Verify structure of keys for key in result.keys(): pass # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
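The key-building rules these generated tests exercise can be sketched as follows. This is a reconstruction from the logic replicated inside the large-scale test above, not the code in `support.py`: `Invocation` is a hypothetical stand-in for `InvocationId`, and skipping entries with a missing or empty test function name is an assumption based on the test comments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """Hypothetical stand-in for InvocationId, reduced to the fields used here."""

    test_class_name: str | None
    test_function_name: str | None
    iteration_id: str | None


def build_runtime_map(inv_id_runtimes: dict[Invocation, list[int]]) -> dict[str, int]:
    totals: dict[str, int] = {}
    for inv, runtimes in inv_id_runtimes.items():
        # Assumption: entries without a usable test name are skipped.
        if not inv.test_function_name:
            continue
        name = inv.test_function_name
        key = f"{inv.test_class_name}.{name}" if inv.test_class_name else name
        if inv.iteration_id is not None:
            # Drop the last underscore-separated part of the iteration id.
            key += "#" + inv.iteration_id.rsplit("_", 1)[0]
        # min([]) raises ValueError, which the tests expect to surface.
        totals[key] = totals.get(key, 0) + min(runtimes)
    return totals
```

Two invocations whose iteration ids differ only in the last part collapse onto one key, and their minimum runtimes are summed.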

To test or edit this optimization locally, run `git merge codeflash/optimize-pr1580-2026-02-20T20.38.54`

Suggested change

-                parts = inv_id.iteration_id.split("_")
-                cur_invid = parts[0] if len(parts) < 3 else "_".join(parts[:-1])
-                key = key + "#" + cur_invid
-            if key not in unique_inv_ids:
-                unique_inv_ids[key] = 0
-            unique_inv_ids[key] += min(runtimes)
+                # Use rsplit to avoid creating large intermediate lists and joins.
+                # This yields the same behavior as the original logic:
+                # - "a" -> "a"
+                # - "a_b" -> "a"
+                # - "a_b_c" -> "a_b"
+                cur_invid = inv_id.iteration_id.rsplit("_", 1)[0]
+                key = key + "#" + cur_invid
+            # Preserve original exception behavior for empty runtimes by calling min() directly.
+            m = min(runtimes)
+            unique_inv_ids[key] = unique_inv_ids.get(key, 0) + m

codeflash-ai bot reviewed Feb 20, 2026

claude bot reviewed Feb 20, 2026

codeflash/languages/java/test_runner.py (outdated, resolved)

claude bot reviewed Feb 20, 2026

codeflash-ai bot mentioned this pull request Feb 20, 2026

⚡️ Speed up function _is_inside_complex_expression by 16% in PR #1580 (fix/java-direct-jvm-and-bugs) #1584

Open

codeflash-ai bot mentioned this pull request Feb 20, 2026

⚡️ Speed up function _byte_to_line_index by 41% in PR #1580 (fix/java-direct-jvm-and-bugs) #1586

Merged

claude bot reviewed Feb 20, 2026

misrasaurabh1 reviewed Feb 20, 2026

claude bot reviewed Feb 20, 2026

codeflash-ai bot mentioned this pull request Feb 20, 2026

⚡️ Speed up function _is_inside_lambda by 17% in PR #1580 (fix/java-direct-jvm-and-bugs) #1594

Merged

codeflash-ai bot mentioned this pull request Feb 20, 2026

⚡️ Speed up function _add_timing_instrumentation by 16% in PR #1580 (fix/java-direct-jvm-and-bugs) #1595

Merged

codeflash-ai bot mentioned this pull request Feb 20, 2026

⚡️ Speed up function _add_behavior_instrumentation by 197% in PR #1580 (fix/java-direct-jvm-and-bugs) #1596

Merged

misrasaurabh1 added a commit that referenced this pull request Feb 20, 2026

Merge pull request #1596 from codeflash-ai/codeflash/optimize-pr1580-…

8c3a2b0

…2026-02-20T10.00.27 ⚡️ Speed up function `_add_behavior_instrumentation` by 197% in PR #1580 (`fix/java-direct-jvm-and-bugs`)

claude bot added a commit that referenced this pull request Feb 20, 2026

Merge pull request #1595 from codeflash-ai/codeflash/optimize-pr1580-…

ae1c03d

…2026-02-20T09.26.51 ⚡️ Speed up function `_add_timing_instrumentation` by 16% in PR #1580 (`fix/java-direct-jvm-and-bugs`)

claude bot added a commit that referenced this pull request Feb 20, 2026

Merge pull request #1594 from codeflash-ai/codeflash/optimize-pr1580-…

f32d19e

…2026-02-20T09.12.25 ⚡️ Speed up function `_is_inside_lambda` by 17% in PR #1580 (`fix/java-direct-jvm-and-bugs`)

claude bot added a commit that referenced this pull request Feb 20, 2026

Merge pull request #1586 from codeflash-ai/codeflash/optimize-pr1580-…

1f6001f

…2026-02-20T06.34.48 ⚡️ Speed up function `_byte_to_line_index` by 41% in PR #1580 (`fix/java-direct-jvm-and-bugs`)

mashraf-222 and others added 22 commits February 20, 2026 20:13

style: auto-fix linting issues

b6564e6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Apply suggestion from @claude[bot]

d54aa68

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

style: auto-fix linting issues

53528a2

Convert f-string logging to lazy % formatting (G004) and replace try-except-pass with contextlib.suppress (SIM105). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: resolve mypy type errors in Java config and instrumentation

b8ec235

coverage reported correctly

58561c8

fix pr creation bug

8a1ab8e

style: merge multiple comparisons per PLR1714

864f87f

style: auto-fix linting issues

a523c9a

style: auto-fix linting issues

4294601

chore: log debug message when JUnitCore ignores reports_dir parameter

38d6309

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 force-pushed the fix/java-direct-jvm-and-bugs branch from 8e8b3fd to 38d6309 Compare February 20, 2026 20:13

style: auto-fix linting issues

5346cab

claude bot reviewed Feb 20, 2026

codeflash-ai bot mentioned this pull request Feb 20, 2026

⚡️ Speed up function _byte_to_line_index by 35% in PR #1580 (fix/java-direct-jvm-and-bugs) #1619

Closed

mashraf-222 merged commit e2c3e98 into omni-java Feb 20, 2026
28 of 34 checks passed

mashraf-222 deleted the fix/java-direct-jvm-and-bugs branch February 20, 2026 20:29

codeflash-ai bot reviewed Feb 20, 2026

		_FALLBACK_METHOD_PATTERN = re.compile(r"\b(\w+)\s*\(")


		def _extract_test_method_name(method_lines: list[str]) -> str:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 1070 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

-def _extract_test_method_name(method_lines: list[str]) -> str:
+def _extract_test_method_name(method_lines: list[str]) -> str:
+    for line in method_lines:
+        match = _METHOD_SIG_PATTERN.search(line)
+        if match:
+            return match.group(1)
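A minimal sketch of the precompiled-pattern approach in the diff above. The fallback pattern is the one shown in the diff context. `METHOD_SIG_PATTERN` is a hypothetical stand-in, because the real `_METHOD_SIG_PATTERN` does not appear in this excerpt, and the `"unknown"` default is likewise an assumption.

```python
import re

# Hypothetical signature pattern (the real _METHOD_SIG_PATTERN is not shown here):
# optional modifiers, a return type, then the method name before "(".
METHOD_SIG_PATTERN = re.compile(
    r"^\s*(?:(?:public|protected|private|static|final|synchronized|abstract)\s+)*"
    r"[\w<>\[\],.?]+\s+(\w+)\s*\("
)
# Fallback taken from the diff context: first identifier followed by "(".
FALLBACK_METHOD_PATTERN = re.compile(r"\b(\w+)\s*\(")


def extract_test_method_name(method_lines: list[str]) -> str:
    # First pass: look for something shaped like a Java method signature.
    for line in method_lines:
        match = METHOD_SIG_PATTERN.search(line)
        if match:
            return match.group(1)
    # Second pass: accept any identifier directly followed by "(".
    for line in method_lines:
        match = FALLBACK_METHOD_PATTERN.search(line)
        if match:
            return match.group(1)
    return "unknown"  # assumed default when nothing matches
```

Compiling both patterns once at module level means each call only pays for the searches, which is the kind of saving the bot's speedup comment is about.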



		# Cache for resolved test file paths to avoid repeated rglob calls
		_test_file_path_cache: dict[tuple[str, Path], Path | None] = {}
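How such a cache avoids repeated `rglob` walks can be sketched as below. The dictionary shape follows the line above; the `resolve_test_file` helper and its key choice of (file name, search root) are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

from pathlib import Path

# Maps (file name, search root) to the resolved path, or None when nothing was found.
_test_file_path_cache: dict[tuple[str, Path], Path | None] = {}


def resolve_test_file(file_name: str, root: Path) -> Path | None:
    """Hypothetical helper: find file_name under root, walking the tree at most once per key."""
    key = (file_name, root)
    if key not in _test_file_path_cache:
        # rglob walks the whole directory tree, so only do it on a cache miss.
        _test_file_path_cache[key] = next(root.rglob(file_name), None)
    return _test_file_path_cache[key]
```

Note that misses are cached too, so a file created after the first lookup would not be found until the cache is cleared. That trade-off is acceptable only if the test tree is stable for the duration of a run.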

-                parts = inv_id.iteration_id.split("_")
-                cur_invid = parts[0] if len(parts) < 3 else "_".join(parts[:-1])
-                key = key + "#" + cur_invid
-            if key not in unique_inv_ids:
-                unique_inv_ids[key] = 0
-            unique_inv_ids[key] += min(runtimes)
+                # Use rsplit to avoid creating large intermediate lists and joins.
+                # This yields the same behavior as the original logic:
+                # - "a" -> "a"
+                # - "a_b" -> "a"
+                # - "a_b_c" -> "a_b"
+                cur_invid = inv_id.iteration_id.rsplit("_", 1)[0]
+                key = key + "#" + cur_invid
+            # Preserve original exception behavior for empty runtimes by calling min() directly.
+            m = min(runtimes)
+            unique_inv_ids[key] = unique_inv_ids.get(key, 0) + m
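The equivalence claimed in the diff comments can be checked directly. The two helper names below are made up for this sketch; each wraps one side of the diff so they can be compared on a handful of iteration ids, including edge cases with empty parts.

```python
def cur_invid_split(iteration_id: str) -> str:
    # Original logic: split on every underscore, then rebuild the prefix.
    parts = iteration_id.split("_")
    return parts[0] if len(parts) < 3 else "_".join(parts[:-1])


def cur_invid_rsplit(iteration_id: str) -> str:
    # Suggested logic: cut once at the last underscore.
    return iteration_id.rsplit("_", 1)[0]


samples = ["", "a", "a_b", "a_b_c", "inv_1_attempt", "a_b_c_d_e_f_g", "_", "a_", "_a", "__"]
mismatches = [s for s in samples if cur_invid_split(s) != cur_invid_rsplit(s)]
print(mismatches)
```

Both versions return everything before the last underscore, or the whole string when there is none, so the list of mismatches is empty for these samples.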

codeflash-ai bot Feb 20, 2026

⚡️Codeflash found 58% (0.58x) speedup for _extract_test_method_name in codeflash/languages/java/instrumentation.py

claude bot commented Feb 20, 2026 (edited)

PR Review Summary

Prek Checks

Mypy

Code Review (Re-review after latest commits)

Test Coverage

codeflash-ai bot commented Feb 20, 2026

⚡️ Codeflash found optimizations for this PR

📄 16% (0.16x) speedup for _is_inside_complex_expression in codeflash/languages/java/instrumentation.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _is_inside_complex_expression by 16% in PR #1580 (fix/java-direct-jvm-and-bugs) #1584

codeflash-ai bot commented Feb 20, 2026

⚡️ Codeflash found optimizations for this PR

📄 17% (0.17x) speedup for _is_inside_lambda in codeflash/languages/java/instrumentation.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _is_inside_lambda by 17% in PR #1580 (fix/java-direct-jvm-and-bugs) #1594

codeflash-ai bot commented Feb 20, 2026

⚡️ Codeflash found optimizations for this PR

📄 16% (0.16x) speedup for _add_timing_instrumentation in codeflash/languages/java/instrumentation.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _add_timing_instrumentation by 16% in PR #1580 (fix/java-direct-jvm-and-bugs) #1595

📄 17% (0.17x) speedup for `_get_qualified_name` in `codeflash/languages/java/instrumentation.py`

📄 197% (1.97x) speedup for `_add_behavior_instrumentation` in `codeflash/languages/java/instrumentation.py`

⚡️ Speed up function `_add_behavior_instrumentation` by 197% in PR #1580 (`fix/java-direct-jvm-and-bugs`) #1596

📄 35% (0.35x) speedup for `_byte_to_line_index` in `codeflash/languages/java/instrumentation.py`

⚡️ Speed up function `_byte_to_line_index` by 35% in PR #1580 (`fix/java-direct-jvm-and-bugs`) #1619

⚡️Codeflash found 21% (0.21x) speedup for `JavaSupport._build_runtime_map` in `codeflash/languages/java/support.py`