⚡️ Speed up function _byte_to_line_index by 35% in PR #1580 (fix/java-direct-jvm-and-bugs) #1619

Closed
codeflash-ai[bot] wants to merge 1 commit into omni-java from codeflash/optimize-pr1580-2026-02-20T20.27.56

Conversation

@codeflash-ai codeflash-ai bot commented Feb 20, 2026

⚡️ This pull request contains optimizations for PR #1580

If you approve this dependent PR, these changes will be merged into the original PR branch fix/java-direct-jvm-and-bugs.

This PR will be automatically closed if the original PR is merged.


📄 35% (0.35x) speedup for _byte_to_line_index in codeflash/languages/java/instrumentation.py

⏱️ Runtime: 1.06 milliseconds → 783 microseconds (best of 249 runs)

📝 Explanation and details

The optimization replaces max(idx, 0) with a ternary expression 0 if idx < 0 else idx, achieving a 35% speedup (runtime from 1.06ms to 783μs).

What changed:
The only modification is in the return statement - replacing the max() built-in function call with an inline conditional expression.
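
The function body is not reproduced in this thread. A minimal sketch of the change, reconstructed from the test comments (which mention `bisect_right` and the `max(idx, 0)` clamp), might look like the following. The `_before`/`_after` names and the exact body are assumptions for illustration, not the code in `instrumentation.py`.

```python
from bisect import bisect_right


def byte_to_line_index_before(byte_offset: int, line_byte_starts: list[int]) -> int:
    # Assumed original form: clamp the index with the max() builtin.
    idx = bisect_right(line_byte_starts, byte_offset) - 1
    return max(idx, 0)


def byte_to_line_index_after(byte_offset: int, line_byte_starts: list[int]) -> int:
    # Assumed optimized form: clamp the index with a conditional expression.
    idx = bisect_right(line_byte_starts, byte_offset) - 1
    return 0 if idx < 0 else idx
```

Both forms return the index of the last line start that is less than or equal to the offset, and fall back to 0 when no such start exists (empty list or an offset before the first start).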

Why it's faster:
The max() function in Python involves overhead from:

  1. Function call setup and teardown
  2. Argument tuple creation for variadic parameters
  3. Generic comparison logic that handles arbitrary types and multiple arguments

The ternary operator 0 if idx < 0 else idx is:

  1. Compiled inline as a compare-and-jump sequence (no function call and no lookup of a builtin name)
  2. A single comparison with immediate branching
  3. On CPython 3.11 and later, eligible for the specializing interpreter's fast path for integer comparisons
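
One way to see the difference described above is to inspect the bytecode of the two clamps with the standard `dis` module. This is a sketch only: the helper names `clamp_with_max`, `clamp_with_ternary`, and `uses_call` are hypothetical, and the exact opcode names vary between CPython versions.

```python
import dis


def clamp_with_max(idx):
    return max(idx, 0)


def clamp_with_ternary(idx):
    return 0 if idx < 0 else idx


def uses_call(func):
    # True if any instruction in the function's bytecode is a call opcode
    # (CALL_FUNCTION on older CPython, CALL/PRECALL on newer versions).
    return any(ins.opname.startswith("CALL") for ins in dis.get_instructions(func))


if __name__ == "__main__":
    dis.dis(clamp_with_max)      # loads the global name `max` and calls it
    dis.dis(clamp_with_ternary)  # compare, conditional jump, return
    print(uses_call(clamp_with_max), uses_call(clamp_with_ternary))
```

The `max` version has to resolve the name `max` at run time and go through the call machinery, while the conditional expression references no names other than its local argument.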

Line profiler data confirms this: the return statement dropped from 751,720ns total time (40.9% of function time) to 476,309ns (28.7% of function time) - a 37% reduction in that line alone.
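
Line-profiler figures include per-line tracing overhead, so it can help to cross-check the two expressions in isolation. The following is a rough `timeit` sketch under that assumption; the helper `best_ns_per_call` and the clamp function names are hypothetical, and the numbers it prints depend on the machine and Python version.

```python
import timeit


def clamp_with_max(idx):
    return max(idx, 0)


def clamp_with_ternary(idx):
    return 0 if idx < 0 else idx


def best_ns_per_call(func, arg=-1, number=200_000, repeat=5):
    # Best-of-`repeat` timing, converted to nanoseconds per call.
    # The lambda wrapper adds the same call overhead to both candidates.
    totals = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(totals) / number * 1e9


if __name__ == "__main__":
    for func in (clamp_with_max, clamp_with_ternary):
        print(f"{func.__name__}: {best_ns_per_call(func):.1f} ns/call")
```

Taking the minimum over several repeats follows the usual `timeit` guidance, since higher values mostly reflect interference from other processes.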

Performance characteristics:
Based on the annotated tests, the optimization shows consistent improvements across all test cases:

  • Roughly 42-70% speedup on basic single and multi-line mappings
  • Roughly 41-71% speedup on edge cases (empty lists, negative offsets, large offsets)
  • 33-43% speedup on the looped large-scale tests with 100-1000 lines, with individual calls in those tests ranging from about 22% to 58%
  • The tight line distribution test (1000 one-byte lines) shows 22-54%, in line with the other large-scale cases

The optimization is universally beneficial because every call to _byte_to_line_index executes this return statement exactly once, making it a hot path regardless of input characteristics.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 2528 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests
import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import _byte_to_line_index

def test_basic_mapping_at_and_between_starts():
    # Given a simple sorted list of byte starts
    line_byte_starts = [0, 10, 20, 30]
    # Exact match to the first start should map to index 0
    codeflash_output = _byte_to_line_index(0, line_byte_starts) # 1.05μs -> 711ns (48.0% faster)
    # Value between 0 and 10 should map to index 0 (the previous start)
    codeflash_output = _byte_to_line_index(5, line_byte_starts) # 551ns -> 380ns (45.0% faster)
    # Exact match to 10 should map to index 1
    codeflash_output = _byte_to_line_index(10, line_byte_starts) # 401ns -> 281ns (42.7% faster)
    # Value just before 20 should map to index 1
    codeflash_output = _byte_to_line_index(19, line_byte_starts) # 381ns -> 260ns (46.5% faster)
    # Exact match to 20 should map to index 2
    codeflash_output = _byte_to_line_index(20, line_byte_starts) # 401ns -> 270ns (48.5% faster)
    # Value beyond the last start should map to the last index (3)
    codeflash_output = _byte_to_line_index(999, line_byte_starts) # 341ns -> 240ns (42.1% faster)

def test_basic_empty_starts_list_returns_zero():
    # An empty line_byte_starts must always return 0 (clamped by max(..., 0))
    codeflash_output = _byte_to_line_index(0, []) # 932ns -> 602ns (54.8% faster)
    codeflash_output = _byte_to_line_index(12345, []) # 451ns -> 300ns (50.3% faster)
    # Negative byte offset with empty starts still returns 0
    codeflash_output = _byte_to_line_index(-1, []) # 321ns -> 210ns (52.9% faster)

def test_negative_byte_offsets_and_negative_starts():
    # Starts include a negative value and positive values
    line_byte_starts = [-100, 0, 50]
    # Very negative offset before the first start clamps to 0
    codeflash_output = _byte_to_line_index(-1000, line_byte_starts) # 1.01μs -> 711ns (42.3% faster)
    # Exactly at the negative start returns index 0
    codeflash_output = _byte_to_line_index(-100, line_byte_starts) # 581ns -> 380ns (52.9% faster)
    # Between -100 and 0 returns index 0 (the previous start)
    codeflash_output = _byte_to_line_index(-1, line_byte_starts) # 411ns -> 281ns (46.3% faster)
    # Exactly at 0 returns index 1
    codeflash_output = _byte_to_line_index(0, line_byte_starts) # 441ns -> 290ns (52.1% faster)
    # Between 0 and 50 returns index 1
    codeflash_output = _byte_to_line_index(49, line_byte_starts) # 401ns -> 280ns (43.2% faster)
    # Exactly at 50 returns index 2
    codeflash_output = _byte_to_line_index(50, line_byte_starts) # 371ns -> 250ns (48.4% faster)

def test_float_byte_offset_is_supported_and_behaves_like_numeric_comparison():
    # Although annotated as int, a float is still comparable — behavior should be consistent
    line_byte_starts = [0, 10]
    # 9.9 is less than 10 so it maps to index 0
    codeflash_output = _byte_to_line_index(9.9, line_byte_starts) # 1.09μs -> 762ns (43.3% faster)
    # 10.0 equals the second start and should map to index 1
    codeflash_output = _byte_to_line_index(10.0, line_byte_starts) # 531ns -> 341ns (55.7% faster)
    # large float beyond last start maps to last index
    codeflash_output = _byte_to_line_index(1e6 + 5.5, line_byte_starts) # 351ns -> 230ns (52.6% faster)

def test_none_as_line_byte_starts_raises_type_error():
    # Passing None instead of a list should raise a TypeError from bisect (deterministic)
    with pytest.raises(TypeError):
        _byte_to_line_index(0, None) # 2.87μs -> 2.81μs (2.14% faster)

def test_unsorted_line_byte_starts_yields_deterministic_result():
    # The function uses bisect_right which assumes sorted inputs; passing unsorted input
    # still yields a deterministic result equal to calling bisect_right directly.
    unsorted = [10, 0, 20]  # intentionally unsorted
    # We assert that the function's result matches bisect behavior directly for several offsets.
    # This documents and locks in the implementation detail (bisect_right usage).
    import bisect
    for offset in (-5, 0, 5, 10, 15, 25):
        expected_idx = bisect.bisect_right(unsorted, offset) - 1
        expected_idx = max(expected_idx, 0)
        codeflash_output = _byte_to_line_index(offset, unsorted) # 2.58μs -> 1.84μs (39.8% faster)

def test_large_scale_sequential_starts_full_coverage():
    # Create 1000 sorted line starts spaced by 10 bytes each: 0, 10, 20, ..., 9990
    n = 1000
    spacing = 10
    line_byte_starts = [i * spacing for i in range(n)]
    # Test 1000 different offsets (0 through 999*10) — each offset is an exact multiple of spacing
    # Expected index is offset // spacing (but capped at n-1 for values beyond the last start).
    for i in range(n):
        offset = i * spacing
        expected = i if i < n else n - 1  # here i < n always true, keep expression explicit
        codeflash_output = _byte_to_line_index(offset, line_byte_starts) # 419μs -> 315μs (32.7% faster)

def test_large_scale_many_random_like_offsets_but_deterministic():
    # Use a deterministic sequence of offsets (no randomness) to test non-exact positions
    n = 1000
    spacing = 7  # use a different spacing to exercise integer division behavior
    line_byte_starts = [i * spacing for i in range(n)]
    # Generate 1000 offsets that cover negatives, in-between, and beyond-last positions deterministically
    offsets = [-(i % 5) for i in range(100)]  # some negatives
    offsets += [i * spacing + (i % spacing) for i in range(450)]  # in-between positions
    offsets += [line_byte_starts[-1] + i for i in range(450)]  # beyond last
    # For each offset compute expected index: floor division by spacing, clamped to last index, and non-negative
    last_index = len(line_byte_starts) - 1
    for offset in offsets:
        if offset < 0:
            expected = 0
        else:
            expected = offset // spacing
            if expected > last_index:
                expected = last_index
        # Compare function under test to expected mapping
        codeflash_output = _byte_to_line_index(offset, line_byte_starts) # 399μs -> 296μs (34.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import bisect

# imports
import pytest
from codeflash.languages.java.instrumentation import _byte_to_line_index

def test_basic_single_line_at_start():
    """Test byte offset at the start of a single-line document."""
    # byte_offset=0 with line starting at 0 should map to line 0
    codeflash_output = _byte_to_line_index(0, [0]); result = codeflash_output # 1.16μs -> 781ns (48.8% faster)

def test_basic_single_line_middle():
    """Test byte offset in the middle of a single-line document."""
    # byte_offset=5 is within first line (0-10), should map to line 0
    codeflash_output = _byte_to_line_index(5, [0, 10]); result = codeflash_output # 1.12μs -> 661ns (69.7% faster)

def test_basic_single_line_end():
    """Test byte offset at the end of a single-line document."""
    # byte_offset=10 is at the start of second line, should map to line 1
    codeflash_output = _byte_to_line_index(10, [0, 10]); result = codeflash_output # 1.02μs -> 631ns (62.0% faster)

def test_basic_multiline_first_line():
    """Test byte offset on the first line of a multi-line document."""
    # byte_offset=5 is in first line (starts at 0), should map to line 0
    codeflash_output = _byte_to_line_index(5, [0, 20, 40, 60]); result = codeflash_output # 1.09μs -> 692ns (57.8% faster)

def test_basic_multiline_second_line():
    """Test byte offset on the second line of a multi-line document."""
    # byte_offset=25 is between line starts 20 and 40, should map to line 1
    codeflash_output = _byte_to_line_index(25, [0, 20, 40, 60]); result = codeflash_output # 982ns -> 641ns (53.2% faster)

def test_basic_multiline_third_line():
    """Test byte offset on the third line of a multi-line document."""
    # byte_offset=45 is between line starts 40 and 60, should map to line 2
    codeflash_output = _byte_to_line_index(45, [0, 20, 40, 60]); result = codeflash_output # 1.01μs -> 641ns (57.9% faster)

def test_basic_at_line_boundary():
    """Test byte offset exactly at a line boundary."""
    # byte_offset=20 is exactly at line start of line 1, should map to line 1
    codeflash_output = _byte_to_line_index(20, [0, 20, 40, 60]); result = codeflash_output # 1.02μs -> 632ns (61.7% faster)

def test_basic_at_last_line_boundary():
    """Test byte offset exactly at the last line boundary."""
    # byte_offset=60 is exactly at line start of line 3, should map to line 3
    codeflash_output = _byte_to_line_index(60, [0, 20, 40, 60]); result = codeflash_output # 992ns -> 631ns (57.2% faster)

def test_edge_empty_line_starts():
    """Test with empty line_byte_starts list."""
    # Empty list means no lines defined, should return 0 (max(idx, 0) where idx=-1)
    codeflash_output = _byte_to_line_index(0, []); result = codeflash_output # 981ns -> 631ns (55.5% faster)

def test_edge_empty_line_starts_nonzero_offset():
    """Test with empty line_byte_starts and non-zero byte offset."""
    # Empty list with any offset should still return 0 (max(idx, 0) where idx=-1)
    codeflash_output = _byte_to_line_index(100, []); result = codeflash_output # 932ns -> 561ns (66.1% faster)

def test_edge_zero_byte_offset_single_line():
    """Test byte offset of 0 with single line starting at 0."""
    # This is the absolute start of document
    codeflash_output = _byte_to_line_index(0, [0]); result = codeflash_output # 1.02μs -> 621ns (64.6% faster)

def test_edge_zero_byte_offset_multiline():
    """Test byte offset of 0 with multiple lines."""
    # Should always map to line 0 regardless of other line starts
    codeflash_output = _byte_to_line_index(0, [0, 50, 100, 150]); result = codeflash_output # 1.04μs -> 692ns (50.6% faster)

def test_edge_large_byte_offset_beyond_lines():
    """Test byte offset far beyond the last defined line."""
    # byte_offset=1000 is way beyond last line at 60, should map to last line
    codeflash_output = _byte_to_line_index(1000, [0, 20, 40, 60]); result = codeflash_output # 952ns -> 631ns (50.9% faster)

def test_edge_byte_offset_between_gaps():
    """Test byte offset in gap between line starts."""
    # byte_offset=35 is between 20 and 40, should map to line 1
    codeflash_output = _byte_to_line_index(35, [0, 20, 40]); result = codeflash_output # 991ns -> 661ns (49.9% faster)

def test_edge_single_large_byte_offset():
    """Test with single line and very large byte offset."""
    # Single line at start, very large offset still maps to line 0
    codeflash_output = _byte_to_line_index(999999, [0]); result = codeflash_output # 1.04μs -> 611ns (70.5% faster)

def test_edge_negative_byte_offset():
    """Test with negative byte offset (unusual but should handle)."""
    # For a negative offset bisect_right returns 0, so idx is -1, and max(-1, 0) = 0
    codeflash_output = _byte_to_line_index(-10, [0, 20, 40]); result = codeflash_output # 932ns -> 661ns (41.0% faster)

def test_edge_many_lines_same_start():
    """Test with multiple consecutive identical line starts."""
    # Multiple lines starting at 0 (unusual but valid)
    codeflash_output = _byte_to_line_index(5, [0, 0, 0, 20]); result = codeflash_output # 1.00μs -> 601ns (66.7% faster)

def test_edge_offset_at_penultimate_line():
    """Test byte offset at the second-to-last line start."""
    # byte_offset=40 is exactly at third line start in four-line document
    codeflash_output = _byte_to_line_index(40, [0, 20, 40, 60]); result = codeflash_output # 1.06μs -> 641ns (65.7% faster)

def test_edge_two_line_document_middle():
    """Test byte offset in middle of two-line document."""
    # byte_offset=10 is between 0 and 20
    codeflash_output = _byte_to_line_index(10, [0, 20]); result = codeflash_output # 1.00μs -> 631ns (58.8% faster)

def test_edge_two_line_document_at_second():
    """Test byte offset at start of second line in two-line document."""
    # byte_offset=20 is exactly at line 1 start
    codeflash_output = _byte_to_line_index(20, [0, 20]); result = codeflash_output # 951ns -> 591ns (60.9% faster)

def test_largescale_100_lines():
    """Test with 100 lines (realistic code file size)."""
    # Create line starts for 100 lines, each line ~50 bytes
    line_starts = [i * 50 for i in range(100)]
    # Test various byte offsets across the document
    codeflash_output = _byte_to_line_index(0, line_starts) # 1.12μs -> 712ns (57.6% faster)
    codeflash_output = _byte_to_line_index(25, line_starts) # 661ns -> 421ns (57.0% faster)
    codeflash_output = _byte_to_line_index(50, line_starts) # 530ns -> 370ns (43.2% faster)
    codeflash_output = _byte_to_line_index(2450, line_starts) # 451ns -> 330ns (36.7% faster)
    codeflash_output = _byte_to_line_index(4950, line_starts) # 420ns -> 280ns (50.0% faster)

def test_largescale_1000_lines():
    """Test with 1000 lines (large code file)."""
    # Create line starts for 1000 lines
    line_starts = [i * 100 for i in range(1000)]
    # Test at various points: start, middle, end
    codeflash_output = _byte_to_line_index(0, line_starts) # 1.14μs -> 742ns (53.9% faster)
    codeflash_output = _byte_to_line_index(500, line_starts) # 671ns -> 501ns (33.9% faster)
    codeflash_output = _byte_to_line_index(50000, line_starts) # 561ns -> 421ns (33.3% faster)
    codeflash_output = _byte_to_line_index(99900, line_starts) # 501ns -> 380ns (31.8% faster)

def test_largescale_varying_line_sizes():
    """Test with varying line sizes (realistic text document)."""
    # Simulate realistic line lengths: 20-150 bytes per line
    line_starts = []
    current_byte = 0
    for i in range(200):
        line_starts.append(current_byte)
        # Vary line size: roughly 20 + (i % 130) bytes
        current_byte += 20 + (i % 130)
    
    # Test access at various points
    codeflash_output = _byte_to_line_index(line_starts[0], line_starts) # 1.19μs -> 781ns (52.6% faster)
    codeflash_output = _byte_to_line_index(line_starts[50], line_starts) # 651ns -> 461ns (41.2% faster)
    codeflash_output = _byte_to_line_index(line_starts[100], line_starts) # 450ns -> 331ns (36.0% faster)
    codeflash_output = _byte_to_line_index(line_starts[199], line_starts) # 470ns -> 330ns (42.4% faster)

def test_largescale_dense_lookups():
    """Test 100 consecutive lookups across a 1000-line document."""
    # Create 1000 lines
    line_starts = [i * 80 for i in range(1000)]
    
    # Perform 100 lookups at different positions
    for lookup_count in range(100):
        byte_offset = lookup_count * 8000  # Every 100 lines
        expected_line = lookup_count * 100
        if expected_line >= len(line_starts):
            expected_line = len(line_starts) - 1
        codeflash_output = _byte_to_line_index(byte_offset, line_starts); result = codeflash_output # 40.7μs -> 29.7μs (36.9% faster)

def test_largescale_boundary_testing():
    """Test all line boundaries in a 500-line document."""
    # Create 500 lines
    line_starts = [i * 64 for i in range(500)]
    
    # Test exact boundary for every 10th line
    for line_num in range(0, 500, 10):
        byte_offset = line_starts[line_num]
        codeflash_output = _byte_to_line_index(byte_offset, line_starts); result = codeflash_output # 22.8μs -> 16.8μs (35.3% faster)

def test_largescale_between_boundaries():
    """Test midpoints between line boundaries in 300-line document."""
    # Create 300 lines, 100 bytes each
    line_starts = [i * 100 for i in range(300)]
    
    # Test midpoint of each line
    for line_num in range(299):
        midpoint = (line_starts[line_num] + line_starts[line_num + 1]) // 2
        codeflash_output = _byte_to_line_index(midpoint, line_starts); result = codeflash_output # 117μs -> 85.1μs (37.6% faster)

def test_largescale_random_accesses():
    """Test random-like byte offset access patterns."""
    # Create 250 lines
    line_starts = [i * 120 for i in range(250)]
    
    # Simulate random accesses using various byte offsets
    test_offsets = [
        0, 500, 1000, 5000, 10000, 15000, 20000, 25000, 28000, 29999
    ]
    
    for offset in test_offsets:
        codeflash_output = _byte_to_line_index(offset, line_starts); result = codeflash_output # 5.18μs -> 3.61μs (43.3% faster)

def test_largescale_sparse_lines():
    """Test with very sparse line distribution."""
    # Create line starts that are far apart
    line_starts = [0, 1000, 5000, 10000, 50000, 100000, 500000]
    
    # Test offsets in each region
    codeflash_output = _byte_to_line_index(500, line_starts) # 1.11μs -> 711ns (56.4% faster)
    codeflash_output = _byte_to_line_index(3000, line_starts) # 531ns -> 341ns (55.7% faster)
    codeflash_output = _byte_to_line_index(7500, line_starts) # 400ns -> 270ns (48.1% faster)
    codeflash_output = _byte_to_line_index(25000, line_starts) # 400ns -> 261ns (53.3% faster)
    codeflash_output = _byte_to_line_index(75000, line_starts) # 360ns -> 240ns (50.0% faster)
    codeflash_output = _byte_to_line_index(250000, line_starts) # 340ns -> 250ns (36.0% faster)
    codeflash_output = _byte_to_line_index(999999, line_starts) # 330ns -> 240ns (37.5% faster)

def test_largescale_tight_lines():
    """Test with very tight line distribution."""
    # Create 1000 lines with 1-byte lines
    line_starts = list(range(1000))
    
    # Test various offsets
    codeflash_output = _byte_to_line_index(0, line_starts) # 1.26μs -> 821ns (53.7% faster)
    codeflash_output = _byte_to_line_index(1, line_starts) # 651ns -> 461ns (41.2% faster)
    codeflash_output = _byte_to_line_index(500, line_starts) # 561ns -> 461ns (21.7% faster)
    codeflash_output = _byte_to_line_index(999, line_starts) # 491ns -> 381ns (28.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr1580-2026-02-20T20.27.56 and push.

@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Feb 20, 2026
Base automatically changed from fix/java-direct-jvm-and-bugs to omni-java February 20, 2026 20:29
@codeflash-ai codeflash-ai bot closed this Feb 20, 2026

codeflash-ai bot commented Feb 20, 2026

This PR has been automatically closed because the original PR #1580 by mashraf-222 was closed.

@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr1580-2026-02-20T20.27.56 branch February 20, 2026 20:29

claude bot commented Feb 20, 2026

PR Review Summary

Prek Checks

  • ruff FURB136 (if-expr-min-max): The optimization 0 if idx < 0 else idx was auto-reverted to max(idx, 0) by the linter.
  • Committed fix in 21067afb and pushed.
  • After fix: prek passes, mypy passes (no issues).

Important: The ruff FURB136 rule converts the ternary 0 if idx < 0 else idx back to max(idx, 0), which entirely reverts this PR's optimization. The net diff after linting is zero — the code on the PR branch is now identical to the base branch (omni-java). This PR should either be closed or the FURB136 rule should be suppressed with a # noqa: FURB136 comment if the micro-optimization is deemed worthwhile.
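
If the micro-optimization were kept, the suppression mentioned above could be written as in this sketch. The function body is an assumption reconstructed from the test comments, not the actual code in `instrumentation.py`.

```python
from bisect import bisect_right


def byte_to_line_index(byte_offset: int, line_byte_starts: list[int]) -> int:
    idx = bisect_right(line_byte_starts, byte_offset) - 1
    # Keep the conditional expression on purpose; without the suppression,
    # ruff's FURB136 fix rewrites it back to max(idx, 0).
    return 0 if idx < 0 else idx  # noqa: FURB136
```

A short comment next to the `noqa` is worth keeping so that later readers do not "simplify" the line back to `max()` by hand.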

Code Review

No critical bugs, security vulnerabilities, or breaking API changes — the PR is a single-line micro-optimization that is semantically equivalent to the original.

Test Coverage

| File | Stmts | Miss | Coverage |
|---|---|---|---|
| codeflash/languages/java/instrumentation.py | 527 | 95 | 82% |

Since the lint fix reverted the optimization, the code on both branches is identical. No coverage regression is possible.


Last updated: 2026-02-20
