⚡️ Speed up function `_extract_type_body_context` by 31% in PR #1199 (omni-java) #1253
Closed
codeflash-ai[bot] wants to merge 1 commit into omni-java
Conversation
This optimization achieves a **31% runtime improvement** (from 477μs to 364μs) by eliminating redundant UTF-8 decoding operations and reducing attribute lookups.
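As a sanity check on the headline figure: (477 − 364) / 364 ≈ 0.31, so the 31% is measured relative to the optimized runtime; relative to the original 477μs the absolute saving is about 24%.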
**Key optimizations:**
1. **Eliminated repeated UTF-8 decoding**: The original code called `.decode("utf8")` on byte slices multiple times per iteration (for enum constants and block comments). The optimized version introduces `_slice_text_by_points()` that extracts text directly from the already-decoded `lines` list, avoiding the overhead of repeated UTF-8 decoding operations.
2. **Reduced attribute lookups**: Added local alias `ls = lines` and hoisted `skip_types = ("{", "}", ";", ",")` out of the loop, reducing repeated name resolutions in the hot path where `body_node.children` is iterated.
3. **Smarter text extraction**: The helper function `_slice_text_by_points()` uses line/column coordinates instead of byte offsets, directly indexing into the decoded lines. This is faster because the `lines` list is already UTF-8 decoded when passed in, so we avoid re-decoding the same bytes multiple times (a sketch of such a helper follows this list).
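The PR body does not include the new helper itself; the following is a minimal sketch of what a point-based slicer along these lines could look like (the actual implementation in `codeflash/languages/java/context.py` may differ in signature and edge-case handling, and the sample `lines` value is purely illustrative). Note that tree-sitter reports columns as byte offsets, which coincide with character offsets for ASCII source:

```python
def _slice_text_by_points(lines, start_point, end_point):
    """Extract text between two (row, column) points from already-decoded lines.

    `lines` holds one decoded str per source line (without trailing
    newlines), so no bytes -> str decoding happens here.
    """
    start_row, start_col = start_point
    end_row, end_col = end_point
    if start_row == end_row:
        # Span sits on a single line: one plain string slice.
        return lines[start_row][start_col:end_col]
    # Span covers several lines: tail of the first line, the full
    # middle lines, and the head of the last line.
    parts = [lines[start_row][start_col:]]
    parts.extend(lines[start_row + 1:end_row])
    parts.append(lines[end_row][:end_col])
    return "\n".join(parts)


# Illustrative call with hand-written points covering the constant "RED":
lines = ["public enum Color {", "    RED,", "    GREEN,", "}"]
print(_slice_text_by_points(lines, (1, 4), (1, 7)))  # -> RED
```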
**Performance characteristics by test case:**
- Small inputs (1-5 nodes): 1-8% faster, showing overhead is minimal
- Enum constant extraction: 6-13% faster due to avoiding decode per constant
- Mixed workloads with Javadoc comments: 3-6% faster from eliminating comment decode overhead
- Large scale (250 fields): roughly equivalent (~1% slower), indicating the optimization primarily benefits code paths with enum constants and block comments where decoding was repeated
**Why this matters:**
The line profiler shows the original code spent significant time in decode operations (the lines calling `source_bytes[...].decode("utf8")`). For Java source files with many enum constants or Javadoc comments, this optimization reduces the cumulative decode overhead across all iterations, resulting in the observed 31% speedup on representative workloads.
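To make the hot-path difference concrete, here is a schematic before/after of such an extraction loop. The function names are illustrative rather than the ones used in `context.py`, and `body_node` is assumed to be a tree-sitter `Node` exposing the standard `children`, `type`, `start_byte`/`end_byte`, and `start_point`/`end_point` attributes:

```python
def collect_child_text_before(body_node, source_bytes):
    # Original pattern flagged by the line profiler: one bytes -> str
    # decode per interesting child node.
    out = []
    for child in body_node.children:
        if child.type in ("{", "}", ";", ","):
            continue
        out.append(source_bytes[child.start_byte:child.end_byte].decode("utf8"))
    return out


def collect_child_text_after(body_node, lines):
    # Optimized pattern: the tuple of skipped node types is hoisted out of
    # the loop, `lines` gets a local alias, and no per-node decoding is
    # done -- text is sliced from the already-decoded lines via
    # _slice_text_by_points (see the sketch above).
    ls = lines
    skip_types = ("{", "}", ";", ",")
    out = []
    for child in body_node.children:
        if child.type in skip_types:
            continue
        out.append(_slice_text_by_points(ls, child.start_point, child.end_point))
    return out
```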
⚡️ This pull request contains optimizations for PR #1199
If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.
📄 31% (0.31x) speedup for `_extract_type_body_context` in `codeflash/languages/java/context.py`
⏱️ Runtime: 477 microseconds → 364 microseconds (best of 40 runs)
✅ Correctness verification report
🌀 Generated Regression Tests
To edit these changes, run `git checkout codeflash/optimize-pr1199-2026-02-02T00.44.56` and push.