⚡️ Speed up function _add_global_declarations_for_language by 103% in PR #1199 (omni-java) #1284

Closed
codeflash-ai[bot] wants to merge 1 commit into omni-java from codeflash/optimize-pr1199-2026-02-03T03.34.45


@codeflash-ai codeflash-ai bot commented Feb 3, 2026

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.


📄 103% (1.03x) speedup for _add_global_declarations_for_language in codeflash/code_utils/code_replacer.py

⏱️ Runtime: 409 milliseconds → 202 milliseconds (best of 20 runs)

📝 Explanation and details

The optimized code achieves a 103% speedup (runtime drops from 409 ms to 202 ms) by eliminating redundant tree-sitter parsing operations when inserting multiple declarations.

Key optimization: In the original code, after inserting each new declaration, the entire source was re-parsed via analyzer.find_module_level_declarations(result) to update line numbers. With many declarations (e.g., 100+ in test scenarios), this caused quadratic behavior—each insertion triggered a full parse of increasingly larger source code.

The optimization introduces _insert_declaration_after_dependencies_fast(), which returns not just the modified source but also metadata about the insertion: the insertion line and number of lines added. Instead of re-parsing, the code now updates the existing_decl_end_lines dictionary incrementally by:

  1. Shifting end lines of declarations appearing after the insertion point
  2. Recording the newly inserted declaration's end line directly

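This replaces one full re-parse per insertion with one incremental dictionary update per insertion. With n declarations, the re-parsing approach does O(n²) total parsing work because the source grows with every insertion. Each dictionary update may still touch up to n entries, but that is plain integer arithmetic rather than a tree-sitter parse.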
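
The incremental bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `shift_end_lines` and its parameters are hypothetical names, and the sketch assumes that `existing_decl_end_lines` maps declaration names to 1-based end lines and that the new text is inserted directly below `insertion_line`. The real helper may count blank separator lines differently.

```python
def shift_end_lines(existing_decl_end_lines, insertion_line, lines_added, new_name):
    """Update end-line bookkeeping in place after an insertion.

    Hypothetical helper: `lines_added` lines were inserted directly below
    `insertion_line` (1-based), and they hold the declaration `new_name`.
    """
    # Iterate over a snapshot so that updating values cannot disturb iteration.
    for name, end_line in list(existing_decl_end_lines.items()):
        # Step 1: declarations that end below the insertion point move down.
        if end_line > insertion_line:
            existing_decl_end_lines[name] = end_line + lines_added
    # Step 2: the new declaration ends on the last inserted line.
    existing_decl_end_lines[new_name] = insertion_line + lines_added
    return existing_decl_end_lines


# Usage: A ends on line 1, C ends on line 5; a two-line B is inserted below line 1.
end_lines = {"A": 1, "C": 5}
shift_end_lines(end_lines, insertion_line=1, lines_added=2, new_name="B")
```

After the call, `A` is unchanged, `C` has moved down by two lines, and `B` is recorded without parsing the source again.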

Performance gains by test category:

  • Dependency chains (100 declarations): 1326% faster (37.2ms → 2.61ms)
  • Independent declarations (100 items): 88.3% faster (61.3ms → 32.6ms)
  • Wide dependency graphs (100 items): 1291% faster (42.2ms → 3.03ms)
  • Simple cases (1-3 declarations): 15-25% faster

The optimization is most impactful when inserting many declarations with dependencies—precisely the scenario where re-parsing becomes expensive. For codebases with optimized code introducing numerous helper constants or utility declarations, this eliminates a major performance bottleneck while maintaining identical correctness.
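
To see why the gap widens with the number of declarations, a rough cost model can help. The sketch below only counts how many source lines would be fed to a parser under each strategy. It is an assumption-laden illustration, not a measurement: the function names are hypothetical, every declaration is assumed to be one line long, and the real function does other work that this model ignores.

```python
def lines_parsed_with_reparse(initial_lines, num_insertions, lines_per_decl=1):
    """Total lines parsed if every insertion triggers a full re-parse."""
    total = 0
    size = initial_lines
    for _ in range(num_insertions):
        size += lines_per_decl  # the source grows with each insertion
        total += size           # the whole source is parsed again
    return total


def lines_parsed_incremental(initial_lines):
    """Total lines parsed if the source is parsed once up front."""
    return initial_lines


# Usage: a 1-line original receiving 100 one-line declarations.
reparse_cost = lines_parsed_with_reparse(1, 100)
incremental_cost = lines_parsed_incremental(1)
```

Under these assumptions the re-parse strategy processes thousands of lines while the incremental one processes only the original source, which is consistent with the larger gains reported for the 100-declaration tests than for the simple cases.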

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 39 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests
import re
from pathlib import Path
from types import SimpleNamespace

import codeflash.languages.treesitter_utils as ts_utils  # to monkeypatch get_analyzer_for_file
import pytest  # used for our unit tests
from codeflash.code_utils.code_replacer import \
    _add_global_declarations_for_language
from codeflash.languages.base import Language

# Helper fake analyzer used to simulate minimal analyzer behavior.
# NOTE: This is a small, self-contained helper used only in tests to avoid requiring
# an actual tree-sitter parser. It intentionally implements only the interface
# used by _add_global_declarations_for_language (find_module_level_declarations,
# find_imports, find_referenced_identifiers). The production TreeSitterAnalyzer is not
# redefined or modified; we only monkeypatch the factory function to return this object.
class _FakeAnalyzer:
    def __init__(self):
        # No persistent state other than simple parsing behavior
        pass

    # Very small parser for module-level declarations:
    # - recognizes single-line "const NAME = ...;", "let NAME = ...;", "var NAME = ...;"
    # - recognizes "class NAME" single-line declaration (class on one line for tests)
    # - recognizes "type NAME", "interface NAME", "enum NAME" similarly
    # It returns a list of SimpleNamespace objects with attributes used by the code under test.
    def find_module_level_declarations(self, source: str):
        decls = []
        lines = source.splitlines()
        for idx, line in enumerate(lines):
            stripped = line.strip()
            # simple regex to capture declaration name and keep the full original source line as source_code
            m = re.match(r'^(?:const|let|var)\s+([A-Za-z_][A-Za-z0-9_]*)\b.*', stripped)
            if m:
                name = m.group(1)
                decls.append(
                    SimpleNamespace(
                        name=name,
                        declaration_type="variable",
                        source_code=line if line.endswith("\n") else line + "\n",
                        start_line=idx + 1,
                        end_line=idx + 1,
                        is_exported=False,
                    )
                )
                continue
            m = re.match(r'^(?:export\s+)?class\s+([A-Za-z_][A-Za-z0-9_]*)\b.*', stripped)
            if m:
                name = m.group(1)
                decls.append(
                    SimpleNamespace(
                        name=name,
                        declaration_type="class",
                        source_code=line if line.endswith("\n") else line + "\n",
                        start_line=idx + 1,
                        end_line=idx + 1,
                        is_exported=stripped.startswith("export"),
                    )
                )
                continue
            m = re.match(r'^(?:type|interface|enum)\s+([A-Za-z_][A-Za-z0-9_]*)\b.*', stripped)
            if m:
                name = m.group(1)
                decls.append(
                    SimpleNamespace(
                        name=name,
                        declaration_type=m.group(0).split()[0],
                        source_code=line if line.endswith("\n") else line + "\n",
                        start_line=idx + 1,
                        end_line=idx + 1,
                        is_exported=False,
                    )
                )
                continue
            # Also accept "export const NAME = ..." patterns
            m = re.match(r'^export\s+(?:const|let|var)\s+([A-Za-z_][A-Za-z0-9_]*)\b.*', stripped)
            if m:
                name = m.group(1)
                decls.append(
                    SimpleNamespace(
                        name=name,
                        declaration_type="variable",
                        source_code=line if line.endswith("\n") else line + "\n",
                        start_line=idx + 1,
                        end_line=idx + 1,
                        is_exported=True,
                    )
                )
                continue
        return decls

    # Parse very small set of ES module import patterns:
    # - default import: import X from '...';
    # - named imports: import {a as b, c} from '...';
    # - namespace import: import * as ns from '...';
    # Returns a list of SimpleNamespace with attributes default_import (str|None),
    # named_imports (list of (name, alias)), namespace_import (str|None)
    def find_imports(self, source: str):
        imports = []
        for line in source.splitlines():
            s = line.strip()
            if not s.startswith("import"):
                continue
            # namespace import
            m = re.match(r'import\s+\*\s+as\s+([A-Za-z_][A-Za-z0-9_]*)\s+from', s)
            if m:
                imports.append(SimpleNamespace(default_import=None, named_imports=[], namespace_import=m.group(1)))
                continue
            # default + maybe named
            m = re.match(r'import\s+([A-Za-z_][A-Za-z0-9_]*)(?:\s*,\s*\{([^}]*)\})?\s*from', s)
            if m:
                default = m.group(1)
                named_str = m.group(2)
                named = []
                if named_str:
                    for part in named_str.split(","):
                        part = part.strip()
                        if " as " in part:
                            a, b = [p.strip() for p in part.split(" as ", 1)]
                            named.append((a, b))
                        elif part:
                            named.append((part, None))
                imports.append(SimpleNamespace(default_import=default, named_imports=named, namespace_import=None))
                continue
            # named-only import: import {a, b as c} from ...
            m = re.match(r'import\s*\{([^}]*)\}\s*from', s)
            if m:
                named_str = m.group(1)
                named = []
                for part in named_str.split(","):
                    part = part.strip()
                    if " as " in part:
                        a, b = [p.strip() for p in part.split(" as ", 1)]
                        named.append((a, b))
                    elif part:
                        named.append((part, None))
                imports.append(SimpleNamespace(default_import=None, named_imports=named, namespace_import=None))
                continue
            # fallback: unknown import form - return an empty import record
            imports.append(SimpleNamespace(default_import=None, named_imports=[], namespace_import=None))
        return imports

    # Very lightweight identifier extraction:
    # - extract word tokens that look like identifiers
    # - exclude common JS keywords and the declared identifier (left-side of const/let/var)
    def find_referenced_identifiers(self, source: str):
        # identify declared name on left side, if any
        declared = None
        m = re.match(r'^(?:export\s+)?(?:const|let|var)\s+([A-Za-z_][A-Za-z0-9_]*)\b', source.strip())
        if m:
            declared = m.group(1)
        # find all words
        tokens = set(re.findall(r'\b[A-Za-z_][A-Za-z0-9_]*\b', source))
        # exclude javascript/ts reserved words and simple primitives/keywords
        exclude = {
            "const", "let", "var", "class", "type", "interface", "enum", "export",
            "from", "as", "function", "return", "if", "else", "true", "false", "null", "undefined",
            "bind", "this", "new"
        }
        if declared:
            exclude.add(declared)
        # drop excluded keywords and every all-lowercase token (a deliberately crude filter
        # that also discards lowercase property names and local identifiers)
        tokens = {t for t in tokens if t not in exclude and not t.islower()}
        return tokens

def test_returns_original_for_java_language():
    # If language is Java, the function should immediately return the original_source
    original = "public class Foo {}"
    optimized = "some optimized content"
    # Language.JAVA should be present in the real Language enum and map to the early return
    codeflash_output = _add_global_declarations_for_language(
        optimized_code=optimized,
        original_source=original,
        module_abspath=Path("module.java"),
        language=Language.JAVA,
        target_function_names=None,
    ); result = codeflash_output # 6.11μs -> 5.57μs (9.69% faster)

def test_returns_original_for_unsupported_non_js_ts_language():
    # Choose a language that isn't JS/TS/Java (if present) to ensure fallback returns original
    # If Language has a PYTHON member, use it; otherwise use an enum value that is not JS/TS/Java.
    # This test ensures that languages other than JS/TS are left unchanged.
    language = getattr(Language, "PYTHON", None)
    if language is None:
        # Fallback: PYTHON is not available, so pick any other real Language member
        # that is not JavaScript, TypeScript, or Java.
        possible = [getattr(Language, n) for n in dir(Language) if n.isupper()]
        # Remove the JS/TS/Java ones
        possible = [p for p in possible if p not in (getattr(Language, "JAVASCRIPT", None),
                                                      getattr(Language, "TYPESCRIPT", None),
                                                      getattr(Language, "JAVA", None))]
        # Ensure there's at least one candidate; otherwise skip assertion (defensive)
        if not possible:
            pytest.skip("No suitable non-JS/TS/Java Language enum value available for this environment")
        language = possible[0]

    original = "print('hello')\n"
    optimized = "print('hello optimized')\n"
    codeflash_output = _add_global_declarations_for_language(
        optimized_code=optimized,
        original_source=original,
        module_abspath=Path("module.py"),
        language=language,
        target_function_names=None,
    ); res = codeflash_output # 5.76μs -> 6.12μs (5.90% slower)

def test_exception_in_internal_processing_returns_original(monkeypatch):
    # Force the get_analyzer_for_file import to raise an exception to exercise the exception handling.
    def raise_on_call(path):
        raise RuntimeError("simulated failure")

    # Patch the symbol in the real module that will be imported inside the function
    monkeypatch.setattr(ts_utils, "get_analyzer_for_file", raise_on_call)

    original = "const A = 1;\n"
    optimized = "const B = 2;\n"
    codeflash_output = _add_global_declarations_for_language(
        optimized_code=optimized,
        original_source=original,
        module_abspath=Path("module.js"),
        language=Language.JAVASCRIPT,
        target_function_names=None,
    ); result = codeflash_output # 18.7μs -> 19.4μs (3.66% slower)

def test_no_new_declarations_when_optimized_empty(monkeypatch):
    # If the optimized code contains no module-level declarations, original should be returned.
    fake = _FakeAnalyzer()
    # Ensure the function will use our fake analyzer by patching the real factory
    monkeypatch.setattr(ts_utils, "get_analyzer_for_file", lambda path: fake)

    original = "import fs from 'fs';\nconst EXISTING = 1;\n"
    optimized = "// optimized code with no top level declarations\nconsole.log('ok');\n"

    codeflash_output = _add_global_declarations_for_language(
        optimized_code=optimized,
        original_source=original,
        module_abspath=Path("module.js"),
        language=Language.JAVASCRIPT,
    ); result = codeflash_output # 24.2μs -> 24.3μs (0.456% slower)

def test_add_single_declaration_after_imports(monkeypatch):
    # Add one declaration from optimized code into the original, after existing imports.
    fake = _FakeAnalyzer()
    monkeypatch.setattr(ts_utils, "get_analyzer_for_file", lambda path: fake)

    original = "import fs from 'fs';\nimport { join } from 'path';\n\nfunction main() {}\n"
    optimized = "const NEW = 42;\n"

    codeflash_output = _add_global_declarations_for_language(
        optimized_code=optimized,
        original_source=original,
        module_abspath=Path("module.js"),
        language=Language.JAVASCRIPT,
    ); result = codeflash_output # 79.4μs -> 70.7μs (12.3% faster)

def test_duplicate_declarations_filtered(monkeypatch):
    # If optimized contains duplicate declarations with identical source_code, only insert once.
    fake = _FakeAnalyzer()
    monkeypatch.setattr(ts_utils, "get_analyzer_for_file", lambda path: fake)

    original = ""
    # Two identical declaration lines in optimized code
    optimized = "const DUP = 1;\nconst DUP = 1;\n"

    codeflash_output = _add_global_declarations_for_language(
        optimized_code=optimized,
        original_source=original,
        module_abspath=Path("module.js"),
        language=Language.JAVASCRIPT,
    ); result = codeflash_output # 32.8μs -> 32.4μs (1.45% faster)

def test_insert_respects_dependencies(monkeypatch):
    # If optimized has a declaration that depends on an existing declaration in original,
    # the new declaration should be placed after the existing one.
    fake = _FakeAnalyzer()
    monkeypatch.setattr(ts_utils, "get_analyzer_for_file", lambda path: fake)

    original = "const A = 1;\n\nfunction useA() { return A; }\n"
    # B depends on A (uses A in its initializer)
    optimized = "const B = A + 1;\n"

    codeflash_output = _add_global_declarations_for_language(
        optimized_code=optimized,
        original_source=original,
        module_abspath=Path("module.js"),
        language=Language.JAVASCRIPT,
    ); result = codeflash_output # 46.3μs -> 37.6μs (23.0% faster)
    pos_a = result.index("const A = 1;\n")
    pos_b = result.index("const B = A + 1;\n")
    assert pos_a < pos_b  # B must come after the declaration it depends on

def test_large_chain_of_declarations(monkeypatch):
    # Create a chain of 100 declarations where each depends on the previous one.
    # This tests scalability and correct ordering without exceeding the requested limits.
    fake = _FakeAnalyzer()
    monkeypatch.setattr(ts_utils, "get_analyzer_for_file", lambda path: fake)

    N = 100  # Keep well under the 1000 limit
    original = "import fs from 'fs';\n\n"  # start with an import and blank line
    # Build optimized code: const A1 = 1; const A2 = A1 + 1; ... const AN = A{N-1} + 1;
    optimized_lines = []
    for i in range(1, N + 1):
        if i == 1:
            optimized_lines.append(f"const A{i} = 1;\n")
        else:
            optimized_lines.append(f"const A{i} = A{i-1} + 1;\n")
    optimized = "".join(optimized_lines)

    codeflash_output = _add_global_declarations_for_language(
        optimized_code=optimized,
        original_source=original,
        module_abspath=Path("module.js"),
        language=Language.JAVASCRIPT,
    ); result = codeflash_output # 14.8ms -> 1.29ms (1044% faster)

    # All declarations A1..AN should be in the resulting source exactly once each and in order
    for i in range(1, N + 1):
        decl = f"const A{i} ="
        assert result.count(decl) == 1
    # Check ordering: A1 before A2 before A3 ... before AN
    last_pos = -1
    for i in range(1, N + 1):
        pos = result.index(f"const A{i} =")
        assert pos > last_pos
        last_pos = pos
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from pathlib import Path

import pytest
from codeflash.code_utils.code_replacer import \
    _add_global_declarations_for_language
from codeflash.languages.base import Language

def test_basic_add_single_const_declaration():
    """Test adding a single new const declaration to JavaScript code."""
    original_source = "const FOO = 42;\n"
    optimized_code = "const FOO = 42;\nconst BAR = 100;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 152μs -> 119μs (27.1% faster)

def test_basic_add_single_const_typescript():
    """Test adding a single new const declaration to TypeScript code."""
    original_source = "const VALUE: number = 10;\n"
    optimized_code = "const VALUE: number = 10;\nconst NEW_VAL: string = 'test';\n"
    module_path = Path("test.ts")
    language = Language.TYPESCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 153μs -> 122μs (24.7% faster)

def test_basic_add_class_declaration():
    """Test adding a new class declaration."""
    original_source = "class Foo {}\n"
    optimized_code = "class Foo {}\nclass Bar {}\n"
    module_path = Path("test.ts")
    language = Language.TYPESCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 103μs -> 87.9μs (17.5% faster)

def test_basic_no_new_declarations():
    """Test when there are no new declarations to add."""
    original_source = "const FOO = 42;\n"
    optimized_code = "const FOO = 42;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 63.5μs -> 61.6μs (3.15% faster)

def test_basic_java_returns_original():
    """Test that Java code returns original source unchanged."""
    original_source = "public class Foo { int x = 5; }"
    optimized_code = "public class Foo { int x = 5; int y = 10; }"
    module_path = Path("test.java")
    language = Language.JAVA
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 4.19μs -> 3.80μs (10.3% faster)

def test_basic_unsupported_language_returns_original():
    """Test that unsupported languages return original source unchanged."""
    original_source = "def foo(): pass"
    optimized_code = "def foo(): pass\ndef bar(): pass"
    module_path = Path("test.py")
    language = Language.PYTHON
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 4.49μs -> 4.38μs (2.51% faster)

def test_basic_empty_original_source():
    """Test adding declarations to empty original source."""
    original_source = ""
    optimized_code = "const FOO = 42;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 75.3μs -> 63.9μs (17.8% faster)

def test_basic_empty_optimized_code():
    """Test when optimized code is empty."""
    original_source = "const FOO = 42;\n"
    optimized_code = ""
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 36.5μs -> 34.8μs (4.98% faster)

def test_edge_declaration_with_dependency():
    """Test adding a declaration that depends on an existing declaration."""
    original_source = "const FOO = { bar: 42 };\n"
    optimized_code = "const FOO = { bar: 42 };\nconst _has = FOO.bar;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 131μs -> 107μs (22.2% faster)

def test_edge_multiple_dependencies():
    """Test adding a declaration with multiple dependencies."""
    original_source = "const A = 1;\nconst B = 2;\n"
    optimized_code = "const A = 1;\nconst B = 2;\nconst C = A + B;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 136μs -> 112μs (21.8% faster)

def test_edge_chain_of_dependencies():
    """Test adding declarations with chain of dependencies."""
    original_source = "const A = 1;\n"
    optimized_code = "const A = 1;\nconst B = A + 1;\nconst C = B + 1;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 147μs -> 107μs (36.5% faster)

def test_edge_declaration_already_exists_with_different_content():
    """Test when a declaration name exists in original but with different content."""
    original_source = "const FOO = 42;\n"
    optimized_code = "const FOO = 100;\nconst BAR = 200;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 103μs -> 89.8μs (15.7% faster)

def test_edge_exported_declaration():
    """Test handling of exported declarations."""
    original_source = "export const FOO = 42;\n"
    optimized_code = "export const FOO = 42;\nexport const BAR = 100;\n"
    module_path = Path("test.ts")
    language = Language.TYPESCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 126μs -> 103μs (21.6% faster)

def test_edge_multiline_declaration():
    """Test handling of multiline declarations."""
    original_source = ""
    optimized_code = """const config = {
  key1: 'value1',
  key2: 'value2'
};
"""
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 103μs -> 87.2μs (18.6% faster)

def test_edge_let_declaration():
    """Test adding let declarations."""
    original_source = "const A = 1;\n"
    optimized_code = "const A = 1;\nlet B = 2;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 106μs -> 91.2μs (17.2% faster)

def test_edge_var_declaration():
    """Test adding var declarations."""
    original_source = "const A = 1;\n"
    optimized_code = "const A = 1;\nvar B = 2;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 105μs -> 89.1μs (18.3% faster)

def test_edge_type_alias_declaration():
    """Test adding TypeScript type alias declarations."""
    original_source = "type Foo = string;\n"
    optimized_code = "type Foo = string;\ntype Bar = number;\n"
    module_path = Path("test.ts")
    language = Language.TYPESCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 107μs -> 91.7μs (17.6% faster)

def test_edge_interface_declaration():
    """Test adding TypeScript interface declarations."""
    original_source = "interface Foo { x: number; }\n"
    optimized_code = "interface Foo { x: number; }\ninterface Bar { y: string; }\n"
    module_path = Path("test.ts")
    language = Language.TYPESCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 132μs -> 112μs (17.4% faster)

def test_edge_enum_declaration():
    """Test adding TypeScript enum declarations."""
    original_source = "enum Color { Red = 0 }\n"
    optimized_code = "enum Color { Red = 0 }\nenum Size { Small = 1 }\n"
    module_path = Path("test.ts")
    language = Language.TYPESCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 120μs -> 103μs (16.5% faster)

def test_edge_duplicate_declaration_in_optimized_code():
    """Test that duplicate declarations in optimized code are handled correctly."""
    original_source = ""
    optimized_code = """const FOO = 42;
const FOO = 42;
"""
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 78.8μs -> 69.0μs (14.3% faster)

def test_edge_declaration_with_string_containing_semicolon():
    """Test declaration containing strings with semicolons."""
    original_source = ""
    optimized_code = 'const MSG = "hello; world";\n'
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 73.3μs -> 62.6μs (17.1% faster)

def test_edge_declaration_with_special_characters():
    """Test declarations with special characters in names."""
    original_source = ""
    optimized_code = "const _internal = 42;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 67.2μs -> 58.0μs (15.8% faster)

def test_edge_declaration_with_destructuring():
    """Test declaration using destructuring."""
    original_source = ""
    optimized_code = "const { x, y } = obj;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 91.6μs -> 75.1μs (22.0% faster)

def test_edge_declaration_with_arrow_function():
    """Test declaration with arrow function."""
    original_source = ""
    optimized_code = "const fn = () => 42;\n"
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 76.9μs -> 65.6μs (17.2% faster)

def test_edge_mixed_declaration_types():
    """Test adding multiple different declaration types."""
    original_source = ""
    optimized_code = """const VAR1 = 1;
class MyClass {}
type MyType = string;
interface MyInterface { }
enum MyEnum { A = 0 }
"""
    module_path = Path("test.ts")
    language = Language.TYPESCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 343μs -> 232μs (47.7% faster)

def test_large_many_declarations():
    """Test adding many new declarations at scale."""
    # Create original source with 50 declarations
    original_lines = [f"const VAR{i} = {i};" for i in range(50)]
    original_source = "\n".join(original_lines) + "\n"
    
    # Create optimized code with original plus 50 new declarations
    optimized_lines = original_lines + [f"const NEW_VAR{i} = {i};" for i in range(50)]
    optimized_code = "\n".join(optimized_lines) + "\n"
    
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 45.6ms -> 24.7ms (84.9% faster)
    
    # Should contain all new declarations
    for i in range(50):
        assert f"const NEW_VAR{i} = {i};" in result
    # Should maintain original declarations
    for i in range(50):
        assert f"const VAR{i} = {i};" in result

def test_large_deep_dependency_chain():
    """Test with a deep chain of dependencies."""
    # Create a dependency chain: A depends on nothing, B on A, C on B, etc.
    original_source = "const BASE = 0;\n"
    
    optimized_lines = ["const BASE = 0;"]
    for i in range(1, 100):
        prev_var = "BASE" if i == 1 else f"CHAIN{i-1}"
        optimized_lines.append(f"const CHAIN{i} = {prev_var} + 1;")
    
    optimized_code = "\n".join(optimized_lines) + "\n"
    
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 37.2ms -> 2.61ms (1326% faster)
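    # Every chained declaration should be present
    for i in range(1, 100):
        assert f"const CHAIN{i} =" in result
    
    # Check ordering is preserved (each should come after its dependency)
    base_pos = result.find("const BASE = 0;")
    chain1_pos = result.find("const CHAIN1 = BASE + 1;")
    assert -1 < base_pos < chain1_pos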

def test_large_many_independent_declarations():
    """Test with many independent (non-dependent) declarations."""
    original_source = ""
    
    # Create 100 independent declarations
    optimized_lines = [f"const INDEPENDENT{i} = {i};" for i in range(100)]
    optimized_code = "\n".join(optimized_lines) + "\n"
    
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 61.3ms -> 32.6ms (88.3% faster)
    
    # All should be present
    for i in range(100):
        pass

def test_large_complex_multiline_declarations():
    """Test with many large multiline declarations."""
    original_source = ""
    
    # Create several large multiline object declarations
    optimized_lines = []
    for i in range(50):
        obj_content = ",\n  ".join([f"key{j}: 'value{j}'" for j in range(10)])
        optimized_lines.append(f"const OBJECT{i} = {{\n  {obj_content}\n}};")
    
    optimized_code = "\n".join(optimized_lines) + "\n"
    
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 123ms -> 86.2ms (43.1% faster)
    
    # All declarations should be present
    for i in range(50):
        pass

def test_large_wide_dependency_graph():
    """Test with many declarations that all depend on a common base."""
    original_source = "const BASE = { a: 1, b: 2, c: 3 };\n"
    
    # Create 100 declarations that each reference BASE
    optimized_lines = ["const BASE = { a: 1, b: 2, c: 3 };"]
    for i in range(100):
        optimized_lines.append(f"const DERIVED{i} = BASE.a + {i};")
    
    optimized_code = "\n".join(optimized_lines) + "\n"
    
    module_path = Path("test.js")
    language = Language.JAVASCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 42.2ms -> 3.03ms (1291% faster)
    
    # All derived declarations should be present
    for i in range(100):
        pass

def test_large_mixed_language_features():
    """Test with large number of mixed TypeScript features."""
    optimized_lines = []
    for i in range(25):
        optimized_lines.append(f"const VAR{i}: number = {i};")
    for i in range(25):
        optimized_lines.append(f"type Type{i} = string | number;")
    for i in range(25):
        optimized_lines.append(f"interface Interface{i} {{ prop{i}: any; }}")
    for i in range(25):
        optimized_lines.append(f"enum Enum{i} {{ A = {i} }}")
    
    original_source = ""
    optimized_code = "\n".join(optimized_lines) + "\n"
    
    module_path = Path("test.ts")
    language = Language.TYPESCRIPT
    
    codeflash_output = _add_global_declarations_for_language(
        optimized_code, original_source, module_path, language
    ); result = codeflash_output # 81.9ms -> 49.0ms (67.0% faster)
    
    # All types should be present
    for i in range(25):
        pass
    for i in range(25):
        pass
    for i in range(25):
        pass
    for i in range(25):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-pr1199-2026-02-03T03.34.45` and push.

The optimized code achieves a **102% speedup** (from 409ms to 202ms) by eliminating redundant tree-sitter parsing operations when inserting multiple declarations.

**Key optimization:** In the original code, after inserting each new declaration, the entire source was re-parsed via `analyzer.find_module_level_declarations(result)` to update line numbers. With many declarations (e.g., 100+ in test scenarios), this caused quadratic behavior—each insertion triggered a full parse of increasingly larger source code.

The optimization introduces `_insert_declaration_after_dependencies_fast()`, which returns not just the modified source but also metadata about the insertion: the insertion line and number of lines added. Instead of re-parsing, the code now updates the `existing_decl_end_lines` dictionary incrementally by:
1. Shifting end lines of declarations appearing after the insertion point
2. Recording the newly inserted declaration's end line directly

This replaces n full re-parses of a growing source (O(n²) parsing work in total) with n cheap dictionary updates, where n is the number of inserted declarations.
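The incremental bookkeeping described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name `record_insertion` is hypothetical, and it assumes 1-based line numbers with new text inserted directly after `insertion_line`. Only the dictionary name `existing_decl_end_lines` comes from the description.

```python
from typing import Dict


def record_insertion(
    existing_decl_end_lines: Dict[str, int],
    new_name: str,
    insertion_line: int,
    lines_added: int,
) -> Dict[str, int]:
    """Return updated end lines after inserting a declaration.

    Hypothetical helper: assumes 1-based line numbers and that the new
    declaration occupies the `lines_added` lines right after `insertion_line`.
    """
    # Step 1: shift every declaration that ends below the insertion point.
    updated = {
        name: end_line + lines_added if end_line > insertion_line else end_line
        for name, end_line in existing_decl_end_lines.items()
    }
    # Step 2: record the new declaration's end line directly, with no re-parse.
    updated[new_name] = insertion_line + lines_added
    return updated


end_lines = {"BASE": 1, "TAIL": 5}
end_lines = record_insertion(end_lines, "CHAIN1", insertion_line=1, lines_added=1)
```

Each call touches only the dictionary, so the cost per insertion no longer depends on parsing the whole source again.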

**Performance gains by test category:**
- **Dependency chains** (100 declarations): 1326% faster (37.2ms → 2.61ms)
- **Independent declarations** (100 items): 88.3% faster (61.3ms → 32.6ms) 
- **Wide dependency graphs** (100 items): 1291% faster (42.2ms → 3.03ms)
- **Simple cases** (1-3 declarations): 15-25% faster
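
The percentages in the list above appear consistent with measuring speedup relative to the optimized runtime. The sketch below assumes that formula (the function name `speedup_percent` is hypothetical). Because the reported timings are rounded, recomputed values may differ from the listed percentages in the last digit or two.

```python
def speedup_percent(original_ms: float, optimized_ms: float) -> float:
    """Percent speedup, assuming (original - optimized) / optimized."""
    return (original_ms - optimized_ms) / optimized_ms * 100.0


# Overall runtime reported in the PR: 409 ms -> 202 ms
overall = speedup_percent(409, 202)
# Dependency-chain test: 37.2 ms -> 2.61 ms
chain = speedup_percent(37.2, 2.61)
```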

The optimization is most impactful when inserting many declarations with dependencies—precisely the scenario where re-parsing becomes expensive. For codebases with optimized code introducing numerous helper constants or utility declarations, this eliminates a major performance bottleneck while maintaining identical correctness.
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Feb 3, 2026
codeflash-ai bot mentioned this pull request Feb 3, 2026
KRRT7 (Collaborator) commented Feb 19, 2026

Closing stale bot PR.

KRRT7 closed this Feb 19, 2026
KRRT7 deleted the codeflash/optimize-pr1199-2026-02-03T03.34.45 branch February 19, 2026 13:03