Skip to content

refactor(extractors): parser abstraction layer (Phase 7.1)#700

Merged
carlos-alm merged 3 commits intomainfrom
refactor/parser-abstraction-layer
Mar 30, 2026
Merged

refactor(extractors): parser abstraction layer (Phase 7.1)#700
carlos-alm merged 3 commits intomainfrom
refactor/parser-abstraction-layer

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Add 4 reusable helpers to src/extractors/helpers.ts: findParentNode, extractBodyMembers, stripQuotes, lastPathSegment
  • Refactor 9 language extractors to use them, replacing 6 findParent* functions, 5 body-iteration patterns, and inline quote-stripping/path-splitting across 7 files
  • Net: -82 lines (+77 helper, -159 extractor boilerplate)

Implements ROADMAP Phase 7.1 — reduces per-language extractor boilerplate to prepare for Batch 1 language additions (C, C++, Kotlin, etc.)

Test plan

  • TypeScript compiles (tsc --noEmit)
  • All 272 parser tests pass
  • Lint/format clean (biome)

Extract shared patterns from 9 language extractors into 4 reusable
helpers in helpers.ts, reducing per-language boilerplate by ~30 lines:

- findParentNode: replaces 6 findParent*/findCurrentImpl functions
- extractBodyMembers: replaces 5 body-iteration patterns for enums/structs
- stripQuotes: replaces inline .replace(/"/g,'') across 3 extractors
- lastPathSegment: replaces inline .split('.').pop() across 6 extractors

Net: +77 helper lines, -159 extractor lines = -82 lines total.
@claude
Copy link
Copy Markdown

claude bot commented Mar 30, 2026

Claude finished @carlos-alm's task —— View job


PR Review: refactor(extractors): parser abstraction layer (Phase 7.1) ⭐⭐⭐⭐☆

Root Cause Analysis

Problem Being Solved: Code duplication across language extractors - 6 separate findParent* functions, 5 body-iteration patterns, and inline quote-stripping/path-splitting across 7 files.

Why This Approach: Consolidating into 4 reusable helpers reduces boilerplate from ~200 to ~80 lines per extractor and directly implements ROADMAP Phase 7.1 to prepare for adding 23 new languages.

Risk Assessment: ✅ LOW - Pure refactoring with no functional changes.

Backlog Compliance

  • Zero-dep: ✅ No new runtime dependencies added
  • Foundation-aligned: ✅ Fully compliant - improves parsing performance without cloud dependencies
  • Problem-fit: 4/5 - Directly enables Phase 7 language expansion, reducing AI token waste by supporting more codebases
  • Breaking: No - Pure refactoring, no API changes
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Configuration & Test Integrity ✅

  • ✅ No configuration relaxation detected (biome.json, package.json unchanged)
  • ✅ No test modifications - maintains all existing test coverage
  • ✅ No linting rules disabled or coverage thresholds lowered

Code Quality Assessment ✅

Excellent abstraction design:

  1. findParentNode(node, typeNames, nameField) - Replaces 6 language-specific functions
  2. extractBodyMembers(container, bodyFields, memberType, kind) - Unified pattern for struct/enum/class members
  3. stripQuotes() & lastPathSegment() - Clean string utilities

Comprehensive refactoring: Found 25 references to old patterns across 7 files - confirms complete scope.

Technical Analysis

Net Impact: -82 lines (+77 helper, -159 boilerplate)
Files Changed: 11 (9 extractors + roadmap + helpers)
Fits ROADMAP: ✅ Exactly matches planned Phase 7.1 deliverables

Quality Indicators:

  • No eslint-disable or @ts-ignore additions
  • Clean abstractions with good naming
  • Preserves all existing functionality
  • Follows established code conventions

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Solid refactoring that directly implements roadmap phase 7.1, reduces maintenance burden, and prepares for major language expansion. Clean code, no configuration shortcuts, comprehensive test coverage maintained. Only downside is inability to verify tests pass without additional permissions.

This is exactly the type of foundational work needed before adding 23 new languages. Well executed.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR implements ROADMAP Phase 7.1, extracting four reusable helpers (findParentNode, extractBodyMembers, stripQuotes, lastPathSegment) into src/extractors/helpers.ts and refactoring all nine language extractors to use them. The result is a net −82 line reduction that eliminates duplicated parent-chain walkers, body-iteration loops, and inline string-manipulation patterns across the codebase, making it significantly cheaper to add new language extractors in upcoming Batch 1 work.

  • findParentNode unifies six per-language findParent* functions with a single, depth-unlimited parent-chain walker parameterised on node types and name field.
  • extractBodyMembers replaces five near-identical body-iteration loops; each bodyFields entry is tried as a grammar field name first, then as a child node type — slightly more permissive than the original patterns but functionally equivalent for all 272 tests.
  • stripQuotes and lastPathSegment eliminate scattered inline .replace / .split().pop() calls across seven files; stripQuotes now covers backticks in addition to '/", an intentional expansion of scope harmless for all current call-sites.
  • Previous review findings (anchored stripQuotes semantics, Rust self_parameter empty-block comment) are addressed.

Confidence Score: 5/5

Safe to merge — clean refactor with no regressions, all 272 tests pass, and prior review concerns are resolved.

No P0 or P1 issues found. The single P2 comment is a documentation suggestion for extractBodyMembers's double-dispatch body lookup; the behaviour itself is harmless and fully covered by the existing test suite. All previously raised concerns (anchored stripQuotes semantics, Rust self_parameter comment) have been addressed.

No files require special attention; src/extractors/helpers.ts is the logical hub but is straightforward and well-documented.

Important Files Changed

Filename Overview
src/extractors/helpers.ts Core of the refactor: adds findParentNode, extractBodyMembers, stripQuotes, and lastPathSegment — all well-documented with correct semantics; minor note on extractBodyMembers applying findChild fallback to every bodyFields entry rather than only the designated fallback name
src/extractors/rust.ts findChild removed from imports (no longer needed directly), findParentNode + extractBodyMembers adopted, self_parameter comment restored as per prior review resolution
src/extractors/csharp.ts findCSharpParentType and extractCSharpEnumMembers replaced cleanly; CS_PARENT_TYPES const correctly includes all five C# type-declaration node types
src/extractors/java.ts findJavaParentClass and extractEnumConstants replaced cleanly; JAVA_PARENT_TYPES and body-field list match original lookup order
src/extractors/javascript.ts findParentClass replaced cleanly; JS_CLASS_TYPES correctly covers both class_declaration and anonymous class forms
src/extractors/php.ts extractPhpEnumCases and two lastPathSegment call-sites replaced cleanly; PHP backslash separator correctly passed to lastPathSegment
src/extractors/python.ts findPythonParentClass replaced cleanly; PY_CLASS_TYPES singleton correctly targets class_definition
src/extractors/ruby.ts findRubyParentClass, handleRubyRequire, stripQuotes, and lastPathSegment replacements correct; stripQuotes now also strips backticks (minor expansion of original /^['"]
src/extractors/go.ts extractGoImportSpec updated to use stripQuotes + lastPathSegment; behaviour equivalent to original for well-formed Tree-sitter path nodes
src/extractors/hcl.ts Six inline .replace(/"/g,'') calls replaced with stripQuotes; anchored semantics are correct for HCL string literal nodes
docs/roadmap/ROADMAP.md Phase 7.1 marked complete; table updated to reflect actual helper signatures and file location

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph helpers["src/extractors/helpers.ts (new exports)"]
        FPN["findParentNode(node, typeNames, nameField?)"]
        EBM["extractBodyMembers(containerNode, bodyFields, memberType, kind, ...)"]
        SQ["stripQuotes(text)"]
        LPS["lastPathSegment(path, separator?)"]
    end

    subgraph extractors["Language Extractors (consumers)"]
        JS["javascript.ts\nfindParentClass → findParentNode"]
        PY["python.ts\nfindPythonParentClass → findParentNode"]
        JAVA["java.ts\nfindJavaParentClass → findParentNode\nextractEnumConstants → extractBodyMembers\nlastPathSegment (import paths)"]
        CS["csharp.ts\nfindCSharpParentType → findParentNode\nextractCSharpEnumMembers → extractBodyMembers\nlastPathSegment (using directives)"]
        RUBY["ruby.ts\nfindRubyParentClass → findParentNode\nstripQuotes + lastPathSegment (require)"]
        RUST["rust.ts\nfindCurrentImpl → findParentNode\nextractStructFields → extractBodyMembers\nextractEnumVariants → extractBodyMembers\nlastPathSegment (use paths)"]
        PHP["php.ts\nextractPhpEnumCases → extractBodyMembers\nlastPathSegment (namespace use)"]
        GO["go.ts\nstripQuotes + lastPathSegment (imports)"]
        HCL["hcl.ts\nstripQuotes (block names, module source)"]
    end

    FPN --> JS
    FPN --> PY
    FPN --> JAVA
    FPN --> CS
    FPN --> RUBY
    FPN --> RUST
    EBM --> JAVA
    EBM --> CS
    EBM --> RUST
    EBM --> PHP
    SQ --> RUBY
    SQ --> GO
    SQ --> HCL
    LPS --> JAVA
    LPS --> CS
    LPS --> RUBY
    LPS --> RUST
    LPS --> PHP
    LPS --> GO
Loading

Reviews (2): Last reviewed commit: "fix: address Greptile review comments (#..." | Re-trigger Greptile

Comment on lines 229 to 230
if (param.type === 'self_parameter') {
// Skip self parameters — matches native engine behaviour
continue;
} else if (param.type === 'parameter') {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Empty if block obscures intent

The comment // Skip self parameters — matches native engine behaviour and continue; were removed together, leaving an empty if branch. While this is functionally correct (nothing else follows in the loop body, so the else if is simply skipped), the empty block is a code smell and a future reader has no signal that skipping self_parameter is deliberate. Consider either keeping a comment or inverting to a guard clause:

Suggested change
if (param.type === 'self_parameter') {
// Skip self parameters — matches native engine behaviour
continue;
} else if (param.type === 'parameter') {
if (param.type === 'self_parameter') {
// Skip self — matches native engine behaviour
} else if (param.type === 'parameter') {

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — restored the explanatory comment in the empty self_parameter if-block: // Skip self — matches native engine behaviour.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — restored the explanatory comment in the empty self_parameter if-block: // Skip self — matches native engine behaviour.

Comment on lines +136 to +138
export function stripQuotes(text: string): string {
return text.replace(/^['"`]|['"`]$/g, '');
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 stripQuotes semantics differ subtly from replaced HCL regex

The HCL extractor previously used .replace(/"/g, '') — a global replace that removes every double-quote character in the text. stripQuotes uses anchored replacements (^ / $) that strip only the leading and trailing character. For well-formed Tree-sitter string literal nodes (always delimited with a single pair of outer quotes) the result is identical, so this is not a bug in practice.

Worth noting as a potential surprise if someone later passes a node whose .text value contains interior quote characters. A short inline comment on the "anchored, not global" choice would clarify the intentional scope:

Suggested change
export function stripQuotes(text: string): string {
return text.replace(/^['"`]|['"`]$/g, '');
}
export function stripQuotes(text: string): string {
// Strips only the leading/trailing delimiter; interior quotes are untouched.
return text.replace(/^['"`]|['"`]$/g, '');
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added clarifying JSDoc line: Strips only the leading/trailing delimiter; interior quotes are untouched.

Add clarifying comment to empty self_parameter if-block in rust.ts
and document anchored-vs-global semantics in stripQuotes JSDoc.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 0a078dc into main Mar 30, 2026
12 checks passed
@carlos-alm carlos-alm deleted the refactor/parser-abstraction-layer branch March 30, 2026 10:02
@github-actions github-actions bot locked and limited conversation to collaborators Mar 30, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant