Skip to content

Comments

refactor(doclinks): unify link handlers and validate all file links#1348

Open
leshy wants to merge 3 commits intodevfrom
ivan/fix/doclinks-md-link-validation
Open

refactor(doclinks): unify link handlers and validate all file links#1348
leshy wants to merge 3 commits intodevfrom
ivan/fix/doclinks-md-link-validation

Conversation

@leshy
Copy link
Contributor

@leshy leshy commented Feb 22, 2026

Problem

Doclinks validates all links and tries to intelligently re-link if stuff is moved, insists on absolute links everywhere, shares more code between .md link resolution and .py etc resolution.

updates existing docs/ links

Doclinks is vibed

we don't care about the code here, just speed & correct behaviour, review by looking at docs/ changes

How to Test

doclinks --dry-run docs/

Contributor License Agreement

  • I have read and approved the CLA.

Add Pattern 3 to doclinks that validates existing .md links in docs:
- Resolves relative .md paths to absolute links
- Validates absolute .md links exist on disk
- Falls back to doc_index search for broken links with disambiguation
- Handles index.md files by searching parent dir name
- Scores candidates by directory overlap + filename match

Delete duplicate docs/usage/transports.md (identical to transports/index.md).

Fixes 5 broken links across docs/ (agents/docs/index.md, capabilities/
navigation/readme.md, usage/lcm.md).
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 22, 2026

Greptile Summary

This PR consolidates 3 separate regex patterns and handlers in doclinks into 2 unified handlers, eliminating ~40 lines of duplicated candidate resolution logic. The refactor extends validation from only .md links to ALL file links (.py, directories, etc.) and adds path-similarity-based disambiguation when multiple candidates match.

Key improvements:

  • Merged doc_pattern + md_link_pattern into single link_pattern + replace_link_match handler
  • Extracted resolve_candidates() helper with pick_best_candidate() scoring by directory overlap
  • Validates ALL file links (not just .md) with appropriate fallback search
  • Shares single git ls-files call between both index builders (eliminates double subprocess call)
  • Expanded extract_other_backticks filter from 1 to 18 file extensions
  • Moved import time to top-level (was incorrectly inside watch loop)
  • Deleted duplicate docs/usage/transports.md (nearly identical to docs/usage/transports/index.md)

The refactor is backward-compatible and all existing behaviors are preserved. The 9 new tests provide comprehensive coverage of the disambiguation logic, link validation, and edge cases.

Confidence Score: 5/5

  • This PR is safe to merge with high confidence
  • The refactor is well-tested (56 tests, +9 new), eliminates code duplication, and is explicitly backward-compatible. The disambiguation logic is sound and the PR description claims all tests pass. Documentation changes are mechanical link fixes.
  • No files require special attention

Important Files Changed

Filename Overview
dimos/utils/docs/doclinks.py Refactored to unify link handlers, add disambiguation logic, validate all file links (not just .md), and share single git ls-files call. Logic is solid and well-tested.
dimos/utils/docs/test_doclinks.py Added 9 new tests covering path similarity scoring, candidate resolution, disambiguation, and comprehensive link validation scenarios. Excellent coverage.
docs/agents/docs/index.md Fixed broken doc links (docs_agent → docs) to match actual directory structure.
docs/platforms/quadruped/go2/index.md Fixed broken relative links to absolute paths, consistent with repository structure.
docs/usage/data_streams/README.md Converted relative links to absolute paths for consistency.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[process_markdown] --> B{Split by ignore regions}
    B --> C[Process each region]
    C --> D{Apply code_pattern}
    D --> E[replace_code_match]
    E --> F{File has extension or /?}
    F -->|No| G[Skip]
    F -->|Yes| H[file_index lookup]
    H --> I[resolve_candidates]
    I --> J{Candidates count}
    J -->|0| K[Error: not found]
    J -->|1| L[Use single match]
    J -->|Multiple| M[pick_best_candidate]
    M --> N{score_path_similarity}
    N -->|Unique max| O[Use best match]
    N -->|Tie| P[Error: ambiguous]
    
    C --> Q{Apply link_pattern}
    Q --> R[replace_link_match]
    R --> S{Link type?}
    S -->|.md placeholder| T[doc_index lookup]
    S -->|Absolute path| U{File exists?}
    S -->|Relative path| V{File exists?}
    S -->|URL/anchor| W[Skip]
    
    T --> I
    U -->|No| X[_search_fallback]
    V -->|No| X
    X --> Y{Has extension?}
    Y -->|.md| Z[doc_index search]
    Y -->|Other| AA[file_index search]
    Z --> I
    AA --> I
Loading

Last reviewed commit: 4a6848b

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

15 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant