Extract content links from markdown nodes#98
Conversation
Adds a ContentLink type and extracts wikilinks, markdown links, images, and bare URLs from node body and frontmatter fields. Links are resolved to indicate whether they are space-internal, space-external, or external URLs, and exposed on SpaceNode as contentLinks. Graph edge fields are excluded from frontmatter link extraction to avoid duplication.
|
Claude finished @mindsocket's task in 2m 27s —— View job Code ReviewTasks
Overall AssessmentThis is a well-designed and carefully implemented feature that adds content link extraction capabilities to the structured-context system. The implementation follows existing patterns, has comprehensive test coverage, and properly handles edge cases like excluding graph edge fields from extraction. Strengths:
Detailed Review by Component1. Core Extraction Logic (
|
mindsocket
left a comment
There was a problem hiding this comment.
There are code simplification and consistency issues to address before this can be merged.
| for (const [key, value] of Object.entries(data)) { | ||
| if (edgeFields.has(key)) continue; | ||
| if (typeof value === 'string') { | ||
| links.push(...extractLinksFromText(value)); |
There was a problem hiding this comment.
Couldn't this use mdast to parse the value as markdown and walk the content for link types? That keeps link parsing consistent across frontmatter and content.
| * A fully classified link from node content, after location resolution. | ||
| * Present on SpaceNode after graph resolution. | ||
| */ | ||
| export type ResolvedContentLink = { |
There was a problem hiding this comment.
These types should be composed from a base type since they're largely the same.
- Extract ContentLinkBase to compose ContentLink and ResolvedContentLink from shared fields rather than duplicating them - Use mdast (extractLinksFromBody) for frontmatter string values instead of separate regex helpers, keeping link extraction consistent across body and frontmatter - Expose resolvedLinks on flattened nodes so JSONata rules can query content links - Remove extractLinksFromText and its regex helper functions
|
Claude finished @mindsocket's task in 2m 22s —— View job Code ReviewTasks
Review SummaryThe PR author's review comments have already been fully addressed in commit ✅ Issue 1: Frontmatter Link ExtractionReview comment: "Couldn't this use mdast to parse the value as markdown and walk the content for link types? That keeps link parsing consistent across frontmatter and content." Status: ✅ Fixed The initial implementation used regex-based parsing in Current implementation ( export function extractLinksFromFrontmatter(data: Record<string, unknown>, edgeFields: Set<string>): ContentLink[] {
const links: ContentLink[] = [];
for (const [key, value] of Object.entries(data)) {
if (edgeFields.has(key)) continue;
if (typeof value === 'string') {
links.push(...extractLinksFromBody(value)); // ✅ Uses mdast parsing
} else if (Array.isArray(value)) {
for (const item of value) {
if (typeof item === 'string') {
links.push(...extractLinksFromBody(item)); // ✅ Uses mdast parsing
}
}
}
}
return links;
}The export function extractLinksFromBody(body: string): ContentLink[] {
const tree = unified().use(remarkParse).use(remarkGfm).parse(body) as Root;
return extractLinksFromAstNode(tree);
}This ensures frontmatter and body content are parsed consistently using the same mdast-based approach. ✅ Issue 2: Type CompositionReview comment: "These types should be composed from a base type since they're largely the same." Status: ✅ Fixed The initial implementation duplicated all shared fields between Current implementation ( /**
* Shared fields for all content link variants.
*/
type ContentLinkBase = {
/** Display text of the link (text content for markdown links, inner text or target for wikilinks). */
text: string;
/** Raw link target: URL for markdown links, wikilink path (without [[ ]]) for wikilinks. */
target: string;
/** Whether the link navigates to a resource or transcludes/embeds it. */
action: 'link' | 'embed';
/** Anchor fragment (heading or block ref, e.g. 'heading-title' or '^blockid') if present. */
anchor?: string;
};
/**
* A link extracted from node content at parse time, before location classification.
* Populated by plugins that support content link extraction (e.g. the markdown plugin).
*/
export type ContentLink = ContentLinkBase & {
/** Link syntax — used during resolution to determine how to classify the target. */
linkSyntax: 'wikilink' | 'markdown';
};
/**
* A fully classified link from node content, after location resolution.
* Present on SpaceNode after graph resolution.
*/
export type ResolvedContentLink = ContentLinkBase & {
/**
* Resolved location classification:
* - 'node' — wikilink resolved to a known space node
* - 'internal' — wikilink or relative path, in the source system but not a space node
* - 'external' — http/https URL
* - 'system' — file:// URL
* - 'protocol' — other scheme (e.g. obsidian://)
*/
location: 'node' | 'internal' | 'external' | 'system' | 'protocol';
};This is a clean, maintainable approach that:
VerificationTest Suite: ✅ All 423 tests passed ConclusionBoth review concerns have been successfully addressed. The implementation now:
Recommendation: ✅ Ready to merge The PR successfully implements content link extraction with clean, maintainable code that addresses all review feedback. |
Closes #74
Summary
ContentLinktype representing links extracted from node body and frontmatter (wikilinks, markdown links, images, bare URLs)extract-content-links.tsmodule handles parsing via remark AST for body content and regex scanning for frontmatter string valuesSpaceNodeascontentLinkscontentLinksfields; schema and validate modules updated accordinglyTest plan
bun run test— unit tests pass including newtests/read/content-links.test.tsbun run typecheck— no type errorscontentLinksis populated on nodes when dumping a space with links in content