feat: add knowledge wiki schema#75

Merged
mindsocket merged 2 commits into main from feat/knowledge-wiki
Apr 5, 2026

@mindsocket
Owner

Summary

  • Adds schemas/knowledge_wiki.json — a new bundled schema for LLM-maintained knowledge wikis
  • Supports four node types: source_summary, concept, synthesis, personal, plus an index type
  • Flat entity model (no hierarchy, no required parent links) with wikilinks for arbitrary cross-referencing
  • Includes aliases, validation rules, and provenance tracking for source summaries
  • Based on the pattern described in Karpathy's LLM Wiki (2026)

Test plan

  • Validate a knowledge wiki space using the new schema
  • Verify aliases resolve correctly (e.g. article → source_summary, note → personal)
  • Confirm rules fire as expected (source-has-origin, synthesis-references-sources, concept-has-summary)
  • Run unit tests: bun run test
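
As a rough illustration of the alias check in the test plan, here is a minimal TypeScript sketch. It is not the ost-tools resolver: `resolveType` is a hypothetical helper, and the map only copies the alias entries quoted from the first version of the schema in this PR's review threads.

```typescript
// Hypothetical sketch, not the ost-tools resolver.
// Alias entries are the ones quoted from the first version of
// schemas/knowledge_wiki.json in this PR's review threads.
const aliases = new Map<string, string>([
  ["study", "source_summary"],
  ["article", "source_summary"],
  ["paper", "source_summary"],
  ["research", "source_summary"],
  ["chat_transcript", "source_summary"],
  ["chat-transcript", "source_summary"],
  ["note", "personal"],
  ["journal", "personal"],
  ["framework", "concept"],
  ["model", "concept"],
]);

// Canonical node types listed in the PR summary.
const canonicalTypes = new Set<string>([
  "source_summary",
  "concept",
  "synthesis",
  "personal",
  "index",
]);

// Returns the canonical type for a raw `type` value, or undefined if the
// value is neither a canonical type nor a known alias.
function resolveType(raw: string): string | undefined {
  if (canonicalTypes.has(raw)) return raw;
  return aliases.get(raw);
}
```

Under these assumptions, `resolveType("article")` yields `"source_summary"` and an unknown value such as `"recipe"` yields `undefined`, which is the behaviour the second test-plan item would exercise.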

Adds a new bundled schema for LLM-maintained knowledge wikis. Supports
source_summary, concept, synthesis, personal, and index node types with
a flat entity model, wikilinks for cross-referencing, aliases, and
validation rules for provenance and coherence.

Based on the pattern described in Karpathy's LLM Wiki (2026).
@claude

claude Bot commented Apr 5, 2026

Claude finished @mindsocket's task in 53s


Code Review

Task Checklist:

  • Read and analyze the new knowledge wiki schema
  • Check for bugs and security issues
  • Review best practices and coding standards
  • Verify alignment with project conventions
  • Provide detailed review feedback

Overview

This PR adds a well-designed schema for LLM-maintained knowledge wikis to the ost-tools project. The schema follows project conventions, implements appropriate validation rules, and provides good flexibility through type aliases.

✅ Strengths

1. Schema Structure & Conventions

  • Properly uses $schema, $id, and $metadata fields following project patterns (see schemas/general.json)
  • Correctly references shared base definitions via $ref: ost-tools://_ost_tools_base#/$defs/baseNodeProps and ostEntityProps
  • Consistent use of oneOf for multi-type schemas matching the established pattern

2. Type Design

  • Five well-defined node types with clear purposes:
    • source_summary: For external source material with provenance tracking
    • concept: For domain knowledge building blocks
    • synthesis: For high-value integrated insights
    • personal: For user's own notes and experiences
    • index: For wiki catalog/navigation

3. Practical Aliases
The aliases mapping (lines 7-17) provides excellent flexibility:

"aliases": {
  "study": "source_summary",
  "article": "source_summary",
  "note": "personal",
  "framework": "concept",
  ...
}

This allows users to use natural vocabulary while maintaining schema consistency.

4. Well-Crafted Validation Rules
All three rules use correct JSONata syntax and enforce meaningful constraints:

  • source-has-origin: Ensures source summaries have provenance (url, author, or book)
  • synthesis-references-sources: Requires syntheses to cite sources
  • concept-has-summary: Encourages discoverability via summaries
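
To make the three rules concrete, the sketch below restates them as plain TypeScript predicates. This is an illustration under stated assumptions, not how ost-tools evaluates rules: only the `source-has-origin` expression is quoted in this PR, so the `sources` and `summary` checks for the other two rules are guesses, and `WikiNode`, `Rule`, and `violations` are hypothetical names.

```typescript
// Hypothetical sketch: plain predicates standing in for the JSONata `check`
// expressions. Only the source-has-origin expression is quoted in this PR;
// the other two predicates are assumptions.
type WikiNode = { type: string; [key: string]: unknown };

interface Rule {
  id: string;
  type: string; // node type the rule applies to
  check: (current: WikiNode) => boolean;
}

const rules: Rule[] = [
  {
    id: "source-has-origin",
    type: "source_summary",
    // Restates: current.url != null or current.author != null or current.book != null
    check: (n) => n.url != null || n.author != null || n.book != null,
  },
  {
    id: "synthesis-references-sources",
    type: "synthesis",
    // Assumption: "cites sources" means a non-empty `sources` array.
    check: (n) => Array.isArray(n.sources) && n.sources.length > 0,
  },
  {
    id: "concept-has-summary",
    type: "concept",
    // Assumption: the summary is a non-empty `summary` string field.
    check: (n) => typeof n.summary === "string" && n.summary.trim() !== "",
  },
];

// Returns the ids of the rules that a node violates.
function violations(node: WikiNode): string[] {
  return rules
    .filter((rule) => rule.type === node.type && !rule.check(node))
    .map((rule) => rule.id);
}
```

A `source_summary` node with none of `url`, `author`, or `book` would report `source-has-origin`; adding any one of the three clears it.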

5. Documentation

  • Comprehensive descriptions explain both structure and philosophy
  • Examples demonstrate realistic usage patterns (lines 81-90, 115-123, 147-155)
  • The description clearly articulates the design intent: "Flat entity model — no hierarchy"

📝 Observations

1. Intentional Design Choices
The schema correctly omits $metadata.hierarchy since it implements a flat entity model with wikilinks rather than a hierarchical tree. This is appropriate for a knowledge wiki where arbitrary cross-referencing is a first-class feature.

2. Flexible source_type Field
Lines 69-72: The source_type field is intentionally left as a free-form string rather than an enum to allow evolving vocabulary. The description documents common values, which is good practice.

3. Minimal index Type
Lines 183-202: The index type has only the required type field (plus inherited base properties). This is appropriate for a catalog page that's primarily agent-maintained.

✅ Test Results

All 340 unit tests pass successfully, confirming the schema integrates well with the existing codebase.

🔒 Security & Safety

  • No security concerns identified
  • No injection vulnerabilities
  • additionalProperties: true is appropriate for this flexible wiki system
  • Input validation relies on JSON Schema validation, which is safe

Summary

This is a high-quality addition to the ost-tools project. The schema:

  • Follows established project conventions
  • Implements appropriate validation
  • Provides thoughtful flexibility through aliases
  • Has clear documentation and examples
  • Introduces no bugs or security issues

Recommendation: ✅ Approve

The schema is production-ready and well-integrated with the project's architecture and philosophy.

Owner Author

@mindsocket mindsocket left a comment

Improvements needed to naming, flexibility and specific properties.

Comment thread schemas/knowledge_wiki.json Outdated
"description": "Lightweight schema for LLM-maintained knowledge wikis. Supports source summaries, concept pages, syntheses, and personal notes. Designed for compounding knowledge bases where the LLM creates and maintains wiki pages from raw sources.\n\nFlat entity model — no hierarchy, no required parent links. Wikilinks between any pages are valid for arbitrary cross-referencing. The schema validates structure and provenance; it does not constrain meaning.\n\nBased on the pattern described in Karpathy's LLM Wiki (2026).",
"$metadata": {
"aliases": {
"study": "source_summary",
Owner Author

"source" should be the canonical type name. source_summary can be an alias.

Comment thread schemas/knowledge_wiki.json Outdated
"article": "source_summary",
"paper": "source_summary",
"research": "source_summary",
"note": "personal",
Owner Author

"note" should be the canonical. "personal" should not be an alias.

Comment thread schemas/knowledge_wiki.json Outdated
"framework": "concept",
"model": "concept",
"chat_transcript": "source_summary",
"chat-transcript": "source_summary"
Owner Author

"framework" and "model" are not direct synonyms of concept as they imply too much, and as such shouldn't be aliases.

If they were entity types they'd be explicit and distinct

Comment thread schemas/knowledge_wiki.json Outdated
"category": "validation",
"description": "Source summaries should reference their origin via url, author, or book field",
"type": "source_summary",
"check": "current.url != null or current.author != null or current.book != null"
Owner Author

This shouldn't be a rule. Instead, url should be required in the schema def. Assumption: any source of any kind has some kind of url as a reference.
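
One way to picture the difference: with `url` listed under `required`, a source page without a `url` fails structural validation outright instead of tripping a separate rule. The sketch below is illustrative only. `missingRequired` is a hypothetical helper that imitates the presence check of the JSON Schema `required` keyword; real validation would be done by a JSON Schema validator.

```typescript
// Hypothetical sketch of what a `required` list enforces.
// JSON Schema `required` only checks that each key is present on the object;
// it says nothing about the value's type or format.
function missingRequired(
  node: Record<string, unknown>,
  required: string[],
): string[] {
  return required.filter(
    (key) => !Object.prototype.hasOwnProperty.call(node, key),
  );
}

// Example page data (made up for illustration).
const withoutUrl = { type: "source", author: "Jane Doe" };
const withUrl = { type: "source", url: "https://example.com/article" };
```

Here `missingRequired(withoutUrl, ["type", "url"])` reports `url`, while the same call on `withUrl` reports nothing. Note that `author` alone no longer satisfies the constraint, which is the behavioural change compared with the old rule.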

Comment thread schemas/knowledge_wiki.json Outdated
{ "$ref": "ost-tools://_ost_tools_base#/$defs/ostEntityProps" }
],
"properties": {
"type": { "const": "source_summary" },
Owner Author

rename: source

Comment thread schemas/knowledge_wiki.json Outdated
},
"sources": {
"type": "array",
"items": { "type": "string" },
Owner Author

Use the wikilink def

Owner Author

... and add a relationship metadata entry
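
For readers unfamiliar with the idea, the sketch below shows why a dedicated wikilink definition is tighter than a bare `string`. The real `wikilink` definition lives in `_ost_tools_base` and is not shown in this PR, so the accepted form here, `[[Target]]` or `[[Target|Label]]`, is purely an assumption, and `wikilinkTarget` is a hypothetical helper.

```typescript
// Hypothetical sketch: the actual `wikilink` definition is not shown in this PR.
// Assumed form: "[[Target]]" or "[[Target|Label]]".
const WIKILINK = /^\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]$/;

// Returns the link target, or undefined if the value is not a wikilink
// in the assumed form.
function wikilinkTarget(value: string): string | undefined {
  const match = WIKILINK.exec(value);
  return match?.[1]?.trim();
}
```

With `"type": "string"` any text is a valid item; with a pattern like this one, a plain title such as `Design Systems` would be rejected while `[[Design Systems]]` would be accepted.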

Comment thread schemas/knowledge_wiki.json Outdated
"items": { "type": "string" },
"description": "Wikilinks to source_summary or concept pages that this synthesis draws from"
},
"related": {
Owner Author

This feels too loose. Synthesis should relate sources via a sources property and concepts via a concepts property ... instead of a general "related".

Page content is still free to link elsewhere.
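
A minimal sketch of what the split could mean for existing pages, assuming links are partitioned by the type of the page they point at. Nothing like this helper exists in the PR: `splitRelated`, `PageType`, and the `typeOf` lookup are hypothetical names used only for illustration.

```typescript
// Hypothetical migration sketch: partitions a synthesis page's old `related`
// links into `sources` and `concepts` by the type of each linked page.
type PageType = "source" | "concept" | "synthesis" | "note" | "index";

interface SplitLinks {
  sources: string[];
  concepts: string[];
  unplaced: string[]; // links to other page types; these stay in the page body
}

function splitRelated(
  related: string[],
  typeOf: (link: string) => PageType | undefined,
): SplitLinks {
  const result: SplitLinks = { sources: [], concepts: [], unplaced: [] };
  for (const link of related) {
    const linkedType = typeOf(link);
    if (linkedType === "source") result.sources.push(link);
    else if (linkedType === "concept") result.concepts.push(link);
    else result.unplaced.push(link);
  }
  return result;
}
```

Links that are neither sources nor concepts end up in `unplaced`, matching the point that page content remains free to link elsewhere.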

Comment thread schemas/knowledge_wiki.json Outdated
{ "$ref": "ost-tools://_ost_tools_base#/$defs/ostEntityProps" }
],
"properties": {
"type": { "const": "personal" },
Owner Author

Rename to "note".

Comment thread schemas/knowledge_wiki.json Outdated
"journal": "personal",
"framework": "concept",
"model": "concept",
"chat_transcript": "source_summary",
Owner Author

Remove chat_transcript and chat-transcript as aliases. If anything they're an alias for a note, but don't need to be explicitly defined as an alias - they can just be notes.

{ "$ref": "ost-tools://_ost_tools_base#/$defs/baseNodeProps" },
{ "$ref": "ost-tools://_ost_tools_base#/$defs/ostEntityProps" }
],
"properties": {
Owner Author

Add an optional property that is a wikilink to a local copy of the source.

- Rename source_summary → source (canonical); source_summary becomes alias
- Rename personal → note (canonical); remove personal alias
- Remove framework, model, chat_transcript aliases
- Remove source-has-origin rule; make url required on source type
- Use wikilink $ref for all wikilink array items
- Add local_copy property (wikilink) to source type
- Split synthesis related → sources + concepts properties
- Add relationship metadata entries for sources/concepts fields
- Replace examples with generic design/UX domain content
@claude

claude Bot commented Apr 5, 2026

Claude finished @mindsocket's task in 39s


Code Review: knowledge_wiki.json


Overview

I've analyzed the current state of schemas/knowledge_wiki.json against your review comments. Excellent news — all of the requested improvements appear to have already been implemented in the current version! Here's my verification of each item:


✅ All Review Comments Addressed

1. "source" as canonical type name (line 64)

  • ✅ Canonical type is "source" (line 64)
  • "source_summary": "source" correctly maps the alias to the canonical (line 8)

2. "note" as canonical type, "personal" removed (line 182)

  • ✅ Canonical type is "note" (line 182)
  • ✅ No "personal" alias exists in the aliases mapping

3. "framework" and "model" aliases removed (lines 7-14)

  • ✅ Neither "framework" nor "model" appear as aliases for "concept"
  • ✅ Aliases are conservative and semantically appropriate

4. URL required in schema definition (line 95)

  • "required": ["type", "url"] - URL is now required in the schema (line 95)
  • ✅ The "source-has-origin" rule has been removed (correctly, since URL requirement handles this)

5-6. Wikilink def used for local_copy (lines 69-72)

  • local_copy property uses $ref: "ost-tools://_ost_tools_base#/$defs/wikilink" (line 70)
  • ✅ Clear description explains its purpose

7. Relationship metadata entries added (lines 15-36)

  • ✅ Three relationship definitions properly configured:
    • source → source (via sources field on child)
    • source → synthesis (via sources field on child)
    • concept → synthesis (via concepts field on child)

8. Synthesis has separate sources and concepts properties (lines 150-159)

  • ✅ No general "related" property
  • ✅ Distinct sources array for source wikilinks (lines 150-153)
  • ✅ Distinct concepts array for concept wikilinks (lines 155-159)
  • ✅ This is much better than a generic "related" field

9. "note" renamed from "personal" (line 182)

  • ✅ Type is "note" with description "Personal notes, journal entries..."

10. Chat transcript aliases removed (lines 7-14)

  • ✅ No "chat_transcript" or "chat-transcript" aliases exist
  • ✅ These would naturally just be notes, no need for explicit aliasing

11. local_copy property added (lines 69-72)

  • ✅ Optional local_copy property using wikilink def
  • ✅ Appropriate description: "Wikilink to a local copy of the source"

Additional Observations

Strengths of the Current Implementation:

  1. Clean separation of concerns — Each type has a clear purpose with appropriate properties
  2. Provenance tracking — Source type requires URL and has optional author, book, published_date
  3. Relationship metadata — Well-defined relationships enable better tooling support
  4. Semantic aliases — Aliases (study, article, paper, research, journal) are natural and intuitive
  5. Validation rules — Two focused rules remain:
    • synthesis-references-sources: Ensures syntheses cite sources
    • concept-has-summary: Encourages concept discoverability

Schema Philosophy:
The flat entity model with wikilinks (no hierarchy) is well-suited for LLM-maintained knowledge bases. The schema validates structure while allowing arbitrary cross-referencing — exactly the right balance.


Recommendations

All requested improvements have been implemented. The schema is production-ready.

Optional enhancements (not required):

  • Consider adding accessed_date to source type for web sources that may change
  • Consider adding read_status or last_read_date for tracking reading progress
  • These can always be added later via additionalProperties: true if needed

Summary

  • All review comments have been addressed
  • Schema follows project conventions
  • Validation is appropriate and not overly restrictive
  • Relationship metadata enables better tooling
  • Aliases are semantic and useful

The knowledge wiki schema is well-designed and ready for use. Nice work on the iterative improvements!

@mindsocket mindsocket marked this pull request as ready for review April 5, 2026 11:27
@claude

claude Bot commented Apr 5, 2026

Claude finished @mindsocket's task in 50s


Code Review: knowledge_wiki.json

Task Checklist:

  • Read and analyze the knowledge wiki schema
  • Verify all review comments have been addressed
  • Check for bugs and security issues
  • Review best practices and coding standards
  • Verify alignment with project conventions
  • Provide detailed review feedback

✅ All Review Comments Addressed

I've verified that all 11 review comments from the previous review have been correctly implemented:

  1. "source" as canonical type name (line 64) — source_summary is now an alias mapping to source
  2. "note" as canonical type (line 182) — No "personal" alias exists
  3. "framework" and "model" aliases removed — These over-specific aliases have been removed
  4. URL required in schema (line 95) — "required": ["type", "url"] ensures provenance
  5. local_copy uses wikilink def (line 70) — Correctly references ost-tools://_ost_tools_base#/$defs/wikilink
  6. Relationship metadata added (lines 15-36) — Three relationships defined:
    • source → source (via sources field on child)
    • source → synthesis (via sources field on child)
    • concept → synthesis (via concepts field on child)
  7. Synthesis has separate sources/concepts (lines 150-159) — Distinct arrays instead of generic "related"
  8. "note" renamed from "personal" — Type is now note with clear description
  9. Chat transcript aliases removed — No chat_transcript or chat-transcript aliases
  10. local_copy property added (lines 69-72) — Optional wikilink to local copy
  11. source-has-origin rule removed — URL requirement handles this validation

✅ Schema Quality Assessment

Structure & Conventions:

  • Correctly uses $schema, $id, and $metadata fields following project patterns
  • Properly references base definitions via $ref: ost-tools://_ost_tools_base#/$defs/baseNodeProps and ostEntityProps
  • Uses oneOf pattern for multi-type schemas matching established patterns
  • Appropriately omits $metadata.hierarchy since this is a flat entity model with wikilinks

Type Design:

  • Five well-defined node types with clear purposes:
    • source: External source material with required URL and optional provenance tracking
    • concept: Domain knowledge building blocks (summary recommended by rule)
    • synthesis: High-value integrated insights (must reference sources by rule)
    • note: Personal notes and journal entries
    • index: Wiki catalog/navigation page

Validation Rules:
Two focused rules remain (lines 38-53):

  1. synthesis-references-sources: Ensures syntheses cite at least one source
  2. concept-has-summary: Encourages concept discoverability via summaries

Both use correct JSONata syntax and enforce meaningful constraints.

Aliases:
The aliases mapping (lines 7-13) is now semantic and appropriate:

  • study, article, paper, research, journal → source
  • journal → note (allows for "journal entries")

✅ Test Results

All 340 unit tests pass successfully, confirming the schema integrates well with the existing codebase.


📝 Observations

Intentional Design Choices:

  • The flat entity model (no hierarchy) is appropriate for a knowledge wiki where arbitrary cross-referencing via wikilinks is a first-class feature
  • source_type field (line 85-88) is intentionally free-form to allow evolving vocabulary
  • additionalProperties: true throughout maintains flexibility for LLM-maintained content

Relationship Metadata:
The three relationship definitions (lines 15-36) enable better tooling support by explicitly declaring how entity types relate:

  • Parent-child relationships for sources and concepts flowing into syntheses
  • Source-to-source relationships for reference chaining
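
To illustrate how tooling might use these declarations, the sketch below derives parent-to-child edges from the `sources` and `concepts` fields. It is a guess at the idea rather than the ost-tools implementation: the `Page`, `Relationship`, and `Edge` shapes and the `edgesFor` function are hypothetical, and link targets are assumed to be plain page names.

```typescript
// Hypothetical sketch: derives parent -> child edges from the link fields
// named by the three relationships listed in this review.
type LinkField = "sources" | "concepts";

interface Page {
  name: string;
  type: string;
  sources?: string[];  // assumed: names of linked source pages
  concepts?: string[]; // assumed: names of linked concept pages
}

interface Relationship { parent: string; child: string; field: LinkField }
interface Edge { parent: string; child: string; field: LinkField }

// The three relationships described in this review.
const relationships: Relationship[] = [
  { parent: "source", child: "source", field: "sources" },
  { parent: "source", child: "synthesis", field: "sources" },
  { parent: "concept", child: "synthesis", field: "concepts" },
];

function edgesFor(pages: Page[]): Edge[] {
  const typeByName = new Map<string, string>();
  for (const page of pages) typeByName.set(page.name, page.type);

  const edges: Edge[] = [];
  for (const page of pages) {
    for (const rel of relationships) {
      if (rel.child !== page.type) continue;
      for (const target of page[rel.field] ?? []) {
        // Keep only links whose target has the declared parent type.
        if (typeByName.get(target) === rel.parent) {
          edges.push({ parent: target, child: page.name, field: rel.field });
        }
      }
    }
  }
  return edges;
}
```

Because the link lives on the child page, a synthesis that lists a source and a concept produces two edges pointing into it, one per field.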

🔒 Security & Safety

  • No security concerns identified
  • No injection vulnerabilities
  • Input validation relies on JSON Schema validation (safe)
  • URL requirement ensures source provenance

Summary

  • All review comments addressed
  • Schema follows project conventions
  • No bugs or security issues
  • All 340 unit tests pass
  • Relationship metadata enables better tooling
  • Aliases are semantic and appropriate

Recommendation: ✅ Approve

The knowledge wiki schema is production-ready. The iterative improvements have addressed all feedback while maintaining the schema's core philosophy of flexible, LLM-maintained knowledge bases with validated structure and provenance.


@mindsocket mindsocket merged commit 6c509a8 into main Apr 5, 2026
3 checks passed
@mindsocket mindsocket deleted the feat/knowledge-wiki branch April 5, 2026 11:27