diff --git a/BACKLOG.md b/BACKLOG.md index aaa40930..66b88c43 100644 --- a/BACKLOG.md +++ b/BACKLOG.md @@ -88,26 +88,57 @@ The surface check shows `Manifest entries: 70` vs `index.js exports: 66`. This m **Files:** `contracts/type-surface.m8.json`, `index.js` +### B-TYPE-3: Establish test-file wildcard ratchet + +`ts-policy-check.js` only scans `src/`, `bin/`, and `scripts/` by design — test files are excluded. This means `@type {any}` usages across test files are unchecked. Options: +- Add a separate ratchet for test files with a higher threshold +- Add per-file caps to prevent individual test files from growing unbounded +- Document the exclusion as intentional and accept it + +**Files:** `scripts/ts-policy-check.js` + --- ## Feature: Content Attachment -### B-FEAT-1: Implement content attachment (`Atom(p)` payloads on nodes) +### B-FEAT-1: ~~Implement content attachment~~ DONE (v11.5.0) + +Shipped in v11.5.0. See `docs/specs/CONTENT_ATTACHMENT.md` and CHANGELOG. + +### B-FEAT-2: Determinism fuzzer for tree construction + +Content blob tree entries are sorted by filename for deterministic tree OIDs. A property-based test should randomize: +- content blob insertion order in `PatchBuilderV2` +- content OID iteration order in `CheckpointService.createV5()` + +and verify the resulting tree OID is identical regardless of insertion order. This would catch any accidental order-dependence in tree construction. + +**Files:** new test in `test/unit/domain/services/` -Full spec in `docs/specs/CONTENT_ATTACHMENT.md`. Proposal to attach content-addressed blobs to graph nodes as first-class payloads, bridging git-warp's flat key-value properties to the paper's `α(v)` vertex attachment model. +### B-FEAT-3: Reconcile Map vs Record asymmetry in getNodeProps/getEdgeProps -**Core idea:** Store blobs in the Git object store (already a CAS), reference them via a `_content` property on nodes. 
This gets CRDT merge (LWW on the SHA), time-travel (`materialize({ ceiling })`), and observer scoping for free — zero changes to the CRDT model. +`getNodeProps()` returns a `Map` while `getEdgeProps()` returns a plain `Record`. This forces callers to use `.get()` for node props and `[key]` for edge props — an easy source of bugs. Options: +- Both return Map (breaking change for edge prop consumers) +- Both return Record (breaking change for node prop consumers) +- Keep both, document the asymmetry prominently -**Key decisions needed:** +**Files:** `src/domain/warp/query.methods.js`, `src/domain/WarpGraph.js` + +--- + +## Documentation Quality + +### B-DOC-1: Add markdownlint to CI + +Add a markdownlint check to the CI pipeline to catch MD040 (missing code fence language tags) and similar doc issues automatically. Currently these are only caught by CodeRabbit review, which is slow and non-blocking. + +**File:** `.github/workflows/ci.yml` -- **API shape:** Property convention only (zero new API) vs dedicated `patch.attachContent()` / `graph.getContent()` methods vs hybrid. Spec recommends hybrid. -- **Metadata properties:** Whether to store `_content.size`, `_content.mime`, `_content.encoding` at the substrate level or leave to consumers. -- **Dependency:** `git-cas` package or equivalent `git hash-object -w` / `git cat-file blob` via existing plumbing. -- **Edge attachments:** Deferred to v2 unless trivially included. Same mechanism — `_content` property on edges. +### B-DOC-2: Add a code sample linter for markdown files -**Out of scope (future):** Nested WARP attachments (where `α(v)` is itself a full WARP graph, not just an atom), content-level merge, MIME handling, content GC protection. +Syntax-check JS/TS code blocks embedded in markdown files (specs, guides, etc.) to catch issues like duplicate `const` declarations, TDZ errors, and other syntax errors before they reach review. 
Could use `eslint-plugin-markdown` (runs ESLint natively on fenced blocks) or a custom script that extracts code blocks and pipes them through `eslint --stdin`. -**Spec:** `docs/specs/CONTENT_ATTACHMENT.md` +**Files:** new script or CI step, `docs/**/*.md` --- diff --git a/CHANGELOG.md b/CHANGELOG.md index 3a5519a0..288ad659 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,34 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [11.5.0] — 2026-02-20 — Content Attachment (Paper I `Atom(p)`) + +Implements content attachment — the ability to attach content-addressed blobs +to WARP graph nodes and edges as first-class payloads. A blob OID stored as +a `_content` string property gets CRDT merge (LWW), time-travel, and observer +scoping for free — zero changes to JoinReducer, serialization, or the CRDT layer. + +### Added + +- **`CONTENT_PROPERTY_KEY`** constant (`'_content'`) exported from `KeyCodec` and `index.js`. +- **`PatchBuilderV2.attachContent(nodeId, content)`** — writes blob to Git object store, sets `_content` property, tracks OID for GC anchoring. +- **`PatchBuilderV2.attachEdgeContent(from, to, label, content)`** — same for edges. +- **`PatchSession.attachContent()`** / **`attachEdgeContent()`** — async pass-through delegates. +- **`WarpGraph.getContent(nodeId)`** — returns `Buffer | null` from the content blob. +- **`WarpGraph.getContentOid(nodeId)`** — returns hex OID or null. +- **`WarpGraph.getEdgeContent(from, to, label)`** / **`getEdgeContentOid(from, to, label)`** — edge variants. +- **Blob anchoring** — content blob OIDs embedded in patch commit tree as `_content_` entries (self-documenting, unique by construction). Survives `git gc --prune=now`. +- **Type declarations** — all new methods in `index.d.ts`, `type-surface.m8.json`, `consumer.ts`. 
+- **Integration tests** — 11 tests covering single-writer, LWW, time-travel, deletion, Writer API, GC durability, binary round-trip. +- **Unit tests** — 23 tests for PatchBuilderV2 content ops and WarpGraph query methods. +- **ADR-001 Folds** — design document for future recursive attachments (structural zoom portals). Deferred; documents the path from `Atom(p)` to full `α(v) → WARP` recursion. + +### Fixed + +- **Checkpoint content anchoring** — `CheckpointService.createV5()` now scans `state.prop` for `_content` values and embeds the referenced blob OIDs in the checkpoint tree as `_content_` entries. This ensures content survives `git gc` even if patch commits are ever pruned. +- **`GitGraphAdapter.readBlob()`** — Now always returns a real Node `Buffer` (wraps `Uint8Array` from plumbing with `Buffer.from()`). Consumers can call `.toString('utf8')` directly. +- **`observedFrontier` staleness (#43)** — `JoinReducer.join()` now folds the patch's own dot (`{writer, lamport}`) into `observedFrontier`. Previously the frontier only reflected patch context VVs (pre-creation state), lagging by one tick per writer. The graph's `_versionVector` — cloned from `observedFrontier` after materialization — now reflects actual Lamport ticks. + ## [11.4.0] — 2026-02-20 — M8 IRONCLAD Phase 3: Declaration Surface Automation Completes M8 IRONCLAD with automated declaration surface validation and expanded diff --git a/README.md b/README.md index c8877e69..227e2569 100644 --- a/README.md +++ b/README.md @@ -328,6 +328,29 @@ const sha = await (await graph.createPatch()) Each `commit()` creates one Git commit containing all the operations, advances the writer's Lamport clock, and updates the writer's ref via compare-and-swap. +### Content Attachment + +Attach content-addressed blobs to nodes and edges as first-class payloads (Paper I `Atom(p)`). Blobs are stored in the Git object store, referenced by SHA, and inherit CRDT merge, time-travel, and observer scoping automatically. 
+ +```javascript +const patch = await graph.createPatch(); +patch.addNode('adr:0007'); // sync — queues a NodeAdd op +await patch.attachContent('adr:0007', '# ADR 0007\n\nDecision text...'); // async — writes blob +await patch.commit(); + +// Read content back +const buffer = await graph.getContent('adr:0007'); // Buffer | null +const oid = await graph.getContentOid('adr:0007'); // hex SHA or null + +// Edge content works the same way (assumes nodes and edge already exist) +const patch2 = await graph.createPatch(); +await patch2.attachEdgeContent('a', 'b', 'rel', 'edge payload'); +await patch2.commit(); +const edgeBuf = await graph.getEdgeContent('a', 'b', 'rel'); +``` + +Content blobs survive `git gc` — their OIDs are embedded in the patch commit tree and checkpoint tree, keeping them reachable. + ### Writer API For repeated writes, the Writer API is more convenient: @@ -508,7 +531,7 @@ npm run test:matrix # All runtimes in parallel ## When NOT to Use It - **High-throughput transactional workloads.** If you need thousands of writes per second with immediate consistency, use Postgres or Redis. -- **Large binary or blob storage.** Data lives in Git commit messages (default cap 1 MB). Use object storage for images or videos. +- **Large binary or blob storage.** Properties live in Git commit messages; content blobs live in the Git object store. Neither is optimized for large media files. Use object storage for images or videos. - **Sub-millisecond read latency.** Materialization has overhead. Use an in-memory database for real-time gaming physics or HFT. - **Simple key-value storage.** If you don't have relationships or need traversals, a graph database is overkill. - **Non-Git environments.** The value proposition depends on Git infrastructure (push/pull, content-addressing). 
diff --git a/contracts/type-surface.m8.json b/contracts/type-surface.m8.json index 9cdfcc4c..0e2e774d 100644 --- a/contracts/type-surface.m8.json +++ b/contracts/type-surface.m8.json @@ -72,6 +72,34 @@ ], "returns": "Promise | null>" }, + "getContentOid": { + "async": true, + "params": [{ "name": "nodeId", "type": "string" }], + "returns": "Promise<string | null>" + }, + "getContent": { + "async": true, + "params": [{ "name": "nodeId", "type": "string" }], + "returns": "Promise<Buffer | null>" + }, + "getEdgeContentOid": { + "async": true, + "params": [ + { "name": "from", "type": "string" }, + { "name": "to", "type": "string" }, + { "name": "label", "type": "string" } + ], + "returns": "Promise<string | null>" + }, + "getEdgeContent": { + "async": true, + "params": [ + { "name": "from", "type": "string" }, + { "name": "to", "type": "string" }, + { "name": "label", "type": "string" } + ], + "returns": "Promise<Buffer | null>" + }, "neighbors": { "async": true, "params": [ @@ -310,6 +338,8 @@ "removeEdge": { "params": [{ "name": "from", "type": "string" }, { "name": "to", "type": "string" }, { "name": "label", "type": "string" }], "returns": "PatchBuilderV2" }, "setProperty": { "params": [{ "name": "nodeId", "type": "string" }, { "name": "key", "type": "string" }, { "name": "value", "type": "unknown" }], "returns": "PatchBuilderV2" }, "setEdgeProperty": { "params": [{ "name": "from", "type": "string" }, { "name": "to", "type": "string" }, { "name": "label", "type": "string" }, { "name": "key", "type": "string" }, { "name": "value", "type": "unknown" }], "returns": "PatchBuilderV2" }, + "attachContent": { "async": true, "params": [{ "name": "nodeId", "type": "string" }, { "name": "content", "type": "Buffer | string" }], "returns": "Promise" }, + "attachEdgeContent": { "async": true, "params": [{ "name": "from", "type": "string" }, { "name": "to", "type": "string" }, { "name": "label", "type": "string" }, { "name": "content", "type": "Buffer | string" }], "returns": "Promise" }, "build": { "params": [], "returns": "PatchV2" }, 
"commit": { "async": true, "params": [], "returns": "Promise" } }, @@ -326,6 +356,8 @@ "removeEdge": { "params": [{ "name": "from", "type": "string" }, { "name": "to", "type": "string" }, { "name": "label", "type": "string" }], "returns": "this" }, "setProperty": { "params": [{ "name": "nodeId", "type": "string" }, { "name": "key", "type": "string" }, { "name": "value", "type": "unknown" }], "returns": "this" }, "setEdgeProperty": { "params": [{ "name": "from", "type": "string" }, { "name": "to", "type": "string" }, { "name": "label", "type": "string" }, { "name": "key", "type": "string" }, { "name": "value", "type": "unknown" }], "returns": "this" }, + "attachContent": { "async": true, "params": [{ "name": "nodeId", "type": "string" }, { "name": "content", "type": "Buffer | string" }], "returns": "Promise" }, + "attachEdgeContent": { "async": true, "params": [{ "name": "from", "type": "string" }, { "name": "to", "type": "string" }, { "name": "label", "type": "string" }, { "name": "content", "type": "Buffer | string" }], "returns": "Promise" }, "build": { "params": [], "returns": "PatchV2" }, "commit": { "async": true, "params": [], "returns": "Promise" } }, @@ -413,6 +445,7 @@ "encodeEdgePropKey": { "kind": "function", "params": [{ "name": "from", "type": "string" }, { "name": "to", "type": "string" }, { "name": "label", "type": "string" }, { "name": "propKey", "type": "string" }], "returns": "string" }, "decodeEdgePropKey": { "kind": "function", "params": [{ "name": "encoded", "type": "string" }], "returns": "{ from: string; to: string; label: string; propKey: string }" }, "isEdgePropKey": { "kind": "function", "params": [{ "name": "key", "type": "string" }], "returns": "boolean" }, + "CONTENT_PROPERTY_KEY": { "kind": "const" }, "computeTranslationCost": { "kind": "function" }, "migrateV4toV5": { "kind": "function" }, diff --git a/docs/ADR-001-Folds.md b/docs/ADR-001-Folds.md new file mode 100644 index 00000000..7871b784 --- /dev/null +++ b/docs/ADR-001-Folds.md 
@@ -0,0 +1,205 @@ +# ADR-001: Folds — Structural Zoom Portals for Recursive Attachments + +- **Date:** 2026-02-20 +- **Status:** Proposed (Deferred) +- **Owner:** @flyingrobots +- **Decision Type:** Architecture / Data Model / Query Semantics + +## Context + +git-warp currently models a WARP graph skeleton (nodes + edges) with a property system that provides: +- multi-writer convergence (CRDT semantics / LWW for props) +- time-travel via materialization ceilings +- observer scoping / visibility rules + +We also want **attachments** per the AIΩN Foundations Paper I lore: +- `α(v)` attaches a WARP value to each vertex +- `β(e)` attaches a WARP value to each edge +- base case is `Atom(p)` (external payload IDs, bytestrings, etc.) +- recursive case allows attachments to be WARP graphs (fractal structure) + +Today, git-mind (and related consumers) are blocked primarily on a minimal “content attachment” primitive: +- store a git blob OID (CAS key) on a node as a normal property (e.g. `_content`) +- read/write blobs using the existing BlobPort +- rely on existing merge/time-travel semantics with no CRDT changes + +This ADR proposes an optional future mechanism called **Folds** to support **structural recursion** (fractal attachments) as a **view/projection** feature first — without conflating it with network causality. + +### Terminology Clarification + +We have two concepts that must not be overloaded: + +- **Wormhole (Causal):** a causal/sync concept (frontiers, receipts, replication topology) +- **Fold (Structural):** a structural boundary / zoom portal for recursive attachments (pure view) + +This ADR is strictly about **Folds**. + +## Decision + +### For now (Immediate): Keep It Silly Simple + +We will ship v1 content attachment using an Atom/CAS technique: + +- Reserve a system property key: + - `CONTENT_PROPERTY_KEY = "_content"` +- Attach content by storing a blob OID (git-cas key / git blob hash) as the property value. 
+- Read content by resolving the stored OID via BlobPort. + +This is `Atom(p)` where `p` is a Git object ID — faithful to the Paper I base case. + +### Proposed (Deferred): Add **Folds** for Structural Recursion + +If/when we pursue true recursive WARP attachments, we will implement **Fold boundaries** as a structural convention: + +- A fold is represented by a deterministic “fold root” node ID in the same graph. +- A skeleton entity (node or edge) has an attachment subgraph rooted at that fold root. +- Traversal/render/query operate at configurable “zoom levels”: + - collapsed (ignore fold interiors) + - shallow (peek one fold deep) + - recursive (expand folds up to max depth) + +Folds are **not causal shortcuts**, do not change synchronization, and do not create new op types. + +## Fold Design + +### 1) Representation + +Folds are encoded using existing nodes/edges/props only. + +#### 1.1 Fold Root IDs (deterministic) + +We define deterministic fold root IDs to avoid collisions: + +- **Node fold root:** `fold:node:` +- **Edge fold root:** `fold:edge:|