Remove BYTEA column, add Conversation links to index.md #33

Merged. efenocchi merged 18 commits into main from remove-bytea-add-conv-column on Apr 10, 2026.

efenocchi (Collaborator) commented Apr 10, 2026

Summary

Remove BYTEA content column

All file content is text (markdown, JSONL). The BYTEA content column duplicated the summary TEXT column with unnecessary hex encoding overhead.

  • Remove content: Buffer from WriteRow and PendingRow interfaces
  • Remove hex encoding from all write paths (upsert, flush, appendFile, session-start, wiki-worker)
  • Remove isText(), decodeContent() helpers and binary mime types
  • Prefetch and read paths use SELECT summary only
  • CREATE TABLE no longer includes content BYTEA column
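
To make the overhead concrete: hex encoding emits two characters per byte, so a hex copy of a text file is twice its UTF-8 byte length. The sketch below only illustrates that ratio. `toHex` is a hypothetical stand-in for the removed encoding step (the removed code used `Buffer.from(...).toString("hex")`), and the sample text is made up.

```typescript
// Hypothetical stand-in for the hex-encoding step this PR removes.
function toHex(text: string): string {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let hex = "";
  bytes.forEach((b) => {
    // Two hex digits per byte, zero-padded.
    hex += (b < 16 ? "0" : "") + b.toString(16);
  });
  return hex;
}

// Made-up sample content; any text behaves the same way.
const summary = "# Notes\n- text only\n";
const utf8Bytes = new TextEncoder().encode(summary).length;
const hexChars = toHex(summary).length;

console.log(utf8Bytes, hexChars);
```

Storing only the TEXT column avoids keeping this doubled copy alongside the original.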

Add Conversation column to virtual index.md

Adds a direct [messages] link to each session's JSONL file in the index, so Claude can navigate from the index to the raw conversation data in one step instead of: index → summary → find the Source line → construct the JSONL path.
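
As an illustration of the one-step navigation, here is a sketch of how an index row might carry the link. The `IndexEntry` shape, the Summary column, and the example paths are assumptions; the PR only establishes that index.md gains a Conversation column holding a [messages] link.

```typescript
// Hypothetical shape of one index entry; all field names are assumptions.
interface IndexEntry {
  title: string;
  summaryPath: string; // path to the summary markdown
  sessionPath: string; // path to the raw session JSONL
}

// Render one markdown table row with a direct [messages] link.
function renderIndexRow(e: IndexEntry): string {
  return `| [${e.title}](${e.summaryPath}) | [messages](${e.sessionPath}) |`;
}

// Render the whole table, including the Conversation column header.
function renderIndex(entries: IndexEntry[]): string {
  const header = ["| Summary | Conversation |", "| --- | --- |"];
  return header.concat(entries.map(renderIndexRow)).join("\n");
}

const index = renderIndex([
  {
    title: "example",
    summaryPath: "/summaries/example.md", // made-up path
    sessionPath: "/sessions/example.jsonl", // made-up path
  },
]);
console.log(index);
```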

Session files are read-only

All mutating VFS operations (write, append, rm, cp-to, mv) on session JSONL paths now throw EPERM. Previously these either silently created ghost rows in the memory table or faked a delete that reverted on next bootstrap. Reads and cp-from session paths still work.
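
A minimal sketch of the guard pattern, assuming an in-memory store. `SketchFs` and the body of `fsErr` are hypothetical; only the `fsErr("EPERM", "session files are read-only", p)` call shape and the `sessionPaths` set appear in this PR.

```typescript
// Hypothetical error helper: the PR calls fsErr(code, message, path)
// but does not show its implementation.
function fsErr(code: string, message: string, path: string): Error & { code: string } {
  return Object.assign(new Error(`${code}: ${message}, '${path}'`), { code });
}

// Toy file system; the real one is backed by SQL tables.
class SketchFs {
  private files = new Map<string, string>();
  private sessionPaths: Set<string>;

  constructor(sessionPaths: Set<string>) {
    this.sessionPaths = sessionPaths;
  }

  // Shared guard: every mutating operation calls this first.
  private assertWritable(p: string): void {
    if (this.sessionPaths.has(p)) throw fsErr("EPERM", "session files are read-only", p);
  }

  writeFile(p: string, data: string): void {
    this.assertWritable(p);
    this.files.set(p, data);
  }

  appendFile(p: string, data: string): void {
    this.assertWritable(p);
    this.files.set(p, (this.files.get(p) ?? "") + data);
  }

  rm(p: string): void {
    this.assertWritable(p);
    this.files.delete(p);
  }

  // Reads are never guarded.
  readFile(p: string): string | undefined {
    return this.files.get(p);
  }
}

const vfs = new SketchFs(new Set(["/sessions/a.jsonl"]));
vfs.writeFile("/notes.md", "hello"); // normal paths are unaffected

let code = "";
try {
  vfs.appendFile("/sessions/a.jsonl", "{}\n");
} catch (e) {
  code = (e as { code: string }).code;
}
console.log(code);
```

Failing early keeps the write from ever reaching the wrong table, which is what produced the ghost rows.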

Cleanup

  • Remove stale compiled .js/.d.ts files from src/ that caused vitest to resolve outdated code
  • Fix test import paths

Test plan

  • 386 unit tests passing (vitest)
  • E2E real-table-test.mjs: validates text-only schema against live API (33 assertions)
  • E2E claude -p --plugin-dir: write, read index.md with Conversation column, follow messages link
  • E2E virtual shell: all 6 mutating ops on session paths return EPERM, reads work, normal ops unaffected
  • E2E curl verification: writes/deletes hit actual SQL table, no ghost rows

kaghni and others added 12 commits April 9, 2026 14:21
- Move CC config (hooks, commands, skills, bundle, build) into claude-code/
- Keep shared core in src/ (deeplake-api, config, utils, hooks, shell, commands)
- Add openclaw/ plugin with its own source and build
- Root .claude-plugin/marketplace.json points to ./claude-code
- Plugin renamed: deeplake-hivemind → hivemind
- Both plugins build independently, both bundle shared core

# Conflicts:
#	.claude-plugin/marketplace.json
#	.claude-plugin/plugin.json
#	README.md
#	claude-code/.claude-plugin/plugin.json
#	claude-code/commands/update.md
#	claude-code/skills/deeplake-memory/SKILL.md
#	claude-code/skills/hivemind-memory/SKILL.md
#	openclaw/openclaw.plugin.json
#	openclaw/skills/SKILL.md
#	package.json
#	skills/deeplake-memory/SKILL.md

All file content is text (markdown, JSONL). The BYTEA content column
duplicated the summary column with unnecessary hex encoding overhead.

- Remove content: Buffer from WriteRow and PendingRow interfaces
- Remove hex encoding from upsert, flush, appendFile, session-start,
  wiki-worker
- Remove isText(), decodeContent() helpers and binary mime types
- Prefetch uses SELECT summary instead of SELECT summary, content
- CREATE TABLE no longer includes content BYTEA column

- Prefetch tests: SELECT path, summary (no content column)
- Virtual index tests: expect Conversation column header
- Prefetch mock: return {path, summary} without content

These were generated by tsc into src/ instead of dist/ and caused
vitest to resolve stale code instead of the TypeScript source.
claude bot commented Apr 10, 2026

Claude finished @efenocchi's task in 3m 36s.


2 issues flagged inline:

  • appendFile silently discards data on session paths (src/shell/deeplake-fs.ts:470) — session JSONL paths are bootstrapped into this.files (as null) but live in sessionsTable, not this.table. The fast-path UPDATE therefore matches 0 rows and silently drops the append. Reachable via the virtual shell (cat >> sessions/...).

  • No migration to drop legacy content BYTEA column (src/deeplake-api.ts:223) — ensureTable() adds missing columns for existing tables but never removes the old content column, leaving a permanently-empty BYTEA column on upgraded installs.

if (this.files.has(p) || await this.exists(p).catch(() => false)) {
const addHex = Buffer.from(add, "utf-8").toString("hex");
const ts = new Date().toISOString();
await this.client.query(

Session-backed paths (sessionPaths) are in sessionsTable, not this.table, so this UPDATE always matches 0 rows and silently discards the appended content — any appendFile call on a sessions JSONL path (e.g. via the virtual shell cat >>) is a no-op.

Add a guard at the top of appendFile to reject or no-op for session paths:

Suggested change
- await this.client.query(
+ const ts = new Date().toISOString();
+ if (this.sessionPaths.has(p)) throw fsErr("EPERM", "session files are read-only", p);
+ await this.client.query(

  `summary TEXT NOT NULL DEFAULT '', ` +
  `author TEXT NOT NULL DEFAULT '', ` +
- `mime_type TEXT NOT NULL DEFAULT 'application/octet-stream', ` +
+ `mime_type TEXT NOT NULL DEFAULT 'text/plain', ` +

The ensureTable migration path (the else branch below) adds missing columns for existing tables but never drops the old content BYTEA column — upgrading users keep a permanently-empty binary column, undermining the storage savings this PR aims for.

Add a best-effort DROP COLUMN alongside the existing ADD COLUMN loop in the migration block:

Suggested change
`mime_type TEXT NOT NULL DEFAULT 'text/plain', ` +
`mime_type TEXT NOT NULL DEFAULT 'text/plain', ` +

(Suggestion is a no-op here. The actual fix belongs in the else branch at the for loop: add try { await this.query(`ALTER TABLE "${tbl}" DROP COLUMN content`); } catch {} after the loop.)
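
For illustration, the add-then-drop migration could be planned as in the sketch below. It rests on several assumptions: `planMigration`, its parameters, the `memory` table name, and the ADD COLUMN statement form are hypothetical, and the real `ensureTable()` issues queries directly rather than building a list. Only the `ALTER TABLE ... DROP COLUMN content` statement comes from the review comment.

```typescript
// Hypothetical planner: returns the ALTER statements a migration would run.
function planMigration(
  tbl: string,
  existing: string[], // columns already present on the table
  wanted: Record<string, string>, // column name -> SQL type and default (assumed form)
): string[] {
  const have = new Set(existing);
  const stmts: string[] = [];

  // Add any column the current schema expects but the table lacks.
  for (const name of Object.keys(wanted)) {
    if (!have.has(name)) {
      stmts.push(`ALTER TABLE "${tbl}" ADD COLUMN ${name} ${wanted[name]}`);
    }
  }

  // Drop the legacy column last, and only when it is actually present.
  if (have.has("content")) {
    stmts.push(`ALTER TABLE "${tbl}" DROP COLUMN content`);
  }
  return stmts;
}

const stmts = planMigration(
  "memory", // placeholder table name
  ["path", "summary", "content"],
  { summary: "TEXT NOT NULL DEFAULT ''", author: "TEXT NOT NULL DEFAULT ''" },
);
console.log(stmts.join("\n"));
```

Checking for the column first is an alternative to the suggested try/catch; either way the drop stays best-effort and is skipped on tables created after this change.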

Session JSONL files live in the sessions table (multi-row), not the
memory table. An appendFile UPDATE against the memory table silently
matched 0 rows. Now throws EPERM early.

Session JSONL files are read-only through the VFS: they live in the
sessions table and are written only by the capture hook.

Previously writeFile, rm, cp (to session dest), and mv on session
paths either silently created ghost rows in the memory table or
faked a delete that reverted on next bootstrap.

Now all mutating operations throw EPERM early. Reads and cp-from
session source still work.
efenocchi (Collaborator, Author) commented

/claude-review

claude bot commented Apr 10, 2026

Claude finished @efenocchi's task in 4m 21s.


Review

  • rm -rf on directory containing session paths silently corrupts in-memory tree (src/shell/deeplake-fs.ts:586) — the EPERM guard only fires for the top-level path; when removing a parent directory (e.g. /sessions), the recursive walk collects child session JSONL paths into toDelete and removeFromTree evicts them from the in-memory index. The actual rows in sessionsTable survive, but the FS becomes blind to them for the rest of the session. Fix: filter this.sessionPaths out of toDelete before the removeFromTree loop and the SQL DELETE.
  • No migration to drop legacy content BYTEA column on existing tables (src/deeplake-api.ts, ensureTable else branch): the ADD-COLUMN migration loop adds missing columns but never drops content, so upgraded installs keep a permanently-empty BYTEA column, defeating the storage savings. Add try { await this.query(`ALTER TABLE "${tbl}" DROP COLUMN content`); } catch {} after the loop.

efenocchi (Collaborator, Author) commented

@claude review again


async rm(path: string, opts?: RmOptions): Promise<void> {
const p = normPath(path);
if (this.sessionPaths.has(p)) throw fsErr("EPERM", "session files are read-only", p);

rm -rf /sessions (or any parent directory) bypasses this guard: the recursive directory walk at lines 598–605 collects session JSONL paths into toDelete and then removeFromTree silently evicts them from the in-memory index, making them invisible for the rest of the session even though the actual rows in sessionsTable are untouched.

Filter session paths out of toDelete before removing from tree and issuing the SQL DELETE:

Suggested change
if (this.sessionPaths.has(p)) throw fsErr("EPERM", "session files are read-only", p);
if (this.sessionPaths.has(p)) throw fsErr("EPERM", "session files are read-only", p);

Then in the for (const fp of toDelete) loop (line ~606), replace with:

const safeToDelete = toDelete.filter(fp => !this.sessionPaths.has(fp));
for (const fp of safeToDelete) this.removeFromTree(fp);

and use safeToDelete in the SQL DELETE … WHERE path IN (…) block instead of toDelete.
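
The filtering step can be sketched on its own as below. `pathsSafeToDelete` and the sample paths are hypothetical; the real `rm` walks an in-memory tree rather than a flat list, but the `filter` against `sessionPaths` is the same idea as the suggestion above.

```typescript
// Collect every file under `dir`, then drop session-backed paths so they
// are neither evicted from the in-memory tree nor sent to the SQL DELETE.
function pathsSafeToDelete(
  allFiles: string[],
  dir: string,
  sessionPaths: Set<string>,
): string[] {
  const prefix = dir.endsWith("/") ? dir : dir + "/";
  const toDelete = allFiles.filter((fp) => fp.startsWith(prefix));
  return toDelete.filter((fp) => !sessionPaths.has(fp));
}

// Made-up paths for illustration.
const files = ["/sessions/a.jsonl", "/sessions/notes.md", "/other.md"];
const sessions = new Set(["/sessions/a.jsonl"]);

const safe = pathsSafeToDelete(files, "/sessions", sessions);
console.log(safe);
```

Here only `/sessions/notes.md` survives the filter: the session JSONL is skipped, and `/other.md` is outside the directory being removed.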

Recursive directory delete collected session files into toDelete and
removed them from the in-memory tree, making them invisible until
restart. Now filters session paths out of the delete list.

Verified via E2E: rm -rf /sessions/ leaves all session files intact.

- Restore 0.6.7 version from main (merge took older 0.6.6)
- Restore login.md welcome message from main
- Remove .claude/settings.json (local config, not for repo)
- Add .claude/ to .gitignore
@efenocchi efenocchi merged commit 567f958 into main Apr 10, 2026
2 checks passed