From 38206141876e4d7041bcc279678292d0bde049d8 Mon Sep 17 00:00:00 2001
From: Dan Lynch <pyramation@gmail.com>
Date: Thu, 30 Apr 2026 02:20:57 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20update=20README=20for=20documents=20tab?=
 =?UTF-8?q?le=20(91=20=E2=86=92=2095=20tables)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add Documents / CMS row to 'What an agent actually needs' table
- Add documents to architecture diagram
- Add documents to Memory section (junctions, chunked search)
- Add Documents / CMS entry to World model section
- Update junction count (~25 → ~27)
- Update table counts: 91 → 95 everywhere
- Add sample query for documents
- Mention documents in chunked retrieval and auto-embed sections
---
 README.md | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 874faaec2a..fa0b8b3be3 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@ Once deployed, you can ask your agent questions in plain English and it translat
 - *"Find memories from conferences near San Francisco last spring."*
 - *"Remember who I met with at the partner summit last month?"*
 - *"Show me notes where I wrote about RAG architecture."*
+- *"Find all documents from the biz-docs repo tagged `pitch`."*
 - *"Who have I met with more than three times this quarter?"*
 - *"What's the latest status on deals tagged `enterprise`?"*
 
@@ -39,6 +40,7 @@ Conversations, messages, tool calls, long-term memories, rules, skills, prompts,
 | Need for an agent | What agentic-db ships |
 |---|---|
 | Long-term memory | `memories` + agent-scoped `memories` with vector + BM25 + chunked embeddings |
+| Documents / CMS | `documents` with chunked embeddings + BM25, repo/path/commit tracking for Git-backed content; junction tables to companies and projects |
 | Working memory / conversation state | `conversations` + `messages` + `tool_calls` + `tool_results` |
 | Skills / tools registry | `skills`, `tool_definitions`, `tool_executions`, `prompts` (versioned) |
 | Behavior rules | `rules` with semantic `trigger_concept` matching |
@@ -72,6 +74,8 @@ It's all in one database, with vector + BM25 + full-text + trigram + PostGIS sea
 │                                                         │
 │  memories ── autonomy_records ── notes (chunked)        │
 │                                                         │
+│  documents (chunked) ── company_documents ── project_…  │
+│                                                         │
 │  contacts · companies · events · places · emails · …    │
 │                                                         │
 │  [ vector · BM25 · FTS · trigram · PostGIS, unified ]   │
@@ -92,8 +96,9 @@ It's all in one database, with vector + BM25 + full-text + trigram + PostGIS sea
 - **`autonomy_records`** — self-managed knowledge units the agent writes for itself (goals, notes-to-self, learned facts), with self-referential many-to-many links (`autonomy_record_links`) so the agent builds its own knowledge graph.
 - **`notes`** — long-form knowledge with **chunked embeddings**: a single note gets split into multiple vector rows automatically so retrieval works on long documents.
 - **Cross-domain memory junctions** — `contact_memories`, `company_memories` tie memories to the people/orgs they're about, so the agent can pull "everything I remember about Alice" in one query.
+- **Document junctions** — `company_documents`, `project_documents` link version-controlled documents to the companies and projects they relate to.
 - **Agent-attributed memories** — every memory can carry an `agent_id` FK so multi-agent setups get isolated or shared memory.
-- **Chunk-aware search** — `contacts_chunks` and `notes_chunks` let the agent retrieve the *relevant paragraph* of a long record, not the whole record.
+- **Chunk-aware search** — `contacts_chunks`, `notes_chunks`, and `documents_chunks` let the agent retrieve the *relevant paragraph* of a long record, not the whole record.
 - **Tags as first-class citizens** — `citext[]` tag columns on every memory-ish table, GIN-indexed, so filtering by `['hackathon','kris-floyd']` is fast.
 
 ### 💬 Chats / Conversations
@@ -141,21 +146,22 @@ It's all in one database, with vector + BM25 + full-text + trigram + PostGIS sea
   - `memory.nearbyMemories` — self-join, "what else happened near this memory?" (1 km default)
 
   All 5 use `st_dwithin` with a per-query `distance` param, so radius is a query-time input not a schema constant. Each renders server-side as an `EXISTS (… ST_DWithin(geo_a, geo_b, $distance) …)` subquery — zero GeoJSON on the wire.
-- **Chunked long-doc retrieval** on contacts and notes.
+- **Chunked long-doc retrieval** on contacts, notes, and documents.
 
 ### 🌍 World model (context the agent needs to actually help you)
 
 - **CRM**: `contacts` (with denormalized primary email/phone/location + normalized `contact_emails` / `contact_phones` / `contact_addresses` children), `companies`, `deals`, `events`, `venues`, `notes`, `interactions`, `touchpoints`, `tags`, image galleries.
 - **Life-OS**: `goals`, `habits`, `activity_logs`, `memories`, `trips`, `places`.
 - **Projects & expenses**: `projects`, `expenses`, with cross-relations to contacts, trips, tasks.
+- **Documents / CMS**: `documents` stores version-controlled files, tracked by `repo_name`, `file_path`, and `commit_hash` for Git sync, alongside `title`, `content`, `metadata` (jsonb), and `tags`. Chunked embeddings via `documents_chunks` support long-doc RAG. Junction tables: `company_documents`, `project_documents`.
 - **Email & Calendar**: `email_threads`, `emails`, `email_attachments`, `calendars`, `calendar_events`, `calendar_attendees`, `provider_sync_states` (for Gmail / Google Calendar-style provider sync).
 - **Staging tables** (`raw_contacts`, `raw_contact_emails`, etc.) for messy import pipelines before normalizing into `contacts`.
-- **~25 cross-domain M:N junctions** so your agent can answer "notes about Alice from the partner summit" without schema gymnastics.
+- **~27 cross-domain M:N junctions** so your agent can answer "notes about Alice from the partner summit" or "documents linked to this project" without schema gymnastics.
 
 ### ⚙️ Platform / DX
 
 - **One command to deploy**: `pgpm deploy --createdb --database agentic-db --yes --package agentic-db`.
-- **[`@agentic-db/sdk`](sdk/sdk)** — Prisma-like typed ORM generated from the GraphQL schema (covers all 91 tables).
+- **[`@agentic-db/sdk`](sdk/sdk)** — Prisma-like typed ORM generated from the GraphQL schema (covers all 95 tables).
 - **[`@agentic-db/cli`](sdk/cli)** — CRUD + search + admin commands for every table.
 - **[`@agentic-db/rag`](packages/rag)** — hybrid search, batch embedding, multi-pass Q&A CLI tools.
 - **[`@agentic-db/worker`](packages/worker)** — background embedding worker.
@@ -204,7 +210,7 @@ pgpm init workspace
 cd my-app && pgpm init && cd packages/my-module
 pgpm install agentic-db
 
-# Create the database and deploy all 91 tables + indexes + triggers
+# Create the database and deploy all 95 tables + indexes + triggers
 pgpm deploy --createdb --database agentic-db --yes --package agentic-db
 ```
 
@@ -363,7 +369,7 @@ cd packages/worker
 pnpm run start
 ```
 
-The worker generates embeddings for all tables with `SearchUnified` or `SearchVector` nodes. Contacts and notes also get chunked embeddings for long-document search.
+The worker generates embeddings for all tables with `SearchUnified` or `SearchVector` nodes. Contacts, notes, and documents also get chunked embeddings for long-document search.
 
 ## Testing
 
@@ -386,8 +392,8 @@ This repo ships with [Agent Skills](https://github.com/agent-skills/agent-skills
 | Skill | Description |
 |-------|-------------|
 | `pgpm` | Install and deploy agentic-db using pgpm |
-| `cli-default` | CLI command reference for all 91 tables |
-| `orm-default` | Type-safe ORM client reference for all 91 tables |
+| `cli-default` | CLI command reference for all 95 tables |
+| `orm-default` | Type-safe ORM client reference for all 95 tables |
 | `agent/memories` | Storing and retrieving long-term agent memories |
 | `agent/tasks` | Managing agent task queues |
 | `rag-query` | Multi-collection RAG query patterns |