Merge pull request #52 from hotdata-dev/feat/context-cli

eddietejeda · web-flow · commit d50982fa21f8 · 2026-04-22T18:30:38.000-07:00
feat: workspace context CLI and API-first data model docs
diff --git a/README.md b/README.md
@@ -65,6 +65,7 @@ API key priority (lowest to highest): config file → `HOTDATA_API_KEY` env var
 | `connections` | `list`, `create`, `refresh`, `new` | Manage connections |
 | `tables` | `list` | List tables and columns |
 | `datasets` | `list`, `create` | Manage uploaded datasets |
+| `context` | `list`, `show`, `pull`, `push` | Manage named Markdown workspace context (e.g. the `DATAMODEL` data model) via the context API |
 | `query` | | Execute a SQL query |
 | `queries` | `list` | Inspect query run history |
 | `search` | | Full-text search across a table column |
@@ -147,6 +148,22 @@ hotdata datasets create --url "https://example.com/data.parquet" --label "My Dat
 - Format is auto-detected from file extension or content.
 - Piped stdin is supported: `cat data.csv | hotdata datasets create --label "My Dataset"`
 
+## Workspace context
+
+Named Markdown documents for a workspace (data model, glossary, etc.) are stored on the server through the **context API**. The CLI treats the server as the **source of truth**; local files are used only by `pull` and `push`, which need a path on disk.
+
+```sh
+hotdata context list [-w <id>] [--prefix <stem>] [-o table|json|yaml]
+hotdata context show <name> [-w <id>]
+hotdata context pull <name> [-w <id>] [--force] [--dry-run]
+hotdata context push <name> [-w <id>] [--dry-run]
+```
+
+- **`show`** prints the Markdown body to stdout; no local file is needed. Use it to read the workspace data model from scripts or agents.
+- **`pull`** downloads the named context from the API to `./<name>.md` in the **current directory**. It refuses to overwrite an existing file unless `--force` is given; `--dry-run` prints the target path and size without writing.
+- **`push`** reads `./<name>.md` from the current directory and upserts it under that name in the workspace. Run it after editing the file in your project directory; `--dry-run` prints the size without uploading.
+- Names follow SQL identifier rules: ASCII letters, digits, and underscore only (no dots), at most 128 characters, and no SQL reserved words. The conventional stem for the semantic data model is **`DATAMODEL`**; the local file **`DATAMODEL.md`** is used only by push/pull.
+
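The `pull` rules above (write `./<name>.md` in the current directory, refuse to overwrite without `--force`, report only path and size under `--dry-run`) can be sketched in Rust, the CLI's implementation language. This is a minimal illustration, not the code in `src/context.rs`: `plan_pull` and `PullPlan` are hypothetical names, the body is assumed to have been fetched from the context API already, and "size" is assumed to mean a character count.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical outcome of a `pull`; not the CLI's real type.
#[derive(Debug, PartialEq)]
pub enum PullPlan {
    /// `--dry-run`: report where the file would go and how large it is.
    DryRun { path: PathBuf, chars: usize },
    /// Safe to write the fetched body to `path`.
    Write { path: PathBuf },
}

/// Decide what `hotdata context pull <name>` should do in `dir`.
/// Assumes `body` was already downloaded from the context API.
pub fn plan_pull(
    dir: &Path,
    name: &str,
    body: &str,
    force: bool,
    dry_run: bool,
) -> Result<PullPlan, String> {
    // The local file is always `<name>.md` in the given directory.
    let path = dir.join(format!("{name}.md"));
    if dry_run {
        // Dry run only reports; it never writes and never fails on an existing file.
        return Ok(PullPlan::DryRun { path, chars: body.chars().count() });
    }
    if path.exists() && !force {
        return Err(format!("{} already exists; pass --force to overwrite", path.display()));
    }
    Ok(PullPlan::Write { path })
}
```

Separating the decision from the actual write keeps the overwrite guard testable without touching the network.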
 ## Query
 
 ```sh
diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: hotdata
-description: Use this skill when the user wants to run hotdata CLI commands, query the Hotdata API, list workspaces, list connections, create connections, list tables, manage datasets, execute SQL queries, inspect query run history, search tables, manage indexes, manage sandboxes, or interact with the hotdata service. Activate when the user says "run hotdata", "query hotdata", "list workspaces", "list connections", "create a connection", "list tables", "list datasets", "create a dataset", "upload a dataset", "execute a query", "search a table", "list indexes", "create an index", "list query runs", "list past queries", "query history", "list sandboxes", "create a sandbox", "run a sandbox", or asks you to use the hotdata CLI.
+description: Use this skill when the user wants to run hotdata CLI commands, query the Hotdata API, list workspaces, list connections, create connections, list tables, manage datasets, execute SQL queries, inspect query run history, search tables, manage indexes, manage sandboxes, manage workspace context and the data model via the context API (`hotdata context`), or interact with the hotdata service. Activate when the user says "run hotdata", "query hotdata", "list workspaces", "list connections", "create a connection", "list tables", "list datasets", "create a dataset", "upload a dataset", "execute a query", "search a table", "list indexes", "create an index", "list query runs", "list past queries", "query history", "list sandboxes", "create a sandbox", "run a sandbox", "workspace context", "pull context", "push context", "data model", or asks you to use the hotdata CLI.
 version: 0.1.11
 ---
 
@@ -29,19 +29,33 @@ API URL defaults to `https://api.hotdata.dev/v1` or overridden via `HOTDATA_API_
 
 All commands that accept `--workspace-id` are optional. If omitted, the active workspace is used. Use `hotdata workspaces set` to switch the active workspace interactively, or pass a workspace ID directly: `hotdata workspaces set <workspace_id>`. The active workspace is shown with a `*` marker in `hotdata workspaces list`. **Omit `--workspace-id` unless you need to target a specific workspace.**
 
+## Workspace context (API)
+
+The workspace stores **named Markdown documents** only through the Hotdata **context API** (`/v1/context`). The **authoritative** copy always lives on the server under a **name** (stem) such as `DATAMODEL` or `GLOSSARY`.
+
+**`hotdata context push`** reads **`./<NAME>.md`** from the **current working directory**, and **`pull`** writes that file there. These files are only a **transport surface** for the API, not a second source of truth. **`hotdata context show <name>`** prints Markdown to stdout so agents can read the model **without** any local file. Context names follow SQL table-identifier rules (ASCII letters, digits, underscore; no dot in the API name; max 128 characters; SQL reserved words are not allowed).
+
+**Agents (Claude and similar): treat workspace context as the only store for the data model and shared narrative docs.**
+
+1. **Before** planning queries, explaining schema, or modeling, load the data model with `hotdata context show DATAMODEL` (and run `hotdata context list` to find other stems such as `GLOSSARY`). If no `DATAMODEL` context exists yet, start from [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) and push when ready.
+2. **After** you change the model, persist it with **`hotdata context push DATAMODEL`**. The CLI requires a local `./DATAMODEL.md` for that step: write the body there (from `context show`, the template, or your edits), then run `push` from the project directory.
+3. Use **`hotdata context pull DATAMODEL`** when you intentionally want a local `./DATAMODEL.md` copy (for example a human editor); it still reflects API state, not a parallel document.
+
+The standard stem for the workspace semantic model is **`DATAMODEL`**. Add other stems the same way (e.g. **`GLOSSARY`**) for glossary or runbooks.
+
+Use [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) and [references/MODEL_BUILD.md](references/MODEL_BUILD.md) for **what to write inside** the Markdown you store in context. Never put workspace-specific model text inside agent skill install paths—only in **workspace context** (and transient `./<NAME>.md` for push/pull when needed).
+
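Step 1 can be sketched for an agent harness in Rust. This assumes `hotdata` is on `PATH` and that a missing context makes `hotdata context show` exit with a non-zero status; the text above only says the command errors. `show_context` and `load_model` are hypothetical helpers, and the `program` parameter exists only so the logic can be exercised without the real CLI.

```rust
use std::process::Command;

/// Run `<program> context show <name>` and return its stdout.
/// Returns `None` if the program cannot be started, exits non-zero
/// (assumed to include "no such context yet"), or prints non-UTF-8.
pub fn show_context(program: &str, name: &str) -> Option<String> {
    let out = Command::new(program)
        .args(["context", "show", name])
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    String::from_utf8(out.stdout).ok()
}

/// Load the workspace model, falling back to a template body when it is absent.
pub fn load_model(program: &str, template: &str) -> String {
    show_context(program, "DATAMODEL").unwrap_or_else(|| template.to_string())
}
```

In real use the call would be `load_model("hotdata", template)`; the fallback text is only a starting point and still has to be written to `./DATAMODEL.md` and pushed before the workspace owns it.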
 ## Multi-step workflows (Model, History, Chain, Indexes)
 
 These are **patterns** built from the commands below—not separate CLI subcommands:
 
-- **Model** — Markdown semantic map of your workspace (entities, keys, joins). Refresh using `connections`, `connections refresh`, `tables list`, and `datasets list`. For a **deep** modeling pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md).
+- **Model** — Markdown semantic map of your workspace (entities, keys, joins). **Store and read it via workspace context** (`hotdata context show DATAMODEL`, `context push DATAMODEL`); refresh content using `connections`, `connections refresh`, `tables list`, and `datasets list`. For a **deep** modeling pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md).
 - **History** — Inspect prior activity via `hotdata queries list` (query runs) and `hotdata results list` / `results <id>` (row data).
 - **Chain** — Follow-ups via **`datasets create`** then `query` against `datasets.main.<table>`.
 - **Indexes** — Review SQL and schema, compare to existing indexes, create **sorted**, **bm25**, or **vector** indexes when it clearly helps; see [references/WORKFLOWS.md](references/WORKFLOWS.md#indexes).
 
 Full step-by-step procedures: [references/WORKFLOWS.md](references/WORKFLOWS.md).
 
-**Project-owned files:** Put `DATA_MODEL.md` or `data_model.md` (e.g. under `docs/`) in the **directory where you run `hotdata`**—your repo or project—not under `~/.claude/skills/` or other agent skill paths. Copy the template from [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) to start; use [references/MODEL_BUILD.md](references/MODEL_BUILD.md) when you need the full procedure.
-
 ## Available Commands
 
 ### List Workspaces
@@ -183,6 +197,24 @@ hotdata query "SELECT * FROM datasets.main.my_dataset LIMIT 10"
 ```
 Use `hotdata datasets <dataset_id>` to look up the `table_name` before writing queries.
 
+### Workspace context (named Markdown)
+
+Reads and writes workspace **context API** documents. **`show`** needs no local file; **`push`** / **`pull`** use **`./<NAME>.md`** in the current directory only as the CLI transport format. See [Workspace context (API)](#workspace-context-api).
+
+```
+hotdata context list [-w <workspace_id>] [--prefix <stem>] [-o table|json|yaml]
+hotdata context show <name> [-w <workspace_id>]
+hotdata context pull <name> [-w <workspace_id>] [--force] [--dry-run]
+hotdata context push <name> [-w <workspace_id>] [--dry-run]
+```
+
+- `list` — names, `updated_at`, and character counts for each stored context. Use `--prefix` to narrow names (case-sensitive).
+- `show` — print the Markdown body to **stdout** (use this when there is **no** local `./<NAME>.md`; ideal for agents).
+- `pull` — download context `name` to `./<NAME>.md`. Refuses to overwrite an existing file unless `--force`. `--dry-run` prints target path and size only.
+- `push` — upload `./<NAME>.md` to upsert context `name` on the server. `--dry-run` prints size only. The body must stay within the API size limit (on the order of 512k characters).
+
+**Convention:** `DATAMODEL` is the primary workspace data model; use `GLOSSARY` or other stems for additional narrative context. Names follow the same identifier rules as SQL table names.
+
 ### Execute SQL Query
 ```
 hotdata query "<sql>" [-w <workspace_id>] [--connection <connection_id>] [-o table|json|csv]
@@ -330,12 +362,14 @@ Use a sandbox to explore tables and iteratively build a model description in the
    - check how line_items joins to deals
    - confirm revenue column semantics"
    ```
-5. Continue exploring and update the markdown as the model takes shape. The markdown is the living artifact — when the sandbox ends, its content captures what was learned.
+5. Continue exploring and update the markdown as the model takes shape. The sandbox markdown is the living artifact for **that sandbox**.
+6. When the model should **outlive the sandbox** or be shared with the whole workspace, promote it to workspace context: save the consolidated Markdown as `./DATAMODEL.md` in the project directory and run `hotdata context push DATAMODEL`. If a workspace model already exists, merge with it first: read it with `hotdata context show DATAMODEL`, combine it with your sandbox notes in `./DATAMODEL.md`, then push.
 
 Other commands (not covered in detail above): `hotdata connections new` (interactive connection wizard), `hotdata skills install|status`, `hotdata completions <bash|zsh|fish>`.
 
 ## Workflow: Running a Query
 
+0. (Recommended for agents) Load the workspace data model when available: run `hotdata context show DATAMODEL`. If the command errors because no context exists yet, proceed without a stored model.
 1. List connections:
    ```
    hotdata connections list
diff --git a/skills/hotdata/references/DATA_MODEL.template.md b/skills/hotdata/references/DATA_MODEL.template.md
@@ -1,6 +1,6 @@
 # Data model — `<project name>`
 
-> Copy this file to your **project** directory (e.g. `./DATA_MODEL.md`, `./data_model.md`, or `./docs/DATA_MODEL.md`).  
+> **Storage:** This Markdown structure is kept in **workspace context** under the name **`DATAMODEL`**. Use `hotdata context show DATAMODEL` to read it; maintain `./DATAMODEL.md` in your **project directory** (where you run `hotdata`) only when editing, then `hotdata context push DATAMODEL`. Do not use `docs/DATA_MODEL.md` or other repo paths as the source of truth.  
 > Do not commit workspace-specific content into agent skill folders.  
 > For a **full** build (per-table detail, connector enrichment, index summary), follow [MODEL_BUILD.md](MODEL_BUILD.md) from the installed skill’s `references/` (or this repo’s `skills/hotdata/references/`). Relative links to `MODEL_BUILD.md` below work only while this file lives next to those references; in your project, open that path separately if the link 404s.
 
diff --git a/skills/hotdata/references/MODEL_BUILD.md b/skills/hotdata/references/MODEL_BUILD.md
@@ -1,8 +1,8 @@
 # Building a workspace data model (advanced)
 
-Optional **deep pass** for a single authoritative markdown model. For a short checklist only, use the **Model** section in [WORKFLOWS.md](WORKFLOWS.md) and [DATA_MODEL.template.md](DATA_MODEL.template.md).
+Optional **deep pass** for a single authoritative markdown model stored in **workspace context**. For a short checklist only, use the **Model** section in [WORKFLOWS.md](WORKFLOWS.md) and [DATA_MODEL.template.md](DATA_MODEL.template.md).
 
-**Output:** Save as `DATA_MODEL.md`, `data_model.md`, or `docs/DATA_MODEL.md` in the **project directory** where you run `hotdata` (not inside agent skill folders).
+**Output:** The live document is **`DATAMODEL`** in the context API. Maintain it with `hotdata context show DATAMODEL`, edit `./DATAMODEL.md` in the **project directory** where you run `hotdata`, then **`hotdata context push DATAMODEL`**. Do not use `docs/`, `DATA_MODEL.md`, or other repo-only paths as the system of record. Never store workspace-specific model text inside agent skill folders.
 
 ---
 
@@ -95,7 +95,7 @@ When suggesting a new index, use the same connection/schema/table/column names a
 
 ## 6. Document structure
 
-Start from [DATA_MODEL.template.md](DATA_MODEL.template.md) and extend as needed:
+This Markdown body is what you store under **`DATAMODEL`** (`hotdata context push DATAMODEL`). Start from [DATA_MODEL.template.md](DATA_MODEL.template.md) and extend as needed:
 
 - **Overview** — Domains and what the workspace is for.
 - **Per connection** — Optional subsection per source; for **deep** models, **repeat** one block per `connection.schema.table` (grain, column table with name/type/nullable/PK-FK/notes, relationships, queryability, caveats)—the template’s single `####` heading is a pattern to copy for each table.
diff --git a/skills/hotdata/references/WORKFLOWS.md b/skills/hotdata/references/WORKFLOWS.md
@@ -2,14 +2,14 @@
 
 Procedures for **Model**, **History**, **Chain**, and **Indexes**. These compose existing `hotdata` commands; they are not separate subcommands.
 
-## Where files live
+## Where things live
 
 | Concept | Location |
 |--------|----------|
-| **Model** | Your **project** root or `docs/` (e.g. `DATA_MODEL.md` / `data_model.md`). Never store workspace-specific model text inside agent skill directories. |
+| **Model** | **Workspace context API** — stem **`DATAMODEL`** (`hotdata context show DATAMODEL`, `context push` / `pull` with `./DATAMODEL.md` in the project cwd only as the CLI file surface). Never store workspace-specific model text inside agent skill directories. |
 | **History** | `hotdata queries list` / `queries <query_run_id>` for query runs (execution history); `hotdata results list` / `results <id>` for row data. |
-| **Chain** | Intermediate tables in **`datasets.main.*`**; document stable ones in the Model file under **Derived tables (Chain)**. |
-| **Indexes** | Recommendations and decisions live in Hotdata (`indexes list` / `indexes create`). Optional project log (e.g. `INDEXES.md`) if you track rationale outside the catalog. |
+| **Chain** | Intermediate tables in **`datasets.main.*`**; document stable chains in **workspace context `DATAMODEL`** under **Derived tables (Chain)**. |
+| **Indexes** | Recommendations and live objects in Hotdata (`indexes list` / `indexes create`). Record rationale in **`DATAMODEL`** (e.g. Search & index summary) or a dedicated context stem if you split concerns. |
 
 ---
 
@@ -19,8 +19,9 @@ Procedures for **Model**, **History**, **Chain**, and **Indexes**. These compose
 
 ### Initialize
 
-1. Copy `references/DATA_MODEL.template.md` from this skill bundle to your project as `DATA_MODEL.md` or `docs/DATA_MODEL.md`.
-2. Fill workspace-specific sections as you discover schema.
+1. Use [DATA_MODEL.template.md](DATA_MODEL.template.md) in this skill bundle as the **structure** for what you store in workspace context.
+2. In the **project directory** where you run `hotdata`, create or refresh `./DATAMODEL.md` (from the template, from `hotdata context show DATAMODEL`, or from `hotdata context pull DATAMODEL`), fill workspace-specific sections as you discover schema, then **`hotdata context push DATAMODEL`** so the workspace owns the document.
+3. Agents that do not keep local files can read the model with `hotdata context show DATAMODEL`; to update it, write `./DATAMODEL.md` and run `hotdata context push DATAMODEL`.
 
 ### Deep model pass (optional)
 
@@ -41,7 +42,7 @@ hotdata datasets list
 hotdata datasets <dataset_id>                # schema detail per dataset
 ```
 
-Use output to update **Connections**, **Tables**, **Columns**, and **Datasets** in the model. Optional: small exploratory queries once names are known:
+Use output to update **Connections**, **Tables**, **Columns**, and **Datasets** in **workspace context `DATAMODEL`** (edit via `./DATAMODEL.md` + `hotdata context push DATAMODEL`, or your editor workflow). Optional: small exploratory queries once names are known:
 
 ```bash
 hotdata query "SELECT * FROM <connection>.<schema>.<table> LIMIT 5"
@@ -107,7 +108,7 @@ Query footers include a `result-id` when applicable—record it for later, or pi
    hotdata query "SELECT * FROM datasets.main.<table_name> WHERE ..."
    ```
 
-**Naming:** Prefer predictable `--table-name` values, e.g. `chain_<topic>_<YYYYMMDD>`, and list long-lived chains in **Model → Derived tables (Chain)**.
+**Naming:** Prefer predictable `--table-name` values, e.g. `chain_<topic>_<YYYYMMDD>`, and list long-lived chains in **DATAMODEL → Derived tables (Chain)** in workspace context.
 
 ---
 
@@ -164,7 +165,7 @@ Large builds: add `--async` and track with **`hotdata jobs list`** / **`hotdata
 
 ### 4. Verify
 
-Re-run representative **`hotdata query`** or **`hotdata search`** workloads. Update **Model → Search & index summary** (if you maintain a data model doc) so future agents know what exists.
+Re-run representative **`hotdata query`** or **`hotdata search`** workloads. Update **DATAMODEL → Search & index summary** in workspace context (`hotdata context push DATAMODEL` after editing `./DATAMODEL.md`) so future agents see what exists.
 
 ### Guardrails
 
diff --git a/src/command.rs b/src/command.rs
@@ -189,6 +189,16 @@ pub enum Commands {
         command: Option<SandboxCommands>,
     },
 
+    /// Manage named workspace Markdown context (`pull`/`push` use `./<NAME>.md` in the current directory)
+    Context {
+        /// Workspace ID (defaults to the active workspace)
+        #[arg(long, short = 'w', global = true)]
+        workspace_id: Option<String>,
+
+        #[command(subcommand)]
+        command: ContextCommands,
+    },
+
     /// Generate shell completions
     Completions {
         /// Shell to generate completions for
@@ -557,6 +567,50 @@ pub enum SandboxCommands {
     },
 }
 
+#[derive(Subcommand)]
+pub enum ContextCommands {
+    /// List named contexts in the workspace
+    List {
+        /// Output format
+        #[arg(long = "output", short = 'o', default_value = "table", value_parser = ["table", "json", "yaml"])]
+        output: String,
+
+        /// Only include names starting with this prefix (case-sensitive)
+        #[arg(long)]
+        prefix: Option<String>,
+    },
+
+    /// Print context content to stdout
+    Show {
+        /// Context name (same rules as a SQL table identifier)
+        name: String,
+    },
+
+    /// Download context from the workspace to ./<NAME>.md
+    Pull {
+        /// Context name
+        name: String,
+
+        /// Overwrite ./<NAME>.md if it already exists
+        #[arg(long)]
+        force: bool,
+
+        /// Print the target path and size only; do not write a file
+        #[arg(long)]
+        dry_run: bool,
+    },
+
+    /// Upload ./<NAME>.md to the workspace as named context
+    Push {
+        /// Context name
+        name: String,
+
+        /// Print the size that would be uploaded; do not POST
+        #[arg(long)]
+        dry_run: bool,
+    },
+}
+
 #[derive(Subcommand)]
 pub enum TablesCommands {
     /// List all tables in a workspace
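The `name` rules documented in this PR (ASCII letters, digits, and underscore; no dot; at most 128 characters; no SQL reserved words) and the `push` body-size limit can be checked client-side before any request is sent. This is a hedged Rust sketch, not the contents of `src/context.rs`: `validate_name`, `check_push_body`, the abbreviated reserved-word sample, and reading "512k" as exactly 512 * 1024 characters are all assumptions. It also does not enforce a leading-letter rule, which SQL identifiers commonly have but the docs do not spell out.

```rust
/// Maximum context-name length stated in the docs.
const MAX_NAME_LEN: usize = 128;
/// Assumed body limit: the docs only say "on the order of 512k characters".
const ASSUMED_MAX_BODY_CHARS: usize = 512 * 1024;
/// Illustrative subset only; the CLI's real reserved-word list is not shown in this PR.
const RESERVED_SAMPLE: &[&str] = &["SELECT", "FROM", "WHERE", "TABLE", "ORDER", "GROUP"];

/// Hypothetical client-side check mirroring the documented name rules.
pub fn validate_name(name: &str) -> Result<(), String> {
    // Only ASCII letters, digits, and underscore (so no dots).
    if !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return Err("name may contain only ASCII letters, digits, and underscore".to_string());
    }
    // The name is pure ASCII here, so byte length equals character count.
    if name.is_empty() || name.len() > MAX_NAME_LEN {
        return Err(format!("name must be 1 to {MAX_NAME_LEN} characters"));
    }
    // SQL keywords are case-insensitive, so compare without regard to case.
    if RESERVED_SAMPLE.iter().any(|w| w.eq_ignore_ascii_case(name)) {
        return Err(format!("{name} is a reserved SQL word"));
    }
    Ok(())
}

/// Hypothetical pre-flight check for `push`: returns the body's character count.
pub fn check_push_body(body: &str) -> Result<usize, String> {
    // Count characters, not bytes, because the limit is stated in characters.
    let chars = body.chars().count();
    if chars > ASSUMED_MAX_BODY_CHARS {
        return Err(format!("body has {chars} characters; assumed limit is {ASSUMED_MAX_BODY_CHARS}"));
    }
    Ok(chars)
}
```

Counting with `chars()` rather than `len()` matters for non-ASCII Markdown, where the byte length can exceed the character count that `list` reports.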
diff --git a/src/context.rs b/src/context.rs
diff --git a/src/main.rs b/src/main.rs