
Commit 5007ca8

docs(search): make auto-embedding flow explicit in the Search section
Reading just the Search section, a user might miss that vector search is auto-embedded end to end: both the column's embeddings (built when the index was created) and the query embedding (computed at search time) come from the same server-configured provider, with matching metric, model, and dimensions. This commit spells that out at the top of the `--type vector` bullet and adds an explicit pointer to raw SQL via `hotdata query` for cases where the user needs a different model than the one the index uses, or has no index at all (the SQL reference covers the underlying distance functions and table UDFs).
1 parent b1bc72e commit 5007ca8

2 files changed

Lines changed: 4 additions & 2 deletions

README.md

Lines changed: 2 additions & 1 deletion
@@ -211,8 +211,9 @@ hotdata search "<query>" --type bm25 --table <connection.schema.table> --column
 hotdata search "<query>" --type vector --table <table> --column <source_text_column> [--limit <n>]
 ```

-- **`--type vector`** runs server-side `vector_distance(col, 'query')`. The server resolves the embedding column, model, dimensions, and metric from the index metadata. Name the **source text column** (e.g. `title`), not the auto-generated `_embedding` column. No `OPENAI_API_KEY` required.
+- **`--type vector`** — pass your query as **plain text**, name the **source text column** (e.g. `title`). The server embeds the query at the same time, using the same provider that auto-embedded the column when the index was built — so distance metric, model, and dimensions all match automatically. No `OPENAI_API_KEY`, no client-side embedding, no need to know about the auto-generated `_embedding` column. Generated SQL: `vector_distance(col, 'query')` server-side.
 - **`--type bm25`** runs `bm25_search(table, col, 'query')` — requires a BM25 index on the column.
+- **No vector index, or want to use a different model than the index?** Skip `hotdata search` and use raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<your_vec>]) FROM ...`). The SQL reference covers the available distance functions and table UDFs.
 - BM25 results sort by score (descending). Vector results sort by distance (ascending).
 - `--select` specifies which columns to return (comma-separated, defaults to all).
 - The previous `--model` flag and stdin-piped-vector path are **removed** — both hardcoded `l2_distance` regardless of the index's actual metric, which silently produced wrong rankings on cosine indexes. For client-side embedding or precomputed-vector workflows, use raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<vec>]) ...`).
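
The last bullet in this hunk says that hardcoding `l2_distance` on a cosine index silently produced wrong rankings. A minimal sketch of why the two metrics can disagree is below. The Python functions only share their names with the SQL functions mentioned in the README and are not the server's implementation; the vectors and document names are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector length, compares direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank(query, docs, distance):
    """Return document names sorted by ascending distance to the query."""
    return sorted(docs, key=lambda name: distance(query, docs[name]))


# Hypothetical, unnormalized embeddings chosen so the metrics disagree.
query = [1.0, 0.0]
docs = {
    "same_direction_long": [10.0, 0.0],       # same direction, far away in L2
    "different_direction_short": [1.0, 1.0],  # 45 degrees off, close in L2
}

l2_order = rank(query, docs, l2_distance)
cosine_order = rank(query, docs, cosine_distance)

print("l2:    ", l2_order)
print("cosine:", cosine_order)
```

With unnormalized vectors the two orderings differ, so ranking with a metric other than the one the index was built for can change which row comes first.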

skills/hotdata/SKILL.md

Lines changed: 2 additions & 1 deletion
@@ -307,8 +307,9 @@ hotdata search "<query>" --type bm25 --table <connection.schema.table> --column
 # Vector similarity search via server-side auto-embed (requires a vector index on the column)
 hotdata search "<query>" --type vector --table <table> --column <source_text_column> [--limit <n>]
 ```
-- **`--type vector`** generates `vector_distance(col, 'text')` server-side. The server resolves the embedding column, model, and metric from the index metadata. Name the **source text column** (e.g. `title`), not the auto-generated `_embedding` column. No client-side embedding, no `OPENAI_API_KEY` required.
+- **`--type vector`** — pass the query as **plain text** and name the **source text column** (e.g. `title`). The server embeds the query at the same time, using the same provider that auto-embedded the column when the index was built — distance metric, model, and dimensions match automatically. No client-side embedding, no `OPENAI_API_KEY` required. Generated SQL: `vector_distance(col, 'text')`.
 - **`--type bm25`** generates `bm25_search(table, col, 'text')` server-side; requires a BM25 index on the column.
+- **No vector index on the column, or want a different embedding model?** `hotdata search` won't help — drop down to raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<vec>]) FROM ...`). See the SQL reference for available distance functions and table UDFs.
 - BM25 results sort by score (descending). Vector results sort by distance (ascending).
 - `--select` specifies which columns to return (comma-separated, defaults to all).
 - Default limit is 10.
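
For the raw-SQL fallback described in this hunk, a client that already holds a precomputed vector has to turn it into the `[<vec>]` literal shown in the example query. The sketch below is one possible way to assemble that SQL text in Python. It follows the `SELECT *, cosine_distance(col, [<vec>]) FROM ...` shape from the diff, but the `AS distance`, `ORDER BY`, and `LIMIT` clauses, the helper names, and the table and column names are assumptions for illustration and have not been checked against the SQL reference.

```python
import math
import re

# Dotted identifiers such as connection.schema.table (hypothetical validation rule).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def vector_literal(vec):
    """Format a sequence of numbers as a bracketed literal like [0.1, 2.0]."""
    values = [float(x) for x in vec]
    if not values or not all(math.isfinite(v) for v in values):
        raise ValueError("vector must be non-empty and contain only finite numbers")
    return "[" + ", ".join(repr(v) for v in values) + "]"


def build_cosine_query(table, column, vec, limit=10):
    """Build the SQL text; running it via `hotdata query` is left to the caller."""
    for name in (table, column):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        f"SELECT *, cosine_distance({column}, {vector_literal(vec)}) AS distance "
        f"FROM {table} ORDER BY distance ASC LIMIT {limit}"
    )


# Hypothetical table and embedding column names.
sql = build_cosine_query("conn.public.articles", "title_embedding", [0.1, 2, -3.5], limit=5)
print(sql)
```

Sorting ascending mirrors the "vector results sort by distance (ascending)" behaviour noted in the list above, and the default of 10 mirrors the documented default limit.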
