25 commits
89b2484
Pull in code-mode changes from ai-code-mode
jherr Feb 21, 2026
5dbebb7
more refinements
jherr Feb 26, 2026
f1f4baf
Simpler example
jherr Feb 26, 2026
2172f2c
Fewer examples
jherr Feb 26, 2026
798a1bf
Working skills example
jherr Feb 27, 2026
573298a
Home page really coming together
jherr Feb 27, 2026
a1f0859
Home page done
jherr Feb 27, 2026
7859d48
More realistic tools
jherr Feb 27, 2026
2f38fc0
Finished home page
jherr Feb 27, 2026
c87092c
Reworking the demo
jherr Mar 1, 2026
eb24789
final fixups
jherr Mar 8, 2026
708cb57
Merge remote-tracking branch 'origin/main' into code-mode
jherr Mar 10, 2026
bfc9173
last few fixups
jherr Mar 10, 2026
0d172ab
ci: apply automated fixes
autofix-ci[bot] Mar 10, 2026
8a9854d
fix: replace md-to-pdf with puppeteer+marked to fix provenance check
jherr Mar 10, 2026
b73b24f
Merge branch 'code-mode' of github.com:TanStack/ai into code-mode
jherr Mar 11, 2026
920f7a2
ci: apply automated fixes
autofix-ci[bot] Mar 11, 2026
2833b3b
add changeset for code mode packages and revert ai-devtools changes
jherr Mar 11, 2026
5550327
Merge remote-tracking branch 'origin/main' into code-mode
jherr Mar 12, 2026
1b05796
feat: add database tools demo route with in-memory DB
claude Mar 12, 2026
ca69fb5
database demo
jherr Mar 12, 2026
3c4b671
database demo
jherr Mar 12, 2026
6677dd5
feat: add gold-standard judging, export, layout fixes, and nav rename…
jherr Mar 12, 2026
904d81d
ci: apply automated fixes
autofix-ci[bot] Mar 12, 2026
48de0b4
Merge branch 'main' into code-mode
AlemTuzlak Mar 13, 2026
10 changes: 10 additions & 0 deletions .changeset/code-mode-packages.md
@@ -0,0 +1,10 @@
---
'@tanstack/ai': minor
'@tanstack/ai-code-mode': minor
'@tanstack/ai-code-mode-skills': minor
'@tanstack/ai-isolate-cloudflare': minor
'@tanstack/ai-isolate-node': minor
'@tanstack/ai-isolate-quickjs': minor
---

Add code mode and isolate packages for secure AI code execution
3 changes: 2 additions & 1 deletion .gitignore
@@ -60,6 +60,7 @@ test-results

STATUS_*.md

.skills
# Only .claude.settings.json should be committed
.claude/settings.local.json
.claude/worktrees/*
.claude/worktrees/*
326 changes: 326 additions & 0 deletions docs/guides/code-mode-with-skills.md
@@ -0,0 +1,326 @@
---
title: Code Mode with Skills
id: code-mode-with-skills
order: 20
---

Skills extend [Code Mode](./code-mode.md) with a persistent library of reusable TypeScript snippets. When the LLM writes a useful piece of code — say, a function that fetches and ranks NPM packages — it can save that code as a _skill_. On future requests, relevant skills are loaded from storage and made available as first-class tools the LLM can call without rewriting the logic.

## Overview

The skills system has two integration paths:

| Approach | Entry point | Skill selection | Best for |
|----------|-------------|----------------|----------|
| **High-level** | `codeModeWithSkills()` | Automatic (LLM-based) | New projects, turnkey setup |
| **Manual** | Individual functions (`skillsToTools`, `createSkillManagementTools`, etc.) | You decide which skills to load | Full control, existing setups |

Both paths share the same storage, trust, and execution primitives — they differ only in how skills are selected and assembled.

## How It Works

A request with skills enabled goes through these stages:

```
┌───────────────────────────────────────────────────┐
│ 1. Load skill index (metadata only, no code)      │
├───────────────────────────────────────────────────┤
│ 2. Select relevant skills (LLM call — fast model) │
├───────────────────────────────────────────────────┤
│ 3. Build tool registry                            │
│    ├── execute_typescript (Code Mode sandbox)     │
│    ├── search_skills / get_skill / register_skill │
│    └── skill tools (one per selected skill)       │
├───────────────────────────────────────────────────┤
│ 4. Generate system prompt                         │
│    ├── Code Mode type stubs                       │
│    └── Skill library documentation                │
├───────────────────────────────────────────────────┤
│ 5. Main chat() call (strong model)                │
│    ├── Can call skill tools directly              │
│    ├── Can write code via execute_typescript      │
│    └── Can register new skills for future use     │
└───────────────────────────────────────────────────┘
```

### LLM calls

There are **two** LLM interactions per request when using the high-level API:

1. **Skill selection** (`selectRelevantSkills`) — A single chat call using the adapter you provide. It sends the last 5 conversation messages plus a catalog of skill names/descriptions, and asks the model to return a JSON array of relevant skill names. This should be a cheap/fast model (e.g., `gpt-4o-mini`, `claude-haiku-4-5`).

2. **Main chat** — The primary `chat()` call with your full model. This is where the LLM reasons, calls tools, writes code, and registers skills.

The selection call is lightweight — it only sees skill metadata (names, descriptions, usage hints), not full code. If there are no skills in storage or no messages, it short-circuits and skips the LLM call entirely.
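The catalog-plus-short-circuit behavior can be pictured with a small sketch. This is illustrative only, not the library's implementation; the `SkillMeta` shape and `buildSelectionCatalog` helper are assumptions for the example:

```typescript
// Illustrative sketch of the selection step: build the lightweight catalog
// the fast model sees, and short-circuit when there is nothing to select.
// Not the library's actual code; types and names are hypothetical.
interface SkillMeta {
  name: string
  description: string
  usageHints?: Array<string>
}

function buildSelectionCatalog(
  index: Array<SkillMeta>,
  messageCount: number,
): string | null {
  // Short-circuit: no skills or no messages means no LLM call at all.
  if (index.length === 0 || messageCount === 0) return null
  return index
    .map((s) => {
      const hint = s.usageHints?.[0] ? ` (hint: ${s.usageHints[0]})` : ''
      return `- ${s.name}: ${s.description}${hint}`
    })
    .join('\n')
}

const catalog = buildSelectionCatalog(
  [
    { name: 'fetch_github_stats', description: 'Fetch repo star counts' },
    {
      name: 'compare_npm_packages',
      description: 'Rank NPM packages',
      usageHints: ['Use for package comparisons'],
    },
  ],
  5,
)
```

Only this catalog string (never skill code) would be sent to the selection model, which keeps the call cheap.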

## High-Level API: `codeModeWithSkills()`

```typescript
import { chat, maxIterations } from '@tanstack/ai'
import { createNodeIsolateDriver } from '@tanstack/ai-isolate-node'
import { codeModeWithSkills } from '@tanstack/ai-code-mode-skills'
import { createFileSkillStorage } from '@tanstack/ai-code-mode-skills/storage'
import { openaiText } from '@tanstack/ai-openai'

const storage = createFileSkillStorage({ directory: './.skills' })
const driver = createNodeIsolateDriver()

const { registry, systemPrompt, selectedSkills } = await codeModeWithSkills({
  config: {
    driver,
    tools: [myTool1, myTool2],
    timeout: 60_000,
    memoryLimit: 128,
  },
  adapter: openaiText('gpt-4o-mini'), // cheap model for skill selection
  skills: {
    storage,
    maxSkillsInContext: 5,
  },
  messages, // current conversation
})

const stream = chat({
  adapter: openaiText('gpt-4o'), // strong model for reasoning
  toolRegistry: registry,
  messages,
  systemPrompts: ['You are a helpful assistant.', systemPrompt],
  agentLoopStrategy: maxIterations(15),
})
```

`codeModeWithSkills` returns:

| Property | Type | Description |
|----------|------|-------------|
| `registry` | `ToolRegistry` | Mutable registry containing all tools. Pass to `chat()` via `toolRegistry`. |
| `systemPrompt` | `string` | Combined Code Mode + skill library documentation. |
| `selectedSkills` | `Array<Skill>` | Skills the selection model chose for this conversation. |

### What goes into the registry

The registry is populated with:

- **`execute_typescript`** — The Code Mode sandbox tool. Inside the sandbox, skills are also available as `skill_*` functions (loaded dynamically at execution time).
- **`search_skills`** — Search the skill library by query. Returns matching skill metadata.
- **`get_skill`** — Retrieve full details (including code) for a specific skill.
- **`register_skill`** — Save working code as a new skill. Newly registered skills are immediately added to the registry as callable tools.
- **One tool per selected skill** — Each selected skill becomes a direct tool (prefixed with `[SKILL]` in its description) that the LLM can call without going through `execute_typescript`.

## Manual API

If you want full control — for example, loading all skills instead of using LLM-based selection — use the lower-level functions directly. This is the approach used in the `ts-code-mode-web` example.

```typescript
import { chat, maxIterations } from '@tanstack/ai'
import { createCodeModeToolAndPrompt } from '@tanstack/ai-code-mode'
import { createNodeIsolateDriver } from '@tanstack/ai-isolate-node'
import {
  createAlwaysTrustedStrategy,
  createSkillManagementTools,
  createSkillsSystemPrompt,
  skillsToTools,
} from '@tanstack/ai-code-mode-skills'
import { createFileSkillStorage } from '@tanstack/ai-code-mode-skills/storage'
import { openaiText } from '@tanstack/ai-openai'

const BASE_PROMPT = 'You are a helpful assistant.'

const trustStrategy = createAlwaysTrustedStrategy()
const storage = createFileSkillStorage({
  directory: './.skills',
  trustStrategy,
})
const driver = createNodeIsolateDriver()

// 1. Create Code Mode tool + prompt
const { tool: codeModeTool, systemPrompt: codeModePrompt } =
  createCodeModeToolAndPrompt({
    driver,
    tools: [myTool1, myTool2],
    timeout: 60_000,
    memoryLimit: 128,
  })

// 2. Load all skills and convert to tools
const allSkills = await storage.loadAll()
const skillIndex = await storage.loadIndex()

const skillTools = allSkills.length > 0
  ? skillsToTools({
      skills: allSkills,
      driver,
      tools: [myTool1, myTool2],
      storage,
      timeout: 60_000,
      memoryLimit: 128,
    })
  : []

// 3. Create management tools
const managementTools = createSkillManagementTools({
  storage,
  trustStrategy,
})

// 4. Generate skill library prompt
const skillsPrompt = createSkillsSystemPrompt({
  selectedSkills: allSkills,
  totalSkillCount: skillIndex.length,
  skillsAsTools: true,
})

// 5. Assemble and call chat()
const stream = chat({
  adapter: openaiText('gpt-4o'),
  tools: [codeModeTool, ...managementTools, ...skillTools],
  messages,
  systemPrompts: [BASE_PROMPT, codeModePrompt, skillsPrompt],
  agentLoopStrategy: maxIterations(15),
})
```

This approach skips the selection LLM call entirely — you load whichever skills you want and pass them in directly.

## Skill Storage

Skills are persisted through the `SkillStorage` interface. Two implementations are provided:

### File storage (production)

```typescript
import { createFileSkillStorage } from '@tanstack/ai-code-mode-skills/storage'

const storage = createFileSkillStorage({
  directory: './.skills',
  trustStrategy, // optional, defaults to createDefaultTrustStrategy()
})
```

Creates a directory structure:

```
.skills/
  _index.json               # Lightweight catalog for fast loading
  fetch_github_stats/
    meta.json               # Description, schemas, hints, stats
    code.ts                 # TypeScript source
  compare_npm_packages/
    meta.json
    code.ts
```
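The exact on-disk format is internal to the package, but based on the fields described in this guide, a `meta.json` might plausibly look like the following. Every field name and value here is a hypothetical illustration, not the actual schema:

```json
{
  "name": "fetch_github_stats",
  "description": "Fetch star and fork counts for a GitHub repository",
  "inputSchema": "{ \"type\": \"object\", \"properties\": { \"repo\": { \"type\": \"string\" } } }",
  "usageHints": ["Use when the user asks how popular a repository is"],
  "dependsOn": [],
  "stats": { "executions": 12, "successes": 11 }
}
```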

### Memory storage (testing)

```typescript
import { createMemorySkillStorage } from '@tanstack/ai-code-mode-skills/storage'

const storage = createMemorySkillStorage()
```

Keeps everything in memory. Useful for tests and demos.

### Storage interface

Both implementations satisfy this interface:

| Method | Description |
|--------|-------------|
| `loadIndex()` | Load lightweight metadata for all skills (no code) |
| `loadAll()` | Load all skills with full details including code |
| `get(name)` | Get a single skill by name |
| `save(skill)` | Create or update a skill |
| `delete(name)` | Remove a skill |
| `search(query, options?)` | Search skills by text query |
| `updateStats(name, success)` | Record an execution result for trust tracking |
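To make the interface concrete, here is a minimal in-memory sketch of the same shape. It mirrors the methods in the table above, but the `Skill` type and internals are illustrative assumptions, not the package's actual types:

```typescript
// Minimal in-memory sketch of the SkillStorage shape from the table above.
// Field and type names are illustrative, not the package's real types.
interface Skill {
  name: string
  description: string
  code: string
  stats: { executions: number; successes: number }
}

function createTinySkillStorage() {
  const skills = new Map<string, Skill>()
  return {
    async loadIndex() {
      // Metadata only, no code: mirrors the lightweight _index.json.
      return [...skills.values()].map(({ name, description }) => ({ name, description }))
    },
    async loadAll() {
      return [...skills.values()]
    },
    async get(name: string) {
      return skills.get(name) ?? null
    },
    async save(skill: Skill) {
      skills.set(skill.name, skill)
    },
    async delete(name: string) {
      skills.delete(name)
    },
    async search(query: string) {
      const q = query.toLowerCase()
      return [...skills.values()].filter(
        (s) => s.name.includes(q) || s.description.toLowerCase().includes(q),
      )
    },
    async updateStats(name: string, success: boolean) {
      const s = skills.get(name)
      if (!s) return
      s.stats.executions += 1
      if (success) s.stats.successes += 1
    },
  }
}
```

A custom backend (database, KV store, etc.) would follow the same shape.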

## Trust Strategies

Skills start untrusted and earn trust through successful executions. The trust level is metadata only — it does not currently gate execution. Four built-in strategies are available:

```typescript
import {
createDefaultTrustStrategy,
createAlwaysTrustedStrategy,
createRelaxedTrustStrategy,
createCustomTrustStrategy,
} from '@tanstack/ai-code-mode-skills'
```

| Strategy | Initial level | Provisional | Trusted |
|----------|--------------|-------------|---------|
| **Default** | `untrusted` | 10+ runs, ≥90% success | 100+ runs, ≥95% success |
| **Relaxed** | `untrusted` | 3+ runs, ≥80% success | 10+ runs, ≥90% success |
| **Always trusted** | `trusted` | — | — |
| **Custom** | Configurable | Configurable | Configurable |

```typescript
const strategy = createCustomTrustStrategy({
  initialLevel: 'untrusted',
  provisionalThreshold: { executions: 5, successRate: 0.85 },
  trustedThreshold: { executions: 50, successRate: 0.95 },
})
```
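The threshold logic in the table reduces to a simple comparison. The following sketch applies the default thresholds to a skill's stats; it is an illustration of the rule, not the package's internal code, and the function name is hypothetical:

```typescript
// Illustrative mapping from execution stats to a trust level, using the
// default thresholds from the table. Not the package's actual code.
type TrustLevel = 'untrusted' | 'provisional' | 'trusted'

function trustLevelFor(
  executions: number,
  successRate: number,
  thresholds = {
    provisional: { executions: 10, successRate: 0.9 },
    trusted: { executions: 100, successRate: 0.95 },
  },
): TrustLevel {
  if (
    executions >= thresholds.trusted.executions &&
    successRate >= thresholds.trusted.successRate
  ) {
    return 'trusted'
  }
  if (
    executions >= thresholds.provisional.executions &&
    successRate >= thresholds.provisional.successRate
  ) {
    return 'provisional'
  }
  return 'untrusted'
}
```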

## Skill Lifecycle

### Registration

When the LLM produces useful code via `execute_typescript`, the system prompt instructs it to call `register_skill` with:

- `name` — snake_case identifier (becomes the tool name)
- `description` — what the skill does
- `code` — TypeScript source that receives an `input` variable
- `inputSchema` / `outputSchema` — JSON Schema strings
- `usageHints` — when to use this skill
- `dependsOn` — other skills this one calls

The skill is saved to storage and (if a `ToolRegistry` was provided) immediately added as a callable tool in the current session.
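A `register_skill` call from the LLM might carry a payload like the following. All values, including the `external_fetch` binding in the code, are hypothetical illustrations of the field list above:

```json
{
  "name": "fetch_github_stats",
  "description": "Fetch star and fork counts for a GitHub repository",
  "code": "const res = await external_fetch({ url: `https://api.github.com/repos/${input.repo}` });\nreturn { stars: res.stargazers_count, forks: res.forks_count };",
  "inputSchema": "{ \"type\": \"object\", \"properties\": { \"repo\": { \"type\": \"string\" } }, \"required\": [\"repo\"] }",
  "outputSchema": "{ \"type\": \"object\", \"properties\": { \"stars\": { \"type\": \"number\" }, \"forks\": { \"type\": \"number\" } } }",
  "usageHints": ["Use when the user asks how popular a GitHub repository is"],
  "dependsOn": []
}
```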

### Execution

When a skill tool is called, the system:

1. Wraps the skill code with `const input = <serialized input>;`
2. Strips TypeScript syntax to plain JavaScript
3. Creates a fresh sandbox context with `external_*` bindings
4. Executes the code and returns the result
5. Updates execution stats (success/failure count) asynchronously
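Step 1 of that flow can be sketched as a one-line wrapper. This is an illustration of the idea, not the library's actual pipeline (which also handles TypeScript stripping and sandbox setup):

```typescript
// Illustrative sketch of step 1: prepend the serialized input so the
// stored skill code can reference `input`. Not the library's actual code.
function wrapSkillCode(code: string, input: unknown): string {
  return `const input = ${JSON.stringify(input)};\n${code}`
}

const wrapped = wrapSkillCode('return input.repo.length', { repo: 'tanstack/ai' })
```

The wrapped source is then what gets stripped to plain JavaScript and run in the fresh sandbox context.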

### Selection (high-level API only)

On each new request, `selectRelevantSkills`:

1. Takes the last 5 conversation messages as context
2. Builds a catalog from the skill index (name + description + first usage hint)
3. Asks the adapter to return a JSON array of relevant skill names (max `maxSkillsInContext`)
4. Loads full skill data for the selected names

If parsing fails or the model returns invalid JSON, it falls back to an empty selection — the request proceeds without pre-loaded skills, but the LLM can still search and use skills via the management tools.
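The tolerant-parse fallback amounts to something like the sketch below. It is illustrative, not the library's implementation, and the function name is an assumption:

```typescript
// Illustrative sketch of the fallback: accept only a JSON array of
// strings, and treat anything else as an empty selection.
function parseSelectedSkillNames(reply: string): Array<string> {
  try {
    const parsed = JSON.parse(reply)
    if (Array.isArray(parsed) && parsed.every((n) => typeof n === 'string')) {
      return parsed
    }
  } catch {
    // Malformed JSON: fall through to the empty selection.
  }
  return []
}
```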

## Skills as Tools vs. Sandbox Bindings

The `skillsAsTools` option (default: `true`) controls how skills are exposed:

| Mode | How the LLM calls a skill | Pros | Cons |
|------|--------------------------|------|------|
| **As tools** (`true`) | Direct tool call: `skill_name({ ... })` | Simpler for the LLM, shows in tool-call UI, proper input validation | One tool per skill in the tool list |
| **As bindings** (`false`) | Inside `execute_typescript`: `await skill_fetch_data({ ... })` | Skills composable in code, fewer top-level tools | LLM must write code to use them |

When `skillsAsTools` is enabled, the system prompt documents each skill with its schema, usage hints, and example calls. When disabled, skills appear as typed `skill_*` functions in the sandbox type stubs.
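In bindings mode, the LLM composes skills inside `execute_typescript`. The snippet below sketches what such sandbox code might look like; the skill name is hypothetical, and the stand-in binding is defined locally only so the sketch is self-contained (in the real sandbox, `skill_*` functions are injected for you):

```typescript
// Stand-in for a sandbox-injected skill binding (hypothetical skill).
// In the real sandbox this would be provided automatically.
const skill_fetch_github_stats = async (input: { repo: string }) => ({
  repo: input.repo,
  stars: 123,
})

// What the LLM might write inside execute_typescript:
const stats = await skill_fetch_github_stats({ repo: 'tanstack/ai' })
const summary = `${stats.repo} has ${stats.stars} stars`
```

Because the call happens in code, its result can feed directly into further computation without another model round-trip, which is the main draw of bindings mode.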

## Custom Events

Skill execution emits events through the TanStack AI event system:

| Event | When | Payload |
|-------|------|---------|
| `code_mode:skill_call` | Skill tool invoked | `{ skill, input, timestamp }` |
| `code_mode:skill_result` | Skill completed successfully | `{ skill, result, duration, timestamp }` |
| `code_mode:skill_error` | Skill execution failed | `{ skill, error, duration, timestamp }` |
| `skill:registered` | New skill saved via `register_skill` | `{ id, name, description, timestamp }` |

## Tips

- **Use a cheap model for selection.** The selection call only needs to match skill names to conversation context — `gpt-4o-mini` or `claude-haiku-4-5` work well.
- **Start without skills.** Get Code Mode working first, then add `@tanstack/ai-code-mode-skills` once you have tools that produce reusable patterns.
- **Monitor the skill count.** As the library grows, consider increasing `maxSkillsInContext` or switching to the manual API where you control which skills load.
- **Newly registered skills are available on the next message,** not in the current turn's tool list (unless using `ToolRegistry` with the high-level API, which adds them immediately).
- **Skills can call other skills.** Inside the sandbox, both `external_*` and `skill_*` functions are available. Set `dependsOn` when registering to document these relationships.