59 changes: 59 additions & 0 deletions skills/_reference/config-schema.md
@@ -10,6 +10,7 @@
| `targets` | `TargetConfig[]` | **Yes** | Non-empty array. Docker images for sandboxed execution. |
| `workspace` | `WorkspaceConfig` | No | Workspace template and setup. |
| `executorPlugins` | `ExecutorPlugin[]` | No | Plugin directories installed into the executor's agent CLI inside the sandbox (Claude marketplace, Codex skills, Gemini extensions). Not installed in the judge sandbox — that's intentional, so the judge stays independent of the executor's tooling. |
| `executorMcpServers` | `ExecutorMcpServer[]` | No | MCP servers wired into the executor's agent CLI inside the sandbox (Claude `--mcp-config`, Codex `config.toml`). A separate mechanism from `executorPlugins` — a plugin's bundled MCP servers are not loaded. Not installed in the judge sandbox. |
| `sandbox` | `SandboxConfig` | **Yes** | Must be an object (can be `{}`). Resource limits, secrets, env vars. |

## SourceConfig (discriminated union on `type`)
@@ -186,6 +187,63 @@ What an adapter requires inside the plugin directory:
Each adapter fails fast at install time if its required file is missing — the
A/B comparison won't silently no-op.

## ExecutorMcpServer (discriminated union on optional `type`)

An MCP server wired into the executor's agent CLI. Independent of
`ExecutorPlugin` — MCP servers are wired through each CLI's native MCP-config
surface, NOT through agent plugins (a Claude plugin's `mcpServers` are not
loaded by `claude --plugin-dir`). Installed **only** in the executor sandbox;
the judge sandbox is kept MCP-free.

Every entry shares three fields:

| Field | Type | Required |
|-------|------|----------|
| `name` | `string` | Yes — server slug (letters/digits/`.`/`_`/`-`), unique across the array. Used as the MCP server name and install dir name. |
| `command` | `string` | Yes — executable that launches the MCP server (e.g. `node`, `npx`). |
| `args` | `string[]` | Yes — args to `command`. May contain the literal `${MCP_ROOT}` placeholder when a source is present. |

The optional `type` discriminator selects the source shape:

### CommandExecutorMcpServer (no `type`)

No source — `command`/`args` are used as-is (e.g. a server launched via `npx`).
`${MCP_ROOT}` must NOT appear in `args` (there is nothing to substitute) — the
config validator rejects it.

### LocalExecutorMcpServer (`type: "local"`)

| Field | Type | Required |
|-------|------|----------|
| `type` | `"local"` | Yes |
| `path` | `string` | Yes — host directory tree uploaded into the sandbox |

### GitExecutorMcpServer (`type: "git"`)

| Field | Type | Required |
|-------|------|----------|
| `type` | `"git"` | Yes |
| `url` | `string` | Yes — git repository URL |
| `branch` | `string` | No |
| `subpath` | `string` | No — path within the repo to the server source |
| `sparse` | `string[]` | No — sparse checkout paths |
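
The three shapes above can be read as one TypeScript discriminated union. The sketch below is illustrative only: the interface names, the `hasSource` helper, and the `./mcp/mine` path are assumptions made for this example and are not taken from the project's `types` module.

```typescript
// Hypothetical names mirroring the tables above; not the project's real types.
interface McpServerBase {
  name: string;    // slug: letters, digits, '.', '_', '-'
  command: string; // executable that launches the MCP server
  args: string[];  // may contain the literal ${MCP_ROOT} only when a source is present
}

// No `type`: command/args are used as-is.
interface CommandServer extends McpServerBase {
  type?: undefined;
}

interface LocalServer extends McpServerBase {
  type: 'local';
  path: string; // host directory tree uploaded into the sandbox
}

interface GitServer extends McpServerBase {
  type: 'git';
  url: string;
  branch?: string;
  subpath?: string;
  sparse?: string[];
}

type ExecutorMcpServerSketch = CommandServer | LocalServer | GitServer;

// Narrowing helper: true for entries that carry a source directory.
function hasSource(server: ExecutorMcpServerSketch): server is LocalServer | GitServer {
  return server.type === 'local' || server.type === 'git';
}

// Example entries; './mcp/mine' is a made-up path.
const exampleServers: ExecutorMcpServerSketch[] = [
  { name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
  { name: 'mine', type: 'local', path: './mcp/mine', command: 'node', args: ['${MCP_ROOT}/server.js'] },
];
```

Leaving `type` optional on the sourceless shape keeps a plain `npx`-style entry to three fields, while `hasSource` gives callers a single place to decide whether an upload is needed.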

### `${MCP_ROOT}` substitution

For sourced servers (`local`/`git`), the source directory is uploaded into the
sandbox and the literal string `${MCP_ROOT}` in each `args` entry is replaced
with that server's absolute sandbox install dir at install time. Sourceless
servers cannot use the placeholder.
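
A minimal sketch of the substitution step, assuming a hypothetical helper named `resolveMcpArgs` (the project's own helper may be structured differently). It replaces every occurrence of the literal placeholder in each argument with the server's install directory.

```typescript
// The placeholder is a literal string, not a shell or template expansion.
const MCP_ROOT_PLACEHOLDER = '${MCP_ROOT}';

// Hypothetical helper: returns a new args array with the placeholder resolved.
function resolveMcpArgs(args: string[], installDir: string): string[] {
  // split/join avoids the special `$` patterns that String.prototype.replace
  // interprets in its replacement string.
  return args.map((arg) => arg.split(MCP_ROOT_PLACEHOLDER).join(installDir));
}

const resolved = resolveMcpArgs(
  ['${MCP_ROOT}/server.js', '--flag'],
  '/root/.mcp-servers/mine',
);
// resolved is ['/root/.mcp-servers/mine/server.js', '--flag']
```

Arguments without the placeholder pass through unchanged, so the same function is safe to apply to every entry of a sourced server.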

### Per-adapter wiring

| Adapter | Sandbox destination | Wiring |
|---|---|---|
| `claude` | sourced servers extracted to `$HOME/.mcp-servers/<name>/` (outside `/workspace`) | combined `$HOME/.mcp-servers/mcp-config.json` passed via the `--mcp-config` flag (no `--strict-mcp-config`) |
| `codex` | sourced servers extracted to `$CODEX_HOME/.mcp-servers/<name>/` | `[mcp_servers.<name>]` block appended to `$CODEX_HOME/config.toml` (auto-read; existing content preserved) |
| `gemini` | — | Not supported in non-interactive mode. Adapter throws a clear error if `executorMcpServers` is non-empty. |
| custom | — | Not supported. Adapter throws a clear error if `executorMcpServers` is non-empty. |
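
The two supported wirings differ mainly in how the resolved server list is serialized. The sketch below shows one plausible rendering for each; the function names and the `ResolvedServer` shape are assumptions for illustration, and quoting via `JSON.stringify` is a simplification that covers ordinary strings rather than every TOML edge case.

```typescript
// Hypothetical shape: a server after ${MCP_ROOT} has been resolved.
interface ResolvedServer {
  name: string;
  command: string;
  args: string[];
}

// Claude: one JSON document with an `mcpServers` map keyed by server name.
function renderClaudeMcpConfig(servers: ResolvedServer[]): string {
  const mcpServers: Record<string, { command: string; args: string[] }> = {};
  for (const s of servers) {
    mcpServers[s.name] = { command: s.command, args: s.args };
  }
  return JSON.stringify({ mcpServers }, null, 2);
}

// Codex: one `[mcp_servers.<name>]` TOML table per server, meant to be appended.
function renderCodexTomlBlocks(servers: ResolvedServer[]): string {
  return servers
    .map((s) => {
      // A name containing '.' would otherwise be read as a nested table, so quote it.
      const key = /^[A-Za-z0-9_-]+$/.test(s.name) ? s.name : JSON.stringify(s.name);
      const args = s.args.map((a) => JSON.stringify(a)).join(', ');
      return `[mcp_servers.${key}]\ncommand = ${JSON.stringify(s.command)}\nargs = [${args}]\n`;
    })
    .join('\n');
}

const wiringExample: ResolvedServer[] = [
  { name: 'mine', command: 'node', args: ['/root/.mcp-servers/mine/server.js'] },
  { name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
];
const claudeJson = renderClaudeMcpConfig(wiringExample);
const codexToml = renderCodexTomlBlocks(wiringExample);
```

Rendering from the already-resolved list keeps the serializers free of any path logic, so each one only has to know its CLI's config syntax.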

## Validation Rules

1. Root must be a JSON object
@@ -198,6 +256,7 @@ A/B comparison won't silently no-op.
8. Custom agents must provide `envVar` and `baseUrl` in their secret
9. `baseUrl` must be a parseable URL
10. `executorPlugins`, if present, must be an array; each entry needs a `name` (slug-safe) and a valid `type` (`local` or `git`); names must be unique
11. `executorMcpServers`, if present, must be an array; each entry needs a slug-safe unique `name`, a string `command`, and a `string[]` `args`; `type` is optional but when present must be `local` (needs `path`) or `git` (needs `url`); when `type` is absent, `${MCP_ROOT}` must not appear in `args`
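
Rule 11 could be checked roughly as follows. This is a sketch under stated assumptions, not the project's validator: the function name, the error wording, and the exact slug pattern (`SLUG`) are all invented for the example.

```typescript
// Assumed slug pattern: letters, digits, '.', '_', '-'.
const SLUG = /^[A-Za-z0-9._-]+$/;

// Hypothetical validator: returns a list of human-readable errors (empty when valid).
function validateExecutorMcpServers(value: unknown): string[] {
  if (value === undefined) return []; // field is optional
  if (!Array.isArray(value)) return ['executorMcpServers must be an array'];

  const errors: string[] = [];
  const seen = new Set<string>();

  value.forEach((entry: unknown, i: number) => {
    const at = `executorMcpServers[${i}]`;
    if (typeof entry !== 'object' || entry === null) {
      errors.push(`${at} must be an object`);
      return;
    }
    const e = entry as Record<string, unknown>;

    if (typeof e.name !== 'string' || !SLUG.test(e.name)) {
      errors.push(`${at}.name must be a slug-safe string`);
    } else if (seen.has(e.name)) {
      errors.push(`${at}.name '${e.name}' is not unique`);
    } else {
      seen.add(e.name);
    }

    if (typeof e.command !== 'string') errors.push(`${at}.command must be a string`);

    const args = e.args;
    const argsOk = Array.isArray(args) && args.every((a) => typeof a === 'string');
    if (!argsOk) errors.push(`${at}.args must be a string[]`);

    if (e.type === undefined) {
      // Sourceless entry: there is nothing to substitute for the placeholder.
      if (argsOk && (args as string[]).some((a) => a.includes('${MCP_ROOT}'))) {
        errors.push(at + '.args must not contain ${MCP_ROOT} when type is absent');
      }
    } else if (e.type === 'local') {
      if (typeof e.path !== 'string') errors.push(`${at}.path is required for type 'local'`);
    } else if (e.type === 'git') {
      if (typeof e.url !== 'string') errors.push(`${at}.url is required for type 'git'`);
    } else {
      errors.push(`${at}.type must be 'local' or 'git' when present`);
    }
  });

  return errors;
}
```

Collecting every error rather than throwing on the first one lets a user fix a whole config in one pass; a fail-fast variant would simply throw on the first entry of the returned list.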

## Minimal Examples

76 changes: 76 additions & 0 deletions src/agents/__tests__/claude.test.ts
@@ -2,6 +2,7 @@ import { describe, it, expect, vi, beforeEach } from 'vitest';
import { access } from 'node:fs/promises';
import { spawnAgent, spawnInteractive } from '../spawn.js';
import { uploadDirToSandbox } from '../../sandbox/scaffolding.js';
import { uploadMcpServerSources } from '../../sandbox/mcp.js';
import { ClaudeAdapter } from '../claude.js';
import { makeAgentResult } from '../../__tests__/helpers/fixtures.js';
import { makeMockSandboxClient } from '../../__tests__/helpers/mock-sandbox-client.js';
@@ -19,10 +20,15 @@ vi.mock('../../sandbox/scaffolding.js', () => ({
uploadDirToSandbox: vi.fn(),
}));

vi.mock('../../sandbox/mcp.js', () => ({
uploadMcpServerSources: vi.fn(),
}));

const mockSpawnAgent = vi.mocked(spawnAgent);
const mockSpawnInteractive = vi.mocked(spawnInteractive);
const mockAccess = vi.mocked(access);
const mockUploadDir = vi.mocked(uploadDirToSandbox);
const mockUploadMcpSources = vi.mocked(uploadMcpServerSources);

describe('ClaudeAdapter', () => {
let adapter: ClaudeAdapter;
@@ -187,4 +193,74 @@ describe('ClaudeAdapter', () => {
expect(cmd).not.toContain('--plugin-dir');
});
});

describe('installMcpServersInSandbox', () => {
it('is a no-op when given an empty server list', async () => {
const client = makeMockSandboxClient();
await adapter.installMcpServersInSandbox(client as any, []);
expect(client.runCommand).not.toHaveBeenCalled();
});

it('delegates uploads to uploadMcpServerSources and renders its result into mcp-config.json', async () => {
const client = makeMockSandboxClient();
client.runCommand.mockResolvedValue({ stdout: '/root', stderr: '', exitCode: 0 });
mockUploadMcpSources.mockResolvedValue([
{ name: 'mine', command: 'node', args: ['/root/.mcp-servers/mine/server.js', '--flag'] },
{ name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
]);

await adapter.installMcpServersInSandbox(client as any, [
{ kind: 'sourced', name: 'mine', command: 'node', args: ['${MCP_ROOT}/server.js', '--flag'], hostDir: '/tmp/mine' },
{ kind: 'sourceless', name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
]);

// Adapter passes the right mcpRoot + server list to the dedicated MCP upload helper.
expect(mockUploadMcpSources).toHaveBeenCalledTimes(1);
expect(mockUploadMcpSources).toHaveBeenCalledWith(client, '/root/.mcp-servers', [
{ kind: 'sourced', name: 'mine', command: 'node', args: ['${MCP_ROOT}/server.js', '--flag'], hostDir: '/tmp/mine' },
{ kind: 'sourceless', name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
]);
// Source-only uploads (uploadDirToSandbox) are not used for MCP payloads.
expect(mockUploadDir).not.toHaveBeenCalled();

// The base64-decoded MCP config JSON reflects the resolved args returned by the helper.
const writeCall = client.runCommand.mock.calls.find((c: any[]) => String(c[0]).includes('base64 -d'));
expect(writeCall).toBeDefined();
const b64 = String(writeCall![0]).match(/printf %s '([A-Za-z0-9+/=]+)'/)![1];
const cfg = JSON.parse(Buffer.from(b64, 'base64').toString('utf-8'));
expect(cfg.mcpServers.mine).toEqual({ command: 'node', args: ['/root/.mcp-servers/mine/server.js', '--flag'] });
expect(cfg.mcpServers.fs).toEqual({ command: 'npx', args: ['-y', 'server-filesystem'] });

// sandboxCommand emits --mcp-config, not --strict-mcp-config.
const cmd = adapter.sandboxCommand('do the thing');
expect(cmd).toContain("--mcp-config '/root/.mcp-servers/mcp-config.json'");
expect(cmd).not.toContain('--strict-mcp-config');
});

it('falls back to /root when $HOME degenerates to /', async () => {
const client = makeMockSandboxClient();
// First runCommand call is the $HOME probe; later calls are the config write.
client.runCommand.mockImplementation(async (cmd: string) =>
cmd.includes('${HOME')
? { stdout: '/', stderr: '', exitCode: 0 }
: { stdout: '', stderr: '', exitCode: 0 },
);
mockUploadMcpSources.mockResolvedValue([]);

await adapter.installMcpServersInSandbox(client as any, [
{ kind: 'sourceless', name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
]);

// Adapter should treat $HOME=/ as degenerate and use /root, not //.mcp-servers.
expect(mockUploadMcpSources).toHaveBeenCalledWith(client, '/root/.mcp-servers', expect.any(Array));
const cmd = adapter.sandboxCommand('go');
expect(cmd).toContain("--mcp-config '/root/.mcp-servers/mcp-config.json'");
expect(cmd).not.toContain('//.mcp-servers');
});

it('sandboxCommand omits --mcp-config when no MCP servers were installed', () => {
const fresh = new ClaudeAdapter({ command: 'claude' });
expect(fresh.sandboxCommand('go')).not.toContain('--mcp-config');
});
});
});
74 changes: 74 additions & 0 deletions src/agents/__tests__/codex.test.ts
@@ -2,6 +2,7 @@ import { describe, it, expect, vi, beforeEach } from 'vitest';
import { writeFile, readFile, rm, access, readdir, stat } from 'node:fs/promises';
import { spawnAgent, spawnInteractive } from '../spawn.js';
import { uploadDirToSandbox } from '../../sandbox/scaffolding.js';
import { uploadMcpServerSources } from '../../sandbox/mcp.js';
import { CodexAdapter } from '../codex.js';
import { makeAgentResult } from '../../__tests__/helpers/fixtures.js';
import { makeMockSandboxClient } from '../../__tests__/helpers/mock-sandbox-client.js';
@@ -24,6 +25,10 @@ vi.mock('../../sandbox/scaffolding.js', () => ({
uploadDirToSandbox: vi.fn(),
}));

vi.mock('../../sandbox/mcp.js', () => ({
uploadMcpServerSources: vi.fn(),
}));

const mockSpawnAgent = vi.mocked(spawnAgent);
const mockSpawnInteractive = vi.mocked(spawnInteractive);
const mockWriteFile = vi.mocked(writeFile);
@@ -33,6 +38,7 @@ const mockAccess = vi.mocked(access);
const mockReaddir = vi.mocked(readdir);
const mockStat = vi.mocked(stat);
const mockUploadDir = vi.mocked(uploadDirToSandbox);
const mockUploadMcpSources = vi.mocked(uploadMcpServerSources);

describe('CodexAdapter', () => {
let adapter: CodexAdapter;
@@ -230,4 +236,72 @@ describe('CodexAdapter', () => {
])).rejects.toThrow(/'plugin-a'.*'plugin-b'/);
});
});

describe('installMcpServersInSandbox', () => {
it('is a no-op when given an empty server list', async () => {
const client = makeMockSandboxClient();
await adapter.installMcpServersInSandbox(client as any, []);
expect(client.runCommand).not.toHaveBeenCalled();
});

it('delegates uploads to uploadMcpServerSources and renders its result into config.toml blocks', async () => {
const client = makeMockSandboxClient();
client.runCommand.mockResolvedValue({ stdout: '/root/.codex', stderr: '', exitCode: 0 });
mockUploadMcpSources.mockResolvedValue([
{ name: 'mine', command: 'node', args: ['/root/.codex/.mcp-servers/mine/server.js'] },
{ name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
]);

await adapter.installMcpServersInSandbox(client as any, [
{ kind: 'sourced', name: 'mine', command: 'node', args: ['${MCP_ROOT}/server.js'], hostDir: '/tmp/mine' },
{ kind: 'sourceless', name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
]);

// Adapter passes the right mcpRoot + server list to the dedicated MCP upload helper.
expect(mockUploadMcpSources).toHaveBeenCalledTimes(1);
expect(mockUploadMcpSources).toHaveBeenCalledWith(client, '/root/.codex/.mcp-servers', [
{ kind: 'sourced', name: 'mine', command: 'node', args: ['${MCP_ROOT}/server.js'], hostDir: '/tmp/mine' },
{ kind: 'sourceless', name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
]);
// Source-only uploads (uploadDirToSandbox) are not used for MCP payloads.
expect(mockUploadDir).not.toHaveBeenCalled();

// The base64-decoded TOML appended to config.toml reflects the resolved args returned by the helper.
const writeCall = client.runCommand.mock.calls.find((c: any[]) => String(c[0]).includes('config.toml'));
expect(writeCall).toBeDefined();
const b64 = String(writeCall![0]).match(/printf %s '([A-Za-z0-9+/=]+)'/)![1];
const toml = Buffer.from(b64, 'base64').toString('utf-8');
expect(toml).toContain('[mcp_servers.mine]');
expect(toml).toContain('command = "node"');
expect(toml).toContain('args = ["/root/.codex/.mcp-servers/mine/server.js"]');
expect(toml).toContain('[mcp_servers.fs]');
expect(toml).toContain('args = ["-y", "server-filesystem"]');
// Append, not clobber.
expect(String(writeCall![0])).toContain('>>');
});

it('falls back to /root/.codex when $HOME=/ leaks through as //.codex', async () => {
const client = makeMockSandboxClient();
// What `printf %s "${CODEX_HOME:-${HOME:-/root}/.codex}"` produces when
// $HOME=/ and $CODEX_HOME is unset: literal "/" + literal "/.codex".
client.runCommand.mockImplementation(async (cmd: string) =>
cmd.includes('CODEX_HOME')
? { stdout: '//.codex', stderr: '', exitCode: 0 }
: { stdout: '', stderr: '', exitCode: 0 },
);
mockUploadMcpSources.mockResolvedValue([]);

await adapter.installMcpServersInSandbox(client as any, [
{ kind: 'sourceless', name: 'fs', command: 'npx', args: ['-y', 'server-filesystem'] },
]);

// Adapter should recognise //.codex as the $HOME=/ degenerate case and rewrite to /root/.codex.
expect(mockUploadMcpSources).toHaveBeenCalledWith(client, '/root/.codex/.mcp-servers', expect.any(Array));
// The config write should target /root/.codex/config.toml, not /.codex/config.toml.
const writeCall = client.runCommand.mock.calls.find((c: any[]) => String(c[0]).includes('config.toml'));
expect(writeCall).toBeDefined();
expect(String(writeCall![0])).toContain("'/root/.codex/config.toml'");
expect(String(writeCall![0])).not.toContain("'/.codex/config.toml'");
});
});
});
12 changes: 11 additions & 1 deletion src/agents/adapter.ts
@@ -1,4 +1,4 @@
import { AgentConfig, AgentResult, ResolvedExecutorPlugin } from '../types.js';
import { AgentConfig, AgentResult, ResolvedExecutorPlugin, ResolvedExecutorMcpServer } from '../types.js';
import type { MicrosandboxClient } from '../sandbox/microsandbox.js';
import { ClaudeAdapter } from './claude.js';
import { CodexAdapter } from './codex.js';
@@ -43,6 +43,16 @@ export interface AgentAdapter {
* clear error here instead of silently succeeding.
*/
installPluginsInSandbox(client: MicrosandboxClient, plugins: ResolvedExecutorPlugin[]): Promise<void>;

/**
* Install MCP servers into the running sandbox so the agent CLI connects to
* them at startup. Parallel to `installPluginsInSandbox` but a separate
* mechanism — each adapter wires servers through its CLI's native MCP-config
* surface (Claude: `--mcp-config`, Codex: `[mcp_servers.*]` in `config.toml`).
* Adapters whose CLI cannot load MCP servers in non-interactive mode raise a
* clear error here instead of silently succeeding.
*/
installMcpServersInSandbox(client: MicrosandboxClient, servers: ResolvedExecutorMcpServer[]): Promise<void>;
}

const KNOWN_ADAPTERS: Record<string, new (config: AgentConfig) => AgentAdapter> = {
19 changes: 18 additions & 1 deletion src/agents/base.ts
@@ -1,4 +1,4 @@
import type { AgentConfig, AgentResult, ResolvedExecutorPlugin } from '../types.js';
import type { AgentConfig, AgentResult, ResolvedExecutorPlugin, ResolvedExecutorMcpServer } from '../types.js';
import type { AgentAdapter } from './adapter.js';
import type { MicrosandboxClient } from '../sandbox/microsandbox.js';
import { spawnAgent, spawnInteractive } from './spawn.js';
@@ -102,6 +102,23 @@ export abstract class BaseAdapter implements AgentAdapter {
);
}

/**
* Install MCP servers into the sandbox VM. Subclasses override with
* CLI-specific wiring (Claude: `--mcp-config`, Codex: `config.toml`).
* Default raises a clear error so adapters that don't support MCP servers
* fail loudly when the user wires `executorMcpServers` against them.
*/
async installMcpServersInSandbox(
_client: MicrosandboxClient,
servers: ResolvedExecutorMcpServer[],
): Promise<void> {
if (servers.length === 0) return;
throw new Error(
`Agent adapter '${this.name}' does not support executorMcpServers. ` +
`Either remove executorMcpServers from config or switch executor to an adapter that supports MCP servers.`,
);
}

/** Shared helper: spawn the agent process with piped stdio. */
protected spawn(args: string[], workDir: string, env?: Record<string, string>, timeout?: number, stdin?: string): Promise<AgentResult> {
return spawnAgent(this.config.command, args, { cwd: workDir, env, timeout, stdin });