2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -384,7 +384,7 @@ claude setup-token # interactive — generates a long-lived OAuth to
export CLAUDE_CODE_OAUTH_TOKEN='<token>' # before running the eval
```

How it works: the runtime sniffs the resolved value's prefix at sandbox-create time. Anthropic OAuth tokens start with `sk-ant-oat` (e.g. `sk-ant-oat01-…`); API keys start with `sk-ant-api` (e.g. `sk-ant-api03-…`). Both paths flow through microsandbox's `Secret.env()` TLS substitution — cleartext never enters the VM; the env var inside the sandbox contains a placeholder, and microsandbox swaps it for the real value on outbound TLS to `api.anthropic.com` only. The prefix only decides which env var name (`CLAUDE_CODE_OAUTH_TOKEN` vs `ANTHROPIC_API_KEY`) carries the placeholder. Subscription concurrent-session caps apply.
How it works: config validation resolves the secret value at load time and inspects its prefix. Anthropic OAuth tokens start with `sk-ant-oat` (e.g. `sk-ant-oat01-…`); API keys start with `sk-ant-api` (e.g. `sk-ant-api03-…`). The prefix decides which env var name (`CLAUDE_CODE_OAUTH_TOKEN` vs `ANTHROPIC_API_KEY`) carries the placeholder; the choice is written to `secret.envVar` before the sandbox is created. Both paths flow through microsandbox's `Secret.env()` TLS substitution, so cleartext never enters the VM: the env var inside the sandbox contains a placeholder, and microsandbox swaps it for the real value on outbound TLS to `api.anthropic.com` only. With an OAuth token, the subscription's concurrent-session caps apply.
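
The prefix check can be sketched as a small pure function. This is a minimal illustration rather than the project's code: `pickAuthEnvVar` and `AuthAdapterFields` are hypothetical names, while the field names `oauthEnvVar` and `oauthValuePrefix` mirror the adapter fields added in this change.

```typescript
// Hypothetical subset of the adapter fields relevant to auth-mode selection.
interface AuthAdapterFields {
  oauthEnvVar?: string;
  oauthValuePrefix?: string;
}

// Returns the env var name that should carry the placeholder for an
// already-resolved credential. Falls back to the current env var when the
// adapter has no alternative auth mode or the prefix does not match.
function pickAuthEnvVar(
  resolvedValue: string,
  currentEnvVar: string,
  adapter: AuthAdapterFields,
): string {
  const { oauthEnvVar, oauthValuePrefix } = adapter;
  if (oauthEnvVar && oauthValuePrefix && resolvedValue.startsWith(oauthValuePrefix)) {
    return oauthEnvVar;
  }
  return currentEnvVar;
}

// Values taken from the Claude adapter in this change.
const claudeFields: AuthAdapterFields = {
  oauthEnvVar: 'CLAUDE_CODE_OAUTH_TOKEN',
  oauthValuePrefix: 'sk-ant-oat',
};

const oauthSlot = pickAuthEnvVar('sk-ant-oat01-fake-test-token', 'ANTHROPIC_API_KEY', claudeFields);
const apiKeySlot = pickAuthEnvVar('sk-ant-api03-fake-test-key', 'ANTHROPIC_API_KEY', claudeFields);
```

Only the env var name changes between the two modes; the credential itself takes the same `Secret.env()` path either way.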

#### Custom agents

Expand Down
2 changes: 1 addition & 1 deletion skills/_reference/config-schema.md
Expand Up @@ -87,7 +87,7 @@ The resolved `secret.value` is wired into the sandbox via microsandbox `Secret.e

By default the placeholder lands under the adapter's API-key env var (e.g. `ANTHROPIC_API_KEY` for claude, see [Known Agent Defaults](#known-agent-defaults-auto-filled-when-field-is-absent) below).

**Claude-only: subscription auth.** When `command: "claude"` and the resolved value starts with `sk-ant-oat` (a Claude Code subscription OAuth token issued by `claude setup-token`, e.g. `sk-ant-oat01-…`), the placeholder lands under `CLAUDE_CODE_OAUTH_TOKEN` instead. This lets you bill the run against a Pro / Max / Team / Enterprise plan instead of per-token API charges. Point `secret.value` at `"$CLAUDE_CODE_OAUTH_TOKEN"` to opt in. Other adapters (codex, gemini, custom) only have the API-key path today.
**Claude-only: subscription auth.** When `command: "claude"` and the resolved value starts with `sk-ant-oat` (a Claude Code subscription OAuth token issued by `claude setup-token`, e.g. `sk-ant-oat01-…`), config validation sets `secret.envVar` to `CLAUDE_CODE_OAUTH_TOKEN` at load time. This lets you bill the run against a Pro / Max / Team / Enterprise plan instead of per-token API charges. Point `secret.value` at `"$CLAUDE_CODE_OAUTH_TOKEN"` to opt in. Other adapters (codex, gemini, custom) only have the API-key path today.
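
As a rough illustration of the opt-in, the two agent entries below differ only in which host env var `secret.value` points at. The `SandboxAgentSketch` and `AgentSecretSketch` types are simplified stand-ins invented for this example; the object shapes follow the ones used in this change's config tests.

```typescript
// Simplified, hypothetical stand-ins for the real config types.
interface AgentSecretSketch {
  value: string;    // literal credential or a `$HOST_ENV_VAR` reference
  envVar?: string;  // filled in by config validation when omitted
}

interface SandboxAgentSketch {
  command: string;
  secret: AgentSecretSketch;
}

// API-key path: billed per token.
const apiKeyJudge: SandboxAgentSketch = {
  command: 'claude',
  secret: { value: '$ANTHROPIC_API_KEY' },
};

// Subscription path: the referenced host variable must hold an `sk-ant-oat…` token.
const subscriptionJudge: SandboxAgentSketch = {
  command: 'claude',
  secret: { value: '$CLAUDE_CODE_OAUTH_TOKEN' },
};
```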

### AgentSecretConfig

Expand Down
4 changes: 4 additions & 0 deletions src/agents/adapter.ts
Expand Up @@ -16,6 +16,10 @@ export interface AgentAdapter {
readonly defaultBaseUrl: string | null;
/** Extra hosts the agent needs to reach with the secret (e.g. telemetry endpoints). */
readonly additionalAllowHosts: string[];
/** Env var name for an alternative auth mode (e.g. OAuth token). Only set by adapters that support it. */
readonly oauthEnvVar?: string;
/** Value prefix that triggers the alternative auth mode (e.g. "sk-ant-oat"). */
readonly oauthValuePrefix?: string;

/** Full lifecycle: spawn with schema args → envelope unwrap → retry on parse failure → return clean result. */
run(prompt: string, schema: object, workDir: string, options?: {
Expand Down
2 changes: 2 additions & 0 deletions src/agents/claude.ts
Expand Up @@ -11,6 +11,8 @@ export class ClaudeAdapter extends BaseAdapter {
readonly baseUrlEnvVar = 'ANTHROPIC_BASE_URL';
readonly defaultEnvVar = 'ANTHROPIC_API_KEY';
readonly defaultBaseUrl = 'https://api.anthropic.com';
readonly oauthEnvVar = 'CLAUDE_CODE_OAUTH_TOKEN';
readonly oauthValuePrefix = 'sk-ant-oat';

/**
* Sandbox paths of plugins extracted by `installPluginsInSandbox()`.
Expand Down
5 changes: 2 additions & 3 deletions src/commands/execute.ts
Expand Up @@ -160,8 +160,8 @@ export async function executeTestCase(

const executorConfig: SandboxAgentConfig = config.agents?.executor
?? { command: 'claude', secret: { value: '$ANTHROPIC_API_KEY' } };
const execAdapter = createAdapter(executorConfig);
applyAgentAuth(executorConfig.secret, execAdapter, secrets, env);
const adapter = createAdapter(executorConfig);
applyAgentAuth(executorConfig.secret, adapter, secrets, env);

await client.create(
sandboxName(testCase.id),
Expand All @@ -183,7 +183,6 @@ export async function executeTestCase(
]);

// Install agent CLI inside the sandbox
const adapter = createAdapter(executorConfig);
const installCmd = adapter.installCommand;
if (installCmd) {
const installResult = await client.runCommand(installCmd);
Expand Down
40 changes: 32 additions & 8 deletions src/core/__tests__/config.test.ts
Expand Up @@ -224,14 +224,38 @@ describe('loadConfig', () => {
await expect(loadConfig('/fake/config.json')).rejects.toThrow(/valid URL/);
});

it('accepts secret pointing at $CLAUDE_CODE_OAUTH_TOKEN (auth mode resolved later by value prefix)', async () => {
const config = {
...validConfig,
agents: { judge: { command: 'claude', secret: { value: '$CLAUDE_CODE_OAUTH_TOKEN' } } },
};
mockReadFile.mockResolvedValue(JSON.stringify(config));
const result = await loadConfig('/fake/config.json');
expect(result.agents?.judge?.secret?.value).toBe('$CLAUDE_CODE_OAUTH_TOKEN');
it('sets envVar to CLAUDE_CODE_OAUTH_TOKEN when resolved value has OAuth prefix', async () => {
const origOAuth = process.env.CLAUDE_CODE_OAUTH_TOKEN;
process.env.CLAUDE_CODE_OAUTH_TOKEN = 'sk-ant-oat01-fake-test-token';
try {
const config = {
...validConfig,
agents: { judge: { command: 'claude', secret: { value: '$CLAUDE_CODE_OAUTH_TOKEN' } } },
};
mockReadFile.mockResolvedValue(JSON.stringify(config));
const result = await loadConfig('/fake/config.json');
expect(result.agents?.judge?.secret?.envVar).toBe('CLAUDE_CODE_OAUTH_TOKEN');
} finally {
if (origOAuth === undefined) delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
else process.env.CLAUDE_CODE_OAUTH_TOKEN = origOAuth;
}
});

it('keeps envVar as ANTHROPIC_API_KEY when resolved value has API key prefix', async () => {
const origKey = process.env.ANTHROPIC_API_KEY;
process.env.ANTHROPIC_API_KEY = 'sk-ant-api03-fake-test-key';
try {
const config = {
...validConfig,
agents: { judge: { command: 'claude', secret: { value: '$ANTHROPIC_API_KEY' } } },
};
mockReadFile.mockResolvedValue(JSON.stringify(config));
const result = await loadConfig('/fake/config.json');
expect(result.agents?.judge?.secret?.envVar).toBe('ANTHROPIC_API_KEY');
} finally {
if (origKey === undefined) delete process.env.ANTHROPIC_API_KEY;
else process.env.ANTHROPIC_API_KEY = origKey;
}
});

describe('executorPlugins', () => {
Expand Down
22 changes: 19 additions & 3 deletions src/core/config.ts
@@ -1,6 +1,7 @@
import { readFile } from 'node:fs/promises';
import { Config, AgentConfig } from '../types.js';
import { createAdapter } from '../agents/adapter.js';
import { resolveSecretValue } from './env.js';

export async function loadConfig(configPath: string): Promise<Config> {
let raw: string;
Expand Down Expand Up @@ -187,9 +188,7 @@ export function validateConfig(data: unknown, configPath?: string): Config {
const isSandboxRole = SANDBOX_ROLES.includes(role);

if (isSandboxRole) {
// Sandbox agents (executor/judge) require secret. Auth mode (API key vs Claude Code
// subscription OAuth token) is auto-detected from the resolved value's prefix at
// sandbox-create time.
// Sandbox agents (executor/judge) require secret.
if (!agent.secret || typeof agent.secret !== 'object' || Array.isArray(agent.secret)) {
throw new Error(`agents.${role} requires a secret with at least { value } for secure sandbox execution`);
}
Expand All @@ -204,6 +203,23 @@ export function validateConfig(data: unknown, configPath?: string): Config {
if (!secret.envVar) secret.envVar = adapter.defaultEnvVar;
if (!secret.baseUrl) secret.baseUrl = adapter.defaultBaseUrl;
if (!secret.baseUrlEnvVar) secret.baseUrlEnvVar = adapter.baseUrlEnvVar;

// Resolve auth mode from the credential's value prefix.
// E.g. Claude OAuth tokens (sk-ant-oat…) switch envVar to CLAUDE_CODE_OAUTH_TOKEN.
// If the env var isn't set yet (e.g. during config validation only),
// skip: the default envVar stays. At runtime applyAgentAuth re-resolves
// the value only; it does not inspect the prefix again.
if (adapter.oauthValuePrefix && adapter.oauthEnvVar) {
try {
const resolved = resolveSecretValue(secret.value as string, secret.envVar as string);
if (resolved.startsWith(adapter.oauthValuePrefix)) {
secret.envVar = adapter.oauthEnvVar;
}
} catch {
// Env var not set at config-load time — keep the default envVar.
// If secret.value is a $VAR reference, applyAgentAuth will resolve
// it again at sandbox-create time and fail loudly if still unset.
}
}
} else {
// Custom agents must specify envVar and baseUrl
if (!secret.envVar || typeof secret.envVar !== 'string') {
Expand Down
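
The load-time detection and its fallback can be sketched as follows. This is an assumption-laden illustration, not the code in `validateConfig`: `detectAuthMode`, `resolveFrom`, and the `*Sketch` types are hypothetical, and the environment is passed in as a plain map instead of reading `process.env` so the example stays self-contained.

```typescript
type EnvMap = Record<string, string | undefined>;

interface SecretSketch {
  value: string;
  envVar: string;
}

interface AdapterSketch {
  oauthEnvVar?: string;
  oauthValuePrefix?: string;
}

// Resolve a literal or `$VAR` reference against the given env map; throw if unset.
function resolveFrom(value: string, env: EnvMap): string {
  if (!value.startsWith('$')) return value;
  const hostVar = value.slice(1);
  const hostValue = env[hostVar];
  if (hostValue === undefined) throw new Error(`Environment variable '${hostVar}' is not set`);
  return hostValue;
}

// Switch envVar to the adapter's OAuth slot when the resolved value carries the
// OAuth prefix. If the reference cannot be resolved yet, keep the secret unchanged.
function detectAuthMode(secret: SecretSketch, adapter: AdapterSketch, env: EnvMap): SecretSketch {
  if (!adapter.oauthValuePrefix || !adapter.oauthEnvVar) return secret;
  try {
    if (resolveFrom(secret.value, env).startsWith(adapter.oauthValuePrefix)) {
      return { ...secret, envVar: adapter.oauthEnvVar };
    }
  } catch {
    // Unset at load time: the default envVar stays.
  }
  return secret;
}

const claudeSketch: AdapterSketch = {
  oauthEnvVar: 'CLAUDE_CODE_OAUTH_TOKEN',
  oauthValuePrefix: 'sk-ant-oat',
};
const judgeSecret: SecretSketch = { value: '$CLAUDE_CODE_OAUTH_TOKEN', envVar: 'ANTHROPIC_API_KEY' };

const detected = detectAuthMode(judgeSecret, claudeSketch, { CLAUDE_CODE_OAUTH_TOKEN: 'sk-ant-oat01-fake-test-token' });
const undetected = detectAuthMode(judgeSecret, claudeSketch, {});
```

One consequence worth noting: since `applyAgentAuth` no longer inspects the prefix after this change, a token that only becomes available after config load would stay under the default env var (`undetected.envVar` above remains `ANTHROPIC_API_KEY`).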
19 changes: 19 additions & 0 deletions src/core/env.ts
Expand Up @@ -47,6 +47,25 @@ export async function loadDotenv(dir: string = process.cwd()): Promise<void> {
}
}

/**
* Resolve a config value that may be a `$ENV_VAR` reference.
* If the value starts with `$`, the remainder is looked up in `process.env`.
* Throws if the referenced env var is not set.
*/
export function resolveSecretValue(value: string, label: string): string {
if (value.startsWith('$')) {
const hostVar = value.slice(1);
const hostValue = process.env[hostVar];
if (hostValue === undefined) {
throw new Error(
`Environment variable '${hostVar}' referenced in sandbox config for ${label} is not set on the host`,
);
}
return hostValue;
}
return value;
}

/**
* Resolve a 1Password secret reference (op://vault/item/field) using the `op` CLI.
*/
Expand Down
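
For reference, the three cases `resolveSecretValue` distinguishes (literal value, `$VAR` reference that is set, `$VAR` reference that is unset) can be exercised with a small variant. The variant below, `resolveSecretValueFrom`, is hypothetical: it takes the environment as an explicit argument purely so the example runs without touching `process.env`.

```typescript
type HostEnv = Record<string, string | undefined>;

// Same logic as resolveSecretValue, but reading from an explicit env map.
function resolveSecretValueFrom(value: string, label: string, env: HostEnv): string {
  if (value.startsWith('$')) {
    const hostVar = value.slice(1);
    const hostValue = env[hostVar];
    if (hostValue === undefined) {
      throw new Error(
        `Environment variable '${hostVar}' referenced in sandbox config for ${label} is not set on the host`,
      );
    }
    return hostValue;
  }
  return value;
}

const hostEnv: HostEnv = { ANTHROPIC_API_KEY: 'sk-ant-api03-fake-test-key' };

// Literal values pass through unchanged.
const literal = resolveSecretValueFrom('plain-value', 'ANTHROPIC_API_KEY', hostEnv);

// `$VAR` references are looked up in the env map.
const looked = resolveSecretValueFrom('$ANTHROPIC_API_KEY', 'ANTHROPIC_API_KEY', hostEnv);

// Unset references throw, naming both the host variable and the label.
let unsetMessage = '';
try {
  resolveSecretValueFrom('$MISSING_VAR', 'ANTHROPIC_API_KEY', hostEnv);
} catch (err) {
  unsetMessage = err instanceof Error ? err.message : String(err);
}
```

Note that an env var set to the empty string counts as set here, because the check is against `undefined` only.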
25 changes: 3 additions & 22 deletions src/sandbox/__tests__/microsandbox.test.ts
Expand Up @@ -332,41 +332,22 @@ describe('MicrosandboxClient', () => {

describe('applyAgentAuth', () => {
const ORIGINAL_API_KEY = process.env.ANTHROPIC_API_KEY;
const ORIGINAL_OAUTH = process.env.CLAUDE_CODE_OAUTH_TOKEN;

beforeEach(() => {
vi.clearAllMocks();
});

afterEach(() => {
const restore = (key: string, value: string | undefined) => {
if (value === undefined) delete process.env[key];
else process.env[key] = value;
};
restore('ANTHROPIC_API_KEY', ORIGINAL_API_KEY);
restore('CLAUDE_CODE_OAUTH_TOKEN', ORIGINAL_OAUTH);
if (ORIGINAL_API_KEY === undefined) delete process.env.ANTHROPIC_API_KEY;
else process.env.ANTHROPIC_API_KEY = ORIGINAL_API_KEY;
});

const claudeAdapter = {
baseUrlEnvVar: 'ANTHROPIC_BASE_URL',
additionalAllowHosts: [],
};

it('routes an OAuth-prefixed value through Secret.env under CLAUDE_CODE_OAUTH_TOKEN', async () => {
const { Secret } = await import('microsandbox');
process.env.CLAUDE_CODE_OAUTH_TOKEN = 'sk-ant-oat01-fake-test-token';
applyAgentAuth({
envVar: 'CLAUDE_CODE_OAUTH_TOKEN',
value: '$CLAUDE_CODE_OAUTH_TOKEN',
baseUrl: 'https://api.anthropic.com',
}, claudeAdapter, [], {});
expect(Secret.env).toHaveBeenCalledWith('CLAUDE_CODE_OAUTH_TOKEN', expect.objectContaining({
value: 'sk-ant-oat01-fake-test-token',
allowHosts: ['api.anthropic.com'],
}));
});

it('routes an API-key value through Secret.env under the agent-specific env var', async () => {
it('passes secret.envVar through to Secret.env (auth mode decided by config)', async () => {
const { Secret } = await import('microsandbox');
process.env.ANTHROPIC_API_KEY = 'sk-ant-api03-fake-test-key';
const env: Record<string, string> = {};
Expand Down
49 changes: 8 additions & 41 deletions src/sandbox/microsandbox.ts
Expand Up @@ -6,6 +6,7 @@ import type {
} from 'microsandbox';
import type { SandboxConfig, SecretConfig, AgentSecretConfig } from '../types.js';
import type { AgentAdapter } from '../agents/adapter.js';
import { resolveSecretValue } from '../core/env.js';

export interface CommandResult {
stdout: string;
Expand All @@ -23,7 +24,7 @@ export function buildSecrets(
if (!secrets) return [];
const entries: SecretEntry[] = [];
for (const [envVar, cfg] of Object.entries(secrets)) {
const value = resolveValue(cfg.value, envVar);
const value = resolveSecretValue(cfg.value, envVar);
entries.push(
Secret.env(envVar, {
value,
Expand All @@ -45,33 +46,17 @@ export function resolveEnv(
if (!env) return {};
const resolved: Record<string, string> = {};
for (const [key, value] of Object.entries(env)) {
resolved[key] = resolveValue(value, key);
resolved[key] = resolveSecretValue(value, key);
}
return resolved;
}

// Claude-specific credential format. Subscription OAuth tokens are prefixed
// `sk-ant-oat` followed by a version (e.g. `sk-ant-oat01-…`), issued by
// `claude setup-token`. API keys use `sk-ant-api`. The framework picks the
// env-var slot the placeholder lands under by inspecting the resolved
// value's prefix — no separate config flag needed.
const OAUTH_TOKEN_PREFIX = 'sk-ant-oat';
const OAUTH_TOKEN_ENV_VAR = 'CLAUDE_CODE_OAUTH_TOKEN';

/**
* Wire an agent's secret into the sandbox `secrets` and `env`.
*
* Both auth modes (API key and Claude Code subscription OAuth) go through
* microsandbox `Secret.env()` TLS substitution — the cleartext value never
* enters the VM. Inside the sandbox the env var contains the
* `$MSB_<env-var-name>` placeholder; microsandbox swaps it for the real value
* on outbound TLS to the allowed host only.
*
* The resolved value's prefix picks which env var name carries the placeholder:
* - `sk-ant-oat…` (Claude Code subscription OAuth, issued by `claude setup-token`)
* → `CLAUDE_CODE_OAUTH_TOKEN`
* - anything else (API keys for known agents, custom-agent secrets)
* → `secret.envVar` (= `ANTHROPIC_API_KEY` for claude, etc.)
* The auth mode (which env var to use) is already decided by config validation
* in `core/config.ts` — this function just injects whatever `secret.envVar`
* says via microsandbox `Secret.env()` TLS substitution.
*
* Mutates `secrets` and `env` in place.
*/
Expand All @@ -84,36 +69,18 @@ export function applyAgentAuth(
if (!secret.envVar || !secret.baseUrl) {
throw new Error('Agent secret must have envVar and baseUrl set (should be filled by config validation)');
}
const value = resolveValue(secret.value, secret.envVar);

const envVar = value.startsWith(OAUTH_TOKEN_PREFIX)
? OAUTH_TOKEN_ENV_VAR
: secret.envVar;
const value = resolveSecretValue(secret.value, secret.envVar);

const hostname = new URL(secret.baseUrl).hostname;
const allowHosts = [hostname, ...adapter.additionalAllowHosts];
secrets.push(Secret.env(envVar, { value, allowHosts }));
secrets.push(Secret.env(secret.envVar, { value, allowHosts }));

const baseUrlVar = secret.baseUrlEnvVar ?? adapter.baseUrlEnvVar;
if (baseUrlVar) {
env[baseUrlVar] = secret.baseUrl;
}
}

function resolveValue(value: string, envVar: string): string {
if (value.startsWith('$')) {
const hostVar = value.slice(1);
const hostValue = process.env[hostVar];
if (hostValue === undefined) {
throw new Error(
`Environment variable '${hostVar}' referenced in sandbox config for ${envVar} is not set on the host`,
);
}
return hostValue;
}
return value;
}

export class MicrosandboxClient {
private sandbox: Sandbox | null = null;
private readonly config: SandboxConfig;
Expand Down
24 changes: 3 additions & 21 deletions src/types.ts
Expand Up @@ -109,28 +109,10 @@ export interface AgentConfig {
logPattern?: string;
}

/** Agent config for sandboxed execution (executor/judge).
*
* Both auth modes flow through microsandbox `Secret.env()` TLS substitution —
* the cleartext credential never enters the VM. Inside the sandbox the env
* var contains a `$MSB_<name>` placeholder; microsandbox swaps it for the
* real value on outbound TLS to the allowed host only.
*
* The resolved `secret.value`'s prefix picks which env var name carries the
* placeholder:
*
* - `sk-ant-oat…` (Claude Code subscription OAuth token, issued by
* `claude setup-token`, requires Pro / Max / Team / Enterprise) →
* `CLAUDE_CODE_OAUTH_TOKEN`. Avoids per-token API billing.
* - anything else (API keys for known agents, custom-agent secrets) →
* `secret.envVar` (= `ANTHROPIC_API_KEY` for claude, etc.).
*
* Point `secret.value` at the host env var that holds the credential —
* `$ANTHROPIC_API_KEY` for the API-key path, `$CLAUDE_CODE_OAUTH_TOKEN` for
* the subscription path.
*/
/** Agent config for sandboxed execution (executor/judge). Secret is required for microsandbox TLS injection.
* Auth mode (which env var carries the secret) is resolved at config-load time by inspecting
* the credential's value prefix against the adapter's `oauthValuePrefix`. */
export interface SandboxAgentConfig extends AgentConfig {
/** Agent's secret and base URL. Auth mode is determined from the resolved value's prefix. */
secret: AgentSecretConfig;
}

Expand Down