Hard spending limits for AI agents on AWS - outside the runtime, at cloud scale.
AWS makes it surprisingly difficult to set hard budget limits. CloudWatch billing alarms are delayed by hours. AWS Budgets can send notifications but can't stop a running workload. There's no native way to say "this workload can spend $5 on Bedrock, Anthropic, or OpenAI and then stop." For human operators that's annoying. For autonomous AI agents that make their own API calls, loop on failures, and spawn sub-agents - it's a real problem.
Klanker Maker is an open-source platform that puts enforceable spending limits between your AI agents and your AWS bill. It turns declarative YAML profiles into budget-capped, policy-locked AWS sandboxes - each with its own Security Group boundary, IAM role, network allowlists, and a dollar ceiling that actually stops the workload when the money runs out. The enforcement lives in the infrastructure (proxy layer + IAM revocation), not in the agent runtime, so no agent can spend past its budget regardless of how it's built or what SDK it uses.
Define what an agent is allowed to do. Set how much it can spend on compute and AI tokens. Walk away.
Art by Mike Wigmore (@mikewigmore)
$ km create profiles/goose.yaml
✓ Profile validated
✓ Budget: compute $0.50, AI $1.00, warning at 80%
✓ Budget enforcer Lambda deployed
✓ Metadata stored (alias: goose)
──────────────────────────────────────────────────
Sandbox goose-e6c7d024 created successfully. (43s)
TTL: 4h (expires 11:42:15 PM EDT)
$ km list
# ALIAS SANDBOX ID STATUS TTL
1 goose goose-e6c7d024 running 3h42m
$ km status goose-e6c7d024
Sandbox ID: goose-e6c7d024
Profile: goose
Substrate: ec2
Region: us-east-1
Status: running
Created At: 2026-03-31 7:42:15 PM EDT
TTL Expiry: 2026-03-31 11:42:15 PM EDT
Budget:
Compute: $0.0312 / $0.5000 (6.2%)
AI: $0.4200 / $1.0000 (42.0%)
The agent ecosystem is exploding. My GitHub stars tell the story - in the last few months alone I've starred:
Coding agents that need real compute, real network, and real credentials:
- Goose - Block's autonomous agent that installs deps, edits files, runs tests, orchestrates workflows
- Aider - AI pair programming in your terminal with automatic git commits
- OpenDev - open-source coding agent in the terminal
- open-swe - LangChain's asynchronous coding agent
- DeepCode - agentic coding for Paper2Code, Text2Web, Text2Backend
- deepagents - LangGraph harness with planning, filesystem, and sub-agent spawning
Multi-agent orchestrators that spawn fleets of workers:
- agent-orchestrator - parallel coding agents with autonomous CI fixes and code reviews
- nanoclaw - lightweight agent on Anthropic's Agent SDK, runs in containers
- openclaw - personal AI assistant across platforms
- pi-mono - coding agent CLI, unified LLM API, Slack bot, vLLM pods
- gobii-platform - always-on AI workforce
- autoresearch - Karpathy's agents running research on single-GPU training automatically
Security and red-team agents that definitely need containment:
- redamon - AI-powered red team framework, recon to exploitation, zero human intervention
- raptor - turns Claude Code into an offensive/defensive security agent
- hexstrike-ai - 150+ cybersecurity tools orchestrated by AI agents
- strix - open-source AI hackers that find and fix vulnerabilities
- shannon - autonomous white-box AI pentester for web apps and APIs
- EVA - AI-assisted penetration testing agent
Sandbox platforms solving the same problem from different angles:
- agent-sandbox - Kubernetes SIG for isolated agent runtimes
- E2B - secure cloud environments for enterprise agents
- OpenSandbox - Alibaba's sandbox platform with Docker/K8s runtimes
- void-box - composable agent runtime with enforced isolation
- monty - Pydantic's minimal, secure Python interpreter in Rust for AI
Every one of these projects needs somewhere safe to run. The common pattern is either "trust the agent" (bad) or "containerize it locally" (insufficient - no real cloud resources, no real credentials, no real network). What's missing is cloud-native physical isolation - a real VPC, real IAM boundaries, real network controls, with a budget ceiling that prevents a $10 experiment from becoming a $10,000 AWS bill.
That's what Klanker Maker builds. The sandbox is a compiled policy object - you declare the constraints and the infrastructure is the compiled artifact:
- Budget ceiling - set a dollar cap for compute and AI API spend per sandbox. At 80% you get a warning email. At 100% the sandbox is suspended, not destroyed - top it up and resume.
- Network enforcement (three modes) - choose your enforcement layer per profile:
  - Proxy mode (default) - iptables DNAT redirects traffic to userspace proxy sidecars for MITM inspection. Traditional approach, works everywhere.
  - eBPF mode - Cilium-style cgroup BPF programs enforce DNS/HTTP/TLS-SNI allowlists directly in the kernel. No iptables, no DNAT bypass possible (fixes the root-user escape). Four BPF programs (connect4, sendmsg4, sockops, egress) with an LPM trie for CIDR allowlists, a userspace DNS resolver that populates BPF maps on the fly, and a ring buffer for structured audit events. E2E verified on AL2023 kernel 6.18 across 14 iterations.
  - Both mode (gatekeeper) - eBPF `connect4` as the primary enforcer in block mode, with selective DNAT rewrite to a transparent proxy for L7-required hosts. The proxy recovers original destinations via pinned BPF maps (`src_port_to_sock` → `sock_to_original_ip`). Non-L7 traffic flows direct — never touches the proxy. E2E verified: allowed repos clone, blocked repos get 403, evil.com gets EPERM, non-proxy hosts go direct.
- Scoped identity - each sandbox gets its own IAM role session, region-locked, time-limited, with only the permissions the profile declares.
- Automatic lifecycle - TTL auto-destroy, idle timeout, artifact upload on exit (including on spot interruption), and email notifications for every lifecycle event.
- Spot-first economics - EC2 Spot and Fargate Spot by default. A `t3.medium` spot instance in `us-east-1` costs ~$0.01/hr. Run 10 agent sandboxes for a full workday for under $1 in compute.
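The arithmetic behind that "under $1" figure, as a quick sanity check (the spot rate is the approximate number quoted above, not a live price):

```python
# Back-of-the-envelope cost of a fleet of spot sandboxes.
sandboxes = 10
workday_hours = 8
spot_rate_per_hr = 0.01  # ~t3.medium spot in us-east-1 (approximate)

total = sandboxes * workday_hours * spot_rate_per_hr
print(f"${round(total, 2)}")  # $0.8 in compute for the full workday
```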
The difference between Klanker Maker and the other sandbox platforms: this is pure AWS infrastructure - no orchestration layer to trust, no shared runtime, no container escape surface. Each region has a shared VPC (provisioned once by km init), and every sandbox gets its own Security Groups, IAM role, and sidecar enforcement. The isolation is at the network policy and identity layer, backed by real AWS primitives.
# Profile-level enforcement selection
spec:
  network:
    enforcement: "ebpf"   # or "proxy" (default) or "both"
    egress:
      allowedDNSSuffixes: [".amazonaws.com", ".github.com", ...]
      allowedHosts: ["api.anthropic.com", ...]

Klanker Maker follows AWS Organizations best practices, supporting either a three-account or two-account topology. In both models, sandboxes run in a dedicated application account - completely separated from the account that owns the domain and applies SCP policies.
| Account | Role | What Lives Here | Why Separate |
|---|---|---|---|
| Management | DNS, identity, org root | Route53 hosted zone, domain registration, AWS SSO, Organizations root | Domain and identity are org-wide - they don't belong in a sandbox blast radius |
| Terraform | State and provisioning | S3 state buckets, DynamoDB lock tables, cross-account provisioning role | Terraform state contains every resource ARN and secret path - isolating it limits exposure if the application account is compromised |
| Application | Sandbox execution | Regional VPCs, EC2/ECS instances, IAM sandbox roles, SES, Lambda handlers, DynamoDB budget table, S3 artifacts, CloudWatch Logs | This is where agents run - if an agent escapes its sandbox, it can only reach resources in this account, not state or DNS |
In a two-account topology, the Terraform and Application accounts are the same - set both account IDs to the same value during km configure. This is simpler for development while keeping the management account separate for SCP policies and DNS.
The operator authenticates via AWS SSO with named profiles:
# Authenticate to all three accounts
aws sso login --profile klanker-management # DNS, domain
aws sso login --profile klanker-terraform # State, provisioning
aws sso login --profile klanker-application # VPC/network init

The km CLI selects the right AWS profile per command. Commands are grouped by workflow stage:
Setup (once)
| Command | AWS Profile | What it does |
|---|---|---|
| `km configure` | - | Set domain, account IDs, SSO URL, region, `--max-sandboxes` |
| `km configure github` | klanker-terraform | Configure GitHub App token integration |
| `km bootstrap` | klanker-terraform | Deploy SCP containment policy + KMS key + artifacts bucket |
| `km init` | klanker-application | Build Lambdas/sidecars, provision shared VPC/network |
| `km doctor` | klanker-terraform | Validate platform health across all accounts (20 checks) |
| `km info` | - | Show platform config, accounts, SES quota, AWS spend, DynamoDB tables |
Sandbox lifecycle
| Command | AWS Profile | What it does |
|---|---|---|
| `km validate <profile>` | - | Check a profile YAML against the schema |
| `km create <profile>` | klanker-terraform | Provision a sandbox from a profile (`--no-bedrock`, `--docker`, `--alias`, `--on-demand`, `--ttl`, `--idle`) |
| `km clone <sandbox>` | klanker-terraform | Duplicate a running sandbox (`--alias`, `--count`, `--no-copy`) |
| `km list` (alias: `ls`) | klanker-terraform | List sandboxes with live status (DynamoDB scan; `--wide`, `--json`, `--tags`) |
| `km status <sandbox>` | klanker-terraform | Budget, identity, idle countdown, resources |
| `km shell <sandbox>` (alias: `sh`) | klanker-terraform | SSM session (`--root`, `--ports`, `--no-bedrock`, `--learn`) |
| `km agent <sandbox>` | klanker-terraform | Launch AI agent in sandbox (`--claude`, `--codex`) |
Observability
| Command | AWS Profile | What it does |
|---|---|---|
| `km logs <sandbox>` | klanker-terraform | Tail CloudWatch audit logs |
| `km otel <sandbox>` | klanker-terraform | AI spend summary by provider + OTEL S3 data |
| `km otel --prompts` | klanker-terraform | User prompts with timestamps |
| `km otel --timeline` | klanker-terraform | Conversation turns with per-turn cost |
| `km otel --events` | klanker-terraform | Full event stream (API calls, tool calls) |
| `km otel --tools` | klanker-terraform | Tool call history with parameters and duration |
Budget and lifecycle management
| Command | AWS Profile | What it does |
|---|---|---|
| `km budget add <sandbox>` | klanker-terraform | Top up compute or AI budget |
| `km extend <sandbox> <dur>` | klanker-terraform | Add time before TTL expires |
| `km pause <sandbox>` | klanker-terraform | Pause (hibernate) instance, preserve infrastructure |
| `km stop <sandbox>` | klanker-terraform | Stop instance, preserve infrastructure |
| `km lock <sandbox>` | klanker-terraform | Lock sandbox to prevent accidental destroy/stop/pause |
| `km unlock <sandbox>` | klanker-terraform | Unlock sandbox, re-enable lifecycle commands |
| `km rsync save/load <sandbox>` | klanker-terraform | Save/restore sandbox home directory snapshots |
| `km resume <sandbox>` | klanker-terraform | Resume a paused or stopped sandbox |
| `km destroy <sandbox>` | klanker-terraform | Teardown sandbox (`--remote` by default; forced local for Docker substrate) |
| `km kill <sandbox>` | klanker-terraform | Alias for `km destroy` |
| `km at '<time>' <cmd>` | klanker-terraform | Schedule a deferred/recurring operation (create, destroy, pause, resume, budget-add, etc.) |
| `km at list` | klanker-terraform | List scheduled operations |
| `km at cancel <name>` | klanker-terraform | Cancel a scheduled operation |
| `km roll` | klanker-terraform | Rotate platform and sandbox credentials (`--platform`, `--sandbox`, `--dry-run`) |
Email (operator-side)
| Command | AWS Profile | What it does |
|---|---|---|
| `km email send` | klanker-terraform | Send signed email between sandboxes or to/from operator (`--cc`, `--use-bcc`, `--reply-to`) |
| `km email read <sandbox>` | klanker-terraform | Read a sandbox mailbox with signature verification and auto-decryption (`--json`, `--mark-read`) |
Teardown (reverse of setup)
| Command | AWS Profile | What it does |
|---|---|---|
| `km uninit` | klanker-application | Destroy all shared regional infrastructure |
Klanker Maker is forkable. All platform-specific values - domain, account IDs, SSO URL, region preferences - are configurable via km configure:
km configure
Domain: mysandboxes.example.com
Management account ID: 111111111111
Terraform account ID: 222222222222
Application account ID: 333333333333
SSO start URL: https://myorg.awsapps.com/start
Primary region: us-east-1

No hardcoded account IDs. No hardcoded domains. A fork with a different domain works end-to-end after km configure.
Klanker Maker uses explicit allowlists everywhere - if it's not in the policy, it's denied. There is no "default allow."
Sandboxes are accessed exclusively through AWS SSM Session Manager:
- Zero open inbound ports - Security Groups have no SSH ingress rules. Port 22 doesn't exist.
- No SSH keys to manage - no generation, rotation, distribution, or leaked keys on GitHub.
- IAM-gated access - who can connect is controlled by IAM policy, not by who has a `.pem` file.
- Full session audit - every session and every command is logged to CloudTrail and CloudWatch. There is no "off the record."
- No bastion hosts - no jump boxes, no VPN. SSM connects through the agent, even in private subnets with no internet access.
Even if a sandbox IAM role is misconfigured - or an agent finds a way to escalate within the application account - the Service Control Policy (SCP) acts as an org-level backstop that cannot be bypassed from within the account. SCPs are enforced by AWS Organizations at the API layer, before IAM policy evaluation.
The km-sandbox-containment SCP is deployed to the management account and attached to the application account. It contains 6 deny statements:
| Statement | What It Blocks | Why It Matters |
|---|---|---|
| DenyInfraAndStorage | SG mutation, VPC/subnet/route/IGW/NAT creation, VPC peering, Transit Gateway, snapshot/image creation and export | A compromised sandbox cannot open new network paths, create escape routes to the internet, peer with other VPCs, or exfiltrate data via EBS snapshots or AMI copies |
| DenyInstanceMutation | RunInstances, ModifyInstanceAttribute, ModifyInstanceMetadataOptions | Prevents launching rogue EC2 instances or disabling IMDSv2 (which would enable SSRF credential theft via the metadata service) |
| DenyIAMEscalation | CreateRole, AttachRolePolicy, DetachRolePolicy, PassRole, AssumeRole | Blocks the classic IAM privilege escalation chain: create a new admin role → attach AdministratorAccess → assume it |
| DenySSMPivot | SendCommand, StartSession | Prevents a compromised sandbox from using SSM to pivot laterally into other sandbox instances |
| DenyOrgDiscovery | organizations:List*, organizations:Describe* | Prevents enumeration of the org structure, other accounts, and OUs - information useful for targeting lateral movement |
| DenyOutsideRegion | All regional actions outside allowed regions | Region-locks the entire account to prevent resource creation in regions where there's no monitoring or VPC infrastructure |
Each statement uses ArnNotLike conditions to carve out trusted operator roles (SSO, provisioner, lifecycle handlers). The carve-outs are minimal - for example, the budget enforcer Lambda only gets an IAM carve-out (it needs AttachRolePolicy/DetachRolePolicy to revoke Bedrock access), not a network or instance carve-out.
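To make the carve-out pattern concrete, here is a sketch of what one such deny statement could look like. The action list is abbreviated and the exempted role ARNs are placeholders (using the example application account ID from `km configure` above), not the shipped policy:

```json
{
  "Sid": "DenyIAMEscalation",
  "Effect": "Deny",
  "Action": [
    "iam:CreateRole",
    "iam:AttachRolePolicy",
    "iam:DetachRolePolicy",
    "iam:PassRole",
    "sts:AssumeRole"
  ],
  "Resource": "*",
  "Condition": {
    "ArnNotLike": {
      "aws:PrincipalArn": [
        "arn:aws:iam::333333333333:role/aws-reserved/sso.amazonaws.com/*",
        "arn:aws:iam::333333333333:role/km-budget-enforcer-example"
      ]
    }
  }
}
```

Because SCPs are evaluated before IAM, the deny applies to every principal in the account except those matching the `ArnNotLike` patterns.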
The SCP is deployed via km bootstrap --dry-run=false. Run km bootstrap --show-prereqs to see the exact IAM role and trust policy that must be created in the management account first.
| Layer | Control | Enforcement |
|---|---|---|
| Organization | SCP sandbox containment | Org-level deny on SG/network/IAM/instance/SSM/region - cannot be bypassed from within the account |
| Account | Three-account isolation | Sandbox blast radius limited to Application account; state and DNS unreachable |
| Network | VPC Security Groups | Primary boundary - blocks all egress except proxy paths |
| DNS | DNS proxy sidecar / eBPF resolver | Allowlisted suffixes only; non-matching → NXDOMAIN |
| HTTP | HTTP proxy sidecar / eBPF connect4 | Allowlisted hosts only; non-matching → 403 / EPERM |
| eBPF | Cgroup BPF programs (connect4, sendmsg4, sockops, egress) | Kernel-level enforcement; LPM trie allowlist; ring buffer audit; no root bypass |
| Identity | Scoped IAM sessions | Region-locked, time-limited, minimal permissions |
| Email | Ed25519 signed email | Per-sandbox key pairs; profile-controlled signing, verification, and encryption policies |
| Secrets | SSM Parameter Store + KMS | Allowlisted refs only; per-sandbox encryption key with auto-rotation |
| Metadata | IMDSv2 enforced | Token-required; blocks SSRF credential theft via instance metadata |
| Source | GitHub App scoped tokens | Per-repo, per-ref, per-permission; short-lived installation tokens refreshed via Lambda |
| Filesystem | Path-level enforcement | Writable vs read-only directories at OS level |
| Audit | Command + network logging | Secret-redacted; delivered to CloudWatch/S3 |
| TLS Observability | eBPF SSL uprobes (OpenSSL, Go, BoringSSL) | Passive plaintext capture without MITM certs; independent audit trail |
| Telemetry | OTEL observability | Claude Code prompts, tool calls, API requests, cost metrics → OTel Collector → S3 |
| Budget | Compute + AI spend tracking | DynamoDB real-time metering; proxy 403 + IAM revocation at ceiling |
When spec.network.enforcement is set to "ebpf" or "both", the sandbox uses Cilium-style cgroup BPF programs instead of (or alongside) iptables DNAT. This is the same approach used by Cilium in Kubernetes - attaching BPF programs to a cgroup to intercept all network syscalls from processes in that group. E2E verified across 14+ iterations on AL2023 kernel 6.18.
Four BPF programs, one cgroup:
Sandbox Cgroup (/sys/fs/cgroup/km.slice/km-{id}.scope)
│
├── cgroup/connect4 — TCP connect() hook
│ ├── Dual-PID exemption (enforcer + proxy sidecar)
│ ├── LPM trie lookup: is dest IP in allowed_cidrs?
│ ├── If denied → return EPERM (connection refused)
│ ├── If allowed + proxy-marked → stash original dest, rewrite to 127.0.0.1:3128
│ └── Emit structured audit event to ring buffer
│
├── cgroup/sendmsg4 — UDP sendmsg() hook
│ ├── Intercept DNS (port 53)
│ └── Redirect to local resolver (127.0.0.1:53)
│
├── sockops — TCP state transitions
│ └── Map source_port → socket_cookie (transparent proxy recovers real dest)
│
└── cgroup_skb/egress — Packet-level backstop
├── Parse IPv4 header, check allowed_cidrs
└── Drop packets to non-allowlisted IPs (L3 defense-in-depth)
How the allowlist stays fresh: A userspace DNS resolver (127.0.0.1:53) checks every DNS query against the profile's allowedDNSSuffixes. Allowed queries are forwarded to VPC DNS; resolved IPs are injected into the BPF allowed_cidrs LPM trie map with TTL-based expiry. For L7-required hosts (GitHub, Bedrock), IPs are also inserted into http_proxy_ips for selective proxy redirect. The allowlist is dynamic — it grows as the agent resolves new hosts and shrinks as DNS TTLs expire.
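The suffix check that gates each DNS query can be sketched in a few lines. This is an illustrative minimal version (not the shipped resolver), assuming the suffix semantics described above: a suffix like `.github.com` matches the apex domain and any subdomain.

```python
# Sketch of the resolver's allowlist gate: allowed queries are forwarded
# to VPC DNS; everything else is answered with NXDOMAIN.
ALLOWED_DNS_SUFFIXES = [".amazonaws.com", ".github.com", ".anthropic.com"]

def dns_query_allowed(qname: str, suffixes=ALLOWED_DNS_SUFFIXES) -> bool:
    """True if the queried name matches an allowlisted suffix."""
    name = qname.rstrip(".").lower()  # normalize trailing dot and case
    for suffix in suffixes:
        bare = suffix.lstrip(".")
        # ".github.com" matches github.com itself and any subdomain of it
        if name == bare or name.endswith("." + bare):
            return True
    return False

print(dns_query_allowed("api.github.com."))  # True  -> forward to VPC DNS
print(dns_query_allowed("evil.com"))         # False -> answer NXDOMAIN
```

Only names that pass this gate ever resolve, so only their IPs ever enter the `allowed_cidrs` trie.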
Why cgroups? The BPF programs are scoped to the sandbox cgroup, not the whole instance. The enforcer process, SSM agent, and sidecars run outside the cgroup and are unaffected. This is the same isolation model that makes this approach portable to EKS pods, Docker cgroups, and other container runtimes in future substrates.
Transparent proxy (both mode): When connect4 rewrites a connection's destination to the local proxy, the sandbox app sends raw TLS (not HTTP CONNECT). A TransparentListener in the HTTP proxy peeks the first byte (0x16 = TLS ClientHello), then recovers the original destination via a three-step BPF map lookup chain: src_port_to_sock[peer_port] → sock_to_original_ip[cookie] → sock_to_original_port[cookie]. This enables L7 inspection (GitHub repo filtering, Bedrock token metering) without HTTP_PROXY environment variable cooperation from the client.
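The first-byte peek is the key trick: the listener inspects the leading byte without consuming it, so the TLS handshake still arrives intact at whatever handler takes over. A minimal Python sketch of just that step (illustrative only; the real TransparentListener lives in the proxy sidecar):

```python
import socket

TLS_HANDSHAKE = 0x16  # first byte of a TLS record carrying a ClientHello

def peek_is_tls(conn: socket.socket) -> bool:
    """Peek one byte without consuming it; 0x16 means raw TLS,
    anything else (e.g. 'G' for GET) means plaintext HTTP."""
    first = conn.recv(1, socket.MSG_PEEK)
    return len(first) == 1 and first[0] == TLS_HANDSHAKE

# A local socketpair stands in for a DNAT-redirected connection.
client, server = socket.socketpair()
client.sendall(b"\x16\x03\x01\x00\xc5")  # start of a TLS ClientHello record
print(peek_is_tls(server))   # True - and the bytes are still unread
print(server.recv(5)[0] == 0x16)  # True - record head intact after the peek
client.close(); server.close()
```

After classifying the connection, the proxy consults the BPF map chain to learn where the client was actually trying to go.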
Editable diagram: docs/diagrams/ebpf-architecture.excalidraw
Alongside kernel-level enforcement, eBPF uprobes provide passive TLS plaintext capture for audit and observability — without MITM certificates. E2E verified on AL2023 with 8 probes attaching to OpenSSL 3.2.2:
| TLS Library | Used By | Uprobe Target | Status |
|---|---|---|---|
| OpenSSL (libssl.so.3) | curl, wget, Python, Ruby | SSL_write / SSL_read / SSL_write_ex / SSL_read_ex | E2E verified (8 probes) |
| Go crypto/tls | Goose (if Go) | writeRecordLocked / Read | Schema-ready (per-RET offsets, no uretprobe) |
| BoringSSL (Bun) | Claude Code | SSL_write | Schema-ready (byte-pattern offset discovery) |
| rustls | Future Rust agents | rustls_connection_write_tls | Schema-ready |
What uprobes add that MITM can't: Visibility into traffic that bypasses the proxy (if any), an audit trail independent of proxy logs, and plaintext capture without certificate trust issues. The observer logs structured JSON events with HTTP method, URL, host, and response status for every TLS connection. Git smart-HTTP (clone/push) uses HTTP/1.1 and is captured correctly.
What uprobes can't replace: Active request blocking (uprobes are passive — they observe but cannot deny), HTTP/2 body parsing (GitHub API and Bedrock use HTTP/2 — uprobe captures HPACK-compressed binary, not parseable HTTP/1.1), and the transparent proxy's active enforcement (repo filtering, budget 403s).
The two eBPF layers work together:
- Phase 40 (enforcement): cgroup BPF programs decide allow/deny/redirect at the kernel level
- Phase 41 (observability): SSL uprobes capture TLS plaintext for audit logging alongside the MITM proxy
Editable source: docs/sandbox-architecture.excalidraw - open in excalidraw.com or the VS Code Excalidraw extension.
- Configure with `km configure` (alias: `km conf`) - set your domain, account IDs, SSO URL, region (once)
- Bootstrap with `km bootstrap` - deploys SCP, KMS key, artifacts bucket (once)
- Initialize with `km init --region us-east-1` - builds Lambdas/sidecars, provisions VPC, DynamoDB (budgets, identities, sandboxes), SES, TTL handler (once per region)
- Check with `km doctor` - validates all 12 platform health checks, assumes cross-account role for SCP verification
- Define a SandboxProfile in YAML - budget, lifecycle, network policy, identity, sidecars
- Validate with `km validate <profile.yaml>`
- Create with `km create <profile>` - compiles to Terragrunt inputs, provisions infrastructure, shows elapsed time
- Monitor with `km list` (alias-first, lock icons, narrow default, `--wide` for all columns) / `km status` (budget, identity, idle countdown with color) / `km logs` / `km otel` (telemetry + spend)
- Connect with `km shell 1` (restricted user) or `km shell 1 --root` (operator access)
- Port forward with `km shell 1 --ports 8080:80,3000` (Docker-style syntax)
- Extend with `km extend 1 2h` - add time before TTL expires
- Stop with `km stop 1` - stop instance, preserve infrastructure for restart
- Top-up with `km budget add 1 --compute 5.00` - add compute or AI budget
- Destroy with `km destroy 1` (Lambda-dispatched by default) or `km destroy 1 --remote=false` (local terragrunt)
- Teardown with `km uninit --region us-east-1` - reverse of init, destroys all regional infrastructure
Profiles use a Kubernetes-style schema at klankermaker.ai/v1alpha1. Here's the goose profile - a working example that provisions a Goose agent sandbox with Bedrock access, budget enforcement, OTEL telemetry, hibernation support, EFS shared storage, and GitHub repo allowlisting:
apiVersion: klankermaker.ai/v1alpha1
kind: SandboxProfile
metadata:
  name: goose
  labels:
    tier: development
    tool: goose
  prefix: gebpfgk
spec:
  lifecycle:
    ttl: "4h"
    idleTimeout: "1h"
    teardownPolicy: stop
  runtime:
    substrate: ec2
    spot: false
    instanceType: t3.medium
    region: us-east-1
    rootVolumeSize: 15
    hibernation: true        # preserve RAM state on pause (on-demand only)
    mountEFS: true           # mount regional EFS shared filesystem
    efsMountPoint: /shared   # EFS mount path (default: /shared)
    additionalVolume:        # extra EBS volume for data
      size: 20               # GB
      mountPoint: /data
  execution:
    shell: /bin/bash
    workingDir: /workspace
    useBedrock: true         # route Anthropic API via AWS Bedrock (SigV4 auth)
    privileged: false
    env:
      SANDBOX_MODE: goose-ebpf-gatekeeper
      GOOSE_PROVIDER: aws_bedrock
      GOOSE_MODEL: us.anthropic.claude-opus-4-6-v1
      GOOSE_MODE: auto
      GOOSE_TELEMETRY_ENABLED: "false"
      CODEX_CA_CERTIFICATE: /usr/local/share/ca-certificates/km-proxy-ca.crt
      OPENAI_API_KEY: ""
    configFiles:
      "/home/sandbox/.claude/settings.json": |
        {"trustedDirectories":["/home/sandbox","/workspace"]}
    rsyncPaths:
      - ".gitconfig"
      - ".config/goose"
      - ".claude"
      - ".claude.json"
      - ".codex"
    initCommands:
      - "yum install -y git nodejs npm python3 python3-pip bzip2 jq tar gzip unzip tmux"
      - "HOME=/root curl -fsSL https://github.com/block/goose/releases/download/stable/download_cli.sh | HOME=/root CONFIGURE=false bash"
      - "npm install -g @anthropic-ai/claude-code@2.1.108"
      - "mkdir -p /workspace && chown -R sandbox:sandbox /workspace"
  budget:
    compute:
      maxSpendUSD: 0.50
    ai:
      maxSpendUSD: 1.00
    warningThreshold: 0.80
  network:
    enforcement: both   # "proxy" (default), "ebpf", or "both"
    egress:
      allowedDNSSuffixes:
        - ".amazonaws.com"
        - ".anthropic.com"
        - ".claude.ai"
        - ".claude.com"
        - ".sentry.io"
        - ".cloudfront.net"
        - ".github.com"
        - ".githubusercontent.com"
        - ".npmjs.org"
        - ".npmjs.com"
        - ".nodejs.org"
        - ".npmmirror.com"
        - ".openai.com"
        - ".chatgpt.com"
        - ".pypi.org"
        - ".pythonhosted.org"
        - ".pulsemcp.com"
        - ".google.com"
        - ".google-analytics.com"
        - ".googletagmanager.com"
        - ".googleapis.com"
        - ".featuregates.org"
        - ".statsig.com"
      allowedHosts:
        - "statsig.com"
  sourceAccess:
    mode: allowlist
    github:
      allowedRepos:
        - "whereiskurt/meshtk"
        - "whereiskurt/defcon.run.34"
        - "whereiskurt/klanker-maker"
      allowedRefs:
        - "main"
        - "develop"
        - "feature/*"
        - "fix/*"
  identity:
    roleSessionDuration: "1h"
    allowedRegions:
      - us-east-1
    sessionPolicy: minimal
  sidecars:
    dnsProxy:
      enabled: true
      image: km-dns-proxy:latest
    httpProxy:
      enabled: true
      image: km-http-proxy:latest
    auditLog:
      enabled: true
      image: km-audit-log:latest
    tracing:
      enabled: true
      image: km-tracing:latest
  observability:
    commandLog:
      destination: cloudwatch
      logGroup: /klankrmkr/sandboxes
    networkLog:
      destination: cloudwatch
      logGroup: /klankrmkr/network
    claudeTelemetry:
      enabled: true
      logPrompts: true
      logToolDetails: true
      learnMode: false
    tlsCapture:              # eBPF SSL uprobe plaintext capture (Phase 41)
      enabled: true
      libraries: [openssl]
      capturePayloads: false
  artifacts:
    paths:
      - /workspace
    maxSizeMB: 500
  email:
    signing: required
    verifyInbound: required
    encryption: required
  cli:
    noBedrock: true

| Profile | TTL | Network | Budget | Use Case |
|---|---|---|---|---|
| `hardened` | 4h | eBPF+proxy (both), AWS services only | No budget section | Production-adjacent testing |
| `sealed` | 1h | Proxy, .anthropic.com + .npmjs.org only | $5 compute / $10 AI | Minimal egress, short-lived execution |
| `goose` | 4h | eBPF+proxy (both), Anthropic, GitHub, npm, PyPI, OpenAI, Goose extensions | $0.50 compute / $1 AI | Goose agent (Block) with Bedrock, MCP extensions |
| `codex` | 4h | Proxy, OpenAI, GitHub | $2 compute / $5 AI | OpenAI Codex agent |
| `ao` | 8h | eBPF+proxy (both), Anthropic, GitHub, npm, OpenAI | $4 compute / $10 AI | Multi-agent orchestration (Claude + Codex + AO) |
| `learn` | 2h | eBPF+proxy (both), wide-open TLD suffixes | $2 compute / $0 AI | Traffic observation for profile generation |
Profiles support inheritance via extends - start from a base and override what you need.
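A minimal sketch of what an inheriting profile could look like. The exact `extends` syntax and its placement in the schema are assumptions for illustration, not taken from the documented schema:

```yaml
# Hypothetical derived profile: inherit everything from goose,
# override only the AI ceiling and TTL.
apiVersion: klankermaker.ai/v1alpha1
kind: SandboxProfile
metadata:
  name: goose-tight
extends: goose            # assumed field name/location
spec:
  lifecycle:
    ttl: "1h"
  budget:
    ai:
      maxSpendUSD: 0.25   # everything else comes from the base profile
```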
Run Claude (or any agent) non-interactively inside a sandbox via km agent run. Prompts are dispatched via SSM SendCommand, agents run in persistent tmux sessions that survive disconnects, and output is stored on disk + S3 for fast retrieval.
# Fire-and-forget — agent runs in tmux, returns immediately
km agent run sb-abc123 --prompt "fix the failing tests"
# Wait for completion — blocks until done, prints JSON result
km agent run sb-abc123 --prompt "What model are you?" --wait
# Interactive — attach to tmux, watch Claude work live (Ctrl-B d to detach)
km agent run sb-abc123 --prompt "refactor auth module" --interactive
# Attach to a running agent's tmux session
km agent attach sb-abc123
# Fetch results (S3 fast path, ~3s)
km agent results sb-abc123
km agent results sb-abc123 | jq '.result'
# List all runs with status
km agent list sb-abc123
# Schedule a future agent run (resumes sandbox if paused)
km at '5pm tomorrow' agent run sb-abc123 --prompt "nightly tests" --auto-start
# Use direct Anthropic API instead of Bedrock
km agent run sb-abc123 --prompt "..." --no-bedrock --wait

Profile defaults: Set `spec.cli.noBedrock: true` to default to direct API. Use `spec.execution.configFiles` to pre-seed Claude settings (trusted directories, etc.).
Klanker Maker is workload-agnostic - any agent that runs on Linux works inside a sandbox. Here's how the controls map to real agent workloads:
| Agent | What It Does | Which Controls Matter |
|---|---|---|
| Goose | Installs deps, edits files, runs tests, orchestrates workflows | Budget cap - prevents runaway AI API costs when Goose loops |
| Aider | AI pair programming with auto git commits | Source access - controls which repos it can push to |
| agent-orchestrator | Spawns parallel coding agents, handles CI fixes autonomously | Budget + TTL - caps fleet cost; each spawned worker inherits the sandbox ceiling |
| deepagents | Planning + filesystem + sub-agent spawning via LangGraph | Network allowlist - limits where sub-agents can reach |
| open-swe | Async coding agent that clones, patches, and PRs | Source access - allowlist repos + refs; block push to protected branches |
| redamon | Automated red team: recon → exploitation → post-exploitation | Sealed profile - air-gapped, no egress, full audit trail |
| raptor | Claude Code as an offensive security agent | Hardened profile - minimal egress, short TTL, every command logged |
| autoresearch | Agents running research on GPU training | Compute budget - prevents a runaway training loop from burning hours of GPU |
| nanoclaw | Anthropic Agent SDK agent connected to messaging apps | HTTP proxy - controls which external APIs the agent can call |
| gobii-platform | Always-on AI workforce | Idle timeout - shuts down workers that stop producing; artifact upload preserves state |
Sandboxes communicate through digitally signed email (SES + Ed25519). Each sandbox gets a unique address derived from its ID (e.g., sb-a1b2c3d4@sandboxes.klankermaker.ai) and an Ed25519 key pair at creation time.
- Signing - outbound emails are signed with the sender's Ed25519 private key (stored in SSM, KMS-encrypted). The signature and sender ID are attached as `X-KM-Signature` and `X-KM-Sender-ID` headers.
- Verification - the receiver fetches the sender's public key from the `km-identities` DynamoDB table and verifies the signature. When `verifyInbound: required`, unsigned or invalid emails are rejected.
- Encryption - optional X25519 key exchange (NaCl box). When `encryption: required`, the sender encrypts the body with the recipient's public key. When `encryption: optional`, it encrypts if the recipient has a published key, plaintext otherwise.
Profile controls (`spec.email.signing`, `spec.email.verifyInbound`, `spec.email.encryption`) govern policy per sandbox. The hardened, sealed, and goose profiles all default to `signing: required`.
This enables multi-agent pipelines where each worker is physically isolated but logically connected - with cryptographic proof of sender identity and optional confidentiality.
| Substrate | How It Works | Cost |
|---|---|---|
| EC2 Spot (default) | Shared regional VPC, per-sandbox SG, spot instance, SSM access, sidecar systemd services | ~$0.01/hr for t3.medium |
| EC2 On-Demand | Same as above, guaranteed capacity | ~$0.04/hr for t3.medium |
| ECS Fargate Spot | Fargate task with sidecar containers, service discovery | ~$0.01/hr for 1 vCPU / 2GB |
| ECS Fargate | Same as above, guaranteed capacity | ~$0.04/hr for 1 vCPU / 2GB |
| Docker (local) | Docker Compose on local machine, sidecar containers, IAM roles via STS | Free (local compute) |
Spot interruption handlers automatically upload artifacts to S3 before instances are reclaimed.
Budget enforcement tracks two spend pools per sandbox, stored in a DynamoDB global table replicated to every region where agents run. Reads from within the sandbox hit the local regional replica with sub-millisecond latency.
Compute spend is tracked as spot rate × elapsed minutes, sourced from the AWS Price List API at sandbox creation. When the compute budget is exhausted, the sandbox is suspended - not destroyed:
- EC2: `StopInstances` preserves the EBS volume. No compute charges accrue while stopped.
- ECS Fargate: artifacts are uploaded, then the task is stopped. Re-provision from the stored S3 profile on top-up.
The HTTP proxy sidecar intercepts every AI API response - Bedrock (invoke-with-response-stream), Anthropic direct (api.anthropic.com, for Claude Code Max/API key users), and OpenAI-compatible endpoints. A tee-reader streams data through to the client without blocking, captures the full response, then extracts token counts asynchronously:
- Bedrock streaming: base64-decodes `{"bytes":"<b64>"}` event-stream wrappers to find `message_start`/`message_delta` payloads
- Anthropic SSE: parses `data:` lines for the same event types
- Non-streaming: reads `usage` from the JSON response body
Tokens are priced against static model rates and atomically incremented in the DynamoDB spend counter.
Dual-layer enforcement at 100%:
- Proxy layer (immediate) - HTTP proxy returns 403 for subsequent AI calls
- IAM layer (backstop) - a Lambda revokes the sandbox IAM role's Bedrock permissions, catching calls that bypass the proxy
km status shows per-model AI spend grouped by provider:
$ km status goose-e6c7d024
Sandbox ID: goose-e6c7d024
Profile: goose
...
Budget:
Compute: $0.0312 / $0.5000 (6.2%)
AI: $0.4200 / $1.0000 (42.0%)
anthropic.claude-sonnet-4-6: $0.85 (89K in / 34K out) # Bedrock
claude-opus-4-6: $0.55 (12K in / 8K out) # Max/API
Claude Code running inside sandboxes exports telemetry (prompts, tool calls, API requests, token usage, cost metrics) via OpenTelemetry through an OTel Collector sidecar to S3. Profile-controlled via `spec.observability.claudeTelemetry`:
observability:
claudeTelemetry:
enabled: true # master switch
logPrompts: true # include actual prompt text
logToolDetails: true # include tool parameters (bash commands, file paths)

km otel provides five views into this data:
$ km otel claude-e6c7d024 # summary: budget + S3 + metrics
$ km otel claude-e6c7d024 --prompts # user prompts with timestamps
$ km otel claude-e6c7d024 --events # full event stream
$ km otel claude-e6c7d024 --tools # tool calls with params + duration
$ km otel claude-e6c7d024 --timeline # conversation turns with per-turn cost
At 80% (configurable via spec.budget.warningThreshold) of either pool, the operator receives an email via SES.
$ km budget add claude-e6c7d024 --ai 3.00
AI budget: $5.00 → $8.00
Proxy: unblocked
IAM: restored
Status: running
Top-up unblocks the proxy, restores IAM permissions, and restarts suspended compute - all in one command.
km CLI / ConfigUI
├── cmd/km/ CLI entry point
├── cmd/configui/ Web dashboard (Go + embedded HTML)
├── cmd/ttl-handler/ Lambda: TTL expiry + artifact upload
├── cmd/budget-enforcer/ Lambda: budget ceiling enforcement
├── cmd/create-handler/ Lambda: remote sandbox creation via EventBridge
├── cmd/email-create-handler/ Lambda: email-driven sandbox creation
├── cmd/github-token-refresher/ Lambda: GitHub App installation token refresh
├── internal/app/cmd/ Cobra commands (configure, bootstrap, init, uninit, validate, create, clone, destroy/kill, pause, resume, lock, unlock, stop, extend, roll, at/schedule, list, status, logs, budget, shell, agent, doctor, otel, info, rsync, email)
├── internal/app/config/ Configuration (config.yaml, env vars, CLI flags)
├── pkg/
│ ├── profile/ SandboxProfile schema, validation, inheritance
│ ├── compiler/ Profile → Terragrunt artifacts (EC2 + ECS paths)
│ ├── ebpf/ eBPF enforcer (cgroup BPF programs, DNS resolver, audit consumer, SSL uprobes)
│ ├── aws/ SDK helpers (S3, SES, CloudWatch, EC2 metadata, DynamoDB, EventBridge Scheduler, identity/signing)
│ ├── terragrunt/ Runner + per-sandbox state isolation
│ ├── lifecycle/ TTL scheduling, idle detection, teardown
│ ├── allowlistgen/ Allowlist generation from observed traffic
│ ├── at/ Deferred/recurring operation scheduling
│ ├── github/ GitHub App token management
│ ├── localnumber/ Persistent local sandbox numbering
│ └── version/ Build version info
├── sidecars/
│ ├── dns-proxy/ DNS allowlist filter (UDP/TCP:53)
│ ├── http-proxy/ HTTP allowlist filter (TCP:3128) + AI token metering (Bedrock, Anthropic, OpenAI)
│ ├── audit-log/ Command + network log router with secret redaction
│ └── tracing/ OTel Collector sidecar (logs, metrics → S3)
├── profiles/ Built-in YAML profiles (sealed, hardened, goose, codex, ao, learn)
└── infra/
├── modules/ Terraform modules
│ ├── network/ VPC, subnets, security groups
│ ├── ec2spot/ Spot + on-demand instances, IMDSv2, IAM
│ ├── ecs-cluster/ ECS cluster, Fargate Spot capacity provider
│ ├── ecs-task/ Task definitions with sidecar containers
│ ├── ecs-service/ Service deployment + service discovery
│ ├── ecs-spot-handler/ Lambda: Fargate Spot interruption → artifact upload
│ ├── efs/ Regional EFS shared filesystem for cross-sandbox data
│ ├── secrets/ SSM Parameter Store + KMS encryption
│ ├── ses/ SES domain, DKIM, inbound email → S3
│ ├── scp/ SCP sandbox containment (deployed to management account)
│ ├── dynamodb-budget/ Budget enforcement table
│ ├── dynamodb-identities/ Sandbox identity public key table
│ ├── dynamodb-sandboxes/ Sandbox metadata table (km-sandboxes)
│ ├── dynamodb-schedules/ Scheduled operations table
│ ├── budget-enforcer/ Lambda: budget ceiling enforcement
│ ├── create-handler/ Lambda: remote sandbox creation
│ ├── email-handler/ Lambda: email-driven operations
│ ├── github-token/ Lambda: GitHub App token refresh
│ ├── s3-replication/ Cross-region artifact replication
│ └── ttl-handler/ Lambda: TTL expiry → artifacts + email + self-cleanup
└── live/ Terragrunt hierarchy (site.hcl, per-sandbox isolation)
# Install
go install github.com/whereiskurt/klankrmkr/cmd/km@latest
# Configure your platform (once)
km configure # or: km conf
# See what's needed in the management account before bootstrap
km bootstrap --show-prereqs
# Bootstrap SCP + KMS + artifacts bucket (once)
km bootstrap --dry-run=false
# Initialize the region - builds Lambdas, sidecars, deploys infra (once per region)
km init --region us-east-1
# Check platform health (20 checks)
km doctor
# Create a sandbox (shows progress dots + elapsed time)
km create profiles/goose.yaml
km create --on-demand profiles/sealed.yaml # skip spot, use on-demand
km create profiles/goose.yaml --no-bedrock # disable Bedrock, use direct API keys
km create profiles/goose.yaml --docker # shortcut for --substrate=docker
km create profiles/goose.yaml --alias mybot # override the sandbox alias
# List sandboxes (narrow default — alias first, live status)
km list
km list --wide # show profile, substrate, region columns
# Status with budget, identity, idle countdown
km status 1
# Connect as restricted user (no sudo)
km shell 1
km shell 1 --root # operator access
# Port forward (Docker-style)
km shell 1 --ports 8080 # localhost:8080 → remote:8080
km shell 1 --ports 8080:80,3000 # multiple ports
# Launch an AI agent inside a sandbox
km agent 1 --claude # interactive Claude Code
km agent run 1 --prompt "fix tests" # headless with prompt
km agent run 1 --prompt "fix tests" --wait # wait for completion
# Extend TTL
km extend 1 2h
# Pause (hibernate) — preserves RAM state
km pause 1
# Resume a paused or stopped sandbox
km resume 1
# Lock to prevent accidental destroy/stop/pause
km lock 1
km unlock 1 --yes
# Stop without destroying
km stop 1
# View audit logs
km logs 1
# OTEL telemetry + AI spend
km otel 1 # summary
km otel 1 --timeline # conversation turns with cost
km otel 1 --prompts # user prompts
km otel 1 --tools # tool call history
# Destroy (remote by default, or local)
km destroy 1 # Lambda-dispatched (default)
km destroy 1 --yes # skip confirmation prompt
km destroy 1 --remote=false # local terragrunt destroy
# km kill is an alias for km destroy
km kill 1 --yes
# Schedule a deferred or recurring operation
km at '10pm tomorrow' create profiles/goose.yaml # one-shot
km at 'every thursday at 3pm' kill 1 # recurring
km at list # list scheduled ops
km at cancel my-schedule-name # cancel one
# km schedule is an alias for km at
# Teardown region infrastructure
km uninit --region us-east-1

| Document | Description |
|---|---|
| User Manual | Full command reference, walkthroughs (Claude Code, Goose, security agents), profile authoring |
| Operator Guide | AWS account setup, KMS, S3, SES, Lambda deployment - everything before km init |
| Profile Reference | Complete YAML schema with every field, type, default, and validation rule |
| Security Model | Deep dive on each security layer, from VPC to IMDSv2 to secret redaction |
| Budget Guide | DynamoDB schema, proxy metering, enforcement flow, threshold configuration |
| Docker Substrate | Running sandboxes locally via Docker Compose (km create --docker) |
| Sidecar Reference | Each sidecar's config, env vars, log formats, EC2 vs ECS deployment |
| Multi-Agent Email | SES setup, sandbox addressing, cross-sandbox orchestration patterns |
| ConfigUI Guide | Web dashboard setup, profile editor, secrets management |
| Phase | Description | Status |
|---|---|---|
| 1 | Schema, Compiler & AWS Foundation | Complete |
| 2 | Core Provisioning & Security Baseline | Complete |
| 3 | Sidecar Enforcement & Lifecycle Management | Complete |
| 4 | Lifecycle Hardening, Artifacts & Email | Complete |
| 5 | ConfigUI Web Dashboard | Complete |
| 6 | Budget Enforcement & Platform Configuration | Complete |
| 7 | Unwired Code Paths | Complete |
| 8 | Sidecar Build & Deployment Pipeline | Complete |
| 9 | Live Infrastructure & Operator Docs | Complete |
| 10 | SCP Sandbox Containment | Complete |
| 11 | Sandbox Auto-Destroy & Metadata Wiring | Complete |
| 12 | ECS Budget Top-Up & S3 Replication | Complete |
| 13 | GitHub App Token Integration | Complete |
| 14 | Sandbox Identity & Signed Email | Complete |
| 15 | km doctor - Platform Health Check | Complete |
| 16 | Documentation Refresh (Phases 6-15) | Complete |
| 17 | Sandbox Email Mailbox & Access Control | Complete |
| 18 | Loose Ends - km init, uninit, bootstrap KMS, github-token | Complete |
| 19 | Budget Enforcement Wiring - EC2 hard stop, IAM revocation | Complete |
| 20 | Anthropic API Metering & Terragrunt Output Suppression | Complete |
| 21 | Bug fixes and mini-features - budget precision, polish | Complete |
| 22 | Remote Sandbox Dispatch - km create/destroy/stop/extend --remote via Lambda | Complete |
| 23 | Email-Driven Operations - operator inbox, email-to-create, safe phrase auth, EventBridge | Complete |
| 24 | Documentation Refresh - docs for Phases 22-32 | Complete |
| 25 | GitHub Source Access Restrictions - repo allowlists, deny-by-default | Complete |
| 26 | Live Operations Hardening - bootstrap, init, TTL, idle, sidecars, CLI polish | Complete |
| 27 | Claude Code OTEL Integration - sandbox observability via built-in telemetry | Complete |
| 28 | OTEL Observability Hardening - timeline view, events, tools flags | Complete |
| 29 | EC2 Hibernation & MaxLifetime Enforcement | Complete |
| 30 | Sandbox Pause, Lock, Unlock & km list Enhancements | Complete |
| 31 | Transparent HTTPS & Audit Log Improvements | Complete |
| 32 | Profile-Scoped Rsync Paths & External File Lists | Complete |
| 33 | EC2 Storage Customization, Hibernation & AMI Selection | Complete |
| 34 | Agent Profiles - Agent Orchestrator, Goose, and Codex | Complete |
| 35 | MITM CA Trust for Python, Node, and Non-System SSL Libraries | Complete |
| 36 | km-sandbox Base Container Image | Complete |
| 37 | Docker Compose Local Substrate | Complete |
| 38 | EKS / Kubernetes Substrate | Planned |
| 39 | DynamoDB Metadata Migration (S3 to DynamoDB) | Complete |
| 40 | eBPF Cgroup Network Enforcement (connect4, sendmsg4, sockops, egress) | Complete |
| 41 | eBPF SSL Uprobe TLS Observability (OpenSSL, Go, BoringSSL) | Complete |
| 42 | eBPF Gatekeeper Mode — connect4 DNAT Rewrite for L7 Proxy | Complete |
| 43 | Regional EFS Shared Filesystem | Complete |
| 44 | km at / km schedule — Deferred & Recurring Operations | Complete |
| 45 | km-send/km-recv Sandbox Scripts & km email send/read CLI | Complete |
| 46 | AI Email-to-Command — Haiku Interprets Free-Form Operator Emails | Complete |
| 47 | Privileged Execution Mode & Learn Profile | Complete |
| 48 | Profile Override Flags for km create (--ttl, --idle) | Complete |
| 49 | Prebaked AMI Support | Planned |
| 50 | km agent Non-Interactive Execution (--prompt, results, list) | Complete |
| 51 | km agent Tmux Sessions (attach, --interactive) | Complete |
| 52 | km clone — Duplicate a Running Sandbox | Complete |
| 53 | Persistent Local Sandbox Numbering | Complete |
See .planning/ROADMAP.md for detailed phase breakdowns and success criteria.
TBD