
Klanker Maker (km)

Hard spending limits for AI agents on AWS - outside the runtime, at cloud scale.

AWS makes it surprisingly difficult to set hard budget limits. CloudWatch billing alarms are delayed by hours. AWS Budgets can send notifications but can't stop a running workload. There's no native way to say "this workload can spend $5 on Bedrock, Anthropic, or OpenAI and then stop." For human operators that's annoying. For autonomous AI agents that make their own API calls, loop on failures, and spawn sub-agents - it's a real problem.

Klanker Maker is an open-source platform that puts enforceable spending limits between your AI agents and your AWS bill. It turns declarative YAML profiles into budget-capped, policy-locked AWS sandboxes - each with its own Security Group boundary, IAM role, network allowlists, and a dollar ceiling that actually stops the workload when the money runs out. The enforcement lives in the infrastructure (proxy layer + IAM revocation), not in the agent runtime, so no agent can spend past its budget regardless of how it's built or what SDK it uses.

Define what an agent is allowed to do. Set how much it can spend on compute and AI tokens. Walk away.

Klanker Maker - robots working inside a sandboxed dome
Art by Mike Wigmore (@mikewigmore)

$ km create profiles/goose.yaml
  ✓ Profile validated
  ✓ Budget: compute $0.50, AI $1.00, warning at 80%
  ✓ Budget enforcer Lambda deployed
  ✓ Metadata stored (alias: goose)
──────────────────────────────────────────────────
Sandbox goose-e6c7d024 created successfully. (43s)
  TTL: 4h (expires 11:42:15 PM EDT)

$ km list
#   ALIAS       SANDBOX ID        STATUS     TTL
1   goose       goose-e6c7d024    running    3h42m

$ km status goose-e6c7d024
Sandbox ID:  goose-e6c7d024
Profile:     goose
Substrate:   ec2
Region:      us-east-1
Status:      running
Created At:  2026-03-31 7:42:15 PM EDT
TTL Expiry:  2026-03-31 11:42:15 PM EDT
Budget:
  Compute: $0.0312 / $0.5000 (6.2%)
  AI:      $0.4200 / $1.0000 (42.0%)

Why This Exists

The agent ecosystem is exploding. My GitHub stars tell the story - in the last few months alone I've starred:

Coding agents that need real compute, real network, and real credentials:

  • Goose - Block's autonomous agent that installs deps, edits files, runs tests, orchestrates workflows
  • Aider - AI pair programming in your terminal with automatic git commits
  • OpenDev - open-source coding agent in the terminal
  • open-swe - LangChain's asynchronous coding agent
  • DeepCode - agentic coding for Paper2Code, Text2Web, Text2Backend
  • deepagents - LangGraph harness with planning, filesystem, and sub-agent spawning

Multi-agent orchestrators that spawn fleets of workers:

  • agent-orchestrator - parallel coding agents with autonomous CI fixes and code reviews
  • nanoclaw - lightweight agent on Anthropic's Agent SDK, runs in containers
  • openclaw - personal AI assistant across platforms
  • pi-mono - coding agent CLI, unified LLM API, Slack bot, vLLM pods
  • gobii-platform - always-on AI workforce
  • autoresearch - Karpathy's agents running research on single-GPU training automatically

Security and red-team agents that definitely need containment:

  • redamon - AI-powered red team framework, recon to exploitation, zero human intervention
  • raptor - turns Claude Code into an offensive/defensive security agent
  • hexstrike-ai - 150+ cybersecurity tools orchestrated by AI agents
  • strix - open-source AI hackers that find and fix vulnerabilities
  • shannon - autonomous white-box AI pentester for web apps and APIs
  • EVA - AI-assisted penetration testing agent

Sandbox platforms solving the same problem from different angles:

  • agent-sandbox - Kubernetes SIG for isolated agent runtimes
  • E2B - secure cloud environments for enterprise agents
  • OpenSandbox - Alibaba's sandbox platform with Docker/K8s runtimes
  • void-box - composable agent runtime with enforced isolation
  • monty - Pydantic's minimal, secure Python interpreter in Rust for AI

Every one of these projects needs somewhere safe to run. The common pattern is either "trust the agent" (bad) or "containerize it locally" (insufficient - no real cloud resources, no real credentials, no real network). What's missing is cloud-native physical isolation - a real VPC, real IAM boundaries, real network controls, with a budget ceiling that prevents a $10 experiment from becoming a $10,000 AWS bill.

That's what Klanker Maker builds. The sandbox is a compiled policy object - you declare the constraints and the infrastructure is the compiled artifact:

  • Budget ceiling - set a dollar cap for compute and AI API spend per sandbox. At 80% you get a warning email. At 100% the sandbox is suspended, not destroyed - top it up and resume.
  • Network enforcement (three modes) - choose your enforcement layer per profile:
    • Proxy mode (default) - iptables DNAT redirects traffic to userspace proxy sidecars for MITM inspection. Traditional approach, works everywhere.
    • eBPF mode - Cilium-style cgroup BPF programs enforce DNS/HTTP/TLS-SNI allowlists directly in the kernel. No iptables, no DNAT bypass possible (fixes the root-user escape). Four BPF programs (connect4, sendmsg4, sockops, egress) with an LPM trie for CIDR allowlists, a userspace DNS resolver that populates BPF maps on the fly, and a ring buffer for structured audit events. E2E verified on AL2023 kernel 6.18 across 14 iterations.
    • Both mode (gatekeeper) - eBPF connect4 as the primary enforcer in block mode, with selective DNAT rewrite to a transparent proxy for L7-required hosts. The proxy recovers original destinations via pinned BPF maps (src_port_to_sock → sock_to_original_ip). Non-L7 traffic flows direct — never touches the proxy. E2E verified: allowed repos clone, blocked repos get 403, evil.com gets EPERM, non-proxy hosts go direct.
  • Scoped identity - each sandbox gets its own IAM role session, region-locked, time-limited, with only the permissions the profile declares.
  • Automatic lifecycle - TTL auto-destroy, idle timeout, artifact upload on exit (including on spot interruption), and email notifications for every lifecycle event.
  • Spot-first economics - EC2 Spot and Fargate Spot by default. A t3.medium spot instance in us-east-1 costs ~$0.01/hr. Run 10 agent sandboxes for a full workday for under $1 in compute.
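That last claim is easy to sanity-check. A quick back-of-the-envelope calculation, assuming the quoted ~$0.01/hr t3.medium spot rate (actual spot prices fluctuate):

```python
# Sanity check of the spot-economics claim: 10 sandboxes for a full
# workday at the quoted ~$0.01/hr t3.medium spot rate in us-east-1.
sandboxes = 10
hours = 8             # one workday
rate_per_hour = 0.01  # approximate spot rate quoted above

total = sandboxes * hours * rate_per_hour
print(f"${total:.2f}")  # → $0.80, comfortably under $1
```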

The difference between Klanker Maker and the other sandbox platforms: this is pure AWS infrastructure - no orchestration layer to trust, no shared runtime, no container escape surface. Each region has a shared VPC (provisioned once by km init), and every sandbox gets its own Security Groups, IAM role, and sidecar enforcement. The isolation is at the network policy and identity layer, backed by real AWS primitives.

# Profile-level enforcement selection
spec:
  network:
    enforcement: "ebpf"   # or "proxy" (default) or "both"
    egress:
      allowedDNSSuffixes: [".amazonaws.com", ".github.com", ...]
      allowedHosts: ["api.anthropic.com", ...]

AWS Account Architecture

Klanker Maker follows AWS Organizations best practices, supporting either a three-account or two-account topology. In both models, sandboxes run in a dedicated application account - completely separated from the account that owns the domain and applies SCP policies.

Security & Network Architecture - 3 accounts, shared VPC, per-sandbox Security Groups

Why Three Accounts?

| Account | Role | What Lives Here | Why Separate |
| --- | --- | --- | --- |
| Management | DNS, identity, org root | Route53 hosted zone, domain registration, AWS SSO, Organizations root | Domain and identity are org-wide - they don't belong in a sandbox blast radius |
| Terraform | State and provisioning | S3 state buckets, DynamoDB lock tables, cross-account provisioning role | Terraform state contains every resource ARN and secret path - isolating it limits exposure if the application account is compromised |
| Application | Sandbox execution | Regional VPCs, EC2/ECS instances, IAM sandbox roles, SES, Lambda handlers, DynamoDB budget table, S3 artifacts, CloudWatch Logs | This is where agents run - if an agent escapes its sandbox, it can only reach resources in this account, not state or DNS |

In a two-account topology, the Terraform and Application accounts are the same - set both account IDs to the same value during km configure. This is simpler for development while keeping the management account separate for SCP policies and DNS.

The operator authenticates via AWS SSO with named profiles:

# Authenticate to all three accounts
aws sso login --profile klanker-management     # DNS, domain
aws sso login --profile klanker-terraform      # State, provisioning
aws sso login --profile klanker-application    # VPC/network init

The km CLI selects the right AWS profile per command. Commands are grouped by workflow stage:

Setup (once)

| Command | AWS Profile | What it does |
| --- | --- | --- |
| km configure | - | Set domain, account IDs, SSO URL, region, --max-sandboxes |
| km configure github | klanker-terraform | Configure GitHub App token integration |
| km bootstrap | klanker-terraform | Deploy SCP containment policy + KMS key + artifacts bucket |
| km init | klanker-application | Build Lambdas/sidecars, provision shared VPC/network |
| km doctor | klanker-terraform | Validate platform health across all accounts (20 checks) |
| km info | - | Show platform config, accounts, SES quota, AWS spend, DynamoDB tables |

Sandbox lifecycle

| Command | AWS Profile | What it does |
| --- | --- | --- |
| km validate &lt;profile&gt; | - | Check a profile YAML against the schema |
| km create &lt;profile&gt; | klanker-terraform | Provision a sandbox from a profile (--no-bedrock, --docker, --alias, --on-demand, --ttl, --idle) |
| km clone &lt;sandbox&gt; | klanker-terraform | Duplicate a running sandbox (--alias, --count, --no-copy) |
| km list (alias: ls) | klanker-terraform | List sandboxes with live status (DynamoDB scan; --wide, --json, --tags) |
| km status &lt;sandbox&gt; | klanker-terraform | Budget, identity, idle countdown, resources |
| km shell &lt;sandbox&gt; (alias: sh) | klanker-terraform | SSM session (--root, --ports, --no-bedrock, --learn) |
| km agent &lt;sandbox&gt; | klanker-terraform | Launch AI agent in sandbox (--claude, --codex) |

Observability

| Command | AWS Profile | What it does |
| --- | --- | --- |
| km logs &lt;sandbox&gt; | klanker-terraform | Tail CloudWatch audit logs |
| km otel &lt;sandbox&gt; | klanker-terraform | AI spend summary by provider + OTEL S3 data |
| km otel --prompts | klanker-terraform | User prompts with timestamps |
| km otel --timeline | klanker-terraform | Conversation turns with per-turn cost |
| km otel --events | klanker-terraform | Full event stream (API calls, tool calls) |
| km otel --tools | klanker-terraform | Tool call history with parameters and duration |

Budget and lifecycle management

| Command | AWS Profile | What it does |
| --- | --- | --- |
| km budget add &lt;sandbox&gt; | klanker-terraform | Top up compute or AI budget |
| km extend &lt;sandbox&gt; &lt;dur&gt; | klanker-terraform | Add time before TTL expires |
| km pause &lt;sandbox&gt; | klanker-terraform | Pause (hibernate) instance, preserve infrastructure |
| km stop &lt;sandbox&gt; | klanker-terraform | Stop instance, preserve infrastructure |
| km lock &lt;sandbox&gt; | klanker-terraform | Lock sandbox to prevent accidental destroy/stop/pause |
| km unlock &lt;sandbox&gt; | klanker-terraform | Unlock sandbox, re-enable lifecycle commands |
| km rsync save/load &lt;sandbox&gt; | klanker-terraform | Save/restore sandbox home directory snapshots |
| km resume &lt;sandbox&gt; | klanker-terraform | Resume a paused or stopped sandbox |
| km destroy &lt;sandbox&gt; | klanker-terraform | Teardown sandbox (--remote by default; forced local for Docker substrate) |
| km kill &lt;sandbox&gt; | klanker-terraform | Alias for km destroy |
| km at '&lt;time&gt;' &lt;cmd&gt; | klanker-terraform | Schedule a deferred/recurring operation (create, destroy, pause, resume, budget-add, etc.) |
| km at list | klanker-terraform | List scheduled operations |
| km at cancel &lt;name&gt; | klanker-terraform | Cancel a scheduled operation |
| km roll | klanker-terraform | Rotate platform and sandbox credentials (--platform, --sandbox, --dry-run) |

Email (operator-side)

| Command | AWS Profile | What it does |
| --- | --- | --- |
| km email send | klanker-terraform | Send signed email between sandboxes or to/from operator (--cc, --use-bcc, --reply-to) |
| km email read &lt;sandbox&gt; | klanker-terraform | Read a sandbox mailbox with signature verification and auto-decryption (--json, --mark-read) |

Teardown (reverse of setup)

| Command | AWS Profile | What it does |
| --- | --- | --- |
| km uninit | klanker-application | Destroy all shared regional infrastructure |

Platform Configuration

Klanker Maker is forkable. All platform-specific values - domain, account IDs, SSO URL, region preferences - are configurable via km configure:

km configure
  Domain:                 mysandboxes.example.com
  Management account ID:  111111111111
  Terraform account ID:   222222222222
  Application account ID: 333333333333
  SSO start URL:          https://myorg.awsapps.com/start
  Primary region:         us-east-1

No hardcoded account IDs. No hardcoded domains. A fork with a different domain works end-to-end after km configure.

Security Model

Klanker Maker uses explicit allowlists everywhere - if it's not in the policy, it's denied. There is no "default allow."

No SSH. No Bastion. No Keys.

Sandboxes are accessed exclusively through AWS SSM Session Manager:

  • Zero open inbound ports - Security Groups have no SSH ingress rules. Port 22 doesn't exist.
  • No SSH keys to manage - no generation, rotation, distribution, or leaked keys on GitHub.
  • IAM-gated access - who can connect is controlled by IAM policy, not by who has a .pem file.
  • Full session audit - every session and every command is logged to CloudTrail and CloudWatch. There is no "off the record."
  • No bastion hosts - no jump boxes, no VPN. SSM connects through the agent, even in private subnets with no internet access.

SCP Sandbox Containment

Even if a sandbox IAM role is misconfigured - or an agent finds a way to escalate within the application account - the Service Control Policy (SCP) acts as an org-level backstop that cannot be bypassed from within the account. SCPs are enforced by AWS Organizations at the API layer, before IAM policy evaluation.

The km-sandbox-containment SCP is deployed to the management account and attached to the application account. It contains 6 deny statements:

| Statement | What It Blocks | Why It Matters |
| --- | --- | --- |
| DenyInfraAndStorage | SG mutation, VPC/subnet/route/IGW/NAT creation, VPC peering, Transit Gateway, snapshot/image creation and export | A compromised sandbox cannot open new network paths, create escape routes to the internet, peer with other VPCs, or exfiltrate data via EBS snapshots or AMI copies |
| DenyInstanceMutation | RunInstances, ModifyInstanceAttribute, ModifyInstanceMetadataOptions | Prevents launching rogue EC2 instances or disabling IMDSv2 (which would enable SSRF credential theft via the metadata service) |
| DenyIAMEscalation | CreateRole, AttachRolePolicy, DetachRolePolicy, PassRole, AssumeRole | Blocks the classic IAM privilege escalation chain: create a new admin role → attach AdministratorAccess → assume it |
| DenySSMPivot | SendCommand, StartSession | Prevents a compromised sandbox from using SSM to pivot laterally into other sandbox instances |
| DenyOrgDiscovery | organizations:List*, organizations:Describe* | Prevents enumeration of the org structure, other accounts, and OUs - information useful for targeting lateral movement |
| DenyOutsideRegion | All regional actions outside allowed regions | Region-locks the entire account to prevent resource creation in regions where there's no monitoring or VPC infrastructure |

Each statement uses ArnNotLike conditions to carve out trusted operator roles (SSO, provisioner, lifecycle handlers). The carve-outs are minimal - for example, the budget enforcer Lambda only gets an IAM carve-out (it needs AttachRolePolicy/DetachRolePolicy to revoke Bedrock access), not a network or instance carve-out.
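To make the carve-out mechanism concrete, here is an illustrative sketch of one deny statement with an ArnNotLike condition — this is not the shipped policy, and the role names in the condition are placeholders, not the actual trusted-role ARNs:

```json
{
  "Sid": "DenyIAMEscalation",
  "Effect": "Deny",
  "Action": [
    "iam:CreateRole",
    "iam:AttachRolePolicy",
    "iam:DetachRolePolicy",
    "iam:PassRole",
    "sts:AssumeRole"
  ],
  "Resource": "*",
  "Condition": {
    "ArnNotLike": {
      "aws:PrincipalArn": [
        "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*",
        "arn:aws:iam::*:role/km-provisioner*",
        "arn:aws:iam::*:role/km-budget-enforcer*"
      ]
    }
  }
}
```

Because SCPs can only deny, the carve-out works by exempting trusted principals from the deny rather than granting them anything.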

The SCP is deployed via km bootstrap --dry-run=false. Run km bootstrap --show-prereqs to see the exact IAM role and trust policy that must be created in the management account first.

Defense in Depth

| Layer | Control | Enforcement |
| --- | --- | --- |
| Organization | SCP sandbox containment | Org-level deny on SG/network/IAM/instance/SSM/region - cannot be bypassed from within the account |
| Account | Three-account isolation | Sandbox blast radius limited to Application account; state and DNS unreachable |
| Network | VPC Security Groups | Primary boundary - blocks all egress except proxy paths |
| DNS | DNS proxy sidecar / eBPF resolver | Allowlisted suffixes only; non-matching → NXDOMAIN |
| HTTP | HTTP proxy sidecar / eBPF connect4 | Allowlisted hosts only; non-matching → 403 / EPERM |
| eBPF | Cgroup BPF programs (connect4, sendmsg4, sockops, egress) | Kernel-level enforcement; LPM trie allowlist; ring buffer audit; no root bypass |
| Identity | Scoped IAM sessions | Region-locked, time-limited, minimal permissions |
| Email | Ed25519 signed email | Per-sandbox key pairs; profile-controlled signing, verification, and encryption policies |
| Secrets | SSM Parameter Store + KMS | Allowlisted refs only; per-sandbox encryption key with auto-rotation |
| Metadata | IMDSv2 enforced | Token-required; blocks SSRF credential theft via instance metadata |
| Source | GitHub App scoped tokens | Per-repo, per-ref, per-permission; short-lived installation tokens refreshed via Lambda |
| Filesystem | Path-level enforcement | Writable vs read-only directories at OS level |
| Audit | Command + network logging | Secret-redacted; delivered to CloudWatch/S3 |
| TLS Observability | eBPF SSL uprobes (OpenSSL, Go, BoringSSL) | Passive plaintext capture without MITM certs; independent audit trail |
| Telemetry | OTEL observability | Claude Code prompts, tool calls, API requests, cost metrics → OTel Collector → S3 |
| Budget | Compute + AI spend tracking | DynamoDB real-time metering; proxy 403 + IAM revocation at ceiling |

eBPF Network Enforcement

When spec.network.enforcement is set to "ebpf" or "both", the sandbox uses Cilium-style cgroup BPF programs instead of (or alongside) iptables DNAT. This is the same approach used by Cilium in Kubernetes - attaching BPF programs to a cgroup to intercept all network syscalls from processes in that group. E2E verified across 14+ iterations on AL2023 kernel 6.18.

Four BPF programs, one cgroup:

Sandbox Cgroup (/sys/fs/cgroup/km.slice/km-{id}.scope)
│
├── cgroup/connect4   — TCP connect() hook
│   ├── Dual-PID exemption (enforcer + proxy sidecar)
│   ├── LPM trie lookup: is dest IP in allowed_cidrs?
│   ├── If denied → return EPERM (connection refused)
│   ├── If allowed + proxy-marked → stash original dest, rewrite to 127.0.0.1:3128
│   └── Emit structured audit event to ring buffer
│
├── cgroup/sendmsg4   — UDP sendmsg() hook
│   ├── Intercept DNS (port 53)
│   └── Redirect to local resolver (127.0.0.1:53)
│
├── sockops           — TCP state transitions
│   └── Map source_port → socket_cookie (transparent proxy recovers real dest)
│
└── cgroup_skb/egress — Packet-level backstop
    ├── Parse IPv4 header, check allowed_cidrs
    └── Drop packets to non-allowlisted IPs (L3 defense-in-depth)

How the allowlist stays fresh: A userspace DNS resolver (127.0.0.1:53) checks every DNS query against the profile's allowedDNSSuffixes. Allowed queries are forwarded to VPC DNS; resolved IPs are injected into the BPF allowed_cidrs LPM trie map with TTL-based expiry. For L7-required hosts (GitHub, Bedrock), IPs are also inserted into http_proxy_ips for selective proxy redirect. The allowlist is dynamic — it grows as the agent resolves new hosts and shrinks as DNS TTLs expire.
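The resolver's bookkeeping can be modeled in a few lines. This is a sketch of the flow described above, not the real BPF map code — plain dicts stand in for the LPM trie, and the suffix list is abbreviated:

```python
import time

ALLOWED_SUFFIXES = [".amazonaws.com", ".github.com"]  # from the profile
allowed_ips = {}  # ip -> expiry timestamp (stands in for allowed_cidrs)

def dns_allowed(name: str) -> bool:
    """Check a query name against the profile's allowedDNSSuffixes."""
    return any(name == s.lstrip(".") or name.endswith(s) for s in ALLOWED_SUFFIXES)

def admit(name: str, ip: str, ttl: int, now: float) -> bool:
    """Called on an upstream DNS answer: inject the IP with TTL-based expiry."""
    if not dns_allowed(name):
        return False          # query never forwarded; client sees NXDOMAIN
    allowed_ips[ip] = now + ttl
    return True

def connect_allowed(ip: str, now: float) -> bool:
    """What connect4 checks (dict lookup standing in for the LPM trie)."""
    expiry = allowed_ips.get(ip)
    return expiry is not None and now < expiry

now = time.time()
admit("api.github.com", "140.82.112.6", ttl=60, now=now)
assert connect_allowed("140.82.112.6", now)           # fresh entry -> allowed
assert not connect_allowed("140.82.112.6", now + 61)  # TTL expired -> EPERM
assert not admit("evil.com", "203.0.113.9", ttl=60, now=now)
```

The real implementation additionally matches CIDR prefixes in the trie and marks L7-required IPs for proxy redirect, but the grow-on-resolve, shrink-on-expiry behavior is the same.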

Why cgroups? The BPF programs are scoped to the sandbox cgroup, not the whole instance. The enforcer process, SSM agent, and sidecars run outside the cgroup and are unaffected. This is the same isolation model that makes this approach portable to EKS pods, Docker cgroups, and other container runtimes in future substrates.

Transparent proxy (both mode): When connect4 rewrites a connection's destination to the local proxy, the sandbox app sends raw TLS (not HTTP CONNECT). A TransparentListener in the HTTP proxy peeks the first byte (0x16 = TLS ClientHello), then recovers the original destination via a three-step BPF map lookup chain: src_port_to_sock[peer_port]sock_to_original_ip[cookie]sock_to_original_port[cookie]. This enables L7 inspection (GitHub repo filtering, Bedrock token metering) without HTTP_PROXY environment variable cooperation from the client.
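The three-step recovery chain can be sketched with plain dicts in place of the pinned BPF maps (map names are from the text; the function names are illustrative):

```python
# Maps populated by connect4/sockops before the destination is rewritten
# to the local proxy at 127.0.0.1:3128.
src_port_to_sock = {}       # peer source port -> socket cookie
sock_to_original_ip = {}    # socket cookie -> original destination IP
sock_to_original_port = {}  # socket cookie -> original destination port

def record_connection(src_port, cookie, dst_ip, dst_port):
    """What the kernel side records at connect time (sketch)."""
    src_port_to_sock[src_port] = cookie
    sock_to_original_ip[cookie] = dst_ip
    sock_to_original_port[cookie] = dst_port

def recover_original_dest(peer_port):
    """What the TransparentListener does after accept(): three map lookups."""
    cookie = src_port_to_sock[peer_port]
    return sock_to_original_ip[cookie], sock_to_original_port[cookie]

def looks_like_tls(first_byte: int) -> bool:
    """Peek the first byte: 0x16 is a TLS handshake record (ClientHello)."""
    return first_byte == 0x16

record_connection(src_port=51234, cookie=0xABCD,
                  dst_ip="140.82.112.3", dst_port=443)
assert recover_original_dest(51234) == ("140.82.112.3", 443)
assert looks_like_tls(0x16) and not looks_like_tls(0x17)
```

The peer's source port survives the DNAT-style rewrite, which is what makes it a usable join key back to the socket cookie and, from there, to the original destination.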

Editable diagram: docs/diagrams/ebpf-architecture.excalidraw

eBPF SSL Uprobe Observability

Alongside kernel-level enforcement, eBPF uprobes provide passive TLS plaintext capture for audit and observability — without MITM certificates. E2E verified on AL2023 with 8 probes attaching to OpenSSL 3.2.2:

| TLS Library | Used By | Uprobe Target | Status |
| --- | --- | --- | --- |
| OpenSSL (libssl.so.3) | curl, wget, Python, Ruby | SSL_write / SSL_read / SSL_write_ex / SSL_read_ex | E2E verified (8 probes) |
| Go crypto/tls | Goose (if Go) | writeRecordLocked / Read | Schema-ready (per-RET offsets, no uretprobe) |
| BoringSSL (Bun) | Claude Code | SSL_write | Schema-ready (byte-pattern offset discovery) |
| rustls | Future Rust agents | rustls_connection_write_tls | Schema-ready |

What uprobes add that MITM can't: visibility into traffic that bypasses the proxy (if any), an audit trail independent of proxy logs, and plaintext capture without certificate trust issues. The observer logs structured JSON events with HTTP method, URL, host, and response status for every TLS connection. Git smart HTTP (clone/push) uses HTTP/1.1 and is captured correctly.

What uprobes can't replace: Active request blocking (uprobes are passive — they observe but cannot deny), HTTP/2 body parsing (GitHub API and Bedrock use HTTP/2 — uprobe captures HPACK-compressed binary, not parseable HTTP/1.1), and the transparent proxy's active enforcement (repo filtering, budget 403s).

The two eBPF layers work together:

  • Phase 40 (enforcement): cgroup BPF programs decide allow/deny/redirect at the kernel level
  • Phase 41 (observability): SSL uprobes capture TLS plaintext for audit logging alongside the MITM proxy

Architecture Diagrams

Editable source: docs/sandbox-architecture.excalidraw - open in excalidraw.com or the VS Code Excalidraw extension.

How It Works

Sandbox Lifecycle & Pipeline - configure through destroy, automatic exit triggers

  1. Configure with km configure (alias: km conf) - set your domain, account IDs, SSO URL, region (once)
  2. Bootstrap with km bootstrap - deploys SCP, KMS key, artifacts bucket (once)
  3. Initialize with km init --region us-east-1 - builds Lambdas/sidecars, provisions VPC, DynamoDB (budgets, identities, sandboxes), SES, TTL handler (once per region)
  4. Check with km doctor - validates all 12 platform health checks, assumes cross-account role for SCP verification
  5. Define a SandboxProfile in YAML - budget, lifecycle, network policy, identity, sidecars
  6. Validate with km validate <profile.yaml>
  7. Create with km create <profile> - compiles to Terragrunt inputs, provisions infrastructure, shows elapsed time
  8. Monitor with km list (alias-first, lock icons, narrow default, --wide for all columns) / km status (budget, identity, idle countdown with color) / km logs / km otel (telemetry + spend)
  9. Connect with km shell 1 (restricted user) or km shell 1 --root (operator access)
  10. Port forward with km shell 1 --ports 8080:80,3000 (Docker-style syntax)
  11. Extend with km extend 1 2h - add time before TTL expires
  12. Stop with km stop 1 - stop instance, preserve infrastructure for restart
  13. Top-up with km budget add 1 --compute 5.00 - add compute or AI budget
  14. Destroy with km destroy 1 (Lambda-dispatched by default) or km destroy 1 --remote=false (local terragrunt)
  15. Teardown with km uninit --region us-east-1 - reverse of init, destroys all regional infrastructure

SandboxProfile

Profiles use a Kubernetes-style schema at klankermaker.ai/v1alpha1. Here's the goose profile - a working example that provisions a Goose agent sandbox with Bedrock access, budget enforcement, OTEL telemetry, hibernation support, EFS shared storage, and GitHub repo allowlisting:

apiVersion: klankermaker.ai/v1alpha1
kind: SandboxProfile
metadata:
  name: goose
  labels:
    tier: development
    tool: goose
  prefix: gebpfgk

spec:
  lifecycle:
    ttl: "4h"
    idleTimeout: "1h"
    teardownPolicy: stop

  runtime:
    substrate: ec2
    spot: false
    instanceType: t3.medium
    region: us-east-1
    rootVolumeSize: 15
    hibernation: true              # preserve RAM state on pause (on-demand only)
    mountEFS: true                 # mount regional EFS shared filesystem
    efsMountPoint: /shared         # EFS mount path (default: /shared)
    additionalVolume:              # extra EBS volume for data
      size: 20                     # GB
      mountPoint: /data

  execution:
    shell: /bin/bash
    workingDir: /workspace
    useBedrock: true               # route Anthropic API via AWS Bedrock (SigV4 auth)
    privileged: false
    env:
      SANDBOX_MODE: goose-ebpf-gatekeeper
      GOOSE_PROVIDER: aws_bedrock
      GOOSE_MODEL: us.anthropic.claude-opus-4-6-v1
      GOOSE_MODE: auto
      GOOSE_TELEMETRY_ENABLED: "false"
      CODEX_CA_CERTIFICATE: /usr/local/share/ca-certificates/km-proxy-ca.crt
      OPENAI_API_KEY: ""
    configFiles:
      "/home/sandbox/.claude/settings.json": |
        {"trustedDirectories":["/home/sandbox","/workspace"]}
    rsyncPaths:
      - ".gitconfig"
      - ".config/goose"
      - ".claude"
      - ".claude.json"
      - ".codex"
    initCommands:
      - "yum install -y git nodejs npm python3 python3-pip bzip2 jq tar gzip unzip tmux"
      - "HOME=/root curl -fsSL https://github.com/block/goose/releases/download/stable/download_cli.sh | HOME=/root CONFIGURE=false bash"
      - "npm install -g @anthropic-ai/claude-code@2.1.108"
      - "mkdir -p /workspace && chown -R sandbox:sandbox /workspace"

  budget:
    compute:
      maxSpendUSD: 0.50
    ai:
      maxSpendUSD: 1.00
    warningThreshold: 0.80

  network:
    enforcement: both              # "proxy" (default), "ebpf", or "both"
    egress:
      allowedDNSSuffixes:
        - ".amazonaws.com"
        - ".anthropic.com"
        - ".claude.ai"
        - ".claude.com"
        - ".sentry.io"
        - ".cloudfront.net"
        - ".github.com"
        - ".githubusercontent.com"
        - ".npmjs.org"
        - ".npmjs.com"
        - ".nodejs.org"
        - ".npmmirror.com"
        - ".openai.com"
        - ".chatgpt.com"
        - ".pypi.org"
        - ".pythonhosted.org"
        - ".pulsemcp.com"
        - ".google.com"
        - ".google-analytics.com"
        - ".googletagmanager.com"
        - ".googleapis.com"
        - ".featuregates.org"
        - ".statsig.com"
      allowedHosts:
        - "statsig.com"

  sourceAccess:
    mode: allowlist
    github:
      allowedRepos:
        - "whereiskurt/meshtk"
        - "whereiskurt/defcon.run.34"
        - "whereiskurt/klanker-maker"
      allowedRefs:
        - "main"
        - "develop"
        - "feature/*"
        - "fix/*"

  identity:
    roleSessionDuration: "1h"
    allowedRegions:
      - us-east-1
    sessionPolicy: minimal

  sidecars:
    dnsProxy:
      enabled: true
      image: km-dns-proxy:latest
    httpProxy:
      enabled: true
      image: km-http-proxy:latest
    auditLog:
      enabled: true
      image: km-audit-log:latest
    tracing:
      enabled: true
      image: km-tracing:latest

  observability:
    commandLog:
      destination: cloudwatch
      logGroup: /klankrmkr/sandboxes
    networkLog:
      destination: cloudwatch
      logGroup: /klankrmkr/network
    claudeTelemetry:
      enabled: true
      logPrompts: true
      logToolDetails: true
    learnMode: false
    tlsCapture:                    # eBPF SSL uprobe plaintext capture (Phase 41)
      enabled: true
      libraries: [openssl]
      capturePayloads: false

  artifacts:
    paths:
      - /workspace
    maxSizeMB: 500

  email:
    signing: required
    verifyInbound: required
    encryption: required

  cli:
    noBedrock: true

Built-in Profiles

| Profile | TTL | Network | Budget | Use Case |
| --- | --- | --- | --- | --- |
| hardened | 4h | eBPF+proxy (both), AWS services only | No budget section | Production-adjacent testing |
| sealed | 1h | Proxy, .anthropic.com + .npmjs.org only | $5 compute / $10 AI | Minimal egress, short-lived execution |
| goose | 4h | eBPF+proxy (both), Anthropic, GitHub, npm, PyPI, OpenAI, Goose extensions | $0.50 compute / $1 AI | Goose agent (Block) with Bedrock, MCP extensions |
| codex | 4h | Proxy, OpenAI, GitHub | $2 compute / $5 AI | OpenAI Codex agent |
| ao | 8h | eBPF+proxy (both), Anthropic, GitHub, npm, OpenAI | $4 compute / $10 AI | Multi-agent orchestration (Claude + Codex + AO) |
| learn | 2h | eBPF+proxy (both), wide-open TLD suffixes | $2 compute / $0 AI | Traffic observation for profile generation |

Profiles support inheritance via extends - start from a base and override what you need.
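For example, a child profile might override just the lifecycle and budget. This is an illustrative sketch — the exact placement of the extends key in the schema is an assumption; the overridden fields follow the goose profile shown above:

```yaml
apiVersion: klankermaker.ai/v1alpha1
kind: SandboxProfile
metadata:
  name: goose-cheap
extends: goose            # inherit everything from the goose profile
spec:
  lifecycle:
    ttl: "1h"             # override: shorter run
  budget:
    ai:
      maxSpendUSD: 0.25   # override: tighter AI ceiling
```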

Non-Interactive Agent Execution

Run Claude (or any agent) non-interactively inside a sandbox via km agent run. Prompts are dispatched via SSM SendCommand, agents run in persistent tmux sessions that survive disconnects, and output is stored on disk + S3 for fast retrieval.

# Fire-and-forget — agent runs in tmux, returns immediately
km agent run sb-abc123 --prompt "fix the failing tests"

# Wait for completion — blocks until done, prints JSON result
km agent run sb-abc123 --prompt "What model are you?" --wait

# Interactive — attach to tmux, watch Claude work live (Ctrl-B d to detach)
km agent run sb-abc123 --prompt "refactor auth module" --interactive

# Attach to a running agent's tmux session
km agent attach sb-abc123

# Fetch results (S3 fast path, ~3s)
km agent results sb-abc123
km agent results sb-abc123 | jq '.result'

# List all runs with status
km agent list sb-abc123

# Schedule a future agent run (resumes sandbox if paused)
km at '5pm tomorrow' agent run sb-abc123 --prompt "nightly tests" --auto-start

# Use direct Anthropic API instead of Bedrock
km agent run sb-abc123 --prompt "..." --no-bedrock --wait

Profile defaults: Set spec.cli.noBedrock: true to default to direct API. Use spec.execution.configFiles to pre-seed Claude settings (trusted directories, etc.).

Running Agents in Sandboxes

Klanker Maker is workload-agnostic - any agent that runs on Linux works inside a sandbox. Here's how the controls map to real agent workloads:

| Agent | What It Does | Which Controls Matter |
| --- | --- | --- |
| Goose | Installs deps, edits files, runs tests, orchestrates workflows | Budget cap - prevents runaway AI API costs when Goose loops |
| Aider | AI pair programming with auto git commits | Source access - controls which repos it can push to |
| agent-orchestrator | Spawns parallel coding agents, handles CI fixes autonomously | Budget + TTL - caps fleet cost; each spawned worker inherits the sandbox ceiling |
| deepagents | Planning + filesystem + sub-agent spawning via LangGraph | Network allowlist - limits where sub-agents can reach |
| open-swe | Async coding agent that clones, patches, and PRs | Source access - allowlist repos + refs; block push to protected branches |
| redamon | Automated red team: recon → exploitation → post-exploitation | Sealed profile - air-gapped, no egress, full audit trail |
| raptor | Claude Code as an offensive security agent | Hardened profile - minimal egress, short TTL, every command logged |
| autoresearch | Agents running research on GPU training | Compute budget - prevents a runaway training loop from burning hours of GPU |
| nanoclaw | Anthropic Agent SDK agent connected to messaging apps | HTTP proxy - controls which external APIs the agent can call |
| gobii-platform | Always-on AI workforce | Idle timeout - shuts down workers that stop producing; artifact upload preserves state |

Multi-Agent Orchestration via Signed Email

Sandboxes communicate through digitally signed email (SES + Ed25519). Each sandbox gets a unique address derived from its ID (e.g., sb-a1b2c3d4@sandboxes.klankermaker.ai) and an Ed25519 key pair at creation time.

  • Signing - outbound emails are signed with the sender's Ed25519 private key (stored in SSM, KMS-encrypted). The signature and sender ID are attached as X-KM-Signature and X-KM-Sender-ID headers.
  • Verification - the receiver fetches the sender's public key from the km-identities DynamoDB table and verifies the signature. When verifyInbound: required, unsigned or invalid emails are rejected.
  • Encryption - optional X25519 key exchange (NaCl box). When encryption: required, the sender encrypts the body with the recipient's public key. When encryption: optional, it encrypts if the recipient has a published key, plaintext otherwise.

Profile controls (spec.email.signing, spec.email.verifyInbound, spec.email.encryption) govern policy per sandbox. The hardened, sealed, and goose profiles all default to signing: required.

This enables multi-agent pipelines where each worker is physically isolated but logically connected - with cryptographic proof of sender identity and optional confidentiality.

Substrates

| Substrate | How It Works | Cost |
|---|---|---|
| EC2 Spot (default) | Shared regional VPC, per-sandbox SG, spot instance, SSM access, sidecar systemd services | ~$0.01/hr for t3.medium |
| EC2 On-Demand | Same as above, guaranteed capacity | ~$0.04/hr for t3.medium |
| ECS Fargate Spot | Fargate task with sidecar containers, service discovery | ~$0.01/hr for 1 vCPU / 2GB |
| ECS Fargate | Same as above, guaranteed capacity | ~$0.04/hr for 1 vCPU / 2GB |
| Docker (local) | Docker Compose on local machine, sidecar containers, IAM roles via STS | Free (local compute) |

Spot interruption handlers automatically upload artifacts to S3 before instances are reclaimed.

Budget Enforcement

Budget enforcement tracks two spend pools per sandbox, stored in a DynamoDB global table replicated to every region where agents run. Reads from within the sandbox hit the local regional replica with sub-millisecond latency.

Budget Enforcement Flow - proxy metering, DynamoDB tracking, dual-layer enforcement

Compute Budget

Tracked as spot rate × elapsed minutes, sourced from the AWS Price List API at sandbox creation. When the compute budget is exhausted, the sandbox is suspended - not destroyed:

  • EC2: StopInstances preserves the EBS volume. No compute charges accrue while stopped.
  • ECS Fargate: Artifacts are uploaded, then the task is stopped. Re-provision from the stored S3 profile on top-up.

AI Budget (Bedrock, Anthropic, OpenAI)

The HTTP proxy sidecar intercepts every AI API response - Bedrock (invoke-with-response-stream), Anthropic direct (api.anthropic.com, for Claude Code Max/API key users), and OpenAI-compatible endpoints. A tee-reader streams data through to the client without blocking, captures the full response, then extracts token counts asynchronously:

  • Bedrock streaming: base64-decodes {"bytes":"<b64>"} event-stream wrappers to find message_start/message_delta payloads
  • Anthropic SSE: parses data: lines for the same event types
  • Non-streaming: reads usage from the JSON response body

Tokens are priced against static model rates and atomically incremented in the DynamoDB spend counter.
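The tee-reader pattern and SSE token extraction described above can be sketched with Go's standard library. The event shapes are simplified from the real Anthropic streaming format, and meterSSE is a hypothetical name, not the sidecar's actual code:

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

type usage struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
}

// meterSSE scans captured SSE data and sums token counts from
// message_start / message_delta events.
func meterSSE(captured []byte) (in, out int) {
	sc := bufio.NewScanner(bytes.NewReader(captured))
	for sc.Scan() {
		line := strings.TrimPrefix(sc.Text(), "data: ")
		var ev struct {
			Type    string `json:"type"`
			Message struct {
				Usage usage `json:"usage"`
			} `json:"message"`
			Usage usage `json:"usage"`
		}
		if json.Unmarshal([]byte(line), &ev) != nil {
			continue // not a JSON event line
		}
		switch ev.Type {
		case "message_start":
			in += ev.Message.Usage.InputTokens
		case "message_delta":
			out += ev.Usage.OutputTokens
		}
	}
	return
}

func main() {
	upstream := strings.NewReader(
		`data: {"type":"message_start","message":{"usage":{"input_tokens":89}}}` + "\n" +
			`data: {"type":"message_delta","usage":{"output_tokens":34}}` + "\n")

	var capture bytes.Buffer
	// io.TeeReader streams bytes through to the client while filling the
	// capture buffer - the client is never blocked on metering.
	client := io.TeeReader(upstream, &capture)
	io.Copy(io.Discard, client) // stand-in for streaming to the real client

	in, out := meterSSE(capture.Bytes())
	fmt.Printf("%d in / %d out\n", in, out) // 89 in / 34 out
}
```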

Dual-layer enforcement at 100%:

  1. Proxy layer (immediate) - HTTP proxy returns 403 for subsequent AI calls
  2. IAM layer (backstop) - a Lambda revokes the sandbox IAM role's Bedrock permissions, catching calls that bypass the proxy
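The ceiling check behaves like an atomic add-then-compare. The real platform does this with an atomic ADD on a DynamoDB item; the sketch below models it locally with an atomic integer in micro-dollars so the enforcement decision is easy to see. Rates and ceilings are illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pool is a local stand-in for the DynamoDB spend counter.
type pool struct {
	spent   atomic.Int64 // micro-dollars spent so far
	ceiling int64        // micro-dollars allowed
}

// charge adds one request's cost (prices in micro-dollars per token,
// e.g. $3/M input tokens = 3) and reports whether the pool is exhausted.
func (p *pool) charge(inTok, outTok, inPrice, outPrice int64) (blocked bool) {
	cost := inTok*inPrice + outTok*outPrice
	return p.spent.Add(cost) >= p.ceiling
}

func main() {
	ai := &pool{ceiling: 1_000_000} // $1.00 AI budget
	// 89K in / 34K out at illustrative $3/M in, $15/M out pricing
	fmt.Println(ai.charge(89_000, 34_000, 3, 15)) // false: $0.777 spent
	// the same call again pushes past the ceiling
	if ai.charge(89_000, 34_000, 3, 15) {
		fmt.Println("proxy: 403 for subsequent AI calls; Lambda revokes Bedrock IAM")
	}
}
```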

km status shows per-model AI spend grouped by provider:

$ km status goose-e6c7d024
Sandbox ID:  goose-e6c7d024
Profile:     goose
...
Budget:
  Compute: $0.0312 / $0.5000 (6.2%)
  AI:      $0.4200 / $1.0000 (42.0%)
    anthropic.claude-sonnet-4-6:  $0.30  (89K in / 34K out)   # Bedrock
    claude-opus-4-6:              $0.12  (12K in / 8K out)    # Max/API

OTEL Telemetry

Claude Code running inside sandboxes exports OpenTelemetry data (prompts, tool calls, API requests, token usage, cost metrics) through an OTel Collector sidecar to S3. Profile-controlled via spec.observability.claudeTelemetry:

observability:
  claudeTelemetry:
    enabled: true        # master switch
    logPrompts: true     # include actual prompt text
    logToolDetails: true # include tool parameters (bash commands, file paths)

km otel provides five views into this data:

$ km otel claude-e6c7d024              # summary: budget + S3 + metrics
$ km otel claude-e6c7d024 --prompts    # user prompts with timestamps
$ km otel claude-e6c7d024 --events     # full event stream
$ km otel claude-e6c7d024 --tools      # tool calls with params + duration
$ km otel claude-e6c7d024 --timeline   # conversation turns with per-turn cost

Warnings and Top-Up

At 80% (configurable via spec.budget.warningThreshold) of either pool, the operator receives an email via SES.

$ km budget add claude-e6c7d024 --ai 3.00
  AI budget: $5.00 → $8.00
  Proxy: unblocked
  IAM: restored
  Status: running

Top-up unblocks the proxy, restores IAM permissions, and restarts suspended compute - all in one command.

Architecture

km CLI / ConfigUI
├── cmd/km/                  CLI entry point
├── cmd/configui/            Web dashboard (Go + embedded HTML)
├── cmd/ttl-handler/         Lambda: TTL expiry + artifact upload
├── cmd/budget-enforcer/     Lambda: budget ceiling enforcement
├── cmd/create-handler/      Lambda: remote sandbox creation via EventBridge
├── cmd/email-create-handler/ Lambda: email-driven sandbox creation
├── cmd/github-token-refresher/ Lambda: GitHub App installation token refresh
├── internal/app/cmd/        Cobra commands (configure, bootstrap, init, uninit, validate, create, clone, destroy/kill, pause, resume, lock, unlock, stop, extend, roll, at/schedule, list, status, logs, budget, shell, agent, doctor, otel, info, rsync, email)
├── internal/app/config/     Configuration (config.yaml, env vars, CLI flags)
├── pkg/
│   ├── profile/             SandboxProfile schema, validation, inheritance
│   ├── compiler/            Profile → Terragrunt artifacts (EC2 + ECS paths)
│   ├── ebpf/                eBPF enforcer (cgroup BPF programs, DNS resolver, audit consumer, SSL uprobes)
│   ├── aws/                 SDK helpers (S3, SES, CloudWatch, EC2 metadata, DynamoDB, EventBridge Scheduler, identity/signing)
│   ├── terragrunt/          Runner + per-sandbox state isolation
│   ├── lifecycle/           TTL scheduling, idle detection, teardown
│   ├── allowlistgen/        Allowlist generation from observed traffic
│   ├── at/                  Deferred/recurring operation scheduling
│   ├── github/              GitHub App token management
│   ├── localnumber/         Persistent local sandbox numbering
│   └── version/             Build version info
├── sidecars/
│   ├── dns-proxy/           DNS allowlist filter (UDP/TCP:53)
│   ├── http-proxy/          HTTP allowlist filter (TCP:3128) + AI token metering (Bedrock, Anthropic, OpenAI)
│   ├── audit-log/           Command + network log router with secret redaction
│   └── tracing/             OTel Collector sidecar (logs, metrics → S3)
├── profiles/                Built-in YAML profiles (sealed, hardened, goose, codex, ao, learn)
└── infra/
    ├── modules/             Terraform modules
    │   ├── network/         VPC, subnets, security groups
    │   ├── ec2spot/         Spot + on-demand instances, IMDSv2, IAM
    │   ├── ecs-cluster/     ECS cluster, Fargate Spot capacity provider
    │   ├── ecs-task/        Task definitions with sidecar containers
    │   ├── ecs-service/     Service deployment + service discovery
    │   ├── ecs-spot-handler/  Lambda: Fargate Spot interruption → artifact upload
    │   ├── efs/             Regional EFS shared filesystem for cross-sandbox data
    │   ├── secrets/         SSM Parameter Store + KMS encryption
    │   ├── ses/             SES domain, DKIM, inbound email → S3
    │   ├── scp/             SCP sandbox containment (deployed to management account)
    │   ├── dynamodb-budget/ Budget enforcement table
    │   ├── dynamodb-identities/ Sandbox identity public key table
    │   ├── dynamodb-sandboxes/ Sandbox metadata table (km-sandboxes)
    │   ├── dynamodb-schedules/ Scheduled operations table
    │   ├── budget-enforcer/ Lambda: budget ceiling enforcement
    │   ├── create-handler/  Lambda: remote sandbox creation
    │   ├── email-handler/   Lambda: email-driven operations
    │   ├── github-token/    Lambda: GitHub App token refresh
    │   ├── s3-replication/  Cross-region artifact replication
    │   └── ttl-handler/     Lambda: TTL expiry → artifacts + email + self-cleanup
    └── live/                Terragrunt hierarchy (site.hcl, per-sandbox isolation)

Quick Start

# Install
go install github.com/whereiskurt/klanker-maker/cmd/km@latest

# Configure your platform (once)
km configure    # or: km conf

# See what's needed in the management account before bootstrap
km bootstrap --show-prereqs

# Bootstrap SCP + KMS + artifacts bucket (once)
km bootstrap --dry-run=false

# Initialize the region - builds Lambdas, sidecars, deploys infra (once per region)
km init --region us-east-1

# Check platform health (20 checks)
km doctor

# Create a sandbox (shows progress dots + elapsed time)
km create profiles/goose.yaml
km create --on-demand profiles/sealed.yaml  # skip spot, use on-demand
km create profiles/goose.yaml --no-bedrock  # disable Bedrock, use direct API keys
km create profiles/goose.yaml --docker      # shortcut for --substrate=docker
km create profiles/goose.yaml --alias mybot # override the sandbox alias

# List sandboxes (narrow default — alias first, live status)
km list
km list --wide    # show profile, substrate, region columns

# Status with budget, identity, idle countdown
km status 1

# Connect as restricted user (no sudo)
km shell 1
km shell 1 --root    # operator access

# Port forward (Docker-style)
km shell 1 --ports 8080           # localhost:8080 → remote:8080
km shell 1 --ports 8080:80,3000   # multiple ports

# Launch an AI agent inside a sandbox
km agent 1 --claude                          # interactive Claude Code
km agent run 1 --prompt "fix tests"          # headless with prompt
km agent run 1 --prompt "fix tests" --wait   # wait for completion

# Extend TTL
km extend 1 2h

# Pause (hibernate) — preserves RAM state
km pause 1

# Resume a paused or stopped sandbox
km resume 1

# Lock to prevent accidental destroy/stop/pause
km lock 1
km unlock 1 --yes

# Stop without destroying
km stop 1

# View audit logs
km logs 1

# OTEL telemetry + AI spend
km otel 1                  # summary
km otel 1 --timeline       # conversation turns with cost
km otel 1 --prompts        # user prompts
km otel 1 --tools          # tool call history

# Destroy (remote by default, or local)
km destroy 1                    # Lambda-dispatched (default)
km destroy 1 --yes              # skip confirmation prompt
km destroy 1 --remote=false     # local terragrunt destroy

# km kill is an alias for km destroy
km kill 1 --yes

# Schedule a deferred or recurring operation
km at '10pm tomorrow' create profiles/goose.yaml     # one-shot
km at 'every thursday at 3pm' kill 1                  # recurring
km at list                                            # list scheduled ops
km at cancel my-schedule-name                         # cancel one
# km schedule is an alias for km at

# Teardown region infrastructure
km uninit --region us-east-1

Documentation

| Document | Description |
|---|---|
| User Manual | Full command reference, walkthroughs (Claude Code, Goose, security agents), profile authoring |
| Operator Guide | AWS account setup, KMS, S3, SES, Lambda deployment - everything before km init |
| Profile Reference | Complete YAML schema with every field, type, default, and validation rule |
| Security Model | Deep dive on each security layer, from VPC to IMDSv2 to secret redaction |
| Budget Guide | DynamoDB schema, proxy metering, enforcement flow, threshold configuration |
| Docker Substrate | Running sandboxes locally via Docker Compose (km create --docker) |
| Sidecar Reference | Each sidecar's config, env vars, log formats, EC2 vs ECS deployment |
| Multi-Agent Email | SES setup, sandbox addressing, cross-sandbox orchestration patterns |
| ConfigUI Guide | Web dashboard setup, profile editor, secrets management |

Roadmap

| Phase | Description | Status |
|---|---|---|
| 1 | Schema, Compiler & AWS Foundation | Complete |
| 2 | Core Provisioning & Security Baseline | Complete |
| 3 | Sidecar Enforcement & Lifecycle Management | Complete |
| 4 | Lifecycle Hardening, Artifacts & Email | Complete |
| 5 | ConfigUI Web Dashboard | Complete |
| 6 | Budget Enforcement & Platform Configuration | Complete |
| 7 | Unwired Code Paths | Complete |
| 8 | Sidecar Build & Deployment Pipeline | Complete |
| 9 | Live Infrastructure & Operator Docs | Complete |
| 10 | SCP Sandbox Containment | Complete |
| 11 | Sandbox Auto-Destroy & Metadata Wiring | Complete |
| 12 | ECS Budget Top-Up & S3 Replication | Complete |
| 13 | GitHub App Token Integration | Complete |
| 14 | Sandbox Identity & Signed Email | Complete |
| 15 | km doctor - Platform Health Check | Complete |
| 16 | Documentation Refresh (Phases 6-15) | Complete |
| 17 | Sandbox Email Mailbox & Access Control | Complete |
| 18 | Loose Ends - km init, uninit, bootstrap KMS, github-token | Complete |
| 19 | Budget Enforcement Wiring - EC2 hard stop, IAM revocation | Complete |
| 20 | Anthropic API Metering & Terragrunt Output Suppression | Complete |
| 21 | Bug fixes and mini-features - budget precision, polish | Complete |
| 22 | Remote Sandbox Dispatch - km create/destroy/stop/extend --remote via Lambda | Complete |
| 23 | Email-Driven Operations - operator inbox, email-to-create, safe phrase auth, EventBridge | Complete |
| 24 | Documentation Refresh - docs for Phases 22-32 | Complete |
| 25 | GitHub Source Access Restrictions - repo allowlists, deny-by-default | Complete |
| 26 | Live Operations Hardening - bootstrap, init, TTL, idle, sidecars, CLI polish | Complete |
| 27 | Claude Code OTEL Integration - sandbox observability via built-in telemetry | Complete |
| 28 | OTEL Observability Hardening - timeline view, events, tools flags | Complete |
| 29 | EC2 Hibernation & MaxLifetime Enforcement | Complete |
| 30 | Sandbox Pause, Lock, Unlock & km list Enhancements | Complete |
| 31 | Transparent HTTPS & Audit Log Improvements | Complete |
| 32 | Profile-Scoped Rsync Paths & External File Lists | Complete |
| 33 | EC2 Storage Customization, Hibernation & AMI Selection | Complete |
| 34 | Agent Profiles - Agent Orchestrator, Goose, and Codex | Complete |
| 35 | MITM CA Trust for Python, Node, and Non-System SSL Libraries | Complete |
| 36 | km-sandbox Base Container Image | Complete |
| 37 | Docker Compose Local Substrate | Complete |
| 38 | EKS / Kubernetes Substrate | Planned |
| 39 | DynamoDB Metadata Migration (S3 to DynamoDB) | Complete |
| 40 | eBPF Cgroup Network Enforcement (connect4, sendmsg4, sockops, egress) | Complete |
| 41 | eBPF SSL Uprobe TLS Observability (OpenSSL, Go, BoringSSL) | Complete |
| 42 | eBPF Gatekeeper Mode - connect4 DNAT Rewrite for L7 Proxy | Complete |
| 43 | Regional EFS Shared Filesystem | Complete |
| 44 | km at / km schedule - Deferred & Recurring Operations | Complete |
| 45 | km-send/km-recv Sandbox Scripts & km email send/read CLI | Complete |
| 46 | AI Email-to-Command - Haiku Interprets Free-Form Operator Emails | Complete |
| 47 | Privileged Execution Mode & Learn Profile | Complete |
| 48 | Profile Override Flags for km create (--ttl, --idle) | Complete |
| 49 | Prebaked AMI Support | Planned |
| 50 | km agent Non-Interactive Execution (--prompt, results, list) | Complete |
| 51 | km agent Tmux Sessions (attach, --interactive) | Complete |
| 52 | km clone - Duplicate a Running Sandbox | Complete |
| 53 | Persistent Local Sandbox Numbering | Complete |

See .planning/ROADMAP.md for detailed phase breakdowns and success criteria.

License

TBD

About

klanker-maker ('km') is a policy-driven sandbox platform for your AI klankers. Define execution environments as declarative YAML profiles, compile them into real AWS infrastructure (EC2/ECS). Go CLI + ConfigUI web dashboard.
