GitHub - nerdsane/temper: A machine tool for agents: a verified runtime for systems agents build.

A verified, policy-driven runtime for agents that build their own tools.

Why Temper

Temper is a machine tool for agents.

Agents are fast at writing code, but the code drifts — invariants get missed, integrations break in ways no one anticipated. Temper inverts the path: agents describe what a system should do, and Temper builds the running version from the description.

The description hot-reloads. What an agent shipped keeps running while the agent revises it.

Agent
  understands a need
    |
    | writes
    v
Spec
  state machine + WASM modules + policies + data
    |
    | verifies
    v
Kernel
  SMT + model checking + simulation + property tests
    |
    | deploys
    v
Runtime
  live state machine + typed API + audit log
    |
    | used by
    v
Same or another agent
  calls in, composes, writes the next spec

Tools build on tools.

Quick start

Temper is an HTTP server with an OData API. Agents talk to it directly, through one of the SDKs, or via the MCP bridge for stdio agent clients.

Start the kernel and the decision review:

temper serve --port 3000          # HTTP server, OData API, Observe UI
temper decide --port 3000         # interactive review of pending governance decisions

Connect an agent (via MCP):

{
  "mcpServers": {
    "temper": {
      "command": "temper",
      "args": ["mcp", "--port", "3000"]
    }
  }
}

temper mcp is a stdio bridge that proxies to the running server and exposes a sandboxed Python REPL with a temper.* API for submitting specs, creating entities, and invoking actions.

Or call directly: the Rust SDK (temper-sdk), the TypeScript SDK (packages/temper-sdk-ts), or any HTTP client against /tdata.

When an agent invokes an action no policy permits, the request is denied and recorded as a pending decision. temper decide walks the queue and lets you approve at a chosen scope — narrow, medium, or broad. The new rule loads without restart.
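The deny-then-approve loop can be sketched in a few lines of Python. This is an illustration only, not Temper's implementation: the `Rule` and `Gate` names are hypothetical, and so is the meaning given to each scope (narrow = this principal, action, and resource; medium = this principal and action on any resource; broad = this action for anyone), since the text above names the scopes without defining them.

```python
from dataclasses import dataclass, field

WILDCARD = "*"  # hypothetical marker meaning "matches anything"
SCOPES = ("narrow", "medium", "broad")


@dataclass(frozen=True)
class Rule:
    principal: str
    action: str
    resource: str

    def permits(self, principal: str, action: str, resource: str) -> bool:
        # A field matches when it is the wildcard or equals the request value.
        pairs = ((self.principal, principal),
                 (self.action, action),
                 (self.resource, resource))
        return all(pattern in (WILDCARD, value) for pattern, value in pairs)


@dataclass
class Gate:
    rules: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def invoke(self, principal: str, action: str, resource: str) -> bool:
        # Default-deny: a request passes only if some rule permits it.
        if any(r.permits(principal, action, resource) for r in self.rules):
            return True
        request = (principal, action, resource)
        if request not in self.pending:
            self.pending.append(request)  # recorded for human review
        return False

    def approve(self, index: int, scope: str) -> Rule:
        # Validate before touching the queue so a bad scope loses nothing.
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        principal, action, resource = self.pending.pop(index)
        if scope == "narrow":
            rule = Rule(principal, action, resource)
        elif scope == "medium":
            rule = Rule(principal, action, WILDCARD)
        else:  # broad
            rule = Rule(WILDCARD, action, WILDCARD)
        self.rules.append(rule)  # takes effect on the next invoke()
        return rule
```

Under these assumptions, approving a denied request at the medium scope lets the same agent repeat the action on other resources, while other agents remain denied.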

The shape of an app

A description in Temper has four parts.

Behavior — what the system can do, and what it must never do. States, transitions, preconditions, and the safety properties that must hold in every reachable state.
Data contract — what the system exposes to callers. Entity types, properties, relationships, and the actions each type supports — published as a typed API.
Authorization — who can invoke which action on which resource under what conditions. Default-deny. Scope-based human approval, hot-loaded as the policy set grows.
Application logic — what runs inside the state machine. Sandboxed modules with per-call resource budgets, triggered inline by transitions.

Each part is small enough for an agent to author and a human to read. The state machine is the contract; the application logic is what runs inside it.

Description                          Runtime
+----------------------+             +----------------------+
| Behavior             |             | Typed API            |
| Data contract        |  verified   | Enforced state       |
| Authorization        | --------->  | machine              |
| Application logic    |  deployed   | Audit log            |
+----------------------+             +----------------------+

A heartbeat scheduler with one inline-triggered module — taken from reference-apps/crucible:

[automaton]
name = "CrucibleScheduler"
states = ["Idle", "Checking"]
initial = "Idle"

[[action]]
name = "Start"
kind = "input"
from = ["Idle"]
to = "Checking"
effect = [{ type = "trigger", name = "crucible_check_schedules" }]

[[action.triggers]]
name = "crucible_check_schedules"
kind = "wasm"
module = "crucible_scheduler_check"
on_success = "CheckComplete"
on_failure = "CheckFailed"

Start moves the entity from Idle to Checking and dispatches the WASM module. On success the runtime fires CheckComplete; on failure, CheckFailed. Either path is a verified transition.
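To make the control flow concrete, here is a small Python model of that excerpt. It is a sketch, not the runtime's code: the excerpt does not show where CheckComplete and CheckFailed lead, so both are assumed to return to Idle, and `run_module` is a hypothetical stand-in for the WASM module call.

```python
# Transition table mirroring the excerpt above. The target states of
# CheckComplete and CheckFailed are an ASSUMPTION ("Idle"); the excerpt
# does not show them.
ACTIONS = {
    "Start": {"from": ["Idle"], "to": "Checking",
              "trigger": "crucible_check_schedules"},
    "CheckComplete": {"from": ["Checking"], "to": "Idle", "trigger": None},
    "CheckFailed": {"from": ["Checking"], "to": "Idle", "trigger": None},
}
TRIGGERS = {
    "crucible_check_schedules": {"on_success": "CheckComplete",
                                 "on_failure": "CheckFailed"},
}


def dispatch(state, action, run_module, log):
    """Apply one action, run its trigger if any, and return the final state."""
    spec = ACTIONS[action]
    if state not in spec["from"]:
        raise ValueError(f"{action} is not legal from {state}")
    state = spec["to"]
    log.append((action, state))
    trigger = spec["trigger"]
    if trigger is not None:
        try:
            ok = bool(run_module(trigger))  # stand-in for the sandboxed module
        except Exception:
            ok = False  # a crashing module counts as a failure
        callbacks = TRIGGERS[trigger]
        follow_up = callbacks["on_success"] if ok else callbacks["on_failure"]
        state = dispatch(state, follow_up, run_module, log)
    return state


if __name__ == "__main__":
    log = []
    dispatch("Idle", "Start", lambda name: True, log)
    print(log)
```

Every step, including the follow-up fired by the trigger, goes through the same legality check, which is the sense in which either path is a transition the table already allows.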

Hot reload

The kernel keeps running while specs and modules update. New revisions go live without a restart.
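One common way to get this behavior is to swap a reference to an immutable table under a lock, so in-flight work keeps the revision it started with. The Python sketch below shows that general pattern under stated assumptions; the `HotSwap` class and its `verify` callback are hypothetical and are not taken from the temper-jit crate.

```python
import threading


class HotSwap:
    """Holds the current transition table and swaps it atomically."""

    def __init__(self, table: dict):
        self._lock = threading.Lock()
        self._table = table
        self._revision = 1

    def snapshot(self):
        # Callers work against the (revision, table) pair they were handed,
        # even if a newer revision lands while they are still running.
        with self._lock:
            return self._revision, self._table

    def swap(self, new_table: dict, verify) -> bool:
        # Verification runs outside the lock so readers are never blocked
        # by a slow check; a rejected table leaves the old one in place.
        if not verify(new_table):
            return False
        with self._lock:
            self._table = new_table
            self._revision += 1
        return True
```

The design choice worth noting is that tables are replaced rather than mutated, so a reader never observes a half-updated revision.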

Evolution

The system is split into a behavioral contract (the state machine) and the application logic (sandboxed modules). The split enables a self-improvement loop on either side, independent of the other.

What gets verified

Every spec passes four layers before it is allowed to deploy.

Symbolic — every guard satisfiable, every invariant inductive.
Model checking — every reachable state visited; counterexamples printed on failure.
Simulation — the production code path runs against a fault-injected sandbox; failures reproduce under deterministic seeds.
Property — randomized action sequences with automatic counterexample shrinking.

The cascade runs on every build and finishes in well under a second on a small spec.

$ temper verify --specs-dir ./specs
L0 Symbolic:    PASSED  guards satisfiable, invariants inductive
L1 Model Check: PASSED  reachable states explored
L2 Simulation:  PASSED  fault-injected runs
L3 Property:    PASSED  randomized cases
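As a rough picture of what the model-checking layer does, the following Python sketch explores a tiny state space breadth-first and returns the shortest trace to any state that breaks an invariant. It is a hand-rolled illustration, not Stateright or Temper code, and the failure counter with its `MAX_FAILURES` bound is a hypothetical addition to the scheduler example.

```python
from collections import deque

MAX_FAILURES = 3  # hypothetical bound, not taken from the Crucible spec


def transitions(state):
    """Yield (action, next_state) pairs for a state (phase, failures)."""
    phase, failures = state
    if phase == "Idle":
        yield "Start", ("Checking", failures)
    else:
        yield "CheckComplete", ("Idle", 0)
        if failures < MAX_FAILURES:  # guard on the failure path
            yield "CheckFailed", ("Idle", failures + 1)


def model_check(initial, invariant):
    """Visit every reachable state breadth-first.

    Returns (states_seen, None) when the invariant holds everywhere, or
    (states_seen, trace) where trace is a shortest path of states from
    the initial state to a violating one.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:  # walk parents back to the start
                trace.append(state)
                state = parent[state]
            return len(parent), trace[::-1]
        for _action, target in transitions(state):
            if target not in parent:
                parent[target] = state
                queue.append(target)
    return len(parent), None


if __name__ == "__main__":
    print(model_check(("Idle", 0), lambda s: s[1] <= MAX_FAILURES))
    print(model_check(("Idle", 0), lambda s: s[1] <= 2))
```

Tightening the invariant below what the guard allows, as in the second call, is a quick way to see a counterexample trace.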

Built on Temper

Katagami by @arni0x9053 — a library of agent-researched design languages. Each language ships as a verified spec plus a rendered embodiment of canonical UI elements. Writeup.
Crucible by Arun Parthiban — agentic infrastructure: agents, environments, sessions, governed lifecycle.

Compatible agents

Temper exposes an HTTP API and a stdio MCP bridge. Anything that speaks HTTP or MCP can drive it.

Claude Code · OpenClaw · Pydantic AI · LangChain · custom HTTP / MCP clients

Architecture

┌─────────────────────────────────────────────────────┐
│  Agent  (Claude Code, OpenClaw, custom, ...)        │
└────────────────────────┬────────────────────────────┘
                         │  MCP (optional)
                         ▼
┌─────────────────────────────────────────────────────┐
│  temper mcp  —  stdio bridge                        │
│  sandboxed Python REPL, temper.* API                │
└────────────────────────┬────────────────────────────┘
                         │  HTTP / OData
                         ▼
┌─────────────────────────────────────────────────────┐
│  Temper Kernel  (HTTP + OData server)               │
│                                                     │
│   Specs → Verify → Deploy                           │
│   AuthZ · WASM Triggers · Query                     │
│   Events · Observe · Evolve                         │
└─────────────────────────────────────────────────────┘

The kernel is static across deployments. Specs, data models, policies, and application modules are what change — and they hot-reload.

Full architecture in docs/PAPER.md. Positioning in docs/POSITIONING.md.

What Temper is and is not

Temper is a verified runtime for the systems agents build.


Not the runtime the agent runs in. Hosted agent platforms and harness CLIs run the agent itself. Temper is what the agent reaches into from wherever it runs. (You can also build a hosted agent runtime on Temper, as Crucible does.)
Not a framework for the agent loop. SDKs and harnesses handle prompts, tools, and conversation. Temper holds what the loop calls into: verified state, governed integrations, typed APIs. (An agent's state can also live entirely on Temper; see TemperPaw.)
Not a backend-as-a-service. A BaaS gives you CRUD from a data schema; the rules of the system are implicit. Temper compiles a runtime from an explicit behavioral contract (legal transitions, required invariants) and verifies the contract before it runs.
Not a workflow builder. No imperative or visual flow editor. Capabilities are declared as verified state machines that any caller can use.

Status

Version 0.1.0. The architecture is stabilizing; the API surface is not frozen. Deployed on Railway; Katagami runs on it in production.

How it's implemented

The specification model, in detail

Behavior is expressed as an I/O Automaton specification (Nancy Lynch and Mark Tuttle, 1987), serialized as TOML and conventionally named *.ioa.toml. I/O Automata were chosen over TLA+ because the precondition/effect structure of actions maps directly onto how the runtime evaluates a transition, and the input/output/internal classification of actions maps cleanly onto how actors process messages. The same artifact is the verification target and the runtime execution artifact, which is the property that keeps proof and implementation aligned.

Data contract is expressed in CSDL (Common Schema Definition Language) from the OData v4 standard, serialized as XML and conventionally named *.csdl.xml. CSDL was chosen over GraphQL because agents need a rigid, machine-parseable contract rather than negotiated response shapes. A running Temper server publishes the full schema at GET /tdata/$metadata.
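For callers that skip the SDKs, the request URLs can be assembled with the Python standard library alone. The sketch below assumes a server at `http://localhost:3000` as in the quick start; the entity set `Schedulers` and the properties `Id` and `Status` are hypothetical names used only for illustration, and `$filter` and `$select` are standard OData v4 query options.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def metadata_url(base: str) -> str:
    """URL of the CSDL schema document published by the server."""
    return f"{base.rstrip('/')}/tdata/$metadata"


def query_url(base: str, entity_set: str, filter_expr=None, select=None) -> str:
    """Build an OData query URL for an entity set (names are caller-supplied)."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if select:
        options["$select"] = ",".join(select)
    url = f"{base.rstrip('/')}/tdata/{quote(entity_set)}"
    if options:
        # Keep "$", "," and "'" readable; spaces become %20.
        url += "?" + urlencode(options, quote_via=quote, safe="$,'")
    return url


def fetch_metadata(base: str, timeout: float = 5.0) -> str:
    """Fetch the schema XML; requires a running server at `base`."""
    with urlopen(metadata_url(base), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(metadata_url("http://localhost:3000"))
    print(query_url("http://localhost:3000", "Schedulers",
                    filter_expr="Status eq 'Idle'", select=["Id", "Status"]))
```

Reading `$metadata` first and deriving entity set and property names from it avoids hard-coding names like the ones assumed here.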

Authorization is expressed in Cedar, Amazon's declarative policy language. Cedar's (principal, action, resource, context) evaluation model maps onto the request structure exposed by the OData layer. Its default-deny posture is enforced at the policy engine, and generated policies from approved decisions are hot-loaded without downtime.

The verification cascade, in detail

L0 — symbolic reasoning. Z3 SMT solver. Checks guard satisfiability (no dead transitions) and invariant inductiveness.
L1 — exhaustive model checking. Stateright. Breadth-first exploration of the reachable state space; every reachable state is visited, every invariant is checked at every state.
L2 — deterministic simulation testing (DST). The same Rust TransitionTable the server runs is executed against a simulated backend with seeded fault injection. Failures reproduce deterministically under the same seed.
L3 — property-based testing. proptest. Randomized action sequences with automatic shrinking on failure.
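The reproducibility claim for L2 rests on a simple idea: if every source of randomness, including injected faults, is drawn from one seeded generator, the same seed replays the same run. The Python sketch below illustrates that idea only. It is not Temper's DST harness; `simulate`, `fault_rate`, and the assumption that both check outcomes return to Idle are hypothetical.

```python
import random


def simulate(seed: int, steps: int = 20, fault_rate: float = 0.3):
    """Run the scheduler loop with seeded fault injection; return the trace."""
    rng = random.Random(seed)  # the ONLY source of randomness in the run
    state = "Idle"
    trace = []
    for _ in range(steps):
        if state == "Idle":
            action, state = "Start", "Checking"
        else:
            # Injected fault: the simulated module call fails.
            faulted = rng.random() < fault_rate
            action = "CheckFailed" if faulted else "CheckComplete"
            state = "Idle"  # assumed target state for both outcomes
        trace.append((action, state))
    return trace


def first_failing_seed(prop, seeds):
    """Return the first seed whose trace violates `prop`, or None."""
    for seed in seeds:
        if not prop(simulate(seed)):
            return seed
    return None


if __name__ == "__main__":
    # A deliberately strict property: no check ever fails.
    no_failures = lambda trace: all(a != "CheckFailed" for a, _ in trace)
    seed = first_failing_seed(no_failures, range(100))
    print("failing seed:", seed)
    if seed is not None:
        # Replaying the seed reproduces the identical trace.
        print(simulate(seed) == simulate(seed))
```

A failing seed is the whole bug report: anyone can pass it back to `simulate` and step through the same sequence of faults.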

Crate overview

Crate	Purpose
temper-spec	IOA TOML + CSDL parsers, compiles to StateMachine IR
temper-verify	L0–L3 verification cascade (Z3, Stateright, DST, proptest)
temper-jit	TransitionTable builder, hot-swap controller
temper-runtime	Actor system, bounded mailboxes, event sourcing, SimScheduler
temper-server	HTTP/axum, OData routing, entity dispatch, idempotency
temper-odata	OData v4: path parsing, query options, `$filter`/`$select`/`$expand`
temper-authz	Cedar-based authorization engine
temper-observe	OTEL spans + metrics, trajectory tracking
temper-evolution	O-P-A-D-I record chain, evolution engine
temper-wasm	WASM sandboxed integrations with per-call resource budgets
temper-mcp	MCP server, Monty sandbox (execute tool)
temper-platform	Hosting platform, verify-deploy pipeline, skill catalog
temper-optimize	Query + cache optimizer, N+1 detection
temper-store-postgres	Postgres event journal + snapshots (multi-tenant)
temper-store-turso	Turso/libSQL event journal + snapshots
temper-store-redis	Distributed mailbox, placement, cache traits
temper-cli	CLI: parse, verify, serve, mcp, decide
temper-sandbox	Shared Monty sandbox infrastructure
temper-sdk	HTTP client library for Temper server
temper-codegen	Generates Rust actor code from CSDL + behavioral specs
temper-store-sim	In-memory deterministic event store with fault injection
temper-wasm-sdk	SDK for writing WASM integration modules
temper-macros	Proc macros: `#[derive(Message)]`, `#[derive(DomainEvent)]`
temper-ots	Open Trajectory Specification — DST-compatible trajectory capture for agent decisions

Contributing

See CONTRIBUTING.md.

License

Dual-licensed under MIT or Apache-2.0, at your option.

