aibtcdev · azagh72-creator · Apr 4, 2026 · Apr 4, 2026
diff --git a/execution-guard/AGENT.md b/execution-guard/AGENT.md
@@ -0,0 +1,51 @@
+---
+name: execution-guard-agent
+skill: execution-guard
+description: "Autonomous decision engine agent that gates agent operations behind multi-layer health consensus. Produces RUN/CAUTION/SOFT_PAUSE/HARD_STOP verdicts via 4-layer quorum."
+---
+
+# execution-guard-agent
+
+Autonomous agent persona for operating the `execution-guard` skill. This agent sits upstream of all operational agents and gates execution based on multi-layer health consensus.
+
+## Prerequisites
+
+- No wallet required (read-only skill)
+- Network access to: mempool.space, Hiro API, x402-relay.aibtc.com
+
+## Decision Logic
+
+1. Run `doctor` to verify upstream dependencies are reachable.
+2. Run `evaluate --address <addr>` to get a 4-layer verdict.
+3. Route the verdict to the appropriate action:
+   - `RUN` → allow pending operations
+   - `CAUTION` → allow with reduced position sizing
+   - `SOFT_PAUSE` → hold queue, notify operator
+   - `HARD_STOP` → freeze all operations, alert operator
+4. Before executing any job, run `check-job --job-id <id> --nonce <n> --timestamp <ts>` to reject duplicates.
+
+## Safety Checks
+
+- Never use a single layer's output to make a decision — all verdicts come from the quorum engine.
+- Chain liveness (Layer 1) has veto power: if it scores 0, the verdict is always `HARD_STOP`.
+- App signal (Layer 3) is supplementary: it can contribute to a quorum downgrade but never drives a `HARD_STOP` alone.
+- Each evaluation is fresh — do not cache verdicts across invocations in production.
+
+## Error Handling
+
+| Error | Behavior |
+|---|---|
+| Single layer times out | Score that layer at 0, continue evaluation with remaining layers |
+| All layers time out | Return HARD_STOP with reason "all layers unreachable" |
+| Anti-replay store full | Evict oldest entries automatically, continue |
+| Unexpected exception | Catch at top level, return HARD_STOP (fail-safe default) |
+
+## Output Contract
+
+Each subcommand outputs a single JSON object to stdout.
+
+- `evaluate` → `{ verdict, reason, quorum, avgScore, action, layers[], evaluationMs, antiReplay }`
+- `check-job` → `{ allowed, hash, reason?, originalExecution? }`
+- `doctor` → `{ overall, network, endpoints{} }`
+
+Exit code 0 on success, 1 on error.
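
The verdict routing listed under Decision Logic in AGENT.md could be wired up roughly as follows. This is only a sketch: the `Action` shape, `ACTIONS`, and `routeEvaluateOutput` are hypothetical names that do not appear in the skill, and falling back to `HARD_STOP` on unparseable output is an assumption that mirrors the fail-safe default in the Error Handling table. Only the four verdict names and their behaviors come from the document.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from execution-guard.ts.
type Verdict = "RUN" | "CAUTION" | "SOFT_PAUSE" | "HARD_STOP";

interface Action {
  execute: boolean;        // may pending operations proceed?
  reducedSizing: boolean;  // CAUTION: allow with reduced position sizing
  holdQueue: boolean;      // SOFT_PAUSE: hold the queue
  freezeAll: boolean;      // HARD_STOP: freeze all
  notifyOperator: boolean; // SOFT_PAUSE notifies, HARD_STOP alerts
}

// One entry per verdict, following the routing list in AGENT.md.
const ACTIONS: Record<Verdict, Action> = {
  RUN:        { execute: true,  reducedSizing: false, holdQueue: false, freezeAll: false, notifyOperator: false },
  CAUTION:    { execute: true,  reducedSizing: true,  holdQueue: false, freezeAll: false, notifyOperator: false },
  SOFT_PAUSE: { execute: false, reducedSizing: false, holdQueue: true,  freezeAll: false, notifyOperator: true },
  HARD_STOP:  { execute: false, reducedSizing: false, holdQueue: false, freezeAll: true,  notifyOperator: true },
};

const VERDICTS: readonly Verdict[] = ["RUN", "CAUTION", "SOFT_PAUSE", "HARD_STOP"];

function isVerdict(value: unknown): value is Verdict {
  return typeof value === "string" && VERDICTS.indexOf(value as Verdict) !== -1;
}

// Takes the JSON object that `evaluate` prints to stdout and returns the action.
// Assumption: anything that cannot be parsed is treated as HARD_STOP (fail-safe).
function routeEvaluateOutput(stdout: string): Action {
  try {
    const parsed: unknown = JSON.parse(stdout);
    if (typeof parsed === "object" && parsed !== null) {
      const verdict = (parsed as { verdict?: unknown }).verdict;
      if (isVerdict(verdict)) return ACTIONS[verdict];
    }
  } catch {
    // Malformed JSON falls through to the fail-safe below.
  }
  return ACTIONS.HARD_STOP;
}
```

A caller would pass the captured stdout of `evaluate` to `routeEvaluateOutput` and consult `execute` before dequeuing a job. Keeping the mapping in a single table makes it easy to audit against the list in AGENT.md.
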
diff --git a/execution-guard/SKILL.md b/execution-guard/SKILL.md
@@ -0,0 +1,84 @@
+---
+name: execution-guard
+description: "Multi-layer decision engine for Stacks agent operations. 4 independent layers (Chain Liveness, Payment Health, App Signal, Internal Sanity) vote via quorum to produce RUN/CAUTION/SOFT_PAUSE/HARD_STOP verdicts. Includes anti-replay protection."
+metadata:
+  author: "azagh72-creator"
+  author-agent: "Flying Whale"
+  user-invocable: "false"
+  arguments: "evaluate [--address <stx-address>] | check-job --job-id <id> --nonce <n> --timestamp <ts> | doctor"
+  entry: "execution-guard/execution-guard.ts"
+  mcp-tools: "check_relay_health, get_network_status, get_transaction_status"
+  requires: ""
+  tags: "l1, l2, read-only, infrastructure"
+---
+
+# execution-guard
+
+Multi-layer decision engine for autonomous agent operations on Stacks. Four independent layers vote on system health via quorum — the engine compares signals, it does not trust any single source.
+
+## Problem
+
+Agents that rely on a single API endpoint to decide whether to operate are vulnerable to false signals. If an activity API reports zero activity while the blockchain is live, the agent should not stop. If the blockchain is down but the API looks healthy, the agent should stop immediately. A single-source decision creates a single point of failure.
+
+## Solution
+
+Four independent layers check different aspects of system health in parallel. A quorum engine aggregates their scores and produces one of four verdicts. Chain liveness holds veto power — if both Bitcoin and Stacks are unreachable, the verdict is always HARD_STOP regardless of other layers.
+
+## Layers
+
+| # | Layer | What it checks | Role |
+|---|---|---|---|
+| 1 | **Chain Liveness** | Bitcoin block height, Stacks block height, BTC-STX sync drift | Veto power — score 0 forces HARD_STOP |
+| 2 | **Payment Health** | x402 relay status, sponsor nonce gaps (queried directly from Hiro), mempool desync | Standard quorum member |
+| 3 | **App Signal** | Recent transaction activity for a given address | Supplementary — never drives decisions alone |
+| 4 | **Internal Sanity** | Hiro API latency, memory pressure, anti-replay store health | Standard quorum member |
+
+## Verdicts
+
+| Verdict | Condition | Behavior |
+|---|---|---|
+| `RUN` | 3 or 4 of 4 layers healthy (score >= 60) | Operate normally |
+| `CAUTION` | 2 of 4 layers healthy | Proceed with reduced exposure, avoid new large positions |
+| `SOFT_PAUSE` | 1 of 4 layers healthy | Halt execution but preserve queue |
+| `HARD_STOP` | 0 of 4 layers healthy, or Chain Liveness scores 0 | Freeze everything, preserve queue, wait for recovery |
+
+## Anti-replay
+
+Every job gets a deterministic hash from `job_id + nonce + timestamp`. Executed jobs are tracked in a rolling 24-hour window (max 1,000 entries). Duplicate hashes are rejected.
+
+## Subcommands
+
+### `evaluate`
+
+Runs a full 4-layer evaluation. The optional `--address` flag enables the App Signal layer with real transaction history.
+
+```
+bun run execution-guard/execution-guard.ts evaluate
+bun run execution-guard/execution-guard.ts evaluate --address SP322ZK4VXT3KGDT9YQANN9R28SCT02MZ97Y24BRW
+```
+
+**Output**: verdict, reason, quorum, per-layer scores and signals, evaluation time, anti-replay stats.
+
+### `check-job`
+
+Anti-replay check. Returns `allowed: true` if the job is new, or `allowed: false` with the original execution timestamp if it is a duplicate.
+
+```
+bun run execution-guard/execution-guard.ts check-job --job-id "rebalance-001" --nonce 42 --timestamp 1711843200000
+```
+
+### `doctor`
+
+Health check across Bitcoin, Stacks, x402 relay, and sponsor nonce state.
+
+```
+bun run execution-guard/execution-guard.ts doctor
+```
+
+## Safety
+
+- **Read-only**: zero on-chain transactions. All checks are HTTP GET requests.
+- **No private keys**: never requests, accepts, or stores keys.
+- **No wallet required**: uses only public blockchain data.
+- **Graceful degradation**: each layer fails independently with a timeout.
+- **Deterministic**: the same layer scores always produce the same verdict.
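
The quorum rule from the Verdicts table in SKILL.md, together with the Chain Liveness veto, can be sketched as below. The sketch makes several assumptions. Scores are taken to run from 0 to 100, and a layer counts as healthy at 60 or above, the threshold given in the `RUN` row. A timed-out layer is assumed to have already been scored 0 by the caller. `LayerScore` and `decide` are hypothetical names. The actual engine in `execution-guard.ts` may weigh signals differently, for example through `avgScore`.

```typescript
// Hypothetical sketch of the quorum rule; not the actual engine.
type Verdict = "RUN" | "CAUTION" | "SOFT_PAUSE" | "HARD_STOP";

interface LayerScore {
  layer: 1 | 2 | 3 | 4; // 1 = Chain Liveness, 2 = Payment Health, 3 = App Signal, 4 = Internal Sanity
  score: number;        // assumed range 0..100; a timed-out layer is scored 0
}

const HEALTHY_THRESHOLD = 60; // from the RUN row of the Verdicts table
const CHAIN_LIVENESS = 1;

function decide(layers: LayerScore[]): { verdict: Verdict; quorum: string } {
  const healthy = layers.filter((l) => l.score >= HEALTHY_THRESHOLD).length;
  const quorum = `${healthy}/${layers.length}`;

  // Veto: a missing or zero-scored Chain Liveness layer forces HARD_STOP,
  // regardless of how the other layers voted.
  const chainAlive = layers.some((l) => l.layer === CHAIN_LIVENESS && l.score > 0);
  if (!chainAlive) return { verdict: "HARD_STOP", quorum };

  // Otherwise the verdict depends only on the count of healthy layers.
  if (healthy >= 3) return { verdict: "RUN", quorum };
  if (healthy === 2) return { verdict: "CAUTION", quorum };
  if (healthy === 1) return { verdict: "SOFT_PAUSE", quorum };
  return { verdict: "HARD_STOP", quorum };
}
```

Under these assumptions, an unhealthy App Signal layer on its own leaves three healthy layers and therefore still yields `RUN`, which matches its supplementary role. A Chain Liveness score of 0 yields `HARD_STOP` even when the other three layers are healthy.
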