Autonomous agents with access to external tools (email, APIs, production systems) can take actions that are difficult or impossible to undo. A misinterpreted request can send an email to the wrong person, deploy broken code, or delete production data.
Classify every agent action as internal or external. Internal actions and external reads are safe for autonomous execution; external writes, modifications, and deletions require human approval before execution.
| Category | Examples | Gate |
|---|---|---|
| Internal read | Read files, search web, check logs | Autonomous |
| Internal write | Write to workspace, update memory | Autonomous |
| Internal compute | Analyze data, generate reports | Autonomous |
| External read | Check email, query APIs | Autonomous |
| External write | Send email, post to service | Approval required |
| External modify | Deploy code, modify server | Approval required |
| External delete | Remove data, terminate service | Double approval |
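The table can be encoded as a lookup so that every action is classified before it runs. The following is a minimal sketch, not a definitive implementation: the names `Gate`, `GATES`, and `gate_for` are hypothetical, and falling back to the strictest gate for unknown actions is an assumption rather than something the table specifies.

```python
from enum import Enum


class Gate(Enum):
    AUTONOMOUS = "autonomous"
    APPROVAL = "approval required"
    DOUBLE_APPROVAL = "double approval"


# (scope, operation) -> gate, mirroring the rows of the table.
GATES = {
    ("internal", "read"): Gate.AUTONOMOUS,
    ("internal", "write"): Gate.AUTONOMOUS,
    ("internal", "compute"): Gate.AUTONOMOUS,
    ("external", "read"): Gate.AUTONOMOUS,
    ("external", "write"): Gate.APPROVAL,
    ("external", "modify"): Gate.APPROVAL,
    ("external", "delete"): Gate.DOUBLE_APPROVAL,
}


def gate_for(scope: str, operation: str) -> Gate:
    """Return the gate for an action.

    Assumption: anything not listed in the table is treated as the
    strictest case instead of being allowed by default.
    """
    return GATES.get((scope, operation), Gate.DOUBLE_APPROVAL)
```

For example, `gate_for("external", "write")` returns `Gate.APPROVAL`, while an unlisted combination such as `gate_for("external", "rename")` falls through to `Gate.DOUBLE_APPROVAL`.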
When an agent needs to perform a gated action, it previews the action and waits:
- Agent prepares the action (drafts the email, stages the deployment)
- Agent presents the full preview to the human
- Human approves, modifies, or rejects
- Only on approval does the agent execute
Key: the preview must show exactly what will happen. "I'll send an email" is insufficient. Show the recipient, subject, and body.
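The preview-and-wait steps above could be sketched roughly as follows. Everything here is illustrative: `EmailDraft`, `Decision`, and `run_gated` are hypothetical names, and `ask_human` and `execute` stand in for whatever approval channel and mail client a real system uses. The sketch assumes that a modified draft is previewed again before anything is sent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EmailDraft:
    recipient: str
    subject: str
    body: str

    def preview(self) -> str:
        # Show exactly what will be sent: recipient, subject, and body.
        return f"To: {self.recipient}\nSubject: {self.subject}\n\n{self.body}"


@dataclass(frozen=True)
class Decision:
    verdict: str  # "approve", "modify", or "reject"
    revised: Optional[EmailDraft] = None  # set when verdict is "modify"


def run_gated(
    draft: EmailDraft,
    ask_human: Callable[[str], Decision],
    execute: Callable[[EmailDraft], None],
) -> Optional[EmailDraft]:
    """Preview the draft, wait for a decision, and execute only on approval."""
    while True:
        decision = ask_human(draft.preview())
        if decision.verdict == "approve":
            execute(draft)
            return draft
        if decision.verdict == "modify" and decision.revised is not None:
            # Assumption: the revised draft goes back through the preview step.
            draft = decision.revised
            continue
        # Rejection, or any unrecognized answer, means nothing is executed.
        return None
```

Returning the executed draft (or `None`) gives the caller something concrete to record afterwards.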
Over time, frequently-approved actions can be promoted:
- Week 1: Every email send requires approval
- Month 2: Emails to known internal recipients are auto-approved
- Month 3: Only emails to new external contacts need approval
This balances safety with productivity. Start restrictive, relax as trust builds.
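One way to express the email example above is a policy function keyed on a trust stage. This is only a sketch under stated assumptions: the stage numbers, the `internal_domain` default of `example.com`, and the idea of a `known` recipient set are all hypothetical and would need to match how a real deployment tracks trust.

```python
from typing import AbstractSet


def needs_approval(
    stage: int,
    recipient: str,
    known: AbstractSet[str],
    internal_domain: str = "example.com",  # hypothetical internal domain
) -> bool:
    """Decide whether an email send must wait for human approval.

    stage 1: every send requires approval
    stage 2: known internal recipients are auto-approved
    stage 3: only new external contacts require approval
    """
    internal = recipient.endswith("@" + internal_domain)
    is_known = recipient in known
    if stage <= 1:
        return True
    if stage == 2:
        return not (internal and is_known)
    return not internal and not is_known
```

Keeping the policy in one small function makes each promotion an explicit, reviewable change instead of an implicit drift in agent behavior.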
Log every gated action with:
- What was requested
- Whether it was approved or rejected
- Who approved it
- What was actually executed
This is essential for debugging ("why did the agent send that email?") and for building the trust history that enables escalation.
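A log entry with the four fields listed above might look like the sketch below, written as one JSON object per line. The names `AuditRecord` and `log_gated_action` are hypothetical, and the `timestamp` field is an added assumption that is not part of the list above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional, TextIO


@dataclass
class AuditRecord:
    requested: str           # what was requested
    decision: str            # "approved" or "rejected"
    approver: str            # who made the decision
    executed: Optional[str]  # what actually ran; None if nothing ran
    timestamp: str           # assumption: not in the original field list


def log_gated_action(
    stream: TextIO,
    requested: str,
    decision: str,
    approver: str,
    executed: Optional[str] = None,
) -> AuditRecord:
    """Append one JSON line describing a gated action to an open text stream."""
    record = AuditRecord(
        requested=requested,
        decision=decision,
        approver=approver,
        executed=executed,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    stream.write(json.dumps(asdict(record)) + "\n")
    return record
```

Recording `requested` and `executed` separately matters because a human may modify the action before approving it, so the two can differ.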
When possible, use reversible operations:
- `trash` instead of `rm`
- Soft delete instead of hard delete
- Feature flags instead of code removal
- Draft/preview instead of direct send
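As an illustration of the soft-delete idea, the sketch below moves a file into a trash directory instead of removing it, so the operation can be undone. `soft_delete` and `restore` are hypothetical helpers, and the random prefix on the trashed file name is an assumption made to avoid overwriting an earlier trashed file with the same name.

```python
import shutil
import uuid
from pathlib import Path


def soft_delete(path: Path, trash_dir: Path) -> Path:
    """Move a file into trash_dir instead of deleting it; return its new location."""
    trash_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: a random prefix keeps same-named files from colliding.
    target = trash_dir / f"{uuid.uuid4().hex}_{path.name}"
    shutil.move(str(path), str(target))
    return target


def restore(trashed: Path, original: Path) -> Path:
    """Undo a soft delete by moving the trashed file back to its original path."""
    original.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(trashed), str(original))
    return original
```

A periodic job can empty the trash directory later; that hard delete is the step that would sit behind an approval gate.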
- Friction: Approval gates slow down autonomous operation. That's the point, but it is frustrating when the action is obviously safe.
- Human bottleneck: The system is only as fast as the human approver. Batch approvals help.
- False security: A human rubber-stamping approvals without reading them defeats the purpose.
- Purely internal agents that never interact with external systems.
- Sandboxed environments where all actions are safely reversible.
- Emergency operations where speed matters more than caution (define these in advance).
- Isolated Workspaces - credential scoping limits what agents can access
- Identity as Architecture - identity helps define which agents need stricter gates