External Action Gates

Problem

Autonomous agents with access to external tools (email, APIs, production systems) can take actions that are difficult or impossible to undo. A misinterpreted request can send an email to the wrong person, deploy broken code, or delete production data.

Solution

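Classify every agent action by where its effects land and whether it changes state. Internal actions and external reads are safe for autonomous execution; external writes, modifications, and deletions require human approval before execution.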

Implementation

1. Classify Actions

Category         | Examples                           | Gate
Internal read    | Read files, search web, check logs | Autonomous
Internal write   | Write to workspace, update memory  | Autonomous
Internal compute | Analyze data, generate reports     | Autonomous
External read    | Check email, query APIs            | Autonomous
External write   | Send email, post to service        | Approval required
External modify  | Deploy code, modify server         | Approval required
External delete  | Remove data, terminate service     | Double approval
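
The classification above can be expressed as a lookup from (scope, operation) to a gate. This is a minimal sketch, not a prescribed implementation: the names `Gate`, `GATES`, and `gate_for` are hypothetical, and treating unknown combinations as the strictest gate is an assumption added here for safety, not something the table specifies.

```python
from enum import Enum


class Gate(Enum):
    AUTONOMOUS = "autonomous"
    APPROVAL = "approval required"
    DOUBLE_APPROVAL = "double approval"


# (scope, operation) -> gate, mirroring the classification table.
GATES = {
    ("internal", "read"): Gate.AUTONOMOUS,
    ("internal", "write"): Gate.AUTONOMOUS,
    ("internal", "compute"): Gate.AUTONOMOUS,
    ("external", "read"): Gate.AUTONOMOUS,
    ("external", "write"): Gate.APPROVAL,
    ("external", "modify"): Gate.APPROVAL,
    ("external", "delete"): Gate.DOUBLE_APPROVAL,
}


def gate_for(scope: str, operation: str) -> Gate:
    """Return the gate for an action.

    Assumption: anything not listed in the table fails closed
    and gets the strictest gate.
    """
    return GATES.get((scope, operation), Gate.DOUBLE_APPROVAL)
```

With this sketch, `gate_for("external", "write")` returns `Gate.APPROVAL`, while an unrecognized pair such as `("external", "rename")` falls back to `Gate.DOUBLE_APPROVAL`.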

2. Approval Flow

When an agent needs to perform a gated action, it previews the action and waits:

  1. Agent prepares the action (drafts the email, stages the deployment)
  2. Agent presents the full preview to the human
  3. Human approves, modifies, or rejects
  4. Only on approval does the agent execute

Key: the preview must show exactly what will happen. "I'll send an email" is insufficient. Show the recipient, subject, and body.
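
The four steps can be sketched for the email case as follows. Everything here is illustrative: `EmailDraft`, `run_gated`, and the reviewer callback signature are hypothetical names, and a real system would put a UI or chat prompt behind the reviewer. The reviewer sees the full preview and returns the draft to send (possibly edited), or `None` to reject.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EmailDraft:
    recipient: str
    subject: str
    body: str

    def preview(self) -> str:
        # Show exactly what will be sent: recipient, subject, and body.
        return f"To: {self.recipient}\nSubject: {self.subject}\n\n{self.body}"


# Hypothetical reviewer: takes the preview text and the draft, returns the
# draft to send (unchanged or modified), or None to reject.
Reviewer = Callable[[str, EmailDraft], Optional[EmailDraft]]


def run_gated(
    draft: EmailDraft,
    review: Reviewer,
    send: Callable[[EmailDraft], None],
) -> bool:
    """Execute `send` only if the reviewer approves. Returns True if sent."""
    decision = review(draft.preview(), draft)
    if decision is None:
        return False  # rejected: nothing is executed
    send(decision)  # approved, possibly with human edits
    return True
```

Because the reviewer returns the draft itself, "approve", "modify", and "reject" all pass through one code path, and the agent can only execute what the human actually returned.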

3. Trust Escalation

Over time, frequently approved actions can be promoted to auto-approval:

  • Week 1: Every email send requires approval
  • Month 2: Emails to known internal recipients are auto-approved
  • Month 3: Only emails to new external contacts need approval

This balances safety with productivity. Start restrictive, relax as trust builds.
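
One way to model this escalation is to count approvals per action key and auto-approve once a threshold is reached. This is a sketch under assumptions: the `TrustPolicy` class, the key format (for example `"email:alice@example.com"`), the threshold of 5, and resetting trust on rejection are all illustrative choices, not part of the pattern as stated.

```python
from collections import Counter


class TrustPolicy:
    """Tracks approvals per action key and relaxes the gate over time."""

    def __init__(self, threshold: int = 5):
        # Assumed threshold: approvals needed before auto-approval.
        self.threshold = threshold
        self.approvals: Counter = Counter()

    def record_approval(self, action_key: str) -> None:
        self.approvals[action_key] += 1

    def record_rejection(self, action_key: str) -> None:
        # Assumption: a single rejection resets accumulated trust.
        self.approvals[action_key] = 0

    def needs_approval(self, action_key: str) -> bool:
        # Unknown keys have a count of 0, so new contacts are always gated.
        return self.approvals[action_key] < self.threshold
```

Keying on the recipient (rather than on "email" as a whole) matches the progression above: known recipients earn auto-approval while new contacts still require review.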

4. Audit Trail

Log every gated action with:

  • What was requested
  • Whether it was approved or rejected
  • Who approved it
  • What was actually executed

This is essential for debugging ("why did the agent send that email?") and for building the trust history that enables escalation.
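
A log entry covering the four fields above might be serialized as one JSON object per line. This is a minimal sketch: the function name `audit_record` and the field names are hypothetical, and the UTC timestamp is an addition assumed here to make entries orderable.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(
    requested: str,
    approved: bool,
    approver: Optional[str],
    executed: Optional[str],
) -> str:
    """Build one JSON log line for a gated action."""
    entry = {
        # Assumption: a timestamp is recorded alongside the four listed fields.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "requested": requested,  # what was requested
        "approved": approved,    # whether it was approved or rejected
        "approver": approver,    # who made the decision
        "executed": executed,    # what actually ran (None if rejected)
    }
    return json.dumps(entry, sort_keys=True)
```

Recording `requested` and `executed` separately matters when the human modifies the action before approving: the difference between the two is often the answer to "why did the agent send that email?".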

5. Prefer Reversible Over Destructive

When possible, use reversible operations:

  • trash instead of rm
  • Soft delete instead of hard delete
  • Feature flags instead of code removal
  • Draft/preview instead of direct send
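
As an illustration of the first item, a file can be moved into a trash directory instead of being removed, which keeps the operation undoable. This is a sketch using only the standard library; `soft_delete`, `restore`, and the numeric-suffix scheme for name collisions are hypothetical choices.

```python
import shutil
from pathlib import Path


def soft_delete(path: Path, trash_dir: Path) -> Path:
    """Move `path` into `trash_dir` instead of deleting it.

    Returns the new location so the caller can restore it later.
    """
    trash_dir.mkdir(parents=True, exist_ok=True)
    target = trash_dir / path.name
    # Avoid overwriting an earlier trashed file with the same name.
    counter = 1
    while target.exists():
        target = trash_dir / f"{path.name}.{counter}"
        counter += 1
    shutil.move(str(path), str(target))
    return target


def restore(trashed: Path, original: Path) -> None:
    """Undo a soft delete by moving the file back."""
    shutil.move(str(trashed), str(original))
```

The agent would call `soft_delete` wherever it would otherwise remove a file, and keep the returned path in its log so a human can call `restore` if the deletion was a mistake.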

Trade-offs

  • Friction: Approval gates slow down autonomous operation. That is the point, but it is frustrating when the action is obviously safe.
  • Human bottleneck: The system is only as fast as the human approver. Batch approvals help.
  • False security: A human rubber-stamping approvals without reading them defeats the purpose.

When to Skip

  • Purely internal agents that never interact with external systems.
  • Sandboxed environments where all actions are safely reversible.
  • Emergency operations where speed matters more than caution (define these in advance).

Related Patterns