
Agent System Design

Overview

HyperAgent uses a 3-component architecture: an orchestrator, two specialized agents, and a composable skills system. The pre-existing Research Agent remains alongside them unchanged:

  • Orchestrator — Classifies queries as simple or complex, invokes PlannerAgent for complex tasks, and dispatches ExecutorAgent per step.
  • ExecutorAgent — Self-contained ReAct loop for executing a single step or task. Extracted from the old Task Agent. Handles ~80% of requests including chat, data analysis, app building, image generation, slide creation, and browser automation.
  • PlannerAgent — Decomposes complex queries into a structured list of steps for the Orchestrator to dispatch.
  • Research Agent — Specialized multi-step research with search, analysis, synthesis, and report writing (unchanged).

Core Philosophy

  • Skills — Composable capabilities (LangGraph subgraphs invoked as tools)
  • Agents — Workflow orchestration (ReAct loops with tool calling)
  • Tools — Atomic operations (web search, code execution, browser actions)

Architecture

┌──────────────────────────────────┐
│           User Request           │
└────────────────┬─────────────────┘
                 │
          ┌──────▼───────┐
          │ Orchestrator │
          │  (classify)  │
          └──────┬───────┘
                 │
      ┌──────────┼──────────────┐
      │          │              │
   simple     complex     research mode
      │          │              │
      │     ┌────▼─────┐        │
      │     │ Planner  │        │
      │     │  Agent   │        │
      │     └────┬─────┘        │
      │          │ steps        │
      │    ┌─────▼─────┐        │
      └───►│ EXECUTOR  │        │
           │   Agent   │      ┌─▼─────────┐
           │ ReAct loop│ hand │ RESEARCH  │
           │ + tools   │─────►│   Agent   │
           │ + skills  │ off  │ search →  │
           └─────┬─────┘      │ analyze → │
                 │            │ synthesize│
            ┌────▼──────┐     │ → write   │
            │  verify   │     └───────────┘
            │ + re-plan │
            └────┬──────┘
                 │
            ┌────▼─────┐
            │ finalize │
            └────┬─────┘
                 │ invoke_skill
                 ▼
┌──────────────────────────┐
│      Skills System       │
├──────────────────────────┤
│ image_generation         │
│ code_generation          │
│ web_research             │
│ data_analysis            │
│ slide_generation         │
│ app_builder              │
│ task_planning            │
└──────────────────────────┘

Routing

File: backend/app/agents/classifier.py

The Orchestrator uses heuristic-based classification (no LLM call) to determine query complexity and routing:

Is it research mode?
  YES → Research Agent
Is it a dedicated mode (app, image, slide, data)?
  YES → simple → ExecutorAgent (direct skill invocation)
Is the query short / single-step?
  YES → simple → ExecutorAgent
Does the query contain multi-step patterns or complexity keywords?
  YES → complex → PlannerAgent → ExecutorAgent (per step)
Default → simple → ExecutorAgent

Classification Heuristics

  • Dedicated modes (app, image, slide, data) → always classified as simple
  • Short queries (few tokens, single sentence) → simple
  • Multi-step patterns (e.g., "first... then...", "step 1... step 2...") → complex
  • Complexity keywords (e.g., "analyze and compare", "build a pipeline") → complex
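The heuristics above can be sketched as a small pure function. This is an illustrative reconstruction, not the actual code in classifier.py: the token threshold, keyword list, and regex are assumptions drawn from the examples given.

```python
import re

# Hypothetical thresholds and patterns; the real rules live in
# backend/app/agents/classifier.py.
MULTI_STEP = re.compile(r"\bfirst\b.*\bthen\b|\bstep\s*1\b.*\bstep\s*2\b", re.I | re.S)
COMPLEX_KEYWORDS = ("analyze and compare", "build a pipeline")

def classify(query: str, mode: str = "task") -> str:
    if mode == "research":
        return "research"
    if mode in ("app", "image", "slide", "data"):
        return "simple"               # dedicated modes always stay simple
    if len(query.split()) <= 12:      # short, single-step query (threshold assumed)
        return "simple"
    if MULTI_STEP.search(query) or any(k in query.lower() for k in COMPLEX_KEYWORDS):
        return "complex"
    return "simple"
```

Because no LLM call is involved, classification adds effectively zero latency to every request.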

Mode-to-Agent Mapping

| Mode | Agent | Notes |
|---|---|---|
| task (default) | ExecutorAgent | General chat, Q&A |
| research | Research Agent | Deep multi-source research |
| data | ExecutorAgent | Data analysis via code execution |
| app | ExecutorAgent | Direct skill invocation → app_builder |
| image | ExecutorAgent | Direct skill invocation → image_generation |
| slide | ExecutorAgent | Direct skill invocation → slide_generation |

For dedicated modes (app, image, slide), the ExecutorAgent bypasses LLM reasoning and directly synthesizes an invoke_skill tool call on the first iteration.

ExecutorAgent

File: backend/app/agents/executor.py

A self-contained ReAct loop for executing a single step or task. Extracted from the old Task Agent, the ExecutorAgent focuses purely on execution — plan tracking and verification have been moved to the Orchestrator.

ReAct Loop

reason → act → wait_interrupt → reason → ... → complete
  • reason: LLM reasons about what to do, may emit tool calls
  • act: Executes tool calls (with HITL approval for high-risk tools)
  • wait_interrupt: Waits for user response to ask_user interrupts
  • complete: Signals step/task completion, returns result to caller
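The control flow of the loop can be sketched in plain Python (this is a simplified stand-in for the actual LangGraph graph in executor.py; `llm_step`, `run_tool`, and the message shapes are hypothetical):

```python
from typing import Callable

def react_loop(llm_step: Callable, run_tool: Callable, max_iters: int = 10) -> str:
    """Minimal sketch of the reason -> act cycle; the real executor adds
    HITL approval, interrupts, and anti-repetition checks around this."""
    messages: list[dict] = []
    for _ in range(max_iters):
        # reason: the LLM inspects history and may emit tool calls
        reply = llm_step(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            # complete: no tool calls means the step is done
            return reply["content"]
        # act: execute each requested tool and feed results back
        for call in reply["tool_calls"]:
            messages.append({"role": "tool", "content": run_tool(call)})
    return "max iterations reached"
```

The `wait_interrupt` node is omitted here; in the real graph it pauses the loop until the user answers an `ask_user` interrupt.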

Available Tools

| Category | Tools |
|---|---|
| Search | web_search |
| Image | generate_image, analyze_image |
| Browser | browser_navigate, browser_screenshot, browser_click, browser_type, browser_press_key, browser_scroll, browser_get_stream_url |
| Code | execute_code |
| CodeAct | execute_script (opt-in via execution_mode: "codeact") |
| Data | sandbox_file |
| App | create_app_project, app_write_file, app_install_packages, app_start_server |
| Skills | invoke_skill, list_skills |
| Slides | generate_slides |
| Handoff | delegate_to_research |
| HITL | ask_user |

Anti-Repetition Detection

Detects when the agent falls into repetitive tool-calling patterns (same error → same retry):

  • Computes MD5 hash of tool_name + sorted(args) for each tool call batch
  • Tracks consecutive identical hashes in last_tool_calls_hash state field
  • After 3+ consecutive identical batches, injects a variation prompt suggesting alternative approaches
  • Fires before the existing plan_revision mechanism (which triggers at 5 consecutive errors)
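The hashing and streak-counting described above can be sketched as follows (class and method names are illustrative, not the actual executor internals):

```python
import hashlib
import json

def batch_hash(tool_calls: list[dict]) -> str:
    """MD5 over tool name + args (canonically sorted) for a batch of calls."""
    canonical = sorted((c["name"], json.dumps(c["args"], sort_keys=True))
                       for c in tool_calls)
    return hashlib.md5(json.dumps(canonical).encode()).hexdigest()

class RepetitionDetector:
    """Sketch of the consecutive-identical-batch counter."""
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.last_hash: str | None = None  # mirrors last_tool_calls_hash in state
        self.streak = 0

    def observe(self, tool_calls: list[dict]) -> bool:
        """Returns True when a variation prompt should be injected."""
        h = batch_hash(tool_calls)
        self.streak = self.streak + 1 if h == self.last_hash else 1
        self.last_hash = h
        return self.streak >= self.threshold
```

Sorting the args JSON keys makes the hash insensitive to argument ordering, so semantically identical retries collide as intended.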

Context Compression

Applied in reason_node before each LLM call when token count exceeds threshold:

  1. Estimates token count for all messages
  2. If above threshold (default 60k), compresses older messages using FLASH-tier LLM
  3. Injects summary as system context message
  4. Falls back to message truncation if compression fails

KV-Cache-Friendly Prompt Construction

Messages are split into a "stable prefix" and "dynamic suffix" to maximize LLM KV-cache hit rates:

  • Stable prefix: System prompt + tool schemas + history summary (never reordered between iterations)
  • Dynamic suffix: New tool calls/results appended after prefix
  • prefix_hash in state tracks whether the prefix has changed; if unchanged, the LLM can reuse cached KV entries
  • Tool filtering uses "soft disable" (system message noting unavailable tools) instead of removing tool schemas, preserving prefix stability
  • Helpers: get_stable_prefix() and get_dynamic_suffix() in context_compression.py
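A simplified view of the split-and-hash mechanism (the real helpers are get_stable_prefix() and get_dynamic_suffix() in context_compression.py; this standalone sketch makes its own assumptions about message shape and hash choice):

```python
import hashlib

def split_messages(messages: list[dict], prefix_len: int):
    """Stable prefix (system prompt, schemas, summary) vs. dynamic suffix
    (new tool calls/results). Simplified: split by a fixed index."""
    return messages[:prefix_len], messages[prefix_len:]

def prefix_hash(prefix: list[dict]) -> str:
    """If this hash is unchanged between iterations, the provider can
    reuse cached KV entries for all prefix tokens."""
    blob = "\x1e".join(f"{m['role']}:{m['content']}" for m in prefix)
    return hashlib.sha256(blob.encode()).hexdigest()
```

The key invariant: appending to the suffix must never mutate or reorder the prefix, otherwise the hash changes and the KV cache is invalidated.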

PlannerAgent

File: backend/app/agents/planner.py

A single-node subgraph that decomposes complex queries into structured execution steps for the Orchestrator.

Input

  • User query
  • Conversation context
  • Optional revision_context (feedback from a failed verification, used for re-planning)

Output

  • list[PlanStep] — ordered list of steps, each with a description, expected outcome, and dependencies
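The step shape described above can be sketched as a plain dataclass (the real model is presumably a Pydantic class in planner.py; these field names are assumptions based on the description):

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    """Illustrative shape of one plan step."""
    description: str
    expected_outcome: str
    dependencies: list[int] = field(default_factory=list)  # indices of prior steps

# A hypothetical two-step plan
plan: list[PlanStep] = [
    PlanStep("Fetch sales data", "CSV downloaded to sandbox"),
    PlanStep("Analyze trends", "Summary statistics produced", dependencies=[0]),
]
```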

Details

  • Uses PRO tier LLM for balanced quality and speed
  • Stateless: no persistent memory between invocations; re-planning uses explicit revision_context
  • Emits plan_overview event at plan creation (with all steps) for frontend rendering
  • Frontend renders plan as interactive checklist with progress bar

Research Agent

File: backend/app/agents/subagents/research.py

Pipeline

init_config → search_loop → analyze → synthesize → write_report
  • init_config: Determines search depth and scenario (academic, market, technical, news)
  • search_loop: Iterative web search with source collection
  • analyze: Analyzes gathered sources for relevance and key findings
  • synthesize: Synthesizes findings across sources
  • write_report: Generates structured report with citations

Skills System

Base Classes

File: backend/app/agents/skills/skill_base.py

  • Skill — Base class with create_graph() returning a LangGraph StateGraph
  • ToolSkill — Simplified subclass with execute() (auto-generates single-node graph)
  • SkillMetadata — Pydantic model: id, version, description, category, parameters, output_schema
  • SkillContext — Execution context with invoke_skill() for skill composition

Builtin Skills

| Skill | Category | Description |
|---|---|---|
| image_generation | creative | AI image generation via Gemini or DALL-E |
| code_generation | code | Generate code snippets for specific tasks |
| web_research | research | Focused web research with source summarization |
| data_analysis | data | Full data analysis: plan, execute code in sandbox, summarize results |
| slide_generation | creative | Create PPTX presentations with research and outlines |
| app_builder | automation | Build web apps (React, Next.js, Vue, Express, FastAPI, Flask) with live preview; planning uses MAX tier, code generation uses PRO |
| task_planning | automation | Analyze complex tasks and create execution plans |

Invocation

Agents invoke skills via two tools:

  • invoke_skill(skill_id, params) — Execute a skill with parameters
  • list_skills() — Discover available skills

Skills can compose with each other via SkillContext.invoke_skill().

Orchestrator

File: backend/app/agents/orchestrator.py

The AgentOrchestrator class is the main entry point for all agent workflows, replacing the old AgentSupervisor. It manages query classification, planning, step dispatch, verification, and finalization.

Graph Nodes

classify → plan (if complex) → dispatch_step → verify → finalize
                                    ↑               │
                                    └───────────────┘
                                     (re-plan loop)
  • classify: Heuristic-based classification (simple, complex, or research) — see Routing section
  • plan: Invokes PlannerAgent to decompose the query into steps
  • dispatch_step: Invokes ExecutorAgent for the current step, advances step index
  • verify: Checks step results; if unsatisfactory, re-invokes PlannerAgent with revision_context and loops back to dispatch
  • finalize: Aggregates results from all steps, extracts final response, streams completion events

Step Dispatch Loop

For complex queries, the Orchestrator iterates through plan steps:

  1. PlannerAgent generates list[PlanStep]
  2. For each step, Orchestrator dispatches ExecutorAgent with step-specific context
  3. After all steps, Orchestrator runs verification
  4. If verification fails, re-plans with feedback and re-dispatches remaining steps
  5. Emits plan_step_completed and todo_update events as each step finishes

Todo File Persistence

The Orchestrator maintains a persistent todo checklist in the sandbox filesystem at /home/user/.hyperagent/todo.md. This prevents goal drift across long multi-step executions.

  • On plan creation: writes execution plan as markdown checklist to sandbox
  • Each step dispatch: reads current todo state and injects as [Active Task Context] system message (capped at 2000 chars)
  • On step completion: updates checklist items in sandbox file
  • Emits todo_update events for frontend rendering
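A sketch of the checklist format and the injected context message (the exact markdown layout written to todo.md is assumed; only the path and the 2000-char cap come from the description above):

```python
def render_todo(steps: list[tuple[str, bool]]) -> str:
    """Render (description, done) pairs as the markdown checklist
    persisted at /home/user/.hyperagent/todo.md."""
    lines = ["# Execution Plan", ""]
    for desc, done in steps:
        lines.append(f"- [{'x' if done else ' '}] {desc}")
    return "\n".join(lines)

def active_task_context(todo_md: str, cap: int = 2000) -> str:
    """Injected as the [Active Task Context] system message, capped at 2000 chars."""
    return "[Active Task Context]\n" + todo_md[:cap]
```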

Handoff

ExecutorAgent can delegate to Research agent via delegate_to_research tool (max 3 handoffs). Handoffs can include handoff_artifacts — files transferred from the source sandbox to the target sandbox via storage (see Cross-Sandbox Handoff Artifacts).

Event Streaming

Normalizes and deduplicates events from LangGraph via StreamProcessor.

State Hierarchy

OrchestratorState (main graph)
├── PlannerState (planner subgraph)
├── ExecutorState (executor subgraph, replaces TaskState)
└── ResearchState (unchanged)

LLM Provider System

Built-in Providers

Anthropic, OpenAI, Google Gemini — each with per-tier model defaults.

Custom Providers

Any OpenAI-compatible API can be registered via OPENAI_COMPATIBLE_PROVIDERS env var:

[{
  "name": "qwen",
  "api_key": "sk-...",
  "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
  "tier_models": {"pro": "qwen3.5-plus", "flash": "qwen3.5-flash"},
  "enable_thinking": true
}]

Model Tiers

| Tier | Purpose | Example (Anthropic) |
|---|---|---|
| MAX | Best quality, complex reasoning | claude-opus-4 |
| PRO | Balanced quality/speed | claude-sonnet-4 |
| FLASH | Fast, cost-efficient | claude-3.5-haiku |

Per-tier provider overrides: MAX_MODEL_PROVIDER, PRO_MODEL_PROVIDER, LITE_MODEL_PROVIDER.

Thinking Mode

File: backend/app/ai/thinking.py

ThinkingAwareChatOpenAI handles providers that return reasoning_content (Qwen, DeepSeek, Kimi):

  • Always captures reasoning_content from API responses (streaming and non-streaming)
  • Auto-detects thinking mode when reasoning_content first appears
  • Patches outgoing assistant messages with captured reasoning_content for multi-turn replay

Human-in-the-Loop (HITL)

Architecture

Redis pub/sub-based interrupt lifecycle:

  1. Agent creates interrupt via ask_user tool
  2. Interrupt stored in Redis with TTL, streamed as SSE event to frontend
  3. Frontend shows approval/decision/input dialog
  4. User response published to Redis channel
  5. Agent receives response and continues

Interrupt Types

  • APPROVAL — Approve/deny high-risk tool execution (120s timeout)
  • DECISION — Choose between multiple options (300s timeout)
  • INPUT — Free-form text input (300s timeout)

Tool Risk Assessment

High-risk tools require user approval before execution. Users can "approve always" to auto-approve specific tools for the session.

Safety Guardrails

Three scanners integrated at different points in the request lifecycle:

Input Scanner

Scans user input before processing:

  • Prompt injection detection via llm-guard
  • Jailbreak pattern matching (regex-based)

Output Scanner

Scans LLM responses before streaming:

  • Toxicity detection
  • PII detection with redaction
  • Harmful content pattern matching

Tool Scanner

Validates tool arguments before execution:

  • URL validation (blocks file://, localhost, private IPs)
  • Code safety (blocks rm -rf /, fork bombs, remote code execution)
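The URL checks can be sketched with the standard library (the real tool scanner's exact rules may differ; this shows only the blocks named above):

```python
import ipaddress
from urllib.parse import urlparse

def is_url_allowed(url: str) -> bool:
    """Illustrative URL validation: block non-http(s) schemes,
    localhost, and private/loopback IP literals."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):   # blocks file:// etc.
        return False
    host = parsed.hostname or ""
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
        if ip.is_private or ip.is_loopback:      # blocks private IPs and 127.0.0.1
            return False
    except ValueError:
        pass                                     # hostname, not an IP literal
    return True
```

A production scanner would also resolve hostnames before fetching to defeat DNS-rebinding tricks; that is out of scope for this sketch.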

Configuration

| Setting | Default | Description |
|---|---|---|
| GUARDRAILS_ENABLED | true | Master toggle |
| GUARDRAILS_INPUT_ENABLED | true | Input scanning |
| GUARDRAILS_OUTPUT_ENABLED | true | Output scanning |
| GUARDRAILS_TOOL_ENABLED | true | Tool argument scanning |
| GUARDRAILS_VIOLATION_ACTION | block | Action: block, warn, log |
| GUARDRAILS_TIMEOUT_MS | 500 | Scan timeout (ms) |

Sandbox System

Providers

| | E2B | BoxLite |
|---|---|---|
| Type | Cloud | Local (Docker) |
| Requires | E2B_API_KEY | Docker |
| Code execution | Python, JS, TS, Bash | Python, JS, TS, Bash |
| Browser automation | E2B Desktop | Docker desktop image |
| App hosting | Cloud URLs | Local ports |
Configured via SANDBOX_PROVIDER env var.

Sandbox Types

  • Code Executor — Run code snippets with output capture
  • Desktop Executor — Browser automation with screenshots and streaming
  • App Runtime — Scaffold, build, and host web applications

Unified Sandbox Manager

File: backend/app/sandbox/unified_sandbox_manager.py

The UnifiedSandboxManager provides a single shared SandboxRuntime for both code execution and app development within one agent run, avoiding the overhead of separate VMs for the same task.

  • Session key: unified:{user_id}:{task_id}
  • Default timeout: 30 minutes
  • get_or_create_runtime(user_id, task_id) — shared runtime for both code and app
  • get_code_executor(user_id, task_id) — wraps shared runtime as BaseCodeExecutor
  • get_app_session(user_id, task_id, template) — wraps shared runtime as AppSandboxSession
  • Desktop sandboxes remain separate (different VM image requirement)
  • Cleanup: cleanup_sandboxes_for_task() prioritizes unified sessions first

Persistent Sandbox Snapshots

File: backend/app/services/snapshot_service.py

Sandbox state (installed packages, generated files) is preserved across SSE disconnects and timeouts via workspace snapshots.

  • save_snapshot(runtime, user_id, task_id, sandbox_type) — tar key directories, upload to storage
  • restore_snapshot(runtime, user_id, task_id, sandbox_type) — download and restore on reconnect
  • Auto-snapshot on disconnect/cleanup (execution and app managers)
  • Storage: R2 (production) or local filesystem (development)
  • Retention: 24 hours (configurable via SNAPSHOT_RETENTION_HOURS)
  • Max size: 100MB per snapshot (configurable via SNAPSHOT_MAX_SIZE_BYTES)
  • Default paths: /home/user, /tmp/outputs (execution); /home/user/app (app)
  • DB model: SandboxSnapshot with indexes on (user_id, task_id, sandbox_type)

Cross-Sandbox Handoff Artifacts

File: backend/app/sandbox/artifact_transfer.py

When agents hand off work (e.g., Executor → Research), files from the source sandbox can be transferred to the target sandbox.

  • collect_artifacts(runtime, patterns, max_files=10, max_size_mb=50) — find and upload files
  • restore_artifacts(runtime, artifacts) — download files into target sandbox
  • cleanup_artifacts(artifacts) — remove transferred files from storage after completion
  • Default patterns: *.py, *.csv, *.json, *.txt, *.md, *.html, *.js, *.ts
  • HandoffInfo includes optional handoff_artifacts field
  • Orchestrator restores artifacts and appends summary to handoff context
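The selection rules (glob patterns, file count cap, size budget) can be sketched over an in-memory listing; the real collect_artifacts walks the sandbox filesystem and uploads matches, which this illustration omits:

```python
import fnmatch
import os

DEFAULT_PATTERNS = ["*.py", "*.csv", "*.json", "*.txt", "*.md", "*.html", "*.js", "*.ts"]

def select_artifacts(files: dict[str, int], patterns=DEFAULT_PATTERNS,
                     max_files: int = 10, max_size_mb: int = 50) -> list[str]:
    """Pick transferable files from a {path: size_in_bytes} listing,
    respecting the pattern allowlist and the count/size caps."""
    budget = max_size_mb * 1024 * 1024
    picked, used = [], 0
    for path, size in sorted(files.items()):
        name = os.path.basename(path)
        if not any(fnmatch.fnmatch(name, p) for p in patterns):
            continue                     # e.g. skip binaries
        if len(picked) >= max_files or used + size > budget:
            break
        picked.append(path)
        used += size
    return picked
```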

Hybrid CodeAct Mode

File: backend/app/agents/tools/codeact.py

An opt-in execute_script tool that accepts multi-line Python scripts with access to a pre-installed hyperagent helper library in the sandbox. Gated behind execution_mode: "codeact" configuration.

Helper Library

The hyperagent library (backend/app/sandbox/hyperagent_lib/__init__.py) is auto-installed in the sandbox on first use and provides:

| Function | Description |
|---|---|
| hyperagent.web_search(query) | Search the web |
| hyperagent.read_file(path) | Read a file |
| hyperagent.write_file(path, content) | Write a file |
| hyperagent.run_command(cmd) | Run a shell command |
| hyperagent.browse(url) | Fetch a URL |
| hyperagent.list_files(dir) | List directory contents |

Execution Flow

  1. Agent emits execute_script tool call with multi-line Python code
  2. Sandbox session is retrieved or created (via ExecutionSandboxManager)
  3. hyperagent library installed if not already present (tracked per sandbox ID)
  4. Script written to /tmp/hyperagent/current_script.py and executed
  5. Returns JSON with success, stdout, stderr, exit_code, created_files
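The result shape in step 5 can be reproduced locally with a small runner (this sketch executes on the host instead of the sandbox, and leaves created_files empty since the real tool diffs the sandbox filesystem):

```python
import json
import subprocess
import sys
import tempfile

def execute_script(code: str, timeout: int = 60) -> str:
    """Run a multi-line Python script and return the JSON result shape
    described above (success, stdout, stderr, exit_code, created_files)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True,
                          text=True, timeout=timeout)
    return json.dumps({
        "success": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "created_files": [],  # real tool: diff of sandbox FS before/after
    })
```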

Configuration

| Setting | Default | Description |
|---|---|---|
| execution_mode | "standard" | Set to "codeact" to enable the execute_script tool |

Event System

30+ event types streamed via SSE:

  • Lifecycle: stage, complete, error
  • Content: token, image, code_result
  • Tools: tool_call, tool_result
  • Research: source, routing, handoff
  • Sandbox: browser_stream, browser_action, workspace_update, terminal_command, terminal_output, terminal_error, terminal_complete
  • Skills: skill_output, plan_step
  • HITL: interrupt, interrupt_response
  • Task Planning: plan_overview, plan_step_completed, todo_update
  • Parallel Execution: parallel_start, parallel_task, parallel_complete

Context Compression

Configuration

| Setting | Default | Description |
|---|---|---|
| CONTEXT_COMPRESSION_ENABLED | true | Enable/disable compression |
| CONTEXT_COMPRESSION_TOKEN_THRESHOLD | 60000 | Token count that triggers compression |
| CONTEXT_COMPRESSION_PRESERVE_RECENT | 10 | Number of recent messages kept uncompressed |

Process

  1. Estimate token count before each LLM call
  2. If above threshold, separate system/old/recent messages
  3. Summarize old messages using FLASH-tier LLM
  4. Inject summary as system context message
  5. Preserve tool message pairs (AIMessage + ToolMessage)
  6. Fall back to truncation if compression fails
  7. Maintain stable prefix hash across iterations to maximize KV-cache reuse
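Steps 1-6 can be sketched as a single pass (the `summarize` callable stands in for the FLASH-tier LLM call; the chars/4 token estimate and the tool-pair handling are simplifications):

```python
def compress_context(messages: list[dict], summarize, threshold: int = 60_000,
                     preserve_recent: int = 10) -> list[dict]:
    """Compress old messages into a summary when the estimated token
    count exceeds the threshold; fall back to truncation on failure."""
    est_tokens = sum(len(m["content"]) for m in messages) // 4  # rough estimate
    if est_tokens <= threshold:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    old, recent = rest[:-preserve_recent], rest[-preserve_recent:]
    try:
        summary = summarize(old)                    # FLASH-tier LLM in the real code
    except Exception:
        return system + recent                      # fallback: plain truncation
    summary_msg = {"role": "system",
                   "content": f"Summary of earlier turns: {summary}"}
    return system + [summary_msg] + recent
```

The real implementation additionally keeps AIMessage + ToolMessage pairs together so a tool result never loses its originating call.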

Backward Compatibility

Deprecated agent types are mapped to ExecutorAgent:

  • IMAGE → ExecutorAgent + image_generation skill
  • WRITING → ExecutorAgent (handled directly by LLM)
  • CODE → ExecutorAgent + code_generation skill
  • DATA → ExecutorAgent + data_analysis skill
  • APP → ExecutorAgent + app_builder skill
  • SLIDE → ExecutorAgent + slide_generation skill

Deprecated Files

The following files are deprecated but kept for backward compatibility. Imports are redirected to their replacements:

  • backend/app/agents/supervisor.py → use backend/app/agents/orchestrator.py (AgentOrchestrator)
  • backend/app/agents/subagents/task.py → use backend/app/agents/executor.py (ExecutorAgent)