HyperAgent's architecture pairs an orchestrator and specialized agents with a composable skills system:
- Orchestrator — Classifies queries as simple or complex, invokes PlannerAgent for complex tasks, and dispatches ExecutorAgent per step.
- ExecutorAgent — Self-contained ReAct loop for executing a single step or task. Extracted from the old Task Agent. Handles ~80% of requests including chat, data analysis, app building, image generation, slide creation, and browser automation.
- PlannerAgent — Decomposes complex queries into a structured list of steps for the Orchestrator to dispatch.
- Research Agent — Specialized multi-step research with search, analysis, synthesis, and report writing (unchanged).
- Skills — Composable capabilities (LangGraph subgraphs invoked as tools)
- Agents — Workflow orchestration (ReAct loops with tool calling)
- Tools — Atomic operations (web search, code execution, browser actions)
┌──────────────────────────────────┐
│ User Request │
└───────────────┬──────────────────┘
│
┌──────▼──────┐
│ Orchestrator │
│ (classify) │
└──────┬───────┘
│
┌─────────┼──────────────┐
│ │ │
simple complex research mode
│ │ │
│ ┌────▼─────┐ │
│ │ Planner │ │
│ │ Agent │ │
│ └────┬─────┘ │
│ │ steps │
│ ┌────▼─────┐ │
│ │ Executor │ │
▼ │ (per step)│ ▼
┌───────────┐ │ ┌───────────┐
│ EXECUTOR │◄─┘ │ RESEARCH │
│ Agent │ │ Agent │
│ │ │ │
│ ReAct loop│ ────► │ search → │
│ + tools │ hand │ analyze → │
│ + skills │ off │ synthesize│
└─────┬─────┘ │ → write │
│ └───────────┘
│ │
│ ┌────▼─────┐
│ │ verify │
│ │ + re-plan │
│ └────┬──────┘
│ │
└────┬────┘
│
┌────▼─────┐
│ finalize │
└────┬──────┘
│ invoke_skill
▼
┌──────────────────────────┐
│ Skills System │
├──────────────────────────┤
│ image_generation │
│ code_generation │
│ web_research │
│ data_analysis │
│ slide_generation │
│ app_builder │
│ task_planning │
└──────────────────────────┘
File: backend/app/agents/classifier.py
The Orchestrator uses heuristic-based classification (no LLM call) to determine query complexity and routing:
1. Is it research mode? YES → Research Agent
2. Is it a dedicated mode (app, image, slide, data)? YES → simple → ExecutorAgent (direct skill invocation)
3. Is the query short / single-step? YES → simple → ExecutorAgent
4. Does the query contain multi-step patterns or complexity keywords? YES → complex → PlannerAgent → ExecutorAgent (per step)
5. Default → simple → ExecutorAgent
- Dedicated modes (app, image, slide, data) → always classified as simple
- Short queries (few tokens, single sentence) → simple
- Multi-step patterns (e.g., "first... then...", "step 1... step 2...") → complex
- Complexity keywords (e.g., "analyze and compare", "build a pipeline") → complex
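The cascade above can be sketched in plain Python. The keyword lists, regex patterns, and short-query threshold below are illustrative assumptions, not the actual values in classifier.py:

```python
import re

# Illustrative lists -- the real classifier.py values may differ.
MULTI_STEP_PATTERNS = [r"\bfirst\b.*\bthen\b", r"\bstep\s*1\b.*\bstep\s*2\b"]
COMPLEXITY_KEYWORDS = ["analyze and compare", "build a pipeline"]
DEDICATED_MODES = {"app", "image", "slide", "data"}

def classify(query: str, mode: str = "task") -> str:
    """Return 'research', 'simple', or 'complex' using cheap heuristics
    (no LLM call), following the cascade order above."""
    if mode == "research":
        return "research"
    if mode in DEDICATED_MODES:
        return "simple"  # direct skill invocation
    q = query.lower()
    if len(q.split()) <= 12 and "\n" not in q:  # assumed short/single-step test
        return "simple"
    if any(re.search(p, q, re.DOTALL) for p in MULTI_STEP_PATTERNS):
        return "complex"
    if any(k in q for k in COMPLEXITY_KEYWORDS):
        return "complex"
    return "simple"  # default
```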
| Mode | Agent | Notes |
|---|---|---|
| task (default) | ExecutorAgent | General chat, Q&A |
| research | Research | Deep multi-source research |
| data | ExecutorAgent | Data analysis via code execution |
| app | ExecutorAgent | Direct skill invocation → app_builder |
| image | ExecutorAgent | Direct skill invocation → image_generation |
| slide | ExecutorAgent | Direct skill invocation → slide_generation |
For dedicated modes (app, image, slide), the ExecutorAgent bypasses LLM reasoning and directly synthesizes an invoke_skill tool call on the first iteration.
File: backend/app/agents/executor.py
A self-contained ReAct loop for executing a single step or task. Extracted from the old Task Agent, the ExecutorAgent focuses purely on execution — plan tracking and verification have been moved to the Orchestrator.
reason → act → wait_interrupt → reason → ... → complete
- reason: LLM reasons about what to do, may emit tool calls
- act: Executes tool calls (with HITL approval for high-risk tools)
- wait_interrupt: Waits for the user's response to ask_user interrupts
- complete: Signals step/task completion, returns the result to the caller
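The loop can be sketched in plain Python, with LangGraph elided; `llm_step` and `run_tools` are stand-ins for the real reason and act nodes, and everything here is illustrative:

```python
def react_loop(llm_step, run_tools, max_iters: int = 10) -> str:
    """Minimal reason -> act loop. llm_step returns either
    ('complete', result) or ('act', tool_calls)."""
    messages: list = []
    for _ in range(max_iters):
        kind, payload = llm_step(messages)  # reason node
        if kind == "complete":
            return payload                  # complete node
        results = run_tools(payload)        # act node (HITL approval happens here)
        messages.extend(results)            # feed tool results back to reason
    return "max iterations reached"
```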
| Category | Tools |
|---|---|
| Search | web_search |
| Image | generate_image, analyze_image |
| Browser | browser_navigate, browser_screenshot, browser_click, browser_type, browser_press_key, browser_scroll, browser_get_stream_url |
| Code | execute_code |
| CodeAct | execute_script (opt-in via execution_mode: "codeact") |
| Data | sandbox_file |
| App | create_app_project, app_write_file, app_install_packages, app_start_server |
| Skills | invoke_skill, list_skills |
| Slides | generate_slides |
| Handoff | delegate_to_research |
| HITL | ask_user |
Detects when the agent falls into repetitive tool-calling patterns (same error → same retry):
- Computes an MD5 hash of tool_name + sorted(args) for each tool call batch
- Tracks consecutive identical hashes in the last_tool_calls_hash state field
- After 3+ consecutive identical batches, injects a variation prompt suggesting alternative approaches
- Fires before the existing plan_revision mechanism (which triggers at 5 consecutive errors)
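A minimal sketch of this detection; the `repeat_count` field name and the exact canonicalization of arguments are assumptions:

```python
import hashlib
import json

def batch_hash(tool_calls: list[dict]) -> str:
    """MD5 over tool_name + sorted args for a batch of tool calls."""
    canonical = [
        (c["name"], json.dumps(c.get("args", {}), sort_keys=True))
        for c in tool_calls
    ]
    return hashlib.md5(json.dumps(sorted(canonical)).encode()).hexdigest()

def detect_loop(state: dict, tool_calls: list[dict], threshold: int = 3) -> bool:
    """Track consecutive identical batch hashes in state; True once the
    same batch has repeated `threshold` or more times in a row."""
    h = batch_hash(tool_calls)
    if state.get("last_tool_calls_hash") == h:
        state["repeat_count"] = state.get("repeat_count", 1) + 1
    else:
        state["last_tool_calls_hash"] = h
        state["repeat_count"] = 1
    return state["repeat_count"] >= threshold
```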
Applied in reason_node before each LLM call when token count exceeds threshold:
- Estimates token count for all messages
- If above threshold (default 60k), compresses older messages using FLASH-tier LLM
- Injects summary as system context message
- Falls back to message truncation if compression fails
Messages are split into a "stable prefix" and "dynamic suffix" to maximize LLM KV-cache hit rates:
- Stable prefix: System prompt + tool schemas + history summary (never reordered between iterations)
- Dynamic suffix: New tool calls/results appended after prefix
- prefix_hash in state tracks whether the prefix has changed; if unchanged, the LLM can reuse cached KV entries
- Tool filtering uses "soft disable" (a system message noting unavailable tools) instead of removing tool schemas, preserving prefix stability
- Helpers: get_stable_prefix() and get_dynamic_suffix() in context_compression.py
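The prefix-hash check can be sketched as follows; the hashing scheme and field handling are illustrative, the real helpers live in context_compression.py:

```python
import hashlib

def compute_prefix_hash(system_prompt: str, tool_schemas: list[str],
                        history_summary: str) -> str:
    """Hash the stable prefix (never reordered between iterations)."""
    blob = "\x00".join([system_prompt, *tool_schemas, history_summary])
    return hashlib.sha256(blob.encode()).hexdigest()

def prefix_unchanged(state: dict, new_hash: str) -> bool:
    """Compare against the prefix_hash stored in state, then update it.
    True means the provider can reuse cached KV entries for the prefix."""
    unchanged = state.get("prefix_hash") == new_hash
    state["prefix_hash"] = new_hash
    return unchanged
```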
File: backend/app/agents/planner.py
A single-node subgraph that decomposes complex queries into structured execution steps for the Orchestrator.
- User query
- Conversation context
- Optional revision_context (feedback from a failed verification, used for re-planning)

Output: list[PlanStep] — an ordered list of steps, each with a description, expected outcome, and dependencies.

- Uses the PRO-tier LLM for balanced quality and speed
- Stateless: no persistent memory between invocations; re-planning uses an explicit revision_context
- Emits a plan_overview event at plan creation (with all steps) for frontend rendering
- Frontend renders the plan as an interactive checklist with a progress bar
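For illustration, a plan step might look like the following dataclass sketch; the real PlanStep is a Pydantic model and its exact field names may differ:

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    """Illustrative shape of a plan step (assumed field names)."""
    description: str
    expected_outcome: str
    depends_on: list[int] = field(default_factory=list)  # indices of earlier steps

# A hypothetical two-step plan as the Planner might emit it:
plan: list[PlanStep] = [
    PlanStep("Fetch quarterly sales data", "CSV file in sandbox"),
    PlanStep("Plot revenue by region", "PNG chart", depends_on=[0]),
]
```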
File: backend/app/agents/subagents/research.py
init_config → search_loop → analyze → synthesize → write_report
- init_config: Determines search depth and scenario (academic, market, technical, news)
- search_loop: Iterative web search with source collection
- analyze: Analyzes gathered sources for relevance and key findings
- synthesize: Synthesizes findings across sources
- write_report: Generates structured report with citations
File: backend/app/agents/skills/skill_base.py
- Skill — Base class with create_graph() returning a LangGraph StateGraph
- ToolSkill — Simplified subclass with execute() (auto-generates a single-node graph)
- SkillMetadata — Pydantic model: id, version, description, category, parameters, output_schema
- SkillContext — Execution context with invoke_skill() for skill composition
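A simplified sketch of the class relationship, with LangGraph elided (the single-node "graph" is reduced to a callable, and EchoSkill is a hypothetical example skill):

```python
from abc import ABC, abstractmethod

class Skill(ABC):
    """Base class: subclasses produce a graph. The real create_graph()
    returns a LangGraph StateGraph; sketched here without LangGraph."""
    @abstractmethod
    def create_graph(self):
        ...

class ToolSkill(Skill):
    """Simplified skill: implement execute(); create_graph() wraps it
    as a single node automatically."""
    @abstractmethod
    def execute(self, params: dict) -> dict:
        ...

    def create_graph(self):
        # Auto-generated single-node "graph": here, just a callable.
        return lambda params: self.execute(params)

class EchoSkill(ToolSkill):  # hypothetical example skill
    def execute(self, params: dict) -> dict:
        return {"echo": params.get("text", "")}
```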
| Skill | Category | Description |
|---|---|---|
| image_generation | creative | AI image generation via Gemini or DALL-E |
| code_generation | code | Generate code snippets for specific tasks |
| web_research | research | Focused web research with source summarization |
| data_analysis | data | Full data analysis: plan, execute code in sandbox, summarize results |
| slide_generation | creative | Create PPTX presentations with research and outlines |
| app_builder | automation | Build web apps (React, Next.js, Vue, Express, FastAPI, Flask) with live preview; planning uses MAX tier, code generation uses PRO |
| task_planning | automation | Analyze complex tasks and create execution plans |
Agents invoke skills via two tools:
- invoke_skill(skill_id, params) — Execute a skill with parameters
- list_skills() — Discover available skills
Skills can compose with each other via SkillContext.invoke_skill().
File: backend/app/agents/orchestrator.py
The AgentOrchestrator class is the main entry point for all agent workflows, replacing the old AgentSupervisor. It manages query classification, planning, step dispatch, verification, and finalization.
classify → plan (if complex) → dispatch_step → verify → finalize
↑ │
└───────────────┘
(re-plan loop)
- classify: Heuristic-based classification (simple, complex, or research) — see Routing section
- plan: Invokes PlannerAgent to decompose the query into steps
- dispatch_step: Invokes ExecutorAgent for the current step, advances step index
- verify: Checks step results; if unsatisfactory, re-invokes PlannerAgent with revision_context and loops back to dispatch
- finalize: Aggregates results from all steps, extracts the final response, streams completion events
For complex queries, the Orchestrator iterates through plan steps:
- PlannerAgent generates a list[PlanStep]
- For each step, the Orchestrator dispatches ExecutorAgent with step-specific context
- After all steps, the Orchestrator runs verification
- If verification fails, it re-plans with feedback and re-dispatches the remaining steps
- Emits plan_step_completed and todo_update events as each step finishes
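The plan/dispatch/verify loop above can be sketched with stand-in callables; all signatures here are assumptions, not the actual orchestrator.py API:

```python
def run_complex(planner, executor, verify, max_replans: int = 2):
    """Illustrative plan -> dispatch -> verify loop.
    planner(revision_context) -> list of steps; executor(step) -> result;
    verify(results) -> feedback string, or None when satisfied."""
    revision_context = None
    for _ in range(max_replans + 1):
        steps = planner(revision_context)
        results = [executor(step) for step in steps]  # dispatch per step
        feedback = verify(results)
        if feedback is None:
            return results                            # finalize
        revision_context = feedback                   # re-plan with feedback
    return results
```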
The Orchestrator maintains a persistent todo checklist in the sandbox filesystem at /home/user/.hyperagent/todo.md. This prevents goal drift across long multi-step executions.
- On plan creation: writes execution plan as markdown checklist to sandbox
- Each step dispatch: reads current todo state and injects as
[Active Task Context]system message (capped at 2000 chars) - On step completion: updates checklist items in sandbox file
- Emits
todo_updateevents for frontend rendering
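A minimal sketch of the checklist bookkeeping; the helper names are illustrative, only the path and the 2000-char cap come from the docs above:

```python
TODO_PATH = "/home/user/.hyperagent/todo.md"  # sandbox path from the docs

def mark_done(todo_md: str, step_description: str) -> str:
    """Flip '- [ ]' to '- [x]' for the matching checklist line."""
    lines = []
    for line in todo_md.splitlines():
        if line.strip() == f"- [ ] {step_description}":
            line = line.replace("- [ ]", "- [x]", 1)
        lines.append(line)
    return "\n".join(lines)

def active_context(todo_md: str, cap: int = 2000) -> str:
    """Build the [Active Task Context] system message, capped at 2000 chars."""
    return "[Active Task Context]\n" + todo_md[:cap]
```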
ExecutorAgent can delegate to Research agent via delegate_to_research tool (max 3 handoffs). Handoffs can include handoff_artifacts — files transferred from the source sandbox to the target sandbox via storage (see Cross-Sandbox Handoff Artifacts).
Normalizes and deduplicates events from LangGraph via StreamProcessor.
OrchestratorState (main graph)
├── PlannerState (planner subgraph)
├── ExecutorState (executor subgraph, replaces TaskState)
└── ResearchState (unchanged)
Anthropic, OpenAI, Google Gemini — each with per-tier model defaults.
Any OpenAI-compatible API can be registered via OPENAI_COMPATIBLE_PROVIDERS env var:
    [{
      "name": "qwen",
      "api_key": "sk-...",
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "tier_models": {"pro": "qwen3.5-plus", "flash": "qwen3.5-flash"},
      "enable_thinking": true
    }]

| Tier | Purpose | Example (Anthropic) |
|---|---|---|
| MAX | Best quality, complex reasoning | claude-opus-4 |
| PRO | Balanced quality/speed | claude-sonnet-4 |
| FLASH | Fast, cost-efficient | claude-3.5-haiku |
Per-tier provider overrides: MAX_MODEL_PROVIDER, PRO_MODEL_PROVIDER, LITE_MODEL_PROVIDER.
File: backend/app/ai/thinking.py
ThinkingAwareChatOpenAI handles providers that return reasoning_content (Qwen, DeepSeek, Kimi):
- Always captures reasoning_content from API responses (streaming and non-streaming)
- Auto-detects thinking mode when reasoning_content first appears
- Patches outgoing assistant messages with the captured reasoning_content for multi-turn replay
Redis pub/sub-based interrupt lifecycle:
- Agent creates an interrupt via the ask_user tool
- Interrupt stored in Redis with TTL, streamed as SSE event to frontend
- Frontend shows approval/decision/input dialog
- User response published to Redis channel
- Agent receives response and continues
- APPROVAL — Approve/deny high-risk tool execution (120s timeout)
- DECISION — Choose between multiple options (300s timeout)
- INPUT — Free-form text input (300s timeout)
High-risk tools require user approval before execution. Users can "approve always" to auto-approve specific tools for the session.
Three scanners integrated at different points in the request lifecycle:
Scans user input before processing:
- Prompt injection detection via llm-guard
- Jailbreak pattern matching (regex-based)
Scans LLM responses before streaming:
- Toxicity detection
- PII detection with redaction
- Harmful content pattern matching
Validates tool arguments before execution:
- URL validation (blocks file://, localhost, private IPs)
- Code safety (blocks rm -rf /, fork bombs, remote code execution)
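The URL check can be sketched with the standard library; this is illustrative only, and the real scanner likely covers more cases (IPv6 literals, DNS rebinding, redirects):

```python
import ipaddress
from urllib.parse import urlparse

BLOCKED_SCHEMES = {"file"}
BLOCKED_HOSTS = {"localhost"}

def url_allowed(url: str) -> bool:
    """Reject file:// URLs, localhost, and private/loopback IP literals."""
    parsed = urlparse(url)
    if parsed.scheme in BLOCKED_SCHEMES:
        return False
    host = parsed.hostname or ""
    if host in BLOCKED_HOSTS:
        return False
    try:
        ip = ipaddress.ip_address(host)
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    except ValueError:
        pass  # hostname, not a literal IP address
    return True
```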
| Setting | Default | Description |
|---|---|---|
| GUARDRAILS_ENABLED | true | Master toggle |
| GUARDRAILS_INPUT_ENABLED | true | Input scanning |
| GUARDRAILS_OUTPUT_ENABLED | true | Output scanning |
| GUARDRAILS_TOOL_ENABLED | true | Tool argument scanning |
| GUARDRAILS_VIOLATION_ACTION | block | Action: block, warn, or log |
| GUARDRAILS_TIMEOUT_MS | 500 | Scan timeout (ms) |
| | E2B | BoxLite |
|---|---|---|
| Type | Cloud | Local (Docker) |
| Requires | E2B_API_KEY | Docker |
| Code execution | Python, JS, TS, Bash | Python, JS, TS, Bash |
| Browser automation | E2B Desktop | Docker desktop image |
| App hosting | Cloud URLs | Local ports |
Configured via SANDBOX_PROVIDER env var.
- Code Executor — Run code snippets with output capture
- Desktop Executor — Browser automation with screenshots and streaming
- App Runtime — Scaffold, build, and host web applications
File: backend/app/sandbox/unified_sandbox_manager.py
The UnifiedSandboxManager provides a single shared SandboxRuntime for both code execution and app development within one agent run, avoiding the overhead of separate VMs for the same task.
- Session key: unified:{user_id}:{task_id}
- Default timeout: 30 minutes
- get_or_create_runtime(user_id, task_id) — shared runtime for both code and app
- get_code_executor(user_id, task_id) — wraps the shared runtime as BaseCodeExecutor
- get_app_session(user_id, task_id, template) — wraps the shared runtime as AppSandboxSession
- Desktop sandboxes remain separate (different VM image requirement)
- Cleanup: cleanup_sandboxes_for_task() prioritizes unified sessions first
File: backend/app/services/snapshot_service.py
Sandbox state (installed packages, generated files) is preserved across SSE disconnects and timeouts via workspace snapshots.
- save_snapshot(runtime, user_id, task_id, sandbox_type) — tars key directories and uploads to storage
- restore_snapshot(runtime, user_id, task_id, sandbox_type) — downloads and restores on reconnect
- Auto-snapshot on disconnect/cleanup (execution and app managers)
- Storage: R2 (production) or local filesystem (development)
- Retention: 24 hours (configurable via SNAPSHOT_RETENTION_HOURS)
- Max size: 100 MB per snapshot (configurable via SNAPSHOT_MAX_SIZE_BYTES)
- Default paths: /home/user and /tmp/outputs (execution); /home/user/app (app)
- DB model: SandboxSnapshot with indexes on (user_id, task_id, sandbox_type)
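The tar-and-restore round trip can be sketched in memory; the real service tars directories inside the sandbox and uploads the archive to R2 or local storage, so this is only a shape illustration:

```python
import io
import tarfile

def make_snapshot(paths: dict[str, bytes], max_size: int = 100 * 1024 * 1024) -> bytes:
    """Pack {path: contents} into a gzipped tar; refuse oversized archives
    (mirroring the SNAPSHOT_MAX_SIZE_BYTES cap)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in paths.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    blob = buf.getvalue()
    if len(blob) > max_size:
        raise ValueError("snapshot exceeds SNAPSHOT_MAX_SIZE_BYTES")
    return blob

def restore_snapshot_bytes(blob: bytes) -> dict[str, bytes]:
    """Inverse: unpack the archive back into {path: contents}."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is not None:
                out[member.name] = f.read()
    return out
```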
File: backend/app/sandbox/artifact_transfer.py
When agents hand off work (Task → Research), files from the source sandbox can be transferred to the target sandbox.
- collect_artifacts(runtime, patterns, max_files=10, max_size_mb=50) — finds and uploads files
- restore_artifacts(runtime, artifacts) — downloads files into the target sandbox
- cleanup_artifacts(artifacts) — removes transferred files from storage after completion
- Default patterns: *.py, *.csv, *.json, *.txt, *.md, *.html, *.js, *.ts
- HandoffInfo includes an optional handoff_artifacts field
- Orchestrator restores artifacts and appends a summary to the handoff context
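Candidate selection under the pattern and size caps can be sketched as follows; the real collect_artifacts() walks the live sandbox filesystem, so this operates on an assumed {path: size} mapping:

```python
from fnmatch import fnmatch

DEFAULT_PATTERNS = ["*.py", "*.csv", "*.json", "*.txt", "*.md",
                    "*.html", "*.js", "*.ts"]

def select_artifacts(files: dict[str, int], patterns=None,
                     max_files: int = 10, max_size_mb: int = 50) -> list[str]:
    """Pick handoff candidates from {path: size_bytes}, honoring the
    pattern list and the file-count / total-size caps."""
    patterns = patterns or DEFAULT_PATTERNS
    budget = max_size_mb * 1024 * 1024
    picked, total = [], 0
    for path, size in sorted(files.items()):
        name = path.rsplit("/", 1)[-1]
        if not any(fnmatch(name, p) for p in patterns):
            continue
        if len(picked) >= max_files or total + size > budget:
            break
        picked.append(path)
        total += size
    return picked
```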
File: backend/app/agents/tools/codeact.py
An opt-in execute_script tool that accepts multi-line Python scripts with access to a pre-installed hyperagent helper library in the sandbox. Gated behind execution_mode: "codeact" configuration.
The hyperagent library (backend/app/sandbox/hyperagent_lib/__init__.py) is auto-installed in the sandbox on first use and provides:
| Function | Description |
|---|---|
| hyperagent.web_search(query) | Search the web |
| hyperagent.read_file(path) | Read a file |
| hyperagent.write_file(path, content) | Write a file |
| hyperagent.run_command(cmd) | Run a shell command |
| hyperagent.browse(url) | Fetch a URL |
| hyperagent.list_files(dir) | List directory contents |
- Agent emits an execute_script tool call with multi-line Python code
- Sandbox session is retrieved or created (via ExecutionSandboxManager)
- hyperagent library is installed if not already present (tracked per sandbox ID)
- Script is written to /tmp/hyperagent/current_script.py and executed
- Returns JSON with success, stdout, stderr, exit_code, created_files
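The reported result shape can be illustrated with a local stand-in that runs a script via subprocess; the real tool runs inside the sandbox, so only the JSON fields here come from the docs:

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

def run_script(code: str) -> str:
    """Execute a multi-line script locally and return the same JSON
    shape the execute_script tool reports."""
    workdir = Path(tempfile.mkdtemp())
    script = workdir / "current_script.py"
    script.write_text(code)
    before = {p.name for p in workdir.iterdir()}
    proc = subprocess.run([sys.executable, str(script)],
                          capture_output=True, text=True, cwd=workdir)
    created = sorted({p.name for p in workdir.iterdir()} - before)
    return json.dumps({
        "success": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "created_files": created,
    })
```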
| Setting | Default | Description |
|---|---|---|
| execution_mode | "standard" | Set to "codeact" to enable the execute_script tool |
30+ event types streamed via SSE:
Lifecycle: stage, complete, error
Content: token, image, code_result
Tools: tool_call, tool_result
Research: source, routing, handoff
Sandbox: browser_stream, browser_action, workspace_update, terminal_command, terminal_output, terminal_error, terminal_complete
Skills: skill_output, plan_step
HITL: interrupt, interrupt_response
Task Planning: plan_overview, plan_step_completed, todo_update
Parallel Execution: parallel_start, parallel_task, parallel_complete
| Setting | Default | Description |
|---|---|---|
| CONTEXT_COMPRESSION_ENABLED | true | Enable/disable compression |
| CONTEXT_COMPRESSION_TOKEN_THRESHOLD | 60000 | Token count that triggers compression |
| CONTEXT_COMPRESSION_PRESERVE_RECENT | 10 | Recent messages to keep verbatim |
- Estimate token count before each LLM call
- If above threshold, separate system/old/recent messages
- Summarize old messages using FLASH-tier LLM
- Inject summary as system context message
- Preserve tool message pairs (AIMessage + ToolMessage)
- Fall back to truncation if compression fails
- Maintain stable prefix hash across iterations to maximize KV-cache reuse
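The estimate/split/summarize flow can be sketched compactly; the 4-characters-per-token estimate and the `summarize` stand-in (representing the FLASH-tier LLM call) are assumptions, and the truncation fallback is omitted for brevity:

```python
def estimate_tokens(messages: list[str]) -> int:
    """Crude estimate: roughly 4 characters per token (assumption)."""
    return sum(len(m) for m in messages) // 4

def compress(messages: list[str], summarize, threshold: int = 60000,
             preserve_recent: int = 10) -> list[str]:
    """Summarize older messages when over budget; keep the most recent
    `preserve_recent` messages verbatim and inject the summary up front."""
    if estimate_tokens(messages) <= threshold:
        return messages
    old, recent = messages[:-preserve_recent], messages[-preserve_recent:]
    summary = summarize(old)  # FLASH-tier LLM call in the real system
    return [f"[history summary] {summary}", *recent]
```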
Deprecated agent types are mapped to ExecutorAgent:
- IMAGE → ExecutorAgent + image_generation skill
- WRITING → ExecutorAgent (handled directly by the LLM)
- CODE → ExecutorAgent + code_generation skill
- DATA → ExecutorAgent + data_analysis skill
- APP → ExecutorAgent + app_builder skill
- SLIDE → ExecutorAgent + slide_generation skill
The following files are deprecated but kept for backward compatibility. Imports are redirected to their replacements:
- backend/app/agents/supervisor.py → use backend/app/agents/orchestrator.py (AgentOrchestrator)
- backend/app/agents/subagents/task.py → use backend/app/agents/executor.py (ExecutorAgent)