Edge-Cloud Collaborative AI Agent
EdgeClaw: Keep sensitive data off the cloud, let cheap models handle 80% of requests
[中文 | English]
What's New
- [2026.03.13] EdgeClaw adds Cost-Aware Collaboration: it automatically judges task complexity and matches each request to the most economical cloud model
- [2026.02.12] EdgeClaw is officially open-sourced: an Edge-Cloud Collaborative AI Agent
EdgeClaw is an Edge-Cloud Collaborative AI Agent jointly developed by THUNLP (Tsinghua University), Renmin University of China, AI9Stars, ModelBest, and OpenBMB, built on top of OpenClaw.
In current AI Agent architectures, the edge has long been overlooked: all data and tasks are funneled to the cloud, leading to privacy leaks and wasted compute. EdgeClaw reactivates the value of edge computing with a customizable three-tier security system (S1 Passthrough / S2 Desensitization / S3 Local). Using a dual detection engine on the edge (rule-based detection, ~0ms, plus local LLM semantic detection, ~1–2s), it classifies the sensitivity and complexity of every request in real time, then routes each request through a unified composable pipeline to the most privacy-safe and cost-effective processing path. With intelligent edge-cloud forwarding, developers get seamless privacy protection ("public data to the cloud, sensitive data desensitized, private data stays local") without modifying any business logic.
- **Edge-Cloud Division of Labor**: The edge perceives data attributes (sensitivity, complexity); the cloud handles reasoning and generation. The edge covers the cloud's blind spots (sensitive data never leaves the device), while the cloud compensates for the edge's limitations (complex tasks are offloaded to the cloud).
- **Three-Tier Security Collaboration**: Safe data (S1) is sent directly to the cloud; sensitive data (S2) is desensitized on-device before forwarding to the cloud; private data (S3) is processed entirely on-device, with the cloud only maintaining context continuity.
- **Cost-Aware Collaboration**: A local LLM semantically judges task complexity, routing simple tasks to cheap models and reserving expensive models for complex tasks only. In typical workflows, 60–80% of requests are forwarded to low-cost models, drastically cutting cloud token expenses.
- **Plug-and-Play, Zero Code Changes**: EdgeClaw automatically intercepts and routes via its Hook mechanism, with no modifications to any business logic required. It serves as a seamless drop-in replacement for OpenClaw.
Every user message, tool call, and tool result is inspected in real time and automatically classified into one of three levels:
| Level | Meaning | Routing Strategy | Example |
|---|---|---|---|
| S1 | Safe | Send directly to cloud model | "Write a poem about spring" |
| S2 | Sensitive | Desensitize then forward to cloud | Addresses, phone numbers, emails |
| S3 | Private | Process locally only | Pay slips, passwords, SSH keys |
| Engine | Mechanism | Latency | Coverage |
|---|---|---|---|
| Rule Detector | Keywords + Regex matching | ~0ms | Known patterns: API keys, DB connection strings, PEM key headers |
| Local LLM Detector | Semantic understanding (runs on a local small model) | ~1–2s | Contextual reasoning: "Analyze this pay slip for me", addresses in various languages |
The two engines can be stacked and combined, flexibly enabled per scenario via the checkpoints configuration.
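To make the rule engine concrete, here is a minimal sketch of a keyword + regex detector of this kind. The names (`RULES`, `detectLevel`) and the specific patterns are illustrative assumptions, not EdgeClaw's actual API:

```typescript
// Minimal sketch of a keyword + regex rule detector (hypothetical names,
// not EdgeClaw's actual API). Rules are ordered strictest-first, so the
// highest sensitivity level that matches wins.
type Level = "S1" | "S2" | "S3";

const RULES: { level: Level; patterns: RegExp[] }[] = [
  {
    level: "S3", // private: never leaves the device
    patterns: [/-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/, /\bid_rsa\b/],
  },
  {
    level: "S2", // sensitive: desensitize before forwarding
    patterns: [/\bapi[_-]?key\b/i, /(?:mysql|postgres|mongodb):\/\/\S+/],
  },
];

function detectLevel(text: string): Level {
  for (const rule of RULES) {
    if (rule.patterns.some((p) => p.test(text))) return rule.level;
  }
  return "S1"; // default: safe to send to the cloud
}
```

Because it is pure string matching, a detector like this runs in effectively zero time, which is why the rule engine can sit on the hot path of every message.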
```
User Message (containing PII)
        │
        ▼
Local LLM Detection → S2
        │
        ▼
Local LLM Extracts PII → JSON Array
        │
        ▼
Programmatic PII Replacement → [REDACTED:PHONE], [REDACTED:ADDRESS]
        │
        ▼
Privacy Proxy (localhost:8403)
  ├── Strips PII markers
  ├── Forwards to cloud model
  └── Passes through response (supports SSE streaming)
```
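The "Programmatic PII Replacement" step can be sketched as follows. The `redactPII` helper and its entity shape are hypothetical; in EdgeClaw the entity list comes from the local LLM's JSON extraction:

```typescript
// Sketch of programmatic PII replacement: entity (type, value) pairs
// extracted by the local LLM are substituted with typed markers before the
// message is forwarded to the cloud. Names here are illustrative.
interface PIIEntity {
  type: string;  // e.g. "PHONE", "ADDRESS"
  value: string; // exact substring found in the message
}

function redactPII(message: string, entities: PIIEntity[]): string {
  let out = message;
  for (const e of entities) {
    // split/join replaces every occurrence without regex-escaping issues
    out = out.split(e.value).join(`[REDACTED:${e.type}]`);
  }
  return out;
}
```

Doing the substitution programmatically (rather than asking the LLM to rewrite the message) guarantees that every extracted value is removed verbatim, with no risk of the model paraphrasing the PII back in.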
```
User Message (containing private data)
        │
        ▼
Detection → S3
        │
        ▼
Forward to Local Guard Agent
  ├── Uses local LLM (Ollama / vLLM)
  ├── Full data visible, entirely local inference
  └── Cloud-side history only receives a placeholder
```
```
~/.openclaw/workspace/
├── MEMORY.md        ← What the cloud model sees (auto-desensitized)
└── MEMORY-FULL.md   ← What the local model sees (complete data)

agents/{id}/sessions/
├── full/   ← Complete history (including Guard Agent interactions)
└── clean/  ← Clean history (for cloud model consumption)
```
The cloud model never sees `MEMORY-FULL.md` or `sessions/full/`: the Hook system intercepts at the file-access layer.
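In the spirit of that file-access interception, a guard of this kind might look like the sketch below. The function name, tool name, and return shape are assumptions for illustration, not EdgeClaw's actual hook signature:

```typescript
// Sketch of a file-access guard: block tool calls that would read
// cloud-invisible paths. The names and path list are illustrative,
// not EdgeClaw's real hook API.
const CLOUD_INVISIBLE = ["MEMORY-FULL.md", "/sessions/full/"];

function guardFileAccess(
  toolName: string,
  path: string
): { allow: boolean; reason?: string } {
  if (
    toolName === "read_file" &&
    CLOUD_INVISIBLE.some((p) => path.includes(p))
  ) {
    return { allow: false, reason: `"${path}" is local-only` };
  }
  return { allow: true };
}
```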
Theorem 1 (Cloud-Side Invisibility): For any S3-level data x, its original content is completely invisible to the cloud:

∀x: Detect(x) = S₃ ⟹ x ⊄ Cloud(x)

Theorem 2 (Desensitization Completeness): For any S2-level data x, the cloud-visible form contains none of the original privacy entity values:

∀x: Detect(x) = S₂ ⟹ ∀(tᵢ, vᵢ) ∈ Extract(x): vᵢ ⊄ Cloud(x)
In a typical AI coding assistant workflow, most requests involve browsing files, reading code, and simple Q&A β using the most expensive model for these tasks is pure waste. Cost-Aware Collaboration uses a local small model as an LLM-as-Judge, classifying requests by complexity and routing them to cloud models at different price tiers.
| Complexity | Task Examples | Default Target Model |
|---|---|---|
| SIMPLE | Queries, translation, formatting, greetings | gpt-4o-mini |
| MEDIUM | Code generation, single-file editing, email drafting | gpt-4o |
| COMPLEX | System design, multi-file refactoring, cross-document analysis | claude-sonnet-4.6 |
| REASONING | Mathematical proofs, formal logic, experiment design | o4-mini |
| Approach | Pros | Cons |
|---|---|---|
| Keyword Rules | Fast | No semantic understanding, high false-positive rate |
| LLM-as-Judge | Semantic understanding, multilingual | One additional local model call (~1–2s) |
The Judge runs on a local small model (e.g., MiniCPM-4.1 / Qwen3.5), with a latency of roughly 1–2 seconds.
Prompt-hash caching (SHA-256, 5-minute TTL) ensures identical requests are not re-judged, further reducing latency overhead.
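The prompt-hash cache described above can be sketched like this. It is a minimal sketch assuming a synchronous judge callback; the names are illustrative, not the plugin's real API:

```typescript
import { createHash } from "node:crypto";

// Sketch of SHA-256 prompt-hash caching with a 5-minute TTL: identical
// prompts within the TTL reuse the cached verdict instead of re-running
// the LLM judge. Names are illustrative.
const TTL_MS = 5 * 60 * 1000;
const cache = new Map<string, { verdict: string; at: number }>();

function cachedJudge(prompt: string, judge: (p: string) => string): string {
  const key = createHash("sha256").update(prompt).digest("hex");
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < TTL_MS) return hit.verdict; // cache hit
  const verdict = judge(prompt); // slow path: one local model call
  cache.set(key, { verdict, at: Date.now() });
  return verdict;
}
```

Hashing the prompt rather than storing it keeps the cache key fixed-size and avoids retaining potentially sensitive prompt text in memory longer than needed.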
In a typical coding assistant workflow, Cost-Aware Collaboration can route 60–80% of requests to cheaper models.
Security collaboration and cost-aware collaboration run in the same pipeline, working together via weights and a two-phase short-circuit strategy:
```
User Message
    │
    ▼
RouterPipeline.run()
    │
    ├── Phase 1: Fast routers (weight ≥ 50) run in parallel
    │     └── security router → three-tier sensitivity detection
    │
    ├── Short-circuit: if Phase 1 detects sensitive data → skip Phase 2
    │
    └── Phase 2: Slow routers (weight < 50) run on demand
          └── cost-aware router → LLM Judge task complexity classification
```
Design Philosophy: security first. The security router runs first with high weight; if sensitive data is found, it short-circuits immediately without wasting time on complexity judgment. Cost-aware collaboration kicks in only after the security check passes (S1).
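The two-phase, weight-ordered short-circuit can be sketched as below. This is a simplified illustration with hypothetical types, not the actual `RouterPipeline` implementation:

```typescript
// Sketch of the two-phase pipeline with short-circuit: fast routers
// (weight >= 50) run first; if any reports sensitive data, slow routers
// are skipped entirely. Types and names are illustrative.
interface Router {
  id: string;
  weight: number;
  detect(msg: string): { level: "S1" | "S2" | "S3" };
}

function runPipeline(
  routers: Router[],
  msg: string
): { level: string; ranSlow: boolean } {
  const fast = routers.filter((r) => r.weight >= 50);
  const slow = routers.filter((r) => r.weight < 50);

  // Phase 1: fast (security) routers
  for (const r of fast) {
    const d = r.detect(msg);
    if (d.level !== "S1") return { level: d.level, ranSlow: false }; // short-circuit
  }

  // Phase 2: slow (cost-aware) routers, only when Phase 1 found nothing
  for (const r of slow) r.detect(msg);
  return { level: "S1", ranSlow: slow.length > 0 };
}
```

The weight threshold doubles as a priority ordering: anything security-critical gets weight ≥ 50 so it can veto the request before any expensive judgment runs.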
```
                                             ⎧ θ_cloud(m)       if a = passthrough
m →[c_msg] Detect(m) → l →[c_route] R(l) → a ⎨ θ_cloud(De(m))   if a = desensitize
                                             ⎩ θ_local(m)       if a = redirect

  →[c_persist] W(m, l) →[c_end] Sync
```
| Hook | Trigger Point | Core Responsibility |
|---|---|---|
| `before_model_resolve` | Before model selection | Run pipeline → routing decision |
| `before_prompt_build` | Before prompt construction | Inject Guard Prompt / S2 markers |
| `before_tool_call` | Before tool invocation | File access guard + sub-agent guard |
| `after_tool_call` | After tool invocation | Tool result detection |
| `tool_result_persist` | Result persistence | Dual-track session write |
| `before_message_write` | Before message write | S3 → placeholder, S2 → desensitized version |
| `session_end` | Session ends | Memory synchronization |
| `message_sending` | Outbound message | Detect and desensitize/cancel |
| `before_agent_start` | Before sub-agent starts | Task content guard |
| `message_received` | Message received | Observability logging |
Setup has two parts: installing from source (recommended) and preparing the local LLM environment.
```shell
git clone https://github.com/openbmb/edgeclaw.git
cd edgeclaw
pnpm install
pnpm build
pnpm ui:build
pnpm openclaw onboard --install-daemon
```

EdgeClaw requires a local inference backend for privacy detection and the Guard Agent. We recommend Ollama:
```shell
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the recommended model
ollama pull openbmb/minicpm4.1

# Start the service
ollama serve
```

All OpenAI-compatible APIs are also supported, including vLLM, LMStudio, SGLang, TGI, etc. See `config.example.json` for backend configuration examples.
```shell
pnpm openclaw gateway run
```

If you see the GuardClaw plugin loading logs, the installation was successful.
Add the following to `openclaw.json`:
```json
{
  "plugins": {
    "entries": {
      "GuardClaw": {
        "enabled": true,
        "config": {
          "privacy": {
            "enabled": true,
            "localModel": {
              "enabled": true,
              "provider": "ollama",
              "model": "openbmb/minicpm4.1",
              "endpoint": "http://localhost:11434"
            },
            "guardAgent": {
              "id": "guard",
              "workspace": "~/.openclaw/workspace-guard",
              "model": "ollama/openbmb/minicpm4.1"
            }
          }
        }
      }
    }
  }
}
```

Register the Guard Agent in the `agents` section of `openclaw.json`:
```json
{
  "agents": {
    "list": [
      {
        "id": "main",
        "workspace": "~/.openclaw/workspace-main",
        "subagents": { "allowAgents": ["guard"] }
      },
      {
        "id": "guard",
        "workspace": "~/.openclaw/workspace-guard",
        "model": "ollama/openbmb/minicpm4.1"
      }
    ]
  }
}
```

```shell
pnpm openclaw gateway run
```

EdgeClaw automatically intercepts and routes; no modifications to any business logic are required.
Enable it in `privacy.routers`:
```json
{
  "privacy": {
    "routers": {
      "token-saver": {
        "enabled": true,
        "weight": 30,
        "options": {
          "tiers": {
            "SIMPLE": { "provider": "openai", "model": "gpt-4o-mini" },
            "MEDIUM": { "provider": "openai", "model": "gpt-4o" },
            "COMPLEX": { "provider": "anthropic", "model": "claude-sonnet-4.6" },
            "REASONING": { "provider": "openai", "model": "o4-mini" }
          }
        }
      }
    }
  }
}
```

Custom detection rules (keywords, regex patterns, and tool/path constraints per level) are configured under `privacy.rules`:

```json
{
  "privacy": {
    "rules": {
      "keywords": {
        "S2": ["password", "api_key", "token"],
        "S3": ["ssh", "id_rsa", "private_key", ".pem"]
      },
      "patterns": {
        "S2": ["(?:mysql|postgres|mongodb)://[^\\s]+"],
        "S3": ["-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"]
      },
      "tools": {
        "S2": { "tools": ["exec", "shell"], "paths": ["~/secrets"] },
        "S3": { "tools": ["sudo"], "paths": ["~/.ssh", "~/.aws"] }
      }
    }
  }
}
```

The detection engines enabled at each checkpoint are configured under `privacy.checkpoints`:

```json
{
  "privacy": {
    "checkpoints": {
      "onUserMessage": ["ruleDetector", "localModelDetector"],
      "onToolCallProposed": ["ruleDetector"],
      "onToolCallExecuted": ["ruleDetector"]
    }
  }
}
```

The EdgeClaw collaboration pipeline is fully extensible; implement the `GuardClawRouter` interface to inject custom collaboration logic:
```typescript
const myRouter: GuardClawRouter = {
  id: "content-filter",
  async detect(context, pluginConfig): Promise<RouterDecision> {
    if (context.message && context.message.length > 10000) {
      return {
        level: "S1",
        action: "redirect",
        target: { provider: "anthropic", model: "claude-sonnet-4.6" },
        reason: "Message too long, using larger context model",
      };
    }
    return { level: "S1", action: "passthrough" };
  },
};
```

Then register the custom router in the configuration:

```json
{
  "privacy": {
    "routers": {
      "content-filter": {
        "enabled": true,
        "type": "custom",
        "module": "./my-routers/content-filter.js",
        "weight": 40
      }
    },
    "pipeline": {
      "onUserMessage": ["privacy", "token-saver", "content-filter"]
    }
  }
}
```

Edit the Markdown files under `extensions/guardclaw/prompts/` to adjust behavior; no code changes are needed:
| File | Purpose |
|---|---|
| `detection-system.md` | S1/S2/S3 classification rules |
| `guard-agent-system.md` | Guard Agent behavior |
| `token-saver-judge.md` | Task complexity classification |
Built-in presets allow one-click switching between local-model + cloud-model combinations:

| Preset | Local Model | Cloud Model | Use Case |
|---|---|---|---|
| `vllm-qwen35` | vLLM / Qwen 3.5-35B | Same (fully local) | Full local deployment, maximum privacy |
| `minimax-cloud` | vLLM / Qwen 3.5-35B | MiniMax M2.5 | Local privacy detection + cloud primary model |

Custom presets for Ollama, LMStudio, SGLang, and other backends are also supported.
```
extensions/guardclaw/
├── index.ts                  # Plugin entry point
├── openclaw.plugin.json      # Plugin metadata
├── config.example.json       # Configuration example
│
├── src/
│   ├── detector.ts           # Detection engine (coordinates dual detectors)
│   ├── rules.ts              # Rule detector (keywords + regex)
│   ├── local-model.ts        # Local LLM detector + desensitization engine
│   ├── router-pipeline.ts    # Router pipeline (two-phase + weighted merge)
│   ├── hooks.ts              # 10 Hooks
│   ├── privacy-proxy.ts      # HTTP privacy proxy
│   ├── guard-agent.ts        # Guard Agent management
│   ├── session-state.ts      # Session privacy state
│   ├── session-manager.ts    # Dual-track session history
│   ├── memory-isolation.ts   # Dual-track memory management
│   └── routers/
│       ├── privacy.ts        # Privacy router (security)
│       └── token-saver.ts    # Cost-Aware router (cost savings)
│
├── prompts/                  # Customizable prompt templates
│   ├── detection-system.md
│   ├── guard-agent-system.md
│   └── token-saver-judge.md
│
└── test/                     # Test suite
    ├── rules.test.ts
    ├── detector.test.ts
    ├── router-pipeline.test.ts
    ├── token-saver.test.ts
    ├── privacy-proxy.test.ts
    └── integration.test.ts
```
Thanks to all contributors for their code submissions and testing. We welcome new members to join us in building the edge-cloud collaborative Agent ecosystem!
Contributing workflow: fork this repo → submit Issues → create Pull Requests (PRs)
If this project is helpful to your research or work, please give us a star!
- For technical questions and feature requests, please use GitHub Issues
- OpenClaw – Base AI assistant framework
- MiniCPM – Recommended local detection model
- Ollama – Recommended local inference backend
MIT


