🔒 PLAYBOOK — AI Agent Security & Governance Platform

The deterministic SOAR layer that intercepts, judges, and contains rogue AI agents before they do damage.

📖 Table of Contents

🎯 What is PLAYBOOK?
🚀 Live Demo
⚡ The Problem
🏗️ Architecture
✨ Key Features
🛡️ Judge Layer — Zero LLM Enforcement
📸 Screenshots
⚙️ Quick Start
🔌 SDK Quick Start
🧪 Running Tests
📚 Tech Stack
🗺️ Roadmap
🤝 Contributing
📄 License

🎯 What is PLAYBOOK?

PLAYBOOK is an automated incident response system designed specifically for AI agent deployments. It integrates with Veea Lobster Trap DPI (deep packet inspection for LLM traffic) and runs every agent action through four stages:

Stage	Action	Latency
🔍 DETECT	Normalize & inspect every agent action	`< 2ms`
⚖️ JUDGE	Deterministic rule evaluation — zero LLM calls	`< 5ms`
🛡️ ENFORCE	`ALLOW` / `DENY` / `QUARANTINE` / `ESCALATE`	`< 1ms`
📦 FORENSICS	Auto-generate evidence packages + compliance reports	`Async`

End-to-end p95: < 40ms 🚀

🚀 Live Demo

🌐 URL: https://playbooksoar.aiproofofconcept.in

Field	Value
👤 Username	`demo`
🔑 Password	`demo123`

🎮 Try This in 90 Seconds

🎯 Dashboard — See live KPIs, critical alerts, and agent health scores
📡 DPI Live Feed (/dpi-live) — Watch real-time Lobster Trap intercepts with risk scores and verdict badges
🐝 Agent Swarm (/swarm) — Launch a 3-agent attack. Toggle Misbehavior Mode for 100% malicious actions
⚖️ Judge Layer (/judge) — Inspect deterministic rule evaluation. Zero LLM in the enforcement path
📋 Incidents (/incidents) — Review auto-generated incidents with full forensics packages
🏛️ Policy Builder (/policy-builder) — Customize NIST SP 800-53 ODPs per incident type

⚡ The Problem

Enterprises deployed thousands of autonomous AI agents in 2025. Each has access to databases, APIs, and sensitive data. But nobody built the security layer to watch them.

When an agent goes rogue — exfiltrating customer data 💳, injecting malicious prompts 💉, or escalating privileges 🔓 — you find out after the damage is done.

Stat	Value
💰 Avg. Data Breach Cost	$4.5M
🏢 Orgs Lacking Agent Security	73%
⏱️ Days to Identify Breach	287 days
🛡️ Real-Time Defense	< 1%

PLAYBOOK fixes this. It does more than block: it captures the agent's reasoning, documents the evidence, executes the playbook, and generates audit-ready compliance reports. Detection through enforcement completes in under 40 milliseconds at p95; forensics packaging runs asynchronously.

🏗️ Architecture

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  🤖 LLM App     │─────▶│ 🦞 Veea Lobster │─────▶│ 📕 PLAYBOOK     │
│  (Agent Swarm)  │      │ Trap DPI Proxy  │      │ Judge + SOAR    │
└─────────────────┘      └─────────────────┘      └─────────────────┘
                                                            │
                              ┌─────────────────────────────┼─────────────────────────────┐
                              ▼                             ▼                             ▼
                        ┌──────────┐                 ┌──────────┐                 ┌──────────┐
                        │ 🚨 INCIDENT│                │ 📦 FORENSICS│              │ 🏛️ COMPLIANCE│
                        │  CREATE    │               │  PACKAGE   │               │  MAPPING    │
                        │  + ROUTE   │               │  BUILDER   │               │  EU/NIST    │
                        └──────────┘                 └──────────┘                 └──────────┘

🔄 4-Stage Pipeline

Stage 1: DETECT          Stage 2: CLASSIFY/JUDGE   Stage 3: ENFORCE        Stage 4: FORENSICS
+----------------+      +----------------+       +----------------+       +----------------+
| 📝 Log Tailer  |─────▶| ⚖️ Local Judge |──────▶| 🛡️ Playbook    |──────▶| 📦 Evidence    |
| + Pre-Screen   |      | + ODP Resolver |       |  + Judge Gate  |       |  Package       |
+----------------+      +----------------+       +----------------+       +----------------+
                              ↓
                       🏛️ Policy Builder
                       (NIST Baseline + ODPs)
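
The four stages in the diagram above can be sketched as plain functions chained in order. This is a minimal illustration, not PLAYBOOK's actual code: the names `AgentAction`, `Verdict`, `detect`, `judge`, `enforce`, `record_evidence`, and `run_pipeline` are hypothetical, the single rule in `judge` is a toy, and the real forensics stage is described as asynchronous while this sketch records evidence inline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentAction:
    agent_id: str
    tool: str
    payload: str


@dataclass(frozen=True)
class Verdict:
    decision: str  # "ALLOW", "DENY", "QUARANTINE", or "ESCALATE"
    severity: int  # 1 (benign) to 10 (critical)


def detect(raw: dict) -> AgentAction:
    """Stage 1: normalize a raw log record into a typed action."""
    return AgentAction(
        agent_id=str(raw.get("agent_id", "unknown")),
        tool=str(raw.get("tool", "")).strip().lower(),
        payload=str(raw.get("payload", "")),
    )


def judge(action: AgentAction) -> Verdict:
    """Stage 2: toy deterministic rule (assumption: outbound email is risky)."""
    if action.tool == "send_email":
        return Verdict("QUARANTINE", 7)
    return Verdict("ALLOW", 1)


def enforce(verdict: Verdict) -> bool:
    """Stage 3: the action may proceed only on an explicit ALLOW."""
    return verdict.decision == "ALLOW"


def record_evidence(action: AgentAction, verdict: Verdict, sink: list) -> None:
    """Stage 4: append an evidence record (inline here, async in the diagram)."""
    sink.append(
        {
            "agent_id": action.agent_id,
            "tool": action.tool,
            "decision": verdict.decision,
            "severity": verdict.severity,
        }
    )


def run_pipeline(raw: dict, sink: list) -> bool:
    action = detect(raw)
    verdict = judge(action)
    allowed = enforce(verdict)
    record_evidence(action, verdict, sink)
    return allowed
```

Every action produces an evidence record regardless of the verdict, which mirrors the diagram's flow from the enforcement stage into the evidence package.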

✨ Key Features

Feature	Description
⚖️ Deterministic Judge Layer	Zero LLM in enforcement path. Rule-based decisions in `< 5ms`. Immune to prompt-injection bypasses.
📡 DPI Live Feed	Real-time dashboard powered by Veea Lobster Trap. See every intercepted prompt, matched rule, and verdict.
🐝 Agent Swarm Simulator	Launch multi-agent attacks (FX Swap fraud, Data Exfiltration, Prompt Injection) with live WebSocket streaming.
🧠 Agent Playground	Test LLM providers (OpenAI, Gemini, Claude, Azure, Ollama) with human-in-the-loop approvals.
📋 16 Incident Types	Full taxonomy from Data Destruction to Regulatory Trigger.
🏛️ Policy Builder	NIST SP 800-53 ODPs with 6 industry templates (Finance, Healthcare, Gov, Retail, Manufacturing, Energy).
🗺️ Compliance Mapping	EU AI Act Art. 9/15/73, NIST AI RMF Agentic Profile, SOC 2 Type II, HIPAA, GDPR.
🕵️ Bypass Detection	Passes all 55 test vectors covering 4 known LLM-judge bypass patterns.
📦 Forensics Packages	Tamper-evident packages with SHA-256 manifest + HMAC signature.
📧 Multi-Channel Alerts	Slack, Email (Resend), PagerDuty notifications.
🔌 SDK + Middleware	Python SDK with `@guard` decorator, LangChain & CrewAI middleware.
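
To make the "SHA-256 manifest + HMAC signature" idea in the Forensics Packages row concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not the project's implementation: the function names are hypothetical, the manifest is assumed to be a JSON object mapping file names to digests, and HMAC-SHA256 is assumed as the signature algorithm (the README does not say which hash the HMAC uses).

```python
import hashlib
import hmac
import json


def build_manifest(files: dict) -> dict:
    """Map each file name to the SHA-256 hex digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in sorted(files.items())}


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the manifest (assumed HMAC-SHA256)."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_package(files: dict, manifest: dict, signature: str, key: bytes) -> bool:
    """Check the signature first, then recompute every digest."""
    expected = sign_manifest(manifest, key)
    if not hmac.compare_digest(expected, signature):
        return False
    return build_manifest(files) == manifest
```

Serializing with sorted keys keeps the signed bytes stable across runs, and `hmac.compare_digest` avoids leaking information through comparison timing.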

🛡️ Judge Layer — Zero LLM Enforcement

The Judge Layer is architecturally separated from any LLM-based classification. Because no LLM sits in the enforcement path, it is designed to withstand the following LLM-judge bypass patterns:

Bypass Pattern	Defense
📝 Context Window Displacement	Regex indicators (ignore previous, DAN mode, jailbreak)
🔗 Indirect Tool Chaining	Suspicious tool sequences (read_file → send_email)
🔤 Unicode Homoglyph Substitution	NFKC normalization + confusable character map
🎭 Confidence Hijacking	Social engineering pattern detection
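
Three of the defenses in the table above can be sketched with the standard library alone. Treat this as a rough illustration, not the Judge Layer's rule set: the regex covers only the three phrases named in the table, the confusables map is a tiny hand-picked sample of Cyrillic look-alikes, and the function names and the single suspicious tool pair are assumptions.

```python
import re
import unicodedata

# Only the indicator phrases named in the table; a real rule set would be larger.
INDICATORS = re.compile(r"ignore\s+previous|dan\s+mode|jailbreak", re.IGNORECASE)

# Illustrative sample of Cyrillic letters that look like Latin ones.
CONFUSABLES = str.maketrans(
    {
        "\u0430": "a",
        "\u0435": "e",
        "\u043e": "o",
        "\u0456": "i",
        "\u0440": "p",
        "\u0441": "c",
    }
)

# Ordered tool pairs treated as suspicious (assumption: only the table's example).
SUSPICIOUS_SEQUENCES = {("read_file", "send_email")}


def normalize(text: str) -> str:
    """Apply NFKC normalization, then fold known confusable characters."""
    return unicodedata.normalize("NFKC", text).translate(CONFUSABLES)


def has_injection_indicator(text: str) -> bool:
    """Match indicator phrases against the normalized text."""
    return INDICATORS.search(normalize(text)) is not None


def has_suspicious_chain(tools: list) -> bool:
    """Return True if any adjacent pair of tool calls is on the suspicious list."""
    return any(pair in SUSPICIOUS_SEQUENCES for pair in zip(tools, tools[1:]))
```

Normalizing before matching matters: NFKC folds full-width letters to ASCII, but it does not map Cyrillic look-alikes to Latin, which is why the table pairs it with a separate confusable character map.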

Verdict Matrix

Severity	Auth Present	Verdict
1–3	Any	✅ `ALLOW`
4–6	Yes	✅ `ALLOW`
4–6	No	🟡 `QUARANTINE`
7–8	Any	🟡 `QUARANTINE`
9–10	Any	🔴 `DENY`

Fail-closed: any exception during evaluation returns `ESCALATE` (severity 10).
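
The verdict matrix and the fail-closed rule translate directly into a small pure function plus a wrapper. The sketch below is one possible encoding, not the project's source: `verdict_for`, `judge`, and the `score_fn` callback (assumed to return a severity and an auth flag) are hypothetical names.

```python
def verdict_for(severity: int, auth_present: bool) -> str:
    """Look up the verdict matrix for a severity in the range 1 to 10."""
    if not 1 <= severity <= 10:
        raise ValueError(f"severity out of range: {severity}")
    if severity <= 3:
        return "ALLOW"
    if severity <= 6:
        return "ALLOW" if auth_present else "QUARANTINE"
    if severity <= 8:
        return "QUARANTINE"
    return "DENY"


def judge(score_fn, action) -> tuple:
    """Fail closed: any exception while scoring yields ESCALATE at severity 10."""
    try:
        severity, auth_present = score_fn(action)
        return verdict_for(severity, auth_present), severity
    except Exception:
        return "ESCALATE", 10
```

An out-of-range severity raises inside `verdict_for`, so the wrapper escalates it instead of silently allowing the action.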

📸 Screenshots

🎥 [Demo video coming soon]

🎯 Dashboard

Real-time KPIs, critical alerts, agent health %, and live pulse indicators.

📡 DPI Live Feed

Watch Lobster Trap intercepts streaming in with verdict badges, risk-score bars, and detected pattern chips.

🐝 Agent Swarm Simulator

Launch attacks, watch live event stream with colored verdicts, and see incidents auto-generate in real time.

⚖️ Judge Layer

Inspect deterministic rule evaluation with bypass pattern detection and latency histograms.

🏛️ Policy Builder

Customize NIST baselines, edit ODPs per incident type, detect conflicts, and apply industry templates.

⚙️ Quick Start

📋 Prerequisites

🐍 Python 3.11+
📦 Node.js 18+
🐘 PostgreSQL 15 (or SQLite for local dev)

🖥️ Backend

cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
alembic upgrade head
uvicorn app.main:app --reload --port 8000

🎨 Frontend

cd frontend
npm install
npm run dev

🔧 Environment Variables

Create backend/.env:

# 🌍 Environment
ENVIRONMENT=development
DEMO_MODE=true

# 🗄️ Database
DATABASE_URL=sqlite:///./data/playbooks.db
# DATABASE_URL=postgresql+asyncpg://user:pass@localhost/playbook

# 🔐 Security
SECRET_KEY=change-me-in-production-min-64-characters-long
ACCESS_TOKEN_EXPIRE_MINUTES=60

# 🤖 Gemini (optional — for AI analysis overlay)
GEMINI_API_KEY=

# 🦞 Lobster Trap
LOBSTERTRAP_BINARY_PATH=./bin/lobstertrap

# 📧 Notifications (optional)
RESEND_API_KEY=
SMTP_HOST=
SMTP_PORT=587
SMTP_USER=
SMTP_PASSWORD=
NOTIFICATION_DEFAULT_RECIPIENTS=["security@company.com"]
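
A few of these variables need parsing before use: `DEMO_MODE` is a boolean, `SMTP_PORT` an integer, and `NOTIFICATION_DEFAULT_RECIPIENTS` a JSON list. The following standard-library sketch shows one way to read them. It is not how the backend loads its settings; `load_settings` is a hypothetical helper, and the 64-character check in production is an assumption inferred from the placeholder text of `SECRET_KEY`.

```python
import json
import os


def load_settings(env=None) -> dict:
    """Parse a subset of the variables above into typed values."""
    env = os.environ if env is None else env
    environment = env.get("ENVIRONMENT", "development")
    secret_key = env.get("SECRET_KEY", "")

    # Assumption: production requires a secret of at least 64 characters.
    if environment == "production" and len(secret_key) < 64:
        raise ValueError("SECRET_KEY must be at least 64 characters in production")

    recipients = json.loads(env.get("NOTIFICATION_DEFAULT_RECIPIENTS") or "[]")
    if not isinstance(recipients, list):
        raise ValueError("NOTIFICATION_DEFAULT_RECIPIENTS must be a JSON list")

    return {
        "environment": environment,
        "demo_mode": env.get("DEMO_MODE", "false").strip().lower() == "true",
        "database_url": env.get("DATABASE_URL", "sqlite:///./data/playbooks.db"),
        "smtp_port": int(env.get("SMTP_PORT") or 587),
        "recipients": recipients,
    }
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to test.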

🔌 SDK Quick Start

📦 Installation

pip install playbook-guard

🛡️ Basic Usage

from playbook_sdk import guard

@guard(agent_id="my-agent")
async def risky_action(data):
    """This function is automatically protected by PLAYBOOK."""
    return await process_data(data)  # process_data is a placeholder for your own coroutine
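
For readers curious how a guard-style decorator can wrap an async function, here is a toy stand-in. It is not the SDK's `guard` and makes no claim about its behavior: `toy_guard`, `ActionDenied`, and the `judge` callback signature are all hypothetical, chosen only to show the wrap, check, then call pattern.

```python
import asyncio
import functools


class ActionDenied(Exception):
    """Raised by the toy guard when the judge does not return ALLOW."""


def toy_guard(agent_id: str, judge):
    """Build a decorator that consults `judge` before running the coroutine."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            decision = judge(agent_id, func.__name__, args, kwargs)
            if decision != "ALLOW":
                raise ActionDenied(f"{agent_id}:{func.__name__} -> {decision}")
            return await func(*args, **kwargs)

        return wrapper

    return decorator


def deny_drops(agent_id, name, args, kwargs) -> str:
    """Toy rule: refuse any call whose positional arguments mention DROP."""
    return "DENY" if any("DROP" in str(arg) for arg in args) else "ALLOW"


@toy_guard("my-agent", deny_drops)
async def run_query(sql: str) -> str:
    return f"ran: {sql}"
```

Calling `asyncio.run(run_query("SELECT 1"))` returns normally, while a query containing `DROP` raises `ActionDenied` before the wrapped coroutine ever runs.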

🔗 LangChain Middleware

from playbook_sdk.middleware.langchain import PlaybookCallbackHandler

handler = PlaybookCallbackHandler(agent_id="my-agent")
# Pass handler to your LangChain agent

👥 CrewAI Middleware

from crewai import Agent
from playbook_sdk.middleware.crewai import crewai_guard

@crewai_guard
class MyAgent(Agent):
    role = "Security Analyst"

🧪 Running Tests

🐍 Backend

cd backend
pytest -v                           # Full suite
pytest -v -m "integration"          # Integration tests only
pytest --cov=app --cov-report=html  # Coverage report

⚖️ Judge Layer Tests

pytest tests/unit/test_bypass_detection.py    # 55/55 bypass vectors
pytest tests/unit/test_determinism.py          # 1000-repeat variance
pytest tests/unit/test_enforcement_accuracy.py # True positive rate
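
As a rough idea of what a repeat-based determinism check can look like, the sketch below runs a toy rule 1,000 times per input and asserts that only one distinct verdict ever appears. It does not reproduce the contents of `test_determinism.py`; `toy_judge` and `distinct_verdicts` are hypothetical stand-ins.

```python
import re
from collections import Counter

PATTERN = re.compile(r"jailbreak|ignore previous", re.IGNORECASE)


def toy_judge(prompt: str) -> str:
    """Stand-in rule: deny prompts containing a known indicator phrase."""
    return "DENY" if PATTERN.search(prompt) else "ALLOW"


def distinct_verdicts(prompt: str, repeats: int = 1000) -> Counter:
    """Evaluate the same prompt repeatedly and count each verdict seen."""
    return Counter(toy_judge(prompt) for _ in range(repeats))


def test_determinism():
    for prompt in ("hello", "please jailbreak the model"):
        counts = distinct_verdicts(prompt)
        assert len(counts) == 1  # zero variance across repeats
        assert sum(counts.values()) == 1000
```

A rule-based judge should pass this trivially, which is the point: the same test against a sampled LLM classifier would not be guaranteed to hold.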

🎨 Frontend

cd frontend
npm run lint
npm run typecheck
npm test

📚 Tech Stack

Layer	Technology	Purpose
⚡ Backend	Python 3.11, FastAPI, SQLAlchemy 2.0, Alembic	REST API + WebSocket
🎨 Frontend	React 18.2, TypeScript, Tailwind CSS, Recharts	Dashboard UI
🗄️ Database	SQLite (dev) / PostgreSQL 15 (prod)	Persistence
🦞 DPI	Veea Lobster Trap	LLM traffic inspection
🤖 LLM Overlay	Gemini 3.1 Flash Lite (async reasoning only)	Post-hoc analysis
📧 Email	Resend SMTP	Alert notifications
🔄 Reverse Proxy	Caddy	Auto HTTPS

🗺️ Roadmap

🤝 Contributing

Contributions are welcome! Please read our Development Plan and open an issue before submitting PRs.

📄 License

Apache 2.0 — see LICENSE for details.

Built for the lablab.ai TechEx Intelligent Enterprise Solutions hackathon. Track: Agent Security & AI Governance — Veea 🦞

⭐ Star us on GitHub if you find PLAYBOOK useful!
