| 1 | +# StackRox MCP Architecture |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +StackRox MCP Server is a Model Context Protocol (MCP) server that exposes StackRox Central's security capabilities through a standardized interface. It enables AI assistants to query vulnerability data. |
| 6 | + |
| 7 | +## High-Level Architecture |
| 8 | + |
| 9 | +``` |
| 10 | +┌─────────────────────────────────────────────────────────────────────────┐ |
| 11 | +│ MCP Client │ |
| 12 | +│ (Claude Code, goose, etc.) │ |
| 13 | +└───────────────┬─────────────────────────────────────────────────────────┘ |
| 14 | + │ HTTP/SSE or stdio |
| 15 | + │ (includes Authorization header) |
| 16 | + ▼ |
| 17 | +┌─────────────────────────────────────────────────────────────────────────┐ |
| 18 | +│ StackRox MCP Server │ |
| 19 | +│ ┌────────────────────────────────────────────────────────────────────┐ │ |
| 20 | +│ │ MCP Server │ │ |
| 21 | +│ │ (go-sdk/mcp.Server with HTTP/stdio transport) │ │ |
| 22 | +│ └────────────┬───────────────────────────────────────────────────────┘ │ |
| 23 | +│ │ │ |
| 24 | +│ ▼ │ |
| 25 | +│ ┌────────────────────────────────────────────────────────────────────┐ │ |
| 26 | +│ │ Toolsets Registry │ │ |
| 27 | +│ │ ┌──────────────────┐ ┌──────────────────┐ │ │ |
| 28 | +│ │ │ Vulnerability │ │ Config Manager │ │ │ |
| 29 | +│ │ │ Toolset │ │ Toolset │ │ │ |
| 30 | +│ │ └──────────────────┘ └──────────────────┘ │ │ |
| 31 | +│ └────────────┬───────────────────────────────────────────────────────┘ │ |
| 32 | +│ │ │ |
| 33 | +│ ▼ │ |
| 34 | +│ ┌────────────────────────────────────────────────────────────────────┐ │ |
| 35 | +│ │ StackRox Client │ │ |
| 36 | +│ │ ┌──────────────┐ ┌─────────────┐ ┌──────────────────┐ │ │ |
| 37 | +│ │ │ Auth Handler │ │ Interceptors│ │ Retry Policy │ │ │ |
| 38 | +│ │ │(passthrough/ │ │(logging/ │ │(exponential │ │ │ |
| 39 | +│ │ │ static) │ │ retry) │ │ backoff) │ │ │ |
| 40 | +│ │ └──────────────┘ └─────────────┘ └──────────────────┘ │ │ |
| 41 | +│ └────────────┬───────────────────────────────────────────────────────┘ │ |
| 42 | +└───────────────┼─────────────────────────────────────────────────────────┘ |
| 43 | + │ gRPC (HTTP/2 or HTTP/1 bridge) |
| 44 | + │ TLS with Bearer token |
| 45 | + ▼ |
| 46 | +┌─────────────────────────────────────────────────────────────────────────┐ |
| 47 | +│ StackRox Central │ |
| 48 | +│ ┌────────────────────────────────────────────────────────────────────┐ │ |
| 49 | +│ │ gRPC API Services │ │ |
| 50 | +│ │ • DeploymentService • ImageService │ │ |
| 51 | +│ │ • NodeService • ClustersService │ │ |
| 52 | +│ └────────────────────────────────────────────────────────────────────┘ │ |
| 53 | +└─────────────────────────────────────────────────────────────────────────┘ |
| 54 | +``` |
| 55 | + |
| 56 | +## Core Components |
| 57 | + |
| 58 | +### MCP Server |
| 59 | + |
| 60 | +The MCP server handles client connections and routes tool invocations to the appropriate toolsets. |
| 61 | + |
| 62 | +**Responsibilities**: |
| 63 | +- Serves the MCP protocol over the Stream-HTTP or stdio transport |
| 64 | +- Routes tool calls to registered toolsets |
| 65 | +- Provides health check endpoint |
| 66 | +- Manages graceful shutdown |
| 67 | + |
| 68 | +**Transport Modes**: |
| 69 | +- **Stream-HTTP**: Streaming responses over HTTP, supports both auth modes |
| 70 | +- **stdio**: Standard input/output, requires static authentication |
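
The transport and authentication constraints above can be checked once at startup. The following is a minimal sketch, not the project's actual validation code: the type names, constant values, and error messages are assumptions made for illustration.

```go
package main

import "fmt"

// Transport and AuthMode are hypothetical names used only in this sketch.
type Transport string
type AuthMode string

const (
	TransportStreamHTTP Transport = "stream-http"
	TransportStdio      Transport = "stdio"

	AuthPassthrough AuthMode = "passthrough"
	AuthStatic      AuthMode = "static"
)

// validateTransportAuth rejects the combination the document rules out:
// stdio only works with static authentication, while Stream-HTTP
// accepts both modes.
func validateTransportAuth(t Transport, a AuthMode) error {
	switch t {
	case TransportStreamHTTP:
		if a == AuthPassthrough || a == AuthStatic {
			return nil
		}
		return fmt.Errorf("unknown auth mode %q", a)
	case TransportStdio:
		if a == AuthStatic {
			return nil
		}
		return fmt.Errorf("stdio transport requires static authentication, got %q", a)
	default:
		return fmt.Errorf("unknown transport %q", t)
	}
}
```

Failing fast on an invalid combination keeps the error close to the configuration mistake instead of surfacing it on the first tool call.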
| 71 | + |
| 72 | +### Toolsets Registry |
| 73 | + |
| 74 | +Central registry that manages all available toolsets and their tools. |
| 75 | + |
| 76 | +**Responsibilities**: |
| 77 | +- Manages toolset registration |
| 78 | +- Applies global read-only filtering when configured |
| 79 | +- Provides unified tool discovery |
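
One way to picture the registry's responsibilities is a small in-memory structure that collects toolsets and filters tools on discovery. This is a sketch under stated assumptions: `Tool`, `Toolset`, `Registry`, and the `ReadOnly` and `Enabled` fields are hypothetical and do not mirror the real types.

```go
package main

// Tool, Toolset, and Registry are hypothetical types for illustration.
type Tool struct {
	Name     string
	ReadOnly bool // true if the tool never modifies state in Central
}

type Toolset struct {
	Name    string
	Enabled bool
	Tools   []Tool
}

type Registry struct {
	toolsets     []Toolset
	readOnlyMode bool
}

// NewRegistry creates a registry; readOnlyMode models the global
// read-only setting.
func NewRegistry(readOnlyMode bool) *Registry {
	return &Registry{readOnlyMode: readOnlyMode}
}

// Register adds a toolset to the registry.
func (r *Registry) Register(ts Toolset) {
	r.toolsets = append(r.toolsets, ts)
}

// Tools returns the tools of all enabled toolsets. In read-only mode,
// tools that are not marked ReadOnly are dropped.
func (r *Registry) Tools() []Tool {
	var out []Tool
	for _, ts := range r.toolsets {
		if !ts.Enabled {
			continue
		}
		for _, t := range ts.Tools {
			if r.readOnlyMode && !t.ReadOnly {
				continue
			}
			out = append(out, t)
		}
	}
	return out
}
```

Applying the filter at discovery time means a client never sees a tool it is not allowed to call.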
| 80 | + |
| 81 | +**Available Toolsets**: |
| 82 | + |
| 83 | +1. **Vulnerability Toolset**: Query resources where a CVE is detected |
| 84 | + - `get_deployments_for_cve`: Find deployments where the CVE is detected |
| 85 | + - `get_nodes_for_cve`: Find nodes where the CVE is detected (aggregated by cluster and OS) |
| 86 | + - `get_clusters_with_orchestrator_cve`: Find clusters where the CVE is detected in orchestrator components |
| 87 | + |
| 88 | +2. **Config Manager Toolset**: Manage cluster configurations |
| 89 | + - `list_clusters`: List all managed clusters with pagination |
| 90 | + |
| 91 | +### StackRox Client |
| 92 | + |
| 93 | +Manages the gRPC connection to StackRox Central API. |
| 94 | + |
| 95 | +**Responsibilities**: |
| 96 | +- Establishes and maintains gRPC connections |
| 97 | +- Handles authentication (static or passthrough) |
| 98 | +- Applies interceptors for logging and retry |
| 99 | +- Manages connection lifecycle and automatic reconnection |
| 100 | + |
| 101 | +**Connection Features**: |
| 102 | +- Lazy connection initialization |
| 103 | +- Automatic reconnection on transient failures |
| 104 | +- Support for both HTTP/2 (native gRPC) and HTTP/1 bridge mode |
| 105 | +- Configurable request timeouts (default: 30 seconds) |
| 106 | + |
| 107 | +### Authentication |
| 108 | + |
| 109 | +Two authentication modes are supported: |
| 110 | + |
| 111 | +**Passthrough Authentication**: |
| 112 | +- Token extracted from incoming MCP request headers |
| 113 | +- Enables per-user authentication when MCP server is shared |
| 114 | +- Token passed directly to StackRox Central for each API call |
| 115 | +- Supports multi-tenant deployments |
| 116 | + |
| 117 | +**Static Authentication**: |
| 118 | +- Single API token configured at server startup |
| 119 | +- All API calls use the same credentials |
| 120 | +- Required for stdio transport mode |
| 121 | +- Simpler setup for single-user scenarios |
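
The two modes differ only in where the token comes from. The sketch below illustrates that choice with the standard library; the function name, the mode strings, and the assumption that the passthrough token arrives as a `Bearer` value in the `Authorization` header are illustrative, not taken from the implementation.

```go
package main

import (
	"errors"
	"net/http"
	"strings"
)

// resolveToken picks the API token for one call to Central.
// mode is "static" or "passthrough" (hypothetical values).
func resolveToken(mode string, header http.Header, staticToken string) (string, error) {
	switch mode {
	case "static":
		// One token configured at startup, shared by every call.
		if staticToken == "" {
			return "", errors.New("static auth requires a configured API token")
		}
		return staticToken, nil
	case "passthrough":
		// Per-request token taken from the incoming MCP request headers.
		const prefix = "Bearer "
		value := header.Get("Authorization")
		if len(value) <= len(prefix) || !strings.EqualFold(value[:len(prefix)], prefix) {
			return "", errors.New("missing or malformed Authorization header")
		}
		token := strings.TrimSpace(value[len(prefix):])
		if token == "" {
			return "", errors.New("empty bearer token")
		}
		return token, nil
	default:
		return "", errors.New("unknown auth mode: " + mode)
	}
}
```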
| 122 | + |
| 123 | +### Configuration |
| 124 | + |
| 125 | +Configuration is loaded from multiple sources, listed from lowest to highest precedence (later sources override earlier ones): |
| 126 | +1. Default values |
| 127 | +2. YAML configuration file |
| 128 | +3. Environment variables (prefix: `STACKROX_MCP__`) |
| 129 | + |
| 130 | +**Key Configuration Areas**: |
| 131 | +- `central`: StackRox Central connection settings (endpoint, auth, TLS) |
| 132 | +- `global`: Server-wide settings (read-only mode) |
| 133 | +- `server`: HTTP server configuration (port, timeouts) |
| 134 | +- `tools`: Individual toolset enable/disable flags |
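
The layering of configuration sources can be sketched as successive map merges. This example assumes the sources are listed from lowest to highest precedence, so environment variables win. It also assumes a key mapping in which `__` separates nesting levels (for example `STACKROX_MCP__SERVER__PORT` becomes `server.port`); the document states only the prefix, so the mapping and the key names are hypothetical.

```go
package main

import "strings"

const envPrefix = "STACKROX_MCP__"

// mergeConfig layers defaults, file values, and environment variables,
// with later layers overriding earlier ones. environ uses the
// "NAME=value" form returned by os.Environ.
func mergeConfig(defaults, fromFile map[string]string, environ []string) map[string]string {
	merged := make(map[string]string)
	for k, v := range defaults {
		merged[k] = v
	}
	for k, v := range fromFile {
		merged[k] = v
	}
	for _, entry := range environ {
		name, value, ok := strings.Cut(entry, "=")
		if !ok || !strings.HasPrefix(name, envPrefix) {
			continue
		}
		// Assumed mapping: STACKROX_MCP__SERVER__PORT -> server.port
		key := strings.ToLower(strings.TrimPrefix(name, envPrefix))
		key = strings.ReplaceAll(key, "__", ".")
		merged[key] = value
	}
	return merged
}
```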
| 135 | + |
| 136 | +## Request Flow |
| 137 | + |
| 138 | +``` |
| 139 | +MCP Client |
| 140 | + │ |
| 141 | + ├─> 1. HTTP POST with Authorization header |
| 142 | + │ |
| 143 | + ▼ |
| 144 | +MCP Server |
| 145 | + │ |
| 146 | + ├─> 2. Route to tool handler |
| 147 | + │ |
| 148 | + ▼ |
| 149 | +Tool Handler |
| 150 | + │ |
| 151 | + ├─> 3. Store MCP request in context |
| 152 | + │ |
| 153 | + ▼ |
| 154 | +StackRox Client |
| 155 | + │ |
| 156 | + ├─> 4. Extract token (passthrough) or use static token |
| 157 | + │ |
| 158 | + ▼ |
| 159 | +gRPC Interceptors |
| 160 | + │ |
| 161 | + ├─> 5. Apply logging and retry logic |
| 162 | + │ |
| 163 | + ▼ |
| 164 | +StackRox Central API |
| 165 | + │ |
| 166 | + ├─> 6. Process request and return response |
| 167 | + │ |
| 168 | + ▼ |
| 169 | +Tool Handler |
| 170 | + │ |
| 171 | + ├─> 7. Format response for MCP |
| 172 | + │ |
| 173 | + ▼ |
| 174 | +MCP Client |
| 175 | +``` |
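
Steps 3 and 4 of the flow rely on carrying request data through a `context.Context` so the client layer can read it later. The sketch below shows the usual Go pattern with an unexported key type. It stores only the HTTP headers rather than the whole MCP request, and all names are hypothetical.

```go
package main

import (
	"context"
	"net/http"
)

// requestHeaderKey is an unexported key type, so other packages cannot
// collide with or overwrite the stored value.
type requestHeaderKey struct{}

// withRequestHeaders returns a child context carrying a copy of the
// incoming request headers (step 3 in the flow above).
func withRequestHeaders(ctx context.Context, h http.Header) context.Context {
	return context.WithValue(ctx, requestHeaderKey{}, h.Clone())
}

// requestHeadersFrom retrieves the headers further down the call chain
// (step 4), reporting false if none were stored.
func requestHeadersFrom(ctx context.Context) (http.Header, bool) {
	h, ok := ctx.Value(requestHeaderKey{}).(http.Header)
	return h, ok
}

// newRequestContext is a convenience wrapper for callers that start
// from a fresh background context.
func newRequestContext(h http.Header) context.Context {
	return withRequestHeaders(context.Background(), h)
}
```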
| 176 | + |
| 177 | +## Error Handling |
| 178 | + |
| 179 | +The system classifies each gRPC error as retriable or non-retriable and automatically retries only transient failures. |
| 180 | + |
| 181 | +### Error Classification |
| 182 | + |
| 183 | +**Retriable Errors** (automatically retried with exponential backoff): |
| 184 | +- `Unavailable`: Service temporarily unavailable |
| 185 | +- `DeadlineExceeded`: Request timeout |
| 186 | + |
| 187 | +**Non-Retriable Errors** (returned immediately): |
| 188 | +- `Unauthenticated`: Invalid or expired API token |
| 189 | +- `PermissionDenied`: Insufficient permissions |
| 190 | +- `NotFound`: Resource not found |
| 191 | +- `InvalidArgument`: Bad request parameters |
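
The classification above reduces to a single predicate over the status code. To stay self-contained, this sketch models the codes as plain strings; a real implementation would more likely use the `codes` package from `google.golang.org/grpc`. The `Code` type and `isRetriable` name are assumptions.

```go
package main

// Code mirrors the gRPC status code names listed above.
type Code string

const (
	Unavailable      Code = "Unavailable"
	DeadlineExceeded Code = "DeadlineExceeded"
	Unauthenticated  Code = "Unauthenticated"
	PermissionDenied Code = "PermissionDenied"
	NotFound         Code = "NotFound"
	InvalidArgument  Code = "InvalidArgument"
)

// isRetriable reports whether a failed call should be retried.
// Only transient conditions qualify; everything else, including
// codes not listed here, is returned to the caller immediately.
func isRetriable(c Code) bool {
	switch c {
	case Unavailable, DeadlineExceeded:
		return true
	default:
		return false
	}
}
```

Defaulting unknown codes to non-retriable is a conservative choice for this sketch: it avoids repeating a request whose failure mode is not understood.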
| 192 | + |
| 193 | +### Retry Strategy |
| 194 | + |
| 195 | +- Maximum retries: 3 (configurable) |
| 196 | +- Exponential backoff: starts at 1s, doubles each attempt, capped at 10s |
| 197 | +- Timeout per attempt: 30 seconds (configurable) |
| 198 | +- Only retriable errors trigger retry logic |
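
The backoff schedule described above can be written as a small function plus a retry loop. This sketch assumes "maximum retries: 3" means up to three retries after the initial attempt, and it injects the sleep and classification functions so the loop stays testable; the function names are hypothetical.

```go
package main

import "time"

const (
	maxRetries   = 3
	initialDelay = 1 * time.Second
	maxDelay     = 10 * time.Second
)

// backoffDelay returns the wait before the given retry (1-based):
// 1s, 2s, 4s, 8s, then capped at 10s.
func backoffDelay(retry int) time.Duration {
	if retry < 1 {
		retry = 1
	}
	delay := initialDelay
	for i := 1; i < retry; i++ {
		delay *= 2
		if delay >= maxDelay {
			return maxDelay
		}
	}
	return delay
}

// retry runs op, then retries it while the error is retriable and the
// retry budget is not exhausted. The last error (or nil) is returned.
func retry(op func() error, retriable func(error) bool, sleep func(time.Duration)) error {
	err := op()
	for attempt := 1; attempt <= maxRetries && err != nil && retriable(err); attempt++ {
		sleep(backoffDelay(attempt))
		err = op()
	}
	return err
}
```

With the default of three retries the waits are 1s, 2s, and 4s, so the 10s cap only takes effect if the retry limit is raised.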
| 199 | + |
| 200 | +### Error Messages |
| 201 | + |
| 202 | +All errors are converted to user-friendly messages with: |
| 203 | +- Clear description of what went wrong |
| 204 | +- Actionable guidance for resolution |
| 205 | +- Context about the failed operation |
| 206 | +- Transparency about automatic retries |
| 207 | + |
| 208 | +## Available Tools |
| 209 | + |
| 210 | +### Vulnerability Tools |
| 211 | + |
| 212 | +**get_deployments_for_cve** |
| 213 | +- Query deployments where the CVE is detected |
| 214 | +- Optional filters: cluster, namespace, platform type |
| 215 | +- Optional image enrichment (lists container images where the CVE is detected) |
| 216 | +- Pagination support for large result sets |
| 217 | + |
| 218 | +**get_nodes_for_cve** |
| 219 | +- Query nodes where the CVE is detected |
| 220 | +- Results aggregated by cluster and OS image |
| 221 | +- Optional cluster filter |
| 222 | +- Streaming API for efficient processing |
| 223 | + |
| 224 | +**get_clusters_with_orchestrator_cve** |
| 225 | +- Query clusters where the CVE is detected in orchestrator components |
| 226 | +- Optional cluster filter for verification |
| 227 | +- Sorted results for deterministic output |
| 228 | + |
| 229 | +### Config Management Tools |
| 230 | + |
| 231 | +**list_clusters** |
| 232 | +- List all clusters managed by StackRox |
| 233 | +- Client-side pagination support |
| 234 | +- Returns cluster metadata and status |
| 235 | + |
| 236 | +## Query Syntax |
| 237 | + |
| 238 | +All vulnerability tools use StackRox query syntax: |
| 239 | + |
| 240 | +- **Field filters**: `CVE:"CVE-2021-44228"` |
| 241 | +- **Multiple conditions**: `CVE:"CVE-2021-44228"+Namespace:"default"` |
| 242 | +- **Exact matching**: Values are wrapped in double quotes to prevent partial matches |
| 243 | +- **Platform filters**: `Platform Component:0` (user workload) or `Platform Component:1` (platform) |
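
Queries in this syntax can be assembled from field/value pairs. The helper below is a minimal sketch: `filter` and `buildQuery` are hypothetical names, and quoting via Go's `%q` is an assumption that holds for simple values such as CVE IDs and namespace names but may not match StackRox's escaping rules for values containing quotes or backslashes.

```go
package main

import (
	"fmt"
	"strings"
)

// filter is one Field:"Value" condition.
type filter struct {
	Field string
	Value string
}

// buildQuery quotes each value for exact matching and joins the
// conditions with "+". Filters with an empty value are skipped, which
// makes optional filters easy to pass through.
func buildQuery(filters ...filter) string {
	parts := make([]string, 0, len(filters))
	for _, f := range filters {
		if f.Value == "" {
			continue
		}
		parts = append(parts, fmt.Sprintf("%s:%q", f.Field, f.Value))
	}
	return strings.Join(parts, "+")
}
```

For example, `buildQuery(filter{"CVE", "CVE-2021-44228"}, filter{"Namespace", "default"})` yields `CVE:"CVE-2021-44228"+Namespace:"default"`.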
| 244 | + |
| 245 | +## Performance Considerations |
| 246 | + |
| 247 | +**Deployment Image Enrichment**: |
| 248 | +- Disabled by default for faster response times |
| 249 | +- When enabled, uses concurrent requests with semaphore limiting |
| 250 | +- Can significantly increase response time for large deployments |
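
Semaphore-limited concurrency is commonly built in Go from a buffered channel and a `sync.WaitGroup`. The sketch below shows that pattern; `enrichAll`, its parameters, and the shape of the result are assumptions, and the `fetch` callback stands in for whatever call retrieves the images of one deployment.

```go
package main

import "sync"

// enrichAll calls fetch for every deployment ID with at most limit
// calls in flight, and collects the results by ID.
func enrichAll(ids []string, limit int, fetch func(id string) []string) map[string][]string {
	if limit < 1 {
		limit = 1
	}
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		sem     = make(chan struct{}, limit) // buffered channel used as a semaphore
		results = make(map[string][]string, len(ids))
	)
	for _, id := range ids {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit fetches are already running
		go func(id string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			images := fetch(id)
			mu.Lock()
			results[id] = images
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return results
}
```

The limit bounds the load placed on Central, which is why total response time still grows with the number of deployments.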
| 251 | + |
| 252 | +**Node Aggregation**: |
| 253 | +- Streams all nodes before aggregating and returning results |
| 254 | +- Groups nodes by cluster and OS for reduced response size |
| 255 | +- Memory usage scales with number of nodes |
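
Grouping by cluster and OS can be sketched with a map keyed on both fields. `Node`, `NodeGroup`, and `aggregateNodes` are hypothetical names, and the input slice stands in for the nodes collected from the stream. The final sort is an addition for this sketch so that the output order is deterministic.

```go
package main

import "sort"

// Node is the minimal per-node information needed for grouping.
type Node struct {
	Cluster string
	OSImage string
}

// NodeGroup is one aggregated row: how many nodes share a cluster and OS.
type NodeGroup struct {
	Cluster string
	OSImage string
	Count   int
}

// aggregateNodes counts nodes per (cluster, OS image) pair and returns
// the groups sorted by cluster, then OS image.
func aggregateNodes(nodes []Node) []NodeGroup {
	counts := make(map[Node]int)
	for _, n := range nodes {
		counts[n]++
	}
	groups := make([]NodeGroup, 0, len(counts))
	for key, count := range counts {
		groups = append(groups, NodeGroup{Cluster: key.Cluster, OSImage: key.OSImage, Count: count})
	}
	sort.Slice(groups, func(i, j int) bool {
		if groups[i].Cluster != groups[j].Cluster {
			return groups[i].Cluster < groups[j].Cluster
		}
		return groups[i].OSImage < groups[j].OSImage
	})
	return groups
}
```

The response shrinks from one entry per node to one entry per distinct pair.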
| 256 | + |
| 257 | +**Cluster Listing**: |
| 258 | +- Fetches all clusters from API |
| 259 | +- Applies client-side pagination |
| 260 | +- Optimized for typical deployments (10-1000 clusters) |
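
Client-side pagination amounts to slicing the full result with bounds checks. A minimal generic sketch follows; the `paginate` name and the `offset`/`limit` parameters are assumptions, since the document does not name the tool's pagination arguments.

```go
package main

// paginate returns the page of items starting at offset and holding at
// most limit entries. Out-of-range requests return nil rather than
// panicking.
func paginate[T any](items []T, offset, limit int) []T {
	if offset < 0 {
		offset = 0
	}
	if offset >= len(items) || limit <= 0 {
		return nil
	}
	end := len(items)
	if limit < end-offset {
		end = offset + limit
	}
	return items[offset:end]
}
```

Because the full list is fetched before slicing, each page request costs the same as fetching all clusters.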