Commit 53793f2

ROX-31482: Add project architecture documentation (#24)
1 parent bc05b10 commit 53793f2

1 file changed

+260
-0
lines changed

docs/architecture.md

Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
# StackRox MCP Architecture

## Overview

StackRox MCP Server is a Model Context Protocol (MCP) server that exposes StackRox Central's security capabilities through a standardized interface. It enables AI assistants to query vulnerability data.

## High-Level Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                               MCP Client                                │
│                       (Claude Code, goose, etc.)                        │
└───────────────┬─────────────────────────────────────────────────────────┘
                │ HTTP/SSE or stdio
                │ (includes Authorization header)
                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           StackRox MCP Server                           │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │                             MCP Server                              │ │
│ │            (go-sdk/mcp.Server with HTTP/stdio transport)            │ │
│ └─────────────┬───────────────────────────────────────────────────────┘ │
│               │                                                         │
│               ▼                                                         │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │                          Toolsets Registry                          │ │
│ │  ┌──────────────────┐    ┌──────────────────┐                       │ │
│ │  │  Vulnerability   │    │  Config Manager  │                       │ │
│ │  │     Toolset      │    │     Toolset      │                       │ │
│ │  └──────────────────┘    └──────────────────┘                       │ │
│ └─────────────┬───────────────────────────────────────────────────────┘ │
│               │                                                         │
│               ▼                                                         │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │                           StackRox Client                           │ │
│ │  ┌──────────────┐  ┌─────────────┐  ┌──────────────────┐            │ │
│ │  │ Auth Handler │  │ Interceptors│  │   Retry Policy   │            │ │
│ │  │(passthrough/ │  │(logging/    │  │(exponential      │            │ │
│ │  │ static)      │  │ retry)      │  │ backoff)         │            │ │
│ │  └──────────────┘  └─────────────┘  └──────────────────┘            │ │
│ └─────────────┬───────────────────────────────────────────────────────┘ │
└───────────────┼─────────────────────────────────────────────────────────┘
                │ gRPC (HTTP/2 or HTTP/1 bridge)
                │ TLS with Bearer token
                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                            StackRox Central                             │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │                          gRPC API Services                          │ │
│ │  • DeploymentService          • ImageService                        │ │
│ │  • NodeService                • ClustersService                     │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```

## Core Components

### MCP Server

The MCP server handles client connections and routes tool invocations to the appropriate toolsets.

**Responsibilities**:
- Serves the MCP protocol over the Stream-HTTP or stdio transport
- Routes tool calls to registered toolsets
- Provides a health check endpoint
- Manages graceful shutdown

**Transport Modes**:
- **Stream-HTTP**: Streaming responses over HTTP; supports both authentication modes
- **stdio**: Standard input/output; requires static authentication

### Toolsets Registry

The central registry manages all available toolsets and their tools.

**Responsibilities**:
- Manages toolset registration
- Applies global read-only filtering when configured
- Provides unified tool discovery

**Available Toolsets**:

1. **Vulnerability Toolset**: Query resources where CVEs are detected
   - `get_deployments_for_cve`: Find deployments where a CVE is detected
   - `get_nodes_for_cve`: Find nodes where a CVE is detected (aggregated by cluster and OS)
   - `get_clusters_with_orchestrator_cve`: Find clusters where a CVE is detected in orchestrator components

2. **Config Manager Toolset**: Manage cluster configurations
   - `list_clusters`: List all managed clusters with pagination

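The registry's read-only filtering can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Tool`, `Toolset`, and `Registry` types, the `ReadOnly` flag, and the `hypothetical_write_tool` entry are not taken from the project, which may model tools differently.

```go
package main

import "fmt"

// Tool is a hypothetical tool descriptor; the project's real types may differ.
type Tool struct {
	Name     string
	ReadOnly bool
}

// Toolset groups related tools under one name.
type Toolset struct {
	Name  string
	Tools []Tool
}

// Registry holds registered toolsets and the global read-only flag.
type Registry struct {
	readOnly bool
	toolsets []Toolset
}

// NewRegistry creates an empty registry with the given read-only setting.
func NewRegistry(readOnly bool) *Registry { return &Registry{readOnly: readOnly} }

// Register adds a toolset to the registry.
func (r *Registry) Register(ts Toolset) { r.toolsets = append(r.toolsets, ts) }

// Tools returns the tools of all toolsets in registration order. In
// read-only mode, tools that are not marked read-only are left out.
func (r *Registry) Tools() []Tool {
	var out []Tool
	for _, ts := range r.toolsets {
		for _, t := range ts.Tools {
			if r.readOnly && !t.ReadOnly {
				continue
			}
			out = append(out, t)
		}
	}
	return out
}

func main() {
	reg := NewRegistry(true)
	reg.Register(Toolset{Name: "vulnerability", Tools: []Tool{
		{Name: "get_deployments_for_cve", ReadOnly: true},
	}})
	reg.Register(Toolset{Name: "config_manager", Tools: []Tool{
		{Name: "list_clusters", ReadOnly: true},
		{Name: "hypothetical_write_tool", ReadOnly: false}, // not a real tool
	}})
	for _, t := range reg.Tools() {
		fmt.Println(t.Name)
	}
}
```

Filtering at discovery time, as sketched, means a client never sees tools it is not allowed to call; the project may enforce the restriction elsewhere.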
### StackRox Client

Manages the gRPC connection to the StackRox Central API.

**Responsibilities**:
- Establishes and maintains gRPC connections
- Handles authentication (static or passthrough)
- Applies interceptors for logging and retry
- Manages connection lifecycle and automatic reconnection

**Connection Features**:
- Lazy connection initialization
- Automatic reconnection on transient failures
- Support for both HTTP/2 (native gRPC) and HTTP/1 bridge mode
- Configurable request timeouts (default: 30 seconds)

### Authentication

Two authentication modes are supported:

**Passthrough Authentication**:
- Token is extracted from the incoming MCP request headers
- Enables per-user authentication when the MCP server is shared
- Token is passed directly to StackRox Central for each API call
- Supports multi-tenant deployments

**Static Authentication**:
- Single API token configured at server startup
- All API calls use the same credentials
- Required for stdio transport mode
- Simpler setup for single-user scenarios

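The difference between the two modes comes down to where the token for each Central call is taken from. The sketch below illustrates that choice; `AuthMode`, `resolveToken`, and `ErrNoToken` are hypothetical names, and the `Bearer <token>` header format is assumed from the "TLS with Bearer token" note in the architecture diagram.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// AuthMode selects where the Central API token comes from (hypothetical type).
type AuthMode string

const (
	AuthStatic      AuthMode = "static"
	AuthPassthrough AuthMode = "passthrough"
)

// ErrNoToken is returned when no usable token can be resolved.
var ErrNoToken = errors.New("no bearer token available")

// resolveToken returns the token to send to Central for one request.
// authHeader is the raw Authorization header value of the incoming MCP
// request; it is ignored in static mode.
func resolveToken(mode AuthMode, staticToken, authHeader string) (string, error) {
	if mode == AuthStatic {
		if staticToken == "" {
			return "", ErrNoToken
		}
		return staticToken, nil
	}
	// Passthrough: expect "Bearer <token>", matching the scheme case-insensitively.
	scheme, token, ok := strings.Cut(strings.TrimSpace(authHeader), " ")
	token = strings.TrimSpace(token)
	if !ok || !strings.EqualFold(scheme, "Bearer") || token == "" {
		return "", ErrNoToken
	}
	return token, nil
}

func main() {
	tok, err := resolveToken(AuthPassthrough, "", "Bearer user-token")
	fmt.Println(tok, err)

	_, err = resolveToken(AuthPassthrough, "", "")
	fmt.Println(err)
}
```

Because stdio has no request headers to read from, a function like this can only succeed in static mode for that transport, which matches the requirement stated above.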
### Configuration

Configuration is centralized and loaded from multiple sources, listed from lowest to highest precedence (later sources override earlier ones):
1. Default values
2. YAML configuration file
3. Environment variables (prefix: `STACKROX_MCP__`)

**Key Configuration Areas**:
- `central`: StackRox Central connection settings (endpoint, auth, TLS)
- `global`: Server-wide settings (read-only mode)
- `server`: HTTP server configuration (port, timeouts)
- `tools`: Individual toolset enable/disable flags

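A layered configuration of this kind can be sketched as a merge over flat key/value maps, assuming that later sources override earlier ones. The setting names (`server.port`, `global.read_only`), the variable `STACKROX_MCP__SERVER__PORT`, and the rule that `__` separates nested keys are assumptions for illustration only; just the `STACKROX_MCP__` prefix comes from this document.

```go
package main

import (
	"fmt"
	"strings"
)

// envPrefix is the documented prefix for environment overrides.
const envPrefix = "STACKROX_MCP__"

// envOverrides converts KEY=VALUE pairs (in the format of os.Environ) into
// settings. Assumed mapping: strip the prefix, lowercase, and turn "__"
// into "." so that SERVER__PORT becomes "server.port".
func envOverrides(environ []string) map[string]string {
	out := map[string]string{}
	for _, kv := range environ {
		key, val, ok := strings.Cut(kv, "=")
		if !ok || !strings.HasPrefix(key, envPrefix) {
			continue
		}
		name := strings.ToLower(strings.TrimPrefix(key, envPrefix))
		out[strings.ReplaceAll(name, "__", ".")] = val
	}
	return out
}

// merge applies layers in order, so later layers override earlier ones.
func merge(layers ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	defaults := map[string]string{"server.port": "8080", "global.read_only": "true"}
	yamlFile := map[string]string{"server.port": "9000"} // stands in for a parsed YAML file
	env := envOverrides([]string{"STACKROX_MCP__SERVER__PORT=9443", "PATH=/usr/bin"})

	cfg := merge(defaults, yamlFile, env)
	fmt.Println(cfg["server.port"], cfg["global.read_only"])
}
```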
## Request Flow

```
MCP Client
    │
    ├─> 1. HTTP POST with Authorization header
    │
    ▼
MCP Server
    │
    ├─> 2. Route to tool handler
    │
    ▼
Tool Handler
    │
    ├─> 3. Store MCP request in context
    │
    ▼
StackRox Client
    │
    ├─> 4. Extract token (passthrough) or use static token
    │
    ▼
gRPC Interceptors
    │
    ├─> 5. Apply logging and retry logic
    │
    ▼
StackRox Central API
    │
    ├─> 6. Process request and return response
    │
    ▼
Tool Handler
    │
    ├─> 7. Format response for MCP
    │
    ▼
MCP Client
```

## Error Handling

The client classifies errors by gRPC status code and retries only transient failures.

### Error Classification

**Retriable Errors** (automatically retried with exponential backoff):
- `Unavailable`: Service temporarily unavailable
- `DeadlineExceeded`: Request timeout

**Non-Retriable Errors** (returned immediately):
- `Unauthenticated`: Invalid or expired API token
- `PermissionDenied`: Insufficient permissions
- `NotFound`: Resource not found
- `InvalidArgument`: Bad request parameters

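The classification above amounts to a small lookup on the status code. The sketch below uses a hypothetical string-based `Code` type so that it stays self-contained; real code would more likely switch on the constants from `google.golang.org/grpc/codes`. Treating every unlisted code as non-retriable is also an assumption.

```go
package main

import "fmt"

// Code is a stand-in for a gRPC status code, named after the codes listed above.
type Code string

const (
	Unavailable      Code = "Unavailable"
	DeadlineExceeded Code = "DeadlineExceeded"
	Unauthenticated  Code = "Unauthenticated"
	PermissionDenied Code = "PermissionDenied"
	NotFound         Code = "NotFound"
	InvalidArgument  Code = "InvalidArgument"
)

// isRetriable reports whether a failed call with this code should be retried.
// Only the two transient codes qualify; anything else is returned immediately.
func isRetriable(c Code) bool {
	switch c {
	case Unavailable, DeadlineExceeded:
		return true
	default:
		return false
	}
}

func main() {
	for _, c := range []Code{Unavailable, DeadlineExceeded, Unauthenticated, NotFound} {
		fmt.Printf("%s retriable=%t\n", c, isRetriable(c))
	}
}
```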
### Retry Strategy

- Maximum retries: 3 (configurable)
- Exponential backoff: starts at 1s, doubles each attempt, capped at 10s
- Timeout per attempt: 30 seconds (configurable)
- Only retriable errors trigger retry logic

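With the values listed above, the delay schedule can be worked out in a few lines. This is a minimal sketch, not the project's retry interceptor: `backoff` and `withRetry` are hypothetical helpers, "3 retries" is assumed to mean three additional attempts after the first call, and the per-attempt timeout is left out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	initialDelay = 1 * time.Second
	maxDelay     = 10 * time.Second
	maxRetries   = 3
)

// backoff returns the wait before retry number n (1-based):
// 1s, 2s, 4s, 8s, and then 10s for every later retry.
func backoff(retry int) time.Duration {
	d := initialDelay
	for i := 1; i < retry; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// withRetry runs op once and then up to maxRetries more times while the
// error is retriable. sleep is injected so callers can observe the delays.
func withRetry(op func() error, retriable func(error) bool, sleep func(time.Duration)) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if attempt > 0 {
			sleep(backoff(attempt))
		}
		if err = op(); err == nil || !retriable(err) {
			return err
		}
	}
	return err
}

func main() {
	errTransient := errors.New("unavailable")
	calls := 0
	var delays []time.Duration

	err := withRetry(
		func() error {
			calls++
			if calls < 3 {
				return errTransient // fail the first two attempts
			}
			return nil
		},
		func(error) bool { return true },
		func(d time.Duration) { delays = append(delays, d) }, // record instead of sleeping
	)
	fmt.Println(calls, delays, err)
}
```

With only 3 retries the 10s cap is never reached (the waits are 1s, 2s, and 4s); it matters only if the retry count is configured higher.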
### Error Messages

All errors are converted to user-friendly messages with:
- Clear description of what went wrong
- Actionable guidance for resolution
- Context about the failed operation
- Transparency about automatic retries

## Available Tools

### Vulnerability Tools

**get_deployments_for_cve**
- Query deployments where a CVE is detected
- Optional filters: cluster, namespace, platform type
- Optional image enrichment (lists container images where the CVE is detected)
- Pagination support for large result sets

**get_nodes_for_cve**
- Query nodes where a CVE is detected
- Results aggregated by cluster and OS image
- Optional cluster filter
- Streaming API for efficient processing

**get_clusters_with_orchestrator_cve**
- Query clusters where a CVE is detected in orchestrator components
- Optional cluster filter for verification
- Sorted results for deterministic output

### Config Management Tools

**list_clusters**
- List all clusters managed by StackRox
- Client-side pagination support
- Returns cluster metadata and status

## Query Syntax

All vulnerability tools use StackRox query syntax:

- **Field filters**: `CVE:"CVE-2021-44228"`
- **Multiple conditions**: `CVE:"CVE-2021-44228"+Namespace:"default"`
- **Exact matching**: Values are quoted to prevent partial matches
- **Platform filters**: `Platform Component:0` (user workload) or `Platform Component:1` (platform)

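Query strings of this shape are simple to assemble from field/value pairs. The helper below is a hypothetical sketch (the `Filter` type and `buildQuery` function are not from the project); it joins conditions with `+` and quotes values that need exact matching, and it does not attempt to escape special characters inside values.

```go
package main

import (
	"fmt"
	"strings"
)

// Filter is one field:value condition. Exact wraps the value in double
// quotes so that it is matched exactly rather than partially.
type Filter struct {
	Field string
	Value string
	Exact bool
}

// buildQuery joins the filters with "+" in the order given.
// Values are used as-is; escaping is out of scope for this sketch.
func buildQuery(filters ...Filter) string {
	parts := make([]string, 0, len(filters))
	for _, f := range filters {
		v := f.Value
		if f.Exact {
			v = `"` + v + `"`
		}
		parts = append(parts, f.Field+":"+v)
	}
	return strings.Join(parts, "+")
}

func main() {
	q := buildQuery(
		Filter{Field: "CVE", Value: "CVE-2021-44228", Exact: true},
		Filter{Field: "Namespace", Value: "default", Exact: true},
		Filter{Field: "Platform Component", Value: "0"},
	)
	fmt.Println(q)
}
```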
## Performance Considerations

**Deployment Image Enrichment**:
- Disabled by default for faster response times
- When enabled, uses concurrent requests with semaphore limiting
- Can significantly increase response time for large deployments

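Semaphore limiting is commonly written in Go with a buffered channel whose capacity is the concurrency limit. The sketch below shows that pattern under stated assumptions: `enrich` and its `fetch` callback are hypothetical, and the project may use a different mechanism, limit, or error handling.

```go
package main

import (
	"fmt"
	"sync"
)

// enrich calls fetch once per deployment with at most limit calls in
// flight at a time. Results are returned in the same order as ids.
func enrich(ids []string, limit int, fetch func(string) []string) [][]string {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block forever
	}
	results := make([][]string, len(ids))
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup

	for i, id := range ids {
		wg.Add(1)
		sem <- struct{}{} // acquire: blocks while limit fetches are running
		go func(i int, id string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = fetch(id)   // each goroutine writes its own index
		}(i, id)
	}
	wg.Wait()
	return results
}

func main() {
	// Stand-in for a per-deployment image lookup against Central.
	fetch := func(deployment string) []string {
		return []string{deployment + "-image"}
	}
	fmt.Println(enrich([]string{"api", "db", "web"}, 2, fetch))
}
```

The limit bounds the load placed on Central, but total latency still grows with the number of deployments, which is consistent with enrichment being off by default.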
**Node Aggregation**:
- Streams all nodes before aggregating and returning results
- Groups nodes by cluster and OS for reduced response size
- Memory usage scales with number of nodes

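Grouping by cluster and OS can be sketched as a count over a composite key. The `Node` and `Group` types and the channel standing in for the node stream are assumptions. Note that this sketch counts nodes as they arrive, so it does not model the memory behavior described above, where all nodes are collected first.

```go
package main

import (
	"fmt"
	"sort"
)

// Node carries only the fields needed for grouping (hypothetical type).
type Node struct {
	Cluster string
	OSImage string
}

// Group is one aggregated row: how many nodes share a cluster and OS image.
type Group struct {
	Cluster string
	OSImage string
	Count   int
}

// aggregate drains the stream and returns one Group per (cluster, OS image)
// pair, sorted by cluster and then OS image for stable output.
func aggregate(nodes <-chan Node) []Group {
	type key struct{ cluster, os string }
	counts := map[key]int{}
	for n := range nodes {
		counts[key{n.Cluster, n.OSImage}]++
	}

	groups := make([]Group, 0, len(counts))
	for k, c := range counts {
		groups = append(groups, Group{Cluster: k.cluster, OSImage: k.os, Count: c})
	}
	sort.Slice(groups, func(i, j int) bool {
		if groups[i].Cluster != groups[j].Cluster {
			return groups[i].Cluster < groups[j].Cluster
		}
		return groups[i].OSImage < groups[j].OSImage
	})
	return groups
}

func main() {
	stream := make(chan Node, 3)
	stream <- Node{Cluster: "prod", OSImage: "os-a"}
	stream <- Node{Cluster: "prod", OSImage: "os-a"}
	stream <- Node{Cluster: "dev", OSImage: "os-b"}
	close(stream)

	for _, g := range aggregate(stream) {
		fmt.Printf("%s %s %d\n", g.Cluster, g.OSImage, g.Count)
	}
}
```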
**Cluster Listing**:
- Fetches all clusters from API
- Applies client-side pagination
- Optimized for typical deployments (10-1000 clusters)
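Client-side pagination here means slicing an already fetched list. A minimal sketch, assuming `offset` and `limit` parameters (the tool's actual parameter names are not given in this document) and using plain strings in place of cluster records:

```go
package main

import "fmt"

// paginate returns at most limit items starting at offset. An offset past
// the end, or a non-positive limit, yields an empty page.
func paginate(items []string, offset, limit int) []string {
	if offset < 0 {
		offset = 0
	}
	if offset >= len(items) || limit <= 0 {
		return nil
	}
	end := len(items)
	if limit < end-offset {
		end = offset + limit
	}
	return items[offset:end]
}

func main() {
	clusters := []string{"c1", "c2", "c3", "c4", "c5"}
	fmt.Println(paginate(clusters, 0, 2))
	fmt.Println(paginate(clusters, 4, 2))
}
```

Since the full list is fetched on every call, each page costs the same API round trip; that is a reasonable trade-off at the cluster counts mentioned above.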
