This document describes the performance benchmarking suite for TerraphimAgent operations. The suite covers:
- Core Agent Operations: Agent creation, initialization, command processing
- WebSocket Communication: Protocol performance, message throughput, connection handling
- Multi-Agent Workflows: Concurrent execution, batch processing, coordination
- Knowledge Graph Operations: Query performance, path finding, automata operations
- Memory Management: Context enrichment, state persistence, resource utilization
- LLM Integration: Request processing, token tracking, cost management
## File: `agent_operations.rs`
Uses Criterion.rs for statistical benchmarking with HTML reports.
### Agent Lifecycle
- Agent creation time
- Agent initialization
- State save/load operations
### Command Processing
- Generate commands
- Answer commands
- Analyze commands
- Create commands
- Review commands
### Registry Operations
- Agent registration
- Capability-based discovery
- Load balancing
### Memory Operations
- Context enrichment
- State persistence
- Knowledge graph queries
### Batch & Concurrent Operations
- Batch command processing (1, 5, 10, 20, 50 commands)
- Concurrent execution (1, 2, 4, 8 threads)
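The concurrent-execution benchmark fans a fixed batch of commands across 1, 2, 4, or 8 worker threads. A minimal stdlib-only sketch of that pattern, with a hypothetical `process_command` workload standing in for real agent command processing:

```rust
use std::thread;
use std::time::Instant;

// Hypothetical stand-in for a command: burn a little CPU deterministically.
fn process_command(n: u64) -> u64 {
    (0..n).fold(0, |acc, x| acc.wrapping_add(x * x))
}

// Run `total` commands split evenly across `threads` worker threads and
// return (elapsed_ms, checksum) so the work cannot be optimized away.
fn run_concurrent(total: u64, threads: usize) -> (u128, u64) {
    let start = Instant::now();
    let per_thread = total / threads as u64;
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                (0..per_thread).map(|_| process_command(10_000)).sum::<u64>()
            })
        })
        .collect();
    let checksum = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (start.elapsed().as_millis(), checksum)
}

fn main() {
    // Mirror the benchmark's thread counts: 1, 2, 4, 8.
    for threads in [1, 2, 4, 8] {
        let (ms, _) = run_concurrent(64, threads);
        println!("{threads} threads: {ms} ms");
    }
}
```

Comparing the elapsed times across thread counts shows how well the workload scales; flat or worsening times at higher counts usually point to contention.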
### Knowledge Graph Operations
- RoleGraph queries
- Node matching
- Path connectivity checks
### Automata Operations
- Autocomplete functionality
- Pattern matching
- Text processing
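To make the autocomplete measurement concrete, here is a simplified stand-in (not the actual automata implementation, which is far more compact): prefix lookup over a sorted term list via binary search plus a forward scan, which is the observable behavior the benchmark times.

```rust
// Minimal prefix-autocomplete sketch over a sorted term list.
fn autocomplete<'a>(terms: &'a [&'a str], prefix: &str) -> Vec<&'a str> {
    // Binary search for the first term >= prefix, then scan while it matches.
    let start = terms.partition_point(|t| *t < prefix);
    terms[start..]
        .iter()
        .take_while(|t| t.starts_with(prefix))
        .copied()
        .collect()
}

fn main() {
    let terms = ["graph", "graphql", "grep", "knowledge", "knowledge graph"];
    println!("{:?}", autocomplete(&terms, "graph"));
}
```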
### LLM Operations
- Simple generation
- Request processing
### Tracking Operations
- Token usage tracking
- Cost calculation
- Budget monitoring
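The tracking benchmarks exercise logic of roughly this shape. The sketch below is hypothetical (the per-1K-token rates and field names are illustrative, not Terraphim's actual pricing tables), but it shows the three tracked concerns together: token usage, cost calculation, and budget monitoring.

```rust
// Hypothetical token-cost tracker; rates are illustrative only.
struct CostTracker {
    prompt_rate_per_1k: f64,
    completion_rate_per_1k: f64,
    budget: f64,
    spent: f64,
}

impl CostTracker {
    // Record one request's token usage and return its cost.
    fn record(&mut self, prompt_tokens: u64, completion_tokens: u64) -> f64 {
        let cost = prompt_tokens as f64 / 1000.0 * self.prompt_rate_per_1k
            + completion_tokens as f64 / 1000.0 * self.completion_rate_per_1k;
        self.spent += cost;
        cost
    }

    fn over_budget(&self) -> bool {
        self.spent > self.budget
    }
}

fn main() {
    let mut tracker = CostTracker {
        prompt_rate_per_1k: 0.01,
        completion_rate_per_1k: 0.03,
        budget: 1.0,
        spent: 0.0,
    };
    // 2000 * 0.01/1k + 1000 * 0.03/1k = 0.05
    let cost = tracker.record(2000, 1000);
    println!("request cost: {cost}, over budget: {}", tracker.over_budget());
}
```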
## File: `agent-performance.benchmark.js`
Uses Vitest for JavaScript performance testing.
### WebSocket Connection Performance
- Connection establishment (10 concurrent connections)
- Message processing throughput (50 messages/connection)
### Workflow Performance
- Workflow start latency (5 concurrent workflows)
- Concurrent workflow execution
### Command Processing Performance
- Different command types (generate, analyze, answer, create, review)
- End-to-end processing time
### Throughput Performance
- 10-second load test
- Operations per second measurement
- Latency under load
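The throughput test boils down to "run operations for a fixed window, count completions, divide by elapsed time". A stdlib sketch of that measurement loop (the workload and the short window here are stand-ins; the JS suite uses a 10-second window):

```rust
use std::time::{Duration, Instant};

// Run `op` repeatedly for `window` and report (operations, ops/sec).
fn measure_throughput<F: FnMut()>(mut op: F, window: Duration) -> (u64, f64) {
    let start = Instant::now();
    let mut ops = 0u64;
    while start.elapsed() < window {
        op();
        ops += 1;
    }
    (ops, ops as f64 / start.elapsed().as_secs_f64())
}

fn main() {
    // Stand-in workload; black_box keeps it from being optimized away.
    let (ops, ops_per_sec) = measure_throughput(
        || {
            std::hint::black_box((0..100).sum::<u64>());
        },
        Duration::from_millis(200),
    );
    println!("{ops} ops, {ops_per_sec:.0} ops/sec");
}
```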
### Memory and Resource Performance
- Memory operation efficiency (20 operations)
- Batch operations (configurable batch size)
### Error Handling Performance
- Malformed message handling
- Connection resilience
- Error recovery time
```js
const THRESHOLDS = {
  webSocketConnection: { avg: 500, p95: 1000 }, // ms
  messageProcessing: { avg: 100, p95: 200 }, // ms
  workflowStart: { avg: 2000, p95: 5000 }, // ms
  commandProcessing: { avg: 3000, p95: 10000 }, // ms
  memoryOperations: { avg: 50, p95: 100 }, // ms
  contextEnrichment: { avg: 500, p95: 1000 }, // ms
  batchOperations: { avg: 5000, p95: 15000 }, // ms
};
```

```sh
# Run all benchmarks with comprehensive reporting
./scripts/run-benchmarks.sh

# Run only Rust benchmarks
./scripts/run-benchmarks.sh --rust-only

# Run only JavaScript benchmarks
./scripts/run-benchmarks.sh --js-only
```

```sh
# Navigate to multi-agent crate
cd crates/terraphim_multi_agent

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench agent_creation

# Generate HTML reports
cargo bench -- --output-format html
```

```sh
# Navigate to desktop directory
cd desktop

# Run benchmarks
yarn benchmark

# Watch mode for development
yarn benchmark:watch

# UI mode with visualization
yarn benchmark:ui

# Run with specific configuration
npx vitest --config vitest.benchmark.config.ts
```

- Performance Report: `benchmark-results/[timestamp]/performance_report.md`
- Rust Results: `benchmark-results/[timestamp]/rust_benchmarks.txt`
- JavaScript Results: `benchmark-results/[timestamp]/js_benchmarks.json`
- Criterion HTML: `benchmark-results/[timestamp]/rust_criterion_reports/`
Each benchmark run generates:
- Executive Summary: Key performance metrics overview
- Detailed Results: Per-operation timing and statistics
- Threshold Analysis: Pass/fail status against performance targets
- Raw Data: Complete benchmark output for further analysis
- Recommendations: Performance optimization suggestions
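The threshold-analysis step in the report is a straightforward pass/fail comparison of measured statistics against the configured targets. A sketch (struct names are illustrative; the sample values mirror the `webSocketConnection` entry in `THRESHOLDS`):

```rust
// Sketch of threshold analysis: compare measured stats to avg/p95 targets.
struct Threshold {
    avg_ms: f64,
    p95_ms: f64,
}

struct Measured {
    avg_ms: f64,
    p95_ms: f64,
}

fn passes(m: &Measured, t: &Threshold) -> bool {
    m.avg_ms <= t.avg_ms && m.p95_ms <= t.p95_ms
}

fn main() {
    let t = Threshold { avg_ms: 500.0, p95_ms: 1000.0 }; // webSocketConnection
    let m = Measured { avg_ms: 324.5, p95_ms: 359.8 };
    let verdict = if passes(&m, &t) { "PASS" } else { "FAIL" };
    println!("webSocketConnection: {verdict}");
}
```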
```
Agent Creation time:   [45.2 ms 47.1 ms 49.3 ms]
                change: [-2.1% +0.8% +3.9%] (p = 0.18 > 0.05)
```
- Time Range: Lower bound, estimate, upper bound
- Change: Performance change from previous run
- P-value: Statistical significance
```json
{
  "name": "WebSocket Connection",
  "count": 10,
  "avg": 324.5,
  "min": 298.1,
  "max": 367.2,
  "p95": 359.8,
  "p99": 365.1
}
```
- Count: Number of samples
- Avg: Average execution time (ms)
- Min/Max: Fastest/slowest execution
- P95/P99: 95th/99th percentile times
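These summary fields can be computed from raw sample timings as follows. This sketch uses the nearest-rank percentile convention; other definitions interpolate between samples, so the suite's exact numbers may differ slightly.

```rust
// Nearest-rank percentile over an ascending-sorted slice of timings (ms).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

// Compute (avg, min, max, p95, p99) for a set of samples.
fn summarize(samples: &mut Vec<f64>) -> (f64, f64, f64, f64, f64) {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let avg = samples.iter().sum::<f64>() / samples.len() as f64;
    (
        avg,
        samples[0],
        *samples.last().unwrap(),
        percentile(samples, 95.0),
        percentile(samples, 99.0),
    )
}

fn main() {
    let mut samples: Vec<f64> = (1..=100).map(|i| i as f64).collect();
    let (avg, min, max, p95, p99) = summarize(&mut samples);
    println!("avg={avg} min={min} max={max} p95={p95} p99={p99}");
}
```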
- High Latency Operations: Look for operations exceeding thresholds
- Memory Pressure: Monitor memory operations for excessive allocation
- Concurrency Issues: Compare single-threaded vs multi-threaded performance
- Network Bottlenecks: Analyze WebSocket throughput patterns
- Agent Pooling: Reuse initialized agents
- Connection Pooling: Efficient database/LLM connections
- Async Optimization: Reduce unnecessary context switches
- Memory Management: Optimize allocation patterns
- Message Batching: Group related operations
- Connection Management: Reuse WebSocket connections
- Error Recovery: Fast error handling without reconnection
- Resource Cleanup: Proper cleanup to prevent memory leaks
The benchmark suite can be integrated into CI/CD pipelines:
```yaml
name: Performance Benchmarks
on: [pull_request]
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Rust
        uses: actions-rs/toolchain@v1
      - name: Setup Node.js
        uses: actions/setup-node@v3
      - name: Run Benchmarks
        run: ./scripts/run-benchmarks.sh
      - name: Upload Results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: benchmark-results/
```

- Threshold Monitoring: Automated alerts when thresholds are exceeded
- Trend Analysis: Track performance changes over time
- Comparative Analysis: Compare performance across versions
Located in `crates/terraphim_multi_agent/Cargo.toml`:

```toml
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "agent_operations"
harness = false
```

Located in `desktop/vitest.benchmark.config.ts`:

```ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['tests/benchmarks/**/*.benchmark.{js,ts}'],
    testTimeout: 120000, // 2 minutes per test
    reporters: ['verbose', 'json'],
    pool: 'forks',
    poolOptions: {
      forks: { singleFork: true }
    }
  }
});
```

- Server Not Starting: Ensure no other processes are using benchmark ports
- Timeout Errors: Increase timeout values for slower systems
- Memory Issues: Reduce batch sizes or concurrent operations
- WebSocket Failures: Check firewall settings and port availability
```sh
# Enable debug logging
RUST_LOG=debug ./scripts/run-benchmarks.sh

# Verbose JavaScript output
yarn benchmark -- --reporter=verbose

# Single test execution
yarn benchmark -- --run tests/benchmarks/specific-test.benchmark.js
```

Rust benchmarks:
- Add the benchmark function to `agent_operations.rs`
- Include it in the `criterion_group!` macro
- Document expected performance characteristics

JavaScript benchmarks:
- Add the test to `agent-performance.benchmark.js`
- Include performance thresholds
- Add proper error handling and cleanup
- Consistent Environment: Run benchmarks on consistent hardware
- Warm-up Runs: Include warm-up iterations for JIT optimization
- Statistical Significance: Ensure sufficient sample sizes
- Isolation: Avoid interference from other processes
- Documentation: Document expected performance ranges
Target:
- Agent Creation: < 100ms average
- Command Processing: < 5s p95
- WebSocket Latency: < 200ms p95
- Memory Operations: < 100ms p95
- Throughput: > 100 operations/second

Acceptable:
- Agent Creation: < 200ms average
- Command Processing: < 10s p95
- WebSocket Latency: < 500ms p95
- Memory Operations: < 200ms p95
- Throughput: > 50 operations/second
These benchmarks ensure TerraphimAgent maintains high performance across all operation categories while providing detailed insights for optimization efforts.