Production-ready guidelines for deploying NeuroCache.
- Production Deployment
- Security
- Performance Optimization
- Monitoring & Observability
- Cache Strategy
- Error Handling
- Testing
- Common Pitfalls
Before deploying to production:
- Use RedisStore (not MemoryStore)
- Set appropriate TTL for your use case
- Configure environment variables (no hardcoded secrets)
- Disable logging (use metrics instead)
- Set up monitoring (metrics, alerts)
- Implement error handling (provider/store failures)
- Test concurrent requests behavior
- Configure Redis persistence (RDB/AOF)
- Set up Redis backups (if critical data)
- Review security policy (SECURITY.md)
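Several checklist items (environment variables, no hardcoded secrets) can be enforced at startup so a misconfigured deploy fails immediately rather than on the first request. A minimal sketch; `requireEnv` and `loadConfig` are hypothetical helpers, not part of NeuroCache:

```typescript
// Fail fast at startup when a required environment variable is missing.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`${name} environment variable is required`);
  }
  return value;
}

// Resolve all configuration up front, before constructing the cache.
function loadConfig() {
  return {
    openaiApiKey: requireEnv('OPENAI_API_KEY'),
    redisHost: requireEnv('REDIS_HOST'),
    redisPort: parseInt(process.env.REDIS_PORT || '6379', 10),
    cacheTtl: parseInt(process.env.CACHE_TTL || '3600', 10)
  };
}
```

Call `loadConfig()` once at boot and pass the result into the provider and store constructors shown below.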
✅ Recommended:
import { NeuroCache, OpenAIProvider, RedisStore } from 'neurocache';
const cache = new NeuroCache({
provider: new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY!,
timeout: 30000, // 30s timeout
maxRetries: 2 // Retry twice
}),
store: new RedisStore({
host: process.env.REDIS_HOST!,
port: parseInt(process.env.REDIS_PORT!),
password: process.env.REDIS_PASSWORD,
keyPrefix: `${process.env.APP_NAME}:cache:`,
tls: process.env.REDIS_TLS === 'true' ? {} : undefined
}),
ttl: parseInt(process.env.CACHE_TTL || '3600'),
logging: false, // Disable in production
enableContextIntelligence: true
});
❌ Avoid:
// ❌ Hardcoded secrets
const provider = new OpenAIProvider({
apiKey: 'sk-...' // Security risk!
});
// ❌ MemoryStore in production
const store = new MemoryStore(); // Lost on restart, not shared
// ❌ Logging enabled in production
const cache = new NeuroCache({
logging: true // Performance overhead, log spam
});
// ❌ No TTL or very long TTL
const cache = new NeuroCache({
ttl: 0 // Cache forever (stale data risk)
});
Create .env file:
# Required
OPENAI_API_KEY=sk-...
REDIS_HOST=your-redis-host
REDIS_PORT=6379
# Optional
REDIS_PASSWORD=your-password
REDIS_TLS=true
REDIS_DB=0
APP_NAME=myapp
CACHE_TTL=3600
NODE_ENV=production
Load with dotenv:
import 'dotenv/config';
const cache = new NeuroCache({
provider: new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY!
}),
store: new RedisStore({
host: process.env.REDIS_HOST!,
port: parseInt(process.env.REDIS_PORT || '6379'),
password: process.env.REDIS_PASSWORD
})
});
Never commit .env to version control!
Add to .gitignore:
.env
.env.*
!.env.example
Create .env.example (safe to commit):
OPENAI_API_KEY=sk-your-key-here
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
✅ DO:
// ✅ Environment variables
const apiKey = process.env.OPENAI_API_KEY!;
// ✅ Secret management (AWS Secrets Manager, Vault)
const apiKey = await getSecret('openai-api-key');
// ✅ Validate key exists
if (!process.env.OPENAI_API_KEY) {
throw new Error('OPENAI_API_KEY environment variable is required');
}
❌ DON'T:
// ❌ Hardcoded keys
const apiKey = 'sk-...';
// ❌ Logged in code
console.log('API Key:', apiKey);
// ❌ Exposed in client-side code
const provider = new OpenAIProvider({ apiKey }); // In browser JS
// ❌ Committed to Git
✅ Secure Redis:
const store = new RedisStore({
host: process.env.REDIS_HOST!,
port: 6379,
password: process.env.REDIS_PASSWORD, // ← Required in production
username: process.env.REDIS_USERNAME, // ← ACL support
tls: { // ← Encrypt connection
rejectUnauthorized: true
}
});
Redis Security Checklist:
- Enable authentication (requirepass)
- Use TLS/SSL for connections
- Bind to private network only (not 0.0.0.0)
- Use Redis ACLs (limit permissions)
- Regular security updates
- Firewall rules (restrict access)
Sanitize user input before caching:
function isValidRequest(request: GenerateRequest): boolean {
// Check message count
if (!request.messages || request.messages.length === 0) {
return false;
}
// Check message size
const totalLength = request.messages.reduce(
(sum, msg) => sum + msg.content.length,
0
);
if (totalLength > 100000) { // 100KB limit
return false;
}
// Check for malicious content (containsMaliciousContent is an application-defined helper)
for (const msg of request.messages) {
if (containsMaliciousContent(msg.content)) {
return false;
}
}
return true;
}
// Usage
if (!isValidRequest(request)) {
throw new Error('Invalid request');
}
const response = await cache.generate(request);
Protect against abuse:
import rateLimit from 'express-rate-limit';
const limiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // 100 requests per window
message: 'Too many requests, please try again later.'
});
app.post('/api/chat', limiter, async (req, res) => {
try {
const response = await cache.generate(req.body);
res.json(response);
} catch (error) {
res.status(500).json({ error: 'Internal server error' });
}
});
Development:
const store = new MemoryStore(1000); // Fast, ephemeral
Production (single server):
const store = new MemoryStore(10000); // Fast, limited to one instance
Production (multi-server):
const store = new RedisStore({...}); // Shared, scalable
Too short → Low hit rate, increased costs
Too long → Stale data, wasted storage
Recommendation by use case:
| Use Case | TTL | Reason |
|---|---|---|
| Static FAQ | 86400 (24h) | Rarely changes |
| Product info | 3600 (1h) | Occasional updates |
| Weather data | 300 (5m) | Frequently changes |
| Real-time data | 60 (1m) or disable | Constantly changing |
| Code generation | 604800 (7d) | Deterministic output |
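One way to apply the table above is to keep the TTLs in a single lookup so they are easy to review and change together. The category names here are assumptions for illustration, not a NeuroCache API:

```typescript
// TTL per content category, in seconds (values taken from the table above).
const TTL_SECONDS: Record<string, number> = {
  'static-faq': 86400,   // 24h: rarely changes
  'product-info': 3600,  // 1h: occasional updates
  'weather': 300,        // 5m: frequently changes
  'realtime': 60,        // 1m, or bypass the cache entirely
  'codegen': 604800      // 7d: deterministic output
};

// Resolve a TTL for a category, falling back to a 1-hour default.
function ttlFor(category: string): number {
  return TTL_SECONDS[category] ?? 3600;
}
```

You could then construct separate NeuroCache instances (or pass per-request TTLs, if a future API supports them) from this one source of truth.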
Use connection pooling for high traffic:
import { createClient } from 'redis';
const redisClient = createClient({
socket: {
host: process.env.REDIS_HOST,
port: parseInt(process.env.REDIS_PORT!)
},
password: process.env.REDIS_PASSWORD
});
await redisClient.connect();
// Share client across requests
const store = new RedisStore(redisClient as any);
Optimize cache hit rate:
// ✅ Enabled (default) - Better hit rate
const cache = new NeuroCache({
enableContextIntelligence: true // Normalizes whitespace, removes duplicates
});
// Results in more cache hits:
// "What is 2+2?" === "What is 2+2?" === " What is 2+2? "
When to disable:
// Exact matching needed (e.g., code generation)
const cache = new NeuroCache({
enableContextIntelligence: false
});
Target: 40-60% hit rate
Below 40% → Review TTL and context intelligence settings
Above 80% → Great! But verify data freshness
setInterval(() => {
const hitRate = cache.getCacheHitRate();
if (hitRate < 0.4) {
logger.warn('Low cache hit rate', { hitRate });
}
console.log(`Cache hit rate: ${(hitRate * 100).toFixed(1)}%`);
}, 60000); // Every minute
Track key metrics:
import express from 'express';
const app = express();
app.get('/metrics', (req, res) => {
const metrics = cache.getMetrics();
res.json({
totalRequests: metrics.totalRequests,
cacheHits: metrics.cacheHits,
cacheMisses: metrics.cacheMisses,
hitRate: (metrics.cacheHits / metrics.totalRequests * 100).toFixed(1) + '%',
tokensSaved: metrics.tokensSaved,
costSaved: metrics.estimatedCostSaved.toFixed(4),
avgLatencySaved: metrics.averageLatencySaved.toFixed(0) + 'ms',
errors: {
provider: metrics.providerErrors,
store: metrics.storeErrors
}
});
});
app.listen(3000);
Health check endpoint:
app.get('/health', async (req, res) => {
try {
// Test cache operation (note: this performs a real provider call on a cache miss)
await cache.generate({
model: 'gpt-3.5-turbo',
messages: [{ role: 'user', content: 'ping' }]
});
const metrics = cache.getMetrics();
const hasErrors = metrics.providerErrors > 0 || metrics.storeErrors > 0;
res.status(hasErrors ? 503 : 200).json({
status: hasErrors ? 'degraded' : 'healthy',
cache: {
hitRate: cache.getCacheHitRate(),
errors: {
provider: metrics.providerErrors,
store: metrics.storeErrors
}
}
});
} catch (error) {
res.status(503).json({
status: 'unhealthy',
error: error instanceof Error ? error.message : 'Unknown error'
});
}
});
See API Reference - Custom MetricsAdapter for full example.
Structured logging:
import winston from 'winston';
const logger = winston.createLogger({
level: 'info',
format: winston.format.json(),
transports: [
new winston.transports.File({ filename: 'error.log', level: 'error' }),
new winston.transports.File({ filename: 'combined.log' })
]
});
// Log metrics periodically
setInterval(() => {
const metrics = cache.getMetrics();
logger.info('NeuroCache metrics', {
hitRate: cache.getCacheHitRate(),
totalRequests: metrics.totalRequests,
tokensSaved: metrics.tokensSaved,
costSaved: metrics.estimatedCostSaved
});
}, 60000);
✅ Good candidates:
- Identical requests (FAQ, common queries)
- Expensive computations
- High-latency API calls
- Deterministic outputs (temperature=0)
- Static content
❌ Poor candidates:
- Real-time data (stock prices, weather)
- User-specific content (without proper key namespacing)
- Non-deterministic outputs (high temperature)
- Sensitive data (PII, secrets)
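The two lists above can be turned into a small gating predicate so the decision lives in one place. The fields on `CacheDecision` are hypothetical application-level hints, not NeuroCache options; a sketch:

```typescript
// Application-level hints about a request, used to decide whether to cache.
interface CacheDecision {
  temperature?: number;   // high temperature => non-deterministic output
  realtime?: boolean;     // stock prices, weather, etc.
  containsPii?: boolean;  // sensitive data should never be cached
}

// Returns true when the request is a good caching candidate.
function shouldCache(req: CacheDecision): boolean {
  if (req.realtime) return false;                 // real-time data goes stale
  if (req.containsPii) return false;              // never cache sensitive data
  if ((req.temperature ?? 0) > 0.7) return false; // output too variable to reuse
  return true;
}
```

When `shouldCache` returns false, call the provider directly instead of going through the cache.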
Option 1: TTL-based (Recommended)
const cache = new NeuroCache({
ttl: 3600 // Auto-expire after 1 hour
});
Option 2: Manual invalidation
// Clear specific entry (requires custom implementation)
// NeuroCache doesn't expose per-key deletion yet
// Clear all cache
await cache.clearCache();
Option 3: Event-based invalidation
// When data changes, clear cache
eventEmitter.on('product-updated', async () => {
await cache.clearCache(); // Or implement selective clearing
});
NeuroCache automatically creates keys from request parameters. To namespace by user:
// Include user ID in system message (not recommended for privacy)
const response = await cache.generate({
model: 'gpt-3.5-turbo',
messages: [
{ role: 'system', content: `User ID: ${userId}` }, // Makes cache user-specific
{ role: 'user', content: userQuestion }
]
});
// Better: Use separate cache instance per user (if needed)
const userCaches = new Map<string, NeuroCache>();
function getCacheForUser(userId: string): NeuroCache {
if (!userCaches.has(userId)) {
userCaches.set(userId, new NeuroCache({
provider,
store: new RedisStore({
...redisConfig,
keyPrefix: `user:${userId}:cache:` // User-specific namespace
})
}));
}
return userCaches.get(userId)!;
}
Always handle provider errors:
try {
const response = await cache.generate(request);
return response;
} catch (error) {
logger.error('Cache generation failed', { error, request });
// Choose one recovery strategy:
// Option 1: Surface the error to the caller
throw new Error('Failed to generate response');
// Option 2: Retry with exponential backoff
// return await retryWithBackoff(() => cache.generate(request));
// Option 3: Fall back to a default response
// return { content: 'Sorry, service temporarily unavailable.' };
}
Retry with exponential backoff:
async function retryWithBackoff<T>(
fn: () => Promise<T>,
maxRetries = 3,
baseDelay = 1000
): Promise<T> {
for (let i = 0; i < maxRetries; i++) {
try {
return await fn();
} catch (error) {
if (i === maxRetries - 1) throw error;
const delay = baseDelay * Math.pow(2, i); // Exponential backoff
await new Promise(resolve => setTimeout(resolve, delay));
}
}
throw new Error('Max retries exceeded');
}
// Usage
const response = await retryWithBackoff(() =>
cache.generate(request)
);
Store errors are non-fatal:
// If store.get() fails → Cache miss (calls provider)
// If store.set() fails → Response still returned (just not cached)
// Metrics will show store errors:
const metrics = cache.getMetrics();
if (metrics.storeErrors > 0) {
logger.warn('Store errors detected', { count: metrics.storeErrors });
// Alert ops team, check Redis health
}
import { NeuroCache, MemoryStore } from 'neurocache';
describe('NeuroCache', () => {
let cache: NeuroCache;
beforeEach(() => {
cache = new NeuroCache({
provider: new MockProvider(), // Test double implementing the provider interface
store: new MemoryStore(),
ttl: 60
});
});
it('should cache identical requests', async () => {
const request = {
model: 'gpt-3.5-turbo',
messages: [{ role: 'user', content: 'test' }]
};
const response1 = await cache.generate(request);
const response2 = await cache.generate(request);
expect(response1).toEqual(response2);
expect(cache.getCacheHitRate()).toBe(0.5); // 1 hit, 1 miss
});
});
describe('NeuroCache Integration', () => {
it('should work with real OpenAI API', async () => {
const cache = new NeuroCache({
provider: new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY!
}),
store: new MemoryStore()
});
const response = await cache.generate({
model: 'gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Say "test"' }]
});
expect(response.content).toContain('test');
});
});
❌ Bad:
const response = await cache.generate(request); // Unhandled rejection
✅ Good:
try {
const response = await cache.generate(request);
} catch (error) {
logger.error('Generation failed', { error });
throw error;
}
❌ Bad:
const cache = new NeuroCache({
store: new MemoryStore() // Lost on restart!
});
✅ Good:
const cache = new NeuroCache({
store: new RedisStore({...}) // Persistent, shared
});
❌ Bad:
const provider = new OpenAIProvider({
apiKey: 'sk-...' // Security risk!
});
✅ Good:
const provider = new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY!
});
❌ Bad:
// No monitoring → No visibility into performance
✅ Good:
setInterval(() => {
console.log(cache.getMetricsSummary());
}, 60000);
❌ Bad:
const cache = new NeuroCache({
ttl: 86400 * 365 // 1 year → Stale data risk
});
✅ Good:
const cache = new NeuroCache({
ttl: 3600 // 1 hour → Balance freshness and performance
});
❌ Bad:
const store = new RedisStore({...});
// Never disconnect → Connection leak
✅ Good:
const store = new RedisStore({...});
process.on('SIGTERM', async () => {
await store.disconnect();
process.exit(0);
});
❌ Bad:
// Deploying without testing cache hit/miss behavior
✅ Good:
// Test with real requests
const request = {...};
const t1 = Date.now();
await cache.generate(request);
console.log('First request:', Date.now() - t1, 'ms'); // ~2000ms
const t2 = Date.now();
await cache.generate(request);
console.log('Second request:', Date.now() - t2, 'ms'); // ~5ms
console.log('Hit rate:', cache.getCacheHitRate()); // 0.5
// ✅ Complete production setup
import 'dotenv/config';
import { NeuroCache, OpenAIProvider, RedisStore } from 'neurocache';
import winston from 'winston';
const logger = winston.createLogger({...});
const cache = new NeuroCache({
provider: new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY!,
timeout: 30000,
maxRetries: 2
}),
store: new RedisStore({
host: process.env.REDIS_HOST!,
port: parseInt(process.env.REDIS_PORT!),
password: process.env.REDIS_PASSWORD,
keyPrefix: `${process.env.APP_NAME}:cache:`,
tls: process.env.REDIS_TLS === 'true' ? {} : undefined
}),
ttl: parseInt(process.env.CACHE_TTL || '3600'),
logging: false,
enableContextIntelligence: true
});
// Metrics monitoring
setInterval(() => {
const summary = cache.getMetricsSummary();
logger.info('NeuroCache metrics', { summary });
}, 60000);
// Graceful shutdown
process.on('SIGTERM', async () => {
logger.info('Shutting down...');
// Avoid cache.clearCache() here: it empties the shared cache on every deploy
process.exit(0);
});
// Health check
app.get('/health', (req, res) => {
const metrics = cache.getMetrics();
res.json({
status: 'ok',
hitRate: cache.getCacheHitRate(),
errors: metrics.providerErrors + metrics.storeErrors
});
});
// Error handling
app.post('/api/generate', async (req, res) => {
try {
const response = await cache.generate(req.body);
res.json(response);
} catch (error) {
logger.error('Generation failed', { error, request: req.body });
res.status(500).json({ error: 'Internal server error' });
}
});
Ready for production! See Use Cases for real-world examples.