Best Practices

Production-ready guidelines for deploying NeuroCache.

Production Deployment
Security
Performance Optimization
Monitoring & Observability
Cache Strategy
Error Handling
Testing
Common Pitfalls

Production Deployment

✅ Checklist

Before deploying to production:

Production Configuration

✅ Recommended:

import { NeuroCache, OpenAIProvider, RedisStore } from 'neurocache';

const cache = new NeuroCache({
  provider: new OpenAIProvider({
    apiKey: process.env.OPENAI_API_KEY!,
    timeout: 30000,        // 30s timeout
    maxRetries: 2          // Retry twice
  }),
  store: new RedisStore({
    host: process.env.REDIS_HOST!,
    port: parseInt(process.env.REDIS_PORT!),
    password: process.env.REDIS_PASSWORD,
    keyPrefix: `${process.env.APP_NAME}:cache:`,
    tls: process.env.REDIS_TLS === 'true' ? {} : undefined
  }),
  ttl: parseInt(process.env.CACHE_TTL || '3600'),
  logging: false,          // Disable in production
  enableContextIntelligence: true
});

❌ Avoid:

// ❌ Hardcoded secrets
const provider = new OpenAIProvider({
  apiKey: 'sk-...'  // Security risk!
});

// ❌ MemoryStore in production
const store = new MemoryStore();  // Lost on restart, not shared

// ❌ Logging enabled in production
const cache = new NeuroCache({
  logging: true  // Performance overhead, log spam
});

// ❌ No TTL or very long TTL
const cache = new NeuroCache({
  ttl: 0  // Cache forever (stale data risk)
});

Environment Variables

Create a .env file:

# Required
OPENAI_API_KEY=sk-...
REDIS_HOST=your-redis-host
REDIS_PORT=6379

# Optional
REDIS_PASSWORD=your-password
REDIS_TLS=true
REDIS_DB=0
APP_NAME=myapp
CACHE_TTL=3600
NODE_ENV=production

Load with dotenv:

import 'dotenv/config';

const cache = new NeuroCache({
  provider: new OpenAIProvider({
    apiKey: process.env.OPENAI_API_KEY!
  }),
  store: new RedisStore({
    host: process.env.REDIS_HOST!,
    port: parseInt(process.env.REDIS_PORT || '6379'),
    password: process.env.REDIS_PASSWORD
  })
});

Never commit .env to version control!

Add to .gitignore:

.env
.env.*
!.env.example

Create a .env.example file (safe to commit):

OPENAI_API_KEY=sk-your-key-here
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=

Security

API Key Protection

✅ DO:

// ✅ Environment variables
const apiKey = process.env.OPENAI_API_KEY!;

// ✅ Secret management (AWS Secrets Manager, Vault)
const apiKey = await getSecret('openai-api-key');

// ✅ Validate key exists
if (!process.env.OPENAI_API_KEY) {
  throw new Error('OPENAI_API_KEY environment variable is required');
}

❌ DON'T:

// ❌ Hardcoded keys
const apiKey = 'sk-...';

// ❌ Logged in code
console.log('API Key:', apiKey);

// ❌ Exposed in client-side code
const provider = new OpenAIProvider({ apiKey });  // In browser JS

// ❌ Committed to Git

Redis Authentication

✅ Secure Redis:

const store = new RedisStore({
  host: process.env.REDIS_HOST!,
  port: 6379,
  password: process.env.REDIS_PASSWORD,  // ← Required in production
  username: process.env.REDIS_USERNAME,  // ← ACL support
  tls: {                                 // ← Encrypt connection
    rejectUnauthorized: true
  }
});

Redis Security Checklist:

Enable authentication (requirepass)
Use TLS/SSL for connections
Bind to private network only (not 0.0.0.0)
Use Redis ACLs (limit permissions)
Regular security updates
Firewall rules (restrict access)

Input Validation

Validate user input before caching:

function isValidRequest(request: GenerateRequest): boolean {
  // Check message count
  if (!request.messages || request.messages.length === 0) {
    return false;
  }
  
  // Check message size
  const totalLength = request.messages.reduce(
    (sum, msg) => sum + msg.content.length,
    0
  );
  if (totalLength > 100000) {  // ~100K characters (string length, not bytes)
    return false;
  }
  
  // Check for malicious content
  for (const msg of request.messages) {
    if (containsMaliciousContent(msg.content)) {
      return false;
    }
  }
  
  return true;
}

// Usage
if (!isValidRequest(request)) {
  throw new Error('Invalid request');
}

const response = await cache.generate(request);
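
`containsMaliciousContent` is not defined in the example above, and what counts as malicious depends on the application. The sketch below is one hypothetical implementation that rejects control characters and matches a small deny-list. The patterns are assumptions chosen for demonstration, not a vetted filter.

```typescript
// Hypothetical implementation of the containsMaliciousContent helper used above.
// The deny-list is illustrative only; tune it for your application.
const SUSPICIOUS_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /<script\b/i,
];

function containsMaliciousContent(content: string): boolean {
  // Reject ASCII control characters other than tab, newline, and carriage return.
  for (let i = 0; i < content.length; i++) {
    const code = content.charCodeAt(i);
    const isAllowedWhitespace = code === 9 || code === 10 || code === 13;
    if ((code < 32 && !isAllowedWhitespace) || code === 127) {
      return true;
    }
  }
  // Reject content matching any deny-list pattern.
  return SUSPICIOUS_PATTERNS.some((pattern) => pattern.test(content));
}
```

A deny-list like this catches only the patterns it names, so it is best treated as one layer alongside the size limit and rate limiting rather than a complete defence.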

Rate Limiting

Protect against abuse:

import rateLimit from 'express-rate-limit';

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 100,                   // 100 requests per window
  message: 'Too many requests, please try again later.'
});

app.post('/api/chat', limiter, async (req, res) => {
  try {
    const response = await cache.generate(req.body);
    res.json(response);
  } catch (error) {
    res.status(500).json({ error: 'Internal server error' });
  }
});

Performance Optimization

1. Choose the Right Store

Development:

const store = new MemoryStore(1000);  // Fast, ephemeral

Production (single server):

const store = new MemoryStore(10000);  // Fast, limited to one instance

Production (multi-server):

const store = new RedisStore({...});  // Shared, scalable

2. Optimize TTL

Too short → Low hit rate, increased costs
Too long → Stale data, wasted storage

Recommendation by use case:

Use Case	TTL	Reason
Static FAQ	86400 (24h)	Rarely changes
Product info	3600 (1h)	Occasional updates
Weather data	300 (5m)	Frequently changes
Real-time data	60 (1m) or disable	Constantly changing
Code generation	604800 (7d)	Deterministic output
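
The table can be captured as a lookup so that TTLs are set in one place. The `UseCase` names and the `ttlFor` helper below are hypothetical; the values are the ones from the table.

```typescript
// Hypothetical use-case names; TTL values (in seconds) are taken from the table above.
type UseCase = 'static-faq' | 'product-info' | 'weather' | 'real-time' | 'code-generation';

const TTL_SECONDS: Record<UseCase, number> = {
  'static-faq': 86400,       // 24 hours
  'product-info': 3600,      // 1 hour
  'weather': 300,            // 5 minutes
  'real-time': 60,           // 1 minute (or skip caching entirely)
  'code-generation': 604800, // 7 days
};

function ttlFor(useCase: UseCase): number {
  return TTL_SECONDS[useCase];
}
```

The result could then be passed as the `ttl` option, for example `ttl: ttlFor('product-info')`.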

3. Connection Pooling (Redis)

Reuse a single Redis client across requests under high traffic, rather than opening a connection per request:

import { createClient } from 'redis';

const redisClient = createClient({
  socket: {
    host: process.env.REDIS_HOST,
    port: parseInt(process.env.REDIS_PORT!)
  },
  password: process.env.REDIS_PASSWORD
});

await redisClient.connect();

// Share client across requests
const store = new RedisStore(redisClient as any);

4. Context Intelligence

Optimize cache hit rate:

// ✅ Enabled (default) - Better hit rate
const cache = new NeuroCache({
  enableContextIntelligence: true  // Normalizes whitespace, removes duplicates
});

// Results in more cache hits:
// "What is 2+2?" === "What is  2+2?" === "  What is 2+2?  "

When to disable:

// Exact matching needed (e.g., code generation)
const cache = new NeuroCache({
  enableContextIntelligence: false
});
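
To make the equivalence of the three prompts above concrete, here is a rough sketch of whitespace normalization. It only illustrates the idea: `normalizePrompt` is a hypothetical function, and NeuroCache's actual normalization may differ.

```typescript
// Hypothetical illustration of whitespace normalization before key generation.
function normalizePrompt(content: string): string {
  // Trim both ends, then collapse every run of whitespace into a single space.
  return content.trim().replace(/\s+/g, ' ');
}

const variants = ['What is 2+2?', 'What is  2+2?', '  What is 2+2?  '];
const normalized = variants.map(normalizePrompt);
// All three entries are now the same string: "What is 2+2?"
```

The same transformation turns `'if x:\n    y'` into `'if x: y'`, which shows why exact matching matters when whitespace carries meaning, as it does in code.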

5. Monitor Hit Rate

Target: 40-60% hit rate

Below 40% → Review TTL, context intelligence settings
Above 80% → Great! But verify data freshness

setInterval(() => {
  const hitRate = cache.getCacheHitRate();
  
  if (hitRate < 0.4) {
    logger.warn('Low cache hit rate', { hitRate });
  }
  
  console.log(`Cache hit rate: ${(hitRate * 100).toFixed(1)}%`);
}, 60000);  // Every minute
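
A hit rate computed over the whole process lifetime moves slowly once many requests have been counted. Assuming the `cacheHits` and `totalRequests` counters returned by `cache.getMetrics()` only ever increase, a per-interval rate can be derived from two snapshots. `HitCounters` and `windowedHitRate` are hypothetical names.

```typescript
// Hypothetical helper: hit rate for the interval between two metric snapshots.
interface HitCounters {
  cacheHits: number;
  totalRequests: number;
}

function windowedHitRate(previous: HitCounters, current: HitCounters): number | null {
  const requests = current.totalRequests - previous.totalRequests;
  const hits = current.cacheHits - previous.cacheHits;
  // No traffic in the interval (or the counters were reset): no meaningful rate.
  if (requests <= 0) {
    return null;
  }
  return hits / requests;
}
```

Inside the `setInterval` callback, the previous snapshot would be kept in a variable and replaced after each comparison.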

Monitoring & Observability

1. Metrics Dashboard

Track key metrics:

import express from 'express';

const app = express();

app.get('/metrics', (req, res) => {
  const metrics = cache.getMetrics();
  
  res.json({
    totalRequests: metrics.totalRequests,
    cacheHits: metrics.cacheHits,
    cacheMisses: metrics.cacheMisses,
    hitRate: (metrics.totalRequests > 0 ? metrics.cacheHits / metrics.totalRequests * 100 : 0).toFixed(1) + '%',  // Avoid NaN before the first request
    tokensSaved: metrics.tokensSaved,
    costSaved: metrics.estimatedCostSaved.toFixed(4),
    avgLatencySaved: metrics.averageLatencySaved.toFixed(0) + 'ms',
    errors: {
      provider: metrics.providerErrors,
      store: metrics.storeErrors
    }
  });
});

app.listen(3000);

2. Health Checks

app.get('/health', async (req, res) => {
  try {
    // Test cache operation end to end
    await cache.generate({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: 'ping' }]
    });
    
    const metrics = cache.getMetrics();
    const hasErrors = metrics.providerErrors > 0 || metrics.storeErrors > 0;
    
    res.status(hasErrors ? 503 : 200).json({
      status: hasErrors ? 'degraded' : 'healthy',
      cache: {
        hitRate: cache.getCacheHitRate(),
        errors: {
          provider: metrics.providerErrors,
          store: metrics.storeErrors
        }
      }
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error instanceof Error ? error.message : 'Unknown error'
    });
  }
});

3. Prometheus Integration

See API Reference - Custom MetricsAdapter for a full example.

4. Logging

Structured logging:

import winston from 'winston';

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' })
  ]
});

// Log metrics periodically
setInterval(() => {
  const metrics = cache.getMetrics();
  logger.info('NeuroCache metrics', {
    hitRate: cache.getCacheHitRate(),
    totalRequests: metrics.totalRequests,
    tokensSaved: metrics.tokensSaved,
    costSaved: metrics.estimatedCostSaved
  });
}, 60000);

Cache Strategy

When to Cache

✅ Good candidates:

Identical requests (FAQ, common queries)
Expensive computations
High-latency API calls
Deterministic outputs (temperature=0)
Static content

❌ Poor candidates:

Real-time data (stock prices, weather)
User-specific content (without proper key namespacing)
Non-deterministic outputs (high temperature)
Sensitive data (PII, secrets)

Cache Invalidation

Option 1: TTL-based (Recommended)

const cache = new NeuroCache({
  ttl: 3600  // Auto-expire after 1 hour
});

Option 2: Manual invalidation

// Clear specific entry (requires custom implementation)
// NeuroCache doesn't expose per-key deletion yet

// Clear all cache
await cache.clearCache();

Option 3: Event-based invalidation

// When data changes, clear cache
eventEmitter.on('product-updated', async () => {
  await cache.clearCache();  // Or implement selective clearing
});

Cache Key Design

NeuroCache automatically creates keys from request parameters. To namespace by user:

// Include user ID in system message (not recommended for privacy)
const response = await cache.generate({
  model: 'gpt-3.5-turbo',
  messages: [
    { role: 'system', content: `User ID: ${userId}` },  // Makes cache user-specific
    { role: 'user', content: userQuestion }
  ]
});

// Better: Use separate cache instance per user (if needed)
const userCaches = new Map<string, NeuroCache>();

function getCacheForUser(userId: string): NeuroCache {
  if (!userCaches.has(userId)) {
    userCaches.set(userId, new NeuroCache({
      provider,
      store: new RedisStore({
        ...redisConfig,
        keyPrefix: `user:${userId}:cache:`  // User-specific namespace
      })
    }));
  }
  return userCaches.get(userId)!;
}
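
One caveat with the `Map` above is that it gains an entry for every user and never shrinks. A minimal sketch of a size-capped map with least-recently-used eviction follows. `LruMap` is a hypothetical helper; it relies on the fact that a JavaScript `Map` iterates in insertion order.

```typescript
// Hypothetical helper: a Map capped at maxSize entries with LRU eviction.
class LruMap<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) {
      return undefined;
    }
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Swapping `new Map<string, NeuroCache>()` for `new LruMap<string, NeuroCache>(1000)` would bound the number of live instances; the limit of 1000 is an arbitrary example value.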

Error Handling

Graceful Degradation

Always handle provider errors:

try {
  const response = await cache.generate(request);
  return response;
} catch (error) {
  logger.error('Cache generation failed', { error, request });
  
  // Pick ONE of the following strategies (the alternatives are commented out):

  // Option 1: Return error to user
  throw new Error('Failed to generate response');

  // Option 2: Retry with exponential backoff
  // return await retryWithBackoff(() => cache.generate(request));

  // Option 3: Fallback to default response
  // return { content: 'Sorry, service temporarily unavailable.' };
}

Retry Logic

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelay = 1000
): Promise<T> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      
      const delay = baseDelay * Math.pow(2, i);  // Exponential backoff
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw new Error('Max retries exceeded');
}

// Usage
const response = await retryWithBackoff(() =>
  cache.generate(request)
);
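
When many clients retry on the same schedule, their retries can arrive together. A common refinement is to cap the delay and add random jitter. The sketch below computes such a delay. `backoffDelay` and its parameters are hypothetical, and it could stand in for the `delay` calculation inside `retryWithBackoff`.

```typescript
// Hypothetical helper: capped exponential backoff with "full jitter".
function backoffDelay(
  attempt: number,                      // 0 for the first retry, 1 for the second, ...
  baseDelay = 1000,                     // milliseconds
  maxDelay = 30000,                     // upper bound in milliseconds
  random: () => number = Math.random    // injectable for deterministic tests
): number {
  // Exponential growth, capped at maxDelay.
  const exponential = Math.min(maxDelay, baseDelay * Math.pow(2, attempt));
  // Full jitter: pick a delay uniformly in [0, exponential).
  return Math.floor(random() * exponential);
}
```

Passing the random source as a parameter keeps the function easy to test with a fixed value.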

Store Error Handling

Store errors are non-fatal:

// If store.get() fails → Cache miss (calls provider)
// If store.set() fails → Response still returned (just not cached)

// Metrics will show store errors:
const metrics = cache.getMetrics();
if (metrics.storeErrors > 0) {
  logger.warn('Store errors detected', { count: metrics.storeErrors });
  // Alert ops team, check Redis health
}
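
The behaviour described in the comments above can be pictured with the following sketch. It illustrates the pattern only and is not NeuroCache's source: `SimpleStore`, `getOrGenerate`, and the `onStoreError` callback are hypothetical.

```typescript
// Hypothetical illustration of "store errors are non-fatal".
interface SimpleStore<V> {
  get(key: string): Promise<V | undefined>;
  set(key: string, value: V): Promise<void>;
}

async function getOrGenerate<V>(
  store: SimpleStore<V>,
  key: string,
  generate: () => Promise<V>,
  onStoreError: (error: unknown) => void = () => {}
): Promise<V> {
  try {
    const cached = await store.get(key);
    if (cached !== undefined) {
      return cached; // cache hit
    }
  } catch (error) {
    onStoreError(error); // a failed read is treated as a cache miss
  }

  // Errors from the generator (the provider) are not caught here and propagate.
  const fresh = await generate();

  try {
    await store.set(key, fresh);
  } catch (error) {
    onStoreError(error); // a failed write: the response is returned uncached
  }
  return fresh;
}
```

Only the two store calls are wrapped, so a Redis outage degrades to uncached behaviour while provider failures still reach the caller's error handling.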

Testing

Unit Tests

import { NeuroCache, MemoryStore } from 'neurocache';
// MockProvider is assumed to be a test double defined in your own test suite

describe('NeuroCache', () => {
  let cache: NeuroCache;
  
  beforeEach(() => {
    cache = new NeuroCache({
      provider: new MockProvider(),
      store: new MemoryStore(),
      ttl: 60
    });
  });
  
  it('should cache identical requests', async () => {
    const request = {
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: 'test' }]
    };
    
    const response1 = await cache.generate(request);
    const response2 = await cache.generate(request);
    
    expect(response1).toEqual(response2);
    expect(cache.getCacheHitRate()).toBe(0.5);  // 1 hit, 1 miss
  });
});
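
The unit test above uses a `MockProvider` without showing it. A possible shape is sketched below, under the loud assumption that a provider is an object with a single async `generate(request)` method returning an object with a `content` field. The interfaces here are hypothetical stand-ins; check the API Reference for the real provider interface before using this.

```typescript
// Hypothetical types standing in for the library's request and response shapes.
interface ChatMessage {
  role: string;
  content: string;
}
interface GenerateRequest {
  model: string;
  messages: ChatMessage[];
}
interface GenerateResponse {
  content: string;
}

// Hypothetical test double: returns a canned reply and counts provider calls.
class MockProvider {
  calls = 0;

  async generate(request: GenerateRequest): Promise<GenerateResponse> {
    this.calls++;
    const last = request.messages[request.messages.length - 1];
    return { content: `mock reply to: ${last ? last.content : ''}` };
  }
}
```

Keeping a reference to the mock and asserting that `calls` is 1 after two identical requests would check directly that the second request was served from the cache.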

Integration Tests

describe('NeuroCache Integration', () => {
  it('should work with real OpenAI API', async () => {
    const cache = new NeuroCache({
      provider: new OpenAIProvider({
        apiKey: process.env.OPENAI_API_KEY!
      }),
      store: new MemoryStore()
    });
    
    const response = await cache.generate({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: 'Say "test"' }]
    });
    
    expect(response.content).toContain('test');
  });
});

Common Pitfalls

1. Forgetting to Handle Errors

❌ Bad:

const response = await cache.generate(request);  // Unhandled rejection

✅ Good:

try {
  const response = await cache.generate(request);
} catch (error) {
  logger.error('Generation failed', { error });
  throw error;
}

2. Using MemoryStore in Production

❌ Bad:

const cache = new NeuroCache({
  store: new MemoryStore()  // Lost on restart!
});

✅ Good:

const cache = new NeuroCache({
  store: new RedisStore({...})  // Persistent, shared
});

3. Hardcoding API Keys

❌ Bad:

const provider = new OpenAIProvider({
  apiKey: 'sk-...'  // Security risk!
});

✅ Good:

const provider = new OpenAIProvider({
  apiKey: process.env.OPENAI_API_KEY!
});

4. Ignoring Metrics

❌ Bad:

// No monitoring → No visibility into performance

✅ Good:

setInterval(() => {
  console.log(cache.getMetricsSummary());
}, 60000);

5. Setting TTL Too High

❌ Bad:

const cache = new NeuroCache({
  ttl: 86400 * 365  // 1 year → Stale data risk
});

✅ Good:

const cache = new NeuroCache({
  ttl: 3600  // 1 hour → Balance freshness and performance
});

6. No Redis Connection Management

❌ Bad:

const store = new RedisStore({...});
// Never disconnect → Connection leak

✅ Good:

const store = new RedisStore({...});

process.on('SIGTERM', async () => {
  await store.disconnect();
  process.exit(0);
});

7. Not Testing Cache Behavior

❌ Bad:

// Deploying without testing cache hit/miss behavior

✅ Good:

// Test with real requests
const request = {...};

const t1 = Date.now();
await cache.generate(request);
console.log('First request:', Date.now() - t1, 'ms');  // ~2000ms

const t2 = Date.now();
await cache.generate(request);
console.log('Second request:', Date.now() - t2, 'ms');  // ~5ms

console.log('Hit rate:', cache.getCacheHitRate());  // 0.5

Quick Reference

Production Checklist

// ✅ Complete production setup
import 'dotenv/config';
import { NeuroCache, OpenAIProvider, RedisStore } from 'neurocache';
import winston from 'winston';

const logger = winston.createLogger({...});

const cache = new NeuroCache({
  provider: new OpenAIProvider({
    apiKey: process.env.OPENAI_API_KEY!,
    timeout: 30000,
    maxRetries: 2
  }),
  store: new RedisStore({
    host: process.env.REDIS_HOST!,
    port: parseInt(process.env.REDIS_PORT!),
    password: process.env.REDIS_PASSWORD,
    keyPrefix: `${process.env.APP_NAME}:cache:`,
    tls: process.env.REDIS_TLS === 'true' ? {} : undefined
  }),
  ttl: parseInt(process.env.CACHE_TTL || '3600'),
  logging: false,
  enableContextIntelligence: true
});

// Metrics monitoring
setInterval(() => {
  const summary = cache.getMetricsSummary();
  logger.info('NeuroCache metrics', { summary });
}, 60000);

// Graceful shutdown
process.on('SIGTERM', async () => {
  logger.info('Shutting down...');
  await cache.clearCache();  // Optional: removes cached entries; omit to keep a shared cache warm across restarts
  process.exit(0);
});

// Health check (assumes `import express from 'express'` at the top of the file)
const app = express();
app.get('/health', (req, res) => {
  const metrics = cache.getMetrics();
  res.json({
    status: 'ok',
    hitRate: cache.getCacheHitRate(),
    errors: metrics.providerErrors + metrics.storeErrors
  });
});

// Error handling
app.post('/api/generate', async (req, res) => {
  try {
    const response = await cache.generate(req.body);
    res.json(response);
  } catch (error) {
    logger.error('Generation failed', { error, request: req.body });
    res.status(500).json({ error: 'Internal server error' });
  }
});

Ready for production! See Use Cases for real-world examples.
