ModelMesh middleware lets you intercept requests and responses without modifying library internals. Use middleware for logging, request transforms, response enrichment, caching, or custom error handling. Middleware runs inside the router's request pipeline, after pool selection and before/after provider execution.
```python
import modelmesh
from modelmesh import Middleware, MiddlewareContext
from modelmesh.interfaces.provider import CompletionRequest, CompletionResponse


class LoggingMiddleware(Middleware):
    async def before_request(self, request, context):
        print(f"Routing to {context.model_id}")
        return request

    async def after_response(self, response, context):
        print(f"Got {response.usage.total_tokens} tokens")
        return response


client = modelmesh.create("chat", middleware=[LoggingMiddleware()])
```

```typescript
import { create, Middleware, MiddlewareContext } from '@nistrapa/modelmesh-core';

class LoggingMiddleware extends Middleware {
  async beforeRequest(request, context) {
    console.log(`Routing to ${context.modelId}`);
    return request;
  }

  async afterResponse(response, context) {
    console.log(`Got ${response.usage?.totalTokens} tokens`);
    return response;
  }
}

const client = create('chat', { middleware: [new LoggingMiddleware()] });
```

Middleware hooks are called in a pipeline around each provider call:
```
beforeRequest  (A → B → C)     ← forward order
       │
       ▼
provider.complete()
       │
       ▼
afterResponse  (C → B → A)     ← reverse order (onion model)
```

If the provider throws an error:

```
onError  (A → B → C)           ← forward order, first handler wins
```
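The ordering above can be illustrated with a small self-contained simulation. This is a sketch of the ordering only, not ModelMesh's implementation: `Tracer`, `fake_provider`, and `run_pipeline` are hypothetical stand-ins, and requests and responses are plain strings.

```python
import asyncio


class Tracer:
    """Hypothetical stand-in for a middleware; records when each hook runs."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    async def before_request(self, request, context):
        self.log.append(f"before:{self.name}")
        return request

    async def after_response(self, response, context):
        self.log.append(f"after:{self.name}")
        return response


async def fake_provider(request):
    # Stand-in for provider.complete(); simply echoes the request.
    return f"echo({request})"


async def run_pipeline(middlewares, request, context):
    # before_request hooks run in registration order.
    for mw in middlewares:
        request = await mw.before_request(request, context)
    response = await fake_provider(request)
    # after_response hooks run in reverse order (onion model).
    for mw in reversed(middlewares):
        response = await mw.after_response(response, context)
    return response


log = []
stack = [Tracer(name, log) for name in "ABC"]
result = asyncio.run(run_pipeline(stack, "hi", context={}))
print(log)
# ['before:A', 'before:B', 'before:C', 'after:C', 'after:B', 'after:A']
```

The first middleware registered is therefore the outermost layer: it sees the request first and the response last.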
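Override any combination of these three hooks:

- `before_request(request, context)` (`beforeRequest` in TypeScript): Called before the provider receives the request. Return the (possibly modified) request to proceed.
- `after_response(response, context)` (`afterResponse` in TypeScript): Called after a successful response. Return the (possibly modified) response.
- `on_error(error, context)` (`onError` in TypeScript): Called when the provider raises an error. Either:
  - Return a fallback `CompletionResponse` to suppress the error
  - Raise the error (or a new one) to let the router retry/rotate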
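One plausible reading of "first handler wins" is that error handlers are tried in registration order, a handler that raises passes the error along, and the first handler that returns a response ends the chain. The sketch below simulates that reading with hypothetical stand-ins (`Reraise`, `Fallback`, `run_on_error`) and plain strings as responses; whether ModelMesh continues to later handlers after one raises is an assumption, not something this page confirms.

```python
import asyncio


class Reraise:
    """Hypothetical handler that declines by re-raising the error."""

    async def on_error(self, error, context):
        raise error


class Fallback:
    """Hypothetical handler that suppresses the error with a canned response."""

    def __init__(self, response):
        self.response = response

    async def on_error(self, error, context):
        return self.response


async def run_on_error(handlers, error, context):
    # Assumed semantics: try handlers in registration order; the first one
    # that returns a response wins, and later handlers are not called.
    last_error = error
    for handler in handlers:
        try:
            return await handler.on_error(error, context)
        except Exception as exc:
            last_error = exc
    # No handler produced a response: propagate so the caller can retry/rotate.
    raise last_error


async def main():
    handlers = [Reraise(), Fallback("cached answer"), Fallback("never used")]
    recovered = await run_on_error(handlers, RuntimeError("provider down"), {})
    try:
        await run_on_error([Reraise()], RuntimeError("provider down"), {})
        propagated = None
    except RuntimeError as exc:
        propagated = str(exc)
    return recovered, propagated


recovered, propagated = asyncio.run(main())
print(recovered, "|", propagated)
# cached answer | provider down
```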
Every hook receives a context object with metadata about the current routing decision:
| Field | Type | Description |
|---|---|---|
| `model_id` | `str` | Real model identifier selected |
| `provider_id` | `str` | Connector ID of the provider |
| `pool_name` | `str` | Virtual model / pool name |
| `attempt` | `int` | Current retry attempt (1-based) |
| `timestamp` | `float` | When the request was initiated |
| `metadata` | `dict` | Arbitrary key-value store for chaining |
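For orientation, the fields in the table can be pictured as a dataclass. This is an illustrative stand-in named `ContextSketch`, not the library's `MiddlewareContext` definition; the default values and the epoch-seconds interpretation of `timestamp` are assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ContextSketch:
    """Illustrative mirror of the documented context fields."""

    model_id: str
    provider_id: str
    pool_name: str
    attempt: int = 1  # 1-based retry attempt
    timestamp: float = field(default_factory=time.time)  # assumed: epoch seconds
    metadata: dict[str, Any] = field(default_factory=dict)  # shared between hooks


ctx = ContextSketch(model_id="model-x", provider_id="provider-y", pool_name="chat")
ctx.metadata["request_id"] = "abc123"  # one hook writes...
print(ctx.attempt, ctx.metadata["request_id"])  # ...a later hook reads
# 1 abc123
```

Using `default_factory` for `metadata` gives each context its own dictionary, so values stored during one request do not leak into another.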
Add custom headers or modify parameters:

```python
import uuid

from modelmesh import Middleware


class AddMetadata(Middleware):
    async def before_request(self, request, context):
        # Attach a request ID to the shared context so later hooks can read it
        context.metadata["request_id"] = str(uuid.uuid4())
        return request
```

Add computed fields to responses:
```python
import time

from modelmesh import Middleware


class TimingMiddleware(Middleware):
    async def before_request(self, request, context):
        # A monotonic clock keeps the measurement immune to wall-clock adjustments
        context.metadata["start_time"] = time.perf_counter()
        return request

    async def after_response(self, response, context):
        elapsed = time.perf_counter() - context.metadata["start_time"]
        context.metadata["latency_ms"] = elapsed * 1000
        return response
```

Return a cached or default response on failure:
```python
from modelmesh import Middleware


class FallbackMiddleware(Middleware):
    def __init__(self, default_response):
        self._default = default_response

    async def on_error(self, error, context):
        if context.attempt >= 3:
            return self._default  # Suppress the error from the third attempt onward
        raise error  # Otherwise let the router retry
```

Stack multiple middleware. `before_request` hooks execute in registration order:
```python
client = modelmesh.create("chat", middleware=[
    AuthMiddleware(),     # Runs first (before_request)
    LoggingMiddleware(),  # Runs second (before_request)
    CachingMiddleware(),  # Runs third (before_request)
])
# after_response runs in reverse: Caching → Logging → Auth
```

For advanced use, you can create a `MiddlewareStack` directly:
```python
from modelmesh import MiddlewareStack

stack = MiddlewareStack()
stack.add(LoggingMiddleware())
stack.add(CachingMiddleware())

# Use programmatically (inside an async function)
result = await stack.run_before_request(request, context)
result = await stack.run_after_response(response, context)
result = await stack.run_on_error(error, context)
```

Middleware is entirely optional. Existing code that doesn't use middleware continues to work identically. The `middleware` parameter defaults to `None`, and the router skips middleware hooks when none are configured.
See also: FAQ · Quick Start · System Configuration