chatguru Agent is a production-ready whitelabel chatbot with RAG capabilities and agentic commerce integration, built with FastAPI, LangChain, and Azure OpenAI.
Brought with ❤️ by Netguru
Read the full Docs at: https://github.com/netguru/chatguru
chatguru Agent ships with WebSocket streaming, RAG capabilities, and comprehensive observability!
Key Features:
- Real-time WebSocket streaming for instant responses
- RAG-powered product search and recommendations
- Comprehensive API documentation with Swagger UI
ℹ️ Library supports Python 3.12+
```bash
# Clone the repository
git clone <repository-url>
cd chatguru

# Complete development setup
make setup
```

After installation:

```bash
# Configure environment variables
make env-setup
# Edit .env with your credentials

# Start the development server
make dev
```

Check the live demo at http://localhost:8000/
This is how you can use the WebSocket API in your app:
```python
import asyncio
import json

import websockets


async def chat():
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as websocket:
        # Send a message
        await websocket.send(json.dumps({
            "message": "Hello, how are you?",
            "session_id": None
        }))

        # Receive the streaming response
        async for message in websocket:
            data = json.loads(message)
            if data["type"] == "token":
                print(data["content"], end="", flush=True)
            elif data["type"] == "end":
                print("\n")
                break
            elif data["type"] == "error":
                print(f"Error: {data['content']}")
                break


asyncio.run(chat())
```

- 🚀 WebSocket Streaming: Real-time streaming chat responses via WebSocket
- 🧪 Minimal Test UI: Lightweight HTML page at `/` for smoke testing only
- 🎨 Whitelabel Design: Easily customizable for different brands and tenants
- 🧠 RAG Capabilities: Semantic product search with sqlite-vec vector database
- 🛒 Agentic Commerce: Ready for MCP (Model Context Protocol) integration
- 📊 Observability: Built-in Langfuse tracing and monitoring
- ✅ Testing: Comprehensive test suite with promptfoo LLM evaluation
- 🐳 Production Ready: Docker containerization with health checks
Simple, modular architecture designed for whitelabel deployment:
```mermaid
graph LR
    subgraph "Current Implementation"
        UI[React/Vite Frontend<br/>frontend/] -->|WebSocket| API[FastAPI API]
        API -->|Streaming| AGENT[Agent Service]
        AGENT -->|AzureChatOpenAI| LLM[Azure OpenAI]
        AGENT -->|RAG Tool| PRODUCTDB[Product DB<br/>sqlite-vec]
        AGENT --> LANGFUSE[Langfuse<br/>Tracing]
    end

    subgraph "Future Extensions"
        MCP[MCP Tools<br/>Commerce Platforms]
        AGENT -.-> MCP
    end
```
For detailed architecture documentation, see docs/architecture.md.
- Backend: FastAPI + Uvicorn (async)
- AI/ML: LangChain + Azure OpenAI (direct integration)
- LLM Provider: Azure OpenAI (via langchain-openai)
- Vector Search: sqlite-vec (semantic product search)
- Observability: Langfuse
- Testing: pytest + promptfoo + GenericFakeChatModel
- Code Quality: mypy + ruff + pre-commit
- Frontend: React 19 + Vite (`frontend/`)
- CSS: Tailwind CSS v4 (via `@tailwindcss/vite`)
- Containerization: Docker + Docker Compose
- Package Management: uv (Python) + npm (Node.js)
- Development: Makefile for task automation
A React + Vite frontend lives in the frontend/ directory.
Run it locally:
```bash
make frontend-dev  # Vite dev server → http://localhost:5173
```

Or use Docker Compose: the frontend service starts automatically on port 5173.

Copy the env template before running:

```bash
cp frontend/.env.example frontend/.env
```

Before you begin, ensure you have the following installed:
- Python 3.12+ (Download)
- Node.js 20+ and npm — required by React 19 (Download)
- uv - Fast Python package installer (Installation guide)
- Docker and Docker Compose (optional, for containerized deployment)
- Azure OpenAI account with API access
- Langfuse account (for observability and tracing)
```bash
git clone <repository-url>
cd chatguru

# Install dependencies and set up pre-commit hooks
make setup
```

This command will:
- Install Python dependencies using `uv`
- Install and configure pre-commit hooks
- Set up the development environment
```bash
# Copy environment template
make env-setup

# Edit .env with your credentials
# Required: LLM_* and LANGFUSE_* variables (see Configuration section below)
```

```bash
make dev
```

- Frontend: http://localhost:5173
- Test UI (Minimal): http://localhost:8000/ (for smoke testing only)
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- WebSocket Endpoint: ws://localhost:8000/ws
```bash
git clone <repository-url>
cd chatguru

# Copy and configure environment variables
make env-setup
# Edit .env with your credentials
```

```bash
# Build and start all services
make docker-run

# Or run in background
make docker-run-detached
```

- Frontend: http://localhost:5173
- Test UI (Minimal): http://localhost:8000/ (for smoke testing only)
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- WebSocket Endpoint: ws://localhost:8000/ws
The application uses environment variables for configuration. Copy env.example to .env and configure the following:
| Variable | Description | Example |
|---|---|---|
| `OPENAI_ENDPOINT` | OpenAI-compatible base URL for chat + embeddings | `https://your-resource.openai.azure.com/openai/v1` |
| `LLM_API_KEY` | Azure OpenAI API key | `your-api-key-here` |
| `LLM_DEPLOYMENT_NAME` | Azure OpenAI deployment name | `gpt-4o-mini` |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key | `pk-lf-...` |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key | `sk-lf-...` |
| `LANGFUSE_HOST` | Langfuse host URL | `https://cloud.langfuse.com` |
| Variable | Description | Default |
|---|---|---|
| `FASTAPI_HOST` | API host address | `0.0.0.0` |
| `FASTAPI_PORT` | API port | `8000` |
| `FASTAPI_CORS_ORIGINS` | CORS allowed origins (JSON array) | `["*"]` |
| `APP_NAME` | Application name | `chatguru Agent` |
| `DEBUG` | Enable debug mode | `false` |
| `LOG_LEVEL` | Logging level | `INFO` |
| `VECTOR_DB_TYPE` | Database type | `sqlite` |
| `VECTOR_DB_SQLITE_URL` | SQLite service URL | `http://product-db:8001` |
| `PERSISTENCE_DATABASE_URL` | Async SQLAlchemy URL for chat history storage | (unset; disabled) |
| `LLM_API_VERSION` | API version for native Azure OpenAI setups | (empty) |
| `LLM_OPENAI_BASE_URL` | OpenAI v1-compatible chat base URL; when set, chat uses `ChatOpenAI` instead of native Azure routing | (empty) |
| `TITLE_GENERATION_PROVIDER` | Title provider: `openai`, `fallback`, or `custom` | `openai` |
| `TITLE_GENERATION_CUSTOM_CLASS` | Custom class path (`module.path:ClassName`) when provider is `custom` | (empty) |
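When `TITLE_GENERATION_PROVIDER` is `custom`, the class path uses the `module.path:ClassName` format. As an illustration (this is a sketch of the usual resolution pattern, not the project's actual loader), such a string can be resolved with `importlib`:

```python
import importlib


def load_class(path: str) -> type:
    """Resolve a 'module.path:ClassName' string to a class object."""
    module_name, _, class_name = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrate with a stdlib class; a real value would point at your
# custom title-generation provider class.
cls = load_class("collections:OrderedDict")
print(cls.__name__)  # → OrderedDict
```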
`PERSISTENCE_DATABASE_URL` is the single toggle for server-side chat history:

- Unset (default): persistence is disabled. The server is stateless: no database is required and no messages are stored. The `/history` and `/conversations` endpoints are not registered at all (they don't appear in `/docs`, and requests to them return 404).
- Set: persistence is enabled. Messages and conversations are stored per `visitor_id`/`session_id`. Run `make migrate` once after setting the URL to create the schema.
```bash
# SQLite (local dev / single-node)
PERSISTENCE_DATABASE_URL=sqlite+aiosqlite:///data/chatguru.db

# PostgreSQL
PERSISTENCE_DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/chatguru
```

See docs/persistence.md for the full architecture and instructions on adding new database adapters.
The two LLM URL modes are documented in docs/design-decisions.md: set `LLM_OPENAI_BASE_URL` for a universal OpenAI-compatible API, or leave it empty and use `OPENAI_ENDPOINT` for the native Azure OpenAI client.
See env.example for a complete template with detailed comments.
The primary interface for chat is via WebSocket at ws://localhost:8000/ws.
```json
{
  "message": "Your message here",
  "session_id": "optional-session-id",
  "messages": [
    {"role": "user", "content": "previous user message"},
    {"role": "assistant", "content": "previous assistant response"}
  ]
}
```

Responses are streamed as JSON messages:
```jsonc
// Token chunk (streamed multiple times)
{"type": "token", "content": "chunk of text", "session_id": "session-id"}

// End of stream (includes the full response as a safety net)
{"type": "end", "content": "full assistant response", "session_id": "session-id"}

// Error response
{"type": "error", "content": "error message", "session_id": "session-id"}
```

- Health Check: `GET /health`
- API Documentation: `GET /docs` (Swagger UI)
- OpenAPI Schema: `GET /openapi.json`
The following endpoints are only registered when `PERSISTENCE_DATABASE_URL` is set:

- `GET /history`: returns stored messages for a `visitor_id` + `session_id` pair, oldest first.
  - Query params: `visitor_id` (required), `session_id` (default: `"default"`)
- `GET /conversations`: returns all conversations for a `visitor_id`, newest first.
  - Query params: `visitor_id` (required)
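With persistence enabled, `/history` can be queried from any HTTP client. A standard-library sketch (the base URL is the local default from this README; the response is assumed to be JSON):

```python
import json
import urllib.parse
import urllib.request


def history_url(visitor_id: str, session_id: str = "default",
                base_url: str = "http://localhost:8000") -> str:
    """Build the /history URL with its required query parameters."""
    query = urllib.parse.urlencode(
        {"visitor_id": visitor_id, "session_id": session_id}
    )
    return f"{base_url}/history?{query}"


def fetch_history(visitor_id: str, session_id: str = "default") -> list:
    """GET stored messages, oldest first (requires a running server)."""
    with urllib.request.urlopen(history_url(visitor_id, session_id)) as resp:
        return json.load(resp)


print(history_url("visitor-123"))  # the URL you would GET
```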
Run `make help` to see all available commands. Key commands:

```bash
make setup       # Complete development setup
make env-setup   # Copy environment template
make install     # Install production dependencies

make dev           # Start backend development server (auto-reload)
make frontend-dev  # Start frontend development server (Vite, port 5173)
make run           # Start production server (no auto-reload)

make test            # Run all tests
make coverage        # Run tests with coverage report
make promptfoo-eval  # Run LLM evaluation tests
make promptfoo-view  # View evaluation results

make pre-commit-install  # Install pre-commit hooks
make pre-commit          # Run pre-commit checks manually

make docker-build         # Build Docker images
make docker-run           # Run with Docker Compose (foreground)
make docker-run-detached  # Run with Docker Compose (background)
make docker-stop          # Stop services
make docker-down          # Stop and remove containers
make docker-logs          # View logs
make docker-clean         # Clean all Docker resources

make version  # Show current version
make clean    # Clean Python cache files
```

```text
chatguru/
├── frontend/                   # React + Vite frontend
│   ├── src/                    # Source code (components, hooks, pages)
│   ├── public/                 # Static assets
│   ├── .env.example            # Frontend env template
│   └── package.json
├── src/                        # Main application code
│   ├── api/                    # FastAPI application
│   │   ├── main.py             # FastAPI app setup
│   │   ├── templates/          # Minimal HTML test UI
│   │   └── routes/             # API routes
│   │       └── chat.py         # WebSocket chat endpoint
│   ├── agent/                  # Agent implementation
│   │   ├── service.py          # LangChain agent with streaming
│   │   ├── prompt.py           # System prompts
│   │   └── __init__.py
│   ├── product_db/             # Product database (sqlite-vec)
│   │   ├── api.py              # FastAPI service
│   │   ├── store.py            # ProductStore with embeddings
│   │   ├── sqlite.py           # HTTP client for agent
│   │   ├── base.py             # Abstract interface
│   │   └── factory.py          # Database factory
│   ├── rag/                    # RAG components
│   │   ├── documents.py        # Document handling
│   │   ├── simple_retriever.py # Retriever interface
│   │   └── products.json       # Sample products data
│   ├── config.py               # Configuration management
│   └── main.py                 # Application entry point
├── tests/                      # Test suite
│   ├── test_api.py             # API endpoint tests
│   ├── test_agent.py           # Agent tests
│   └── conftest.py             # Test configuration
├── docs/                       # Documentation
│   └── architecture.md         # Architecture documentation
├── promptfoo/                  # LLM evaluation config
│   ├── provider.py             # Python provider adapter
│   └── promptfooconfig.yaml
├── docker/                     # Docker configuration
│   ├── Dockerfile              # Backend Dockerfile
│   └── Dockerfile.db           # Product database Dockerfile
├── .pre-commit-config.yaml     # Pre-commit hooks
├── docker-compose.yml          # Docker Compose setup
├── Makefile                    # Development commands
├── pyproject.toml              # Python project configuration
├── env.example                 # Environment template
└── README.md                   # This file
```
```bash
# Run all tests
make test

# Run with coverage report
make coverage
```

Tests use `GenericFakeChatModel` from LangChain for reliable, deterministic testing without API calls.
```bash
# Run evaluation suite
make promptfoo-eval

# View results in browser
make promptfoo-view

# Run a specific test file
make promptfoo-test TEST=tests/basic_greeting.yaml
```

Promptfoo tests evaluate response quality, helpfulness, and boundary conditions.
RAGAS (Retrieval-Augmented Generation Assessment) and RAG Evaluator are frameworks for evaluating RAG pipelines, providing metrics for faithfulness, answer relevance, context precision, and retrieval quality.
For detailed information on RAG testing and evaluation using RAGAS and RAG Evaluator, see docs/rag_eval_readme.md.
```bash
# Build and run backend with Docker Compose
make docker-run
```

```bash
# Build backend image
docker build -f docker/Dockerfile -t chatguru-agent .

# Run backend container
docker run -p 8000:8000 --env-file .env chatguru-agent
```

- Frontend: `5173` (host) → `5173` (container)
- Backend API: `8000` (host) → `8000` (container)
- Product DB: `8001` (host) → `8001` (container)
- WebSocket: `ws://localhost:8000/ws`
- Test UI: `http://localhost:8000/` (minimal, not for production)
The frontend service is included in Docker Compose and starts automatically on port 5173.
`WS_PROXY_TARGET` controls where Vite proxies WebSocket traffic inside the Docker network (default: `http://chatguru-agent:8000`).
Solution: Ensure dependencies are installed:

```bash
make install
```

Solution:
- Verify the backend is running: `curl http://localhost:8000/health`
- Check the WebSocket endpoint: `ws://localhost:8000/ws`
- Ensure CORS is configured correctly in `.env`
Solution:
- Verify `OPENAI_ENDPOINT` is a full OpenAI-compatible base URL ending in `/v1`
- Check that `LLM_API_KEY` is correct
- Ensure `LLM_DEPLOYMENT_NAME` matches your Azure deployment
- If using native Azure OpenAI routing, verify `LLM_API_VERSION` is supported
Solution:
- Verify Langfuse credentials in `.env`
- Check that `LANGFUSE_HOST` is correct (default: `https://cloud.langfuse.com`)
- Ensure network connectivity to Langfuse
Solution:
- Ensure the `uv.lock` file exists (run `uv sync` locally first)
- Check that Docker has sufficient resources
- Verify all required files are present
Solution:
- Backend (8000): Stop other services using port 8000 or change `FASTAPI_PORT`
- Frontend: Configure your external frontend to target the correct backend host/port
- Check docs/architecture.md for architecture details
- Review CONTRIBUTING.md for development guidelines
- Open an issue on GitHub for bugs or feature requests
- Architecture Guide - Detailed architecture documentation
- Contributing Guide - How to contribute to the project
- Getting Started Guide - Detailed setup instructions
We welcome contributions! Please see CONTRIBUTING.md for:
- Development setup instructions
- Code style guidelines
- Testing requirements
- Pull request process
- Issue reporting guidelines
- Vector Database Integration: sqlite-vec for semantic search ✅
- Streaming Responses: Real-time chat streaming via WebSocket ✅
- MCP Tools: Integration with commerce platforms (PimCore, Strapi, Medusa.js)
- Authentication: JWT-based API authentication
- Rate Limiting: API rate limiting and quotas
- Session Management: Client-side persistent conversation history (localStorage) ✅
- Server-side Sessions: Backend-persisted conversation history via `PERSISTENCE_DATABASE_URL` (opt-in) ✅
- Multi-tenancy: Database-backed tenant configuration
This library is available as open source under the terms of the MIT License.
- FastAPI - Modern web framework
- LangChain - LLM application framework
- Langfuse - LLM observability platform
- promptfoo - LLM evaluation framework
For support and questions:
- 📖 Check the documentation
- 🐛 Open an issue for bugs
- 💬 Start a discussion for questions
- 📧 Contact the maintainers