A proof-of-concept agentic RAG application over the CoMSES Computational Model Library, built on Temporal.io.
A 5-minute setup and demo video on YouTube: https://www.youtube.com/watch?v=sfjV-Id7-vg

A POC that lets researchers ask natural-language questions across computational model data — metadata, documentation, and source code (see sample_data/) — and get answers with paragraph-level citations back to the source material.
The agent itself is a Temporal workflow (`AgentWorkflow`) whose tools can be either Temporal activities (fast, mostly side-effect-free) or Temporal child workflows (multi-step, durable, with their own progress events).
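To make that tool split concrete, here is a minimal sketch in the Temporal Python SDK; the tool names (`resolve_relevant_models`, `HybridSearchWorkflow`) are illustrative, not the project's actual identifiers:

```python
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class AgentWorkflow:
    @workflow.run
    async def run(self, question: str) -> str:
        # Fast, mostly side-effect-free tool: a Temporal activity.
        models = await workflow.execute_activity(
            "resolve_relevant_models",
            question,
            start_to_close_timeout=timedelta(seconds=30),
        )
        # Multi-step, durable tool: a child workflow that can emit
        # its own progress events while it runs.
        return await workflow.execute_child_workflow(
            "HybridSearchWorkflow",
            args=[question, models],
        )
```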
- Not production-ready. No auth hardening, no rate-limiting at the public edge, etc.
- Not a search box or a chatbot wrapper around a single model — it decomposes queries, resolves relevant models (with optional human-in-the-loop), and runs hybrid (dense + sparse) vector search before generating cited answers.
Per-module `intent.md` files document the why behind each major decision:
- `intent.md` — system-level rationale: agentic RAG over CoMSES, layered code structure, Temporal, worker split, event sourcing, LiteLLM proxy
- `src/modules/agent/intent.md` — the conversation runtime: `AgentWorkflow`, three tool types, transactional outbox, context propagation
- `src/modules/ingestion/intent.md` — write side: marker-pdf, synthetic Q&A enrichment, hybrid embeddings (dense + BM42 sparse), tree-sitter for code
- `src/modules/retrieval/intent.md` — read side: intent analysis, query decomposition, model relevance + HITL, hybrid RRF search (sketched below), source attribution, near-real-time progress
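A minimal sketch of the Reciprocal Rank Fusion step referenced above, assuming ranked hit lists of chunk IDs; the project's actual fusion (e.g. a server-side hybrid query in Qdrant) may differ:

```python
from collections import defaultdict

def rrf_fuse(dense_hits: list[str], sparse_hits: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in (dense_hits, sparse_hits):
        for rank, chunk_id in enumerate(ranked, start=1):
            # A hit near the top of either list contributes a larger share.
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```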
Software
- Linux, WSL2, or macOS (macOS not tested)
- Docker + Docker Compose (for the infrastructure stack)
`./setup.sh` will install missing dependencies automatically
Hardware
- 16 GB RAM minimum; 24 GB+ recommended (PyTorch + marker-pdf + embedding models share host memory)
- ~10 GB disk for ML model weights and Docker images
- GPU (optional): an NVIDIA GPU with CUDA speeds up PDF parsing and embedding computation
Verified on a Windows 11 laptop under WSL2 with 32 GB RAM and an NVIDIA RTX 2000 Ada Generation GPU (8 GB VRAM)
LLM access
- An API key from at least one provider — OpenAI, Anthropic, OpenRouter, Groq, Google — or a local Ollama instance reachable at `OLLAMA_HOST`. `setup.sh` probes the keys you supply and auto-picks the first live profile.
- Embeddings (dense + sparse BM42) run locally via FastEmbed by default — no separate API needed. A GPU is highly recommended for embedding computation (see the sketch after this list).
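Below is a minimal sketch of local hybrid embedding with FastEmbed. The model names are common FastEmbed choices and are assumptions here, not necessarily the exact models this project configures:

```python
from fastembed import TextEmbedding, SparseTextEmbedding

dense_model = TextEmbedding("BAAI/bge-small-en-v1.5")
sparse_model = SparseTextEmbedding("Qdrant/bm42-all-minilm-l6-v2-attentions")

docs = ["An agent-based model of ant foraging behavior."]
dense_vecs = list(dense_model.embed(docs))    # one numpy vector per doc
sparse_vecs = list(sparse_model.embed(docs))  # token indices + weights per doc
print(dense_vecs[0].shape, len(sparse_vecs[0].indices))
```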
```
./setup.sh
```

The script bootstraps everything in phases: toolchain install, `.env` generation with auto-generated secrets, Docker stack startup (Postgres, Qdrant, Redis, MinIO, Temporal, LiteLLM), database migrations, model warming, and sample-data ingestion. It prompts for an LLM API key, LLM/embeddings configuration, and worker startup. When it finishes you'll have a UI at http://localhost:5173 and a sample dataset to query.
Run ./setup.sh --help for individual phase verbs (re-run a phase, recreate, etc.).
Each phase is idempotent (sentinel-gated) and resumable — a re-run picks up at the first incomplete or invalidated phase.
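A sketch of the sentinel gating, written in Python for illustration; `setup.sh` itself is a shell script and the file layout here is hypothetical:

```python
from pathlib import Path
from typing import Callable

SENTINELS = Path(".setup-sentinels")  # hypothetical marker directory

def run_phase(name: str, phase_fn: Callable[[], None]) -> None:
    sentinel = SENTINELS / name
    if sentinel.exists():
        print(f"[skip] {name}: already complete")
        return
    phase_fn()                 # raises on failure, so no sentinel is written
    SENTINELS.mkdir(exist_ok=True)
    sentinel.touch()           # phase complete; future runs skip it
```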
| # | Phase | What it does |
|---|---|---|
| 1 | `toolchain` | Detects required CLIs (node, uv, pnpm, zellij, jq, shellcheck, docker) and installs anything missing via the official installers. |
| 2 | `uv_sync` | Runs `uv sync --group pdf` (and `--group gpu` when an NVIDIA GPU is detected). First run downloads ~2 GB (PyTorch + marker-pdf), plus ~600 MB of cuDNN/cuBLAS wheels on GPU hosts. |
| 3 | `hardware_preflight` | Warn-only RAM / swap / CPU / GPU posture check. Suggests `.env` overrides for low-memory hosts (e.g. `INGEST_WORKER_MAX_CONCURRENT_ACTIVITIES=2`); never hard-fails. |
| 4 | `env_bootstrap` | Creates `.env` from `.env.example` (or appends new keys to an existing one) and generates per-deployment secrets (`LITELLM_MASTER_KEY`, `MINIO_ROOT_PASSWORD`, `QDRANT_API_KEY`, DB passwords, UI passwords). |
| 5 | `app_hostnames` | Prompts for the public host the browser will use (default localhost; FQDN/IP for remote VMs; see "Deploying CoMSES AgentSpace on a remote VM"). Coherently writes `CORS_ALLOWED_ORIGINS`, `MINIO_EXTERNAL_ENDPOINT`, `VITE_API_BASE_URL`, `VITE_WS_BASE_URL`, `VITE_HOST`, and `VITE_ALLOWED_HOSTS`. RFC-1123-validates the input. |
| 6 | `env_triage` | Detects and refuses to start when a sibling Temporal stack is already running on the same ports (7233 / 8080 / 9090 / 8085 / 16686). |
| 7 | `provider_keys` | Probes every supported LLM provider (OpenAI, Anthropic, Groq, OpenRouter, xAI, Google, GPUStack), prompts for a key when none are alive, and asks whether embeddings should run remote (LiteLLM) or local (FastEmbed in-process). |
| 8 | `marker_prewarm` | Pre-downloads marker-pdf layout / OCR / text-recognition models (~1.5 GB) into `~/.cache/huggingface/` so the first PDF ingest doesn't stall. |
| 9 | `fastembed_prewarm` | Pre-downloads the dense + sparse (BM42) embedding models locally. Dense is skipped when `EMBEDDING_DENSE_PROVIDER=remote`; sparse is always local. |
| 10 | `docker_up` | Brings up the Temporal stack, then the infra stack, via `docker compose up -d`, then health-checks Postgres, Temporal, Redis, MinIO, Qdrant, and the LiteLLM proxy in order. |
| 11 | `litellm_key` | Calls `POST /key/generate` against the running LiteLLM proxy to mint a virtual API key and writes it to `LITELLM_PROXY_API_KEY` in `.env` (see the sketch after this table). |
| 12 | `litellm_routing_probe` | Per-role smoke calls (smart / default / fast / long / embed) against the proxy. Hard-fails if no chat role responds 2xx or if embed returns no vector. |
| 13 | `migrations` | Runs `make db-check` then `make db-upgrade` to bring the comses-rag-db schema to the latest Alembic head. |
| 14 | `hosts_file` | Validates that the Docker DNS names workers connect to (minio, redis, qdrant, ollama, litellm-proxy, litellm-db, comses-rag-db) resolve from the host. If any are missing, offers [a]uto sudo / [m]anual / [s]kip to append `127.0.0.1 …` entries to `/etc/hosts`. |
| 15 | `workers` | Prompts you to start the 10-pane Zellij worker layout in a second terminal (`make w`) and polls each worker's metrics port (10090–10098) until ready. |
| 16 | `sample_data` | Stages and ingests two bundled CoMSES codebases through the full pipeline (marker-pdf → fastembed → Qdrant + Postgres + MinIO). |
| 17 | `dashboard` | Prints the final dashboard: service URLs + credentials, Temporal CLI hint, Zellij attach command, sample-data summary, and a "Try it" pointer at the configured host. |
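The `litellm_key` phase (step 11), for example, amounts to one authenticated call to the proxy's `/key/generate` endpoint. A Python equivalent, with an assumed key alias:

```python
import os
import httpx

resp = httpx.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": f"Bearer {os.environ['LITELLM_MASTER_KEY']}"},
    json={"key_alias": "comses-rag"},  # alias is an illustrative choice
)
resp.raise_for_status()
virtual_key = resp.json()["key"]  # written to LITELLM_PROXY_API_KEY in .env
print(virtual_key)
```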
Services

| Service | URL | Credentials |
|---|---|---|
| Chat UI | http://localhost:5173 | API key dev-key-1 (from API_KEY_MAPPING in .env) |
| FastAPI | http://localhost:8000 | — |
| Temporal UI | http://localhost:8080 | — |
| Grafana | http://localhost:8085 | admin / $GRAFANA_ADMIN_PASSWORD |
| LiteLLM UI | http://localhost:4000/ui | admin / $LITELLM_PROXY_UI_PASSWORD |
| Jaeger | http://localhost:16686 | — |
| Prometheus | http://localhost:9090 | — |
| Qdrant dashboard | http://localhost:6333/dashboard | $QDRANT_API_KEY |
| MinIO Console | http://localhost:9001 | minio_admin / $MINIO_ROOT_PASSWORD |
| pgAdmin | http://localhost:8888 | $PGADMIN_DEFAULT_EMAIL / $PGADMIN_DEFAULT_PASSWORD |
| Databasus | http://localhost:4005 | — |
`$VAR` references are auto-generated values written into `.env` by the `env_bootstrap` phase — `setup.sh` also prints them once on completion. Look them up in `.env`, not here.
Temporal CLI

```
docker exec -it temporal-admin-tools temporal workflow list
```

Workers (Zellij)

```
zellij attach comses-workers
```

Sample data
Two actual models from the CoMSES Model Library are ingested on the first run of setup.sh:
- `761c91b8-897b-4e59-8b5f-83715d6c9471` — MicroAnts 2.5
- `dd847e79-bb37-43e1-ae3a-27de57573376` — Ants Digging Networks
Try it
Open http://localhost:5173, log in with API key dev-key-1, and ask a multi-part question — e.g. "What ant-foraging models are in the library, and how do they differ?"
See deployment/README.md for the full recipe — SSH-tunnel mode (recommended for solo dev) and HTTPS-via-Caddy mode (for sharing a public demo URL).
```
make d                 # start infrastructure (Postgres, Qdrant, Redis, MinIO, Temporal, LiteLLM)
make w                 # start all 10 Temporal workers (Zellij layout) + the chat app (backend + frontend)
make k                 # stop infra
make kw                # kill all workers + chat app
make test              # unit tests (fast, mocked)
make test-integration  # integration tests (PMR containers)
make check             # ruff + mypy + deptry + qlty
```

Module-specific development notes live in the per-module READMEs: backend/, frontend/, shared/, shared/worker_base/.
Contributions are welcome.
- Temporal — the durable workflow engine that serves as the execution backbone of the ingestion workflows, the agent runtime, every retrieval tool, and the event-streaming outbox
- marker-pdf — layout-aware PDF parsing for academic model documentation
- Zellij — terminal multiplexer that hosts the 10-pane worker layout via `make w`
This project is released under the MIT License.
⚠️ Caveat — GPL-3.0 dependency. The PDF ingestion pipeline depends on marker-pdf (and its sub-dependency surya-ocr), both of which are licensed under GPL-3.0-or-later. While this project's own source code is MIT-licensed, anyone distributing or running the combined application with marker-pdf linked in is bound by GPL-3.0 obligations for that combined work.