An open-source, AI-powered research infrastructure for sociology, psychology, environmentalism, and neuroscience.
Built by Optimal Living Systems — a mutual aid nonprofit facilitating projects that help humanity.
A research system that combines a fine-tuned AI model with domain-specific knowledge bases to perform rigorous literature reviews, methodology analysis, evidence synthesis, and research gap identification across four scientific domains.
Architecture: One research-methodology model (Qwen 2.5 14B, QLoRA fine-tuned) + four domain-specific RAG collections in LanceDB, orchestrated by Kestra.
Why one model instead of four? Our domains deeply overlap (neuroscience ↔ psychology, sociology ↔ environmentalism). A single model trained on how to do research paired with domain-specific retrieval produces better cross-domain synthesis, costs 75% less compute, and updates instantly when new papers are added.
| Domain | Focus Areas | OLS Project Alignment |
|---|---|---|
| Psychology | Self-Determination Theory, intrinsic motivation, community psychology, well-being | PersonalLLM, OVNN |
| Sociology | Mutual aid, community organizing, commons governance, participatory democracy | CommunityLLM, DirectDemocracyLLM |
| Environmentalism | Degrowth, ecological economics, planetary boundaries, environmental justice | OVNN ecological dimensions |
| Neuroscience | Social neuroscience, motivation circuits, stress/resilience, decision-making | HBoK biological grounding |
┌─────────────────────────────────────────────┐
│ KESTRA ORCHESTRATOR │
│ (Schedules, triggers, monitors, logs) │
└──────────┬─────────────────────┬────────────┘
│ │
┌─────▼──────┐ ┌─────▼──────┐
│ DATA LAYER │ │MODEL LAYER │
│ │ │ │
│ LanceDB │◄─────►│ Qwen 2.5 │
│ 4 domain │ │ 14B QLoRA │
│ collections│ │ fine-tuned │
│ + cross- │ │ for research│
│ domain idx │ │ methodology│
│ │ │ │
│ BGE-M3 │ │ Reranker │
│ embeddings │ │ │
└─────────────┘ └────────────┘
│
┌─────▼───────────────────────────┐
│ DATA SOURCES │
│ OpenAlex · Semantic Scholar │
│ PubMed/PMC · PsyArXiv │
│ SocArXiv · EarthArXiv │
└────────────────────────────────-┘
The fine-tuned model is trained on research methodology, not domain facts. It learns 8 skills:
- Paper Analysis — Extract hypothesis, methodology, sample, findings, limitations
- Methodology Critique — Identify confounds, sampling issues, statistical concerns
- Evidence Synthesis — Synthesize findings across multiple studies
- Research Gap Identification — Find unstudied areas and missing methodologies
- Statistical Reasoning — Interpret effect sizes, p-values, regression tables
- Structured Data Extraction — Pull consistent structured data from papers
- Tool Use Decisions — Know when to search, retrieve, calculate, or clarify
- Citation-Grounded Response — Never make unsupported claims, always cite sources
Domain knowledge comes from RAG at query time — updated instantly when new papers are added.
| Component | Tool | Why |
|---|---|---|
| Orchestration | Kestra | YAML-first, observable, plugin ecosystem |
| Vector Database | LanceDB | Embedded, open source, Apache 2.0 |
| Embeddings | BGE-M3 | Best quality/speed balance for academic text |
| Base Model | Qwen 2.5 14B Instruct | Native tool calling, 128K context, Apache 2.0 |
| Fine-Tuning | Unsloth + QLoRA | 2-5x faster, lower memory |
| Data Sources | OpenAlex, Semantic Scholar, PubMed/PMC | Free, open, comprehensive |
| GPU (training) | RunPod (rented A100) | Cost-effective on-demand |
| GPU (inference) | Modal (serverless) | Pay-per-second, scales to zero |
- Architecture design and planning
- Kestra deployment
- Data collection pipelines
- RAG infrastructure (LanceDB + BGE-M3)
- Training data generation
- Model fine-tuning
- Agent integration
- End-to-end research workflows
- Documentation and OSF registration
research-system/
├── docs/ # Architecture, guides, methodology
├── kestra-flows/ # All Kestra workflow YAML files
│ ├── collection/ # Data collection from APIs
│ ├── processing/ # Chunking, embedding, indexing
│ ├── training/ # Training data gen + fine-tuning
│ ├── research/ # Literature review, analysis flows
│ └── maintenance/ # Backups, updates, monitoring
├── schemas/ # LanceDB, training data, eval schemas
├── prompts/ # Prompt templates for training data generation
├── evaluation/ # Test sets and evaluation results
├── scripts/ # Setup and utility scripts
└── data/ # Data storage (not in git, see data/README.md)
This project is registered with the Center for Open Science. All methodology is pre-registered, all data pipelines are reproducible, and all results are published openly.
- Code: AGPL-3.0 (ensures derivatives stay open)
- Documentation: CC-BY-SA-4.0
- Training Data: CC-BY-4.0
- Model Weights: Apache 2.0
- Human Body of Knowledge (HBoK) — Personal knowledge architecture
- OVNN — Optimal Value Neural Network
- CommunityLLM — Politically neutral community organizing AI
- DirectDemocracyLLM — Participatory democracy tools
We welcome contributions from researchers, data scientists, librarians, information architects, and anyone passionate about open science. See CONTRIBUTING.md.
- Organization: Optimal Living Systems (501(c)(3) nonprofit)
- Email: research@optimallivingsystems.org
- Mission: Building AI infrastructure that supports human autonomy rather than capital extraction.