OLS Open Science Research System

An open-source, AI-powered research infrastructure for sociology, psychology, environmentalism, and neuroscience.

Built by Optimal Living Systems — a mutual aid nonprofit facilitating projects that help humanity.

What This Is

A research system that combines a fine-tuned AI model with domain-specific knowledge bases to perform rigorous literature reviews, methodology analysis, evidence synthesis, and research gap identification across four scientific domains.

Architecture: One research-methodology model (Qwen 2.5 14B, QLoRA fine-tuned) + four domain-specific RAG collections in LanceDB, orchestrated by Kestra.

Why one model instead of four? Our domains overlap deeply (neuroscience ↔ psychology, sociology ↔ environmentalism). A single model trained on how to do research, paired with domain-specific retrieval, produces better cross-domain synthesis, costs 75% less compute than four separate models, and picks up new papers as soon as they are indexed, with no retraining.
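The split between one shared model and per-domain retrieval can be illustrated with a minimal sketch. It is not the project's implementation: the in-memory `DomainCollections` class stands in for the four LanceDB collections, the bag-of-words `embed` function stands in for BGE-M3 embeddings, and every name here is hypothetical. It only shows that a query can span several domains at once and that a newly added paper is retrievable immediately.

```python
import math
from collections import Counter

DOMAINS = ("psychology", "sociology", "environmentalism", "neuroscience")


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for real BGE-M3 embeddings."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DomainCollections:
    """One in-memory list per domain; a stand-in for four LanceDB collections."""

    def __init__(self) -> None:
        self.collections = {d: [] for d in DOMAINS}

    def add_paper(self, domain: str, title: str, abstract: str) -> None:
        # Indexing is just an append, so the paper is searchable right away.
        self.collections[domain].append((title, embed(title + " " + abstract)))

    def search(self, query: str, domains=DOMAINS, k: int = 3):
        """Return up to k (score, domain, title) hits across the given domains."""
        q = embed(query)
        hits = [
            (cosine(q, vec), domain, title)
            for domain in domains
            for title, vec in self.collections[domain]
        ]
        hits = [h for h in hits if h[0] > 0]  # drop papers with no overlap
        return sorted(hits, reverse=True)[:k]


store = DomainCollections()
store.add_paper("psychology", "Autonomy and intrinsic motivation",
                "self-determination theory and motivation in communities")
store.add_paper("neuroscience", "Reward circuits",
                "dopamine motivation circuits and stress")
store.add_paper("sociology", "Mutual aid networks",
                "community organizing and commons governance")

# A single query pulls evidence from more than one domain.
results = store.search("motivation", k=2)
for score, domain, title in results:
    print(f"{score:.2f}  {domain:<12} {title}")
```

Restricting `domains` to a single entry gives a domain-specific search, while the default searches all four, which is the cross-domain case described above.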

Research Domains

Domain	Focus Areas	OLS Project Alignment
Psychology	Self-Determination Theory, intrinsic motivation, community psychology, well-being	PersonalLLM, OVNN
Sociology	Mutual aid, community organizing, commons governance, participatory democracy	CommunityLLM, DirectDemocracyLLM
Environmentalism	Degrowth, ecological economics, planetary boundaries, environmental justice	OVNN ecological dimensions
Neuroscience	Social neuroscience, motivation circuits, stress/resilience, decision-making	HBoK biological grounding

System Architecture

┌─────────────────────────────────────────────┐
│            KESTRA ORCHESTRATOR              │
│  (Schedules, triggers, monitors, logs)      │
└──────────┬─────────────────────┬────────────┘
           │                     │
     ┌─────▼──────┐       ┌─────▼──────┐
     │  DATA LAYER │       │MODEL LAYER │
     │             │       │            │
     │  LanceDB    │◄─────►│ Qwen 2.5   │
     │  4 domain   │       │ 14B QLoRA  │
     │  collections│       │ fine-tuned │
     │  + cross-   │       │ for research│
     │  domain idx │       │ methodology│
     │             │       │            │
     │  BGE-M3     │       │ Reranker   │
     │  embeddings │       │            │
     └─────────────┘       └────────────┘
           │
     ┌─────▼───────────────────────────┐
     │         DATA SOURCES            │
     │  OpenAlex · Semantic Scholar    │
     │  PubMed/PMC · PsyArXiv         │
     │  SocArXiv · EarthArXiv         │
     └─────────────────────────────────┘
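The flow through the data and model layers in the diagram can be sketched as three pluggable stages: retrieve, rerank, generate. This is an illustrative sketch only. The `Chunk` type, the `answer_query` function, and the stub stages are all hypothetical; in the real system the stages would be backed by LanceDB, the reranker, and the fine-tuned model, and the flow would be triggered by Kestra.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Chunk:
    source: str  # placeholder identifier for the originating paper
    domain: str
    text: str


def answer_query(
    query: str,
    retrieve: Callable[[str], List[Chunk]],
    rerank: Callable[[str, List[Chunk]], List[Chunk]],
    generate: Callable[[str], str],
    top_k: int = 2,
) -> Tuple[str, List[Chunk]]:
    """Retrieve candidates, keep the top_k after reranking, then prompt the model."""
    candidates = retrieve(query)
    ranked = rerank(query, candidates)[:top_k]
    context = "\n".join(
        f"[{i + 1}] ({c.domain}) {c.text}" for i, c in enumerate(ranked)
    )
    prompt = (
        f"Question: {query}\n\nSources:\n{context}\n\n"
        "Answer using only these sources and cite them like [1]."
    )
    return generate(prompt), ranked


# Stub stages so the sketch runs without any external service.
CORPUS = [
    Chunk("src-a", "psychology", "autonomy support predicts well-being"),
    Chunk("src-b", "neuroscience", "stress alters decision-making circuits"),
    Chunk("src-c", "sociology", "mutual aid builds community resilience"),
]


def stub_retrieve(query: str) -> List[Chunk]:
    return list(CORPUS)  # a real retriever would run a vector search


def stub_rerank(query: str, chunks: List[Chunk]) -> List[Chunk]:
    terms = set(query.lower().split())
    overlap = lambda c: len(terms & set(c.text.lower().split()))
    return sorted(chunks, key=overlap, reverse=True)


def stub_generate(prompt: str) -> str:
    return f"(model output for a {len(prompt)}-character prompt)"


reply, used = answer_query(
    "how does stress affect decision-making",
    stub_retrieve, stub_rerank, stub_generate,
)
print(reply)
print([c.source for c in used])
```

Passing the stages in as functions keeps each layer replaceable, which mirrors the separation between the data layer and the model layer in the diagram.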

What the Model Learns

The fine-tuned model is trained on research methodology, not domain facts. It learns 8 skills:

Paper Analysis — Extract hypothesis, methodology, sample, findings, limitations
Methodology Critique — Identify confounds, sampling issues, statistical concerns
Evidence Synthesis — Synthesize findings across multiple studies
Research Gap Identification — Find unstudied areas and missing methodologies
Statistical Reasoning — Interpret effect sizes, p-values, regression tables
Structured Data Extraction — Pull consistent structured data from papers
Tool Use Decisions — Know when to search, retrieve, calculate, or clarify
Citation-Grounded Response — Never make unsupported claims, always cite sources

Domain knowledge comes from RAG at query time, so it is current as soon as new papers are indexed, with no retraining required.
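Two of the skills listed above, Paper Analysis and Citation-Grounded Response, lend themselves to a small illustration. The sketch below is an assumption about what such outputs might look like, not the project's actual schema: `PaperAnalysis` simply mirrors the five fields named in the Paper Analysis skill, and `uncited_sentences` is a hypothetical checker that flags sentences with no `[n]` citation or with a citation number outside the supplied sources.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class PaperAnalysis:
    """Fields taken from the Paper Analysis skill; the layout is illustrative."""
    hypothesis: str
    methodology: str
    sample: str
    findings: List[str]
    limitations: List[str] = field(default_factory=list)


def uncited_sentences(answer: str, n_sources: int) -> List[str]:
    """Return sentences that lack a valid [n] marker for sources 1..n_sources."""
    problems = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not cited or any(n < 1 or n > n_sources for n in cited):
            problems.append(sentence)
    return problems


analysis = PaperAnalysis(
    hypothesis="Autonomy support increases well-being",
    methodology="Cross-sectional survey",
    sample="Placeholder sample description",
    findings=["Positive association reported"],
)

answer = (
    "Autonomy support predicts well-being [1]. "
    "Stress always reduces motivation. "
    "Effects replicate across cultures [3]."
)
flagged = uncited_sentences(answer, n_sources=2)
print(flagged)
```

With two sources supplied, the second sentence is flagged for having no citation and the third for citing a source that does not exist. A production check would also need to verify that the cited source actually supports the claim, which this sketch does not attempt.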

Tech Stack

Component	Tool	Why
Orchestration	Kestra	YAML-first, observable, plugin ecosystem
Vector Database	LanceDB	Embedded, open source, Apache 2.0
Embeddings	BGE-M3	Best quality/speed balance for academic text
Base Model	Qwen 2.5 14B Instruct	Native tool calling, 128K context, Apache 2.0
Fine-Tuning	Unsloth + QLoRA	Faster training (reported 2-5x), lower memory use
Data Sources	OpenAlex, Semantic Scholar, PubMed/PMC	Free, open, comprehensive
GPU (training)	RunPod (rented A100)	Cost-effective on-demand
GPU (inference)	Modal (serverless)	Pay-per-second, scales to zero
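A rough back-of-the-envelope calculation shows why 4-bit quantization (the "Q" in QLoRA) matters for a 14B model on a single rented GPU. This is an approximation under stated assumptions: it counts weight storage only, uses a nominal 14 billion parameters, and ignores quantization constants, LoRA adapter weights, optimizer state, activations, and the KV cache, all of which add to the real footprint.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 14e9  # nominal "14B"; the exact parameter count differs slightly

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f" 4-bit weights: {int4_gb:.1f} GB")
```

Under these assumptions the weights shrink from about 28 GB to about 7 GB, which leaves most of an A100's memory (40 GB or 80 GB depending on the variant) for adapters, optimizer state, and activations.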

Project Status

Repository Structure

research-system/
├── docs/                  # Architecture, guides, methodology
├── kestra-flows/          # All Kestra workflow YAML files
│   ├── collection/        # Data collection from APIs
│   ├── processing/        # Chunking, embedding, indexing
│   ├── training/          # Training data gen + fine-tuning
│   ├── research/          # Literature review, analysis flows
│   └── maintenance/       # Backups, updates, monitoring
├── schemas/               # LanceDB, training data, eval schemas
├── prompts/               # Prompt templates for training data generation
├── evaluation/            # Test sets and evaluation results
├── scripts/               # Setup and utility scripts
└── data/                  # Data storage (not in git, see data/README.md)
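As an illustration of what a flow under `kestra-flows/collection/` might call, here is a minimal sketch that builds an OpenAlex works query URL without sending it. The helper name `build_openalex_url` and its defaults are hypothetical. The query parameters (`search`, `filter`, `per-page`, `mailto`) and the `from_publication_date` filter follow OpenAlex's public API as we understand it, so check them against the current OpenAlex documentation before relying on them.

```python
from typing import Optional
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_openalex_url(
    search: str,
    from_date: Optional[str] = None,  # ISO date, e.g. "2024-01-01"
    per_page: int = 25,
    mailto: Optional[str] = None,     # optional contact email for the request
) -> str:
    """Build (but do not send) a works search request URL."""
    params = {"search": search, "per-page": per_page}
    if from_date:
        params["filter"] = f"from_publication_date:{from_date}"
    if mailto:
        params["mailto"] = mailto
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


url = build_openalex_url("mutual aid", from_date="2024-01-01")
print(url)
```

Keeping URL construction separate from the HTTP call makes the collection step easy to test offline and easy to wrap in a scheduled Kestra task.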

Open Science Commitment

This project is registered with the Center for Open Science. All methodology is pre-registered, all data pipelines are reproducible, and all results are published openly.

Code: AGPL-3.0 (ensures derivatives stay open)
Documentation: CC-BY-SA-4.0
Training Data: CC-BY-4.0
Model Weights: Apache 2.0

Related OLS Projects

Human Body of Knowledge (HBoK) — Personal knowledge architecture
OVNN — Optimal Value Neural Network
CommunityLLM — Politically neutral community organizing AI
DirectDemocracyLLM — Participatory democracy tools

Contributing

We welcome contributions from researchers, data scientists, librarians, information architects, and anyone passionate about open science. See CONTRIBUTING.md.

Contact

Organization: Optimal Living Systems (501(c)(3) nonprofit)
Email: research@optimallivingsystems.org
Mission: Building AI infrastructure that supports human autonomy rather than capital extraction.
