Author: Tristen Pierson, BitConcepts Research
An empirical study of whether recursive generative stability depends more on directional calibration and epistemic filtering than on retrieval augmentation or generic decoding constraints.
The OEA (Ontology, Epistemic, Agentic) framework is a three-layer generation-time protocol tested across 4 language models (82M to 1.5B parameters) and 3 architecture families (GPT-2, GPT-Neo, Qwen). Key result: inverting the calibration signal changes log-probability by -0.55 to -1.37 nats (a degradation), while correct calibration changes it by +0.62 to +1.63 nats (an improvement).
Read the paper on Academia.edu
Prerequisites: Python 3.11+ and pip.
pip install -r requirements-lock.txt
Run all bigram experiments (about 2 minutes, no GPU needed):
bash scripts/run_all_experiments.sh
Run real LLM experiments (GPU recommended, 10-30 min per model):
python experiments/real_lm_experiment.py --model distilgpt2
python experiments/real_lm_experiment.py --model gpt2
python experiments/real_lm_experiment.py --model EleutherAI/gpt-neo-125M
python experiments/real_lm_experiment.py --model Qwen/Qwen2.5-1.5B
CPU is supported with reduced config: add --n-seeds 3 --n-iterations 5 --gen-tokens 40.
Verify result integrity:
python experiments/verify_manifest.py
Build the manuscript PDF (requires MiKTeX or TeX Live):
scripts/build_pdf.cmd
See REPRODUCE.md for the full step-by-step guide.
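To run the real LLM experiments for all four models in one go, the commands above can be generated programmatically. The following is a minimal sketch: the model names, script path, and flags are copied from the commands above, while `build_commands` and its parameters are hypothetical helpers that are not part of the repository.

```python
import sys

# Model identifiers, script path, and flags are copied from the commands above.
MODELS = [
    "distilgpt2",
    "gpt2",
    "EleutherAI/gpt-neo-125M",
    "Qwen/Qwen2.5-1.5B",
]
SCRIPT = "experiments/real_lm_experiment.py"
CPU_FLAGS = ["--n-seeds", "3", "--n-iterations", "5", "--gen-tokens", "40"]


def build_commands(reduced_cpu_config=False, python=sys.executable):
    """Return one argv list per model (hypothetical helper, not in the repo)."""
    commands = []
    for model in MODELS:
        argv = [python, SCRIPT, "--model", model]
        if reduced_cpu_config:
            argv = argv + CPU_FLAGS  # append the reduced CPU settings
        commands.append(argv)
    return commands


# Print the commands instead of running them, so the sketch has no side effects.
for argv in build_commands(reduced_cpu_config=True, python="python"):
    print(" ".join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)` from the repository root to execute the run.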
The experiment harness auto-detects the best available device (cuda > rocm > xpu > mps > cpu).
Use --device <backend> to override.
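The priority order above can be illustrated with a small selection function. This is a sketch under stated assumptions, not the harness's actual code: `pick_device`, its `available` set, and its `override` argument are hypothetical names, and the probing of each backend is left to the caller.

```python
# Priority order as stated above: cuda > rocm > xpu > mps > cpu.
PRIORITY = ("cuda", "rocm", "xpu", "mps", "cpu")


def pick_device(available, override=None):
    """Pick a backend name (hypothetical helper, not the harness's real API).

    `available` is a set of backend names the caller has already probed;
    `override` plays the role of the --device flag.
    """
    if override is not None:
        if override not in PRIORITY:
            raise ValueError(f"unknown backend: {override}")
        return override
    for name in PRIORITY:
        if name in available:
            return name
    return "cpu"  # assumption: CPU is always usable as a fallback


print(pick_device({"cpu", "mps"}))                   # mps
print(pick_device({"cpu", "cuda", "xpu"}))           # cuda
print(pick_device({"cpu", "cuda"}, override="cpu"))  # cpu
```

With PyTorch, the `available` set could be filled from checks such as `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. Note that ROCm builds of PyTorch expose the GPU through the `torch.cuda` interface and set `torch.version.hip`, so distinguishing "cuda" from "rocm" needs that extra check.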
| Hardware | Install command | Test status |
|---|---|---|
| NVIDIA CUDA 12.1 | pip install torch==2.3.1+cu121 --index-url https://download.pytorch.org/whl/cu121 | ✅ Verified (RTX 4070 SUPER, Win 11) |
| NVIDIA CUDA 12.4+ | pip install torch --index-url https://download.pytorch.org/whl/cu124 | ✅ Verified |
| CPU only | pip install torch --index-url https://download.pytorch.org/whl/cpu | ✅ Verified |
| AMD ROCm 6.x | pip install torch --index-url https://download.pytorch.org/whl/rocm6.3 | |
| Intel Arc / Xe XPU | pip install torch --index-url https://download.pytorch.org/whl/xpu | |
| Apple Silicon (MPS) | pip install torch (macOS 13+, auto-detected) | |
CI note: GPU paths are not tested in CI — GitHub-hosted runners have no GPU hardware. Only CPU-based unit tests and the LaTeX compile run automatically. If you run on ROCm, XPU, or MPS, please report your result (pass or fail) using the Hardware Compatibility template.
| Image | GPU | Status | Build command |
|---|---|---|---|
| Dockerfile | CPU only | ✅ Verified | docker build -t oea-framework . |
| Dockerfile.cuda | NVIDIA CUDA 12.1 | ✅ Verified | docker build -f Dockerfile.cuda -t oea-framework-cuda . |
| Dockerfile.rocm | AMD ROCm 6.x | | docker build -f Dockerfile.rocm -t oea-framework-rocm . |
| Dockerfile.xpu | Intel Arc / Xe XPU | | docker build -f Dockerfile.xpu -t oea-framework-xpu . |
| N/A | Apple MPS | ❌ Not Docker-compatible | N/A (use native install) |
ROCm requires --device /dev/kfd --device /dev/dri --group-add render --group-add video at runtime (Linux only).
XPU requires --device /dev/dri at runtime (Linux only).
For Apple Silicon, install natively — MPS is not accessible from inside Docker containers.
Report ROCm/XPU/MPS results via the Hardware Compatibility template.
arxiv/
main.tex LaTeX manuscript (14 pages)
references.bib 13 verified citations
figures/ 3 publication figures
experiments/
credibility_suite.py Bigram-proxy ablation harness (12 variants)
real_lm_experiment.py Real LLM recursive stability experiment
baseline_competition.py OEA vs 5 non-OEA controls
recursive_memory_drift.py 30-step recursive memory benchmark
generate_figures.py Generates all publication figures
verify_manifest.py SHA-256 artifact integrity checker
manifest.json Hashes for all committed results
data/ Public-domain corpora
results/ Committed experiment artifacts
scripts/ Setup, build, and run scripts
tests/ 12 unit tests (pytest)
REPRODUCE.md Step-by-step reproduction guide
Dockerfile CPU reproducibility container
Dockerfile.cuda NVIDIA CUDA 12.1 GPU container (verified)
Dockerfile.rocm AMD ROCm 6.x GPU container (community-tested)
Dockerfile.xpu Intel Arc / Xe XPU container (community-tested)
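The tree lists verify_manifest.py as a SHA-256 integrity checker and manifest.json as the file holding the hashes. A check of that kind can be sketched as follows. The manifest layout used here (a flat JSON object mapping relative paths to hex digests) is an assumption made for illustration, and `sha256_of` and `verify` are hypothetical names; the repository's script may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest_path, root="."):
    """Return (relative_path, problem) pairs; an empty list means all files match.

    Assumed manifest layout: {"relative/path": "sha256 hex digest", ...}.
    """
    expected = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    problems = []
    for rel_path, want in sorted(expected.items()):
        target = Path(root) / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != want.lower():
            problems.append((rel_path, "hash mismatch"))
    return problems


# Demo on a throwaway directory: record a hash, then tamper with the file.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "result.json"
    artifact.write_text('{"score": 1}', encoding="utf-8")
    manifest = Path(tmp) / "manifest.json"
    manifest.write_text(
        json.dumps({"result.json": sha256_of(artifact)}), encoding="utf-8"
    )
    clean = verify(manifest, root=tmp)
    artifact.write_text('{"score": 2}', encoding="utf-8")
    tampered = verify(manifest, root=tmp)

print(clean)     # []
print(tampered)  # [('result.json', 'hash mismatch')]
```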
| Experiment | What it tests | Runtime |
|---|---|---|
| Credibility suite | 12-variant ablation, 648 runs each | ~90s (CPU) |
| Real LLM validation | 4 models, 4 variants, 10 seeds x 10 iterations | ~10-30 min/model (GPU) |
| Memory drift | 30-step recursive summarization, 20 seeds | ~5s (CPU) |
| Baseline competition | OEA vs temperature, top-k, entropy, repetition, RAG-only | ~5s (CPU) |
- Log-probability — mean per-token log-prob under frozen reference model (primary metric)
- ROUGE-L recall — seed-corpus content preservation (independent of log-prob)
- JSD — Jensen-Shannon divergence from seed distribution
- TRR / FRR — true/false rejection rates for out-of-vocabulary token detection
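Two of the metrics above have standard textbook definitions that can be sketched in a few lines. The code below is an illustrative sketch, not the repository's implementation: it assumes JSD is computed between unigram token distributions using natural logarithms (so the result is in nats, matching the log-probability units), and that ROUGE-L recall is the longest-common-subsequence length divided by the reference length. All function names are hypothetical.

```python
import math
from collections import Counter


def unigram_distribution(tokens):
    """Relative token frequencies (assumed form of the 'seed distribution')."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def jsd(p, q):
    """Jensen-Shannon divergence in nats between two token->probability dicts."""
    support = set(p) | set(q)
    mix = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in support}

    def kl(a):
        # KL(a || mix); terms with zero probability contribute nothing.
        return sum(pa * math.log(pa / mix[t]) for t, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def rouge_l_recall(reference, candidate):
    """Longest common subsequence length divided by the reference length."""
    if not reference:
        return 0.0
    prev = [0] * (len(candidate) + 1)
    for ref_tok in reference:
        curr = [0]
        for j, cand_tok in enumerate(candidate, start=1):
            if ref_tok == cand_tok:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(reference)


seed = "the cat sat on the mat".split()
output = "the cat on mat".split()
recall = rouge_l_recall(seed, output)  # LCS is 4 tokens out of 6
divergence = jsd(unigram_distribution(seed), unigram_distribution(output))
print(round(recall, 3))  # 0.667
print(divergence >= 0.0)  # True
```

With natural logarithms, JSD is 0 for identical distributions and reaches its maximum of ln 2 (about 0.693 nats) for distributions with no shared tokens.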
@misc{pierson2026oea,
  title={OEA: Structured Recursive Calibration for Generative Stability},
  author={Pierson, Tristen},
  year={2026},
  howpublished={https://github.com/BitConcepts/oea-framework-paper}
}
Code: MIT | Paper: CC BY 4.0