1 change: 1 addition & 0 deletions .specsmith/ledger-chain.txt
@@ -1 +1,2 @@
c33daae014d19022f931693b19a3d858e568c61e7a3d959246b857a543e81533
522c1c447906f02a4c35c2f7a22c0677cd4f704ec616c4de502b9c38edf5e3f3
4 changes: 2 additions & 2 deletions AGENTS.md
@@ -2,14 +2,14 @@

**Project**: OEA: Structured Recursive Calibration for Generative Stability
**Phase**: See `scaffold.yml` — advance with `specsmith phase next`
**Spec**: specsmith 0.10.1 / aee-research
**Spec**: specsmith 0.11.3.dev427 / research-python

## Mission
Empirically validate the OEA (Ontology, Epistemic, Agentic) Framework as a measurable
guardrail against recursive model collapse. Produce a peer-reviewed publication artifact.

## Project Summary
- **Type**: aee-research (Applied Epistemic Engineering research paper)
- **Type**: research-python with AEE epistemic governance (`enable_epistemic: true`)
- **Language**: Python 3.x
- **Test framework**: pytest
- **Experiment harness**: `experiments/credibility_suite.py`, `experiments/run_experiments.py`
21 changes: 18 additions & 3 deletions CHANGELOG.md
@@ -9,18 +9,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added
- `Dockerfile.cuda`: NVIDIA CUDA 12.1 GPU image (verified on RTX 4070 SUPER)
- `Dockerfile.rocm`: AMD ROCm 6.x GPU image (community-tested; `rocm/dev-ubuntu-22.04:6.3` base)
- `Dockerfile.xpu`: Intel Arc / Xe XPU image (community-tested; `ubuntu:22.04` + PyTorch XPU wheel)
- `.github/ISSUE_TEMPLATE/hardware_compat.md`: hardware compatibility report template
for community contributors running on AMD ROCm, Intel XPU, Apple MPS, etc.
- `real_lm_experiment.py`: `--device` flag for explicit backend selection
(`cuda`, `rocm`, `xpu`, `mps`, `cpu`); auto-detection extended to ROCm and Intel XPU
- `requirements-lock.txt`: added install instructions for AMD ROCm 6.x, Intel XPU/Arc,
NVIDIA CUDA 12.4+, and Apple MPS with per-backend test status notes
- `docs/REQUIREMENTS.md`: REQ-OEA-023 (hardware abstraction / multi-backend device support)
- `docs/TESTS.md`: TEST-OEA-023 covering REQ-OEA-023 (code inspection + Docker image check)
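
The `--device` flag listed above could be exposed along the following lines. This is a minimal sketch, not the code in `real_lm_experiment.py`: the `build_parser` helper, the `auto` default, and the help strings are assumptions made for illustration. Only the five backend names and the `--model` / `--device` flag names come from this changelog and the Dockerfile usage comments.

```python
import argparse

# Backend names as listed in the changelog entry. "auto" is a hypothetical
# default meaning "let the harness auto-detect the backend".
BACKENDS = ("cuda", "rocm", "xpu", "mps", "cpu")


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with an explicit backend override (illustrative only)."""
    parser = argparse.ArgumentParser(description="Real-LM experiment (sketch)")
    parser.add_argument("--model", default="distilgpt2", help="model name to load")
    parser.add_argument(
        "--device",
        choices=("auto",) + BACKENDS,
        default="auto",
        help="compute backend; 'auto' defers to auto-detection",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--model", "distilgpt2", "--device", "rocm"])
print(f"model={args.model} device={args.device}")
```

Restricting the flag with `choices` means an unsupported backend name is rejected at parse time rather than failing later during model loading.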

### Fixed
- `scaffold.yml`: type changed `aee-research` → `research-python` to match scanner detection
(AEE epistemic governance preserved via `enable_epistemic: true`); resolves specsmith
audit type-mismatch warning — audit now passes 30/30 checks with no issues

### Changed
- `Dockerfile`: updated to current pinned versions (`numpy==2.4.5`, etc.)
- `README.md`: GPU support table now includes ROCm/XPU/MPS with test status column
and CI hardware gap note; Docker section consolidated into GPU Support
- `REPRODUCE.md`: hardware test matrix added; untested hardware / help-wanted section added
- `README.md`: Docker table expanded with ROCm/XPU images and MPS native-only note
- `REPRODUCE.md`: Step 4 rewritten with direct pip commands per backend (removed stale
setup script references); stale numpy<2 compat note removed; Docker section updated
with ROCm/XPU run commands; `--device` flag examples added to Step 5
- `docs/ARCHITECTURE.md`: DEC-005 added (hardware abstraction layer); reproducibility
package table updated with all four Dockerfiles; tooling section updated
- `docs/REQUIREMENTS.md`: REQ-OEA-020 updated to reference `Dockerfile.cuda` alongside
`Dockerfile`
- `docs/TESTS.md`: TEST-OEA-020 updated to reference `Dockerfile.cuda`
- `scaffold.yml`: pinned `detected_type: aee-research` to suppress specsmith audit false-positive
(scanner infers `research-python` from file heuristics; `aee-research` is the intentional
governance type set at project bootstrap)
85 changes: 85 additions & 0 deletions Dockerfile.rocm
@@ -0,0 +1,85 @@
# OEA Framework Paper — AMD ROCm GPU Container (REQ-OEA-020)
#
# COMMUNITY-TESTED ONLY — not verified by maintainer.
# Please report your result (pass or fail) at:
# https://github.com/BitConcepts/oea-framework-paper/issues/new?template=hardware_compat.md
#
# Requirements:
# - AMD GPU with ROCm 6.x support (RX 6000/7000 series, Instinct MI series)
# - ROCm-capable Linux host (Ubuntu 22.04/24.04 recommended)
# - Linux only — ROCm does not support Windows or macOS containers
# - Note: /dev/kfd and /dev/dri group permissions may need host-side setup:
# sudo usermod -aG render,video $USER
#
# Build:
# docker build -f Dockerfile.rocm -t oea-framework-rocm .
#
# Run real LLM experiment (AMD GPU):
# docker run --rm \
# --device /dev/kfd \
# --device /dev/dri \
# --group-add render \
# --group-add video \
# -v $(pwd)/results:/app/results \
# oea-framework-rocm \
# python experiments/real_lm_experiment.py --model distilgpt2 --device rocm
#
# Run bigram experiments (CPU, no GPU needed):
# docker run --rm -v $(pwd)/results:/app/results oea-framework-rocm
#
# Troubleshooting:
# If torch.cuda.is_available() returns False inside the container, verify:
# 1. /dev/kfd exists on the host: ls -la /dev/kfd
# 2. Your GPU is in the ROCm supported list:
# https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html
# 3. The render/video groups are added to your user (see above)

FROM rocm/dev-ubuntu-22.04:6.3

# Avoid interactive prompts during apt installs
ENV DEBIAN_FRONTEND=noninteractive

# System dependencies + Python 3.11
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.11 \
python3.11-venv \
python3-pip \
git \
curl \
&& rm -rf /var/lib/apt/lists/*

# Create a Python 3.11 virtual environment so that `python` and `pip` both target 3.11
# (the distro pip3 is bound to the system Python 3.10, not to python3.11)
RUN python3.11 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app

# Copy project files
COPY . .

# Core experiment dependencies (no GPU required)
RUN pip install --no-cache-dir \
"numpy==2.4.5" \
"matplotlib==3.10.9" \
"scipy==1.17.1" \
"pytest==9.0.3" \
"reportlab==4.5.1"

# Neural LLM dependencies — ROCm 6.3 torch wheel
# Note: torch.cuda.is_available() returns True for ROCm builds (ROCm exposes CUDA API)
# Use --device rocm flag or the harness will auto-detect via torch.version.hip
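# torch comes from the PyTorch ROCm index; the remaining packages come from PyPI,
# because --index-url replaces PyPI rather than adding to it
RUN pip install --no-cache-dir \
    "torch" \
    --index-url https://download.pytorch.org/whl/rocm6.3 \
 && pip install --no-cache-dir \
    "transformers==4.41.0" \
    "rouge-score==0.1.2"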

# Verify installation (GPU visibility requires /dev/kfd at runtime, not build time)
RUN python -c "import numpy, matplotlib, torch, transformers; \
print('Environment OK'); \
print(f'PyTorch {torch.__version__}'); \
is_rocm = hasattr(torch.version, 'hip') and torch.version.hip; \
print(f'ROCm build: {is_rocm}')"

# Default: run all CPU bigram experiments (AMD GPU available for real LLM experiments)
CMD ["bash", "scripts/run_all_experiments.sh"]
84 changes: 84 additions & 0 deletions Dockerfile.xpu
@@ -0,0 +1,84 @@
# OEA Framework Paper — Intel Arc / Xe XPU Container (REQ-OEA-020)
#
# COMMUNITY-TESTED ONLY — not verified by maintainer.
# Please report your result (pass or fail) at:
# https://github.com/BitConcepts/oea-framework-paper/issues/new?template=hardware_compat.md
#
# Requirements:
# - Intel Arc / Xe / Iris Xe GPU (A-series, B-series, or later)
# - Intel GPU drivers installed on the Linux host
# - Linux only (Ubuntu 22.04/24.04 recommended)
# - Intel oneAPI Base Toolkit (optional but recommended for best performance)
# - Intel GPU device passthrough requires /dev/dri on the host
#
# Build:
# docker build -f Dockerfile.xpu -t oea-framework-xpu .
#
# Run real LLM experiment (Intel GPU):
# docker run --rm \
# --device /dev/dri \
# -v $(pwd)/results:/app/results \
# oea-framework-xpu \
# python experiments/real_lm_experiment.py --model distilgpt2 --device xpu
#
# Run bigram experiments (CPU, no GPU needed):
# docker run --rm -v $(pwd)/results:/app/results oea-framework-xpu
#
# Troubleshooting:
# If torch.xpu.is_available() returns False:
# 1. Verify /dev/dri is accessible: ls -la /dev/dri
# 2. Check Intel GPU driver: intel_gpu_top
# 3. Verify torch XPU support: python -c "import torch; print(torch.xpu.is_available())"
# 4. See Intel Extension for PyTorch docs:
# https://intel.github.io/intel-extension-for-pytorch/

FROM ubuntu:22.04

# Avoid interactive prompts during apt installs
ENV DEBIAN_FRONTEND=noninteractive

# System dependencies + Python 3.11
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.11 \
python3.11-venv \
python3-pip \
git \
curl \
gpg \
&& rm -rf /var/lib/apt/lists/*

# Create a Python 3.11 virtual environment so that `python` and `pip` both target 3.11
# (the distro pip3 is bound to the system Python 3.10, not to python3.11)
RUN python3.11 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app

# Copy project files
COPY . .

# Core experiment dependencies (no GPU required)
RUN pip install --no-cache-dir \
"numpy==2.4.5" \
"matplotlib==3.10.9" \
"scipy==1.17.1" \
"pytest==9.0.3" \
"reportlab==4.5.1"

# Neural LLM dependencies — PyTorch with XPU support
# PyTorch 2.7+ includes native XPU backend (Intel Arc/Xe via SYCL)
# intel-extension-for-pytorch provides additional optimizations (optional)
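# torch comes from the PyTorch XPU index; the remaining packages come from PyPI,
# because --index-url replaces PyPI rather than adding to it
RUN pip install --no-cache-dir \
    "torch" \
    --index-url https://download.pytorch.org/whl/xpu \
 && pip install --no-cache-dir \
    "transformers==4.41.0" \
    "rouge-score==0.1.2"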

# Verify installation (XPU visibility requires /dev/dri passthrough at runtime)
RUN python -c "import numpy, matplotlib, torch, transformers; \
print('Environment OK'); \
print(f'PyTorch {torch.__version__}'); \
xpu_present = hasattr(torch, 'xpu'); \
print(f'XPU module present: {xpu_present}')"

# Default: run all CPU bigram experiments (Intel GPU available for real LLM experiments)
CMD ["bash", "scripts/run_all_experiments.sh"]
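
Both community-tested images depend on host device nodes being passed through at runtime (`/dev/kfd` and `/dev/dri` for ROCm, `/dev/dri` for XPU, per the usage comments in the two Dockerfiles). A host-side preflight check could look roughly like the sketch below. The `REQUIRED_NODES` table and the `missing_device_nodes` helper are hypothetical names introduced here, not part of the repository, and the check covers only the existence of the nodes, not group permissions or driver state.

```python
import os

# Device nodes each image expects to be passed through, taken from the
# "docker run" usage comments in Dockerfile.rocm and Dockerfile.xpu.
REQUIRED_NODES = {
    "rocm": ("/dev/kfd", "/dev/dri"),
    "xpu": ("/dev/dri",),
}


def missing_device_nodes(backend: str, exists=os.path.exists) -> list[str]:
    """Return the required device nodes that are absent on this host.

    `exists` is injectable so the check can be exercised without real hardware.
    Backends with no passthrough requirement (e.g. "cpu") yield an empty list.
    """
    return [path for path in REQUIRED_NODES.get(backend, ()) if not exists(path)]


for backend in REQUIRED_NODES:
    missing = missing_device_nodes(backend)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{backend}: {status}")
```

Running this on the host before `docker run` separates "the device node is not there" from the permission and driver problems listed in the troubleshooting comments.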
55 changes: 55 additions & 0 deletions LEDGER.md
@@ -248,3 +248,58 @@
- **Type**: migration
- **Status**: complete
- **Chain hash**: `c33daae014d19022...`

## 2026-05-19 — Multi-GPU support, governance hardening, full doc cross-check

**Objective**: Add community GPU support (ROCm/XPU), harden governance to 30/30,
resolve all documentation gaps, and fix stale content across the repository.

**What was done**:

- **Multi-backend device support** (`real_lm_experiment.py`): `--device` flag added
(`cuda`, `rocm`, `xpu`, `mps`, `cpu`); auto-detection chain `cuda > rocm > xpu > mps > cpu`;
ROCm detected via `torch.version.hip`; community-tested backends emit issue-link at runtime.
- **Docker images**: `Dockerfile.cuda` (NVIDIA, verified), `Dockerfile.rocm` (AMD ROCm 6.x,
community-tested), `Dockerfile.xpu` (Intel Arc/Xe, community-tested). MPS documented as
not Docker-compatible (Apple Metal not accessible from Linux containers).
- **Hardware issue template**: `.github/ISSUE_TEMPLATE/hardware_compat.md` added for
community ROCm/XPU/MPS compatibility reports.
- **REQ-OEA-023 + TEST-OEA-023**: hardware abstraction (P2) added to REQUIREMENTS.md and
TESTS.md. All 23 accepted REQs now have test coverage.
- **DEC-005**: hardware abstraction decision documented in ARCHITECTURE.md.
REQ-OEA-020 and TEST-OEA-020 updated to reference `Dockerfile.cuda`.
- **`scaffold.yml` type fix**: `aee-research` → `research-python` to match scanner detection.
AEE epistemic governance fully preserved via `enable_epistemic: true`.
specsmith audit: 30/30 checks, 0 issues (was 29/29 with 1 issue).
- **AGENTS.md**: spec version updated 0.10.1 → 0.11.3.dev427; type updated aee-research → research-python.
- **REPRODUCE.md**: Step 4 rewritten with direct pip install commands per backend;
stale `setup.sh --cuda/--mps` references removed; stale numpy<2 note removed;
Docker section fully updated with ROCm/XPU run commands.
- **requirements-lock.txt**: per-backend install instructions added (ROCm 6.x, XPU, CUDA 12.4+, MPS);
incorrect ABI comment from dependabot bump fixed.
- **Dependabot PRs**: all 4 merged (numpy 2.4.5, matplotlib 3.10.9, scipy 1.17.1, pytest 9.0.3).
- **GitHub issues**: #12 (stress-test confidence parser), #13 (type false-positive),
#14 (publication workflow feature), #5 (submission prep) — all closed with comments.
- **specsmith migrate**: 0.11.3 → 0.11.3.dev427 applied; ledger-chain.txt committed.
- **AMLA 2026**: evaluated as a predatory conference (AIRCC, no CORE ranking, 9 co-located
events on the same day, $390-490 fee). Not recommended. Issue #5 updated accordingly.
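
The auto-detection chain `cuda > rocm > xpu > mps > cpu` from the first bullet above can be sketched as follows. This is an illustration under stated assumptions, not the logic in `real_lm_experiment.py`: `detect_backend` and `torch_device_name` are hypothetical names, and the function takes the `torch` module as a parameter so that the example can run against a stand-in object without PyTorch or a GPU installed.

```python
from types import SimpleNamespace


def detect_backend(torch) -> str:
    """Pick a backend in the order cuda > rocm > xpu > mps > cpu.

    `torch` is the imported torch module, or any object with the same attributes.
    """
    cuda_visible = torch.cuda.is_available()
    # ROCm wheels set torch.version.hip to a version string; CUDA wheels leave it None.
    hip_build = bool(getattr(torch.version, "hip", None))
    if cuda_visible and not hip_build:
        return "cuda"
    if cuda_visible and hip_build:
        return "rocm"
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    mps = getattr(getattr(torch, "backends", None), "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def torch_device_name(backend: str) -> str:
    """Map a backend label to the device string PyTorch expects."""
    # ROCm builds reuse the "cuda" device type, so "rocm" is only a label.
    return "cuda" if backend == "rocm" else backend


# Stand-in for a ROCm build of torch, so the sketch runs without torch installed.
fake_rocm_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: True),
    version=SimpleNamespace(hip="6.3"),
)
backend = detect_backend(fake_rocm_torch)
print(backend, torch_device_name(backend))  # rocm cuda
```

Because a ROCm build reports `torch.cuda.is_available()` as true, the `cuda` and `rocm` branches can only be told apart by `torch.version.hip`, which is why the two checks sit next to each other at the top of the chain.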

**Files changed**: `scaffold.yml`, `AGENTS.md`, `CHANGELOG.md`, `LEDGER.md`,
`Dockerfile`, `Dockerfile.cuda`, `Dockerfile.rocm`, `Dockerfile.xpu`,
`requirements-lock.txt`, `README.md`, `REPRODUCE.md`, `docs/ARCHITECTURE.md`,
`docs/REQUIREMENTS.md`, `docs/TESTS.md`, `experiments/real_lm_experiment.py`,
`.github/ISSUE_TEMPLATE/hardware_compat.md`

**Checks run**: `specsmith audit` (30/30), `specsmith validate` (5/5),
`specsmith status` (CI ✓, 0 Dependabot alerts, 0 open PRs), pytest (12/12), CI green.

**Results**: Healthy. 30/30 audit checks. 0 open issues. 0 open PRs. CI passing.

**Next step**: Merge develop → main when ready to publish hardware support.

## 2026-05-19T13:38 — Multi-GPU support, governance hardening, full doc cross-check: added --device flag (cuda/rocm/xpu/mps/cpu) with ROCm/XPU auto-detection; Dockerfile.cuda (verified), Dockerfile.rocm, Dockerfile.xpu (community-tested); hardware_compat issue template; REQ/TEST-OEA-023 (hardware abstraction); DEC-005 in ARCHITECTURE; scaffold.yml type aee-research->research-python (specsmith audit 30/30 clean); AGENTS.md spec version 0.10.1->0.11.3.dev427; REPRODUCE.md stale content fixed; requirements-lock.txt per-backend install instructions; 4 dependabot PRs merged; GitHub issues #5 #12 #13 #14 closed; AMLA 2026 evaluated as predatory conference
- **Author**: Tristen Pierson
- **Type**: feature
- **REQs affected**: REQ-OEA-020,REQ-OEA-023
- **Status**: complete
- **Chain hash**: `522c1c447906f02a...`
22 changes: 15 additions & 7 deletions README.md
@@ -75,13 +75,19 @@ Use `--device <backend>` to override.

### Docker

| Image | GPU | Build command |
|---|---|---|
| `Dockerfile` | CPU only | `docker build -t oea-framework .` |
| `Dockerfile.cuda` | NVIDIA CUDA 12.1 | `docker build -f Dockerfile.cuda -t oea-framework-cuda .` |
| Image | GPU | Status | Build command |
|---|---|---|---|
| `Dockerfile` | CPU only | ✅ Verified | `docker build -t oea-framework .` |
| `Dockerfile.cuda` | NVIDIA CUDA 12.1 | ✅ Verified | `docker build -f Dockerfile.cuda -t oea-framework-cuda .` |
| `Dockerfile.rocm` | AMD ROCm 6.x | ⚠️ Community-tested | `docker build -f Dockerfile.rocm -t oea-framework-rocm .` |
| `Dockerfile.xpu` | Intel Arc / Xe XPU | ⚠️ Community-tested | `docker build -f Dockerfile.xpu -t oea-framework-xpu .` |
| (no image) | Apple MPS | ❌ Not Docker-compatible | N/A (use native install) |

ROCm requires `--device /dev/kfd --device /dev/dri --group-add render --group-add video` at runtime (Linux only).
XPU requires `--device /dev/dri` at runtime (Linux only).
For Apple Silicon, install natively — MPS is not accessible from inside Docker containers.

For AMD ROCm or Intel XPU Docker, see `requirements-lock.txt` for install commands
and open a [Hardware Compatibility issue](https://github.com/BitConcepts/oea-framework-paper/issues/new?template=hardware_compat.md) with your result.
Report ROCm/XPU/MPS results via the [Hardware Compatibility template](https://github.com/BitConcepts/oea-framework-paper/issues/new?template=hardware_compat.md).
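
The per-backend runtime flags described above can be collected in one place when scripting container runs. The sketch below is an assumption-laden illustration: `RUNTIME_FLAGS`, `IMAGE_TAGS`, and `docker_run_command` are hypothetical names, the image tags are the ones used in the build commands in the table, and the CUDA image is left out because its runtime flags are not stated in this section.

```python
import shlex

# Runtime flags per backend, as stated in the README notes above.
RUNTIME_FLAGS = {
    "cpu": [],
    "rocm": ["--device", "/dev/kfd", "--device", "/dev/dri",
             "--group-add", "render", "--group-add", "video"],
    "xpu": ["--device", "/dev/dri"],
}

# Image tags taken from the build commands in the Docker table.
IMAGE_TAGS = {
    "cpu": "oea-framework",
    "rocm": "oea-framework-rocm",
    "xpu": "oea-framework-xpu",
}


def docker_run_command(backend: str, results_dir: str) -> list[str]:
    """Assemble a `docker run` argument list for the given backend."""
    if backend not in RUNTIME_FLAGS:
        raise ValueError(f"no Docker image for backend {backend!r}")
    return [
        "docker", "run", "--rm",
        *RUNTIME_FLAGS[backend],
        "-v", f"{results_dir}:/app/results",
        IMAGE_TAGS[backend],
    ]


# Print a copy-pasteable command line for the XPU image.
print(shlex.join(docker_run_command("xpu", "/tmp/results")))
```

Returning an argument list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns; `shlex.join` is used only for display.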

## Repository Structure

@@ -106,7 +112,9 @@ scripts/ Setup, build, and run scripts
tests/ 12 unit tests (pytest)
REPRODUCE.md Step-by-step reproduction guide
Dockerfile CPU reproducibility container
Dockerfile.cuda NVIDIA CUDA GPU container
Dockerfile.cuda NVIDIA CUDA 12.1 GPU container (verified)
Dockerfile.rocm AMD ROCm 6.x GPU container (community-tested)
Dockerfile.xpu Intel Arc / Xe XPU container (community-tested)
```

## Experiments