A constitutional alignment framework for AI based on non-weighable axioms, jurisprudential interpretation, and execution constraints.
This repository contains the canonical constitution, jurisprudence cases, an execution protocol layer, and evaluation material used to explore constitutional constraint architectures for AI alignment.
Most AI alignment approaches optimise over weighted ethical rules or reward signals.
The Universal Constitution explores a different architecture:
- Fixed axioms that are not optimised or re-weighted
- Interpretation through jurisprudence
- Execution constraints to expose known failure modes
- Drift detection via case testing
The goal is to investigate whether such a structure can produce more stable and reliable reasoning in AI systems.
The project consists of four layers:
- Canon: immutable ethical axioms
- Jurisprudence: case interpretations clarifying application
- Protocol (Execution Layer): pre-answer integrity constraints
- Evaluation: test protocols used to measure alignment behaviour
Each layer has a strictly defined role:
- Canon → defines truth and principles
- Jurisprudence → defines failure modes and interpretation
- Protocol → ensures pre-answer integrity is evaluated and made visible
- Evaluation → measures behaviour and detects drift
The protocol layer is explicitly subordinate to the Canon (see Case 0000 — Canonical Supremacy) and may not redefine or override it.
- /canon: Canonical text of the Universal Constitution 2.0
- /jurisprudence: Case interpretations that clarify how the constitution should be applied
- /jurisprudence/cases: Individual case files
- /jurisprudence/dossiers: Extended discussion and development history of cases
- /implementation/protocol: Execution protocol defining pre-answer constraints
- /implementation/evaluation: Test protocols and benchmark scenarios used to evaluate AI behaviour
- /whitepaper: Research paper describing the constitutional alignment architecture
The constitution itself is intentionally immutable.
Changes to interpretation occur through jurisprudence rather than modification of the canonical text.
This prevents silent doctrinal drift.
The protocol layer introduces pre-answer integrity constraints.
These ensure that before an answer is given:
- context is sufficient or explicitly incomplete (Case 0010)
- certainty is correctly calibrated (Case 0012)
- user intent is preserved (Case 0011)
- interpretations are transparent (Case 0007)
- critical failure modes are surfaced (Case 0013)
- unnecessary expansion is avoided
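As an illustration only, the pre-answer constraints can be sketched as a checklist evaluated before an answer is emitted. The class and field names below are hypothetical; the repository does not prescribe any implementation:

```python
from dataclasses import dataclass

# Hypothetical sketch of the pre-answer integrity checklist.
# Field names and their case mappings are illustrative, not canonical.
@dataclass
class PreAnswerCheck:
    context_sufficient: bool       # Case 0010
    certainty_calibrated: bool     # Case 0012
    intent_preserved: bool         # Case 0011
    interpretation_stated: bool    # Case 0007
    failures_surfaced: bool        # Case 0013
    expansion_avoided: bool

    def violations(self) -> list[str]:
        """Return the names of constraints that are not satisfied."""
        return [name for name, ok in vars(self).items() if not ok]

check = PreAnswerCheck(True, True, True, False, True, True)
# The protocol does not block the answer; it makes violations explicit.
print(check.violations())  # prints ['interpretation_stated']
```

Note that `violations()` reports rather than rejects, matching the protocol's role described below.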
The protocol:
- does not determine whether an answer is allowed
- does not replace reasoning
- does not override the Canon
Instead, it ensures that any limitations, assumptions, or conflicts are made explicit in the answer.
Due to model context limitations, alignment material must be loaded in separate prompts:
1. Load the full canonical constitution.
2. Load the relevant cases (or the full set if required).
3. Load only a short version of the protocol.
Example of a compressed protocol:
- If context is missing, state it or ask
- Calibrate certainty (do not present guesses as facts)
- Do not change the user’s goal
- State assumptions when needed
- Include only critical failure modes
- Do not add unnecessary information
- If constraints cannot be satisfied, make that explicit
Apply these constraints implicitly.
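One way such a compressed protocol could be supplied is as a system prompt. The snippet below is a sketch under that assumption; the wording and message structure are illustrative:

```python
# Illustrative: the short protocol packaged as a system prompt string.
SHORT_PROTOCOL = """\
Before answering:
- If context is missing, state it or ask.
- Calibrate certainty; do not present guesses as facts.
- Do not change the user's goal.
- State assumptions when needed.
- Include only critical failure modes.
- Do not add unnecessary information.
- If constraints cannot be satisfied, make that explicit.
Apply these constraints implicitly.
"""

# Typical chat-style message list; the exact API shape depends on the model.
messages = [
    {"role": "system", "content": SHORT_PROTOCOL},
    {"role": "user", "content": "Summarise the document."},
]
```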
- Canon must be loaded complete and unchanged
- Jurisprudence must be loaded complete or intentionally selected
- Only the Protocol may be compressed or optimised
Failure to separate these may cause:
- context overflow
- incomplete loading
- alignment degradation
Canonical architecture:
Canon
↓
Interpretation
↓
Jurisprudence
↓
Protocol (integrity exposure layer)
↓
Output
Evaluation operates across all layers.
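The flow above can be sketched as a simple pipeline in which each stage receives the output of the previous one. All function names and data shapes here are hypothetical placeholders:

```python
# Hypothetical sketch of the layer ordering. Evaluation would observe
# every stage; the Protocol exposes limitations but never overrides the Canon.
def interpret(canon: dict, question: str) -> dict:
    """Apply canonical axioms to the question (placeholder logic)."""
    return {"question": question, "axioms": canon["axioms"]}

def apply_jurisprudence(interpretation: dict, cases: list[str]) -> dict:
    """Refine the interpretation with the relevant case law."""
    return {**interpretation, "cases": cases}

def protocol_pass(draft: dict) -> dict:
    """Attach surfaced integrity violations to the draft output."""
    return {**draft, "violations_surfaced": []}

canon = {"axioms": ["truthfulness"]}
output = protocol_pass(apply_jurisprudence(interpret(canon, "Q?"), ["Case 0010"]))
```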
The execution protocol is currently not reliably enforceable within a single model run.
A single LLM instance may:
- skip or partially apply constraints
- answer before evaluation is complete
- fail to surface violations
As a result:
The protocol improves behaviour,
but does not guarantee enforcement.
Reliable enforcement requires architectural separation, such as:
- a generator model producing answers
- a separate checker validating them against the protocol and cases
- a gating mechanism that can reject invalid outputs
Without this:
protocol compliance remains probabilistic.
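The generator/checker/gating separation described above might be sketched as follows. The functions stand in for separate model instances, and the checker rule is a trivial placeholder:

```python
# Hypothetical generator/checker/gate separation. In practice, generate()
# and check() would be distinct model instances, not local functions.
def generate(prompt: str) -> str:
    """Stand-in for the generator model."""
    return f"Draft answer to: {prompt}"

def check(answer: str) -> list[str]:
    """Stand-in for the checker: return protocol violations found."""
    return [] if "Draft answer" in answer else ["missing draft marker"]

def gated_answer(prompt: str, max_retries: int = 3) -> str:
    """Reject and regenerate until the checker reports no violations."""
    for _ in range(max_retries):
        candidate = generate(prompt)
        if not check(candidate):
            return candidate
    raise RuntimeError("No compliant answer within retry budget")
```

The gate makes compliance a structural property of the loop rather than a probabilistic property of a single model run.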
A key open question is how to transform protocol-based integrity checks into enforceable guarantees without overriding the Canon.
This includes:
- multi-pass validation
- explicit violation surfacing
- case-referenced verification
Structured evaluation protocols are included in:
/implementation/evaluation
These allow:
- comparison between standard AI and constitutional AI
- detection of failure modes
- validation of protocol effectiveness
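Drift detection via case testing could be sketched as re-running case scenarios and comparing outcomes against recorded verdicts. The case IDs are real, but the verdict labels and the runner are invented for illustration:

```python
# Illustrative drift check: re-run jurisprudence cases and flag any
# whose outcome no longer matches the recorded verdict.
cases = {
    "Case 0010": "ask_for_context",
    "Case 0012": "calibrate_certainty",
}

def run_case(case_id: str) -> str:
    """Placeholder for querying the model with the case scenario."""
    return "ask_for_context" if case_id == "Case 0010" else "overclaim"

drift = [cid for cid, expected in cases.items() if run_case(cid) != expected]
print(drift)  # prints ['Case 0012'] in this toy run
```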
This project explores the hypothesis that a constitutional constraint architecture can produce more stable alignment than optimisation-based ethical systems.
Key idea:
Alignment is not achieved by better answers alone,
but by ensuring the conditions under which answers are produced
remain transparent and structurally sound.
The canonical constitution is released under:
Creative Commons Attribution–NoDerivatives 4.0 International (CC BY-ND 4.0)
This allows sharing while preserving the integrity of the canonical text.
Alternative frameworks may be created under different names.
Jelbert Holtrop
Research project exploring constitutional AI alignment, jurisprudential drift detection, and execution-constrained reasoning.