Version: 1.1.0 Status: Stable Author: Seuriin
This document defines the Helix Base-3 Rotating Trellis, a constrained coding scheme for mapping arbitrary binary data onto DNA strands. The encoding balances GC content in expectation and guarantees homopolymer prevention (a Run-Length Limited constraint with a maximum run length of 1).
Input binary data is treated as a stream of bytes. Each byte (8 bits) is decomposed into 6 trits (base-3 digits) to maximize density while respecting the coding constraint ($3^5 = 243 < 256 \le 729 = 3^6$, so 6 is the smallest trit count that covers all 256 byte values).
Algorithm:
Given a byte $B$:
- $t_0 = B \bmod 3$
- $B_1 = \lfloor B / 3 \rfloor$
- Repeat with $B_1$ (giving $t_1$, $B_2$, and so on) until 6 trits are generated.
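The decomposition steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and returning the trits least-significant first ($t_0$ first) is an assumption based on the order in which the algorithm produces them.

```python
def byte_to_trits(b: int) -> list[int]:
    """Decompose one byte into 6 trits, t_0 (least significant) first."""
    if not 0 <= b <= 255:
        raise ValueError("expected a byte in 0..255")
    trits = []
    for _ in range(6):
        trits.append(b % 3)  # t_i = B_i mod 3
        b //= 3              # B_{i+1} = floor(B_i / 3)
    return trits


def trits_to_byte(trits: list[int]) -> int:
    """Inverse of byte_to_trits (hypothetical helper for round-trip checks)."""
    value = 0
    for t in reversed(trits):  # most significant trit first
        value = value * 3 + t
    return value


print(byte_to_trits(255))  # [0, 1, 1, 0, 0, 1]
```

Because $255 = 1 \cdot 243 + 1 \cdot 9 + 1 \cdot 3$, the highest trit is at most 1 for any byte value, so not all 729 trit patterns are used.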
Note
The core encoder is a Mealy Machine where the output depends on the current state (Previous Base) and the input (Current Trit).
- State Space ($S$): $\{A, C, G, T\}$
- Input Alphabet ($I$): $\{0, 1, 2\}$
- Transition Function ($\delta$): $S \times I \rightarrow S$
Mapping bases to integers ($A = 0$, $C = 1$, $G = 2$, $T = 3$), the transition function is defined as:

$$\delta(s, t) = (s + t + 1) \bmod 4$$
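| Previous Base ($s$) | Input Trit ($t$) | Computation | Next Base ($\delta(s, t)$) |
|---|---|---|---|
| A (0) | 0 | $(0 + 0 + 1) \bmod 4 = 1$ | C |
| A (0) | 1 | $(0 + 1 + 1) \bmod 4 = 2$ | G |
| A (0) | 2 | $(0 + 2 + 1) \bmod 4 = 3$ | T |
| C (1) | 0 | $(1 + 0 + 1) \bmod 4 = 2$ | G |
| C (1) | 1 | $(1 + 1 + 1) \bmod 4 = 3$ | T |
| C (1) | 2 | $(1 + 2 + 1) \bmod 4 = 0$ | A |
| G (2) | 0 | $(2 + 0 + 1) \bmod 4 = 3$ | T |
| G (2) | 1 | $(2 + 1 + 1) \bmod 4 = 0$ | A |
| G (2) | 2 | $(2 + 2 + 1) \bmod 4 = 1$ | C |
| T (3) | 0 | $(3 + 0 + 1) \bmod 4 = 0$ | A |
| T (3) | 1 | $(3 + 1 + 1) \bmod 4 = 1$ | C |
| T (3) | 2 | $(3 + 2 + 1) \bmod 4 = 2$ | G |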
stateDiagram-v2
A --> C: 0
A --> G: 1
A --> T: 2
C --> G: 0
C --> T: 1
C --> A: 2
G --> T: 0
G --> A: 1
G --> C: 2
T --> A: 0
T --> C: 1
T --> G: 2
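The transitions above can be sketched as a small Python encoder. This is an illustrative sketch under two assumptions: the rule $(s + t + 1) \bmod 4$ is inferred from the transition table with $A=0$, $C=1$, $G=2$, $T=3$, and the decoder is simply the algebraic inverse of that rule. All function names are hypothetical.

```python
BASES = "ACGT"  # string index is the integer value of the base: A=0, C=1, G=2, T=3


def next_base(prev: str, trit: int) -> str:
    """One trellis step, assuming the rule (prev + trit + 1) mod 4."""
    if trit not in (0, 1, 2):
        raise ValueError("trit must be 0, 1 or 2")
    return BASES[(BASES.index(prev) + trit + 1) % 4]


def encode_trits(trits, seed: str) -> str:
    """Map a trit sequence to bases, starting from the given seed state."""
    out = []
    state = seed
    for t in trits:
        state = next_base(state, t)
        out.append(state)
    return "".join(out)


def decode_bases(bases: str, seed: str) -> list[int]:
    """Invert the step: trit = (next - prev - 1) mod 4."""
    trits = []
    state = seed
    for b in bases:
        t = (BASES.index(b) - BASES.index(state) - 1) % 4
        if t == 3:  # only a repeated base yields 3, which the encoder never emits
            raise ValueError("homopolymer found: not a valid trellis output")
        trits.append(t)
        state = b
    return trits


print(encode_trits([0, 1, 2], seed="A"))  # CTG
```

The seed is not part of the output; it only fixes the state before the first trit, which is what allows segments to be chained later in this document.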
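A homopolymer occurs if the next base equals the previous base, i.e. $\delta(s, t) = s$. Since the transition table corresponds to $\delta(s, t) = (s + t + 1) \bmod 4$, this would require $t + 1 \equiv 0 \pmod 4$.
If $t \in \{0, 1, 2\}$, then $t + 1 \in \{1, 2, 3\}$, which is never divisible by 4, so no transition can repeat the previous base.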
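The homopolymer argument can also be checked exhaustively, since there are only 12 state and input combinations. The sketch below assumes the rule $(s + t + 1) \bmod 4$ implied by the transition table; the function name is hypothetical.

```python
def homopolymer_violations() -> list[tuple[int, int]]:
    """Return every (state, trit) pair whose next state equals the current state."""
    violations = []
    for s in range(4):       # A=0, C=1, G=2, T=3
        for t in range(3):   # input trits 0, 1, 2
            if (s + t + 1) % 4 == s:
                violations.append((s, t))
    return violations


print(homopolymer_violations())  # []
```

An empty list means no single step can emit the same base twice in a row.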
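The transition probabilities for any state are uniform ($P = 1/3$ for each of the three other bases) when the input trits are equally likely, so the long-run base frequencies are: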
- A: 25%
- C: 25%
- G: 25%
- T: 25%
Expected GC Content: $P(C) + P(G) = 25\% + 25\% = 50\%$
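The 25% figures can be reproduced numerically by iterating the base distribution under the transition rule. This is a rough sketch: it assumes the rule $(s + t + 1) \bmod 4$ implied by the transition table and, by default, equally likely trits. The function names and the step count are illustrative choices.

```python
def stationary_distribution(trit_probs=(1 / 3, 1 / 3, 1 / 3), steps=200):
    """Approximate the long-run base distribution by repeated propagation."""
    dist = [1.0, 0.0, 0.0, 0.0]  # start with all probability mass on A
    for _ in range(steps):
        nxt = [0.0, 0.0, 0.0, 0.0]
        for s, p_state in enumerate(dist):
            for t, p_trit in enumerate(trit_probs):
                nxt[(s + t + 1) % 4] += p_state * p_trit
        dist = nxt
    return dist  # order: A, C, G, T


def gc_content(dist) -> float:
    """Probability mass on C (index 1) plus G (index 2)."""
    return dist[1] + dist[2]


dist = stationary_distribution()
print([round(p, 4) for p in dist], round(gc_content(dist), 4))
```

The `trit_probs` parameter allows skewed trit frequencies to be explored. This matters in practice because the byte-to-trit decomposition does not produce perfectly uniform trits.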
To ensure that strands can be retrieved and decoded, all DNA strands generated by Helix must adhere to the Chained Trellis structure.
| Segment | Length | Description | Trellis Seed |
|---|---|---|---|
| FP (Forward Primer) | 20 bases | PCR amplification target (Zip Code). | N/A |
| Addr (Address) | 24 bases | Encoded Block ID + Shard Index. | Last base of FP |
| Payload (Data) | Variable | Encoded Binary Data (with CRC32). | Last base of Addr |
| RP (Reverse Primer) | 20 bases | PCR amplification target. | N/A |
To maintain the homopolymer constraint across segment boundaries, the first base of any seeded segment is calculated using the last base of the previous segment as the seed state (the Trellis Seed column above).
flowchart LR
subgraph "Forward Primer (20bp)"
FP_Head[...] --> FP_Tail[Base: T]
end
subgraph "Address (24bp)"
FP_Tail --> |"Seed State (T)"| Addr_Start[Base: A]
Addr_Start --> Addr_Body[...]
Addr_Body --> Addr_Tail[Base: G]
end
subgraph "Payload (Variable)"
Addr_Tail --> |"Seed State (G)"| Pay_Start[Base: C]
Pay_Start --> Pay_Body[...]
end
style FP_Tail fill:#f9f,stroke:#333,stroke-width:2px
style Addr_Start fill:#bbf,stroke:#333,stroke-width:2px
style Addr_Tail fill:#bbf,stroke:#333,stroke-width:2px
style Pay_Start fill:#bfb,stroke:#333,stroke-width:2px
$S_{\text{Addr\_Start}} = \delta(\text{Last}(\text{FP}), \text{Trit}_0)$

$S_{\text{Payload\_Start}} = \delta(\text{Last}(\text{Addr}), \text{Trit}_0)$
This guarantees that no "seams" exist in the DNA strand where a homopolymer could accidentally form (e.g., if Primer ends in A and Address starts in A).
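The chaining rule can be sketched end to end as follows. Several details here are assumptions rather than part of the specification: the example primer is a made-up 20-base sequence, the address is taken to be 4 bytes (4 bytes at 6 trits each gives the stated 24 bases), the internal layout of Block ID and Shard Index is not modelled, trits are emitted least-significant first, and the CRC32 and the Reverse Primer are left out. All names are hypothetical.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3


def encode_bytes(data: bytes, seed: str) -> str:
    """Encode bytes as bases (6 trits per byte), starting from the seed state."""
    state = seed
    out = []
    for byte in data:
        for _ in range(6):
            byte, trit = divmod(byte, 3)  # quotient carries on, remainder is the trit
            state = BASES[(BASES.index(state) + trit + 1) % 4]
            out.append(state)
    return "".join(out)


def build_strand(fp: str, addr_bytes: bytes, payload_bytes: bytes) -> str:
    """Concatenate FP + Addr + Payload, seeding each segment from the previous one."""
    if len(fp) != 20:
        raise ValueError("forward primer must be 20 bases")
    if len(addr_bytes) != 4:
        raise ValueError("address assumed to be 4 bytes (24 bases)")
    addr = encode_bytes(addr_bytes, seed=fp[-1])             # seeded by Last(FP)
    payload = encode_bytes(payload_bytes, seed=addr[-1])     # seeded by Last(Addr)
    return fp + addr + payload


def has_homopolymer(strand: str) -> bool:
    """True if any two adjacent bases are identical."""
    return any(a == b for a, b in zip(strand, strand[1:]))


EXAMPLE_FP = "ACGTACGTACGTACGTACGT"  # hypothetical primer, not a real Helix primer
strand = build_strand(EXAMPLE_FP, bytes([0, 0, 0, 1]), b"hi")
print(len(strand), has_homopolymer(strand))  # 56 False
```

The check only covers the seams this section defines. The primers themselves are fixed sequences and are not produced by the trellis, so the sketch does not verify them beyond their length.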
© 2026 Project Helix