# Distributed Seal Storage with Logarithmic Retention

## Problem Statement
Users with older seals must aggregate more proofs of non-inclusion into their account history proof. While aggregating one Merkle path per month requires minimal computational effort, keeping those historical proofs available is a potential challenge for zk-based cryptocurrencies as stores of value: users shouldn't risk losing their savings if they become temporarily inactive.
## Proposed Solution: Distributed Data Availability
Full nodes can implement a logarithmically decreasing storage strategy:
- Generate a random ID for each node
- Store 100% of seals from the current month
- Store a decreasing percentage of seals from previous months, based on proximity to the node's ID
- Proximity is determined by the number of leading zeros in the XOR of the seal and the node's ID
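The storage rule above can be sketched as follows. This is an illustrative sketch, not an existing implementation: the `should_store` helper and the byte-string IDs are assumptions, and the rule shown corresponds to a reduction factor of $R = 2$ (each extra leading zero bit halves the storage probability):

```python
def leading_zero_bits(x: int, width: int) -> int:
    """Number of leading zero bits of x in a width-bit representation."""
    return width if x == 0 else width - x.bit_length()

def should_store(node_id: bytes, seal_id: bytes, months_ago: int) -> bool:
    """With R = 2, a node keeps a seal from `months_ago` months back iff
    the XOR of the node's ID and the seal's ID has at least `months_ago`
    leading zero bits -- which holds for a random node/seal pair with
    probability 1/2^months_ago."""
    xor = int.from_bytes(node_id, "big") ^ int.from_bytes(seal_id, "big")
    return leading_zero_bits(xor, width=8 * len(node_id)) >= months_ago
```

Note that every node stores every current-month seal (`months_ago = 0` always passes), and the decision is deterministic per node/seal pair, so peers can predict which nodes to query.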
## Storage Distribution by Month

| Months ago | % of seals stored | Nodes needed for 99% availability |
|------------|-------------------|-----------------------------------|
| 0          | 100%              | 1                                 |
| 1          | 50%               | 8                                 |
| 2          | 25%               | 18                                |
| 3          | 12.5%             | 36                                |
| 4          | 6.25%             | 73                                |
| 5          | 3.125%            | 147                               |
| 6          | 1.56%             | 294                               |
| 12         | 0.024%            | 18,861                            |
## Mathematical Model
The number of nodes required can be expressed as:
$$
N = \left\lceil \frac{\log(1 - P)}{\log\left(1 - \frac{1}{R^M}\right)} \right\rceil
$$
Where:

- $P$ is the target probability of finding a seal from $M$ months ago
- $N$ is the required number of nodes
- $R$ is the reduction factor (rate of falloff) for seal storage each month
For example, with $R = 2$, we need $N = 18{,}861$ nodes to achieve a $P = 99\%$ chance of finding a seal from $M = 12$ months ago.
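The formula can be checked numerically with a short sketch (the `nodes_needed` helper is hypothetical; it assumes each node independently stores an $M$-month-old seal with probability $1/R^M$):

```python
import math

def nodes_needed(p_target: float, r: float, m: int) -> int:
    """Smallest N with 1 - (1 - 1/R^M)^N >= p_target, i.e.
    N = ceil(log(1 - P) / log(1 - 1/R^M))."""
    return math.ceil(math.log(1 - p_target) / math.log(1 - r**-m))

print(nodes_needed(0.99, 2, 12))  # -> 18861
```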
## Storage Overhead Analysis

The total storage requirement for each node depends on the reduction factor $R$. For $R = 2$, the storage overhead asymptotically approaches twice the average number of seals published per month.
This can be generalized as:
$$
S = \frac{1 - R^{-M}}{1 - \frac{1}{R}}, \quad R \neq 1
$$
Where:

- $S$ is the storage overhead factor
- $R$ and $M$ are as defined above
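As a sanity check, the overhead factor is just the geometric sum of the per-month retention fractions $R^0, R^{-1}, \dots, R^{-(M-1)}$. A minimal sketch (the `storage_overhead` helper is hypothetical):

```python
def storage_overhead(r: float, m: int) -> float:
    """S = (1 - R^-M) / (1 - 1/R): total storage, in units of one
    average month's seals, over M retained months."""
    return (1 - r**-m) / (1 - 1 / r)

# For R = 2 the overhead climbs toward 2 as M grows:
for months in (1, 6, 12):
    print(months, storage_overhead(2, months))
```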
## Query Efficiency
The system can be enhanced with a structured network topology (like a k-bucket DHT) allowing:
- Logarithmic query complexity ($O(\log_k(n))$ roundtrips)
- Fast retrieval of historical seals (~300ms based on Mainline DHT experience)
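The per-hop routing step of such a DHT can be sketched as below; this is illustrative only (`closest_nodes` is a hypothetical helper, and a real Kademlia-style implementation adds k-buckets, parallel queries, and caching):

```python
def closest_nodes(target: int, node_ids: list, k: int = 8) -> list:
    """Return the k node IDs nearest to `target` under XOR distance --
    the selection step each hop of a Kademlia-style lookup performs,
    which is what yields O(log n) roundtrips overall."""
    return sorted(node_ids, key=lambda n: n ^ target)[:k]

print(closest_nodes(0b1000, [0b1001, 0b0001, 0b1100], k=2))  # -> [9, 12]
```

Because a seal's storage rule is keyed to the same XOR metric, the nodes a lookup converges on are exactly the nodes most likely to hold the seal.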
## Conclusion
This approach enables:
- 99% data availability for seals up to 12 months old, for a network size comparable to the current Bitcoin node count.
- Only about double the storage requirement per node (with $R = 2$)
- Fast, logarithmic-time queries for historical data
This pattern could potentially be applied to Bitcoin's UTXO set once ZeroSync enables historical data pruning, reducing the cost of running a fully validating node.