feat: add archive-backed block storage independent of QMDB state root (#11)
tac0turtle merged 6 commits into `main` from
Conversation
Implements `BlockStorage<C>` in `crates/storage/src/block_store.rs` using
`commonware_storage::archive::prunable::Archive` as the storage backend.
## Design
QMDB produces a Merkle root (`CommitHash`) from every KV pair written via
`batch()` + `commit()`. Block data (headers, txs, receipts) must NOT
affect this hash. The `archive::prunable::Archive` is a completely
separate, non-Merkleized journal-backed store:
- Two partitions: `{prefix}-block-index` (key journal) and
`{prefix}-block-data` (value blob)
- Zero interaction with QMDB — no effect on `CommitHash`
- Lookup by block number (`u64`) or block hash (`[u8; 32]`)
- Pruning support: `prune(min_block)` deletes sections older than
the given block number
- Async write (`put`) and durable write (`put_sync`) variants
- `sync()` for batched fsync
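Since pruning operates at section granularity, the mapping from block number to section determines what `prune(min_block)` can actually delete. A std-only sketch of that arithmetic (hypothetical helper names; the real `archive::prunable::Archive` computes this internally):

```rust
/// Illustrative sketch of section-granularity pruning arithmetic.
/// The real prunable Archive performs this internally; names here
/// are hypothetical, not part of the actual API.

/// Section that a given block number falls into.
fn section_of(block_number: u64, blocks_per_section: u64) -> u64 {
    block_number / blocks_per_section
}

/// First block number that survives `prune(min_block)`: pruning drops
/// whole sections strictly older than the section containing `min_block`,
/// so earlier blocks in that same section are retained.
fn prune_floor(min_block: u64, blocks_per_section: u64) -> u64 {
    section_of(min_block, blocks_per_section) * blocks_per_section
}
```

This is why the design notes say pruning "deletes sections older than the given block number" rather than deleting exactly up to it.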
## Key types
- `BlockHash = FixedBytes<32>` — 32-byte block hash key
- `BlockStorageConfig` — configures partitions, section size, write
buffers, and replay buffer
- `BlockStorage<C>` — the storage wrapper (generic over runtime context)
- `BlockStorageError` — error type with `From<archive::Error>`
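The config validation discussed in the review (non-zero section and buffer sizes, enforced via `NonZeroU64`) can be sketched as follows. Field names follow the PR discussion; the exact struct shape and error type are assumptions, not the real `types.rs`:

```rust
use std::num::NonZeroU64;

/// Hypothetical mirror of BlockStorageConfig for illustration only.
struct BlockStorageConfig {
    partition_prefix: String,
    blocks_per_section: u64,
    key_write_buffer: u64,
    value_write_buffer: u64,
    replay_buffer: u64,
}

/// Validate sizing fields the way the constructor does: each must be
/// non-zero, with a descriptive error naming the offending field.
fn validate(cfg: &BlockStorageConfig) -> Result<(), String> {
    for (name, value) in [
        ("blocks_per_section", cfg.blocks_per_section),
        ("key_write_buffer", cfg.key_write_buffer),
        ("value_write_buffer", cfg.value_write_buffer),
        ("replay_buffer", cfg.replay_buffer),
    ] {
        NonZeroU64::new(value).ok_or_else(|| format!("{name} must be non-zero"))?;
    }
    if cfg.partition_prefix.is_empty() {
        return Err("partition_prefix must be non-empty".to_string());
    }
    Ok(())
}
```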
## Tests
8 new unit tests in `block_store::tests`:
- Basic put/get by number and hash
- `has_block_number` / `has_block_hash`
- `first_block_number` / `last_block_number`
- Idempotent put (second write to same index is a no-op)
- Multiple blocks
- Large blocks (>4KB, demonstrating no QMDB chunk-size constraint)
- `put_sync` durability test
- **Isolation test**: writes to `BlockStorage` do not change `QmdbStorage`
commit hash (the core acceptance criterion)
All 49 storage tests pass (`just test-pkg evolve_storage`).
`just quality` (fmt-check + clippy -D warnings) passes clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of Changes
This pull request introduces a dedicated, archive-backed block storage mechanism that is entirely separate from the existing QMDB state root. This design ensures that block data (headers, transactions, receipts) can be stored and managed without impacting the application's Merkle hash, which is crucial for maintaining the integrity of the state tree while efficiently handling large volumes of block-related information. The new BlockStorage provides a robust API for storing, retrieving, and pruning blocks, offering a scalable and performant solution for historical block data.
📝 Walkthrough
Adds an Archive-backed block archival subsystem: a new generic
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Producer/Caller
    participant DevConsensus as DevConsensus
    participant OnBlockArchive as OnBlockArchive
    participant BlockStorage as BlockStorage
    participant Archive as Archive
    Client->>DevConsensus: produce block (height, block_hash, block)
    DevConsensus->>OnBlockArchive: if configured, to_archived & spawn callback (async)
    OnBlockArchive--)BlockStorage: put / put_sync (height, hash, bytes)
    BlockStorage->>Archive: write value by section/number (value store)
    Archive-->>BlockStorage: ok
    BlockStorage->>Archive: write key journal (hash -> number)
    Archive-->>BlockStorage: ok
    BlockStorage-->>OnBlockArchive: ack (fire-and-forget)
    Client->>BlockStorage: get_by_number(number)
    BlockStorage->>Archive: read value by number
    Archive-->>BlockStorage: bytes?
    BlockStorage-->>Client: bytes?
    Client->>BlockStorage: get_by_hash(hash)
    BlockStorage->>Archive: lookup key journal -> number
    Archive-->>BlockStorage: number?
    BlockStorage->>Archive: read by number
    Archive-->>BlockStorage: bytes?
    BlockStorage-->>Client: bytes?
```
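The two-step `get_by_hash` path in the diagram (hash resolves to a number via the key journal, then the number reads bytes from the value store) can be sketched with plain maps. This is a std-only mock for illustration; the real store is journal-backed, not `HashMap`-backed:

```rust
use std::collections::HashMap;

/// Std-only mock of the two-step lookup shown in the sequence diagram.
struct MockBlockStore {
    by_number: HashMap<u64, Vec<u8>>,   // value store: number -> block bytes
    hash_index: HashMap<[u8; 32], u64>, // key journal: hash -> number
}

impl MockBlockStore {
    fn new() -> Self {
        Self { by_number: HashMap::new(), hash_index: HashMap::new() }
    }

    fn put(&mut self, number: u64, hash: [u8; 32], bytes: Vec<u8>) {
        self.by_number.insert(number, bytes);
        self.hash_index.insert(hash, number);
    }

    fn get_by_number(&self, number: u64) -> Option<&Vec<u8>> {
        self.by_number.get(&number)
    }

    /// Hash lookup resolves to a block number first, then reads by number,
    /// mirroring the two Archive round-trips in the diagram.
    fn get_by_hash(&self, hash: &[u8; 32]) -> Option<&Vec<u8>> {
        let number = self.hash_index.get(hash)?;
        self.by_number.get(number)
    }
}
```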
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
The pull request introduces a new BlockStorage module, providing an archive-backed block storage solution independent of QMDB's state root. This is a significant architectural change, ensuring that block data writes do not affect the application's Merkle hash. The implementation appears robust, with clear design rationale, API, and comprehensive unit tests, including a critical test verifying the independence from QMDB's commit hash. The new BlockStorageConfig in types.rs provides sensible defaults and necessary validation. The changes are well-documented and follow good practices for error handling and thread safety considerations. Overall, this is a well-executed feature addition that enhances the storage layer's flexibility and performance characteristics.
🧹 Nitpick comments (5)
crates/storage/src/block_store.rs (5)
33-34: Module-level `#![allow(clippy::disallowed_types)]` is overly broad.

The comment states this is needed for `Instant` only, but the blanket module-level allow will suppress warnings for any disallowed type used anywhere in this module (including tests). Consider scoping the allow to just the `put` method where `Instant` is used.

♻️ Suggested scoped allow

```diff
-// Instant is used for performance metrics only, not consensus logic.
-#![allow(clippy::disallowed_types)]
-
 use crate::types::{BlockHash, BlockStorageConfig};
```

Then on the `put` method:

```rust
#[allow(clippy::disallowed_types)]
pub async fn put(
    &mut self,
    ...
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/block_store.rs` around lines 33 - 34, The module currently has a blanket #[allow(clippy::disallowed_types)] at the top; narrow this to only the place using Instant by removing that module-level attribute and adding #[allow(clippy::disallowed_types)] on the put method instead (i.e., the pub async fn put(...) that uses std::time::Instant), so the clippy exception is scoped to the Instant usage and does not suppress warnings across the whole module.
149-165: `put_sync` lacks the timing/tracing instrumentation present in `put`.

`put` logs `block_number` and `elapsed_us` via `tracing::debug!`, but `put_sync` (which is described as the durable-write variant used "during block finalization") has no such instrumentation. Since `put_sync` includes an fsync and is likely to be slower, it would arguably benefit more from timing metrics.

♻️ Add tracing to put_sync

```diff
 pub async fn put_sync(
     &mut self,
     block_number: u64,
     block_hash: BlockHash,
     block_bytes: bytes::Bytes,
 ) -> Result<(), BlockStorageError> {
+    let start = Instant::now();
     self.archive
         .put_sync(block_number, block_hash, block_bytes)
         .await?;
+    tracing::debug!(
+        block_number,
+        elapsed_us = start.elapsed().as_micros(),
+        "block stored (sync)"
+    );
     Ok(())
 }
```

Also applies to: 234-244
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/block_store.rs` around lines 149 - 165, The put_sync function is missing timing/tracing present in put; add the same instrumentation: record let start = Instant::now() before calling self.archive.put_sync(...).await?, then after the await emit a tracing::debug! that includes block_number and elapsed_us = start.elapsed().as_micros() with a message like "block stored (durable)"; apply the same pattern to the other put_sync-like occurrence around lines 234-244, referencing the put_sync function and archive.put_sync call so the durable write latency is measured and logged.
89-141: Hardcoded buffer pool parameters and redundant zero-check.

Two observations:

- Lines 112–114: The key buffer pool page size (4096) and cache page count (64) are hardcoded with no way to configure them. If the key journal workload varies, callers have no knob to tune this. Consider exposing these in `BlockStorageConfig` or at minimum extracting them as named constants for discoverability.
- Lines 92–96 vs 116–119: The `blocks_per_section == 0` check on line 92 is explicitly redundant with the `NonZeroU64::new` check on line 116. The comment justifies it ("better error message"), which is fair, but you could simplify by removing the early check and using the same pattern as the other fields (a single `ok_or_else` producing the descriptive error).
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/block_store.rs` around lines 89 - 141, Remove the redundant explicit zero-check for blocks_per_section and consolidate validation using NonZeroU64::new as done for key_write_buffer/value_write_buffer/replay_buffer (replace the initial if config.blocks_per_section == 0 block with the existing NonZeroU64::new(...) ok_or_else path for blocks_per_section), and make the key buffer pool parameters configurable or at least named constants: extract the literals 4096 and 64 used to construct page_size and cache_pages into either new fields on BlockStorageConfig (e.g., key_journal_page_size, key_journal_cache_pages) or top-level named constants (e.g., DEFAULT_KEY_JOURNAL_PAGE_SIZE, DEFAULT_KEY_JOURNAL_CACHE_PAGES) and use those names when creating key_buffer_pool so callers can tune them or the values are discoverable (update uses in new and keep Archive::init(context, cfg).await? unchanged).
486-548: Isolation test: consider adding a state write after block storage writes for stronger proof.The current test writes state → commits → writes blocks → commits (no new state). This proves block writes don't add dirty state, but an even stronger test would interleave: write state → commit → write blocks → write more state → commit, and verify the second hash differs from the first only by the new state write, not the block writes. This would guard against subtle corruption where block storage accidentally shares the QMDB partition namespace.
That said, the current test is already valuable and covers the stated design requirement.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/block_store.rs` around lines 486 - 548, Add a stronger assertion to test_block_storage_does_not_affect_commit_hash: after writing blocks to BlockStorage (block_store.put/... and block_store.sync()), perform an additional qmdb.apply_batch (e.g., another crate::types::Operation::Set for a new key) and then call qmdb.commit_state() to obtain a new hash; verify that this new hash differs from hash_before and that the change matches only the new state write (i.e., block writes did not affect the qmdb state). Locate the existing qmdb.apply_batch, qmdb.commit_state(), block_store.put/sync() and extend the test to apply the extra state write and assert expected hash behavior.
58-59: `BlockStorageError::NotFound` variant is unused and should be removed or suppressed.

The `NotFound(u64)` variant is defined but never constructed anywhere in the codebase. The `get_by_number` and `get_by_hash` methods return `Option<bytes::Bytes>` for missing blocks (lines 173, 183), and all tests confirm missing blocks return `None` rather than triggering this error variant. The only active error variants are `Archive` and `InvalidConfig`. Either add `#[allow(dead_code)]` if this is reserved for future API expansion, or remove it to keep the error enum lean.
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/block_store.rs` around lines 58 - 59, The BlockStorageError::NotFound(u64) variant is defined but never used; open the BlockStorageError enum and either remove the NotFound(u64) variant entirely or mark it with #[allow(dead_code)] to suppress the warning; ensure you modify the enum where BlockStorageError is declared and keep get_by_number and get_by_hash behavior unchanged, and run tests to confirm no usages remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/storage/src/block_store.rs`:
- Around line 33-34: The module currently has a blanket
#[allow(clippy::disallowed_types)] at the top; narrow this to only the place
using Instant by removing that module-level attribute and adding
#[allow(clippy::disallowed_types)] on the put method instead (i.e., the pub
async fn put(...) that uses std::time::Instant), so the clippy exception is
scoped to the Instant usage and does not suppress warnings across the whole
module.
- Around line 149-165: The put_sync function is missing timing/tracing present
in put; add the same instrumentation: record let start = Instant::now() before
calling self.archive.put_sync(...).await?, then after the await emit a
tracing::debug! that includes block_number and elapsed_us =
start.elapsed().as_micros() with a message like "block stored (durable)"; apply
the same pattern to the other put_sync-like occurrence around lines 234-244,
referencing the put_sync function and archive.put_sync call so the durable write
latency is measured and logged.
- Around line 89-141: Remove the redundant explicit zero-check for
blocks_per_section and consolidate validation using NonZeroU64::new as done for
key_write_buffer/value_write_buffer/replay_buffer (replace the initial if
config.blocks_per_section == 0 block with the existing NonZeroU64::new(...)
ok_or_else path for blocks_per_section), and make the key buffer pool parameters
configurable or at least named constants: extract the literals 4096 and 64 used
to construct page_size and cache_pages into either new fields on
BlockStorageConfig (e.g., key_journal_page_size, key_journal_cache_pages) or
top-level named constants (e.g., DEFAULT_KEY_JOURNAL_PAGE_SIZE,
DEFAULT_KEY_JOURNAL_CACHE_PAGES) and use those names when creating
key_buffer_pool so callers can tune them or the values are discoverable (update
uses in new and keep Archive::init(context, cfg).await? unchanged).
- Around line 486-548: Add a stronger assertion to
test_block_storage_does_not_affect_commit_hash: after writing blocks to
BlockStorage (block_store.put/... and block_store.sync()), perform an additional
qmdb.apply_batch (e.g., another crate::types::Operation::Set for a new key) and
then call qmdb.commit_state() to obtain a new hash; verify that this new hash
differs from hash_before and that the change matches only the new state write
(i.e., block writes did not affect the qmdb state). Locate the existing
qmdb.apply_batch, qmdb.commit_state(), block_store.put/sync() and extend the
test to apply the extra state write and assert expected hash behavior.
- Around line 58-59: The BlockStorageError::NotFound(u64) variant is defined but
never used; open the BlockStorageError enum and either remove the NotFound(u64)
variant entirely or mark it with #[allow(dead_code)] to suppress the warning;
ensure you modify the enum where BlockStorageError is declared and keep
get_by_number and get_by_hash behavior unchanged, and run tests to confirm no
usages remain.
- Scope clippy::disallowed_types allow to put/put_sync methods only
- Add timing/tracing instrumentation to put_sync
- Extract key journal buffer pool constants
- Remove redundant blocks_per_section zero-check
- Remove unused BlockStorageError::NotFound variant
- Strengthen isolation test with post-block-write state mutation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/app/node/src/lib.rs (1)
598-659: ⚠️ Potential issue | 🟠 Major

`run_dev_node_with_rpc_and_mempool` does not wire the block archive — inconsistency with all other run paths.

`Encodable` was added to `Tx`'s bounds at line 561 (presumably to support archival), but neither `context_for_archive` nor `build_block_archive` are present in this function body. The constructed `DevConsensus` never receives `with_block_archive(cb)`, so `EVOLVE_BLOCK_ARCHIVE=1` has no effect when using this entrypoint. Every other run path (`run_dev_node_with_rpc`, `run_dev_node_with_rpc_and_mempool_eth`) correctly wires the callback.

🐛 Proposed fix

```diff
 async move {
     let context_for_shutdown = context.clone();
+    let context_for_archive = context.clone();
     let storage = (build_storage)(context, storage_config)
         .await
         .expect("failed to create storage");
     // ... (genesis / stf setup unchanged) ...
+    // Build block archive callback if enabled
+    let archive_cb = build_block_archive(context_for_archive).await;
     let mempool: SharedMempool<Mempool<Tx>> = new_shared_mempool();
-    let dev: Arc<DevConsensus<Stf, S, Codes, Tx, evolve_server::NoopChainIndex>> =
-        Arc::new(DevConsensus::with_mempool(stf, storage, codes, dev_config, mempool));
+    let mut consensus = DevConsensus::with_mempool(stf, storage, codes, dev_config, mempool);
+    if let Some(cb) = archive_cb {
+        consensus = consensus.with_block_archive(cb);
+    }
+    let dev: Arc<DevConsensus<Stf, S, Codes, Tx, evolve_server::NoopChainIndex>> =
+        Arc::new(consensus);
```
Verify each finding against the current code and only fix it if needed. In `@crates/app/node/src/lib.rs` around lines 598 - 659, run_dev_node_with_rpc_and_mempool currently never wires the block archive callback so EVOLVE_BLOCK_ARCHIVE has no effect; fix it by mirroring the other run paths: create a context_for_archive (clone of context), obtain/build the build_block_archive callback (clone Arc like build_genesis_stf/build_stf/build_codes), and if the block-archive feature/env is enabled call DevConsensus::with_mempool(...) and then call .with_block_archive(build_block_archive(context_for_archive)) (or pass the constructed callback) before wrapping in Arc; update the function body around DevConsensus creation to use with_block_archive so the created DevConsensus receives the archive callback.
🧹 Nitpick comments (5)
crates/storage/src/block_store.rs (2)
247-566: Missing test coverage for `prune()` and index-rebuild-on-restart.

Two advertised behaviors have no tests:

- `prune(min_block)` — the public API explicitly documents section-granularity pruning, but there is no test verifying that pruned blocks are no longer retrievable (and that blocks past the horizon are still available).
- Index rebuild on restart — the constructor doc states "the in-memory index is rebuilt from the key journal on startup." No test opens a store, writes blocks, drops it, reopens with the same context, and confirms `get_by_number`/`get_by_hash` still work. This is the most likely regression vector for the restart path.

Would you like me to draft both test cases?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/block_store.rs` around lines 247 - 566, Tests are missing for two behaviors: BlockStorage::prune(min_block) and the in-memory index rebuild on restart; add unit tests that (1) create a BlockStorage, write multiple blocks, call prune(min_block) and assert that blocks with numbers < min_block are no longer returned by get_by_number/get_by_hash while later blocks remain available, and (2) create a BlockStorage, write blocks, drop/close the store (end the runner context), reopen via BlockStorage::new with the same storage directory and confirm get_by_number/get_by_hash return the previously written blocks (verifying index rebuild from the key journal); use helper functions make_block_hash/make_block_bytes and existing Runner/TokioConfig pattern to manage runtime and temp directory setup.
109-110: Prefer typed `NonZero*` constants over runtime `.unwrap()`.

`KEY_JOURNAL_PAGE_SIZE` and `KEY_JOURNAL_CACHE_PAGES` are compile-time constants whose values are already known to be non-zero. Declaring them as `NonZeroU16`/`NonZeroUsize` removes the runtime `unwrap` and makes the invariant part of the type system:

♻️ Proposed refactor

```diff
-/// Page size for the key journal buffer pool (bytes).
-const KEY_JOURNAL_PAGE_SIZE: u16 = 4096;
-
-/// Number of cached pages in the key journal buffer pool.
-/// Total cache: KEY_JOURNAL_PAGE_SIZE * KEY_JOURNAL_CACHE_PAGES = 256KB by default.
-const KEY_JOURNAL_CACHE_PAGES: usize = 64;
+/// Page size for the key journal buffer pool (bytes).
+const KEY_JOURNAL_PAGE_SIZE: std::num::NonZeroU16 =
+    unsafe { std::num::NonZeroU16::new_unchecked(4096) };
+
+/// Number of cached pages in the key journal buffer pool.
+/// Total cache: 4096 * 64 = 256 KB.
+const KEY_JOURNAL_CACHE_PAGES: std::num::NonZeroUsize =
+    unsafe { std::num::NonZeroUsize::new_unchecked(64) };
```

Then the `new()` constructor simplifies to:

```diff
-        let page_size = std::num::NonZeroU16::new(KEY_JOURNAL_PAGE_SIZE).unwrap();
-        let cache_pages = std::num::NonZeroUsize::new(KEY_JOURNAL_CACHE_PAGES).unwrap();
-        let key_buffer_pool = PoolRef::new(page_size, cache_pages);
+        let key_buffer_pool = PoolRef::new(KEY_JOURNAL_PAGE_SIZE, KEY_JOURNAL_CACHE_PAGES);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/block_store.rs` around lines 109 - 110, Replace the runtime .unwrap() constructions for page_size and cache_pages by defining the constants KEY_JOURNAL_PAGE_SIZE and KEY_JOURNAL_CACHE_PAGES as typed NonZero constants and use them directly; specifically change the declarations so KEY_JOURNAL_PAGE_SIZE is a NonZeroU16 and KEY_JOURNAL_CACHE_PAGES is a NonZeroUsize, then assign page_size = KEY_JOURNAL_PAGE_SIZE and cache_pages = KEY_JOURNAL_CACHE_PAGES (removing std::num::NonZeroU16::new(...).unwrap() and std::num::NonZeroUsize::new(...).unwrap()); update any uses of those constants to the new typed names if necessary (symbols: KEY_JOURNAL_PAGE_SIZE, KEY_JOURNAL_CACHE_PAGES, page_size, cache_pages).

crates/app/server/src/dev.rs (2)
380-380: `produce_block_with_txs` exceeds the 70-line function limit.

The function spans ~133 lines. The archive block added by this PR pushes it further past the limit. Consider splitting out the archive notification and/or the index-and-publish logic into private helper methods to bring each under 70 lines.
As per coding guidelines: "Keep functions to less than 70 lines to maintain bounded complexity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/app/server/src/dev.rs` at line 380, The function produce_block_with_txs is much longer than the 70-line guideline; split it into smaller private helpers to reduce cognitive complexity: extract the archive-related logic into a new private function (e.g., send_archive_notification or archive_block) and extract the indexing-and-publishing logic into another private function (e.g., index_and_publish_block or publish_indexed_block); update produce_block_with_txs to call these helpers (preserving existing parameters and error handling) so the main function becomes a short orchestration that stays under 70 lines.
436-447: Redundant outer `tokio::spawn` — call the callback directly.

`OnBlockArchive` is a sync `Fn` that internally fires a `tokio::spawn` and returns `()` immediately. Wrapping the call in another `tokio::spawn` adds an extra task allocation with no benefit; the actual async work is already spawned inside the callback.

♻️ Proposed simplification

```diff
 if let Ok(encoded) = borsh::to_vec(&archived) {
     let cb = Arc::clone(cb);
     let archived_bytes = bytes::Bytes::from(encoded);
-    tokio::spawn(async move {
-        cb(height, block_hash, archived_bytes);
-    });
+    cb(height, block_hash, archived_bytes);
 } else {
```
Verify each finding against the current code and only fix it if needed. In `@crates/app/server/src/dev.rs` around lines 436 - 447, The code currently wraps calling the sync callback self.on_block_archive in an unnecessary tokio::spawn; remove the outer tokio::spawn and invoke the Arc-cloned callback directly after encoding: compute archived via block.to_archived, encode with borsh::to_vec, convert to bytes with bytes::Bytes::from, clone the Arc (Arc::clone(cb)) and then call cb(height, block_hash, archived_bytes) synchronously; keep the existing tracing::warn branch for encoding failure.

crates/app/server/src/block.rs (1)
179-189: `ArchivedBlock` is missing `#[derive(Debug)]`.

Every other public struct in this file derives `Debug`. Without it, archived blocks cannot be printed in tracing or test assertions.

♻️ Proposed fix

```diff
-#[derive(BorshSerialize, BorshDeserialize)]
+#[derive(BorshSerialize, BorshDeserialize, Debug)]
 pub struct ArchivedBlock {
```
Verify each finding against the current code and only fix it if needed. In `@crates/app/server/src/block.rs` around lines 179 - 189, The ArchivedBlock struct is missing Debug derivation which prevents printing in logs/tests; update the struct's derive list for ArchivedBlock to include Debug alongside BorshSerialize and BorshDeserialize (i.e., add #[derive(Debug, BorshSerialize, BorshDeserialize)]) so ArchivedBlock can be formatted in tracing and assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/app/node/src/lib.rs`:
- Around line 198-207: The callback currently spawns a tokio::spawn per block
(cb: OnBlockArchive) which allows out-of-order store.put_sync calls; replace
that with a bounded tokio::sync::mpsc channel and a single consumer task that
sequentially calls store.lock().await and guard.put_sync(...) to preserve block
order. Create tx/ rx (e.g., channel size 64) outside the callback, clone tx into
the closure and use tx.try_send((block_number, hash_bytes, block_bytes))
(handling send errors) instead of spawning; spawn one consumer task that loops
on rx.recv().await and performs the put_sync calls and logs failures via
tracing::warn!. Ensure the consumer holds the MutexGuard only while calling
put_sync so other logic remains unchanged.
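The ordering fix described above (one bounded channel, one sequential consumer, instead of one spawned task per block) can be illustrated with std threads. The review proposes the same shape with `tokio::sync::mpsc`; this std-only sketch just demonstrates why a single consumer preserves send order:

```rust
use std::sync::mpsc;
use std::thread;

/// Std-threads illustration of the single-consumer ordering pattern.
/// Returns the order in which "writes" were applied, which matches the
/// order in which items were sent, because exactly one consumer drains
/// the channel sequentially.
fn ordered_writes(blocks: Vec<(u64, Vec<u8>)>) -> Vec<u64> {
    // Bounded channel, analogous to tokio's mpsc::channel(64).
    let (tx, rx) = mpsc::sync_channel::<(u64, Vec<u8>)>(64);

    let consumer = thread::spawn(move || {
        let mut applied = Vec::new();
        // The put_sync-equivalent work would happen here, strictly in order.
        while let Ok((number, _bytes)) = rx.recv() {
            applied.push(number);
        }
        applied
    });

    for item in blocks {
        // Mirrors tx.try_send in the proposed fix; send errors would be logged.
        tx.send(item).expect("consumer alive");
    }
    drop(tx); // close the channel so the consumer loop exits

    consumer.join().expect("consumer panicked")
}
```

With per-block `tokio::spawn`, by contrast, the runtime may schedule the spawned tasks in any order, so durable writes could land out of sequence.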
In `@crates/app/server/Cargo.toml`:
- Line 22: The dependency "bytes" is declared with differing versions; add bytes
= "1.5" under [workspace.dependencies] in the root Cargo.toml and then change
each crate's declaration (the entries currently written as bytes = "1" or bytes
= "1.5") to bytes = { workspace = true } so the workspace resolves a single
version; update the three crates that currently declare "bytes" to use the
workspace override and ensure Cargo.lock is regenerated.
In `@crates/app/server/src/block.rs`:
- Around line 197-201: The current use of filter_map(|tx| tx.encode().ok())
silently drops failed encodings, producing ArchivedBlock entries whose
transactions.len() does not match the original tx_count and corrupting archives;
change the code that builds the transactions vector (the iterator over
self.transactions and the tx.encode() call) to propagate encoding errors instead
of swallowing them—return a Result or propagate an error from the function that
constructs ArchivedBlock, and update callers (including dev.rs) to handle the
error by logging a warning and skipping archival for that block rather than
writing a partial entry; ensure any invariant checks/assertions compare
ArchivedBlock.transactions.len() against tx_count and fail fast on mismatch.
In `@crates/storage/src/block_store.rs`:
- Around line 491-565: The test function
test_block_storage_does_not_affect_commit_hash is over the 70-line limit; split
it into two smaller tests (e.g., test_block_storage_isolation_commits and
test_block_storage_allows_subsequent_state_mutation) or extract the shared setup
into a helper like setup_qmdb_and_blockstore that returns (qmdb, block_store,
runner/context) and then implement two focused test functions that reuse that
helper—ensure each new test performs only one logical assertion (isolation check
vs. post-write state-mutation verification) so no function exceeds the line
limit.
- Around line 141-158: The code currently uses std::time::Instant::now() in put
and put_sync which violates the block-time-from-context guideline; replace those
uses with the context/Clock-provided time (e.g., call context.now() or the
Clock<T>::now equivalent before and after the archive.put/put_sync calls),
compute elapsed using the difference between the two context times, and use that
elapsed value in the tracing::debug elapsed_us field (update any local variable
named start and the elapsed calculation accordingly in the put and put_sync
methods).
- Around line 47-54: Replace the thiserror-based BlockStorageError enum with the
project's define_error! macro so errors are registered with compile-time codes:
convert BlockStorageError to use define_error!, assign numeric codes in the
mandated ranges (choose a system-range code 0x40–0x7F for Archive and a
business/validation-appropriate code for InvalidConfig as required), preserve
the original variant names Archive and InvalidConfig and ensure Archive still
wraps commonware_storage::archive::Error (keep the source conversion/From
mapping if the macro supports it) and InvalidConfig carries a String message;
update any uses of BlockStorageError accordingly.
- Around line 56-80: The block-level doc comment is currently attached to the
constant KEY_JOURNAL_PAGE_SIZE (and thus KEY_JOURNAL_CACHE_PAGES) instead of the
public type BlockStorage; move the constant declarations (KEY_JOURNAL_PAGE_SIZE
and KEY_JOURNAL_CACHE_PAGES) above the doc comment block or insert a blank line
between the doc comment and the first constant so the /// doc attaches to pub
struct BlockStorage<C> (retain the same doc text and ensure the constants remain
private and unchanged).
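The `filter_map(|tx| tx.encode().ok())` concern above (silently dropped encodings shrinking the transactions vector) is typically fixed by collecting into a `Result`, which fails fast on the first error. A std-only sketch, where `encode_tx` is a hypothetical stand-in for `tx.encode()`:

```rust
/// Hypothetical stand-in for tx.encode(): fails on an "unencodable" input.
fn encode_tx(tx: &str) -> Result<Vec<u8>, String> {
    if tx.is_empty() {
        Err("empty tx cannot be encoded".to_string())
    } else {
        Ok(tx.as_bytes().to_vec())
    }
}

/// Propagate the first encode error instead of dropping failed items,
/// so the encoded vector's length always matches the input tx count.
fn encode_all(txs: &[&str]) -> Result<Vec<Vec<u8>>, String> {
    // collect::<Result<Vec<_>, _>> short-circuits on the first Err,
    // unlike filter_map(.ok()), which silently shrinks the vector.
    txs.iter().map(|&tx| encode_tx(tx)).collect()
}
```

The caller can then log a warning and skip archival for the whole block on error, rather than writing a partial entry.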
---
Outside diff comments:
In `@crates/app/node/src/lib.rs`:
- Around line 598-659: run_dev_node_with_rpc_and_mempool currently never wires
the block archive callback so EVOLVE_BLOCK_ARCHIVE has no effect; fix it by
mirroring the other run paths: create a context_for_archive (clone of context),
obtain/build the build_block_archive callback (clone Arc like
build_genesis_stf/build_stf/build_codes), and if the block-archive feature/env
is enabled call DevConsensus::with_mempool(...) and then call
.with_block_archive(build_block_archive(context_for_archive)) (or pass the
constructed callback) before wrapping in Arc; update the function body around
DevConsensus creation to use with_block_archive so the created DevConsensus
receives the archive callback.
---
Nitpick comments:
In `@crates/app/server/src/block.rs`:
- Around line 179-189: The ArchivedBlock struct is missing Debug derivation
which prevents printing in logs/tests; update the struct's derive list for
ArchivedBlock to include Debug alongside BorshSerialize and BorshDeserialize
(i.e., add #[derive(Debug, BorshSerialize, BorshDeserialize)]) so ArchivedBlock
can be formatted in tracing and assertions.
In `@crates/app/server/src/dev.rs`:
- Line 380: The function produce_block_with_txs is much longer than the 70-line
guideline; split it into smaller private helpers to reduce cognitive complexity:
extract the archive-related logic into a new private function (e.g.,
send_archive_notification or archive_block) and extract the
indexing-and-publishing logic into another private function (e.g.,
index_and_publish_block or publish_indexed_block); update produce_block_with_txs
to call these helpers (preserving existing parameters and error handling) so the
main function becomes a short orchestration that stays under 70 lines.
- Around line 436-447: The code currently wraps calling the sync callback
self.on_block_archive in an unnecessary tokio::spawn; remove the outer
tokio::spawn and invoke the Arc-cloned callback directly after encoding: compute
archived via block.to_archived, encode with borsh::to_vec, convert to bytes with
bytes::Bytes::from, clone the Arc (Arc::clone(cb)) and then call cb(height,
block_hash, archived_bytes) synchronously; keep the existing tracing::warn
branch for encoding failure.
In `@crates/storage/src/block_store.rs`:
- Around line 247-566: Tests are missing for two behaviors:
BlockStorage::prune(min_block) and the in-memory index rebuild on restart; add
unit tests that (1) create a BlockStorage, write multiple blocks, call
prune(min_block) and assert that blocks with numbers < min_block are no longer
returned by get_by_number/get_by_hash while later blocks remain available, and
(2) create a BlockStorage, write blocks, drop/close the store (end the runner
context), reopen via BlockStorage::new with the same storage directory and
confirm get_by_number/get_by_hash return the previously written blocks
(verifying index rebuild from the key journal); use helper functions
make_block_hash/make_block_bytes and existing Runner/TokioConfig pattern to
manage runtime and temp directory setup.
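The prune contract the suggested test should pin down can be modeled with a tiny stand-in, independent of the real archive-backed store. A `BTreeMap` here is purely illustrative: the real store prunes whole journal sections, but the observable behavior the test asserts is the same.

```rust
use std::collections::BTreeMap;

// Minimal model of the prune(min_block) contract: blocks numbered below
// min_block vanish, later blocks stay retrievable. The BTreeMap stands in
// for the archive index; this is not the project's implementation.
fn prune(blocks: &mut BTreeMap<u64, Vec<u8>>, min_block: u64) {
    *blocks = blocks.split_off(&min_block);
}

fn main() {
    let mut blocks: BTreeMap<u64, Vec<u8>> = (0..10).map(|n| (n, vec![n as u8])).collect();
    prune(&mut blocks, 5);
    assert!(blocks.get(&4).is_none()); // pruned
    assert_eq!(blocks.get(&7).map(|b| b[0]), Some(7)); // still readable
    assert_eq!(blocks.keys().next().copied(), Some(5)); // new first_block_number
    println!("prune contract holds");
}
```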
- Around line 109-110: Replace the runtime .unwrap() constructions for page_size
and cache_pages by defining the constants KEY_JOURNAL_PAGE_SIZE and
KEY_JOURNAL_CACHE_PAGES as typed NonZero constants and use them directly;
specifically change the declarations so KEY_JOURNAL_PAGE_SIZE is a NonZeroU16
and KEY_JOURNAL_CACHE_PAGES is a NonZeroUsize, then assign page_size =
KEY_JOURNAL_PAGE_SIZE and cache_pages = KEY_JOURNAL_CACHE_PAGES (removing
std::num::NonZeroU16::new(...).unwrap() and
std::num::NonZeroUsize::new(...).unwrap()); update any uses of those constants
to the new typed names if necessary (symbols: KEY_JOURNAL_PAGE_SIZE,
KEY_JOURNAL_CACHE_PAGES, page_size, cache_pages).
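The typed-constant form that comment asks for can be written with const evaluation, so the non-zero check happens at compile time instead of via a runtime `.unwrap()`. The numeric values below are placeholders, not the crate's actual defaults.

```rust
use std::num::{NonZeroU16, NonZeroUsize};

// Hypothetical typed constants replacing runtime NonZero*::new(...).unwrap().
// A zero literal would fail compilation via the const panic, so no runtime
// unwrap is needed. 4096 / 64 are illustrative placeholders only.
const KEY_JOURNAL_PAGE_SIZE: NonZeroU16 = match NonZeroU16::new(4096) {
    Some(v) => v,
    None => panic!("page size must be non-zero"),
};
const KEY_JOURNAL_CACHE_PAGES: NonZeroUsize = match NonZeroUsize::new(64) {
    Some(v) => v,
    None => panic!("cache pages must be non-zero"),
};

fn main() {
    // Call sites then use the constants directly, e.g. page_size = KEY_JOURNAL_PAGE_SIZE.
    assert_eq!(KEY_JOURNAL_PAGE_SIZE.get(), 4096);
    assert_eq!(KEY_JOURNAL_CACHE_PAGES.get(), 64);
    println!("typed NonZero constants ok");
}
```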
```rust
#[derive(Debug, Error)]
pub enum BlockStorageError {
    #[error("archive error: {0}")]
    Archive(#[from] commonware_storage::archive::Error),

    #[error("invalid configuration: {0}")]
    InvalidConfig(String),
}
```
Use define_error! instead of thiserror::Error.
The error enum uses #[derive(Error)] from thiserror, which bypasses compile-time error-code registration. As per coding guidelines, errors must be defined with the define_error! macro and assigned codes in the mandated ranges (0x00–0x3F validation, 0x40–0x7F system, 0x80–0xBF business logic).
As per coding guidelines: "Define errors using the define_error! macro for compile-time error registration with error codes in ranges: 0x00-0x3F (validation), 0x40-0x7F (system), 0x80-0xBF (business logic)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/storage/src/block_store.rs` around lines 47 - 54, Replace the
thiserror-based BlockStorageError enum with the project's define_error! macro so
errors are registered with compile-time codes: convert BlockStorageError to use
define_error!, assign numeric codes in the mandated ranges (choose a
system-range code 0x40–0x7F for Archive and a business/validation-appropriate
code for InvalidConfig as required), preserve the original variant names Archive
and InvalidConfig and ensure Archive still wraps
commonware_storage::archive::Error (keep the source conversion/From mapping if
the macro supports it) and InvalidConfig carries a String message; update any
uses of BlockStorageError accordingly.
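Since `define_error!` is a project-internal macro whose exact syntax is not shown in this thread, the scheme it presumably enforces can be illustrated with a hand-rolled equivalent. The variant codes and the `code()` method below are assumptions for illustration, not the macro's real expansion.

```rust
// Hypothetical hand-rolled equivalent of what a define_error!-style macro
// might generate: each variant carries a stable numeric code in the mandated
// ranges (0x00-0x3F validation, 0x40-0x7F system, 0x80-0xBF business logic).
#[derive(Debug)]
enum BlockStorageError {
    Archive(String),       // system-range error, wraps the archive error message
    InvalidConfig(String), // validation-range error
}

impl BlockStorageError {
    fn code(&self) -> u8 {
        match self {
            BlockStorageError::Archive(_) => 0x40,       // system: 0x40-0x7F
            BlockStorageError::InvalidConfig(_) => 0x01, // validation: 0x00-0x3F
        }
    }
}

fn main() {
    let e = BlockStorageError::InvalidConfig("section size must be > 0".into());
    assert_eq!(e.code(), 0x01);
    assert_eq!(BlockStorageError::Archive("io".into()).code(), 0x40);
    println!("error codes in mandated ranges");
}
```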
```rust
#[allow(clippy::disallowed_types)] // Instant is for metrics only, not consensus.
pub async fn put(
    &mut self,
    block_number: u64,
    block_hash: BlockHash,
    block_bytes: bytes::Bytes,
) -> Result<(), BlockStorageError> {
    let start = std::time::Instant::now();
    self.archive
        .put(block_number, block_hash, block_bytes)
        .await?;
    tracing::debug!(
        block_number,
        elapsed_us = start.elapsed().as_micros(),
        "block stored"
    );
    Ok(())
}
```
std::time::Instant violates the block-time-from-context guideline.
Both put (line 148) and put_sync (line 234) reach for std::time::Instant::now(). The #[allow(clippy::disallowed_types)] suppresses the linter, but the coding guideline mandates using block time from the runtime context to avoid non-deterministic time sources. C is already bounded by Clock, which provides a deterministic elapsed-time source. Use context.now() (or the equivalent Clock method) instead:
⏱️ Proposed fix for `put` (same pattern applies to `put_sync`)

```diff
-    #[allow(clippy::disallowed_types)] // Instant is for metrics only, not consensus.
     pub async fn put(
         &mut self,
+        context: &C,
         block_number: u64,
         block_hash: BlockHash,
         block_bytes: bytes::Bytes,
     ) -> Result<(), BlockStorageError> {
-        let start = std::time::Instant::now();
+        let start = context.now();
         self.archive
             .put(block_number, block_hash, block_bytes)
             .await?;
         tracing::debug!(
             block_number,
-            elapsed_us = start.elapsed().as_micros(),
+            elapsed_us = context.now().duration_since(start).as_micros(),
             "block stored"
         );
         Ok(())
     }
```

As per coding guidelines: "Use block time from context instead of std::time::Instant or SystemTime to avoid non-deterministic time sources."
Also applies to: 227-244
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/storage/src/block_store.rs` around lines 141 - 158, The code currently
uses std::time::Instant::now() in put and put_sync which violates the
block-time-from-context guideline; replace those uses with the
context/Clock-provided time (e.g., call context.now() or the Clock<T>::now
equivalent before and after the archive.put/put_sync calls), compute elapsed
using the difference between the two context times, and use that elapsed value
in the tracing::debug elapsed_us field (update any local variable named start
and the elapsed calculation accordingly in the put and put_sync methods).
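The injected-clock shape that fix implies can be sketched in isolation. The trait and method names below are illustrative stand-ins, not commonware's actual `Clock` API; the point is that production code measures elapsed time only through the injected source, so tests can make it deterministic.

```rust
// Sketch of the "time from context" pattern the review asks for. Trait and
// method names are assumptions for illustration.
trait Clock {
    fn now_micros(&self) -> u64;
}

// Deterministic fake: each observation advances time by exactly 250us.
struct FixedClock(std::cell::Cell<u64>);
impl Clock for FixedClock {
    fn now_micros(&self) -> u64 {
        let t = self.0.get();
        self.0.set(t + 250);
        t
    }
}

fn timed_put<C: Clock>(clock: &C) -> u64 {
    let start = clock.now_micros();
    // ... archive.put(...) would run here ...
    clock.now_micros() - start
}

fn main() {
    let clock = FixedClock(std::cell::Cell::new(0));
    let elapsed_us = timed_put(&clock);
    assert_eq!(elapsed_us, 250); // fully deterministic under the fake clock
    println!("elapsed_us = {elapsed_us}");
}
```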
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/app/server/src/block.rs (1)
179-189: Consider adding standard derives to the public `ArchivedBlock` struct.
`Debug`, `Clone`, and `PartialEq` are expected for any public data type and are needed for test assertions and diagnostic logging. All field types (`u64`, `[u8; 32]`, `Vec<Vec<u8>>`) already implement these traits.
♻️ Proposed change

```diff
-#[derive(BorshSerialize, BorshDeserialize)]
+#[derive(Debug, Clone, PartialEq, BorshSerialize, BorshDeserialize)]
 pub struct ArchivedBlock {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/app/server/src/block.rs` around lines 179 - 189, The public ArchivedBlock struct currently derives only BorshSerialize and BorshDeserialize; add the standard derives Debug, Clone, and PartialEq to the derive attribute for ArchivedBlock so tests and logging can use it (i.e., change the derive on the ArchivedBlock definition to include Debug, Clone, PartialEq alongside BorshSerialize and BorshDeserialize).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/app/node/src/lib.rs`:
- Line 572: The function run_dev_node_with_rpc_and_mempool declares Tx:
Encodable but never uses the archive machinery; either remove the Encodable
bound or wire archival like the other variants—preferably wire it: clone
context_for_archive (as done in run_dev_node_with_rpc /
run_dev_node_with_rpc_and_mempool_eth), pass that clone into build_block_archive
and ensure the archive callback is registered in
run_dev_node_with_rpc_and_mempool so EVOLVE_BLOCK_ARCHIVE=1 actually triggers
archival.
In `@crates/app/server/src/dev.rs`:
- Around line 434-457: The spawned task doesn't move the heavy work off the hot
path because block.to_archived(...) and borsh::to_vec(&archived) run
synchronously; remove the unnecessary tokio::spawn and invoke the non-blocking
callback directly (call cb(height, block_hash, archived_bytes) after creating
archived_bytes) inside the on_block_archive branch in produce_block_with_txs,
referencing on_block_archive, block.to_archived, borsh::to_vec and try_send; if
you do want serialization off-path instead, move ownership of block (or clone
the needed fields/require Tx: Clone) into the spawned task so block.to_archived
and borsh::to_vec run inside tokio::spawn.
---
Duplicate comments:
In `@crates/storage/src/block_store.rs`:
- Around line 47-54: Replace the thiserror-based enum BlockStorageError with the
project's define_error! macro and assign error codes: declare BlockStorageError
via define_error! { pub enum BlockStorageError { Archive = 0x40, InvalidConfig =
0x01 } } (or similar values in the required ranges) so Archive (wrapping
commonware_storage::archive::Error) lives in the system range (0x40-0x7F) and
InvalidConfig uses a validation code in 0x00-0x3F; preserve the #[from]
conversion for Archive so source errors are retained and keep the human-readable
messages for both variants.
- Around line 141-158: The put and put_sync methods currently use
std::time::Instant (start.elapsed()) which violates the block-time-from-context
guideline; update both functions (put and put_sync) to stop creating Instant and
instead accept a block time from the caller (e.g., add a block_time or
block_timestamp parameter of the project's canonical block-time type) and use
that value in the tracing::debug log (replace elapsed_us =
start.elapsed().as_micros() with the context-provided block time or a derived
value based on it), and remove the std::time::Instant usage and its allow
attribute; ensure signature changes are propagated to callers.
---
Nitpick comments:
In `@crates/app/server/src/block.rs`:
- Around line 179-189: The public ArchivedBlock struct currently derives only
BorshSerialize and BorshDeserialize; add the standard derives Debug, Clone, and
PartialEq to the derive attribute for ArchivedBlock so tests and logging can use
it (i.e., change the derive on the ArchivedBlock definition to include Debug,
Clone, PartialEq alongside BorshSerialize and BorshDeserialize).
```rust
// Archive the block if a callback is configured.
// Runs off the hot path via a spawned task (fire-and-forget).
if let Some(ref cb) = self.on_block_archive {
    match block.to_archived(block_hash, state_root, gas_used) {
        Ok(archived) => {
            if let Ok(encoded) = borsh::to_vec(&archived) {
                let cb = Arc::clone(cb);
                let archived_bytes = bytes::Bytes::from(encoded);
                tokio::spawn(async move {
                    cb(height, block_hash, archived_bytes);
                });
            } else {
                tracing::warn!("Failed to borsh-encode archived block {}", height);
            }
        }
        Err(e) => {
            tracing::warn!(
                "Failed to encode transactions for block {}: {:?}",
                height,
                e
            );
        }
    }
}
```
The comment "Runs off the hot path via a spawned task" is inaccurate; the heavy work is synchronous.
block.to_archived(...) (encodes every transaction) and borsh::to_vec(&archived) (serialises the full block) both run synchronously inside produce_block_with_txs before the tokio::spawn. Only the resulting try_send (which is already non-blocking) is deferred. The spawn therefore adds task-creation overhead without moving the expensive serialisation off the critical path. It can be eliminated:
♻️ Proposed fix — call the non-blocking callback directly
```diff
 if let Ok(encoded) = borsh::to_vec(&archived) {
-    let cb = Arc::clone(cb);
     let archived_bytes = bytes::Bytes::from(encoded);
-    tokio::spawn(async move {
-        cb(height, block_hash, archived_bytes);
-    });
+    cb(height, block_hash, archived_bytes);
 } else {
```

If the goal is to move serialisation off-path, `block` (which is `Block<Tx>`) would need to be moved into the spawned task — requiring `Tx: Clone` — or the block fields needed for `to_archived` cloned individually before spawning.
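The review's point, that only the cheap handoff is deferred, can be demonstrated with a self-contained model of direct invocation. The `ArchiveCallback` alias and closure body below are illustrative stand-ins, not the project's actual type; the closure models a bounded-channel `try_send`.

```rust
use std::sync::{Arc, Mutex};

// Stand-in for on_block_archive: an Arc'd callback whose body is a cheap,
// non-blocking handoff. Because the expensive encoding happens on the
// producer thread either way, invoking the callback directly avoids
// per-block task-spawn overhead.
type ArchiveCallback = Arc<dyn Fn(u64, [u8; 32], Vec<u8>) + Send + Sync>;

fn archive_directly(cb: &ArchiveCallback, height: u64, hash: [u8; 32], encoded: Vec<u8>) {
    // No tokio::spawn: the callback runs synchronously after encoding.
    cb(height, hash, encoded);
}

fn main() {
    let received: Arc<Mutex<Vec<u64>>> = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&received);
    let cb: ArchiveCallback = Arc::new(move |height, _hash, _bytes| {
        sink.lock().unwrap().push(height); // models try_send on a bounded channel
    });

    let encoded = vec![0u8; 64]; // stands in for borsh::to_vec(&archived)
    archive_directly(&cb, 7, [0u8; 32], encoded);

    assert_eq!(received.lock().unwrap().as_slice(), &[7]);
    println!("callback invoked synchronously");
}
```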
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/app/server/src/dev.rs` around lines 434 - 457, The spawned task
doesn't move the heavy work off the hot path because block.to_archived(...) and
borsh::to_vec(&archived) run synchronously; remove the unnecessary tokio::spawn
and invoke the non-blocking callback directly (call cb(height, block_hash,
archived_bytes) after creating archived_bytes) inside the on_block_archive
branch in produce_block_with_txs, referencing on_block_archive,
block.to_archived, borsh::to_vec and try_send; if you do want serialization
off-path instead, move ownership of block (or clone the needed fields/require
Tx: Clone) into the spawned task so block.to_archived and borsh::to_vec run
inside tokio::spawn.
Block archival is a required subsystem — every produced block must be persisted. Remove the EVOLVE_BLOCK_ARCHIVE env var check and make build_block_archive unconditional (panics on init failure). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🧹 Nitpick comments (2)
crates/app/node/src/lib.rs (2)
188-189: Consumer task handle is discarded; archival failures are silently swallowed.
Dropping the `JoinHandle` from `tokio::spawn` means a panic inside the consumer (e.g., within `put_sync`) will terminate archival without any observable signal to the rest of the node — it will appear healthy while silently no longer persisting blocks. Consider storing the handle or using a supervision loop.
The `let mut store = store;` rebind on line 189 is purely to introduce mutability; it can be folded into the variable declaration at the point where `store` is first bound.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/app/node/src/lib.rs` around lines 188 - 189, The spawned consumer task's JoinHandle is being dropped (tokio::spawn call) which hides panics/failures (e.g., in put_sync); change the code to keep and supervise the JoinHandle (assign the result of tokio::spawn to a variable like consumer_handle and either await it, monitor it, or spawn a supervisor that restarts the task on failure) so archival errors are surfaced or retried; additionally remove the redundant rebind `let mut store = store;` by making the original `store` binding mutable where it is first declared so you don't rebind just to gain mutability.
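The supervision idea can be sketched with `std::thread`; with tokio, the equivalent pattern is keeping and awaiting the `JoinHandle` returned by `tokio::spawn`. The function names below are illustrative.

```rust
use std::thread;

// Keeping the JoinHandle instead of dropping it lets the parent observe a
// consumer panic rather than silently losing archival.
fn run_consumer() -> thread::Result<u64> {
    let handle = thread::spawn(|| {
        // stands in for the archive consumer loop calling put_sync
        42u64
    });
    handle.join() // surfaces a panic as Err instead of swallowing it
}

fn main() {
    match run_consumer() {
        Ok(processed) => println!("consumer exited cleanly: {processed}"),
        Err(_) => eprintln!("consumer panicked; archival stopped"),
    }
}
```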
1013-1013: Remove unnecessary `Encodable` bound on `Tx` in `init_dev_node`.
This function only runs genesis—it never constructs `DevConsensus` or calls any DevConsensus methods. While DevConsensus does have impl blocks requiring `Tx: Encodable`, they apply only to specific methods (block production/archival). The struct definition itself requires only `Tx: MempoolTx`. The `Encodable` bound here unnecessarily restricts callers, preventing genesis tooling from using non-encodable stub transaction types.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/app/node/src/lib.rs` at line 1013, The generic bound on Tx in init_dev_node is overly restrictive—remove the Encodable constraint so the signature uses Tx: Transaction + MempoolTx + Send + Sync + 'static (leaving Encodable only where required by DevConsensus impls); locate the init_dev_node function declaration and delete or omit the Encodable trait from the Tx bounds while keeping Transaction and MempoolTx so genesis-only callers can use non-encodable stub transaction types.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/app/node/src/lib.rs`:
- Around line 601-654: The dev node startup currently consumes context via
build_storage so build_block_archive is never called and the produced
DevConsensus (from DevConsensus::with_mempool) is not given a block archive;
clone the context before calling build_storage (use the existing
context_for_shutdown pattern to create context_for_archive), call
build_block_archive(context_for_archive, storage.clone(), /*...*/).await? to
create the archive, and then chain .with_block_archive(archive) onto the
DevConsensus::with_mempool(...) construction so the node archives produced
blocks; ensure you use the same types/Encodable bounds and pass storage/other
required args to build_block_archive consistent with other code paths.
---
Nitpick comments:
In `@crates/app/node/src/lib.rs`:
- Around line 188-189: The spawned consumer task's JoinHandle is being dropped
(tokio::spawn call) which hides panics/failures (e.g., in put_sync); change the
code to keep and supervise the JoinHandle (assign the result of tokio::spawn to
a variable like consumer_handle and either await it, monitor it, or spawn a
supervisor that restarts the task on failure) so archival errors are surfaced or
retried; additionally remove the redundant rebind `let mut store = store;` by
making the original `store` binding mutable where it is first declared so you
don't rebind just to gain mutability.
- Line 1013: The generic bound on Tx in init_dev_node is overly
restrictive—remove the Encodable constraint so the signature uses Tx:
Transaction + MempoolTx + Send + Sync + 'static (leaving Encodable only where
required by DevConsensus impls); locate the init_dev_node function declaration
and delete or omit the Encodable trait from the Tx bounds while keeping
Transaction and MempoolTx so genesis-only callers can use non-encodable stub
transaction types.
Summary
- `BlockStorage<C>` in `crates/storage/src/block_store.rs` using `commonware_storage::archive::prunable::Archive`
- Block data lives outside QMDB, so writes never affect the `CommitHash`
- API: `put` / `get_by_number` / `get_by_hash` / `prune` / `sync` / `put_sync`
- `BlockHash` type alias and `BlockStorageConfig` with sensible defaults

Design
QMDB does not support out-of-Merkle writes or namespacing. The solution uses a separate `archive::prunable::Archive` instance with its own key journal and value blob partitions, completely decoupled from the state QMDB.

Files changed
- `crates/storage/src/block_store.rs` — new module (+549 lines)
- `crates/storage/src/lib.rs` — re-exports
- `crates/storage/src/types.rs` — BlockHash, BlockStorageConfig types

Test plan
- `just quality` passes (fmt + clippy zero warnings)
- `just check` passes workspace-wide

🤖 Generated with Claude Code