fix: make Store's public methods return Result<T,Error>, propagate and update every callers accordingly#340
Conversation
Greptile SummaryThis PR replaces all panicking
Confidence Score: 3/5Safe to merge after fixing the walk-up loop DB-error handling; all other changes are mechanical and correct One P1 logic bug: a storage read error in the pending-block walk-up loop is silently coerced to "block not found", causing spurious P2P requests that restart the cycle on re-delivery. The rest of the error-propagation work is solid and the happy path is unaffected. crates/blockchain/src/lib.rs — the
|
| Filename | Overview |
|---|---|
| crates/storage/src/store.rs | Core store: all public methods now return Result; .expect() replaced with ?; error propagation is clean and consistent throughout |
| crates/blockchain/src/lib.rs | Actor layer updated to propagate storage errors; P1 issue: while let Ok(Some(...)) in walk-up loop silently converts DB errors to missing-block, triggering spurious network requests |
| crates/blockchain/src/store.rs | Fork-choice logic updated to propagate errors; redundant store.time() DB read inside hot tick loop (P2) |
| crates/net/p2p/src/req_resp/handlers.rs | Block-by-root handler and build_status updated; DB errors in block lookup silently swallowed without logging (P2) |
| crates/net/rpc/src/fork_choice.rs | Introduced local try_store! macro for 500 responses; macro discards error details without logging (P2) |
| crates/net/rpc/src/lib.rs | RPC handlers correctly map storage errors to 500 and missing resources to 404 |
| bin/ethlambda/src/main.rs | from_anchor_state now propagates via map_err; BlockChain::spawn uses .expect() at startup which is acceptable for initialization failures |
| bin/ethlambda/src/checkpoint_sync.rs | Adds Storage(#[from] ethlambda_storage::Error) variant to CheckpointSyncError; straightforward and correct |
| crates/storage/src/lib.rs | Re-exports Error from the api module; trivial one-line change |
| crates/blockchain/tests/forkchoice_spectests.rs | Test file updated to unwrap Results; correct use of .unwrap() in test context |
| crates/blockchain/tests/signature_spectests.rs | Test file updated to unwrap get_forkchoice_store Result; minimal change, correct |
Comments Outside Diff (2)
-
crates/net/p2p/src/req_resp/handlers.rs, line 901-906 (link)DB errors silently swallowed with no log
The comment "DB errors are silently skipped (per spec)" conflates two distinct cases: the spec says missing blocks may be skipped, but a storage error is not a missing block — it signals a potential disk fault. Swallowing it without a log makes these failures invisible in production.
Prompt To Fix With AI
This is a comment left during a code review. Path: crates/net/p2p/src/req_resp/handlers.rs Line: 901-906 Comment: **DB errors silently swallowed with no log** The comment "DB errors are silently skipped (per spec)" conflates two distinct cases: the spec says missing blocks may be skipped, but a storage error is not a missing block — it signals a potential disk fault. Swallowing it without a log makes these failures invisible in production. How can I resolve this? If you propose a fix, please make it concise.
-
crates/net/rpc/src/fork_choice.rs, line 940-947 (link)try_store!macro discards error context with no logErr(_)silently drops the actual storage error before returning a 500. Storage errors at this layer indicate a potential disk or backend issue and should be logged so operators can diagnose them.Prompt To Fix With AI
This is a comment left during a code review. Path: crates/net/rpc/src/fork_choice.rs Line: 940-947 Comment: **`try_store!` macro discards error context with no log** `Err(_)` silently drops the actual storage error before returning a 500. Storage errors at this layer indicate a potential disk or backend issue and should be logged so operators can diagnose them. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
crates/blockchain/src/lib.rs:242
**DB error silently treated as missing block during walk-up**
In `process_or_pend_block`, the walk-up loop uses `while let Ok(Some(header)) = self.store.get_block_header(&missing_root)`, which breaks silently on both `Ok(None)` and `Err(...)`. When a storage error occurs mid-walk, the code falls through to `self.request_missing_block(missing_root)` — treating a read failure as if the block is genuinely absent from the DB. This issues a spurious P2P request for a block the node already holds, and on re-delivery the cycle restarts.
### Issue 2 of 4
crates/blockchain/src/store.rs:453-461
**Redundant `store.time()` read inside hot tick loop**
After `store.set_time(t + 1)` succeeds, `new_time` is necessarily `t + 1` — there is no need to call `store.time()` again. Each extra call does a full DB round-trip and the loop runs once per 800 ms interval on every tick. `new_time` can simply be replaced with `t + 1`.
```suggestion
let new_time = t + 1;
```
### Issue 3 of 4
crates/net/p2p/src/req_resp/handlers.rs:901-906
**DB errors silently swallowed with no log**
The comment "DB errors are silently skipped (per spec)" conflates two distinct cases: the spec says missing blocks may be skipped, but a storage error is not a missing block — it signals a potential disk fault. Swallowing it without a log makes these failures invisible in production.
### Issue 4 of 4
crates/net/rpc/src/fork_choice.rs:940-947
**`try_store!` macro discards error context with no log**
`Err(_)` silently drops the actual storage error before returning a 500. Storage errors at this layer indicate a potential disk or backend issue and should be logged so operators can diagnose them.
Reviews (1): Last reviewed commit: "make Store's public methods return Resul..." | Re-trigger Greptile
d360d65 to
e763727
Compare
MegaRedHand
left a comment
There was a problem hiding this comment.
@d4m014 please fix the conflicts
e763727 to
b7154f6
Compare
done, there were some new errors across different files merged to main that called Store function which i already made to return Result in this PR (i've now resolved them), i suggest we review and merge this PR before new ones got merged to main, because any changes that call Store's functions without handling the new Result changes will break this one and we will have to go through each files for errors again. |
MegaRedHand
left a comment
There was a problem hiding this comment.
Thanks for the big contribution! I left some comments. We'll try to merge this once addressed, so no more conflicts appear.
🗒️ Description / Motivation
All public Store methods that perform disk I/O previously panicked on failure via
.expect(). This PR makes every such method returnResult<T, Error>and propagates errors through every caller; the actor layer, RPC handlers, p2p handlers, fork-choice logic, and the binary entrypoint. Panicking on a recoverable storage error is unsafe in a long-running consensus node; this change ensures errors are surfaced and handled deliberately.What Changed
crates/storage/src/store.rs:from_anchor_state/get_forkchoice_store/init_store → Result<Self, Error>;set_time,set_safe_target,insert_state, write_signed_block,prune_live_chain,prune_old_states,prune_old_blocksall returnResult;head_slot/safe_target_slotuse.ok_or_else() instead of.unwrap()crates/storage/src/lib.rs: re-exportsErrorcrates/blockchain/src/store.rs: addedStoreError::Storage(#[source] ethlambda_storage::Error)variant +From impl; all storage calls use?or two-step?.ok_or(); affected functions updated to returnResultcrates/blockchain/src/lib.rs : actor layer uses letOk(...) else { return }for critical paths; non-critical reads useif let Ok(...);errors logged viainspect_errcrates/net/rpc/src/lib.rs:get_latest_finalized_state/get_latest_justified_statemap storage errors to 500, missing resources to 404crates/net/rpc/src/fork_choice.rs: localtry_store!macro short-circuits on storage error with 500; non-critical header reads use.ok().flatten()crates/net/p2p/src/req_resp/handlers.rs: block lookup usesif let Ok(Some(b));build_statususes.unwrap_or_default()for graceful degradationbin/ethlambda/src/checkpoint_sync.rs: addedStorage(#[from] ethlambda_storage::Error)toCheckpointSyncErrorbin/ethlambda/src/main.rs: removedOk(...)wrapper aroundfrom_anchor_state;usesmap_errfor error conversioncrates/blockchain/tests/forkchoice_spectests.rs/signature_spectests.rs: all store calls updated to handleResult(.unwrap()in test context).Correctness / Behavior Guarantees
.filter_maprather than aborting the entire pruneTests Added / Run
Related Issues / PRs
.expect(), propagate Result through storage layer #306