
Add timing logs for block processing and XMSS signature verification#225

Open
pablodeymo wants to merge 2 commits into main from
feat/log-signature-verification-timing

Conversation


pablodeymo commented Mar 13, 2026

Summary

  • Instruments on_block_core() with per-phase timing breakdown of full block processing
  • Instruments verify_signatures() with per-phase timing breakdown of XMSS signature verification
  • Helps diagnose #197 (Check lean_fork_choice_block_processing_time_seconds metric), where lean_fork_choice_block_processing_time_seconds shows a constant ~1s on devnet dashboards

Context

Issue #197 reports that block processing time on the devnet Grafana dashboard is always approximately 1 second. Investigation revealed that:

  1. The lean_fork_choice_block_processing_time_seconds histogram has a bucket gap between 0.1s and 1.0s, which could cause Grafana's histogram_quantile to interpolate misleadingly
  2. The actual bottleneck is likely in XMSS signature verification (post-quantum crypto), but there's no visibility into how long each phase takes

This PR adds structured timing logs at two levels to pinpoint exactly where time is spent, without changing any behavior.

Example output

Each block produces two log lines — first the signature detail, then the block summary:

2026-03-13T15:30:42.123456Z  INFO ethlambda_blockchain::store: Signature verification timing slot=42 attestation_count=3 aggregated_sigs_ms=450 proposer_sig_ms=120 total_ms=572
2026-03-13T15:30:42.140123Z  INFO ethlambda_blockchain::store: Processed new block slot=42 block_root=0xabcd1234 state_root=0xef567890 sig_verification_ms=572 state_transition_ms=15 block_total_ms=590

In this example you can see:

  • Signature verification dominates: sig_verification_ms=572 out of block_total_ms=590
  • Aggregated sigs are the bottleneck: aggregated_sigs_ms=450 vs proposer_sig_ms=120
  • State transition is fast: state_transition_ms=15
  • Minimal overhead: 590 - 572 - 15 = 3 ms for storage, fork choice, etc.

Logged fields

Block processing breakdown (on_block_core)

Field | Description
sig_verification_ms | Time spent in verify_signatures() (0 when verification is skipped)
state_transition_ms | Time spent in state_transition() (process_slots + process_block)
block_total_ms | Wall-clock time for the entire on_block_core() function

Signature verification breakdown (verify_signatures)

Field | Description
attestation_count | Number of aggregated attestation signatures verified
aggregated_sigs_ms | Total time verifying all aggregated attestation proofs (leanVM calls)
proposer_sig_ms | Time verifying the proposer's individual XMSS signature
total_ms | Wall-clock time for the entire verify_signatures() function
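The per-phase timing pattern these fields describe can be sketched with std::time::Instant. This is a minimal sketch, not the PR's actual code: the verification functions are hypothetical stand-ins, and the real implementation emits the values as structured tracing::info! fields rather than returning them.

```rust
use std::time::Instant;

// Hypothetical stand-ins for the real leanVM / XMSS verification calls.
fn verify_aggregated_signatures(_proofs: &[u8]) {}
fn verify_proposer_signature(_sig: &[u8]) {}

// Returns (aggregated_sigs_ms, proposer_sig_ms, total_ms), mirroring the
// logged fields above.
fn verify_signatures_timed(proofs: &[u8], proposer_sig: &[u8]) -> (u128, u128, u128) {
    let total_start = Instant::now();

    let aggregated_start = Instant::now();
    verify_aggregated_signatures(proofs);
    let aggregated_sigs_ms = aggregated_start.elapsed().as_millis();

    let proposer_start = Instant::now();
    verify_proposer_signature(proposer_sig);
    let proposer_sig_ms = proposer_start.elapsed().as_millis();

    // total_ms also covers any work outside the two timed phases (key
    // decoding, hashing), so it can exceed the sum of the other two.
    let total_ms = total_start.elapsed().as_millis();
    (aggregated_sigs_ms, proposer_sig_ms, total_ms)
}
```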

How to use

# Full block processing breakdown
grep "Processed new block" node.log

# Signature verification detail
grep "Signature verification timing" node.log

# Key questions these logs answer:
# - Is sig verification the bottleneck? (sig_verification_ms ≈ block_total_ms)
# - Is state transition the bottleneck? (state_transition_ms >> sig_verification_ms)
# - Within sigs, is it aggregated or proposer? (aggregated_sigs_ms vs proposer_sig_ms)
# - Is there overhead outside both phases? (block_total_ms >> sig + state_transition)

Test plan

  • cargo check -p ethlambda-blockchain passes
  • cargo fmt --all -- --check passes
  • cargo clippy -p ethlambda-blockchain -- -D warnings passes
  • Deploy to devnet and observe log output during block production
  • Compare timing fields to identify the dominant bottleneck

Closes #197 (diagnostic step — may need follow-up based on findings)

…rocessing latency

Instruments verify_signatures() with std::time::Instant to log per-phase
breakdown: aggregated attestation signatures, proposer signature, and total
elapsed time in milliseconds. This helps diagnose issue #197 where
lean_fork_choice_block_processing_time_seconds shows constant ~1s on devnet.
@github-actions

🤖 Kimi Code Review

Review of PR #225

Issues Found

  1. Performance regression in signature verification (lines 1162-1237)

    • Adding timing measurements inside verify_signatures introduces overhead in a critical path. This function is called for every block during sync and consensus, and the timing calls (Instant::now(), elapsed(), and info!) add measurable latency.
    • Suggestion: Gate timing behind a compile-time flag or use a tracing span instead of manual timing.
  2. Logging level inconsistency (lines 1238-1247)

    • Using info! for timing data will spam logs during sync. This should be debug! or trace! at most.
    • Line 1238: Change info! to debug! or make it conditional.
  3. Integer truncation risk (lines 1242-1244)

    • as_millis() returns u128, so as_millis() as u64 silently truncates on overflow. While unlikely to be an issue in practice, it's better to use a saturating conversion or keep the value as u128.
    • Lines 1242-1244: Consider u64::try_from(...).unwrap_or(u64::MAX) or log the u128 directly.
  4. Missing feature gate

    • The timing code should be behind a #[cfg(feature = "timing")] or similar to avoid overhead in production builds.

Minor Issues

  • Line 1178: The timing starts after some initial processing, making the "total" measurement slightly inaccurate. Move total_start to the very beginning of the function.

Recommendations

// Instead of manual timing, consider a tracing span. Note the entered guard
// must outlive the instrumented work: if it is dropped at the end of a
// nested block, the span covers nothing.
#[cfg(feature = "detailed-timing")]
let _span = tracing::span!(
    tracing::Level::DEBUG,
    "signature_verification",
    slot = block.slot,
    attestation_count = attestations.len()
)
.entered();

This PR introduces non-trivial overhead in a critical consensus function. The timing data might be useful for debugging, but should not be enabled by default in production builds.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

  1. Performance / operational risk (medium): the new unconditional info! in the block validation hot path can become a throughput bottleneck and log-amplification vector under high block/attestation load. In consensus clients, synchronous/structured logging at info per block is usually too expensive unless explicitly enabled.
    Ref: crates/blockchain/src/store.rs:1244

  2. Redundant instrumentation (low): aggregated signature verification is already timed via metrics (metrics::time_pq_sig_aggregated_signatures_verification()), and this adds overlapping wall-clock timers around the same section. That is not incorrect, but it adds extra overhead and noise without clear additional value unless you need per-block correlated timings.
    Refs: crates/blockchain/src/store.rs:1203, crates/blockchain/src/store.rs:1178

No correctness bugs or consensus-logic regressions are apparent in this PR (fork choice, attestation validity, justification/finalization, and state transition behavior are unchanged by this diff). I’d keep the change if logging is moved to debug!/trace!, gated, or sampled.


Automated review by OpenAI Codex · custom prompt

@github-actions

🤖 Claude Code Review

Here is my review of PR 225:


PR 225: Add timing logs to XMSS signature verification

Overall: The change is minimal, non-behavioral, and achieves its stated diagnostic goal. The implementation is straightforward and correct. A few minor points worth addressing:


Correctness

as_millis() as u64 truncation (store.rs:1247-1249): as_millis() returns a u128. Casting to u64 truncates silently on overflow. For XMSS operations this is unlikely to matter in practice (u64 max is ~584 million years in ms), but the cast is technically lossy. Since the info! macro can accept u128 directly, the cast is unnecessary — prefer passing aggregated_elapsed.as_millis() without the cast, or use u64::try_from(...).unwrap_or(u64::MAX) if you want to be explicit.
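The explicit saturating conversion suggested above can be sketched as a small helper (a sketch; the name millis_u64 is hypothetical, not from the PR):

```rust
use std::time::Duration;

// Saturating alternative to `as_millis() as u64`: overflow becomes
// u64::MAX instead of silently truncating the high bits.
fn millis_u64(d: Duration) -> u64 {
    u64::try_from(d.as_millis()).unwrap_or(u64::MAX)
}
```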

total_elapsed includes non-crypto overhead (store.rs:1162 vs store.rs:1243): total_start is captured before the early-return length check and public key setup, so total_ms will always be slightly larger than aggregated_sigs_ms + proposer_sig_ms. The gap includes pubkey decoding, SSZ hashing, and iteration. This is intentional per the PR description, but it would be worth noting in a code comment so future readers don't wonder why the numbers don't add up.


Performance

In tracing, the info! macro first checks whether the callsite and level are enabled and only then evaluates the field expressions. Even when enabled, all fields here are already-computed Duration conversions (cheap arithmetic), so there's no meaningful cost. No concern.


Logging style (project convention)

Per the codebase's logging conventions, the field order should follow temporal → identity → identifiers → context → metadata. The current order (slot, attestation_count, aggregated_sigs_ms, proposer_sig_ms, total_ms) is fine — slot leads, timing fields trail. The % Display sigil is absent on slot and attestation_count, which is correct since they're integer types that implement Value directly.

One minor convention note: the project uses shorthand %slot for display-formatted fields (e.g., info!(%slot, ...) in other log sites). However, slot = block.slot is needed here since the local binding is block.slot, not a local slot. This is fine.


Diagnostic coverage gap

The timing currently covers:

  • Aggregated attestation proofs (leanVM loop)
  • Proposer XMSS signature

But it does not time the intermediate steps between aggregated_elapsed and proposer_start: pubkey decoding (ValidatorSignature::from_bytes, get_pubkey), and tree_hash_root(). If the bottleneck turns out to be elsewhere, a follow-up will be needed. This is acceptable for a first-pass diagnostic but worth noting.


Summary

The PR is safe to merge as-is. The main actionable suggestion is to drop the redundant as u64 casts on as_millis() — it avoids a technically lossy conversion for no benefit. Everything else is cosmetic or informational.


Automated review by Claude (Anthropic) · custom prompt

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 13, 2026

Greptile Summary

This PR adds structured timing instrumentation to verify_signatures() in crates/blockchain/src/store.rs as a diagnostic step for issue #197 (constant ~1s block processing time on devnet). It introduces three std::time::Instant timers to measure the aggregated attestation loop (aggregated_sigs_ms), the proposer XMSS signature check (proposer_sig_ms), and the total wall-clock duration (total_ms), emitting a single tracing::info! log line per successful block verification. No behavior is changed.

Key observations:

  • The proposer_start timer is placed after ValidatorSignature::from_bytes and proposer.get_pubkey(), meaning those deserialization steps are captured in total_ms but not in proposer_sig_ms. This creates a silent accounting gap that may be confusing during log analysis — moving proposer_start earlier (before the from_bytes call) would make the three fields fully self-consistent.
  • The log is only emitted on the happy path; error returns exit silently without timing output, which is reasonable.
  • The as_millis() as u64 cast is safe for any realistic duration.
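The timer placement Greptile suggests can be sketched as follows. This is a hypothetical illustration: the three helper functions stand in for ValidatorSignature::from_bytes, proposer.get_pubkey(), and the XMSS validity check, and are not the real API.

```rust
use std::time::Instant;

// Hypothetical stand-ins for the real deserialization and verification calls.
fn deserialize_signature(_bytes: &[u8]) {}
fn decode_pubkey() {}
fn verify_xmss() {}

fn proposer_phase_ms(sig_bytes: &[u8]) -> u128 {
    // Start the timer BEFORE deserialization, so proposer_sig_ms also
    // covers from_bytes/get_pubkey and
    // aggregated_sigs_ms + proposer_sig_ms accounts for nearly all of
    // total_ms.
    let proposer_start = Instant::now();
    deserialize_signature(sig_bytes);
    decode_pubkey();
    verify_xmss();
    proposer_start.elapsed().as_millis()
}
```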

Confidence Score: 5/5

  • This PR is safe to merge — it adds read-only logging with no behavioral changes.
  • The change is purely additive: three Instant timers and one info! log line on the success path. No logic is altered, no error paths are modified, and the existing Prometheus metrics are untouched. The one identified issue (timer placed after deserialization) is a diagnostic accuracy concern, not a correctness or safety problem.
  • No files require special attention beyond the minor timer placement noted in crates/blockchain/src/store.rs.

Important Files Changed

Filename | Overview
crates/blockchain/src/store.rs | Adds structured tracing::info! timing instrumentation to verify_signatures() with Instant timers for the aggregated-attestation loop, proposer signature check, and total wall-clock duration. The only issue is that proposer_start is placed after ValidatorSignature::from_bytes and get_pubkey(), leaving those deserialization calls unaccounted in the logged proposer_sig_ms field while still contributing to total_ms.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[verify_signatures called] --> B[total_start = Instant::now]
    B --> C{attestations.len == sigs.len?}
    C -- No --> ERR1[return Err AttestationSignatureMismatch]
    C -- Yes --> D[aggregated_start = Instant::now]
    D --> E[for each attestation + proof]
    E --> F{participants match?}
    F -- No --> ERR2[return Err ParticipantsMismatch]
    F -- Yes --> G[verify_aggregated_signature via leanVM]
    G --> H{Ok?}
    H -- No --> ERR3[return Err AggregateVerificationFailed]
    H -- Yes --> E
    E -- loop done --> I[aggregated_elapsed = aggregated_start.elapsed]
    I --> J[ValidatorSignature::from_bytes\nproposer.get_pubkey\ntree_hash_root\n⚠️ NOT timed in proposer_sig_ms]
    J --> K[proposer_start = Instant::now]
    K --> L[proposer_signature.is_valid]
    L --> M{valid?}
    M -- No --> ERR4[return Err ProposerSignatureVerificationFailed]
    M -- Yes --> N[proposer_elapsed = proposer_start.elapsed]
    N --> O[total_elapsed = total_start.elapsed]
    O --> P[info! slot aggregated_sigs_ms proposer_sig_ms total_ms]
    P --> Q[return Ok]

Last reviewed commit: cd25c30

.expect("slot exceeds u32");
let message = proposer_attestation.data.tree_hash_root();

let proposer_start = std::time::Instant::now();

Proposer timer starts after deserialization, creating an unaccounted gap

proposer_start is created after the proposer signature bytes have been deserialized and the public key has been decoded. Both of those calls process cryptographic material and their cost is included in total_ms but not in proposer_sig_ms. This means total_ms - (aggregated_sigs_ms + proposer_sig_ms) will be non-zero and could be confusing when reading the logs to diagnose the timing bottleneck.

Since the goal of this PR is to account for all time inside verify_signatures(), consider moving proposer_start to just before the ValidatorSignature::from_bytes call on line 1218. That way the logged fields are fully self-consistent: aggregated_sigs_ms + proposer_sig_ms would account for nearly all of total_ms.


@MegaRedHand left a comment


Looks good. Let's add for block processing too

Adds sig_verification_ms, state_transition_ms, and block_total_ms fields
to the existing "Processed new block" log line. This allows comparing
the overall block processing time against the signature verification
breakdown from the previous commit.
pablodeymo changed the title from "Add timing logs to XMSS signature verification" to "Add timing logs for block processing and XMSS signature verification" on Mar 13, 2026
Comment on lines +658 to +660
sig_verification_ms,
state_transition_ms,
block_total_ms,

We should use the Debug impl of Duration for prettier logs
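For reference, Duration's Debug impl picks a human-friendly unit automatically, which is what this suggestion would give the log line. A minimal sketch using only std (the tracing usage in the comment is the conventional `?` Debug sigil, shown as an assumption about how the fields would be recorded):

```rust
use std::time::Duration;

// Duration's Debug impl chooses the unit itself: 572ms, 1.5s, 250µs, ...
// With tracing, the `?` sigil records a field via its Debug impl, e.g.:
//   info!(sig_verification = ?elapsed, "Processed new block");
fn pretty(d: Duration) -> String {
    format!("{:?}", d)
}
```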


Successfully merging this pull request may close these issues.

Check lean_fork_choice_block_processing_time_seconds metric
