feat: accept multiple URLs for checkpoint sync#374

Open
Aliemeka wants to merge 3 commits into
lambdaclass:mainfrom
Aliemeka:feat/accept-multiple-urls-for-checkpoint-sync-i

@Aliemeka
Contributor

🗒️ Description / Motivation

Adds redundancy to checkpoint sync by letting operators supply more than one peer URL via --checkpoint-sync-url. The flag now accepts a single URL (existing behavior), a comma-separated list, or repeated occurrences of the flag. URLs are tried sequentially: the first peer that serves a valid anchor wins, and startup fails only if every URL fails. The change is backward compatible. Closes #111.
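
To make the three accepted forms concrete, here is a minimal std-only sketch of how they collapse into one ordered list. In the PR the parsing is done by clap, so `collect_urls` is a hypothetical helper, and its trimming and dropping of empty segments are assumptions of this sketch rather than documented clap behavior.

```rust
/// Hypothetical helper (not part of the PR): flattens every occurrence of the
/// flag into one ordered list, splitting each occurrence on commas.
/// Trimming whitespace and skipping empty segments are assumptions made here.
fn collect_urls(occurrences: &[&str]) -> Vec<String> {
    occurrences
        .iter()
        // One occurrence may carry several comma-separated URLs.
        .flat_map(|value| value.split(','))
        .map(str::trim)
        .filter(|url| !url.is_empty())
        .map(String::from)
        .collect()
}
```

Under these assumptions, `collect_urls(&["http://a,http://b", "http://c"])` yields the three URLs in the order given, which is also the order in which failover would try them.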

What Changed

  • bin/ethlambda/src/main.rs
    • --checkpoint-sync-url is now Option<Vec<String>> with value_delimiter = ',', so u1,u2 and repeated --checkpoint-sync-url flags both populate the list. A single URL still works unchanged (backward compatible).
    • Extracted try_checkpoint_url, which wraps the existing per-URL AnchorPairingMismatch retry (3 attempts, 1s backoff). No behavior change for the single-URL case.
    • fetch_initial_state now takes &[String], iterates URLs in order, logs a warn! with the URL and underlying error on each failure, and moves on to the next. Returns the last error only if every URL fails.
    • Updated CLI help and function doc comments to describe the failover semantics.
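
The per-URL retry that try_checkpoint_url wraps can be sketched roughly as below. This is a simplified, synchronous illustration and not the PR's code: `SyncError`, `try_with_retry`, and `MAX_ATTEMPTS` are hypothetical names, and a blocking sleep stands in for the async `tokio::time::sleep` used in main.rs.

```rust
/// Hypothetical error type standing in for CheckpointSyncError.
#[derive(Debug, PartialEq)]
enum SyncError {
    AnchorPairingMismatch,
    Other(String),
}

/// Assumed to match the "3 attempts" mentioned in the description.
const MAX_ATTEMPTS: u32 = 3;

/// Calls `fetch` until it succeeds, retrying only on AnchorPairingMismatch
/// and only while attempts remain. Any other error is returned immediately.
fn try_with_retry<T>(
    delay: std::time::Duration,
    mut fetch: impl FnMut() -> Result<T, SyncError>,
) -> Result<T, SyncError> {
    let mut attempt = 1;
    loop {
        match fetch() {
            Ok(value) => return Ok(value),
            Err(SyncError::AnchorPairingMismatch) if attempt < MAX_ATTEMPTS => {
                // The peer may have advanced finalization mid-fetch; wait and retry.
                std::thread::sleep(delay);
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}
```

Keeping the retry inside a per-URL helper means the outer failover loop only ever sees one final result per peer.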

Correctness / Behavior Guarantees

  • Empty URL list ⇒ genesis init, identical to before.
  • Single URL ⇒ same retry/error behavior as before (the per-URL retry was moved into try_checkpoint_url, not changed).
  • Multi-URL ⇒ first-success-wins failover. Operators can see which peers failed via the warn logs; the surfaced error on total failure is the last URL's error.
  • No changes to the anchor verification rules in checkpoint_sync.rs — every peer's anchor is still validated against the local genesis config and internal state/block consistency before it is accepted.
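
The guarantees above can be illustrated with a small generic sketch. It is synchronous and uses a closure in place of the real async try_checkpoint_url; `first_success` is a hypothetical name, and mapping an empty list to `Ok(None)` is this sketch's way of modeling the "genesis init" case.

```rust
/// Tries each URL in order and returns the first success.
/// Returns the last error if every URL fails, and `Ok(None)` for an empty
/// list (which a caller could map to genesis initialization).
fn first_success<T, E>(
    urls: &[String],
    mut try_one: impl FnMut(&str) -> Result<T, E>,
) -> Result<Option<T>, E> {
    let mut last_err = None;
    for url in urls {
        match try_one(url) {
            Ok(value) => return Ok(Some(value)),
            // Remember the failure and fall through to the next URL.
            Err(err) => last_err = Some(err),
        }
    }
    match last_err {
        Some(err) => Err(err),
        None => Ok(None),
    }
}
```

Because the function returns on the first `Ok`, later URLs are never contacted once a peer has served a usable anchor.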

Tests Added / Run

  • No new unit tests — all multi-URL logic lives in main.rs orchestration code that drives the already-tested fetch_finalized_anchor / verify_checkpoint_state paths. Existing checkpoint_sync tests (verification + URL normalization) still pass.
  • Commands run:
    • cargo clippy -p ethlambda --all-targets -- -D warnings
    • cargo test --workspace --release

Related Issues / PRs

✅ Verification Checklist

  • Ran make fmt — clean
  • Ran make lint (clippy with -D warnings) — clean
  • Ran cargo test --workspace --release — all passing

@greptile-apps
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR makes --checkpoint-sync-url accept multiple URLs (comma-separated or repeated flags) and tries them sequentially, falling back to the next peer on failure and only aborting startup if every URL fails. The existing per-URL retry logic for AnchorPairingMismatch is cleanly extracted into try_checkpoint_url with no behavior change for the single-URL case.

  • The failover loop is logically correct and backward-compatible, but the warning message "trying next URL" fires unconditionally after every failure, including the last URL in the list, which misleads operators right before a startup abort.
  • The info! at startup logs all URLs via their full debug representation; any embedded authentication material in URLs would appear in plaintext in the log stream.

Confidence Score: 4/5

Safe to merge with minor log message improvements recommended.

The failover logic is correct and backward-compatible. Two non-blocking issues exist: a warning message that says "trying next URL" even when there is no next URL to try, and a startup log that prints full URL strings which could expose embedded credentials in environments that use auth-bearing URLs.

bin/ethlambda/src/main.rs — the warning log wording and the startup info log that prints all URLs verbatim.

Security Review

  • Credential exposure in logs (bin/ethlambda/src/main.rs, line 636): The startup info! log prints all checkpoint-sync URLs using Debug formatting. If any URL contains embedded authentication material (basic-auth credentials or token query parameters), they will be written to the log stream in plaintext.

Important Files Changed

| Filename | Overview |
|---|---|
| bin/ethlambda/src/main.rs | Extends --checkpoint-sync-url to accept a Vec of URLs with comma-delimiter support; adds try_checkpoint_url helper for per-URL retry; fetch_initial_state now iterates the slice and fails over sequentially. Two minor issues: the "trying next URL" warning fires even when no next URL exists, and the startup info log prints all URLs verbatim which could expose embedded credentials. |

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
bin/ethlambda/src/main.rs:639-658
**Misleading "trying next URL" on last/only failure**

The warning message `"Checkpoint sync failed for this peer; trying next URL"` is emitted unconditionally after every failed attempt, including the final URL in the list (and the only URL in the single-URL case). When startup is about to abort with no more peers to try, operators will see the log say "trying next URL" right before the process exits, which is confusing. Consider checking whether there are remaining URLs before deciding which message to emit — or drop the "trying next URL" suffix and log the total count of remaining peers instead.

### Issue 2 of 2
bin/ethlambda/src/main.rs:636
**Potential credential exposure in startup log**

`info!(urls = ?checkpoint_urls, ...)` prints all URLs via their `Debug` representation, which includes the full URL string. If any URL contains embedded authentication material (basic-auth credentials or token query parameters), they will appear in plaintext in the log stream. Consider logging only the count of URLs, or stripping auth components before logging.

Reviews (1): Last reviewed commit: "feat: accept multiple URLs for checkpoin..."

Comment thread bin/ethlambda/src/main.rs
Comment on lines +639 to +658
+    if let Err(err) = &result {
+        warn!(
+            url = %first_url,
+            %err,
+            "Checkpoint sync failed for this peer; trying next URL"
+        );
+    }

-    let mut attempt = 1;
-    let (state, signed_block) = loop {
-        match checkpoint_sync::fetch_finalized_anchor(
-            checkpoint_url,
-            genesis.genesis_time,
-            &validators,
-        )
-        .await
-        {
-            Ok(pair) => break pair,
-            Err(checkpoint_sync::CheckpointSyncError::AnchorPairingMismatch)
-                if attempt < MAX_ANCHOR_FETCH_ATTEMPTS =>
-            {
-                warn!(
-                    attempt,
-                    max = MAX_ANCHOR_FETCH_ATTEMPTS,
-                    "Anchor state and block disagree (peer likely advanced finalization mid-fetch); retrying"
-                );
-                tokio::time::sleep(ANCHOR_FETCH_RETRY_DELAY).await;
-                attempt += 1;
-            }
-            Err(err) => return Err(err),
+    for url in rest_urls {
+        if result.is_ok() {
+            break;
         }
-    };
+        result = try_checkpoint_url(url, genesis.genesis_time, &validators).await;
+        if let Err(err) = &result {
+            warn!(
+                %url,
+                %err,
+                "Checkpoint sync failed for this peer; trying next URL"
+            );
+        }

P2 Misleading "trying next URL" on last/only failure

The warning message "Checkpoint sync failed for this peer; trying next URL" is emitted unconditionally after every failed attempt, including the final URL in the list (and the only URL in the single-URL case). When startup is about to abort with no more peers to try, operators will see the log say "trying next URL" right before the process exits, which is confusing. Consider checking whether there are remaining URLs before deciding which message to emit — or drop the "trying next URL" suffix and log the total count of remaining peers instead.
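
One way to act on this suggestion is to pick the message from the URL's position in the list. The sketch below is only an illustration: `failure_message` is a hypothetical helper, and the wording of the second message is invented here, not taken from the PR.

```rust
/// Hypothetical helper: chooses the warning text for the URL at `index`
/// (zero-based) out of `total` URLs, so the final failure does not claim
/// that another URL will be tried.
fn failure_message(index: usize, total: usize) -> &'static str {
    if index + 1 < total {
        "Checkpoint sync failed for this peer; trying next URL"
    } else {
        // Assumed wording for the last (or only) URL.
        "Checkpoint sync failed for this peer; no URLs left to try"
    }
}
```

In a loop this would pair naturally with `urls.iter().enumerate()`, passing the index and `urls.len()`.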

Comment thread bin/ethlambda/src/main.rs Outdated
-    // Checkpoint sync path
-    info!(%checkpoint_url, "Starting checkpoint sync");
+    // Checkpoint sync path: try URLs in order, fail over to the next on error.
+    info!(urls = ?checkpoint_urls, "Starting checkpoint sync");

P2 security Potential credential exposure in startup log

info!(urls = ?checkpoint_urls, ...) prints all URLs via their Debug representation, which includes the full URL string. If any URL contains embedded authentication material (basic-auth credentials or token query parameters), they will appear in plaintext in the log stream. Consider logging only the count of URLs, or stripping auth components before logging.
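
A rough sketch of the "strip auth components" option follows, using only string handling from the standard library. `redact_url` is a hypothetical helper and not part of the PR; it assumes the userinfo sits before the last `@` of the authority and that anything after `?` or `#` may carry a token. A production version would more likely rely on a proper URL parser.

```rust
/// Hypothetical helper: returns a copy of `url` that is safer to log by
/// dropping the userinfo part (`user:pass@`) and any query string or fragment.
fn redact_url(url: &str) -> String {
    // Separate the scheme, if present ("https://host/path" -> "https" + "host/path").
    let (scheme, rest) = match url.split_once("://") {
        Some((scheme, rest)) => (Some(scheme), rest),
        None => (None, url),
    };
    // Query strings and fragments may carry tokens, so cut at the first '?' or '#'.
    let rest = rest.split(|c: char| c == '?' || c == '#').next().unwrap_or("");
    // The authority ends at the first '/'; userinfo is everything before its last '@'.
    let (authority, path) = match rest.find('/') {
        Some(i) => rest.split_at(i),
        None => (rest, ""),
    };
    let host = authority.rsplit('@').next().unwrap_or(authority);
    match scheme {
        Some(scheme) => format!("{}://{}{}", scheme, host, path),
        None => format!("{}{}", host, path),
    }
}
```

The startup log could then print the redacted strings, or simply the number of URLs, instead of the raw list.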

Development

Successfully merging this pull request may close these issues.

Accept multiple URLs for checkpoint sync
