Skip to content

feat: make Rocket reloadable via on-chain restart trigger#267

Open
GTC6244 wants to merge 5 commits intomainfrom
feature/cpl-143-make-rocket-reloadable
Open

feat: make Rocket reloadable via on-chain restart trigger#267
GTC6244 wants to merge 5 commits intomainfrom
feature/cpl-143-make-rocket-reloadable

Conversation

@GTC6244
Copy link
Copy Markdown
Contributor

@GTC6244 GTC6244 commented Apr 3, 2026

Summary

On-chain restart trigger: Adds a serverTrigger(uint256) function to APIConfigFacet (owner-only via LibDiamond.enforceIsContractOwner()) that emits a ServerTriggered event. A new Rust listener (restart.rs) polls for these events and signals the main loop to restart Rocket.

Restart loop: Refactors main() into a restart-aware loop that rebuilds Rocket on signal while preserving long-lived state (signer pool, chain config, CPU monitor, IPFS cache). Includes restart-loop protection (max 3 restarts per 60s window).

Reliability hardening (from pre-landing review):

  • Listener retries with exponential backoff on startup failure (up to 5 attempts)
  • Propagates error on initial block fetch instead of silently falling back to block 0
  • Uses try_send so the listener is never blocked by the main loop
  • Saturating arithmetic for block number operations
  • Correct off-by-one fix in restart loop protection gate

Files changed

  • APIConfigFacet.sol — new serverTrigger() function + ServerTriggered event
  • AppStorage.sol — new serverTriggerValue storage slot
  • AccountConfig.json / account_config_contract.rs — regenerated bindings
  • restart.rs — new event listener module
  • main.rs — restart loop, extracted build_rocket() function
  • cpu_overload.rs — derive Clone for sharing across restart iterations
  • lib.rs — expose restart module

Pre-Landing Review

No critical issues. Adversarial review findings addressed in second commit.

Test plan

  • cargo check passes (rustc 1.91)
  • Hardhat compiles all 22 Solidity files
  • Contract bindings regenerated cleanly (82 ABI entries)
  • Manual: deploy to testnet, call serverTrigger(1) from owner, verify server restarts
  • Manual: verify restart loop protection trips after 3 rapid restarts

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

🤖 Generated with Claude Code

GTC6244 added 3 commits April 3, 2026 14:38
Add ServerTriggered event and serverTrigger() function to APIConfigFacet
so the diamond owner can signal restarts on-chain. Introduce restart.rs
event listener that polls for ServerTriggered events and sends a restart
signal via mpsc channel. Refactor main() into a restart loop that rebuilds
Rocket on signal while preserving long-lived state (signer pool, IPFS cache,
CPU monitor). Includes restart-loop protection (max 3 restarts per 60s).
…rocket-reloadable

# Conflicts:
#	lit-api-server/blockchain/lit_node_express/contracts/AccountConfigFacets/APIConfigFacet.sol
#	lit-api-server/src/accounts/contracts/AccountConfig.json
#	lit-api-server/src/accounts/contracts/account_config_contract.rs
- Fix off-by-one in restart loop protection (> to >=, now correctly
  limits to MAX_RESTARTS within the window)
- Use try_send instead of send().await so the event listener is never
  blocked waiting for the main loop to consume a previous restart signal
- Propagate error instead of unwrap_or(0) when fetching initial block
  number to avoid scanning from genesis on RPC failure
- Add retry with exponential backoff for listener startup failures
- Use saturating_add for block number arithmetic
- Simplify restart_count from Arc<AtomicU64> to plain u64 (single-threaded use)
@GTC6244 GTC6244 requested review from a team and Copilot April 3, 2026 18:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an on-chain “restart trigger” mechanism to the AccountConfig diamond and wires a new Rust background listener into lit-api-server so the Rocket server can be cleanly restarted in-process when the contract owner emits a ServerTriggered event.

Changes:

  • Added serverTrigger(uint256) (owner-only) + ServerTriggered event to the diamond, plus storage for the trigger value.
  • Introduced lit-api-server restart listener (restart.rs) that polls for ServerTriggered and signals main via an mpsc channel.
  • Refactored main into a restart-aware loop and preserved long-lived state across restarts (signer pool, chain config, CPU monitor, IPFS cache).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
lit-api-server/src/restart.rs New polling listener for on-chain restart events; sends restart signals to main loop.
lit-api-server/src/main.rs Restart-aware launch loop; extracts build_rocket() and adds loop protection.
lit-api-server/src/lib.rs Exposes the new restart module.
lit-api-server/src/core/v1/guards/cpu_overload.rs Makes CpuOverloadMonitor clonable so it can be shared across restarts.
lit-api-server/src/accounts/contracts/AccountConfig.json Regenerated ABI including ServerTriggered + serverTrigger.
lit-api-server/src/accounts/contracts/account_config_contract.rs Regenerated Rust bindings to match updated ABI (event/function/error).
lit-api-server/blockchain/lit_node_express/contracts/AccountConfigFacets/AppStorage.sol Adds serverTriggerValue to diamond storage.
lit-api-server/blockchain/lit_node_express/contracts/AccountConfigFacets/APIConfigFacet.sol Adds owner-only serverTrigger() and emits ServerTriggered.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

//! listener detects the event and sends a restart signal to the main loop
//! via [`RestartHandle`].

use crate::accounts::signable_contract::{Middleware, get_read_only_account_config_contract};
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Middleware is imported but never used in this module. Dropping the unused import avoids warnings (and potential CI failures if warnings are denied).

Suggested change
use crate::accounts::signable_contract::{Middleware, get_read_only_account_config_contract};
use crate::accounts::signable_contract::get_read_only_account_config_contract;

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +38
/// Send a restart signal. Returns `true` if the signal was sent.
/// Uses `try_send` so the listener is never blocked waiting for the
/// main loop to consume a previous signal.
pub fn trigger(&self) -> bool {
match self.tx.try_send(()) {
Ok(()) => true,
Err(mpsc::error::TrySendError::Full(_)) => {
// A restart is already queued; no need to queue another.
true
}
Err(mpsc::error::TrySendError::Closed(_)) => false,
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RestartHandle::trigger() is documented as “Returns true if the signal was sent”, but the Full(_) branch also returns true even though no new signal is enqueued. Either adjust the doc to reflect “sent or already queued” semantics, or return false on Full so callers can distinguish the cases.

Copilot uses AI. Check for mistakes.
Comment on lines +184 to +202
let rocket = match r.ignite().await {
Ok(rocket) => rocket,
Err(e) => {
tracing::error!("Failed to ignite Rocket: {e}. Exiting restart loop.");
break;
}
};
let shutdown = rocket.shutdown();
let mut server_handle = tokio::spawn(rocket.launch());

tokio::select! {
res = &mut server_handle => {
match res {
Ok(Ok(_)) => tracing::info!("Server exited cleanly."),
Ok(Err(e)) => tracing::error!("Server exited with error: {e}"),
Err(e) => tracing::error!("Server task panicked: {e}"),
}
break;
}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The restart loop logs Rocket ignite/launch errors and then breaks, but main() always returns Ok(()) afterward. This causes the process to exit with status 0 even on startup failure, Rocket runtime errors, or a panic in the launch task, which can defeat supervisor/Kubernetes restart logic. Consider propagating failures (return Err) and using a non-zero exit when the server stops unexpectedly.

Copilot uses AI. Check for mistakes.
use tokio::sync::mpsc;

/// Maximum number of restarts allowed within `RESTART_WINDOW`.
/// If exceeded, the process exits to avoid an infinite restart loop.
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The restart-loop protection comment says “Maximum number of restarts allowed… If exceeded…”, but the guard triggers when restarts_in_window >= MAX_RESTARTS. Either update the wording to match the current behavior (trip on the 3rd signal when MAX_RESTARTS=3), or change the condition so it only trips when the limit is exceeded.

Suggested change
/// If exceeded, the process exits to avoid an infinite restart loop.
/// Once this limit is reached, the process exits to avoid an infinite restart loop.

Copilot uses AI. Check for mistakes.
"Restart signal received. Shutting down current instance..."
);

if restart_timestamps.len() as u64 >= MAX_RESTARTS {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-by-one error in restart protection check. The condition >= MAX_RESTARTS exits on the 3rd restart signal, allowing only 2 actual restarts instead of 3.

With MAX_RESTARTS = 3 and the current >= check:

  • 1st signal: len=1, 1>=3 false → restart happens
  • 2nd signal: len=2, 2>=3 false → restart happens
  • 3rd signal: len=3, 3>=3 true → exits WITHOUT restarting

This contradicts the comment on line 23 which states "Maximum number of restarts allowed" should be 3.

if restart_timestamps.len() as u64 > MAX_RESTARTS {

This allows exactly 3 restarts (signals 1-3 trigger restarts, signal 4 triggers exit).

Suggested change
if restart_timestamps.len() as u64 >= MAX_RESTARTS {
if restart_timestamps.len() as u64 > MAX_RESTARTS {

Spotted by Graphite

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

@GTC6244 GTC6244 self-assigned this Apr 3, 2026
- Fix trigger() doc to reflect "sent or already queued" semantics (Copilot)
- Keep Middleware import (needed as trait for get_block_number) (Copilot)
- Propagate server errors from restart loop instead of always Ok(()) (Copilot)
- Fix MAX_RESTARTS comment to match >= behavior (Copilot)
- Revert >= back to > so exactly MAX_RESTARTS restarts are allowed (Graphite)
@GTC6244 GTC6244 changed the base branch from main to next April 3, 2026 21:01
@GTC6244 GTC6244 changed the base branch from next to main April 3, 2026 21:52
@GTC6244 GTC6244 changed the base branch from main to next April 3, 2026 21:52
@GTC6244 GTC6244 changed the base branch from next to main April 3, 2026 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants