feat: make Rocket reloadable via on-chain restart trigger#267
feat: make Rocket reloadable via on-chain restart trigger#267
Conversation
Add ServerTriggered event and serverTrigger() function to APIConfigFacet so the diamond owner can signal restarts on-chain. Introduce restart.rs event listener that polls for ServerTriggered events and sends a restart signal via mpsc channel. Refactor main() into a restart loop that rebuilds Rocket on signal while preserving long-lived state (signer pool, IPFS cache, CPU monitor). Includes restart-loop protection (max 3 restarts per 60s).
…rocket-reloadable # Conflicts: # lit-api-server/blockchain/lit_node_express/contracts/AccountConfigFacets/APIConfigFacet.sol # lit-api-server/src/accounts/contracts/AccountConfig.json # lit-api-server/src/accounts/contracts/account_config_contract.rs
- Fix off-by-one in restart loop protection (> to >=, now correctly limits to MAX_RESTARTS within the window) - Use try_send instead of send().await so the event listener is never blocked waiting for the main loop to consume a previous restart signal - Propagate error instead of unwrap_or(0) when fetching initial block number to avoid scanning from genesis on RPC failure - Add retry with exponential backoff for listener startup failures - Use saturating_add for block number arithmetic - Simplify restart_count from Arc<AtomicU64> to plain u64 (single-threaded use)
There was a problem hiding this comment.
Pull request overview
Adds an on-chain “restart trigger” mechanism to the AccountConfig diamond and wires a new Rust background listener into lit-api-server so the Rocket server can be cleanly restarted in-process when the contract owner emits a ServerTriggered event.
Changes:
- Added
serverTrigger(uint256)(owner-only) +ServerTriggeredevent to the diamond, plus storage for the trigger value. - Introduced
lit-api-serverrestart listener (restart.rs) that polls forServerTriggeredand signalsmainvia an mpsc channel. - Refactored
maininto a restart-aware loop and preserved long-lived state across restarts (signer pool, chain config, CPU monitor, IPFS cache).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
lit-api-server/src/restart.rs |
New polling listener for on-chain restart events; sends restart signals to main loop. |
lit-api-server/src/main.rs |
Restart-aware launch loop; extracts build_rocket() and adds loop protection. |
lit-api-server/src/lib.rs |
Exposes the new restart module. |
lit-api-server/src/core/v1/guards/cpu_overload.rs |
Makes CpuOverloadMonitor clonable so it can be shared across restarts. |
lit-api-server/src/accounts/contracts/AccountConfig.json |
Regenerated ABI including ServerTriggered + serverTrigger. |
lit-api-server/src/accounts/contracts/account_config_contract.rs |
Regenerated Rust bindings to match updated ABI (event/function/error). |
lit-api-server/blockchain/lit_node_express/contracts/AccountConfigFacets/AppStorage.sol |
Adds serverTriggerValue to diamond storage. |
lit-api-server/blockchain/lit_node_express/contracts/AccountConfigFacets/APIConfigFacet.sol |
Adds owner-only serverTrigger() and emits ServerTriggered. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| //! listener detects the event and sends a restart signal to the main loop | ||
| //! via [`RestartHandle`]. | ||
|
|
||
| use crate::accounts::signable_contract::{Middleware, get_read_only_account_config_contract}; |
There was a problem hiding this comment.
Middleware is imported but never used in this module. Dropping the unused import avoids warnings (and potential CI failures if warnings are denied).
| use crate::accounts::signable_contract::{Middleware, get_read_only_account_config_contract}; | |
| use crate::accounts::signable_contract::get_read_only_account_config_contract; |
lit-api-server/src/restart.rs
Outdated
| /// Send a restart signal. Returns `true` if the signal was sent. | ||
| /// Uses `try_send` so the listener is never blocked waiting for the | ||
| /// main loop to consume a previous signal. | ||
| pub fn trigger(&self) -> bool { | ||
| match self.tx.try_send(()) { | ||
| Ok(()) => true, | ||
| Err(mpsc::error::TrySendError::Full(_)) => { | ||
| // A restart is already queued; no need to queue another. | ||
| true | ||
| } | ||
| Err(mpsc::error::TrySendError::Closed(_)) => false, |
There was a problem hiding this comment.
RestartHandle::trigger() is documented as “Returns true if the signal was sent”, but the Full(_) branch also returns true even though no new signal is enqueued. Either adjust the doc to reflect “sent or already queued” semantics, or return false on Full so callers can distinguish the cases.
| let rocket = match r.ignite().await { | ||
| Ok(rocket) => rocket, | ||
| Err(e) => { | ||
| tracing::error!("Failed to ignite Rocket: {e}. Exiting restart loop."); | ||
| break; | ||
| } | ||
| }; | ||
| let shutdown = rocket.shutdown(); | ||
| let mut server_handle = tokio::spawn(rocket.launch()); | ||
|
|
||
| tokio::select! { | ||
| res = &mut server_handle => { | ||
| match res { | ||
| Ok(Ok(_)) => tracing::info!("Server exited cleanly."), | ||
| Ok(Err(e)) => tracing::error!("Server exited with error: {e}"), | ||
| Err(e) => tracing::error!("Server task panicked: {e}"), | ||
| } | ||
| break; | ||
| } |
There was a problem hiding this comment.
The restart loop logs Rocket ignite/launch errors and then breaks, but main() always returns Ok(()) afterward. This causes the process to exit with status 0 even on startup failure, Rocket runtime errors, or a panic in the launch task, which can defeat supervisor/Kubernetes restart logic. Consider propagating failures (return Err) and using a non-zero exit when the server stops unexpectedly.
lit-api-server/src/main.rs
Outdated
| use tokio::sync::mpsc; | ||
|
|
||
| /// Maximum number of restarts allowed within `RESTART_WINDOW`. | ||
| /// If exceeded, the process exits to avoid an infinite restart loop. |
There was a problem hiding this comment.
The restart-loop protection comment says “Maximum number of restarts allowed… If exceeded…”, but the guard triggers when restarts_in_window >= MAX_RESTARTS. Either update the wording to match the current behavior (trip on the 3rd signal when MAX_RESTARTS=3), or change the condition so it only trips when the limit is exceeded.
| /// If exceeded, the process exits to avoid an infinite restart loop. | |
| /// Once this limit is reached, the process exits to avoid an infinite restart loop. |
lit-api-server/src/main.rs
Outdated
| "Restart signal received. Shutting down current instance..." | ||
| ); | ||
|
|
||
| if restart_timestamps.len() as u64 >= MAX_RESTARTS { |
There was a problem hiding this comment.
Off-by-one error in restart protection check. The condition >= MAX_RESTARTS exits on the 3rd restart signal, allowing only 2 actual restarts instead of 3.
With MAX_RESTARTS = 3 and the current >= check:
- 1st signal: len=1, 1>=3 false → restart happens
- 2nd signal: len=2, 2>=3 false → restart happens
- 3rd signal: len=3, 3>=3 true → exits WITHOUT restarting
This contradicts the comment on line 23 which states "Maximum number of restarts allowed" should be 3.
if restart_timestamps.len() as u64 > MAX_RESTARTS {This allows exactly 3 restarts (signals 1-3 trigger restarts, signal 4 triggers exit).
| if restart_timestamps.len() as u64 >= MAX_RESTARTS { | |
| if restart_timestamps.len() as u64 > MAX_RESTARTS { |
Spotted by Graphite
Is this helpful? React 👍 or 👎 to let us know.
- Fix trigger() doc to reflect "sent or already queued" semantics (Copilot) - Keep Middleware import (needed as trait for get_block_number) (Copilot) - Propagate server errors from restart loop instead of always Ok(()) (Copilot) - Fix MAX_RESTARTS comment to match >= behavior (Copilot) - Revert >= back to > so exactly MAX_RESTARTS restarts are allowed (Graphite)
Summary
On-chain restart trigger: Adds a
serverTrigger(uint256)function toAPIConfigFacet(owner-only viaLibDiamond.enforceIsContractOwner()) that emits aServerTriggeredevent. A new Rust listener (restart.rs) polls for these events and signals the main loop to restart Rocket.Restart loop: Refactors
main()into a restart-aware loop that rebuilds Rocket on signal while preserving long-lived state (signer pool, chain config, CPU monitor, IPFS cache). Includes restart-loop protection (max 3 restarts per 60s window).Reliability hardening (from pre-landing review):
try_sendso the listener is never blocked by the main loopFiles changed
APIConfigFacet.sol— newserverTrigger()function +ServerTriggeredeventAppStorage.sol— newserverTriggerValuestorage slotAccountConfig.json/account_config_contract.rs— regenerated bindingsrestart.rs— new event listener modulemain.rs— restart loop, extractedbuild_rocket()functioncpu_overload.rs— deriveClonefor sharing across restart iterationslib.rs— exposerestartmodulePre-Landing Review
No critical issues. Adversarial review findings addressed in second commit.
Test plan
cargo checkpasses (rustc 1.91)serverTrigger(1)from owner, verify server restartsCo-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com
🤖 Generated with Claude Code