
Conversation

@karlem (Contributor) commented Oct 28, 2025

Closes #1441 and #1442


Note (High Risk): Introduces a new F3 proof-based parent finality path and changes the on-chain light-client state layout; mistakes could break top-down finality progression or block execution on upgraded networks.

Overview
Adds an end-to-end F3 proof-based top-down finality flow alongside the existing legacy vote-based path. Node startup now chooses between legacy and F3 modes via new ipc.topdown.f3 settings, validates config vs genesis state, initializes a persistent proof cache, and (when enabled) runs a background proof generator service; legacy resolver/voting/polling syncer setup is refactored into a dedicated service/topdown.rs.

Updates the on-chain f3-light-client actor to store only the latest finalized height/instance and a HAMT-backed power table root (with power_be as big-endian bytes), adds monotonicity checks for updates, and materializes the power table on GetState. Genesis-from-parent now fetches the F3 certificate to derive base_epoch and parses parent power as BigInt, and the interpreter gains shared EVM log decoding utilities plus bundle event extraction for top-down messages and validator power changes.

Written by Cursor Bugbot for commit 0e3593c.

@karlem changed the title from "feat: init lifecycle" to "feat: F3 e2e lifecycle" on Oct 29, 2025
@karlem force-pushed the f3-lifecycle branch 2 times, most recently from 91db005 to cbce51c, on November 4, 2025 at 17:20
Base automatically changed from f3-proofs-cache to main on December 18, 2025 at 16:15
@karlem marked this pull request as ready for review on January 16, 2026 at 19:52
@karlem requested a review from a team as a code owner on January 16, 2026 at 19:52
Comment on lines +276 to +278
self.verifier
.verify_proof_bundle_with_tipsets(&proof_bundle, &finalized_tipsets)
.with_context(|| format!("Failed to verify proof for epoch {}", parent_epoch))?;

Reviewer:
Apparently, there's no verification of continuity of top-down event nonces, yet.

@karlem (author) replied:
Yes, there is not. My understanding was that you were suggesting to skip it for now. But I can add it here.

Reviewer:
Maybe also skip the verification of proof bundles for now and tackle both in a separate PR? Or complete it in this PR. Up to you

@karlem (author) replied:
Hmm. I have a strong desire to merge, but I also want to see if the proofs and everything is going to work. I might implement the check tomorrow.

Reviewer:
Is it hard to check the nonces? If it's relatively easy, then let's do it in this PR. Otherwise, it would appear that everything is fully verified whereas it's not quite, and we'd need to make sure we don't forget about that.

@karlem (author) replied:
It is not hard, but the nonce needs to be stored somewhere. That is the annoying bit. But I will do it in this PR.
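
For reference, a minimal sketch of the nonce-continuity check being discussed, assuming the executor keeps track of the last applied top-down nonce (where that value is persisted is exactly the annoying bit; all names here are illustrative, not the actual crate API):

use anyhow::bail;

/// Sketch only: verify that a batch of top-down message nonces continues
/// directly from the last applied nonce, with no gaps or reordering.
fn check_topdown_nonce_continuity(
    last_applied_nonce: u64,
    batch_nonces: &[u64],
) -> anyhow::Result<()> {
    let mut expected = last_applied_nonce + 1;
    for &nonce in batch_nonces {
        if nonce != expected {
            bail!("top-down nonce gap: expected {expected}, got {nonce}");
        }
        expected += 1;
    }
    Ok(())
}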

@cursor (bot) left a comment:

Cursor Bugbot has reviewed your changes and found 1 potential issue.

}
if f3_enabled_in_config && f3_state_in_genesis.is_none() {
bail!("F3 is enabled in config but initial F3 state is missing in genesis");
}

Fresh node with F3 config fails to start

High Severity

The F3 state validation check prevents a fresh node with F3 configuration from starting. start_topdown_if_enabled is called before App::new(), and for a fresh node, query_f3_state_in_genesis returns None because no database state exists yet. The check at line 121-123 then fails with "F3 is enabled in config but initial F3 state is missing in genesis". However, the F3 state is only created when genesis is applied during the InitChain ABCI call, which requires the node to start first. This creates a chicken-and-egg situation that blocks startup.

Comment on lines +29 to +36
if b.len() > 32 {
anyhow::bail!("expected <= 32 bytes, got {}", b.len());
}
if b.len() < 32 {
let mut padded = vec![0u8; 32 - b.len()];
padded.append(&mut b);
b = padded;
}

Reviewer:
Is it even allowed to be not exactly 32 bytes?

padded.append(&mut b);
b = padded;
}
let tail: [u8; 8] = b[24..32].try_into().expect("slice is 8 bytes");

Reviewer:
Maybe we need to check that the higher bits are all zero.
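
For illustration, one way the conversion could reject values that do not fit in a u64 instead of silently truncating; this is a sketch based on the snippet above, not the PR's actual code:

use anyhow::bail;

/// Sketch: interpret up to 32 big-endian bytes as a u64, rejecting over-long
/// inputs as well as values whose high 24 bytes are non-zero.
fn be_bytes_to_u64(b: &[u8]) -> anyhow::Result<u64> {
    if b.len() > 32 {
        bail!("expected <= 32 bytes, got {}", b.len());
    }
    // Left-pad into a fixed 32-byte buffer.
    let mut padded = [0u8; 32];
    padded[32 - b.len()..].copy_from_slice(b);
    // Check that the higher bits are all zero, as suggested above.
    if padded[..24].iter().any(|&byte| byte != 0) {
        bail!("value does not fit into u64");
    }
    let tail: [u8; 8] = padded[24..32].try_into().expect("slice is 8 bytes");
    Ok(u64::from_be_bytes(tail))
}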

Comment on lines +251 to +258
// This path may be hit during catch-up for a node that did not have the local proof cache
// entry during attestation. In that case, wait for the cache to be filled by the proof-service.
let extracted = Self::extract_top_down_effects_retry_cache_miss(
&self.f3_execution_cache_retry,
f3,
&msg,
)
.await?;

Reviewer:
I think this should be infallible.

@karlem (author) replied:
Hmm. Not sure. I think that it should not fail because of cache miss, but it should probably fail if the data are not extractable?

Reviewer:
But that would be fatal, no?

@karlem (author) replied:
Yes, and it is propagated as fatal.

Reviewer:
No, in the current code it only causes apply_message to return an error, which CometBFT treats as an ordinary transaction failure.

Comment on lines +16 to 32
/// - Latest Instance ID: The latest F3 instance that has been committed
/// - Latest Finalized Height: The highest epoch that has been finalized
/// - Power Table: Current validator power table (can change between instances)
///
/// This state is extracted from F3 certificates received from the parent chain
/// and stored by the actor for use in finality proofs.
#[derive(Deserialize_tuple, Serialize_tuple, Debug, Clone, PartialEq, Eq)]
pub struct LightClientState {
/// Current F3 instance ID
pub instance_id: u64,
/// Finalized chain - full list of finalized epochs
/// Matches ECChain from F3 certificates
/// Empty initially at genesis until first update
pub finalized_epochs: Vec<ChainEpoch>,
/// Current power table for this instance
/// Power table can change between instances
pub power_table: Vec<PowerEntry>,
/// Latest F3 instance ID that has been committed
pub latest_instance_id: u64,
/// The latest finalized height
pub latest_finalized_height: ChainEpoch,
/// Root CID of the on-chain power table (HAMT).
///
/// The actual entries are stored in the actor's blockstore and reachable from this root.
pub power_table_root: Cid,
}

Reviewer:
I think the former comment on the F3 instance ID was clearer: in many regards it is the current instance, because the same cert can be used to justify and commit parent chain updates for multiple epochs; one can consider a cert "committed" only when the last epoch it certifies is also "committed". The epoch number (height) then signifies the latest accepted parent chain extension. (In principle, the latest finalized epoch is the last one in the cert.) Maybe we should call those fields instance_id (or current_instance_id) and latest_height (the latest height for which the parent chain updates were applied)?

Reviewer:
BTW, there seems to already be something very much like latest_finalized_height in the gateway contract, see commit_finality.

@karlem (author) replied:
Makes sense.
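
For illustration, the reviewer's suggested shape could look roughly like the following; this is a sketch only, and the field names and import paths are assumptions, not the committed layout:

use cid::Cid;
use fvm_ipld_encoding::tuple::*;
use fvm_shared::clock::ChainEpoch;

/// Sketch of the renamed light-client state discussed above.
#[derive(Deserialize_tuple, Serialize_tuple, Debug, Clone, PartialEq, Eq)]
pub struct LightClientState {
    /// Current F3 instance ID; the same cert may justify and commit parent
    /// chain updates for multiple epochs.
    pub instance_id: u64,
    /// Latest parent height for which the chain updates were applied.
    pub latest_height: ChainEpoch,
    /// Root CID of the on-chain power table (HAMT); entries live in the
    /// actor's blockstore and are reachable from this root.
    pub power_table_root: Cid,
}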

Comment on lines +278 to +291
// Store validator changes in gateway
self.gateway_caller
.store_validator_changes(state, extracted.validator_changes)
.context("failed to store validator changes")?;

// Execute topdown messages
let ret = self
.execute_topdown_msgs(state, extracted.topdown_msgs)
.await
.context("failed to execute top down messages")?;

// Finalize F3 execution only after all effects were applied successfully.
f3.finalize_after_execution(state, msg.height, extracted.instance_id)
.context("failed to finalize F3 execution")?;

Reviewer:
I'm wondering what may cause store_validator_changes or execute_topdown_msgs to fail? Should that happen, finalize_after_execution won't be called, and, IIUC, no further update from the parent chain will ever make it to the subnet chain.

Comment on lines +343 to +349
let service = ProofGeneratorService::new(
proof_config.clone(),
proof_cache.clone(),
&subnet_id,
initial_instance,
fendermint_vm_topdown_proof_service::power_entries_from_actor(&f3_state.power_table),
)

Reviewer:
The proof generator service initializes the F3 client with initial_instance, which will start fetching certificates from initial_instance+1; therefore, we won't generate any proof bundles for initial_instance. If a validator joins late, it may find itself in a situation where an F3 cert is partially committed (some epochs already committed, but some not yet). In fact, this may even happen at genesis because we initialize the F3 client actor with the base epoch number.

@karlem (author) replied:
Wait, but the initial_instance + 1 is by design. With F3 you can't validate the current cert with the current power table; you validate cert N with the power table from N - 1. So really the initial instance should not be called "initial", tbh. It should be N - 1 of where we actually want to start. Otherwise we would need to store the previous power table or something like that. It is pretty complicated... WDYT?
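
To illustrate the offset described here (all names are hypothetical, not the proof service's actual API): the certificate for instance N is validated with the power table from instance N - 1, so bootstrapping happens one instance earlier than the first instance we actually want to prove.

/// Hypothetical sketch of the bootstrap offset discussed above.
struct F3ClientBootstrap {
    /// Instance whose power table is supplied at startup.
    initial_instance: u64,
}

impl F3ClientBootstrap {
    /// To prove `first_needed`, hold the power table of `first_needed - 1`.
    fn for_first_needed(first_needed: u64) -> Self {
        Self {
            initial_instance: first_needed.saturating_sub(1),
        }
    }

    /// First instance for which a certificate is fetched and a proof bundle
    /// produced: `initial_instance + 1`.
    fn first_proved_instance(&self) -> u64 {
        self.initial_instance + 1
    }
}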

Reviewer:
Perhaps we should update the instance number in the F3 actor state only once all epochs certified by that instance are committed. And we should initialize it in the genesis accordingly. Maybe also reconsider what we mark as "committed" in the proof cache, to make the logic consistent.

})?;

// Get base power table for the specified instance
let power_table_response = lotus_client.f3_get_power_table(instance_id).await?;

Reviewer:
Hmm, we should keep in mind that we trust the endpoint here, when generating the genesis block, in that it provides us the correct initial power table for F3.

Reviewer:
Also, when we create a new subnet, we trust that the parent endpoint doesn't provide us an F3 instance ID "from the future".

@karlem (author) replied:
Yes, we do. But there is no other way around it, IMO. We should mention it in the docs.

Reviewer:
I think, in principle, we could derive it upon initialization from the EC chain after 900 epochs (or earlier, using the finality calculator), the same way Filecoin nodes are supposed to, ultimately anchoring trust in the drand beacon.

Successfully merging this pull request may close: F3 topdown: Proof Verification & Completeness Enforcement.