Conversation

@tac0turtle
Contributor

Overview

This PR adds sync modes, allowing us to move to subscriptions when at the head of the chain.

@claude
Contributor

claude bot commented Jan 8, 2026

Claude finished @tac0turtle's task in 2m 55s


Code Review Complete

I've conducted a comprehensive review of the sync mode implementation. This is a significant enhancement that introduces dynamic synchronization modes for DA block retrieval. Below is my detailed feedback organized by focus area.


Summary

This PR successfully implements a dual-mode synchronization mechanism for the DA worker:

  • Catchup Mode: Aggressive polling when behind the DA chain head
  • Follow Mode: Subscription-based real-time updates when caught up

The implementation is generally well-structured with good separation of concerns. However, I've identified one critical bug related to loop variable capture (also caught by Gemini Code Assist), along with several important architectural and resilience considerations.


Critical Issues

1. Loop Variable Capture Bug (syncer.go:565-578) 🔴 CRITICAL

File: block/internal/syncing/syncer.go:565-578

The event loop variable is reused across iterations (on Go toolchains before 1.22), so taking its address with &event passes the same memory location every time. After the loop completes, all cached events point to the last event, corrupting the cache. (Go 1.22 made loop variables per-iteration, but the explicit copy remains the safe, version-independent pattern.)

// Current (WRONG):
for _, event := range events {
    select {
    case s.heightInCh <- event:
        // ...
    default:
        s.cache.SetPendingEvent(event.Header.Height(), &event)  // ❌ BUG HERE
        // ...
    }
}

Fix:

for _, event := range events {
    event := event  // Create new variable in loop scope
    select {
    case s.heightInCh <- event:
        // ...
    default:
        s.cache.SetPendingEvent(event.Header.Height(), &event)
        // ...
    }
}

Impact: This can lead to incorrect block processing, missing blocks, or applying the wrong block data, potentially causing chain state divergence.



High Priority Issues

2. Interface Abstraction Violation (syncer.go:562) 🟠 HIGH

File: block/internal/syncing/syncer.go:562

events := s.daRetriever.ProcessBlobs(s.ctx, blobs, resp.Height)

The Syncer depends on the DARetriever interface but directly calls ProcessBlobs which was just added to the interface. While this is now technically correct, the design pattern here creates tight coupling.

Issue: The processSubscriptionResponse method was clearly designed for subscription handling but the ProcessBlobs method is also called directly from RetrieveFromDA. This dual-use pattern could lead to maintenance issues.

Recommendation: The current design is acceptable since ProcessBlobs is now part of the interface contract. However, consider documenting the dual-use pattern clearly in the interface definition to prevent future confusion.

3. Subscription Cancellation and Resource Cleanup (syncer.go:436-446) 🟠 HIGH

File: block/internal/syncing/syncer.go:436-446

The runFollowMode function handles subscription errors but doesn't explicitly track or clean up subscription resources:

func (s *Syncer) runFollowMode() {
    s.logger.Debug().Msg("running follow mode")
    s.metrics.SyncMode.Set(float64(SyncModeFollow))

    err := s.subscribeAndFollow()
    if err != nil && !errors.Is(err, context.Canceled) {
        s.metrics.SubscribeErrors.Add(1)
        s.logger.Warn().Err(err).Msg("subscribe failed, falling back to catchup")
        // Don't sleep - go straight to catchup mode to recover
    }
}

Issues:

  • No explicit handling of goroutine leaks if subscriptions don't close properly
  • The DA client subscription channels could potentially block or leak if not consumed
  • No timeout for how long to stay in follow mode before re-checking

Recommendations:

  1. Add monitoring for stuck subscriptions
  2. Consider periodic health checks even in follow mode
  3. Document the expected behavior when subscriptions silently stop sending data

4. Watchdog Timeout Edge Case (syncer.go:533-540) 🟠 MEDIUM

File: block/internal/syncing/syncer.go:533-540

case <-time.After(watchdogTimeout):
    // Watchdog: if no events for watchdogTimeout, recheck mode
    // Might have fallen behind due to network issues
    s.logger.Debug().Dur("timeout", watchdogTimeout).Msg("subscription watchdog triggered, checking sync mode")
    if s.determineSyncMode() == SyncModeCatchup {
        return errors.New("fell behind, switching to catchup")
    }

Issues:

  • time.After allocates a new timer on every loop iteration; each timer is only released once it fires, so slow loops accumulate timers
  • When the watchdog fires but determineSyncMode() still returns SyncModeFollow, the loop continues with no record that the check happened
  • If the DA produces blocks more slowly than the watchdog timeout, the watchdog fires on nearly every iteration, causing excessive mode checks

Recommendations:

  1. Use time.NewTimer and explicitly stop/reset it to prevent timer leaks
  2. Add logging when watchdog triggers but determines we're still caught up
  3. Consider making the watchdog timeout adaptive based on actual DA block time

Medium Priority Issues

5. Redundant Nil Channel Check (syncer.go:512-519) 🟡 MEDIUM

File: block/internal/syncing/syncer.go:512-519

The comment states:

case resp, ok := <-dataCh:
    // Note: if dataCh is nil (same namespace as header), this case never fires

This is correct: Go's select statement never selects a case whose channel is nil, so no explicit nil check is needed, and the comment is helpful. The closed-channel check that follows is also present and necessary:

if !ok {
    return errors.New("data subscription closed")
}

Recommendation: No change needed; both the nil-channel behavior and the !ok check are handled correctly.

6. DA Height Tracking Consistency (syncer.go:580-583) 🟡 MEDIUM

File: block/internal/syncing/syncer.go:580-583

// Update retriever height
if resp.Height >= s.daRetrieverHeight.Load() {
    s.daRetrieverHeight.Store(resp.Height + 1)
}

Issues:

  • This updates the DA retriever height immediately after processing subscription responses
  • If multiple subscription responses arrive at the same height (different namespaces), this could skip heights
  • The >= check means we could process height N multiple times and still advance the height tracker

Recommendations:

  1. Track processed DA heights per namespace if namespaces can return different heights
  2. Consider using a monotonic height tracker that only advances, never skips
  3. Add metrics for DA height tracking discrepancies
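
Recommendation 2 (an advance-only height tracker) could look like the sketch below. The monotonicHeight type and its methods are hypothetical names, not part of the PR; the CAS loop guarantees the stored height never moves backward even under concurrent, duplicate, or out-of-order subscription responses.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type monotonicHeight struct {
	v atomic.Uint64
}

// Advance sets the tracker to h+1 only if that is an increase.
// It reports whether the tracker actually moved.
func (m *monotonicHeight) Advance(h uint64) bool {
	for {
		cur := m.v.Load()
		next := h + 1
		if next <= cur {
			return false // stale or duplicate height: ignore
		}
		if m.v.CompareAndSwap(cur, next) {
			return true
		}
		// Another goroutine advanced concurrently; re-check.
	}
}

func (m *monotonicHeight) Load() uint64 { return m.v.Load() }

func main() {
	var m monotonicHeight
	fmt.Println(m.Advance(5), m.Load()) // true 6
	fmt.Println(m.Advance(5), m.Load()) // false 6 (duplicate ignored)
	fmt.Println(m.Advance(3), m.Load()) // false 6 (stale ignored)
}
```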

7. Mode Determination Race Condition (syncer.go:363-375) 🟡 MEDIUM

File: block/internal/syncing/syncer.go:363-375

mode := s.determineSyncMode()
previousMode := SyncMode(s.currentSyncMode.Load())

// Track mode switches
if mode != previousMode {
    s.currentSyncMode.Store(int32(mode))
    s.metrics.ModeSwitches.Add(1)
    // ...
}

Issue: Between Load() and Store(), another goroutine could modify currentSyncMode, causing the mode switch metric to be incorrect or mode transitions to be missed.

Recommendation: Use atomic compare-and-swap (CAS) so each transition is recorded exactly once. Note that the mode != previousMode guard must be kept: a CAS from a value to itself succeeds, so without the guard a no-op would be counted as a switch:

if mode != previousMode && s.currentSyncMode.CompareAndSwap(int32(previousMode), int32(mode)) {
    s.metrics.ModeSwitches.Add(1)
    // ... log message
}

Testing and Reliability

8. Test Coverage ✅ GOOD

The test coverage is generally good:

  • TestSyncer_determineSyncMode covers mode determination logic
  • TestSyncer_runCatchupMode tests catchup mode
  • TestSyncer_modeSwitching tests mode tracking
  • Mock implementations properly handle the new ProcessBlobs method

Areas for improvement:

  1. Missing integration test for subscribe-to-catchup fallback: When subscription fails, verify catchup mode correctly resumes
  2. No test for watchdog timeout: Verify watchdog correctly detects falling behind
  3. No test for concurrent mode switches: Verify thread safety during rapid mode transitions
  4. Missing test for loop variable bug: Add test that caches multiple events and verifies they're all different
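
For point 4, a regression test along these lines would catch the capture bug. Event and cacheEvents are simplified stand-ins for the real DAHeightEvent and pending-event cache, not the PR's actual types:

```go
package main

import "fmt"

type Event struct{ Height uint64 }

// cacheEvents demonstrates the correct pattern: copy the loop variable
// before taking its address. (Redundant on Go 1.22+, where loop variables
// are per-iteration, but required on earlier toolchains.)
func cacheEvents(events []Event) map[uint64]*Event {
	cache := make(map[uint64]*Event)
	for _, event := range events {
		event := event // per-iteration copy; &event is now unique
		cache[event.Height] = &event
	}
	return cache
}

func main() {
	cache := cacheEvents([]Event{{1}, {2}, {3}})
	for h := uint64(1); h <= 3; h++ {
		fmt.Println(h, cache[h].Height)
	}
}
```

A test asserting that every cached pointer still carries its own height (and that no two entries alias the same address) fails loudly if the copy is removed on a pre-1.22 toolchain.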

9. LocalDA RPC Server Implementation 🟡 MEDIUM

File: tools/local-da/rpc.go:132-137

// Subscribe returns a closed channel; LocalDA does not push live updates.
func (s *blobServer) Subscribe(_ context.Context, _ libshare.Namespace) (<-chan *jsonrpc.SubscriptionResponse, error) {
    ch := make(chan *jsonrpc.SubscriptionResponse)
    close(ch)
    return ch, nil
}

Issue: Returning a closed channel means subscription mode will immediately fail for LocalDA. This is fine for testing catchup mode, but makes it impossible to test follow mode with LocalDA.

Recommendations:

  1. Document this limitation clearly
  2. Consider implementing a simple polling-based subscription for LocalDA to enable follow mode testing
  3. Add a comment explaining that this intentionally disables follow mode for LocalDA

Architecture and Design

10. Sync Mode State Machine ✅ GOOD

The state machine design is clean and well-structured:

  • Clear separation between catchup and follow modes
  • Mode determination logic is simple and understandable
  • Fallback from follow to catchup is automatic on subscription failure

Strengths:

  • The determineSyncMode() function is simple and testable
  • Mode switches are properly logged and recorded in metrics
  • The catchup threshold (2 blocks) is reasonable

Consideration: The threshold is hardcoded. Consider making it configurable if different deployments have different needs.

11. Metrics Implementation ✅ GOOD

File: block/internal/common/metrics.go:73-76

The new metrics are well-designed:

SyncMode        metrics.Gauge   // Current sync mode: 0=catchup, 1=follow
SubscribeErrors metrics.Counter // Number of subscription failures
ModeSwitches    metrics.Counter // Number of catchup<->follow mode transitions

Strengths:

  • Clear naming and documentation
  • Appropriate metric types (gauge for state, counters for events)
  • Integrated into existing metrics infrastructure

Recommendations:

  1. Add a histogram for "time spent in each mode" to understand sync behavior
  2. Add a gauge for "subscription age" (time since last subscription message)
  3. Consider a counter for "watchdog triggers"

12. DA Client Interface Extensions ✅ GOOD

File: block/internal/da/interface.go:27-34

The new methods are well-designed:

Subscribe(ctx context.Context, namespace []byte) (<-chan *blobrpc.SubscriptionResponse, error)
LocalHead(ctx context.Context) (uint64, error)

Strengths:

  • Clean interface contract
  • Proper use of channels for subscription
  • Context support for cancellation

Note: The LocalHead method is crucial for mode determination. Ensure all DA client implementations handle errors gracefully.


Error Handling and Resilience

13. Subscription Error Handling ✅ GOOD

File: block/internal/syncing/syncer.go:451-463

The subscription setup properly handles errors for each namespace:

headerCh, err := s.daClient.Subscribe(subCtx, headerNS)
if err != nil {
    return fmt.Errorf("failed to subscribe to header namespace: %w", err)
}

Strengths:

  • Errors are wrapped with context
  • Subscription failures cause immediate fallback to catchup mode
  • Separate subscriptions for header, data, and forced inclusion namespaces

Consideration: At first glance, if the header subscription succeeds but the data subscription fails, the header subscription looks leaked. In practice the defer cancel() at line 458 cancels subCtx on any error return, which also tears down the already-established header subscription, so no additional cleanup is required. ✅

14. Catchup Mode Backoff ✅ GOOD

File: block/internal/syncing/syncer.go:418-432

The backoff strategy is sensible:

backoff := s.config.DA.BlockTime.Duration
if backoff <= 0 {
    backoff = 2 * time.Second
}

Strengths:

  • Uses configured block time as backoff duration
  • Reasonable default (2 seconds)
  • Same backoff for errors and successful catchup

Consideration: Exponential backoff for persistent errors might be beneficial, but the current approach is adequate.


Performance Considerations

15. Channel Buffering 🟡 MEDIUM

File: block/internal/syncing/syncer.go:207

heightInCh: make(chan common.DAHeightEvent, 100),

Analysis:

  • Buffer size of 100 is reasonable for most scenarios
  • Subscription mode could potentially generate events faster than processing
  • When buffer is full, events are cached to disk (good fallback)

Recommendations:

  1. Monitor channel fullness with metrics
  2. Consider making buffer size configurable
  3. Log when events are queued as pending due to full channel
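
For recommendation 1, channel fullness can be sampled cheaply: len and cap are safe to call on a buffered channel from any goroutine. The helper below is an illustrative sketch, not an existing metric in the PR:

```go
package main

import "fmt"

// channelFullness returns the occupancy ratio of a buffered channel,
// suitable for feeding a gauge metric on each sampling tick.
func channelFullness(length, capacity int) float64 {
	if capacity == 0 {
		return 0 // unbuffered channel: fullness is not meaningful
	}
	return float64(length) / float64(capacity)
}

func main() {
	ch := make(chan int, 100)
	for i := 0; i < 25; i++ {
		ch <- i
	}
	fmt.Println(channelFullness(len(ch), cap(ch))) // 0.25
}
```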

16. Memory Management in ProcessBlobs ✅ GOOD

File: block/internal/syncing/da_retriever.go:156-235

The ProcessBlobs method maintains pending headers and data in memory:

pendingHeaders map[uint64]*types.SignedHeader
pendingData    map[uint64]*types.Data

Strengths:

  • Transient cache is cleared on restarts (documented)
  • Events are created only when header+data are matched
  • Duplicate detection prevents memory bloat

Consideration: If a malicious sequencer publishes headers without data (or vice versa), these maps could grow unbounded. Consider:

  1. Adding a maximum size limit
  2. Adding age-based cleanup
  3. Monitoring pending map sizes with metrics

Security Considerations

17. Subscription Data Validation ✅ GOOD

File: block/internal/syncing/syncer.go:544-562

Subscription responses are processed through the same validation pipeline as polled data:

events := s.daRetriever.ProcessBlobs(s.ctx, blobs, resp.Height)

Strengths:

  • Reuses existing validation logic from ProcessBlobs
  • No special trust assumptions for subscription data
  • Headers and data are validated before processing

Excellent: This ensures subscription mode doesn't introduce new attack vectors.

18. DA Client Nil Check ✅ GOOD

File: block/internal/syncing/syncer.go:387-390

if s.daClient == nil {
    return SyncModeCatchup
}

Properly handles nil DA client (test scenarios) by defaulting to catchup mode.


Code Quality and Style

19. Code Organization ✅ EXCELLENT

The code is well-organized:

  • Clear separation between catchup and follow modes
  • Helper functions are appropriately sized
  • Good use of early returns and guard clauses
  • Consistent error handling patterns

20. Documentation ✅ GOOD

Comments are generally clear and helpful:

  • Sync mode constants are documented
  • Method purposes are explained
  • Edge cases are noted (e.g., nil channels in select)

Minor improvements:

  1. Add package-level documentation explaining the sync mode architecture
  2. Document the catchup threshold rationale
  3. Add examples in interface documentation

21. Logging ✅ GOOD

Logging is appropriate:

  • Mode switches are logged at Info level
  • Subscription events at Debug level
  • Errors at Warn/Error level

Suggestions:

  1. Add structured logging for subscription stats (messages received, latency)
  2. Log watchdog trigger frequency to detect DA issues

Recommendations Summary

Must Fix (Critical)

  1. Fix loop variable capture bug in processSubscriptionResponse (syncer.go:565)

Should Fix (High Priority)

  1. Consider improving subscription resource cleanup and monitoring
  2. Fix watchdog timer to use time.NewTimer instead of time.After
  3. Use atomic CAS for mode transition tracking to prevent race conditions

Nice to Have (Medium Priority)

  1. Implement simple subscription for LocalDA to enable follow mode testing
  2. Add integration tests for subscription failure scenarios
  3. Add metrics for channel buffer usage and subscription health
  4. Add bounds checking for pending headers/data maps
  5. Make catchup threshold configurable

Documentation

  1. Document dual-use pattern of ProcessBlobs in interface
  2. Add package-level documentation for sync mode architecture
  3. Document LocalDA subscription limitation

Conclusion

This is a solid implementation of an important feature. The dual-mode synchronization strategy is well-designed and will significantly improve DA sync efficiency for nodes at the head of the chain.

Key Strengths:

  • Clean state machine design
  • Good separation of concerns
  • Comprehensive metrics
  • Proper fallback mechanisms

Critical Issue:

  • The loop variable capture bug must be fixed before merging

Overall Assessment: After fixing the critical bug, this PR will be ready to merge. The architecture is sound, the implementation is clean, and the testing is reasonable. The suggested improvements are mostly optimizations and edge case handling that can be addressed in follow-up PRs.

Suggested Merge Plan:

  1. Fix the loop variable capture bug
  2. Consider the high-priority issues (subscription cleanup, watchdog timer)
  3. Add integration test for subscription failure fallback
  4. Merge and monitor metrics in production
  5. Address medium-priority issues based on real-world behavior

@github-actions
Contributor

github-actions bot commented Jan 8, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Jan 8, 2026, 2:21 PM

@gemini-code-assist
Contributor

Summary of Changes

Hello @tac0turtle, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the data availability (DA) worker's synchronization capabilities by introducing dynamic sync modes. It allows the system to efficiently transition between an aggressive polling strategy when catching up to the DA chain and a more resource-optimized, subscription-based approach once it reaches the chain head. This change aims to improve the overall performance and responsiveness of DA data processing.

Highlights

  • Dynamic Sync Modes: Introduced two dynamic synchronization modes for the DA worker: 'Catchup' for aggressive polling when behind the chain head, and 'Follow' for efficient, subscription-based real-time updates when caught up.
  • Subscription-based Updates: Implemented a new Subscribe method in the DA client, enabling the 'Follow' mode to receive real-time blob notifications, significantly reducing polling overhead.
  • Local Head Tracking: Added a LocalHead method to the DA client and a corresponding RPC endpoint, allowing the syncer to determine its current position relative to the DA chain head and facilitate mode switching.
  • New Metrics for Observability: Integrated new Prometheus metrics (SyncMode, SubscribeErrors, ModeSwitches) to provide visibility into the current sync mode, subscription failures, and transitions between modes.
  • Refactored DA Worker Loop: The daWorkerLoop has been refactored to intelligently determine and switch between 'Catchup' and 'Follow' modes based on the node's synchronization status, including a watchdog mechanism for 'Follow' mode.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a dual-mode synchronization mechanism, allowing the node to switch between an aggressive polling 'catchup' mode and a more efficient subscription-based 'follow' mode. This is a significant enhancement for nodes that are at the head of the chain. The changes are well-structured, introducing new DA client methods, metrics, and the core state machine logic in the daWorkerLoop. My review identified two critical bugs related to incorrect loop variable capturing that could lead to data corruption, and a couple of medium-severity design and style issues. Once these points are addressed, the implementation will be much more robust.
