Skip to content

fix: prevent missed messages in stream after valid error#3427

Merged
cameronvoell merged 5 commits intomainfrom
04-09-prevent_missed_messages_in_stream_after_valid_error
Apr 14, 2026
Merged

fix: prevent missed messages in stream after valid error#3427
cameronvoell merged 5 commits intomainfrom
04-09-prevent_missed_messages_in_stream_after_valid_error

Conversation

@cameronvoell
Copy link
Copy Markdown
Contributor

@cameronvoell cameronvoell commented Apr 10, 2026

Prevent missed messages in stream after a valid error by falling back to alternate cursors

  • When a streamed cursor errors during sync, MessageProcessor now checks other successful cursors from the same sync batch for the same group before giving up.
  • Introduces lookup_stored_from_sync in factory.rs to first look up the streamed cursor, then scan alternate successful MessageIdentifiers ordered by ascending cursor.
  • When an alternate cursor hits, next_message is set to that cursor rather than the errored one, so the stream does not stall or skip the delivered message.
  • Adds a regression test covering the scenario where cursor 10 errors, cursor 11 succeeds, and cursor 12 errors, verifying cursor 11 is surfaced correctly.

Macroscope summarized 4986ee0.

@claude
Copy link
Copy Markdown

claude bot commented Apr 10, 2026

Claude finished @cameronvoell's task —— View job


Code Review Complete

Review Date: 2026-04-13 23:52 UTC

Todo List:

  • Read changed files to understand the implementation
  • Analyze code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security concerns
  • Assess test coverage
  • Post comprehensive review feedback

Summary

The implementation is solid and well-tested. The fallback logic correctly prevents missed messages by scanning alternate successful cursors when the primary cursor errors. Code quality is high with good documentation and thorough test coverage.

Detailed Feedback

Code Quality & Best Practices ✅

  • Clear separation of concerns with lookup_stored_from_sync extracted as a dedicated method
  • Comprehensive documentation explaining the fallback logic and edge case scenario (factory.rs:173-180)
  • Good use of iterator methods (filter, sort_by_key) for cursor scanning
  • Test helper functions (generate_errored_summary_with_group) properly ensure group_id consistency

Potential Issues 🔍

Minor: Short-circuit opportunity in fallback loop

factory.rs:202-206 - The fallback loop could theoretically scan many candidates if multiple cursors succeed for the same group. Consider adding an early break or limiting to the first successful candidate, since we're already sorting ascending:

for id in candidates {
    if let Some(stored) = self.group_db.msg(Some(id), msg)? {
        return Ok(Some((stored, id.cursor)));
        // This already returns, but if the query is expensive,
        // consider whether we want the *first* ascending or truly
        // iterate all candidates
    }
}

Current behavior looks correct (returns on first hit), but worth confirming this is intentional vs. wanting to scan all.

Observation: Mock expectation range in regression test

process_message.rs:241 - .times(1..=2) allows 1 or 2 database probes. This is correct (primary lookup + potential fallback), but the test would still pass if the fallback logic was removed and only 1 probe occurred. Consider adding an assertion that the mock was called exactly 2 times to ensure the fallback is exercised, or verify the fallback was triggered via logging/state.

Performance Considerations ✅

  • Sorting candidates has O(n log n) cost but n is typically small (successful cursors in a single sync batch)
  • Database lookups are already happening in the success path, so the fallback adds minimal overhead
  • Early return pattern prevents unnecessary work

Security Concerns ✅

  • No security issues identified
  • Group ID validation occurs in GroupDatabase::msg (factory.rs:65-67)
  • No injection risks or unsafe operations

Test Coverage ✅

  • Excellent regression test (test_process_surfaces_decrypt_between_failed_cursors) covering the exact bug scenario
  • Parametric tests verify cursor selection across multiple error/success patterns
  • Mock expectations properly validate the fallback logic is triggered
  • Only 1 line missing coverage according to Codecov (98.92% patch coverage)

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 98.92473% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.25%. Comparing base (dbe47a2) to head (4986ee0).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...ates/xmtp_mls/src/subscriptions/process_message.rs 97.36% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3427   +/-   ##
=======================================
  Coverage   83.24%   83.25%           
=======================================
  Files         378      378           
  Lines       51943    52035   +92     
=======================================
+ Hits        43240    43321   +81     
- Misses       8703     8714   +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@cameronvoell cameronvoell force-pushed the 04-09-prevent_missed_messages_in_stream_after_valid_error branch from dd4e974 to a0e57b5 Compare April 10, 2026 22:14
@cameronvoell cameronvoell force-pushed the 04-09-prevent_missed_messages_in_stream_after_valid_error branch from a0e57b5 to 20d41cf Compare April 11, 2026 00:16
@cameronvoell cameronvoell marked this pull request as ready for review April 11, 2026 00:31
@cameronvoell cameronvoell requested a review from a team as a code owner April 11, 2026 00:31
@macroscopeapp
Copy link
Copy Markdown
Contributor

macroscopeapp bot commented Apr 11, 2026

Approvability

Verdict: Approved

A straightforward bug fix adding fallback logic to prevent stream from missing valid messages when adjacent cursors error. The author owns all modified files, the change is well-scoped with clear intent, and includes a regression test. The runtime behavior change is defensive and targeted.

No code changes detected at 4986ee0. Prior analysis still applies.

You can customize Macroscope's approvability policy. Learn more.

@cameronvoell cameronvoell changed the title prevent missed messages in stream after valid error fix: prevent missed messages in stream after valid error Apr 11, 2026
@cameronvoell cameronvoell enabled auto-merge (squash) April 13, 2026 23:52
@cameronvoell cameronvoell merged commit 40d0b01 into main Apr 14, 2026
31 checks passed
@cameronvoell cameronvoell deleted the 04-09-prevent_missed_messages_in_stream_after_valid_error branch April 14, 2026 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants