Skip to content

fix(rivetkit): prevent sleep races during disconnect and db work#4644

Draft
NathanFlurry wants to merge 1 commit intofix/isolate-engine-envoysfrom
fix/prevent-sleep-db-races
Draft

fix(rivetkit): prevent sleep races during disconnect and db work#4644
NathanFlurry wants to merge 1 commit intofix/isolate-engine-envoysfrom
fix/prevent-sleep-db-races

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 13, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown

claude bot commented Apr 13, 2026

PR Review: fix(rivetkit): prevent sleep races during disconnect and db work

Summary

A small, focused bug-fix PR (5 additions, 0 deletions) addressing two race conditions where an actor could incorrectly fall asleep:

  1. connection-manager.ts - Adds this.#actor.resetSleepTimer() immediately after this.#pendingDisconnectCount += 1 inside connDisconnected. This ensures the sleep timer is reset as soon as a pending disconnect begins being tracked, so the actor stays awake while the async onDisconnect callback and KV cleanup work are still running.

  2. mod.ts - Adds a #canSleep() guard inside the setTimeout callback in resetSleepTimer. Previously, a sleep timeout could be scheduled at a moment when sleeping was not yet appropriate, fire later when sleep was blocked mid-disconnect, and call startSleep() anyway. The guard re-checks whether sleeping is allowed at fire time and reschedules the timer if not.


Code Quality and Best Practices

The changes are clean and consistent with the existing codebase:

  • No new imports introduced; changes are minimal and self-contained.
  • The resetSleepTimer() call in connection-manager.ts is consistent with all other call sites (connConnected, connResumed, connHibernated, and the existing one at the end of connDisconnected).
  • No logging changes, no style violations, no use of deprecated domains or patterns.

Potential Bugs / Issues

The new resetSleepTimer() at the top of connDisconnected may appear redundant given the existing one at the end, but it is not.

Without the new line, there is a window between incrementing #pendingDisconnectCount and the already-pending timer firing. If a sleep timer was scheduled before connDisconnected was called, it could fire in that gap and (without the mod.ts guard) call startSleep() even though pendingDisconnectCount > 0.

After the mod.ts change, the timer guard would catch this case anyway. However, the connection-manager.ts call is good defense-in-depth - it cancels the pending timer eagerly rather than relying on the setTimeout callback to self-correct. Both changes are complementary and correct.

No correctness issues found.


Performance

No concerns. clearTimeout + setTimeout in resetSleepTimer is O(1) and is already called on every connect/disconnect/message/HTTP request lifecycle event. Adding one more call site is negligible.


Security

No security implications.


Test Coverage

No tests are included. Given the race condition depends on a timer firing in a narrow async window, deterministic unit testing would require timer mocking, which is reasonable to defer.

If tests are added later, a useful case would be: start an actor with a very short sleepTimeout, trigger a disconnect, and assert that startSleep is not called until after the onDisconnect callback completes.


Verdict

The PR is correct and well-scoped. The mod.ts guard is the core fix; the connection-manager.ts eager reset is a solid complementary improvement. Ready to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant