
fix(rivetkit): restore hibernatable sockets and hydrate serverless starts#4658

Draft
NathanFlurry wants to merge 1 commit into break-up/expose-hibernation-metadata from break-up/restore-hibernating-sockets

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 14, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Apr 14, 2026

PR Review: fix(rivetkit): restore hibernatable sockets and hydrate serverless starts

This PR restores hibernatable WebSocket connections across serverless actor restarts and improves sleep test reliability. The core idea is sound - persisting connection metadata in KV and injecting it into the serverless start payload enables proper socket rebinding after a cold start. Overall quality is good with a few things worth addressing.


What This Does

  • Serverless start hydration (#hydrateServerlessStartPayload): reads persisted connection metadata from KV and injects hibernating requests into the CommandStartActor payload before forwarding to the envoy, restoring hibernatable sockets after a serverless restart.
  • Binding infrastructure: introduces HibernatableConnectBinding and HibernatableRunnerWebSocketBinding with detach/rebind lifecycle, so connections can be re-wired to a newly started actor instance.
  • Proper dispose() on shutdown: replaces #dynamicRuntimes.clear() with await this.#disposeAllDynamicRuntimes(), and adds force-stop of stuck actors during envoy drain.
  • Test improvements: polling retry helper (readAfterSleepCycle) replaces fixed-delay waits; toBeGreaterThanOrEqual relaxes brittle exact-count assertions.
  • Gateway URL path fix: /api/hello becomes /request/api/hello to match router changes.
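
To illustrate the shutdown change in the list above, here is a minimal sketch of disposing every runtime instead of clearing the map. The names `DynamicRuntime`, `RuntimeRegistry`, and `disposeAll` are hypothetical stand-ins, not the identifiers used in this PR; the sketch only assumes that each runtime exposes an async `dispose()`.

```typescript
// Hypothetical shape of a dynamic runtime; only dispose() is assumed.
interface DynamicRuntime {
	dispose(): Promise<void>;
}

// Hypothetical registry standing in for the class that owns the runtimes.
class RuntimeRegistry {
	private runtimes = new Map<string, DynamicRuntime>();

	add(id: string, runtime: DynamicRuntime): void {
		this.runtimes.set(id, runtime);
	}

	get size(): number {
		return this.runtimes.size;
	}

	// Dispose one runtime; the entry is removed even if dispose() throws.
	private async disposeOne(id: string): Promise<void> {
		const runtime = this.runtimes.get(id);
		if (!runtime) return;
		try {
			await runtime.dispose();
		} finally {
			this.runtimes.delete(id);
		}
	}

	// Dispose every runtime and return how many dispose() calls failed.
	async disposeAll(): Promise<number> {
		let failures = 0;
		for (const id of Array.from(this.runtimes.keys())) {
			try {
				await this.disposeOne(id);
			} catch (error) {
				failures += 1;
			}
		}
		return failures;
	}
}
```

Unlike a bare `clear()`, this gives each runtime a chance to release its resources, and one failing `dispose()` does not prevent the others from running.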

Issues

waitForHibernatableRegistration is an unexplained magic delay

No comment explains what is being waited for or why 100ms is sufficient. This is a timing assumption that is likely to be flaky in slow CI. A comment explaining the race being guarded against would help; ideally a polling approach (similar to readAfterSleepCycle) would be more robust.
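
A polling replacement could look roughly like the sketch below. `waitUntil` and its options are hypothetical names chosen for illustration; the condition that signals a completed hibernatable registration would have to come from the test's own observable state, which this sketch does not assume.

```typescript
// Hypothetical helper: poll a condition until it holds instead of sleeping a fixed 100 ms.
async function waitUntil(
	condition: () => boolean | Promise<boolean>,
	opts: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
	const timeoutMs = opts.timeoutMs ?? 5000;
	const intervalMs = opts.intervalMs ?? 25;
	const deadline = Date.now() + timeoutMs;

	while (true) {
		// Check first so an already-satisfied condition returns without waiting.
		if (await condition()) return;
		if (Date.now() >= deadline) {
			throw new Error(`condition not met within ${timeoutMs}ms`);
		}
		await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
	}
}
```

Fast machines then return as soon as the condition holds, while slow CI gets the full timeout rather than a fixed guess.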

"alarms keep actor awake" test semantics changed

The original test verified the actor did NOT sleep while an alarm was active by checking counts mid-way before the alarm fires. The new version removes that intermediate assertion and only checks that a sleep eventually happens. This no longer verifies the alarm's keep-alive effect - it just confirms baseline sleep behavior. The test name no longer matches its intent.

Logging field name inconsistency

In the dynamic actor start error handler, err: stringifyError(error) is used, but all other error logging in the file uses error:.

dynamic.runtime.disposed handling is inconsistent between binding types

In #bindDynamicHibernatableConnectSocket, the onProxyClose handler only ignores "dynamic.runtime.disposed" when isRestoringHibernatable is true. In #bindDynamicHibernatableRunnerWebSocket, it is always ignored. If intentional, a comment explaining why would prevent future confusion.

Broad try/catch in dynamic actor start masks which step failed

The try/catch wraps rebinding + restoring in one block with a single warning message. If rebinding succeeds but restoreHibernatingRequests fails, the warning is misleading. Consider logging the step name or using separate try/catch blocks per phase.
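
One way to make the failing step visible is sketched below. `Phase` and `runPhases` are hypothetical names, and the phase labels in the usage note are placeholders for the PR's real rebinding and restore steps.

```typescript
// Hypothetical phase descriptor: a label plus the async work for that step.
type Phase = { name: string; run: () => Promise<void> };

// Runs phases in order and stops at the first failure, reporting which phase failed.
// Returns the failed phase name, or undefined when every phase succeeded.
async function runPhases(
	phases: Phase[],
	warn: (fields: { phase: string; error: string }) => void,
): Promise<string | undefined> {
	for (const phase of phases) {
		try {
			await phase.run();
		} catch (error) {
			warn({
				phase: phase.name,
				error: error instanceof Error ? error.message : String(error),
			});
			return phase.name;
		}
	}
	return undefined;
}
```

Called with phases named, for example, "rebind" and "restoreHibernatingRequests", the warning carries the step that actually threw instead of one generic message.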

No version validation in #hydrateServerlessStartPayload

The code slices off a 2-byte version prefix and reassembles the payload after re-encoding without validating what those bytes represent. A comment on the expected prefix format or a defensive assertion would help prevent silent corruption if the format changes.
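
A defensive check could be as small as the sketch below. It assumes the 2-byte prefix is a little-endian unsigned 16-bit version number; that encoding, the expected version value, and the function names are assumptions for illustration and are not confirmed by the PR.

```typescript
// Splits a payload into its 2-byte prefix and body after validating the version.
// Assumption: the prefix is a little-endian u16 version number.
function splitVersionedPayload(
	payload: Uint8Array,
	expectedVersion: number,
): { prefix: Uint8Array; body: Uint8Array } {
	if (payload.byteLength < 2) {
		throw new Error(`payload too short for version prefix: ${payload.byteLength} bytes`);
	}
	const view = new DataView(payload.buffer, payload.byteOffset, payload.byteLength);
	const version = view.getUint16(0, true);
	if (version !== expectedVersion) {
		throw new Error(`unexpected payload version ${version}, expected ${expectedVersion}`);
	}
	return { prefix: payload.subarray(0, 2), body: payload.subarray(2) };
}

// Reassembles the payload from the validated prefix and a re-encoded body.
function joinVersionedPayload(prefix: Uint8Array, body: Uint8Array): Uint8Array {
	const out = new Uint8Array(prefix.byteLength + body.byteLength);
	out.set(prefix, 0);
	out.set(body, prefix.byteLength);
	return out;
}
```

Failing loudly on an unknown version turns a silent corruption into an explicit error at the point where the format assumption is made.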


Minor Observations

  • readAfterSleepCycle error message prints lastError=undefined when the read never throws but counts never reach threshold - consider omitting lastError when it is not set.
  • #disposeDynamicRuntime using finally to guarantee delete even on dispose failure is the right pattern.
  • The .sequential() annotation on hibernation tests is correct and necessary.
  • toBeGreaterThanOrEqual paired with expect(startCount).toBe(sleepCount + 1) is a good invariant - lenient on absolute counts while still enforcing the relationship.
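
For the lastError observation above, a small formatting helper would be enough. `formatSleepCycleTimeout` and the message wording are hypothetical; only the startCount and sleepCount names come from the tests discussed here.

```typescript
// Hypothetical formatter for the timeout message; lastError is included only when set.
function formatSleepCycleTimeout(
	counts: { startCount: number; sleepCount: number },
	lastError?: unknown,
): string {
	const base =
		`timed out waiting for sleep cycle ` +
		`(startCount=${counts.startCount}, sleepCount=${counts.sleepCount})`;
	if (lastError === undefined) return base;
	const detail = lastError instanceof Error ? lastError.message : String(lastError);
	return `${base}, lastError=${detail}`;
}
```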

NathanFlurry force-pushed the break-up/expose-hibernation-metadata branch from 26f98bc to fde1e0b on April 15, 2026 02:40
NathanFlurry force-pushed the break-up/restore-hibernating-sockets branch from ccc38b5 to 8293235 on April 15, 2026 02:40
NathanFlurry force-pushed the break-up/expose-hibernation-metadata branch from fde1e0b to 789b9cd on April 15, 2026 02:50
NathanFlurry force-pushed the break-up/restore-hibernating-sockets branch from 8293235 to 1ba6a3b on April 15, 2026 02:50
