Move shutdown_worker RPC to initiate_shutdown#1197
Move shutdown_worker RPC to initiate_shutdown#1197yuandrew wants to merge 17 commits intotemporalio:masterfrom
Conversation
| assert_eq!(workflow_task_slots.current_available_slots, 5); | ||
| assert_eq!(workflow_task_slots.current_used_slots, 1); |
There was a problem hiding this comment.
With the new shutdown flow taking a little longer, these checks aren't deterministic. These checks are technically all checked in in_activity_checks, kept the assert_eq!(workflow_task_slots.total_processed_tasks, 2); check, which is unique to the shutdown case
…me timeout to avoid race
20d28e0 to
5dcbf1c
Compare
| [env] | ||
| # This temporarily overrides the version of the CLI used for integration tests, locally and in CI | ||
| #CLI_VERSION_OVERRIDE = "v1.4.1-cloud-v1-29-0-139-2.0" | ||
| CLI_VERSION_OVERRIDE = "v1.6.3-serverless" |
There was a problem hiding this comment.
Temporary CLI version override left uncommented
Medium Severity
CLI_VERSION_OVERRIDE is uncommented and set to "v1.6.3-serverless", pinning a specific CLI version for all integration tests (locally and in CI). The PR description explicitly states this is temporary ("will be safe to remove once the fix is fully deployed into cloud") and the reviewer confirmed it as "Intentionally temporarily checked in." This override changes test behavior globally and risks being forgotten after the upstream fix lands.
Reviewed by Cursor Bugbot for commit 5dcbf1c. Configure here.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit f2be7f7. Configure here.
| _ => {} | ||
| } | ||
| }); | ||
| *guard = Some(handle); |
There was a problem hiding this comment.
Sync initiate_shutdown now requires tokio runtime context
Medium Severity
initiate_shutdown is a public sync function (pub fn) that now calls tokio::spawn, which panics if no tokio runtime is available. Previously this function was purely synchronous and safe to call from any context (e.g., signal handlers, non-tokio threads). The C bridge correctly adds enter_sync!, but the Rust public API silently changed its contract without documenting the new runtime requirement.
Reviewed by Cursor Bugbot for commit f2be7f7. Configure here.


What was changed
Moved
shutdown_workerRPC toinitiate_shutdown, which requiresinitiate_shutdownto now be an async fnNOTE: also added a temp 5s timeout in order to avoid the race in temporalio/temporal#9545, will be safe to remove once the fix is fully deployed into cloud
Why?
shutdown_workerRPC needs to be sent at beginning of shutdown in order for server to know to send empty responses to pending polls.Checklist
Closes
How was this tested:
Added graceful_shutdown_sends_shutdown_worker_rpc_during_initiate to ensure RPC is sent at beginning of shutdown process
Note
Medium Risk
Changes worker shutdown sequencing and poll-stream termination behavior, which can affect in-flight task polling and shutdown reliability. Concurrency is introduced via a spawned RPC task and new capability plumbing, but it’s covered by added unit/integration tests.
Overview
Fixes graceful shutdown deadlocks by moving the
ShutdownWorkerRPC toWorker::initiate_shutdown(spawned in the background and awaited early inWorker::shutdown), ensuring the server is notified to flush long polls before the SDK waits for polls to drain.Refactors server feature flags into a shared
NamespaceCapabilitiesobject (exported fromsdk-core) and updates pollers to treat empty poll responses after shutdown as a termination signal rather than a timeout to retry. Adds targeted unit/integration tests for the new shutdown/poll behavior, enables the relevant dev-server dynamic config for tests, and bumps the integration-test CLI override version.Reviewed by Cursor Bugbot for commit f2be7f7. Bugbot is set up for automated code reviews on this repo. Configure here.