Dispatch activity cancellation to worker using Nexus #9233
Open
This was referenced Feb 5, 2026
bergundy (Member) reviewed on Feb 12, 2026 and left a comment:
Can you confirm whether you're covering cancel requests and pause requests?
Do we also care about canceling activities that we know timed out or are we letting the worker take care of that?
When will you be adding support for standalone activities too?
Matching uses the clock from RecordActivityTaskStarted to build the task token sent to the worker. Store this clock in ActivityInfo so that history can later reconstruct the same task token (e.g. for cancel worker commands).

Key changes:
- Add started_clock field to ActivityInfo proto
- Create clock before AddActivityTaskStartedEvent so it's persisted in the same write (same pattern as WorkerControlTaskQueue)
- Return stored clock on retry path (same RequestId) so matching always gets the clock that the cancel handler will use
- Use binary/protobuf encoding for worker command payloads (SDK Core decodes these directly via prost, not through lang-SDK Nexus handlers)
- Cancel handler reconstructs task token using ai.StartedClock
- Functional test asserts cancel token matches the original activity token

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
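The reason for persisting the clock can be illustrated with a minimal sketch: if the token is a deterministic encoding of fields that are all stored with the activity, a later cancel handler can rebuild a byte-identical token. Every name here (`vectorClock`, `activityInfo`, `taskToken`, `buildTaskToken`, `tokenForCancel`) is a hypothetical stand-in, and JSON is used only for illustration; the commit itself says the real payloads use binary/protobuf encoding.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// vectorClock is a hypothetical stand-in for the clock created during
// RecordActivityTaskStarted.
type vectorClock struct {
	ShardID int32 `json:"shard_id"`
	Clock   int64 `json:"clock"`
}

// activityInfo is a hypothetical, trimmed-down stand-in for the persisted
// ActivityInfo record.
type activityInfo struct {
	ScheduledEventID int64
	Attempt          int32
	StartedClock     *vectorClock // nil if the activity started before the field existed
}

// taskToken is a hypothetical token layout; the real token is a protobuf message.
type taskToken struct {
	ScheduledEventID int64        `json:"scheduled_event_id"`
	Attempt          int32        `json:"attempt"`
	Clock            *vectorClock `json:"clock"`
}

var errNoStartedClock = errors.New("activity has no stored started clock")

// buildTaskToken encodes the token deterministically from its inputs.
func buildTaskToken(scheduledEventID int64, attempt int32, clock *vectorClock) ([]byte, error) {
	return json.Marshal(taskToken{ScheduledEventID: scheduledEventID, Attempt: attempt, Clock: clock})
}

// tokenForCancel rebuilds the token from persisted state only. Without a
// stored clock the original token cannot be reproduced, so it reports an error.
func tokenForCancel(ai *activityInfo) ([]byte, error) {
	if ai.StartedClock == nil {
		return nil, errNoStartedClock
	}
	return buildTaskToken(ai.ScheduledEventID, ai.Attempt, ai.StartedClock)
}

func main() {
	// The clock is created once and used for the token sent to the worker.
	clock := &vectorClock{ShardID: 1, Clock: 42}
	sent, err := buildTaskToken(5, 1, clock)
	if err != nil {
		panic(err)
	}

	// The same clock is persisted with the activity and reused at cancel time.
	ai := &activityInfo{ScheduledEventID: 5, Attempt: 1, StartedClock: clock}
	rebuilt, err := tokenForCancel(ai)
	if err != nil {
		panic(err)
	}
	fmt.Println("tokens match:", string(sent) == string(rebuilt))
}
```

If the cancel handler generated a fresh clock instead, the rebuilt token would differ from the one the worker holds, which is the mismatch the functional test guards against.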
Worker commands are best-effort (activity will eventually time out anyway), so retrying up to 70 times (the global DLQ default) wastes resources. Expose the in-memory attempt counter from Executable and check it in the dispatcher to drop tasks after 3 attempts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
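A minimal sketch of the retry cap described above, assuming a 1-based attempt counter so that attempts 1 through 3 are sent and a fourth is dropped. The `attemptCounter` interface, `fixedAttempt` type, and `dispatch` function are hypothetical illustrations, not the server's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// maxWorkerCommandAttempts mirrors the 3-attempt cap from the commit message.
const maxWorkerCommandAttempts = 3

// attemptCounter is a hypothetical view of the in-memory attempt counter
// exposed by the executable. The counter is assumed to start at 1.
type attemptCounter interface {
	Attempt() int
}

// fixedAttempt is a test double that reports a fixed attempt number.
type fixedAttempt int

func (a fixedAttempt) Attempt() int { return int(a) }

var errTransient = errors.New("transient dispatch failure")

// dispatch sends the command unless the cap has been exceeded. Returning a
// nil error for a dropped task tells the queue not to retry it again.
func dispatch(task attemptCounter, send func() error) (dropped bool, err error) {
	if task.Attempt() > maxWorkerCommandAttempts {
		return true, nil
	}
	return false, send()
}

func main() {
	failing := func() error { return errTransient }
	for attempt := 1; attempt <= 4; attempt++ {
		dropped, err := dispatch(fixedAttempt(attempt), failing)
		fmt.Printf("attempt=%d dropped=%t err=%v\n", attempt, dropped, err)
	}
}
```

Dropping is acceptable here only because the command is best-effort: the activity still times out on its own if the cancel never reaches the worker.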
Records worker_commands_sent{outcome=max_attempts_exceeded} so dropped
tasks are observable in dashboards and alerts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yycptt (Member) reviewed on Apr 3, 2026 and left a comment:
I didn't review the actual Nexus request dispatch part closely. The Nexus crew can do a better job than me, I guess :)
proto/internal/temporal/server/api/persistence/v1/executions.proto
rkannan82 added a commit to temporalio/api that referenced this pull request on Apr 7, 2026
## Summary

Defines a Nexus service for server-to-worker communication, starting with activity cancellation support.

## Design Decision

We chose a **generic command API** (`ExecuteCommandsRequest` with `oneof` command types) instead of a cancel-specific API. This allows a future optimization to batch multiple commands (cancel, pause, etc) in a single request and deliver to a worker in one RPC.

## Files

- `temporal/api/nexusservices/workerservice/v1/request_response.proto` - request response definitions
- `nexus-rpc/temporal-proto-models-nexusrpc.yaml` - Nexus service definition

## Related

- [Server PR](temporalio/temporal#9233)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
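The design decision can be sketched in Go as a small sum type: one request carries a list of commands, and the receiver switches on the concrete command type. This is a hand-written analogue of a proto `oneof`, not the generated API. `workerCommand`, `cancelActivity`, `pauseActivity`, `executeCommandsRequest`, and `describe` are hypothetical names, and the pause command is only an example of a possible future type.

```go
package main

import "fmt"

// workerCommand is a hypothetical Go analogue of a proto oneof: only types
// that implement the unexported marker method can be used as commands.
type workerCommand interface {
	isWorkerCommand()
}

// cancelActivity is a hypothetical cancel command identified by a task token.
type cancelActivity struct{ TaskToken []byte }

func (cancelActivity) isWorkerCommand() {}

// pauseActivity is a hypothetical future command type.
type pauseActivity struct{ TaskToken []byte }

func (pauseActivity) isWorkerCommand() {}

// executeCommandsRequest batches any mix of commands for one worker.
type executeCommandsRequest struct {
	Commands []workerCommand
}

// describe shows how a receiver would branch on each command type.
func describe(req executeCommandsRequest) []string {
	out := make([]string, 0, len(req.Commands))
	for _, c := range req.Commands {
		switch cmd := c.(type) {
		case cancelActivity:
			out = append(out, fmt.Sprintf("cancel %s", cmd.TaskToken))
		case pauseActivity:
			out = append(out, fmt.Sprintf("pause %s", cmd.TaskToken))
		default:
			out = append(out, "unknown")
		}
	}
	return out
}

func main() {
	req := executeCommandsRequest{Commands: []workerCommand{
		cancelActivity{TaskToken: []byte("token-a")},
		pauseActivity{TaskToken: []byte("token-b")},
	}}
	for _, line := range describe(req) {
		fmt.Println(line)
	}
}
```

Adding a new command type then means adding one struct and one `case`, while the request shape and the RPC stay unchanged.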
temporal-cicd bot pushed a commit to temporalio/api-go that referenced this pull request on Apr 7, 2026
spkane31 pushed a commit to temporalio/api that referenced this pull request on Apr 9, 2026
- Add StartedClock nil check in cancel handler (backward compat for activities started before deploy)
- Add WorkerCommandsTask case to standby executor (drop task)
- Use shardCtx.GetConfig() instead of passing config param
- Add lock around Executable.Attempt() for thread safety
- Add replication comment on started_clock proto field
- Add tests for cancel command with/without StartedClock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
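The thread-safety item can be illustrated with a minimal sketch of a mutex-guarded attempt counter. The `executable` type, `recordFailure`, and `bumpConcurrently` are hypothetical and only show the locking pattern; the server's actual `Executable` is more involved.

```go
package main

import (
	"fmt"
	"sync"
)

// executable is a hypothetical, trimmed-down task wrapper whose attempt
// counter may be read by a dispatcher while the queue updates it.
type executable struct {
	mu      sync.Mutex
	attempt int
}

// newExecutable starts the counter at 1 (an assumption for this sketch).
func newExecutable() *executable { return &executable{attempt: 1} }

// Attempt reads the counter under the lock so readers never race with writers.
func (e *executable) Attempt() int {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.attempt
}

// recordFailure increments the counter under the same lock.
func (e *executable) recordFailure() {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.attempt++
}

// bumpConcurrently records n failures from n goroutines, each of which also
// reads the counter, and waits for all of them to finish.
func bumpConcurrently(e *executable, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			e.recordFailure()
			_ = e.Attempt()
		}()
	}
	wg.Wait()
}

func main() {
	e := newExecutable()
	bumpConcurrently(e, 100)
	fmt.Println("attempt after 100 failures:", e.Attempt())
}
```

Running a sketch like this with `go run -race` is one way to check that reads and writes of the counter do not race.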
rkannan82 added a commit that referenced this pull request on Apr 11, 2026
…worker (#9231)

## What changed?

As part of RecordActivityTaskStarted flow, store worker_control_task_queue for an activity in the mutable state (ActivityInfo).

Main changes:
- executions.proto: Added the new worker_control_task_queue field.
- mutable_state_impl.go: Update mutable state.
- matching/forwarder.go: Propagate worker_control_task_queue when polls get forwarded. Otherwise, RecordActivityTaskStarted request will not have it set when invoked from a forwarded poll.

## Why?

To support activity cancellation without activity heartbeat. Overall flow:
- [This PR] Store worker attributes in ActivityInfo as part of RecordActivityTaskStarted call.
- [#9232] When user cancels a workflow, create 1 or more tasks. Group all activities belonging to a worker into the task (for efficiency).
- [#9233] Lookup the Nexus task queue for each worker, and send a Nexus operation for each transfer task.
- [SDK] Worker will receive this cancel task and cancel the running activities.

## How did you test it?

- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The replace directive pointed to a pre-release commit. The released v1.62.8 includes all needed protos (WorkerCommand, etc). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The else-if branch already implies StartedEventId != EmptyEventID since the prior if-branch handles the == case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rkannan82 added a commit that referenced this pull request on Apr 11, 2026
…Nexus (#9232)

## What changed?

New outbound task type (`WorkerCommandsTask`) that carries worker commands to be dispatched to workers via Nexus. Uses the generic `WorkerCommand` proto (not cancel-activity-specific), so this task type can carry any future command types.

Suggested review order: proto changes → `worker_commands_task.go` → `task_generator.go` → `workflow_task_completed_handler.go`

Key pieces:
- **Proto**: `TASK_TYPE_WORKER_COMMANDS` enum, `WorkerCommandsTask` in `OutboundTaskInfo` with `repeated WorkerCommand`.
- **Task definition**: `worker_commands_task.go`: implements outbound `Task` and `HasDestination` interfaces.
- **Task creation** (`workflow_task_completed_handler.go`, `task_generator.go`): When `RequestCancelActivityTask` is processed for a started activity whose worker has a control queue, collects a `CancelActivityCommand` with the activity's task token. Commands are batched by destination control queue and flushed as one `WorkerCommandsTask` per queue at the end of WFT processing.
- **Serialization**: `task_serializers.go` for persistence round-tripping.

Dispatch is a no-op here; it is handled in #9233.

Gated by dynamic config `EnableCancelActivityWorkerCommand` (default: off).

## Why?

To support proactive activity cancellation without waiting for heartbeat. This is the task creation leg of the flow.

1. [#9231] Store `worker_control_task_queue` in `ActivityInfo` at activity start.
2. **[This PR]** On `RequestCancelActivityTask`, batch commands by control queue into `WorkerCommandsTask` outbound tasks.
3. [#9233] Dispatch each task as a Nexus `ExecuteCommands` operation to the worker, with a 3-attempt retry cap.
4. [SDK] Worker receives the cancel command and cancels the running activity.

Gated by dynamic config `EnableCancelActivityWorkerCommand` (default: off).

## How did you test it?

**Unit tests** cover task generation, command batching (including multi-queue batching), task serialization round-tripping, and the feature-flag-off path.
--------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
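The batching step described in this commit message can be sketched as grouping commands by destination control queue and flushing one task per queue. The types below (`cancelActivityCommand`, `workerCommandsTask`, `commandBatcher`) are hypothetical simplifications that borrow the names from the description. The sorted flush order is an assumption made here so the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// cancelActivityCommand is a hypothetical command carrying an activity's task token.
type cancelActivityCommand struct{ TaskToken string }

// workerCommandsTask is a hypothetical outbound task addressed to one control queue.
type workerCommandsTask struct {
	Destination string
	Commands    []cancelActivityCommand
}

// commandBatcher collects commands while a workflow task is processed.
type commandBatcher struct {
	byQueue map[string][]cancelActivityCommand
}

func newCommandBatcher() *commandBatcher {
	return &commandBatcher{byQueue: make(map[string][]cancelActivityCommand)}
}

// add records a cancel command. Activities whose worker has no control queue
// are skipped, since there is nowhere to dispatch the command.
func (b *commandBatcher) add(controlQueue, taskToken string) {
	if controlQueue == "" {
		return
	}
	b.byQueue[controlQueue] = append(b.byQueue[controlQueue], cancelActivityCommand{TaskToken: taskToken})
}

// flush emits one task per control queue and resets the batcher.
func (b *commandBatcher) flush() []workerCommandsTask {
	queues := make([]string, 0, len(b.byQueue))
	for q := range b.byQueue {
		queues = append(queues, q)
	}
	sort.Strings(queues) // deterministic order for this sketch

	tasks := make([]workerCommandsTask, 0, len(queues))
	for _, q := range queues {
		tasks = append(tasks, workerCommandsTask{Destination: q, Commands: b.byQueue[q]})
	}
	b.byQueue = make(map[string][]cancelActivityCommand)
	return tasks
}

func main() {
	b := newCommandBatcher()
	b.add("worker-1-control", "token-a")
	b.add("worker-2-control", "token-b")
	b.add("worker-1-control", "token-c")
	for _, t := range b.flush() {
		fmt.Printf("%s: %d command(s)\n", t.Destination, len(t.Commands))
	}
}
```

Grouping this way keeps the number of outbound tasks proportional to the number of workers involved rather than the number of cancelled activities.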
- Restore tab alignment for TaskDiscarded in metric_defs.go
- Remove extra blank line in handler test
- Replace assert.Equal/Contains/Empty with require equivalents in dispatcher, handler, and dispatch response tests
- Remove unused assert imports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Nexus service descriptor (WorkerService.ServiceName, WorkerService.ExecuteCommands.Name()) is not yet published in go.temporal.io/api v1.62.8. Use hardcoded constants until it is. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## What

Dispatches worker commands (starting with activity cancellation) to workers via their Nexus control queue. When the outbound queue processes a `WorkerCommandsTask`, the dispatcher sends an `ExecuteCommands` Nexus operation to the worker's control queue via `DispatchNexusTask`. Retries are capped at 3 attempts since these commands are best-effort (the activity will eventually time out anyway).

Suggested review order: `worker_commands_task_dispatcher.go` → `recordactivitytaskstarted/api.go` (clock storage for task token reconstruction).

## Why

To support activity cancellation without activity heartbeat. This is the dispatch leg of the flow:

1. Store `worker_control_task_queue` in `ActivityInfo` at activity start.
2. On `RequestCancelActivityTask`, batch commands by control queue into `WorkerCommandsTask` outbound tasks.
3. Dispatch each task as a Nexus `ExecuteCommands` operation to the worker, with a 3-attempt retry cap.

Gated by dynamic config `EnableCancelActivityWorkerCommand` (default: off).

## How did you test it?

Unit tests cover all dispatch outcomes (success, RPC error, timeout, operation error, feature-flag-off, max-attempts-exceeded) and response-to-error conversion paths. A functional test verifies the flow end-to-end: cancel request → Nexus dispatch → correct payload arrives on the control queue. It also asserts that the cancel command's task token matches the one from the original activity poll response.
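The six dispatch outcomes listed in the test section could be mapped to a single label, for example for the `worker_commands_sent{outcome=...}` metric mentioned in an earlier commit. The sketch below is an assumption about how such a classifier might look. Only the `max_attempts_exceeded` label appears in the source; the other label strings, the `dispatchInput` and `operationError` types, and the 1-based attempt counter are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// operationError is a hypothetical error type for a failure reported by the
// worker's handler, as opposed to a transport-level failure.
type operationError struct{ msg string }

func (e *operationError) Error() string { return e.msg }

// Sentinel errors used by this sketch.
var (
	errTimeout = context.DeadlineExceeded
	errRPC     = errors.New("rpc failed")
)

// dispatchInput is a hypothetical summary of one dispatch attempt.
type dispatchInput struct {
	FeatureEnabled bool
	Attempt        int   // assumed to be 1-based
	Err            error // result of the dispatch call, nil on success
}

// classify returns an outcome label. Only "max_attempts_exceeded" is taken
// from the source; the remaining labels are illustrative.
func classify(in dispatchInput) string {
	switch {
	case !in.FeatureEnabled:
		return "feature_disabled"
	case in.Attempt > 3:
		return "max_attempts_exceeded"
	case in.Err == nil:
		return "success"
	case errors.Is(in.Err, context.DeadlineExceeded):
		return "timeout"
	}
	var opErr *operationError
	if errors.As(in.Err, &opErr) {
		return "operation_error"
	}
	return "rpc_error"
}

func main() {
	inputs := []dispatchInput{
		{FeatureEnabled: true, Attempt: 1},
		{FeatureEnabled: true, Attempt: 2, Err: errRPC},
		{FeatureEnabled: true, Attempt: 1, Err: errTimeout},
		{FeatureEnabled: true, Attempt: 1, Err: &operationError{msg: "handler failed"}},
		{FeatureEnabled: false, Attempt: 1},
		{FeatureEnabled: true, Attempt: 4},
	}
	for _, in := range inputs {
		fmt.Println(classify(in))
	}
}
```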