fix(workflow-executor): add per-invocation AI timeout to surface hanging provider errors [PRD-409] #1609
Open
matthv wants to merge 1 commit into
…ing provider errors [PRD-409]

When the AI provider hangs (no response, internal retries, or holds the connection open), the previous code relied on the global STEP_TIMEOUT_MS (default 5 min) to fail the step. From the user's perspective this looks like an infinite spinner.

Add a dedicated timeout on each AI invocation (default 60s, configurable via AI_INVOKE_TIMEOUT_MS) using AbortController + signal so the underlying HTTP request is actually cancelled.

On timeout, throws the new AiInvokeTimeoutError, which BaseStepExecutor.execute() converts to an error outcome with a user-friendly message. The orchestrator then sets context.error on the step and the frontend exits its isLoading state immediately.

fixes PRD-409

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage Impact: This PR will not change total coverage. Modified Files with Diff Coverage (3)

Summary
When the AI provider hangs (no response, internal retries, or holds the connection open), the previous code relied on the global `STEP_TIMEOUT_MS` (default 5 min) to fail the step. From the user's perspective this looks like an infinite spinner.

This PR adds a dedicated timeout on each AI invocation (default 60s, configurable via `AI_INVOKE_TIMEOUT_MS`) using `AbortController` + `signal` so the underlying HTTP request is actually cancelled.

On timeout, the executor throws the new `AiInvokeTimeoutError`, which `BaseStepExecutor.execute()` converts to an error outcome with a user-friendly message. The orchestrator then sets `context.error` on the step and the frontend exits its `isLoading` state immediately.

Why not just lower STEP_TIMEOUT_MS globally

`STEP_TIMEOUT_MS` covers more than the AI call (it also covers slow agent fetches, DB lookups, etc.). Lowering it globally would kill legitimately slow non-AI work. A dedicated AI timeout is more surgical.

Changes

- `defaults.ts`: new `DEFAULT_AI_INVOKE_TIMEOUT_MS = 60_000`
- `errors.ts`: new `AiInvokeTimeoutError extends WorkflowExecutorError` with provider-specific user message
- `base-step-executor.ts`: `invokeWithTools` now wraps `model.invoke` with `AbortController` + timeout
- `RunnerConfig` → `StepContextConfig` → `ExecutionContext`
- `cli-core.ts`: parse `AI_INVOKE_TIMEOUT_MS` env var

fixes PRD-409
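For reviewers who want the shape of the change without opening the diff, here is a minimal sketch of wrapping an invocation with `AbortController` plus a timer. It is not the code in `base-step-executor.ts`: the helper name `invokeWithTimeout`, the `Invoke` type, and the error message are assumptions made for illustration, and the real `AiInvokeTimeoutError` extends `WorkflowExecutorError` rather than `Error`.

```typescript
// Illustrative stand-in: the PR's class extends WorkflowExecutorError instead.
class AiInvokeTimeoutError extends Error {
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`AI invocation timed out after ${timeoutMs}ms`);
    this.name = "AiInvokeTimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

// Hypothetical shape of a provider call that accepts an optional abort signal.
type Invoke<T> = (signal?: AbortSignal) => Promise<T>;

async function invokeWithTimeout<T>(invoke: Invoke<T>, timeoutMs?: number): Promise<T> {
  // 0 or undefined means "no timeout": call through unchanged.
  if (!timeoutMs || timeoutMs <= 0) return invoke();
  const ms: number = timeoutMs;

  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timedOut = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      // Reject first so the race settles with the timeout error,
      // then abort so a signal-aware callee cancels its request.
      reject(new AiInvokeTimeoutError(ms));
      controller.abort();
    }, ms);
  });

  try {
    // The race also covers callees that ignore the signal entirely.
    return await Promise.race([invoke(controller.signal), timedOut]);
  } finally {
    clearTimeout(timer);
  }
}
```

Racing against a timer, rather than relying only on the signal, means the caller still gets an `AiInvokeTimeoutError` if the provider SDK never reacts to the abort. The abort itself is what lets the HTTP request be cancelled when the SDK does honor it.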
Test plan
With `SIMULATE_AI_HANG=1 AI_INVOKE_TIMEOUT_MS=10000`, the frontend shows the new user message after 10s instead of spinning for 5 min.

🤖 Generated with Claude Code
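The same scenario can be approximated in a unit test with a fake provider call that never answers. The sketch below is an assumption about how such a stub might look, not how `SIMULATE_AI_HANG` is implemented. `hangingInvoke` and `measureAbort` are hypothetical names.

```typescript
// Hypothetical stand-in for a provider call that never responds.
// It settles only when the caller aborts, like a cancellable HTTP request.
function hangingInvoke(signal?: AbortSignal): Promise<never> {
  return new Promise<never>((_, reject) => {
    const onAbort = () => reject(new Error("aborted"));
    if (signal?.aborted) {
      onAbort();
      return;
    }
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Aborts the hanging call after `timeoutMs` and reports the elapsed time in ms.
async function measureAbort(timeoutMs: number): Promise<number> {
  const controller = new AbortController();
  const started = Date.now();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    await hangingInvoke(controller.signal);
  } catch {
    // Expected: the stub rejects once the signal is aborted.
  } finally {
    clearTimeout(timer);
  }
  return Date.now() - started;
}
```

A test built on this stub can assert that the step fails shortly after the configured timeout rather than waiting for `STEP_TIMEOUT_MS`.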
Note
Add per-invocation AI timeout to surface hanging provider errors in workflow executor
- Adds `aiInvokeTimeoutMs` (default 60,000ms) to the workflow executor's `RunnerConfig`, `ExecutionContext`, and `ExecutorOptions`, configurable via the `AI_INVOKE_TIMEOUT_MS` environment variable.
- In `BaseStepExecutor.invokeWithTools`, wraps AI provider calls with an `AbortController` timer; if the provider hangs past the timeout, the invocation is aborted and throws `AiInvokeTimeoutError`.
- Adds `AiInvokeTimeoutError` with a user-facing retry message to distinguish timeout failures from other AI errors.
- Setting `aiInvokeTimeoutMs` to `0` or leaving it unset disables the timeout, preserving existing behavior.

Macroscope summarized 1718cb4.
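To make the configuration semantics concrete, here is one way the environment variable could be parsed. This is a sketch under assumptions, not the code in `cli-core.ts`. The helper name `parseAiInvokeTimeoutMs` is hypothetical, and so are two choices: an unset variable falls back to the 60,000ms default, and invalid or negative input also falls back rather than failing. Only the default value and the meaning of `0` come from the PR.

```typescript
// Default stated in the PR description.
const DEFAULT_AI_INVOKE_TIMEOUT_MS = 60_000;

// Hypothetical helper: returns the timeout in ms, where 0 means "disabled".
function parseAiInvokeTimeoutMs(raw: string | undefined): number {
  // Assumption: an unset or empty variable selects the default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_AI_INVOKE_TIMEOUT_MS;

  const parsed = Number(raw);
  // Assumption: invalid or negative input falls back to the default.
  if (!Number.isFinite(parsed) || parsed < 0) return DEFAULT_AI_INVOKE_TIMEOUT_MS;

  // An explicit "0" passes through and disables the timeout downstream.
  return Math.floor(parsed);
}
```

In a Node.js CLI this would typically be called as `parseAiInvokeTimeoutMs(process.env.AI_INVOKE_TIMEOUT_MS)`, with the result passed along as `aiInvokeTimeoutMs`.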