fix(workflow-executor): add per-invocation AI timeout to surface hanging provider errors [PRD-409] by matthv · Pull Request #1609 · ForestAdmin/agent-nodejs

matthv · 2026-05-28T14:52:22Z

Summary

When the AI provider hangs (no response, internal retries, or holds the connection open), the previous code relied on the global STEP_TIMEOUT_MS (default 5 min) to fail the step. From the user's perspective this looks like an infinite spinner.

This PR adds a dedicated timeout to each AI invocation (default 60s, configurable via AI_INVOKE_TIMEOUT_MS), using AbortController + signal so that the underlying HTTP request is actually cancelled.
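A minimal sketch of such a wrapper. It assumes the provider call accepts an AbortSignal (for example, an `invoke(input, { signal })` style call) and rejects once the signal fires. The helper name `withAbortTimeout` is hypothetical, and the error class is a simplified stand-in for AiInvokeTimeoutError. The actual invokeWithTools code is not reproduced here.

```typescript
// Simplified stand-in for the PR's AiInvokeTimeoutError (which extends WorkflowExecutorError).
class AiInvokeTimeoutError extends Error {
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`AI invocation timed out after ${timeoutMs}ms`);
    this.name = 'AiInvokeTimeoutError';
    this.timeoutMs = timeoutMs;
  }
}

/**
 * Hypothetical helper: runs `invoke` with an AbortSignal that fires after `timeoutMs`.
 * An undefined or <= 0 timeout disables the guard, and no signal is passed.
 */
async function withAbortTimeout<T>(
  invoke: (signal?: AbortSignal) => Promise<T>,
  timeoutMs?: number,
): Promise<T> {
  if (timeoutMs === undefined || timeoutMs <= 0) return invoke();

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    return await invoke(controller.signal);
  } catch (error) {
    // Only failures caused by our own abort become a timeout error; others are rethrown as-is.
    if (controller.signal.aborted) throw new AiInvokeTimeoutError(timeoutMs);
    throw error;
  } finally {
    // Clear the timer on success and on failure so it cannot fire later or keep the process alive.
    clearTimeout(timer);
  }
}
```

The abort only helps if the callee honours the signal. A client that ignores it leaves the promise pending, so a `Promise.race` against the timer would be needed as a fallback in that case.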

On timeout, the executor throws the new AiInvokeTimeoutError, which BaseStepExecutor.execute() converts into an error outcome with a user-friendly message. The orchestrator then sets context.error on the step, and the frontend immediately exits its isLoading state.
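A sketch of how the timeout could be turned into an error outcome. The class names WorkflowExecutorError and AiInvokeTimeoutError come from this PR. The `userMessage` field, the message wording, the `execute` helper and the `StepOutcome` shape are assumptions for illustration. For brevity, other errors are rethrown here rather than mapped.

```typescript
// Assumed shape: the PR names WorkflowExecutorError but its fields are not shown.
class WorkflowExecutorError extends Error {
  readonly userMessage: string;
  constructor(message: string, userMessage: string) {
    super(message);
    this.name = 'WorkflowExecutorError';
    this.userMessage = userMessage;
  }
}

// Provider-specific user message, as described in the Changes list (wording is hypothetical).
class AiInvokeTimeoutError extends WorkflowExecutorError {
  constructor(provider: string, timeoutMs: number) {
    super(
      `AI provider "${provider}" did not respond within ${timeoutMs}ms`,
      `The AI provider (${provider}) took too long to respond. Please try again.`,
    );
    this.name = 'AiInvokeTimeoutError';
  }
}

// Hypothetical outcome type; the real executor's outcome shape may differ.
type StepOutcome =
  | { status: 'success'; result: unknown }
  | { status: 'error'; error: string };

// Hypothetical execute(): known executor errors become an error outcome with a safe message.
async function execute(run: () => Promise<unknown>): Promise<StepOutcome> {
  try {
    return { status: 'success', result: await run() };
  } catch (error) {
    if (error instanceof WorkflowExecutorError) {
      return { status: 'error', error: error.userMessage };
    }
    throw error;
  }
}
```

Keeping the user-facing text on the error itself lets execute() stay generic: any WorkflowExecutorError subclass can define what the frontend shows.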

Why not just lower STEP_TIMEOUT_MS globally

STEP_TIMEOUT_MS covers more than the AI call (it also covers slow agent fetches, DB lookups, etc.). Lowering it globally would kill legitimately slow non-AI work. A dedicated AI timeout is more surgical.

Changes

defaults.ts: new DEFAULT_AI_INVOKE_TIMEOUT_MS = 60_000
errors.ts: new AiInvokeTimeoutError (extends WorkflowExecutorError) with a provider-specific user message
base-step-executor.ts: invokeWithTools now wraps model.invoke with AbortController + timeout
Config plumbing through RunnerConfig → StepContextConfig → ExecutionContext
cli-core.ts: parse AI_INVOKE_TIMEOUT_MS env var
6 new unit tests covering: the timeout firing, the signal being passed, the timeout being disabled when unset or <= 0, non-abort errors being rethrown as-is, and the timer being cleared on success
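A possible shape for the env parsing, default and plumbing, assuming AI_INVOKE_TIMEOUT_MS is read as a number of milliseconds. `parseAiInvokeTimeoutMs`, `buildExecutionContext` and the trimmed config interfaces are hypothetical, and StepContextConfig is skipped for brevity. The actual cli-core.ts parsing may behave differently.

```typescript
// Mirrors the DEFAULT_AI_INVOKE_TIMEOUT_MS constant added in defaults.ts.
const DEFAULT_AI_INVOKE_TIMEOUT_MS = 60_000;

// Trimmed-down, hypothetical config shapes: only the new field is shown.
interface RunnerConfig {
  aiInvokeTimeoutMs?: number;
}
interface ExecutionContext {
  aiInvokeTimeoutMs?: number;
}

// Hypothetical parser for the AI_INVOKE_TIMEOUT_MS environment variable.
function parseAiInvokeTimeoutMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === '') return DEFAULT_AI_INVOKE_TIMEOUT_MS;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`AI_INVOKE_TIMEOUT_MS must be a number of milliseconds, got "${raw}"`);
  }
  // 0 or a negative value is kept as-is; per this PR's tests, <= 0 disables the timeout downstream.
  return value;
}

// Hypothetical plumbing step: the runner config feeds the per-step execution context.
function buildExecutionContext(config: RunnerConfig): ExecutionContext {
  return { aiInvokeTimeoutMs: config.aiInvokeTimeoutMs ?? DEFAULT_AI_INVOKE_TIMEOUT_MS };
}
```

In a CLI entry point this would be called as `parseAiInvokeTimeoutMs(process.env.AI_INVOKE_TIMEOUT_MS)`. Throwing on a malformed value such as `10s` surfaces configuration typos at startup instead of silently falling back to the 60s default.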

fixes PRD-409

Test plan

811 unit tests pass (6 new)
Lint: 0 errors (6 pre-existing, unrelated warnings)
Live test: with SIMULATE_AI_HANG=1 AI_INVOKE_TIMEOUT_MS=10000, the frontend shows the new user message after 10s instead of spinning for 5 min
Reviewer to confirm 60s default is appropriate (vs e.g. 30s or 120s)
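The live test only works if the simulated hang honours the abort signal. A hypothetical stub like the one below never answers and settles only on abort, which makes the cut-off observable. SIMULATE_AI_HANG's real implementation is not part of this description, and the timeout is scaled down to milliseconds here.

```typescript
// Hypothetical stand-in for the provider call behind SIMULATE_AI_HANG=1:
// it never produces a response and settles only when the caller aborts.
function simulateHangingInvoke(signal: AbortSignal): Promise<never> {
  return new Promise<never>((_resolve, reject) => {
    const onAbort = () => reject(new Error('simulated AI call aborted'));
    if (signal.aborted) onAbort();
    else signal.addEventListener('abort', onAbort, { once: true });
  });
}

// Aborts the simulated call after timeoutMs and reports how long it stayed stuck.
async function measureHang(timeoutMs: number): Promise<{ elapsedMs: number; error: unknown }> {
  const started = Date.now();
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const error = await simulateHangingInvoke(controller.signal).catch((e: unknown) => e);
  clearTimeout(timer);
  return { elapsedMs: Date.now() - started, error };
}
```

A stub that simply never resolved, ignoring the signal, would reproduce the original infinite spinner even with the new timeout in place, so it would not exercise the fix.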

🤖 Generated with Claude Code

Note

Add per-invocation AI timeout to surface hanging provider errors in workflow executor

Adds aiInvokeTimeoutMs (default 60,000ms) to the workflow executor's RunnerConfig, ExecutionContext, and ExecutorOptions, configurable via the AI_INVOKE_TIMEOUT_MS environment variable.
In BaseStepExecutor.invokeWithTools, wraps AI provider calls with an AbortController timer; if the provider hangs past the timeout, the invocation is aborted and throws AiInvokeTimeoutError.
Introduces AiInvokeTimeoutError with a user-facing retry message to distinguish timeout failures from other AI errors.
Setting aiInvokeTimeoutMs to 0 or leaving it unset disables the timeout, preserving existing behavior.
Risk: AI invocations that previously hung indefinitely will now fail after 60s by default, which may surface as new errors in workflows that relied on slow providers.

Macroscope summarized 1718cb4.

…ing provider errors [PRD-409]
fixes PRD-409
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

linear · 2026-05-28T14:52:27Z

PRD-409

qltysh · 2026-05-28T14:53:38Z

1 new issue

Tool	Category	Rule	Count
qlty	Structure	Function with high complexity (count = 13): invokeWithTools	1

qltysh · 2026-05-28T14:58:23Z

Coverage Impact

This PR will not change total coverage.

Modified Files with Diff Coverage (3)

Rating	File	% Diff	Uncovered Line #s
	packages/workflow-executor/src/executors/base-step-executor.ts	100.0%
	packages/workflow-executor/src/errors.ts	100.0%
	packages/workflow-executor/src/defaults.ts	100.0%
	Total	100.0%

matthv wants to merge 1 commit into feat/prd-214-server-step-mapper from fix/prd-409-ai-invoke-timeout

matthv commented May 28, 2026 • edited by macroscopeapp Bot

Loading

Uh oh!

linear Bot commented May 28, 2026

Uh oh!

qltysh Bot commented May 28, 2026

Uh oh!

qltysh Bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

matthv commented May 28, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why not just lower STEP_TIMEOUT_MS globally

Changes

Test plan

Add per-invocation AI timeout to surface hanging provider errors in workflow executor

Uh oh!

linear Bot commented May 28, 2026

Uh oh!

qltysh Bot commented May 28, 2026

1 new issue

Uh oh!

qltysh Bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

matthv commented May 28, 2026 •

edited by macroscopeapp Bot

Loading