
fix(core): add a chunk timeout when processing llm stream #16366

Open
jlongster wants to merge 2 commits into dev from jlongster/chunk-timeout

Conversation

@jlongster
Contributor

@jlongster jlongster commented Mar 6, 2026

It's possible for an SSE stream to stall. We handle some of those cases, but when reading the SSE stream from the provider we are still prone to stalls when the OS freezes network requests (because the machine goes to sleep, or for other reasons) and the provider closes the request in the meantime. When the OS unfreezes the network request, the SSE stream hangs because it never receives any notification that the other end has closed it.

(The above may not be 100% accurate; I'm not deeply familiar with the specifics of TCP when it comes to SSE. The client does sometimes seem to become aware that the other end has closed the connection, but I am able to reliably reproduce a stalled SSE connection by forcing the OS to freeze network requests, waiting a while, and unfreezing them.)

We need to time out if we don't receive a chunk from the SSE stream after a period of time. It's currently set to 2 minutes, which is quite high; this should handle the case of a stalled connection with no impact on normal slow SSE streams. Note that this does NOT time out the initial fetch that opens the SSE stream; that may take a while. But once streaming starts, 2 minutes should be a very generous upper bound (I scanned some other projects; some set it to 30 seconds).
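As a minimal sketch of the idea (the names here are illustrative, not the exact implementation in this PR): each read from the underlying stream races against a timer, so a slow-but-alive stream keeps resetting the clock while a truly stalled connection trips the error instead of hanging forever.

```typescript
// Sketch of a per-chunk timeout wrapper. `withChunkTimeout` is a
// hypothetical helper name, not an identifier from this PR.
async function* withChunkTimeout<T>(
  stream: AsyncIterable<T>,
  timeoutMs: number,
): AsyncGenerator<T> {
  const iterator = stream[Symbol.asyncIterator]();
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`no chunk received within ${timeoutMs}ms`)),
        timeoutMs,
      );
    });
    try {
      // Whichever settles first wins: the next chunk or the timeout.
      const result = await Promise.race([iterator.next(), timeout]);
      if (result.done) return;
      yield result.value;
    } finally {
      if (timer !== undefined) clearTimeout(timer); // reset per chunk
    }
  }
}
```

Because the timer is created fresh for every `next()` call, the budget applies per chunk rather than to the whole stream, which is why it doesn't penalize long but steadily-progressing responses.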

The AI SDK has this feature: https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text#timeout. However, they added it in v6 and we are on v5; in any case, it's already a pattern for us to handle timeouts ourselves so we can control them.

I originally added this in the processor loop, but quickly realized we can't do it there; it needs to happen at a much lower level. The processor loop may wait a while if tools are running, especially for things like subagents. Instead, we need to apply the timeout directly to every call to /messages, which works because each of those is a direct start, stream, and finish of a turn. This PR does that.

It adds a new chunkTimeout option to the providers config, so users can customize this.
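For illustration only: the `chunkTimeout` key name comes from this PR, but the surrounding nesting below is an assumption, not the actual schema — a per-provider override might look roughly like:

```jsonc
// Hypothetical config shape; only `chunkTimeout` is from the PR.
{
  "provider": {
    "anthropic": {
      "options": {
        "chunkTimeout": 30000 // ms; the PR's default is 2 minutes
      }
    }
  }
}
```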

Testing

I manually tested this by running complex conversations that invoked long-running tools and subagents. Applying the timeout at this low level works because each fetch to the provider endpoint starts a stream that ends when the chunks are fully streamed. I added extensive logging and everything looked correct. I also tested overriding this config per model.

@jlongster
Contributor Author

jlongster commented Mar 6, 2026

Hm, this is not the right solution. I'll open a new PR.

We need to do this at a much lower level; the events coming from this stream may be very slow due to tool calls, etc.

Never mind, just force-pushing over this PR.

@jlongster jlongster force-pushed the jlongster/chunk-timeout branch from a306a33 to 5df0c6e Compare March 6, 2026 19:07
@jlongster jlongster requested a review from rekram1-node March 6, 2026 19:12
