
fix(server): emit keep-alive newlines during /session/:id/message#25959

Open
nez wants to merge 1 commit into anomalyco:dev from nez:fix/session-message-keepalive

Conversation


@nez nez commented May 6, 2026

Problem

The POST /session/:sessionID/message handler awaits SessionPrompt.prompt() to fully complete before writing any byte to the response stream:

return stream(c, async (stream) => {
  const sessionID = c.req.valid("param").sessionID
  const body = c.req.valid("json")
  const msg = await SessionPrompt.prompt({ ...body, sessionID })  // can take 60+ minutes
  stream.write(JSON.stringify(msg))                                // first write happens here
})

For long synchronous tool calls (a large kubectl get events -A, a multi-step gh pr create flow, anything that produces no streaming bus events between LLM calls), the wait can stretch past an hour, during which no application bytes flow on the response stream. The TCP connection stays open with no traffic.

HTTP clients with finite per-recv timeouts then raise ReadTimeout before the first byte arrives. We hit this in production with a downstream workflow runner using httpx; the failure stack lands in _receive_response_headers:

httpcore/_async/http11.py:177  in _receive_response_headers
ApplicationFailureInfo: ReadTimeout
duration: 3604s  (matches the client's read=3600 ceiling)

Failures were 100% reproducible on hourly cluster-health agent runs.

Fix

Emit a \n keep-alive every 30s on the response stream while SessionPrompt.prompt runs. The response stays valid application/json because JSON parsers ignore leading whitespace before a value (verified via JSON.parse("\n\n\n{...}") and Python json.loads(b"\n\n\n{...}")).
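As a quick sanity check of that claim (illustrative only, not part of the diff), newlines prepended to the body are invisible to JSON.parse:

```typescript
// Keep-alive newlines prepended to the body are skipped as leading
// whitespace by any spec-compliant JSON parser.
const body = "\n\n\n" + JSON.stringify({ id: "msg_1", role: "assistant" })
const parsed = JSON.parse(body)
console.log(parsed.id) // "msg_1"
```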

const keepalive = setInterval(() => {
  stream.write("\n").catch(() => {})
}, 30_000)

try {
  const msg = await SessionPrompt.prompt({ ...body, sessionID })
  clearInterval(keepalive)
  await stream.write(JSON.stringify(msg))
} finally {
  clearInterval(keepalive)
}

clearInterval runs before the JSON body write so no keep-alive newline can interleave between bytes of the JSON value. The finally block covers the throw path.
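The same pattern can be factored into a small helper — a sketch only; withKeepalive and the minimal writer signature here are hypothetical and not part of this diff:

```typescript
// Hypothetical helper: ticks a keep-alive write while `work` is pending.
// `write` models stream.write; keep-alive write errors are swallowed so a
// torn-down stream can't throw inside the interval callback.
async function withKeepalive<T>(
  write: (chunk: string) => Promise<void>,
  work: Promise<T>,
  intervalMs = 30_000,
): Promise<T> {
  const keepalive = setInterval(() => {
    write("\n").catch(() => {})
  }, intervalMs)
  try {
    return await work
  } finally {
    // Always stop ticking before the caller writes the JSON body, so no
    // newline can interleave between bytes of the JSON value.
    clearInterval(keepalive)
  }
}
```

The caller writes JSON.stringify(result) only after withKeepalive resolves, which gives the same ordering guarantee as the inline version above.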

Scope

  • Only POST /session/:sessionID/message is changed. The sibling /command and /shell endpoints use c.json(msg) (non-streaming), which is a different code path with different semantics — left alone here to keep the diff minimal. If they hit the same problem in practice they can be migrated in a follow-up.
  • The hey-api-generated TypeScript SDK consumes the response with response.json(), which calls JSON.parse(text) and tolerates leading whitespace. No SDK change needed.
  • application/json content-type header is preserved; clients expecting a strict JSON byte stream still get a parseable body.
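On the client side, the same tolerance holds through the fetch Response path the generated SDK relies on; a hypothetical check (Node 18+ global Response assumed):

```typescript
// A Response whose body starts with keep-alive newlines still parses via
// .json(), since that defers to JSON.parse's whitespace handling.
async function parseLikeSdk(): Promise<{ ok: boolean }> {
  const res = new Response('\n\n{"ok":true}', {
    headers: { "content-type": "application/json" },
  })
  return res.json()
}
```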

Test plan

  • bun turbo typecheck --filter=opencode passes
  • (reviewer) verify a long-running prompt no longer triggers ReadTimeout in clients with a 3600s recv ceiling
  • (reviewer) verify the existing JS SDK / TUI / desktop clients still parse the response correctly (expected: yes, they read the full body and JSON.parse it)

The POST /session/:sessionID/message handler awaits the entire
SessionPrompt.prompt() before writing any response bytes. For long
synchronous tool calls (large `kubectl get events -A`, multi-step
`gh pr create` flows, etc.) the wait can exceed many minutes — during
which no application bytes flow on the response stream and the TCP
connection is held open with no traffic.

HTTP clients with finite per-recv timeouts then raise ReadTimeout before
any byte arrives. We observed downstream workflow runners hitting this on
every long-running cluster-health agent run (60+ min) until they
disabled their timeout entirely; that loses the ability to detect
genuinely dead connections.

Emit a `\n` every 30s while the prompt runs. The response remains
application/json since JSON parsers ignore leading whitespace before
the value. clearInterval on completion (and in the finally block)
prevents stray writes from interleaving with the JSON body.

Tested: `bun turbo typecheck --filter=opencode` passes.

github-actions Bot commented May 6, 2026

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions github-actions Bot added the needs:compliance This means the issue will auto-close after 2 hours. label May 6, 2026

github-actions Bot commented May 6, 2026

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request May 6, 2026
- anomalyco/opencode#25959 keep-alive newlines on POST /session/:id/message [merge-after-nits]
- anomalyco/opencode#25855 wide-text paste-summary order fix via Intl.Segmenter [merge-after-nits]
- openai/codex#21290 extract codex-file-watcher crate from core [merge-after-nits]
- openai/codex#21272 add 'compact' SessionStartSource with FIFO queue [merge-after-nits]
