
Request: Propagate abort signal to HTTP fetch layer for instant stream cancellation #2375

@zemuro


What feature would you like to see?

Problem

The CLI's stop/cancel mechanism is cooperative-only. When a user presses the stop button (or the extension sends a wire cancel request), the _handle_cancel() method sets an asyncio.Event that is checked at the next await boundary inside kosong.generate().

If the model is in a long "thinking" phase (or stuck in analysis paralysis), the next chunk may not arrive for minutes. During this time:

  • The HTTP connection stays open
  • The LLM server keeps generating tokens
  • The user pays for unwanted output
  • The UI appears frozen
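To illustrate the stall, here is a minimal, self-contained simulation that assumes nothing about kosong internals. slow_stream stands in for a provider stream and consume_cooperatively for a loop that only checks a cancel flag between chunks; both names are hypothetical. Stop is "pressed" 50 ms in, during a 0.5 s gap between chunks.

```python
import asyncio
from typing import AsyncIterator


async def slow_stream(delays: list[float]) -> AsyncIterator[str]:
    """Simulated provider stream (hypothetical): chunk i arrives after delays[i] s."""
    for i, delay in enumerate(delays):
        await asyncio.sleep(delay)
        yield f"chunk-{i}"


async def consume_cooperatively(
    stream: AsyncIterator[str], cancel_event: asyncio.Event
) -> list[str]:
    """Cooperative cancellation: the flag is only looked at when a chunk arrives."""
    parts: list[str] = []
    async for part in stream:
        if cancel_event.is_set():
            break
        parts.append(part)
    return parts


async def main() -> tuple[list[str], float]:
    cancel_event = asyncio.Event()
    # chunk-0 arrives quickly, then a 0.5 s "thinking" gap before chunk-1.
    stream = slow_stream([0.01, 0.5, 0.01])
    loop = asyncio.get_running_loop()
    loop.call_later(0.05, cancel_event.set)  # user presses Stop during the gap
    start = loop.time()
    parts = await consume_cooperatively(stream, cancel_event)
    return parts, loop.time() - start


parts, elapsed = asyncio.run(main())
print(parts, f"returned after {elapsed:.2f}s")
```

The loop keeps only chunk-0, but it returns about half a second after start, well after Stop was set, because nothing checks the flag until chunk-1 shows up. With a real thinking model that gap can be minutes.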

Root Cause

kosong.generate() consumes the entire HTTP response stream internally before returning. The StepResult returned by kosong.step() already contains the fully materialized message — the stream is exhausted. There is no live stream handle exposed that the caller can close to force an immediate TCP teardown.

This means asyncio.Task.cancel() is the only mechanism, and it only takes effect at the next await inside the async for part in stream: loop.

How other projects solve this

Antigravity (and browser-based UIs generally) propagate an AbortController / AbortSignal through the entire stack down to the HTTP fetch layer. When .abort() is called, the runtime tears down the in-flight request immediately, and any pending read on the response stream rejects with an AbortError instead of waiting for the next chunk.
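For comparison, the AbortController pattern can be mimicked in Python with a push-style signal. This is a minimal sketch under stated assumptions: AbortSignal, AbortError, and fetch_next_chunk are hypothetical names (not an existing Python, asyncio, or kosong API), and a 60 s asyncio.sleep stands in for a socket read blocked on a thinking model. The point is that abort() reaches the transport layer directly through a registered listener instead of waiting for a consumer loop to poll a flag.

```python
import asyncio
from typing import Callable


class AbortError(Exception):
    """Mirrors the browser's AbortError (hypothetical Python name)."""


class AbortSignal:
    """Hypothetical Python analogue of AbortController/AbortSignal.

    Each layer registers a listener; abort() pushes the cancellation down to
    all of them at once instead of waiting for anyone to poll a flag.
    """

    def __init__(self) -> None:
        self.aborted = False
        self._listeners: list[Callable[[], object]] = []

    def add_listener(self, callback: Callable[[], object]) -> None:
        if self.aborted:
            callback()  # already aborted: fire immediately
        else:
            self._listeners.append(callback)

    def abort(self) -> None:
        if self.aborted:
            return
        self.aborted = True
        for callback in self._listeners:
            callback()


async def fetch_next_chunk(signal: AbortSignal) -> str:
    """Stand-in for the transport layer: a read that would block for 60 s."""
    read = asyncio.create_task(asyncio.sleep(60, result="late chunk"))
    signal.add_listener(read.cancel)  # abort() cancels the pending read directly
    try:
        return await read
    except asyncio.CancelledError:
        if signal.aborted:
            raise AbortError("stream aborted") from None
        raise  # a genuine task cancellation, not an abort


async def main() -> tuple[str, float]:
    signal = AbortSignal()
    loop = asyncio.get_running_loop()
    loop.call_later(0.05, signal.abort)  # user presses Stop after 50 ms
    start = loop.time()
    try:
        outcome = await fetch_next_chunk(signal)
    except AbortError:
        outcome = "aborted"
    return outcome, loop.time() - start


outcome, elapsed = asyncio.run(main())
print(outcome, f"after {elapsed:.2f}s")
```

The call returns "aborted" roughly 50 ms in, even though the simulated read would otherwise block for a minute.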

Proposed Path Forward

Add abort-signal support to kosong.generate():

  1. Accept an optional abort_event: asyncio.Event parameter
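  2. Inside generate(), race each pending read of the stream against abort_event.wait(), instead of only checking abort_event.is_set() in the async for part in stream: loop body (a check in the loop body cannot run while the loop is blocked waiting for the next chunk)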
  3. If set, close the underlying HTTP response (response.aclose() / response.close() depending on provider)
  4. stream.__anext__() throws immediately, generate() exits within milliseconds
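The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not kosong's implementation: iterate_with_abort, StreamAborted, and the close callback are hypothetical names, and close stands in for the provider-specific closure listed below. Rather than polling abort_event.is_set() between chunks, each pending read is raced against abort_event.wait() with asyncio.wait(..., return_when=FIRST_COMPLETED), so the abort is noticed even while no chunk is arriving.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable


class StreamAborted(Exception):
    """Raised when abort_event fires mid-stream (hypothetical name)."""


_DONE = object()  # sentinel: the underlying stream is exhausted


async def _next_chunk(stream: AsyncIterator[Any]) -> Any:
    try:
        return await stream.__anext__()
    except StopAsyncIteration:
        return _DONE


async def iterate_with_abort(
    stream: AsyncIterator[Any],
    abort_event: asyncio.Event,
    close: Callable[[], Awaitable[None]],
) -> AsyncIterator[Any]:
    """Yield chunks from `stream`, racing every pending read against abort_event."""
    abort_wait = asyncio.create_task(abort_event.wait())
    pending_read = None
    try:
        while True:
            pending_read = asyncio.create_task(_next_chunk(stream))
            done, _ = await asyncio.wait(
                {pending_read, abort_wait}, return_when=asyncio.FIRST_COMPLETED
            )
            if abort_wait in done:
                raise StreamAborted("aborted by caller")
            item = pending_read.result()
            if item is _DONE:
                return
            yield item
    finally:
        abort_wait.cancel()
        if pending_read is not None and not pending_read.done():
            pending_read.cancel()  # interrupt the read that is still waiting
            await asyncio.wait({pending_read})  # let it settle before closing
        await close()  # release the HTTP response / stream


async def main() -> tuple[list[str], float]:
    abort_event = asyncio.Event()

    async def thinking_stream() -> AsyncIterator[str]:
        yield "chunk-0"
        await asyncio.sleep(60)  # model "thinking": no chunk for a long time
        yield "chunk-1"

    stream = thinking_stream()
    loop = asyncio.get_running_loop()
    loop.call_later(0.05, abort_event.set)  # user presses Stop after 50 ms
    start = loop.time()
    received: list[str] = []
    try:
        async for part in iterate_with_abort(stream, abort_event, stream.aclose):
            received.append(part)
    except StreamAborted:
        pass
    return received, loop.time() - start


received, elapsed = asyncio.run(main())
print(received, f"returned after {elapsed:.2f}s")
```

On abort, the sketch first cancels the pending read and waits for it to settle, then closes the stream. Closing a stream while another task is still suspended inside its __anext__() is not safe in general, so the order matters. With a real HTTP client, cancelling the read task is what interrupts the socket wait; close() then releases the response.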

This requires provider-specific stream closure:

  • OpenAI Responses: stream.close()
  • Anthropic: stream.close()
  • Google GenAI: generator aclose()
  • Kimi native: aclose() if available
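Since each provider exposes a slightly different closure method, generate() could route them through one best-effort helper. The sketch below assumes only what the list above states: that a stream object has an aclose() or a close() method, which may be sync or async. close_stream and the stub classes are hypothetical, not part of kosong or any SDK.

```python
import asyncio
import inspect
from typing import Any


async def close_stream(stream: Any) -> bool:
    """Best-effort, provider-agnostic stream closure (hypothetical helper).

    Tries aclose() first (async generators, httpx responses), then close()
    (SDK stream wrappers). The result is awaited only if it is awaitable, so
    both sync and async close() methods work. Returns False if neither exists.
    """
    for name in ("aclose", "close"):
        method = getattr(stream, name, None)
        if callable(method):
            result = method()
            if inspect.isawaitable(result):
                await result
            return True
    return False


class SyncCloseStream:  # stand-in for a wrapper with a synchronous close()
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class AsyncCloseStream:  # stand-in for a wrapper with an async close()
    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def demo_generator():
    yield "chunk-0"
    yield "chunk-1"


async def main() -> dict[str, Any]:
    sync_s, async_s, gen = SyncCloseStream(), AsyncCloseStream(), demo_generator()
    await gen.__anext__()  # generator is now suspended mid-stream
    results = [await close_stream(s) for s in (sync_s, async_s, gen, object())]
    try:
        await gen.__anext__()
        gen_closed = False
    except StopAsyncIteration:
        gen_closed = True  # a closed async generator yields nothing more
    return {
        "results": results,
        "sync": sync_s.closed,
        "async": async_s.closed,
        "generator": gen_closed,
    }


report = asyncio.run(main())
print(report)
```

Checking for an awaitable result keeps one code path for both styles of close(); a stream with no close method at all simply reports False so the caller can fall back to cancelling the read task.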

Why this matters

| Scenario | Current behavior | With abort signal |
|---|---|---|
| Model thinking for 60s | Stop button does nothing for 60s | Stop button works in <100ms |
| User immediately realizes the prompt is wrong | Pays for the full generation | Saves tokens |
| Extension UX | "Stop" feels broken | "Stop" is responsive |

Workarounds considered

  • Closing the HTTP client from outside generate(): this would destroy the connection pool and break future turns. Not viable.
  • Using asyncio.wait_for() with a short timeout: this doesn't solve the problem; it just moves the stall point.
  • Cooperative cancellation only: this is the current state, and it is insufficient for long-thinking models.

Environment

  • kimi-cli version: 1.44.0 (and all prior versions)
  • kosong version: bundled with kimi-cli
  • Affected modes: wire, acp, shell (all modes that use KimiSoul)

Would the maintainers be open to a PR that adds abort_event support to kosong.generate() and the chat provider implementations? I'm happy to implement this if there's agreement on the API shape.

Additional information

I'm working on a kimi-cli fork and trying to maintain upstream compatibility, so I wanted to raise this here rather than hack around it locally. I hit this constantly during a complex reverse-engineering session: the stop button would sit unresponsive for minutes while the model was deep in reasoning. Having used tools with proper fetch-layer abort (e.g., Antigravity's AbortController propagation), I can say the difference in UX is night and day. I would love to see this in Kimi Code proper.

Metadata

  • Assignees: No one assigned
  • Labels: enhancement (New feature or request)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
