What feature would you like to see?
Problem
The CLI's stop/cancel mechanism is cooperative-only. When a user presses the stop button (or the extension sends a wire cancel request), the _handle_cancel() method sets an asyncio.Event that is checked at the next await boundary inside kosong.generate().
If the LLM is in a long "thinking" block or analysis paralysis, the next chunk may not arrive for minutes. During this time:
- The HTTP connection stays open
- The LLM server keeps generating tokens
- The user pays for unwanted output
- The UI appears frozen
Root Cause
kosong.generate() consumes the entire HTTP response stream internally before returning. The StepResult returned by kosong.step() already contains the fully materialized message — the stream is exhausted. There is no live stream handle exposed that the caller can close to force an immediate TCP teardown.
This means asyncio.Task.cancel() is the only mechanism, and it only takes effect at the next await inside the async for part in stream: loop.
How other projects solve this
Antigravity (and browser-based UIs generally) use AbortController / AbortSignal propagated through the entire stack to the HTTP fetch layer. When .abort() is called, the runtime immediately closes the TCP connection. The pending read fails instantly (the equivalent of stream.__anext__() raising in Python), with no waiting for the next chunk.
Proposed Path Forward
Add abort-signal support to kosong.generate():
- Accept an optional abort_event: asyncio.Event parameter
- Check abort_event.is_set() inside the async for part in stream: loop
- If set, close the underlying HTTP response (response.aclose() / response.close() depending on provider)
- stream.__anext__() throws immediately, generate() exits within milliseconds
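One possible shape for this, as a sketch rather than kosong's actual code: a plain `abort_event.is_set()` check in the loop body would itself only run when a chunk arrives, so the version below races each `__anext__()` call against `abort_event.wait()` instead. `generate_abortable`, `GenerationAborted`, and `_next_or_end` are hypothetical names, and `stream` is assumed to be any async iterator that may offer `aclose()`.

```python
import asyncio

_END = object()  # sentinel marking the normal end of the stream


class GenerationAborted(Exception):
    """Hypothetical error: abort_event fired before the stream finished."""

    def __init__(self, parts):
        super().__init__("generation aborted")
        self.parts = parts  # whatever arrived before the abort


async def _next_or_end(iterator):
    """Await one chunk, mapping end-of-stream to the _END sentinel."""
    try:
        return await iterator.__anext__()
    except StopAsyncIteration:
        return _END


async def generate_abortable(stream, abort_event: asyncio.Event) -> list:
    """Collect chunks from `stream`, giving up as soon as abort_event is set."""
    parts = []
    iterator = stream.__aiter__()
    abort_task = asyncio.create_task(abort_event.wait())
    try:
        while True:
            next_task = asyncio.create_task(_next_or_end(iterator))
            # Race the next chunk against the abort signal.
            await asyncio.wait(
                {next_task, abort_task}, return_when=asyncio.FIRST_COMPLETED
            )
            if abort_task.done():
                # Abort won: cancel the pending read, wait for it to unwind,
                # then close the stream if it offers aclose().
                next_task.cancel()
                await asyncio.wait({next_task})
                aclose = getattr(stream, "aclose", None)
                if aclose is not None:
                    await aclose()
                raise GenerationAborted(parts)
            part = next_task.result()
            if part is _END:
                return parts
            parts.append(part)
    finally:
        abort_task.cancel()
```

A caller such as `_handle_cancel()` would then only need to call `abort_event.set()`. Whether an abort should raise or return the partial message is an open API question; raising with the partial parts attached is just one option.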
This requires provider-specific stream closure:
- OpenAI Responses: stream.close()
- Anthropic: stream.close()
- Google GenAI: generator aclose()
- Kimi native: aclose() if available
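Since the closing method differs per provider, a small adapter could hide the difference from `generate()`. The sketch below is an assumption about how that might look, not existing kosong code: `close_stream` is a hypothetical helper that tries `aclose()` and then `close()`, and awaits the result when the method turns out to be a coroutine.

```python
import asyncio
import inspect


async def close_stream(stream) -> bool:
    """Close `stream` via aclose() or close(), whichever it offers.

    Returns True if a closing method was found and called, False otherwise.
    """
    for name in ("aclose", "close"):
        closer = getattr(stream, name, None)
        if closer is None:
            continue
        result = closer()
        # Some clients implement the method as a coroutine, others as a
        # plain function, so only await when there is something to await.
        if inspect.isawaitable(result):
            await result
        return True
    return False
```

Each chat provider could still override this where its SDK needs something more specific, but a shared default keeps the "if available" case from turning into an AttributeError.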
Why this matters
| Scenario | Current behavior | With abort signal |
|---|---|---|
| Model thinking for 60s | Stop button does nothing for 60s | Stop button works in <100ms |
| User realizes wrong prompt immediately | Pays for full generation | Saves tokens |
| Extension UX | "Stop" feels broken | "Stop" is responsive |
Workarounds considered
- Closing the HTTP client from outside generate(): would destroy the connection pool and break future turns. Not viable.
- Using asyncio.wait_for() with a short timeout: doesn't solve the problem; just moves the stall point.
- Cooperative cancellation only: this is the current state and is insufficient for long-thinking models.
Environment
- kimi-cli version: 1.44.0 (and all prior versions)
- kosong version: bundled with kimi-cli
- Affected modes: wire, acp, shell (all modes that use KimiSoul)
Would the maintainers be open to a PR that adds abort_event support to kosong.generate() and the chat provider implementations? I'm happy to implement this if there's agreement on the API shape.
Additional information
I'm working on a kimi-cli fork right now and trying to maintain upstream compatibility, so I wanted to raise this here rather than hack around it locally. I hit this constantly during a complex reverse engineering session — the stop button would sit unresponsive for minutes while the model was deep in reasoning. Having used tools with proper fetch-layer abort (e.g., Antigravity's AbortController propagation), the difference in UX is night and day. Would love to see this in Kimi Code proper.