Request: Propagate abort signal to HTTP fetch layer for instant stream cancellation #2375

### What feature would you like to see?

## Problem

The CLI's stop/cancel mechanism is **cooperative-only**. When a user presses the stop button (or the extension sends a wire `cancel` request), the `_handle_cancel()` method sets an `asyncio.Event` that is checked at the next `await` boundary inside `kosong.generate()`.

If the LLM is in a long "thinking" block or analysis paralysis, the next chunk may not arrive for **minutes**. During this time:
- The HTTP connection stays open
- The LLM server keeps generating tokens
- The user pays for unwanted output
- The UI appears frozen

## Root Cause

`kosong.generate()` consumes the entire HTTP response stream **internally** before returning. The `StepResult` returned by `kosong.step()` already contains the fully materialized message — the stream is exhausted. There is no live stream handle exposed that the caller can close to force an immediate TCP teardown.

This means `asyncio.Task.cancel()` is the only mechanism, and it only takes effect at the next `await` inside the `async for part in stream:` loop.
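To illustrate the stall, here is a minimal, self-contained sketch. It does not use kosong or any real provider: `slow_stream`, `consume_with_flag_check`, and `demo` are hypothetical names, and the "stream" is just an async generator that pauses before its only chunk. It only shows that a flag inspected between chunks cannot end the loop while the loop is parked waiting for the next chunk.

```python
import asyncio
import time


async def slow_stream(delay: float):
    """Hypothetical stand-in for a provider stream: one chunk after a long pause."""
    await asyncio.sleep(delay)
    yield "chunk"


async def consume_with_flag_check(stream, abort_event: asyncio.Event) -> float:
    """Inspect the flag only between chunks; return seconds until the loop exits."""
    start = time.monotonic()
    async for _part in stream:
        # This line is reached only after a chunk has arrived.
        if abort_event.is_set():
            break
    return time.monotonic() - start


async def demo(delay: float = 0.3) -> float:
    abort_event = asyncio.Event()
    task = asyncio.create_task(
        consume_with_flag_check(slow_stream(delay), abort_event)
    )
    await asyncio.sleep(0.01)
    abort_event.set()  # the user presses stop almost immediately
    return await task


if __name__ == "__main__":
    print(f"loop exited after {asyncio.run(demo()):.2f}s")
```

Even though the event is set after about 10 ms, the loop exits only once the delayed chunk shows up, so the exit latency tracks the chunk delay rather than the moment the user pressed stop.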

## How other projects solve this

**Antigravity** (and browser-based UIs generally) use `AbortController` / `AbortSignal` propagated through the entire stack to the HTTP fetch layer. When `.abort()` is called, the runtime **immediately closes the TCP connection**, and the pending read on the response stream fails at once instead of waiting for the next chunk.

## Proposed Path Forward

Add abort-signal support to `kosong.generate()`:

1. Accept an optional `abort_event: asyncio.Event` parameter
2. Check `abort_event.is_set()` inside the `async for part in stream:` loop
3. If set, **close the underlying HTTP response** (`response.aclose()` / `response.close()` depending on provider)
4. The pending `stream.__anext__()` then raises immediately, and `generate()` exits within milliseconds

This requires provider-specific stream closure:
- OpenAI Responses: `stream.close()`
- Anthropic: `stream.close()`
- Google GenAI: generator `aclose()`
- Kimi native: `aclose()` if available
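One possible shape for the loop in steps 2 to 4 is sketched below. A plain `abort_event.is_set()` check inside `async for` only runs when a chunk arrives, so this sketch instead races the next chunk against `abort_event.wait()`. Everything here is an assumption for illustration, not kosong's actual code: `consume_until_aborted`, `_next_or_done`, and the `close_stream` callback are hypothetical names, and `close_stream` stands in for whatever provider-specific close call applies.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable

_DONE = object()  # sentinel: the stream ended normally


async def _next_or_done(stream: AsyncIterator[Any]) -> Any:
    """Fetch the next part, or return the sentinel when the stream is exhausted."""
    try:
        return await stream.__anext__()
    except StopAsyncIteration:
        return _DONE


async def consume_until_aborted(
    stream: AsyncIterator[Any],
    abort_event: asyncio.Event,
    close_stream: Callable[[], Awaitable[None]],  # hypothetical provider-specific close
) -> list[Any]:
    """Collect parts from `stream`, returning early once `abort_event` is set."""
    parts: list[Any] = []
    abort_task = asyncio.create_task(abort_event.wait())
    next_task = None
    try:
        while not abort_event.is_set():
            next_task = asyncio.create_task(_next_or_done(stream))
            # Wake up on whichever happens first: a new part or the abort.
            await asyncio.wait(
                {next_task, abort_task}, return_when=asyncio.FIRST_COMPLETED
            )
            if not next_task.done():
                # Abort won the race: drop the pending read.
                next_task.cancel()
                await asyncio.gather(next_task, return_exceptions=True)
                break
            part = next_task.result()
            if part is _DONE:
                return parts  # normal end of stream, nothing to close early
            parts.append(part)
        # Reached only on abort: close the underlying response.
        await close_stream()
        return parts
    finally:
        abort_task.cancel()
        if next_task is not None and not next_task.done():
            next_task.cancel()
```

In this sketch the return latency after an abort depends on the event loop, not on the model's next chunk. Whether the connection is actually torn down still depends on what the provider's close call does, which the sketch does not verify.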

## Why this matters

| Scenario | Current behavior | With abort signal |
|----------|---------------|-------------------|
| Model thinking for 60s | Stop button does nothing for 60s | Stop button works in <100ms |
| User realizes wrong prompt immediately | Pays for full generation | Saves tokens |
| Extension UX | "Stop" feels broken | "Stop" is responsive |

## Workarounds considered

- **Closing the HTTP client from outside `generate()`** — Would destroy the connection pool and break future turns. Not viable.
- **Using `asyncio.wait_for()` with a short timeout** — Doesn't solve the problem; just moves the stall point.
- **Cooperative cancellation only** — This is the current state and is insufficient for long-thinking models.

## Environment

- kimi-cli version: 1.44.0 (and all prior versions)
- kosong version: bundled with kimi-cli
- Affected modes: wire, acp, shell (all modes that use `KimiSoul`)

---

Would the maintainers be open to a PR that adds `abort_event` support to `kosong.generate()` and the chat provider implementations? I'm happy to implement this if there's agreement on the API shape.

### Additional information

I'm working on a kimi-cli fork right now and trying to maintain upstream compatibility, so I wanted to raise this here rather than hack around it locally. I hit this constantly during a complex reverse engineering session — the stop button would sit unresponsive for minutes while the model was deep in reasoning. Having used tools with proper fetch-layer abort (e.g., Antigravity's AbortController propagation), the difference in UX is night and day. Would love to see this in Kimi Code proper.
