
feat(acp): expose per-turn token usage to ACP clients (StatusUpdate.token_usage is dropped, PromptResponse.usage left empty) #2394

@javierbarroso22-dev


Summary

When Kimi runs as an ACP server (kimi acp), it does not report token usage to the connected ACP client — even though (a) Kimi already computes per-step usage internally and (b) the ACP schema bundled with Kimi has first-class fields for it. As a result, ACP host applications that orchestrate Kimi (multi-agent runners, cost/quota dashboards) see zero token usage for Kimi turns and cannot meter cost or context.

Environment

  • kimi-cli 1.37.0 (installed via uv)
  • Mode: kimi acp (ACP server), driven by an external ACP client

Current behaviour

Kimi clearly has the data. The CLI emits a StatusUpdate wire message carrying per-step token usage:

kimi_cli/wire/types.py

class StatusUpdate(BaseModel):
    context_usage: float | None = None
    context_tokens: int | None = None
    max_context_tokens: int | None = None
    token_usage: TokenUsage | None = None   # {input_other, output, input_cache_read, input_cache_creation}
    ...

Observed in a real session's wire.jsonl:

{"type":"StatusUpdate","payload":{
  "context_tokens":7426,"max_context_tokens":262144,
  "token_usage":{"input_other":2306,"output":420,"input_cache_read":5120,"input_cache_creation":0}
}}
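Until the server reports usage itself, a host can recover the numbers from the session's wire log. The following is a minimal sketch, assuming wire.jsonl holds one JSON object per line with the shape shown above and that each StatusUpdate carries per-step (not cumulative) usage; `tally_token_usage` is a hypothetical helper, not part of kimi-cli.

```python
import json
from collections import Counter
from typing import Iterable


def tally_token_usage(lines: Iterable[str]) -> Counter:
    """Sum token_usage over every StatusUpdate record in a wire.jsonl stream."""
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("type") != "StatusUpdate":
            continue
        # token_usage is optional on StatusUpdate, so treat None as "no data"
        usage = (record.get("payload") or {}).get("token_usage") or {}
        for key, value in usage.items():
            if isinstance(value, int):
                totals[key] += value
    return totals


# The first record is the one observed above; the second has no usage attached.
sample = [
    '{"type":"StatusUpdate","payload":{"context_tokens":7426,"max_context_tokens":262144,'
    '"token_usage":{"input_other":2306,"output":420,"input_cache_read":5120,"input_cache_creation":0}}}',
    '{"type":"StatusUpdate","payload":{"context_tokens":7900,"token_usage":null}}',
]
totals = tally_token_usage(sample)
print(dict(totals))
```

In practice the lines would come from the session's wire.jsonl file (for example `tally_token_usage(open(path, encoding="utf-8"))`, where `path` is wherever the session log lives). This is a workaround for hosts that can read the log, not a substitute for reporting usage over ACP.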

But the ACP session handler discards it. In kimi_cli/acp/session.py, SessionImpl.prompt():

case StatusUpdate():
    pass            # <-- token usage dropped here

…and every turn returns a PromptResponse with no usage:

return acp.PromptResponse(stop_reason="end_turn")     # likewise "max_turn_requests" / "cancelled"

The ACP server never constructs an ACP Usage object anywhere in kimi_cli/acp/.

The ACP schema already supports this

The bundled acp/schema.py defines exactly the right structure, currently unused by the server:

class Usage(BaseModel):
    # Fields (types elided here): cached_read_tokens, cached_write_tokens,
    # input_tokens, output_tokens, thought_tokens, total_tokens
    ...

class PromptResponse(BaseModel):
    stop_reason: StopReason
    usage: Optional[Usage] = None     # "Token usage for this turn (optional)"  (annotated **UNSTABLE**)

Proposed change

Populate PromptResponse.usage from the turn's accumulated StatusUpdate.token_usage, e.g.:

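| ACP Usage field     | From token_usage     |
|---------------------|----------------------|
| input_tokens        | input_other          |
| output_tokens       | output               |
| cached_read_tokens  | input_cache_read     |
| cached_write_tokens | input_cache_creation |
| total_tokens        | sum of the above     |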

(Maintainers know the exact accounting better than I do — this is just the obvious mapping.)
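To make the mapping concrete, here is a rough sketch of accumulating per-step usage over one turn. `TurnUsage` is a hypothetical stand-in that only mirrors the field names of the ACP Usage model; it is not the bundled class, and whether total_tokens should include cached tokens is an accounting call for maintainers.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class TurnUsage:
    """Hypothetical accumulator mirroring the ACP Usage field names."""

    input_tokens: int = 0
    output_tokens: int = 0
    cached_read_tokens: int = 0
    cached_write_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        # Assumption: the total is the plain sum of the four mapped counters.
        return (
            self.input_tokens
            + self.output_tokens
            + self.cached_read_tokens
            + self.cached_write_tokens
        )

    def add_step(self, token_usage: Optional[Mapping[str, int]]) -> None:
        """Fold one StatusUpdate.token_usage payload into the turn total."""
        if not token_usage:
            return  # StatusUpdate.token_usage may be None
        self.input_tokens += token_usage.get("input_other", 0)
        self.output_tokens += token_usage.get("output", 0)
        self.cached_read_tokens += token_usage.get("input_cache_read", 0)
        self.cached_write_tokens += token_usage.get("input_cache_creation", 0)


turn = TurnUsage()
turn.add_step({"input_other": 2306, "output": 420,
               "input_cache_read": 5120, "input_cache_creation": 0})
turn.add_step(None)  # a step that reported no usage
print(turn, turn.total_tokens)
```

In the session handler, the `case StatusUpdate():` branch would call something like `add_step` instead of `pass`, and the accumulated values would be copied into the Usage object on the returned PromptResponse.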

I realise PromptResponse.usage is annotated UNSTABLE in the current schema. If you'd rather not depend on it yet, exposing the same numbers via the _meta extensibility field on the prompt response — or a session/update notification at end-of-turn — would also let clients meter usage today.
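For the _meta fallback, the response payload could look roughly like the sketch below. It builds a plain dict rather than using the bundled schema classes; the `kimi.tokenUsage` key and the `prompt_response_with_meta` helper are hypothetical names chosen for illustration, and clients would need to agree on whatever key is actually used.

```python
import json
from typing import Any, Dict


def prompt_response_with_meta(stop_reason: str, usage: Dict[str, int]) -> Dict[str, Any]:
    """Build a prompt-response payload carrying usage under _meta.

    Assumption: the usage numbers ride along under a namespaced key inside
    _meta, so clients that do not know the key can ignore it.
    """
    return {
        "stopReason": stop_reason,
        "_meta": {"kimi.tokenUsage": dict(usage)},  # hypothetical key name
    }


payload = prompt_response_with_meta(
    "end_turn",
    {"input_tokens": 2306, "output_tokens": 420,
     "cached_read_tokens": 5120, "cached_write_tokens": 0,
     "total_tokens": 7846},
)
print(json.dumps(payload, indent=2))
```

A host that understands the key can read `payload["_meta"]["kimi.tokenUsage"]`; others see an ordinary response with a stop reason.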

Why it matters / use case

ACP host applications orchestrate Kimi alongside other engines and want a unified cost/usage view. Today every Kimi turn reports 0 tokens to the host, so Kimi work is invisible in any host-side metering — including direct Kimi-vs-other-engine comparisons. Kimi already pays the cost of computing these numbers; surfacing them over ACP makes them usable.

Not a duplicate of #1517 (ACP terminal execution on Windows) or #2024 (in-app subagent statusbar visibility) — this is specifically about emitting per-turn token usage to the ACP client.
