Summary
When Kimi runs as an ACP server (kimi acp), it does not report token usage to the connected ACP client — even though (a) Kimi already computes per-step usage internally and (b) the ACP schema bundled with Kimi has first-class fields for it. As a result, ACP host applications that orchestrate Kimi (multi-agent runners, cost/quota dashboards) see zero token usage for Kimi turns and cannot meter cost or context.
Environment
- kimi-cli 1.37.0 (installed via uv)
- Mode: kimi acp (ACP server), driven by an external ACP client
Current behaviour
Kimi clearly has the data. The CLI emits a StatusUpdate wire message carrying per-step token usage:
    # kimi_cli/wire/types.py
    class StatusUpdate(BaseModel):
        context_usage: float | None = None
        context_tokens: int | None = None
        max_context_tokens: int | None = None
        token_usage: TokenUsage | None = None  # {input_other, output, input_cache_read, input_cache_creation}
        ...
Observed in a real session's wire.jsonl:
{"type":"StatusUpdate","payload":{
"context_tokens":7426,"max_context_tokens":262144,
"token_usage":{"input_other":2306,"output":420,"input_cache_read":5120,"input_cache_creation":0}
}}
But the ACP session handler discards it. In kimi_cli/acp/session.py, SessionImpl.prompt():
    case StatusUpdate():
        pass  # <-- token usage dropped here
…and every turn returns a PromptResponse with no usage:
return acp.PromptResponse(stop_reason="end_turn") # likewise "max_turn_requests" / "cancelled"
The ACP server never constructs an ACP Usage object anywhere in kimi_cli/acp/.
The ACP schema already supports this
The bundled acp/schema.py defines exactly the right structure, currently unused by the server:
    class Usage(BaseModel):
        # fields: cached_read_tokens / cached_write_tokens / input_tokens / output_tokens / thought_tokens / total_tokens
        ...

    class PromptResponse(BaseModel):
        stop_reason: StopReason
        usage: Optional[Usage] = None  # "Token usage for this turn (optional)" (annotated **UNSTABLE**)
Proposed change
Populate PromptResponse.usage from the turn's accumulated StatusUpdate.token_usage, e.g.:
| ACP Usage | from token_usage |
| --- | --- |
| input_tokens | input_other |
| output_tokens | output |
| cached_read_tokens | input_cache_read |
| cached_write_tokens | input_cache_creation |
| total_tokens | sum |
(Maintainers know the exact accounting better than I do — this is just the obvious mapping.)
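A minimal sketch of that mapping, for illustration only. It uses stand-in dataclasses instead of the real kimi_cli and acp models so that it runs on its own. The helper names (accumulate, to_acp_usage) are hypothetical. It also makes two assumptions: that each StatusUpdate carries per-step rather than cumulative usage, and that total_tokens is the sum of all four counters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenUsage:
    """Stand-in for Kimi's TokenUsage (field names as seen in wire.jsonl)."""
    input_other: int = 0
    output: int = 0
    input_cache_read: int = 0
    input_cache_creation: int = 0


@dataclass
class Usage:
    """Stand-in for the ACP schema's Usage (thought_tokens left out)."""
    input_tokens: int
    output_tokens: int
    cached_read_tokens: int
    cached_write_tokens: int
    total_tokens: int


def accumulate(total: TokenUsage, step: Optional[TokenUsage]) -> TokenUsage:
    """Add one step's usage to the running per-turn total (hypothetical helper)."""
    if step is None:  # StatusUpdate.token_usage is optional
        return total
    return TokenUsage(
        input_other=total.input_other + step.input_other,
        output=total.output + step.output,
        input_cache_read=total.input_cache_read + step.input_cache_read,
        input_cache_creation=total.input_cache_creation + step.input_cache_creation,
    )


def to_acp_usage(total: TokenUsage) -> Usage:
    """Apply the table above; total_tokens as a plain sum is an assumption."""
    return Usage(
        input_tokens=total.input_other,
        output_tokens=total.output,
        cached_read_tokens=total.input_cache_read,
        cached_write_tokens=total.input_cache_creation,
        total_tokens=(
            total.input_other
            + total.output
            + total.input_cache_read
            + total.input_cache_creation
        ),
    )


# Numbers from the wire.jsonl sample earlier in this issue.
turn_total = accumulate(TokenUsage(), TokenUsage(2306, 420, 5120, 0))
turn_total = accumulate(turn_total, None)  # a StatusUpdate without token_usage
usage = to_acp_usage(turn_total)
```

In SessionImpl.prompt(), the StatusUpdate branch could presumably call something like accumulate() instead of pass, and the result could be handed to PromptResponse(usage=...) when the turn ends.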
I realise PromptResponse.usage is annotated UNSTABLE in the current schema. If you'd rather not depend on it yet, exposing the same numbers via the _meta extensibility field on the prompt response — or a session/update notification at end-of-turn — would also let clients meter usage today.
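A rough sketch of the _meta fallback, working on plain dicts rather than the real acp models. The key name "kimi.tokenUsage" and the helper attach_usage_meta are made up for illustration, and the camelCase "stopReason" spelling is my assumption about the wire form of the response.

```python
def attach_usage_meta(result: dict, token_usage: dict) -> dict:
    """Return a copy of a prompt result with usage stored under _meta.

    Hypothetical helper: "kimi.tokenUsage" is not an agreed-upon key.
    """
    meta = dict(result.get("_meta") or {})  # keep any existing _meta entries
    meta["kimi.tokenUsage"] = dict(token_usage)
    return {**result, "_meta": meta}


base = {"stopReason": "end_turn"}
sample = {
    "input_other": 2306,
    "output": 420,
    "input_cache_read": 5120,
    "input_cache_creation": 0,
}
response = attach_usage_meta(base, sample)
```

A client that does not know the key can ignore it, so this would not break existing hosts.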
Why it matters / use case
ACP host applications orchestrate Kimi alongside other engines and want a unified cost/usage view. Today every Kimi turn reports 0 tokens to the host, so Kimi work is invisible in any host-side metering — including direct Kimi-vs-other-engine comparisons. Kimi already pays the cost of computing these numbers; surfacing them over ACP makes them usable.
Not a duplicate of #1517 (ACP terminal execution on Windows) or #2024 (in-app subagent statusbar visibility) — this is specifically about emitting per-turn token usage to the ACP client.