[bug] Intermittent 400 in --acp mode when extended thinking and tool_use occur in the same turn

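## Symptoms

In `--acp` mode, when a turn in a multi-turn conversation uses extended thinking and also calls a tool, the following error occurs **about 60% of the time** (3 of 5 runs):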

```
API Error: 400 due to tool use concurrency issues.
```

The original upstream Anthropic error, which is wrapped at [src/services/api/errors.ts:697](src/services/api/errors.ts#L697), is:

> `tool_use` ids were found without `tool_result` blocks immediately after

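The error text is misleading. As the analysis later in this report shows, **the real cause is two consecutive user messages in the payload** (a role alternation violation), which Anthropic reports back as a tool_use/tool_result pairing error.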
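A quick way to confirm this kind of violation is to scan the outgoing payload for adjacent messages with the same role. The following is a minimal sketch, not code from the repository: `SketchMessage` and `findAlternationViolations` are hypothetical names, and the message shape is simplified to just `role` and `content`.

```typescript
// Hypothetical, simplified message shape; the real types in the codebase are richer.
type SketchRole = 'user' | 'assistant'
interface SketchMessage {
  role: SketchRole
  content: string | Array<{ type: string; [key: string]: unknown }>
}

// Returns every index i where messages[i] has the same role as messages[i - 1].
function findAlternationViolations(messages: SketchMessage[]): number[] {
  const violations: number[] = []
  for (let i = 1; i < messages.length; i++) {
    if (messages[i].role === messages[i - 1].role) violations.push(i)
  }
  return violations
}

// Shape of the failing payload tail described in this report (IDs shortened).
const failingPayload: SketchMessage[] = [
  { role: 'user', content: 'Now run `pwd`.' },
  { role: 'assistant', content: [{ type: 'tool_use', id: 'toolu_X', name: 'Bash', input: {} }] },
  { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'toolu_X', content: '' }] },
  { role: 'user', content: '(no content)' }, // the extra placeholder
]

// Reports a single violation, at the placeholder's index.
console.log(findAlternationViolations(failingPayload))
```

Running a check like this on the normalized payload just before the request is sent would point at the placeholder message directly, rather than at the tool_use block the API error names.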

## Reproduction environment

- upstream: `c499bfb4` (current HEAD of `claude-code-best/claude-code` main)
- runtime: Bun 1.3.14, Node 24.12 (running under Node dies first in undici `markResourceTiming`, so `ccb-bun` is required)
- platform: Windows 11
- auth: logged in via OAuth (`~/.claude/.credentials.json`)

## Reproduction script

The test script spawns `cli-bun.js --acp` directly and uses `@agentclientprotocol/sdk` to send two prompts over the ACP protocol, each of which triggers the Bash tool:

```js
// test-acp.mjs
import { spawn } from 'node:child_process'
import { Readable, Writable } from 'node:stream'
import * as acp from 'file:///<path-to>/node_modules/@agentclientprotocol/sdk/dist/acp.js'

const child = spawn('bun', ['dist/cli-bun.js', '--acp'], { stdio: ['pipe', 'pipe', 'pipe'] })
const stream = acp.ndJsonStream(Writable.toWeb(child.stdin), Readable.toWeb(child.stdout))
// `myClient` is the caller's acp.Client implementation (definition omitted in this report)
const conn = new acp.ClientSideConnection(_ => myClient, stream)

await conn.initialize({ protocolVersion: acp.PROTOCOL_VERSION, clientInfo: {...}, clientCapabilities: {...} })
const session = await conn.newSession({ cwd: process.cwd(), mcpServers: [] })

await conn.prompt({ sessionId: session.sessionId, prompt: [{ type: 'text', text: 'Run `ls` and report.' }] })
// ← first turn succeeds

await conn.prompt({ sessionId: session.sessionId, prompt: [{ type: 'text', text: 'Now run `pwd`.' }] })
// ← fails here with a 400 about 60% of the time
```

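Out of 5 runs, 3 failed and 2 passed. The failure is intermittent and **strongly correlated with whether the model uses extended thinking in the second turn**.

## Diagnostic output

Adding an env-gated `console.error` to `logToolUseToolResultMismatch` at [src/services/api/errors.ts:222](src/services/api/errors.ts#L222) and capturing one failing run gives: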

```
[TACO_DEBUG] tool_use/tool_result mismatch
  orphan tool_use_id: toolu_018UuqziqdtMpuhisagLv39q
  normalizedToolUseIndex=5/8  originalToolUseIndex=9/14

  normalizedSequence (what follows the orphan in the API payload):
    user:tool_result:toolu_018UuqziqdtMpuhisagLv39q
    user:string_content                         ← ❌ extra user message

  preNormalizedSequence (what follows the orphan in mutableMessages):
    user:tool_result:toolu_018UuqziqdtMpuhisagLv39q
    assistant:thinking
    assistant:tool_use:toolu_018UuqziqdtMpuhisagLv39q   ← ❌ same ID appears twice
    user:tool_result:toolu_018UuqziqdtMpuhisagLv39q
```
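The duplicated ID in `preNormalizedSequence` can also be detected mechanically. The sketch below is illustrative only: `DiagBlock`, `DiagMessage`, and `findDuplicateToolUseIds` are hypothetical names, and the sample data simply mirrors the sequence in the log above, assuming each log line corresponds to one message holding one block.

```typescript
// Hypothetical, simplified shapes for illustration; not the repository's types.
interface DiagBlock {
  type: string
  id?: string // set on tool_use blocks
  tool_use_id?: string // set on tool_result blocks
}
interface DiagMessage {
  role: 'user' | 'assistant'
  content: DiagBlock[]
}

// Collects tool_use IDs that appear in more than one assistant block.
function findDuplicateToolUseIds(messages: DiagMessage[]): string[] {
  const seen = new Set<string>()
  const duplicates = new Set<string>()
  for (const message of messages) {
    if (message.role !== 'assistant') continue
    for (const block of message.content) {
      if (block.type !== 'tool_use' || block.id === undefined) continue
      if (seen.has(block.id)) duplicates.add(block.id)
      else seen.add(block.id)
    }
  }
  return Array.from(duplicates)
}

const orphanId = 'toolu_018UuqziqdtMpuhisagLv39q'
// The orphan tool_use followed by the preNormalizedSequence entries from the log.
const preNormalized: DiagMessage[] = [
  { role: 'assistant', content: [{ type: 'tool_use', id: orphanId }] },
  { role: 'user', content: [{ type: 'tool_result', tool_use_id: orphanId }] },
  { role: 'assistant', content: [{ type: 'thinking' }] },
  { role: 'assistant', content: [{ type: 'tool_use', id: orphanId }] },
  { role: 'user', content: [{ type: 'tool_result', tool_use_id: orphanId }] },
]

console.log(findDuplicateToolUseIds(preNormalized)) // contains only orphanId
```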

## Causal analysis

```
[model responds with extended thinking + tool_use in the same turn]
   ↓
streaming assembly in src/services/api/claude.ts pushes the same assistant message to mutableMessages twice
   (suspected: one yield when the [thinking] block stops, another when the full [thinking, tool_use] stops)
   ↓
mutableMessages now contains a duplicate tool_use ID and a duplicate tool_result
   ↓
ensureToolResultPairing (src/utils/messages.ts:5568) correctly removes the second tool_use ✓
but leaves that assistant with only a [thinking] block,
and the matching second tool_result becomes an orphan
   ↓
the "empty after strip → push NO_CONTENT_MESSAGE" branch at messages.ts:5837 runs
and inserts a user:"(no content)" placeholder (NO_CONTENT_MESSAGE is a string, not a content block)
   ↓
normalizeMessagesForAPI drops the thinking-only assistant
   ↓
final payload: ... assistant[tool_use(X)] → user[tool_result(X)] → user["(no content)"]
                                            └──── two consecutive user messages ────┘
   ↓
Anthropic 400 (the text points at tool_use/tool_result pairing; the actual problem is a role alternation violation)
```

## Suggested fixes

### Tactical fix (small, resolves the 400 immediately)

In the branch at `src/utils/messages.ts:5837`, add a user→user check before pushing `NO_CONTENT_MESSAGE`:

```diff
       } else {
+        // If the previous result entry is already a user, inserting another
+        // user placeholder creates consecutive-user messages which Anthropic
+        // rejects with a misleading "tool_use without tool_result" 400.
+        // Skip the placeholder — alternation is preserved by the next
+        // assistant message in the loop.
+        if (result.at(-1)?.message?.role === 'user') {
+          i++
+          continue
+        }
         // Content is empty after stripping orphaned tool_results. We still
         // need a user message here to maintain role alternation — ...
         i++
         result.push(
           createUserMessage({
             content: NO_CONTENT_MESSAGE,
             isMeta: true,
           }),
         )
       }
```

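The existing comment's claim about maintaining alternation holds between two assistant messages, but when the previous entry is already a user message, the placeholder makes things worse.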
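The guard can be exercised in isolation. This is a minimal sketch under stated assumptions: `GuardEntry` and `pushPlaceholderIfNeeded` are hypothetical names, `NO_CONTENT_PLACEHOLDER` stands in for `NO_CONTENT_MESSAGE`, and the entry shape is reduced to the `message.role` field the check reads. It is not the actual code in `messages.ts`.

```typescript
// Stand-in for NO_CONTENT_MESSAGE; the value matches the "(no content)" seen in the payload.
const NO_CONTENT_PLACEHOLDER = '(no content)'

// Hypothetical, reduced shape of an entry in the result array.
interface GuardEntry {
  message: { role: 'user' | 'assistant'; content: unknown }
}

// Pushes the user placeholder only when it would not directly follow another
// user message. Returns true if the placeholder was added.
function pushPlaceholderIfNeeded(result: GuardEntry[]): boolean {
  const last = result.length > 0 ? result[result.length - 1] : undefined
  if (last !== undefined && last.message.role === 'user') {
    return false // skip: a second user entry would break alternation
  }
  result.push({ message: { role: 'user', content: NO_CONTENT_PLACEHOLDER } })
  return true
}

// Previous entry is a user tool_result: the placeholder is skipped.
const afterUser: GuardEntry[] = [
  { message: { role: 'assistant', content: [{ type: 'tool_use', id: 'toolu_X' }] } },
  { message: { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'toolu_X' }] } },
]
console.log(pushPlaceholderIfNeeded(afterUser), afterUser.length)

// Previous entry is an assistant: the placeholder is still inserted.
const afterAssistant: GuardEntry[] = [
  { message: { role: 'assistant', content: [{ type: 'text', text: 'done' }] } },
]
console.log(pushPlaceholderIfNeeded(afterAssistant), afterAssistant.length)
```

The second case is the one the original comment was written for, so the guard leaves that behavior unchanged.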

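### Root-cause fix (large, eliminates the problem at the source)

The streaming assembly logic in `src/services/api/claude.ts` needs investigation: why is the assistant message pushed to `mutableMessages` twice when extended thinking and tool_use occur in the same turn? This requires tracing the `content_block_stop` / `message_delta` handling, but it is the true root cause. The tactical fix only covers the downstream symptom.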

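## Notes

- This bug is unrelated to the `acp-link` package: `packages/acp-link` is a pure WebSocket↔stdio passthrough and does not touch tool_use/tool_result. The bug reproduces both when spawning `cli-bun --acp` directly and when going through `acp-link`.
- The official upstream Anthropic `@anthropic-ai/claude-code` has no `--acp` mode, so there is nothing to compare against.
- I have a local patch for the tactical fix ready and can open a PR if the maintainers agree with the direction.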

