
Large-context requests frequently hit ConnectTimeout; httpx connect_timeout is not configurable #2384

@1690834643

Description

Environment

  • kimi-cli 1.44.0
  • Python 3.14.3
  • Linux 6.6.114.1-microsoft-standard-WSL2 (x86_64)
  • Provider: managed:kimi-code, base_url https://api.kimi.com/coding/v1
  • Model: kimi-for-coding (max_context_size=262144)

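Symptom

Once a single long session grows past roughly 120k input tokens of context, every step has a fairly high chance of timing out while a new HTTPS connection is being opened, failing with openai.APITimeoutError: Request timed out.
The underlying error is httpcore.ConnectTimeout: the connection handshake never completes, so this is not a read-phase timeout.

_run_with_connection_recovery retries once. Sometimes that brings the step back; sometimes recovery is exhausted outright
("Chat provider recovery exhausted for step").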

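Evidence

Timeout counts from one session's logs:

Log                          Timeouts
2026-05-25 (second file)     24
2026-05-27                   18
2026-05-28 (today)           53

Every failed step had ≥ ~120k input tokens; this session has now accumulated 154k.
There are also long stretches where everything works (for example, 13:17–13:20: 11 consecutive successful steps with 135k–149k input tokens).
The failures come in bursts within a few narrow time windows (09:40 / 11:28 / 12:38 / 14:41).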
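Counts and burst windows like these can be recomputed with a short script. This is a minimal sketch under assumptions: the actual kimi-cli log format is not shown in this issue, so it treats every line containing openai.APITimeoutError as one failed request and assumes the failure timestamps have already been extracted. count_timeouts and group_into_bursts are hypothetical helpers, and the 5-minute burst gap is an arbitrary choice.

```python
from datetime import timedelta
from pathlib import Path

# Assumption: one line containing this marker per failed request. In the
# typical stack trace in this issue, "openai.APITimeoutError" appears once
# per failure, while "Request timed out." appears twice.
MARKER = "openai.APITimeoutError"


def count_timeouts(log_paths, marker=MARKER):
    """Map each log file name to the number of lines containing marker."""
    counts = {}
    for path in map(Path, log_paths):
        with path.open(encoding="utf-8", errors="replace") as fh:
            counts[path.name] = sum(1 for line in fh if marker in line)
    return counts


def group_into_bursts(timestamps, gap=timedelta(minutes=5)):
    """Split failure times (sorted here) into bursts; a new burst starts
    whenever the time since the previous failure exceeds gap."""
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts
```

Grouping by the gap between failures, rather than by fixed clock buckets, keeps a burst that straddles a bucket boundary (for example 09:39 to 09:41) together.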

Typical stack trace:

File ".../httpcore/_async/connection.py", line 124, in _connect
httpcore.ConnectTimeout
httpx.ConnectTimeout
    raise APITimeoutError(request=request) from err
openai.APITimeoutError: Request timed out.
  File ".../kosong/chat_provider/kimi.py", line 170, in generate
    stream = await chat_provider.generate(system_prompt, tools, history)
kosong.chat_provider.APITimeoutError: Request timed out.
  File ".../kimi_cli/soul/kimisoul.py", line 1417, in _run_with_connection_recovery
    raise convert_error(e) from e

A direct connection to api.kimi.com was healthy at the time:

$ curl -o /dev/null -w "connect=%{time_connect}s tls=%{time_appconnect}s\n" https://api.kimi.com/coding/v1/
connect=0.234s tls=0.279s
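The curl check above can be repeated from Python to time fresh connections one at a time. A minimal sketch using only the standard library; probe is a hypothetical helper. Like curl's time_connect and time_appconnect, both values are measured from the start of the attempt, and connect includes DNS resolution. connect_timeout bounds the TCP connect (the DNS lookup is not covered by it), and the same socket timeout then applies to each blocking step of the TLS handshake.

```python
import socket
import ssl
import time


def probe(host, port=443, connect_timeout=5.0, use_tls=True):
    """Time one fresh connection to host:port.

    Both values are seconds since the start of the attempt, like curl's
    %{time_connect} and %{time_appconnect}; "tls" is None when use_tls is False.
    """
    start = time.perf_counter()
    # Raises OSError (TimeoutError on timeout) if the TCP handshake does not
    # complete in time, i.e. a connect-phase failure like the one in this issue.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    timings = {"connect": time.perf_counter() - start, "tls": None}
    try:
        if use_tls:
            ctx = ssl.create_default_context()
            # wrap_socket performs the TLS handshake immediately; the socket's
            # timeout still bounds each blocking step of it.
            sock = ctx.wrap_socket(sock, server_hostname=host)
            timings["tls"] = time.perf_counter() - start
    finally:
        sock.close()
    return timings
```

Calling probe("api.kimi.com") in a loop during a failure burst would show whether new connections, rather than the single curl request, are being slowed or refused.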

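DNS resolves to volcddos.com (a Volcano Engine anti-DDoS edge), so my suspicion is that the edge node briefly
refuses connections for large request bodies or a high rate of new connections,
while the connect_timeout of the httpx client inside kimi-cli is short
(~5s by default), so the client gives up immediately. The value is not user-configurable, and there is no exponential backoff.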

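Expected

  1. Expose the httpx connect_timeout / overall timeout in config.toml
    (e.g. providers.<name>.http.connect_timeout), so users with large contexts can raise it.
  2. Use exponential backoff in _run_with_connection_recovery (it currently looks like a single retry after a short fixed interval,
    so the retry also fails when it lands in the same edge rate-limiting window).
  3. Optional: give ConnectTimeout a few extra retries of its own, handled separately from read timeouts
    (during the connect phase the server has not received any tokens yet, so retrying has no side effects).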
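Items 2 and 3 can be combined in one retry loop. A minimal sketch, not kimi-cli's actual _run_with_connection_recovery: call_with_backoff is a hypothetical helper that uses exponential backoff with full jitter (a random delay between 0 and base_delay * 2**(n-1), capped at max_delay) and gives connect-phase failures a larger retry budget than read-phase ones. Because the typical stack trace shows the httpx error being wrapped with raise ... from err, the sketch classifies an exception by walking its __cause__ chain. The retry counts and delays are illustrative, not tuned values.

```python
import asyncio
import random


def _phase(exc, connect_errors, read_errors):
    """Classify exc by walking its "raise ... from err" (__cause__) chain, so a
    wrapper such as openai.APITimeoutError can still be matched by the
    underlying httpx exception. Returns "connect", "read" or None."""
    while exc is not None:
        if isinstance(exc, connect_errors):
            return "connect"
        if isinstance(exc, read_errors):
            return "read"
        exc = exc.__cause__
    return None


async def call_with_backoff(
    make_request,
    *,
    connect_errors,
    read_errors=(),
    connect_retries=5,
    read_retries=1,
    base_delay=1.0,
    max_delay=30.0,
    sleep=asyncio.sleep,
):
    """Await make_request(), retrying with exponential backoff and full jitter.

    Connect-phase failures get a larger budget: the server has not received
    the request yet, so resending it has no side effects. Read-phase failures
    get a smaller one, since the server may already be generating. Any other
    exception is re-raised immediately.
    """
    budgets = {"connect": connect_retries, "read": read_retries}
    failures = {"connect": 0, "read": 0}
    attempt = 0
    while True:
        try:
            return await make_request()
        except Exception as exc:
            phase = _phase(exc, connect_errors, read_errors)
            if phase is None:
                raise
            failures[phase] += 1
            if failures[phase] > budgets[phase]:
                raise
        attempt += 1
        # Full jitter: uniform delay in [0, min(max_delay, base_delay * 2**(attempt - 1))].
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        await sleep(random.uniform(0, cap))
```

With the real libraries one might pass connect_errors=(httpx.ConnectTimeout, httpx.ConnectError) and read_errors=(httpx.ReadTimeout,). The jitter spreads retries out, so they are less likely to land in the same rate-limiting window as the failure they follow.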

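Reproduce

  1. Open a long session and keep working until the history reaches ~120k+ input tokens.
  2. Keep chatting normally; after a while, steps start failing with APITimeoutError.
  3. At that point a direct curl https://api.kimi.com/coding/v1/ usually still returns almost instantly, so the
    underlying network is not down. It looks like the combined effect of new pooled connections being opened and WAF rate limiting.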

Session ID: c5567dae-2551-426a-9db7-a46bb3b7b225
(Full logs are available on request; I have not attached the zip to the issue.)


