[Cherry-Pick][BugFix] prevent requests from entering running state without a slot (#7141)#7160
Conversation
|
Thanks for your contribution! |
fastdeploy-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review |
2026-04-02 20:00 CST
📋 Review 摘要
PR 概述:Cherry-pick 调度器 slot 计数修复到 release/2.4 分支,防止请求在无可用 slot 时进入 running 状态
变更范围:common_engine.py(异常处理)、resource_manager_v1.py(调度逻辑)
影响面 Tag:Engine Scheduler
问题
| 级别 | 文件 | 概述 |
|---|---|---|
| ❓ 疑问 | common_engine.py:912 |
RuntimeError 异常处理行为变更,需确认意图 |
总体评价
resource_manager_v1.py 的 slot 计数修复逻辑清晰正确,现在综合考虑了 running、待重调度和已抢占的请求数量。但 common_engine.py 中 RuntimeError 的处理从"优雅退出"变为"直接抛出",这个行为变更与 PR 描述中的 slot accounting 修复目标关联不明显,建议作者确认。
| except RuntimeError as e: | ||
| if "cannot schedule new futures after shutdown" in str(e): | ||
| break | ||
| raise e |
There was a problem hiding this comment.
❓ 疑问 RuntimeError 异常处理行为变更
原代码在收到 "cannot schedule new futures after shutdown" 错误时会优雅地 break 退出循环,现在改为直接 raise e 抛出异常。
疑问点:
- 这个变更会导致 shutdown 时抛出异常而非静默退出,是否是预期行为?
- 此变更与 PR 描述中的 "slot accounting" 修复目标的关联是什么?
- 是否会影响服务的正常关闭流程?
请确认这个行为变更是否符合预期。
| if ( | ||
| len(self.running) | ||
| + len(self.to_be_rescheduled_request_id_set) | ||
| + sum([req.status == RequestStatus.PREEMPTED for req in self.waiting]) |
There was a problem hiding this comment.
🟢 LGTM slot 计数逻辑修复
现在正确地将以下三类请求纳入 slot 占用计算:
self.running:正在运行的请求self.to_be_rescheduled_request_id_set:待重新调度的请求- PREEMPTED 状态的 waiting 请求:已被抢占等待恢复的请求
同时将 == 改为 >= 是更安全的边界条件检查。
小建议(非阻塞):可考虑将 sum([req.status == ... for req in self.waiting]) 改为生成器表达式 sum(req.status == ... for req in self.waiting) 以节省内存分配。
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## release/2.4 #7160 +/- ##
==============================================
Coverage ? 56.39%
==============================================
Files ? 333
Lines ? 42624
Branches ? 6485
==============================================
Hits ? 24036
Misses ? 16704
Partials ? 1884
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Motivation
Cherry-pick the scheduler slot-accounting fixes from develop PR #7141 to
release/2.4.This keeps scheduler occupancy consistent on the release branch when requests are running, pending reschedule, or already preempted back into the waiting queue, and preserves the slot guard that prevents over-admission.
Modifications
ea3854523f9a69b01e4194e51bc1e2d037e71d23: prevent requests from entering running state without a slot.3e4360b2553391850467b544757022ab02f99c3c: fix abort-set counting.dc46505dbf06f3f95c565d53e57fb459e224a87b: count preempted tasks in the waiting list.upstream/release/2.4in branchrelease/2.4+20260402_fix_schedule.Usage or Command
Bugfix validation command not run in this session.
Suggested reproduction:
max_num_seqs.runningwhen the effective occupied slots have already reached capacity.Accuracy Tests
This cherry-pick does not change model forward, kernel logic, or numerical computation paths.
No accuracy impact is expected.
Checklist
pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.