[Draft] Enable text-only deployment for multimodal models #7183
K11OntheBoat wants to merge 1 commit into PaddlePaddle:develop
Thanks for your contribution!

liuruian does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Already signed the CLA but the status is still pending? Let us recheck it.
Codecov Report
❌ Patch coverage is …

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##             develop    #7183   +/-   ##
==========================================
  Coverage           ?   73.58%
==========================================
  Files              ?      376
  Lines              ?    52939
  Branches           ?     8257
==========================================
  Hits               ?    38954
  Misses             ?    11243
  Partials           ?     2742
```
fastdeploy-bot left a comment:

🤖 AI Code Review (2026-04-03 17:37 CST)
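📋 Review Summary
PR overview: Enables a text-only deployment mode for multimodal models by introducing an `enable_mm_runtime` property that centrally controls runtime multimodal features.
Scope of changes: FDConfig, Engine, Worker, Scheduler, Attention Backends, Speculative Decoding
Impact tags: [FDConfig] [Engine] [Scheduler] [Speculative Decoding] [XPU] [HPU] [GCU] [Iluvatar] [Metax]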
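📝 PR Convention Check
The PR title uses [Draft], which is not a valid tag, and the Motivation, Modifications, and Usage sections of the description are all empty.
Suggested title (ready to copy):
[Feature] Enable text-only deployment for multimodal models
Suggested description template (ready to copy):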
## Motivation
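Support deploying multimodal models in text-only mode: setting `deploy_modality=TEXT` disables runtime multimodal features (such as 3D RoPE and the encoder cache), reducing resource usage and improving performance in text-only scenarios.
## Modifications
1. Add `enable_mm_runtime` and `enable_rope_3d_runtime` properties to `FDConfig`
2. Replace checks on `model_config.enable_mm` across modules with `fd_config.enable_mm_runtime`
3. When `deploy_modality=TEXT`, force-disable `rope_3d` and `use_3d_rope`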
## Usage
```bash
python -m fastdeploy.entrypoints.openai.api_server \
--model /path/to/multimodal_model \
--deploy-modality text
```
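### Issues
| Severity | File | Summary |
|------|------|------|
| 🔴 Bug | `engine/common_engine.py:1282` | Contains debug print statements; should not be merged into develop |
| 🔴 Bug | `entrypoints/engine_client.py:364` | Contains debug print statements |
| 🔴 Bug | `output/token_processor.py:952` | Contains debug print statements |
| 🔴 Bug | `inter_communicator/engine_worker_queue.py:554` | Contains debug print statements; frequent calls hurt performance |
| ❓ Question | `worker/input_batch.py:235` | The logic distinguishing `has_mm_model` from `enable_mm` needs confirmation |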
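### Overall Assessment
The design of this PR is clear: introducing the `enable_mm_runtime` property centralizes the management of runtime multimodal features, and the changes are broad in scope and consistent. However, several debug print statements must be removed before merging. These `print` statements would produce large volumes of output in production, seriously hurting performance and log readability.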
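The review describes `enable_mm_runtime` as a single property that gates runtime multimodal features. The following is a minimal sketch of how such a derived property could work, assuming a hypothetical `DeployModality` enum and simplified `ModelConfig`/`FDConfig` dataclasses. Only the names `FDConfig`, `enable_mm_runtime`, `enable_rope_3d_runtime`, `model_config.enable_mm`, `rope_3d`, and `deploy_modality=TEXT` come from the PR; the class shapes and field defaults are assumptions, not FastDeploy's actual code.

```python
from dataclasses import dataclass
from enum import Enum


class DeployModality(Enum):
    # Hypothetical enum: the PR only mentions deploy_modality=TEXT.
    MIXED = "mixed"
    TEXT = "text"


@dataclass
class ModelConfig:
    # Assumed fields: what the checkpoint itself supports.
    enable_mm: bool = False  # model architecture is multimodal
    rope_3d: bool = False    # model uses 3D RoPE


@dataclass
class FDConfig:
    model_config: ModelConfig
    deploy_modality: DeployModality = DeployModality.MIXED

    @property
    def enable_mm_runtime(self) -> bool:
        # Multimodal runtime features are active only when the model is
        # multimodal AND the deployment is not text-only.
        return self.model_config.enable_mm and self.deploy_modality != DeployModality.TEXT

    @property
    def enable_rope_3d_runtime(self) -> bool:
        # 3D RoPE is forced off for text-only deployments.
        return self.model_config.rope_3d and self.deploy_modality != DeployModality.TEXT
```

In this sketch, call sites test `fd_config.enable_mm_runtime` instead of `model_config.enable_mm`, so the model's intrinsic capability and the deployment choice stay separate: a multimodal checkpoint deployed with `deploy_modality=TEXT` reports `enable_mm_runtime == False` while `model_config.enable_mm` remains `True`.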
`engine/common_engine.py`:
```python
request = Request.from_dict(data)

# [DEBUG] Key request fields received by the engine
print(
```
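🔴 Bug: This contains debug print statements that should not be merged into the develop branch.
These print statements emit a large amount of debug output for every request processed, hurting production performance and log readability.
Suggestion: remove these debug prints, or replace them with `logger.debug()` and control the output through the log level.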
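A minimal sketch of the suggested `logger.debug()` replacement, using Python's standard `logging` module. The logger name and the fields (`request_id`, `prompt_token_ids`, `max_tokens`) are hypothetical placeholders, not the fields actually printed in this PR.

```python
import logging

# Hypothetical logger name; FastDeploy's own loggers may differ.
logger = logging.getLogger("fastdeploy.engine.debug")


def log_request_fields(request_id: str, prompt_token_ids: list[int], max_tokens: int) -> None:
    """Emit request diagnostics only when DEBUG is enabled for this logger."""
    # Arguments are passed separately (%-style) rather than as an f-string,
    # so the message is only formatted if a handler actually emits it.
    logger.debug(
        "request received: id=%s prompt_len=%d max_tokens=%d",
        request_id,
        len(prompt_token_ids),
        max_tokens,
    )
```

With the logger at `INFO` or above, the call returns after a cheap level check; setting the level to `DEBUG` restores the diagnostics without any code change.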
`entrypoints/engine_client.py`:
```python
min_tokens = task.get("min_tokens", 1)

# [DEBUG] Key fields before sending to the engine
print(
```
🔴 Bug: This contains debug print statements that should not be merged into the develop branch.
Suggestion: remove them or replace them with `logger.debug()`.
`output/token_processor.py`:
```python
llm_logger.info(f"task {task_id} received eos token. Recycling.")

# [DEBUG] Debug comparison of output tokens
print(
```
🔴 Bug: This contains debug print statements that should not be merged into the develop branch.
Suggestion: remove them or replace them with `logger.debug()`.
`inter_communicator/engine_worker_queue.py`:
```python
time.sleep(0.001)
self.lock.acquire()
wait_ms = (time.perf_counter() - wait_start) * 1000.0
print(
```
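🔴 Bug: This contains debug print statements that are called frequently in `put_tasks` and `get_tasks`, which severely hurts performance.
Suggestion: remove these debug prints.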
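If the lock-wait measurement is still useful during development, one alternative to deleting it outright is to gate it on the log level, so the hot path pays neither for `perf_counter()` calls nor for message formatting when DEBUG is off. This is a sketch under assumptions: `TaskQueue` is a hypothetical stand-in for the real queue class, it uses a plain blocking `with self.lock` instead of the original sleep-and-acquire pattern, and the logger name is invented.

```python
import logging
import threading
import time

# Hypothetical logger name.
logger = logging.getLogger("fastdeploy.queue.debug")


class TaskQueue:
    """Hypothetical stand-in for a lock-protected task queue."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.tasks: list = []

    def put_tasks(self, tasks: list) -> None:
        # Check the level once per call; when DEBUG is off, no timing
        # is taken and no message is built.
        debug = logger.isEnabledFor(logging.DEBUG)
        wait_start = time.perf_counter() if debug else 0.0
        with self.lock:
            if debug:
                wait_ms = (time.perf_counter() - wait_start) * 1000.0
                logger.debug("put_tasks lock wait: %.3f ms", wait_ms)
            self.tasks.extend(tasks)

    def get_tasks(self) -> list:
        # Swap out the pending list under the lock and return it.
        with self.lock:
            tasks, self.tasks = self.tasks, []
        return tasks
```

Removing the prints entirely, as the review suggests, remains the simplest option; the gated form only makes sense if the measurement is actively needed.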
`worker/input_batch.py`:
```python
model_config=self.model_config,
partial_rotary_factor=self.model_config.partial_rotary_factor,
)
if self.has_mm_model:
```
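❓ Question: Here, `image_features` is initialized under the `not self.enable_mm` condition, but only when `has_mm_model=True`.
This means that when the model supports multimodal input but is deployed in text-only mode, these variables are still initialized. Please confirm whether this is the intended behavior. If text-only deployment does not need these variables, consider removing this initialization logic.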
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- … [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- … `pre-commit` before commit.
- … `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.