[Draft] Enable text-only deployment for multimodal models #7183

Draft
K11OntheBoat wants to merge 1 commit into PaddlePaddle:develop from K11OntheBoat:dev_split_mm

@K11OntheBoat
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 3, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 3, 2026
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


liuruian seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov-commenter

Codecov Report

❌ Patch coverage is 69.49153% with 18 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@6cae9b1). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|------|------|------|
| fastdeploy/worker/input_batch.py | 37.50% | 8 Missing and 2 partials ⚠️ |
| fastdeploy/config.py | 57.14% | 4 Missing and 2 partials ⚠️ |
| fastdeploy/engine/async_llm.py | 0.00% | 0 Missing and 1 partial ⚠️ |
| ...executor/layers/attention/dsa_attention_backend.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7183   +/-   ##
==========================================
  Coverage           ?   73.58%           
==========================================
  Files              ?      376           
  Lines              ?    52939           
  Branches           ?     8257           
==========================================
  Hits               ?    38954           
  Misses             ?    11243           
  Partials           ?     2742           
| Flag | Coverage Δ |
|------|------|
| GPU | 73.58% <69.49%> (?) |
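
The percentages in this report can be cross-checked from the raw counts. The following is a minimal sketch, relying on Codecov's convention that coverage is hits divided by hits plus misses plus partials. The patch size of 59 lines is not stated in the report; it is inferred here from the 69.49153% figure together with the 18 uncovered lines, and should be read as an assumption.

```python
# Cross-check of the Codecov figures quoted in the report.
# Project totals are taken directly from the coverage diff.
hits, misses, partials = 38954, 11243, 2742
lines = hits + misses + partials           # should equal the reported 52939
project_coverage = 100 * hits / lines

# Per-file uncovered counts (missing + partials) from the table.
uncovered = {
    "fastdeploy/worker/input_batch.py": 8 + 2,
    "fastdeploy/config.py": 4 + 2,
    "fastdeploy/engine/async_llm.py": 0 + 1,
    "dsa_attention_backend.py": 1 + 0,
}
patch_uncovered = sum(uncovered.values())  # should equal the reported 18

# ASSUMPTION: the patch size is inferred, not reported by Codecov.
patch_lines = 59
patch_coverage = 100 * (patch_lines - patch_uncovered) / patch_lines

print(f"project: {project_coverage:.2f}%  patch: {patch_coverage:.5f}%")
```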

@fastdeploy-bot fastdeploy-bot left a comment

🤖 AI Code Review | 2026-04-03 17:37 CST

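📋 Review Summary

PR overview: Enables a text-only deployment mode for multimodal models and introduces the enable_mm_runtime property to control multimodal runtime features in one place
Scope of changes: FDConfig, Engine, Worker, Scheduler, Attention Backends, Speculative Decoding
Affected tags: [FDConfig] [Engine] [Scheduler] [Speculative Decoding] [XPU] [HPU] [GCU] [Iluvatar] [Metax]

📝 PR Convention Check

The PR title uses [Draft], which is not a valid tag, and the Motivation, Modifications, and Usage sections of the description are all empty.

Suggested title (copy as-is):

  • [Feature] Enable text-only deployment for multimodal models

Description template (copy as-is):
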
## Motivation
Support deploying multimodal models in text-only mode: setting `deploy_modality=TEXT` disables multimodal runtime features (such as 3D RoPE and the encoder cache), reducing resource usage and improving performance for text-only workloads.

## Modifications
1. Add the `enable_mm_runtime` and `enable_rope_3d_runtime` properties to `FDConfig`.
2. Replace checks on `model_config.enable_mm` in each module with `fd_config.enable_mm_runtime`.
3. When `deploy_modality=TEXT`, force-disable `rope_3d` and `use_3d_rope`.

## Usage
```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/multimodal_model \
    --deploy-modality text
```
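
The modifications listed in the template can be pictured as a pair of derived properties. The following is a minimal sketch, not the PR's actual code: the shapes of `ModelConfig` and `FDConfig`, the `DeployModality` enum with its `MIXED` member, and the rule that combines them are assumptions made for illustration. Only the names `enable_mm`, `enable_mm_runtime`, `enable_rope_3d_runtime`, `rope_3d`, and `deploy_modality` come from the review.

```python
from dataclasses import dataclass
from enum import Enum


class DeployModality(Enum):  # hypothetical enum; the PR's real type may differ
    TEXT = "text"
    MIXED = "mixed"


@dataclass
class ModelConfig:  # illustrative stand-in, not FastDeploy's class
    enable_mm: bool = False   # the model architecture supports multimodal input
    rope_3d: bool = False     # the model is configured for 3D RoPE


@dataclass
class FDConfig:  # illustrative stand-in, not FastDeploy's class
    model_config: ModelConfig
    deploy_modality: DeployModality = DeployModality.MIXED

    @property
    def enable_mm_runtime(self) -> bool:
        # Assumed rule: multimodal runtime features are active only when the
        # model supports them AND the deployment is not text-only.
        return (
            self.model_config.enable_mm
            and self.deploy_modality is not DeployModality.TEXT
        )

    @property
    def enable_rope_3d_runtime(self) -> bool:
        # Assumed rule: 3D RoPE is forced off in text-only deployments.
        return self.model_config.rope_3d and self.enable_mm_runtime


mm_model = ModelConfig(enable_mm=True, rope_3d=True)
text_only = FDConfig(mm_model, DeployModality.TEXT)
full = FDConfig(mm_model, DeployModality.MIXED)
print(text_only.enable_mm_runtime, full.enable_mm_runtime)
```

Under this reading, call sites would branch on `fd_config.enable_mm_runtime` rather than `model_config.enable_mm`, so one switch governs every module.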

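### Issues

| Severity | File | Summary |
|------|------|------|
| 🔴 Bug | `engine/common_engine.py:1282` | Contains debug print statements; should not be merged into develop |
| 🔴 Bug | `entrypoints/engine_client.py:364` | Contains debug print statements |
| 🔴 Bug | `output/token_processor.py:952` | Contains debug print statements |
| 🔴 Bug | `inter_communicator/engine_worker_queue.py:554` | Contains debug print statements; called frequently, so they hurt performance |
| ❓ Question | `worker/input_batch.py:235` | The logic distinguishing `has_mm_model` from `enable_mm` needs confirmation |

### Overall assessment

The design of this PR is clear: it introduces the `enable_mm_runtime` property to manage multimodal runtime features uniformly, and the changes are broad yet consistent. However, several debug print statements must be removed before merging. These `print` calls would produce large volumes of output in production and seriously degrade performance and log readability.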

request = Request.from_dict(data)

# [DEBUG] key fields of the request received by the engine
print(

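🔴 Bug This location contains debug print statements, which should not be merged into the develop branch.

These print statements emit a large amount of debug output for every request processed, hurting production performance and log readability.

Suggestion: remove the debug prints, or replace them with logger.debug() and control the output through the log level.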
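
As an illustration of this suggestion, here is a minimal sketch using Python's standard `logging` module. The logger name, the `handle_request` function, and the request fields are hypothetical; the PR's actual debug output is truncated in the excerpt and is not reproduced here.

```python
import logging

logger = logging.getLogger("fastdeploy.example")  # hypothetical logger name


def handle_request(data: dict) -> dict:
    """Hypothetical request handler; stands in for the code under review."""
    # %-style arguments are formatted only if DEBUG is enabled, so the call
    # is cheap when the logger runs at INFO or above.
    logger.debug(
        "engine received request: id=%s max_tokens=%s",
        data.get("request_id"),
        data.get("max_tokens"),
    )
    return data


logging.basicConfig(level=logging.INFO)   # typical default: DEBUG is silent
handle_request({"request_id": "r1", "max_tokens": 16})

logger.setLevel(logging.DEBUG)            # opt in while investigating
handle_request({"request_id": "r2", "max_tokens": 16})
```

With this shape the diagnostic stays available during investigation but produces no output at the default level.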

min_tokens = task.get("min_tokens", 1)

# [DEBUG] key fields before sending to the engine
print(

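🔴 Bug This location contains debug print statements, which should not be merged into the develop branch.

Suggestion: remove them or switch to logger.debug().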

llm_logger.info(f"task {task_id} received eos token. Recycling.")

# [DEBUG] output token comparison debugging
print(

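🔴 Bug This location contains debug print statements, which should not be merged into the develop branch.

Suggestion: remove them or switch to logger.debug().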

time.sleep(0.001)
self.lock.acquire()
wait_ms = (time.perf_counter() - wait_start) * 1000.0
print(

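🔴 Bug This location contains debug print statements that are called frequently in put_tasks and get_tasks, seriously hurting performance.

Suggestion: remove these debug prints.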
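
If the lock wait time is still worth observing, one option is to keep the measurement but guard it so the hot path pays almost nothing by default. This is a sketch under assumptions: `TaskQueue`, its `put_tasks` method, and the logger name are hypothetical stand-ins rather than the code in `engine_worker_queue.py`, and `threading.Lock` replaces whatever lock the real queue uses.

```python
import logging
import threading
import time

logger = logging.getLogger("fastdeploy.queue_example")  # hypothetical name


class TaskQueue:
    """Hypothetical stand-in for a lock-protected task queue."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.tasks: list = []

    def put_tasks(self, tasks: list) -> None:
        # Check the level once; skip the timing work unless debugging.
        debug = logger.isEnabledFor(logging.DEBUG)
        wait_start = time.perf_counter() if debug else 0.0
        with self.lock:
            if debug:
                wait_ms = (time.perf_counter() - wait_start) * 1000.0
                logger.debug("put_tasks waited %.3f ms for the lock", wait_ms)
            self.tasks.extend(tasks)


queue = TaskQueue()
queue.put_tasks(["a", "b"])
print(len(queue.tasks))
```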

model_config=self.model_config,
partial_rotary_factor=self.model_config.partial_rotary_factor,
)
if self.has_mm_model:

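❓ Question Here image_features is initialized under the not self.enable_mm condition, but only when has_mm_model=True.

This means the variables are initialized when the model supports multimodal input but is deployed in text-only mode. Please confirm whether this is the intended behavior. If text-only deployment does not need these variables, consider removing this initialization logic.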
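
The deployment cases behind this question can be laid out explicitly. This is a minimal sketch, assuming `has_mm_model` means the checkpoint is multimodal-capable and `enable_mm` means the multimodal runtime is active. The helper `allocates_image_features` is hypothetical and mirrors only the branch described in this comment, not the actual code in `input_batch.py`; how a fully multimodal deployment sets up these variables is outside what the excerpt shows.

```python
def allocates_image_features(has_mm_model: bool, enable_mm: bool) -> bool:
    """Hypothetical helper mirroring the branch described in the comment:
    taken when the model is multimodal-capable but the multimodal runtime
    is switched off."""
    return has_mm_model and not enable_mm


cases = {
    "text-only model": (False, False),
    "multimodal model, text-only deployment": (True, False),
    "multimodal model, multimodal deployment": (True, True),
}
for name, (has_mm_model, enable_mm) in cases.items():
    print(f"{name}: {allocates_image_features(has_mm_model, enable_mm)}")
```

Only the middle case takes the branch, which is the combination the reviewer asks the author to confirm.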

Labels

contributor External developers

4 participants