[Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 #7159

Open
rainyfly wants to merge 3 commits into PaddlePaddle:develop from rainyfly:support_abort_token_id_in_get_output_v1

@rainyfly
Collaborator

@rainyfly rainyfly commented Apr 2, 2026

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 2, 2026

Thanks for your contribution!

@fastdeploy-bot fastdeploy-bot left a comment

🤖 AI Code Review | 2026-04-02 20:02 CST

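📋 Review Summary

PR overview: In GET_SAVE_OUTPUT_V1 mode, support setting PREEMPTED_TOKEN_ID to mark preempted requests.
Scope of changes: the _postprocess method in worker/gpu_model_runner.py
Impact tags: Engine, Worker

Issues

| Severity | File | Summary |
| --- | --- | --- |
| 🔴 Bug | gpu_model_runner.py:2407 | Wrong environment variable name: envs.GET_SAVE_OUTPUT_V1 does not exist |

Overall Assessment

The intent of the PR is clear: it adds preemption-marker support to the V1 output mode. However, it contains a critical variable-name error that raises an exception at runtime and must be fixed before merging.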


# 5.1. Async cpy
post_process_event = paddle.device.cuda.create_event()
if envs.GET_SAVE_OUTPUT_V1:
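🔴 Bug: Wrong variable name. envs.GET_SAVE_OUTPUT_V1 does not exist; it should be envs.FD_USE_GET_SAVE_OUTPUT_V1.

The environment variable defined in fastdeploy/envs.py is FD_USE_GET_SAVE_OUTPUT_V1. Accessing envs.GET_SAVE_OUTPUT_V1 raises AttributeError at runtime, so the conditional branch can never be entered: Python raises the exception before the condition is evaluated.

Other code that uses this environment variable (for example, lines 241 and 2300 of the same file) uses the correct name, envs.FD_USE_GET_SAVE_OUTPUT_V1.

Suggested fix: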

if envs.FD_USE_GET_SAVE_OUTPUT_V1:
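
To illustrate the failure mode described in this comment, here is a minimal sketch of a registry-backed settings object that raises AttributeError for unknown names. It is an assumption about how such a lookup could behave, not a copy of fastdeploy/envs.py: the `_Envs` class and the `_ENV_GETTERS` registry are hypothetical, and only the variable name FD_USE_GET_SAVE_OUTPUT_V1 comes from the review.

```python
import os

# Hypothetical registry of known settings; only the key name is taken from the review.
_ENV_GETTERS = {
    "FD_USE_GET_SAVE_OUTPUT_V1": lambda: int(os.getenv("FD_USE_GET_SAVE_OUTPUT_V1", "0")),
}


class _Envs:
    """Hypothetical stand-in for an envs module with lazy, registry-backed lookup."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in _ENV_GETTERS:
            return _ENV_GETTERS[name]()
        raise AttributeError(f"envs has no attribute {name!r}")


envs = _Envs()

# The registered name resolves to an int flag (0 unless the variable is set).
flag = envs.FD_USE_GET_SAVE_OUTPUT_V1

# The misspelled name never yields a falsy value; it raises instead.
try:
    envs.GET_SAVE_OUTPUT_V1
except AttributeError as exc:
    error_message = str(exc)
```

Under this assumption, an `if envs.GET_SAVE_OUTPUT_V1:` guard does not silently skip its body. The exception propagates out of the enclosing method, which is why the review treats the typo as a blocking bug.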

@fastdeploy-bot fastdeploy-bot left a comment

🤖 AI Code Review | 2026-04-03 15:24 CST

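📋 Review Summary

PR overview: In FD_USE_GET_SAVE_OUTPUT_V1 mode, set PREEMPTED_TOKEN_ID (-9) for preempted requests to notify the server side that the abort operation has completed.

Scope of changes: fastdeploy/worker/gpu_model_runner.py, _postprocess method

Impact tags: Engine, Scheduler

📝 PR Convention Check

The Motivation and Modifications sections of the PR description have no concrete content; please fill them in.

Description template (can be copied directly):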

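## Motivation
In GET_SAVE_OUTPUT_V1 mode, a request that has been preempted has no corresponding sampled token. This PR sets PREEMPTED_TOKEN_ID (-9) to notify the server side that the abort operation has completed, so that the preemption flow can finish correctly.

## Modifications
- In the `_postprocess` method of `gpu_model_runner.py`, check `last_preempted_idx` when `FD_USE_GET_SAVE_OUTPUT_V1` is enabled
- For preempted requests, set their `sampled_token_ids` to `PREEMPTED_TOKEN_ID`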

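Issues

No blocking issues found.

Overall Assessment

The code logic is clear and consistent with the existing PREEMPTED_TOKEN_ID handling in token_processor.py. Consider filling in the PR description to ease future maintenance and code review.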
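
The change summarized in this review can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the PR's code: the actual implementation presumably operates on tensors inside `_postprocess`, the helper name `mark_preempted` is hypothetical, and the list layouts (one row of token ids and one 0/1 preemption flag per batch slot) are assumed. Only the names `sampled_token_ids`, `last_preempted_idx`, and `PREEMPTED_TOKEN_ID` and the value -9 come from the review.

```python
PREEMPTED_TOKEN_ID = -9  # value quoted in the review


def mark_preempted(sampled_token_ids, last_preempted_idx):
    """Return a copy of sampled_token_ids with preempted slots overwritten.

    sampled_token_ids: list of per-slot token id lists (assumed layout).
    last_preempted_idx: list of 0/1 flags, one per slot (assumed layout).
    """
    marked = [list(row) for row in sampled_token_ids]
    for slot, preempted in enumerate(last_preempted_idx):
        if preempted:
            # A preempted request has no real sampled token, so emit the marker.
            marked[slot] = [PREEMPTED_TOKEN_ID] * len(marked[slot])
    return marked


# Slot 1 was preempted in the last step; slots 0 and 2 keep their sampled tokens.
result = mark_preempted([[101], [202], [303]], [0, 1, 0])
# result == [[101], [-9], [303]]
```

A consumer that reads these outputs can then treat -9 as a signal that the abort has completed rather than as a vocabulary token.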

@codecov-commenter

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@938e7dd). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/worker/gpu_model_runner.py | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7159   +/-   ##
==========================================
  Coverage           ?   73.86%           
==========================================
  Files              ?      376           
  Lines              ?    52888           
  Branches           ?     8250           
==========================================
  Hits               ?    39064           
  Misses             ?    11095           
  Partials           ?     2729           
| Flag | Coverage Δ |
| --- | --- |
| GPU | 73.86% <66.66%> (?) |


3 participants