[Cherry-Pick][OP] Unify MoE op with moe_permute path for bf16 GLM (#7164) #7279
Thanks for your contribution!
fastdeploy-bot left a comment
🤖 AI Code Review
2026-04-09 17:04 CST
📋 Review Summary
PR overview: Cherry-pick of PR #7164, which replaces custom operators with Paddle's official moe_permute/moe_unpermute operators, simplifying the MoE layer code and improving maintainability.
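For readers unfamiliar with the permute/unpermute pattern, here is a rough NumPy reference of the general technique. It is a sketch under stated assumptions, not Paddle's implementation: the function names are hypothetical, and the real `moe_permute`/`moe_unpermute` operators have their own signatures, output layouts, and fused GPU kernels. Permute groups every (token, top-k slot) copy by expert with a stable sort; a prefix sum over per-expert token counts gives each expert's contiguous row range for a grouped GEMM; unpermute scatters expert outputs back and takes the routing-weighted sum over each token's k experts.

```python
import numpy as np


def moe_permute_reference(x, topk_ids, num_experts):
    """Hypothetical reference: group token copies by expert (not Paddle's API)."""
    num_tokens, top_k = topk_ids.shape
    flat_experts = topk_ids.reshape(-1)  # expert id for each (token, k) slot
    # Stable sort keeps the original token order within each expert
    order = np.argsort(flat_experts, kind="stable")
    src_token = order // top_k  # token each permuted row was copied from
    permuted_x = x[src_token]
    tokens_per_expert = np.bincount(flat_experts, minlength=num_experts)
    # Inclusive prefix sum: end row (exclusive) of each expert's block
    expert_offsets = np.cumsum(tokens_per_expert)
    return permuted_x, order, tokens_per_expert, expert_offsets


def grouped_expert_forward(permuted_x, expert_offsets, expert_weights):
    """Hypothetical grouped GEMM: each expert multiplies only its own row block."""
    out = np.empty((permuted_x.shape[0], expert_weights.shape[2]), dtype=permuted_x.dtype)
    start = 0
    for e, end in enumerate(expert_offsets):
        out[start:end] = permuted_x[start:end] @ expert_weights[e]
        start = end
    return out


def moe_unpermute_reference(expert_out, order, topk_weights):
    """Hypothetical reference: scatter rows back and combine the top-k outputs."""
    num_tokens, top_k = topk_weights.shape
    slot_out = np.empty_like(expert_out)
    slot_out[order] = expert_out  # undo the sort: row i goes back to slot order[i]
    slot_out = slot_out.reshape(num_tokens, top_k, -1)
    # Routing-weighted sum over the k experts chosen for each token
    return (slot_out * topk_weights[..., None]).sum(axis=1)
```

As a quick check, with identity expert weights and routing weights that sum to 1 per token, the round trip recovers the input:

```python
x = np.arange(8, dtype=np.float64).reshape(4, 2)
topk_ids = np.array([[0, 2], [1, 0], [2, 1], [0, 0]])
permuted_x, order, counts, offsets = moe_permute_reference(x, topk_ids, num_experts=3)
y = grouped_expert_forward(permuted_x, offsets, np.stack([np.eye(2)] * 3))
out = moe_unpermute_reference(y, order, np.full((4, 2), 0.5))
```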
Scope of changes:
- custom_ops/gpu_ops/moe/deepgemm_preprocess.cu - adds support for a cumsum output
- fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py - adds a code path controlled by the FD_USE_PHI_MOE_PERMUTE environment variable
- fastdeploy/model_executor/layers/moe/fused_moe_deepgemm_backend.py - updates function call signatures
- tests/layers/test_fused_moe_cutlass_backend.py - adds tests that exercise the real operators
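The cutlass backend's new path is gated by the `FD_USE_PHI_MOE_PERMUTE` environment variable. Below is a minimal sketch of how such a gate might be read; only the variable name comes from this PR. The accepted truthy values, the default (off), and the helper names `use_phi_moe_permute`/`select_moe_path` are assumptions for illustration. The legacy-path label is inferred from this PR's test, which expects `moe_expert_dispatch` not to be called when `moe_permute` is used.

```python
import os


def use_phi_moe_permute(env=None):
    """Hypothetical helper; accepted values and the "0" default are assumptions."""
    env = os.environ if env is None else env
    return env.get("FD_USE_PHI_MOE_PERMUTE", "0").strip().lower() in ("1", "true", "yes")


def select_moe_path(env=None):
    # Hypothetical path labels, for illustration only
    if use_phi_moe_permute(env):
        return "moe_permute/moe_unpermute"
    return "moe_expert_dispatch (custom op)"
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to test without mutating the process environment.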
Impact tags: [OP] [Models]
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | tests/layers/test_fused_moe_cutlass_backend.py:825 | The test defines a tracking variable that is never used |
Overall Assessment
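The PR is well implemented: the CUDA kernel's template parameters are sensibly designed, and the Python-layer and C++-layer interfaces are consistent. The new test cases cover both the apply_tp and apply_ep_prefill paths, including the noaux_tc and non-noaux_tc branches. One minor test-code improvement was found; it does not block merging.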
```python
out = method.apply_tp(layer, x, gate)

assert permute_called["v"], "moe_permute was not called"
assert not dispatch_called["v"], "moe_expert_dispatch must not be called"
```
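🟡 Suggestion: `dispatch_called` is never updated after it is defined, so this assertion is always true.
The test already guarantees that moe_expert_dispatch is never called, because it is monkeypatched to raise AssertionError (any call would fail the test). The assertion is therefore redundant and can be removed for clarity.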
Suggestion:
```python
# Remove the unused variable and its assertion
permute_called = {"v": False}
original_permute = paddle.nn.functional.moe_permute
# ... spy_permute definition ...
# Delete this line: assert not dispatch_called["v"], "moe_expert_dispatch must not be called"
```
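The reasoning above (a stub that raises makes a separate "was it called" flag unnecessary) can be illustrated with a self-contained sketch. It swaps pytest's `monkeypatch` for the standard library's `unittest.mock.patch.object` so it runs without pytest, and it uses a hypothetical `ops` namespace and hypothetical `apply_tp_new`/`apply_tp_legacy` functions in place of the real Paddle module and backend methods.

```python
import types
from unittest import mock

# Hypothetical stand-in for the module that provides the MoE ops
ops = types.SimpleNamespace(
    moe_permute=lambda x: ("permuted", x),
    moe_expert_dispatch=lambda x: ("dispatched", x),
)


def apply_tp_new(x):
    # Hypothetical code under test: the new path only calls moe_permute
    return ops.moe_permute(x)


def apply_tp_legacy(x):
    # Hypothetical old path that still calls moe_expert_dispatch
    return ops.moe_expert_dispatch(x)


def run_check(apply_tp):
    permute_called = {"v": False}
    original_permute = ops.moe_permute

    def spy_permute(*args, **kwargs):
        # Record the call, then delegate to the original op
        permute_called["v"] = True
        return original_permute(*args, **kwargs)

    def forbid_dispatch(*args, **kwargs):
        # Any call fails immediately, so no "dispatch_called" flag is needed
        raise AssertionError("moe_expert_dispatch must not be called")

    with mock.patch.object(ops, "moe_permute", spy_permute), mock.patch.object(
        ops, "moe_expert_dispatch", forbid_dispatch
    ):
        out = apply_tp(1)

    assert permute_called["v"], "moe_permute was not called"
    return out
```

`run_check(apply_tp_new)` passes, while `run_check(apply_tp_legacy)` raises AssertionError from inside the stub, which is exactly the guarantee the redundant `dispatch_called` assertion was meant to provide.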
Codecov Report
❌ Patch coverage is
```diff
@@              Coverage Diff               @@
##             release/2.6    #7279   +/-   ##
==============================================
  Coverage               ?   73.89%
==============================================
  Files                  ?      376
  Lines                  ?    52915
  Branches               ?     8255
==============================================
  Hits                   ?    39103
  Misses                 ?    11085
  Partials               ?     2727
```
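The coverage figure is consistent with the line counts in the diff: Codecov counts coverage as hits over hits + misses + partials. The sketch below reproduces 73.89% from the reported numbers, assuming Codecov's default rounding (precision 2, round down); rounding half-up would give 73.90% instead.

```python
# Figures copied from the Coverage Diff
hits, misses, partials = 39103, 11085, 2727

lines = hits + misses + partials
# Coverage in hundredths of a percent, truncated (assumed default: round down)
hundredths = hits * 10000 // lines
coverage = f"{hundredths // 100}.{hundredths % 100:02d}%"
print(lines, coverage)  # prints: 52915 73.89%
```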
Flags with carried forward coverage won't be shown.
Motivation
Replace the custom operators with Paddle's official moe_permute/moe_unpermute operators to simplify the code and improve maintainability.
Modifications
Usage or Command
Accuracy Tests
Checklist
- ... [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- ... `pre-commit` before commit.
- ... `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.