[Cherry-Pick][RL] MoE BF16 EP support paddle batched_gemm (#7337) (#7339)

ckl117 merged 3 commits into PaddlePaddle:release/2.6
Conversation

Thanks for your contribution!
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
## release/2.6 #7339 +/- ##
==============================================
  Coverage        ?   73.84%
==============================================
  Files           ?      376
  Lines           ?    52964
  Branches        ?     8269
==============================================
  Hits            ?    39112
  Misses          ?    11116
  Partials        ?     2736
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
fastdeploy-bot left a comment

🤖 AI Code Review
2026-04-11 19:35 CST

📋 Review Summary
PR overview: adds `paddle.batched_gemm` support to the MoE BF16 EP path, replacing the previous `compute_ffn` call, and selects between the `fused_swiglu_scale` and `swiglu` activation functions based on the `FD_MOE_PROB_IN_ADVANCE` environment variable.
Scope of changes: model_executor/layers/moe/fused_moe_cutlass_backend.py, tests/layers/test_fused_moe_cutlass_backend.py
Impact tags: [RL] [OP]
📝 PR Convention Check
Title: format is correct and includes the [RL] and [Cherry-Pick] tags.
Description: the Motivation and Modifications sections are both empty; the change is not explained.
Suggested description (copy-paste ready):
## Motivation
On the MoE EP prefill path, when `FD_USE_PHI_MOE_PERMUTE=True` and w16a16 quantization is in use, replace the previous `compute_ffn` call with `paddle.batched_gemm` to improve performance. Also support the `FD_MOE_PROB_IN_ADVANCE` environment variable to control whether probability weighting is applied before the activation function.
## Modifications
1. In the `moe_permute`/`moe_unpermute` path of the `apply_ep_prefill` method:
   - Replace the `self.compute_ffn` call with two calls to `paddle.incubate.nn.functional.batched_gemm`
   - Select the activation based on the `FD_MOE_PROB_IN_ADVANCE` environment variable:
     - True: use `paddlefleet_ops.fused_swiglu_scale` (fused activation and scaling)
     - False: use `paddle.incubate.nn.functional.swiglu` (activation only)
   - Adjust the `using_weighted_combine` argument logic of `moe_unpermute`
2. Test file:
   - Add an `align` helper function for aligning token counts
   - Adjust the weight shape definitions in `RealMoELayer` to fit batched_gemm
   - Add support for `expert_alignment` in the stub dispatch
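The two-batched-GEMM-plus-SwiGLU flow described above can be sketched with a NumPy stand-in. This is an illustration only: the real code uses `paddle.incubate.nn.functional.batched_gemm` and `paddlefleet_ops.fused_swiglu_scale`, and the shapes, function names, and the exact point where `FD_MOE_PROB_IN_ADVANCE` applies the probability weighting are assumptions based on this review summary.

```python
# Hypothetical NumPy sketch of the new MoE FFN path: two batched GEMMs
# with a SwiGLU activation in between. `prob_in_advance` mirrors the
# FD_MOE_PROB_IN_ADVANCE switch; when True, the probability weighting is
# applied with the activation (what fused_swiglu_scale fuses), otherwise
# it is left to the weighted combine in moe_unpermute.
import numpy as np


def swiglu(x):
    # SwiGLU splits the last dim in half and computes silu(a) * b.
    a, b = np.split(x, 2, axis=-1)
    return (a / (1.0 + np.exp(-a))) * b


def moe_ffn_batched(tokens, w1, w2, probs, prob_in_advance):
    """tokens: [E, T, H]; w1: [E, H, 2*I]; w2: [E, I, H]; probs: [E, T, 1]."""
    h = np.matmul(tokens, w1)   # first batched GEMM (up/gate projection)
    h = swiglu(h)               # activation
    if prob_in_advance:
        h = h * probs           # weighting fused into the activation step
    return np.matmul(h, w2)     # second batched GEMM (down projection)
```

With `probs` all ones, both branches produce the same output, which is a cheap sanity check for the switch.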
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fused_moe_cutlass_backend.py:176 | paddlefleet_ops may be None; accessing an attribute on it directly raises AttributeError |
Overall assessment
The overall approach is sound: batched_gemm replaces the previous FFN compute path. However, there is a runtime-error risk: when paddlefleet.ops is unavailable, calling paddlefleet_ops.fused_swiglu_scale directly raises an exception, so an availability check is needed.
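The `align` helper that the review summary says was added to the test file can be sketched as follows. The name matches the summary, but the body is an assumption: rounding a token count up to the next multiple of `expert_alignment` is the usual way to satisfy a kernel's per-expert alignment requirement.

```python
# Hypothetical sketch of the test-side `align` helper: round a token
# count up to the next multiple of expert_alignment so every expert's
# slice meets the kernel's alignment requirement. Behavior is assumed
# from the review summary, not copied from the actual test code.
def align(num_tokens: int, expert_alignment: int) -> int:
    if expert_alignment <= 1:
        return num_tokens
    return ((num_tokens + expert_alignment - 1) // expert_alignment) * expert_alignment
```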
```python
            recv_num_tokens_per_expert_list,
        )
        if fastdeploy.envs.FD_MOE_PROB_IN_ADVANCE:
            out = paddlefleet_ops.fused_swiglu_scale(out, dst_weights)
```
🔴 Bug: when paddlefleet.ops is unavailable, this line raises AttributeError. paddlefleet_ops is imported via try_import, which returns None on failure, so accessing an attribute on it directly is a runtime error.
Suggested fix:

```python
if fastdeploy.envs.FD_MOE_PROB_IN_ADVANCE:
    if paddlefleet_ops is not None and hasattr(paddlefleet_ops, "fused_swiglu_scale"):
        out = paddlefleet_ops.fused_swiglu_scale(out, dst_weights)
    else:
        # NOTE: this fallback skips the dst_weights scaling; the weighting
        # must then be applied downstream (e.g. via moe_unpermute's
        # using_weighted_combine path) to keep results correct.
        out = paddle.incubate.nn.functional.swiglu(out)
else:
    out = paddle.incubate.nn.functional.swiglu(out)
```
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
Available tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]. Run pre-commit before commit. For a release branch, make sure the PR has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.