[CI]【Hackathon 10th Spring No.45】SM70/SM75 compile guards follow-up #7331
…ed compile guards

Wholesale replace cpp_extensions.cc and setup_ops.py with our AI Studio V100-verified implementation (pipeline p-1051a228d3c7). Changes vs merged PaddlePaddle#6488:
- cpp_extensions.cc: Add #ifdef ENABLE_SCALED_MM_C2X guard for 5 cutlass/FP8 ops (linker error on SM70 without guard)
- cpp_extensions.cc: Add #ifdef ENABLE_SM80_EXT_OPS guard for 7 tail MoE ops (linker error on SM70/SM75 without guard)
- setup_ops.py: Fix ENABLE_SM80_EXT_OPS placement (cc>=80, not cc>=75)
- setup_ops.py: Remove get_compile_parallelism() scope creep (26 lines, functionally identical to 1-liner)
Thanks for your contribution!

cloudforge1 does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
fastdeploy-bot left a comment
🤖 AI Code Review | 2026-04-11
📋 Review Summary
PR overview: adds compile guards for T4 (SM75) and V100 (SM70) to fix the linker errors
Scope of changes: custom_ops/gpu_ops/cpp_extensions.cc, custom_ops/setup_ops.py
Impact tags: [CI] [OP]
📝 PR Convention Check
✅ PR title contains a valid tag: [CI]
✅ PR description is complete (Motivation/Modifications/Usage/Accuracy Tests/Checklist)
Issues
| Severity | File | Summary |
|---|---|---|
| None | - | - |
No blocking issues found.
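Overall Assessment
The PR correctly adds SM75/SM80 compile guards and resolves the linker error on T4/V100. Moving the ENABLE_SM80_EXT_OPS macro definition in setup_ops.py (to after the speculate_decoding source files) is reasonable: it ensures that cc_compile_args also contains the macro, so the #ifdef guards in cpp_extensions.cc take effect. The change is minimal (4 lines added, 0 removed), consistent with the additive-only strategy stated in the PR description.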
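The point about `cc_compile_args` can be illustrated with a small sketch. It assumes, for illustration only, that the build script derives an integer compute capability `cc` (70 for V100, 75 for T4, 80 for A100) and appends `-D` flags to two lists, one for the host compiler and one for nvcc; `build_compile_args`, `flags_seen_by`, and `nvcc_compile_args` are hypothetical names, not FastDeploy code. In this model, a macro added only to the nvcc list would never reach the host-compiled `cpp_extensions.cc`, so its `#ifdef` blocks would always be skipped.

```python
# Hypothetical sketch of SM-tier macro gating in a setup_ops.py-style script.
# Names are illustrative assumptions, not the actual FastDeploy build code.


def build_compile_args(cc: int) -> tuple[list[str], list[str]]:
    """Return (cc_compile_args, nvcc_compile_args) for compute capability `cc`."""
    cc_compile_args: list[str] = []
    nvcc_compile_args: list[str] = []

    macros = []
    if cc >= 75:
        macros.append("ENABLE_SM75_EXT_OPS")
    if cc >= 80:  # threshold 80, not 75: the guarded tail MoE ops are SM80+ only
        macros.append("ENABLE_SM80_EXT_OPS")

    for macro in macros:
        flag = f"-D{macro}"
        # cpp_extensions.cc is built by the host compiler, so the macro must be
        # in cc_compile_args; .cu sources see the nvcc list instead.
        cc_compile_args.append(flag)
        nvcc_compile_args.append(flag)
    return cc_compile_args, nvcc_compile_args


def flags_seen_by(source: str, cc_args: list[str], nvcc_args: list[str]) -> list[str]:
    """Pick the flag list a source file receives, based on its extension (assumed rule)."""
    return nvcc_args if source.endswith(".cu") else cc_args


if __name__ == "__main__":
    for cc in (70, 75, 80):
        cc_args, nvcc_args = build_compile_args(cc)
        # Flags the host compiler would use for cpp_extensions.cc at this tier
        print(cc, flags_seen_by("cpp_extensions.cc", cc_args, nvcc_args))
```

Run as a script, it prints no macros for 70, only `-DENABLE_SM75_EXT_OPS` for 75, and both macros for 80.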
Motivation
PR #6488 (merged as … -part) introduced T4/V100 compile support but left two registration blocks in `cpp_extensions.cc` unguarded: `.cu` sources compile only at SM≥75, but registration is unconditional → linker error on V100 (SM70).

This is a minimal, additive-only fix: 4 lines added, 0 lines removed. See PR #6941 for a full wholesale replacement alternative.
Modifications
- `custom_ops/gpu_ops/cpp_extensions.cc`: Add `#ifdef ENABLE_SM75_EXT_OPS` / `#endif` around 5 cutlass/FP8 ops. Add `#ifdef ENABLE_SM80_EXT_OPS` / `#endif` around 7 tail MoE ops.
- No changes to `setup_ops.py`; keeps #6488's code as-is.

Usage or Command
No user-facing changes. After this fix, the build correctly gates ops per SM tier.
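The per-tier gating can be modeled as set arithmetic: an op group whose registration survives preprocessing but whose `.cu` sources were not built stands in for an unresolved symbol at link time. This is a toy model under stated assumptions: the group names and the `OP_GROUPS` table are hypothetical, the thresholds (75 for the cutlass/FP8 group, 80 for the tail MoE group) follow the descriptions in this PR, and the macro names follow the Modifications section (the wholesale commit guards the first group with `ENABLE_SCALED_MM_C2X` instead).

```python
# Toy model of the linker problem and the #ifdef fix. All names are
# hypothetical; this is not FastDeploy code.

# group name -> (guard macro, minimum compute capability whose sources are built)
OP_GROUPS: dict[str, tuple[str, int]] = {
    "cutlass_fp8_ops": ("ENABLE_SM75_EXT_OPS", 75),  # the 5 cutlass/FP8 ops
    "tail_moe_ops": ("ENABLE_SM80_EXT_OPS", 80),     # the 7 tail MoE ops
}


def defined_macros(cc: int) -> set[str]:
    """Assumption: a guard macro is defined exactly when its group is built."""
    return {macro for macro, min_cc in OP_GROUPS.values() if cc >= min_cc}


def compiled_groups(cc: int) -> set[str]:
    """Groups whose .cu sources are compiled for this compute capability."""
    return {name for name, (_, min_cc) in OP_GROUPS.items() if cc >= min_cc}


def registered_groups(cc: int, guarded: bool) -> set[str]:
    """Groups whose registration in cpp_extensions.cc survives preprocessing."""
    if not guarded:
        return set(OP_GROUPS)  # unconditional registration (before the fix)
    macros = defined_macros(cc)
    return {name for name, (macro, _) in OP_GROUPS.items() if macro in macros}


def unresolved(cc: int, guarded: bool) -> set[str]:
    """Registered groups with no compiled implementation, i.e. link errors."""
    return registered_groups(cc, guarded) - compiled_groups(cc)


if __name__ == "__main__":
    for cc in (70, 75, 80):
        before = sorted(unresolved(cc, guarded=False))
        after = sorted(unresolved(cc, guarded=True))
        print(cc, before, after)
```

Without guards the model reports both groups unresolved at SM70 and the tail MoE group at SM75; with guards every tier reports none, which is the behavior the fix targets.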
Accuracy Tests
Guard macros verified in `setup_ops.py`: `ENABLE_SM75_EXT_OPS` and `ENABLE_SM80_EXT_OPS` are both in `cc_compile_args` (host-compiler visibility, required for `.cc` files). Wholesale version tested on Baidu AI Studio V100 (pipeline p-1051a228d3c7).

Checklist
cc_compile_args)