
[CI]【Hackathon 10th Spring No.45】SM70/SM75 compile guards follow-up#7331

Open
r-cloudforge wants to merge 1 commit into PaddlePaddle:develop from CloudForge-Solutions:task/045-t4-v100-compile-guards-part2-v2

Conversation

@r-cloudforge

Motivation

PR #6488 (merged as -part) introduced T4/V100 compile support but left two registration blocks in cpp_extensions.cc unguarded:

  1. 5 cutlass/FP8 ops (lines 1635–1673): .cu sources compile only at SM≥75, but registration is unconditional → linker error on V100 (SM70)
  2. 7 tail MoE/MLA ops (lines 1890–1925): sources compile only at SM≥80, registration unconditional → linker error on SM70/SM75

This is a minimal, additive-only fix — 4 lines added, 0 lines removed. See PR #6941 for a full wholesale replacement alternative.

Modifications

  • custom_ops/gpu_ops/cpp_extensions.cc: Add #ifdef ENABLE_SM75_EXT_OPS / #endif around 5 cutlass/FP8 ops. Add #ifdef ENABLE_SM80_EXT_OPS / #endif around 7 tail MoE ops.

No changes to setup_ops.py — keeps #6488's code as-is.

Usage or Command

No user-facing changes. Build correctly gates ops per SM tier after this fix.

Accuracy Tests

Guard macro verified in setup_ops.py: ENABLE_SM75_EXT_OPS and ENABLE_SM80_EXT_OPS are both in cc_compile_args (host compiler visibility — required for .cc files).

Wholesale version tested on Baidu AI Studio V100 (pipeline p-1051a228d3c7).

Checklist

  • 4 additive lines, 0 deletions — minimal diff
  • Pre-commit hooks pass (clang-format)
  • Guards use macros visible to host compiler (cc_compile_args)
  • Correct SM tier: cutlass→SM75, MoE→SM80

…ed compile guards

Wholesale replace cpp_extensions.cc and setup_ops.py with our
AI Studio V100-verified implementation (pipeline p-1051a228d3c7).

Changes vs merged PaddlePaddle#6488:
- cpp_extensions.cc: Add #ifdef ENABLE_SCALED_MM_C2X guard for 5
  cutlass/FP8 ops (linker error on SM70 without guard)
- cpp_extensions.cc: Add #ifdef ENABLE_SM80_EXT_OPS guard for 7
  tail MoE ops (linker error on SM70/SM75 without guard)
- setup_ops.py: Fix ENABLE_SM80_EXT_OPS placement (cc>=80, not cc>=75)
- setup_ops.py: Remove get_compile_parallelism() scope creep (26 lines,
  functionally identical to 1-liner)
@paddle-bot

paddle-bot bot commented Apr 10, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 10, 2026
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


cloudforge1 does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@fastdeploy-bot fastdeploy-bot left a comment

This comment was marked as outdated.

🤖 AI Code Review | 2026-04-11

📋 Review Summary

PR overview: adds compile guards for T4/V100 (SM70), fixing a linker error.

Scope of changes: custom_ops/gpu_ops/cpp_extensions.cc, custom_ops/setup_ops.py

Impact tags: [CI] [OP]

📝 PR Convention Checks

✅ PR title contains a valid tag: [CI]
✅ PR description is complete (Motivation/Modifications/Usage/Accuracy Tests/Checklist)

Issues

Severity | File | Summary
— none —

No blocking issues found.

Overall Assessment

The PR correctly adds the SM75/SM80 compile guards, fixing the linker errors on T4/V100. The relocated ENABLE_SM80_EXT_OPS macro definition in setup_ops.py (moved after the speculate_decoding source files) is reasonable: it ensures cc_compile_args also carries the macro, so the #ifdef guards in cpp_extensions.cc actually take effect. The change is minimal (4 lines added, 0 removed), matching the additive-only strategy stated in the PR description.


Labels

contributor External developers


3 participants