Optimize scheduler for chunk prefill #7466

liyonghua0910 wants to merge 1 commit into PaddlePaddle:release/2.6
Conversation
PaddlePaddle-bot left a comment:

Thanks for your contribution!

🤖 AI Code Review
2026-04-17 17:29 CST

## 📋 Review Summary

PR overview: Optimizes the scheduling strategy for the chunk prefill scenario. The block-allocation check for chunk prefill requests in the running queue now uses a more conservative threshold (`_get_can_schedule_prefill_threshold_block`), and the preemption logic for chunk prefill is removed; when resources are insufficient, the request is skipped directly and scheduling of the WAITING queue is blocked.
Changed files: engine/sched/resource_manager_v1.py
Impact tag: Scheduler

### 📝 PR Convention Check

The PR title is missing the required Tag label, and the Motivation / Modifications descriptions are empty; please fill them in.

Suggested title (copy-paste ready):

- [Scheduler][Optimization] Optimize scheduler for chunk prefill

Description template (copy-paste ready):

## Motivation
Optimize the scheduling strategy for the chunk prefill scenario: use a more conservative block-threshold check and remove the preemption mechanism for chunk prefill, avoiding unnecessary preemption overhead and improving overall scheduling efficiency.

## Modifications
1. Add a `chunk_prefill_in_running_not_satisfied` flag marking the state where a chunk prefill in the running queue cannot be satisfied.
2. Change the block-allocation check for chunk prefill from `num_new_block` to the conservative threshold returned by `_get_can_schedule_prefill_threshold_block`.
3. Remove the `_trigger_preempt` call in the chunk prefill scenario; break directly when resources are insufficient.
4. Block scheduling of the WAITING queue when chunk prefill cannot be satisfied.

### Issues

| Level | File | Summary |
|------|------|------|
| ❓ Question | resource_manager_v1.py:929 | `break` skips all subsequent requests in the running queue (including decode); confirm whether this affects decode latency |
| 🟡 Suggestion | resource_manager_v1.py:920 | The threshold check differs from the actual allocation count; add a comment explaining the design intent |

### Overall Assessment

The direction of this scheduling optimization is sound: it avoids unnecessary preemption overhead in the chunk prefill scenario. However, with preemption removed, decode requests queued behind a chunk prefill in the running queue are also skipped, which may increase decode latency under high load; the author should confirm the impact of this behavior change in real workloads.
```python
            # Prepare prefill task
            scheduled_reqs.append(self._prepare_prefill_task(request, num_new_tokens))
        else:  # Not enough blocks to allocate
            chunk_prefill_in_running_not_satisfied = True
```
❓ Question: `break` skips scheduling of all subsequent requests in the running queue (including decode requests)

When a chunk prefill lacked resources, the old code first tried `_trigger_preempt`; on successful preemption it could continue scheduling subsequent requests, and it only broke when preemption also failed.
Breaking directly now means that if a chunk prefill request in the running queue fails the `can_schedule_block_num_threshold` check (which is more conservative than the actually required `num_new_block`), the decode requests queued behind it are also skipped for that scheduling round.
Under high concurrency with tight resources, could this increase decode request latency? Is there online benchmark data validating the impact of this behavior change?
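The behavior change questioned above can be sketched as follows. This is an illustrative stand-in for the scheduling loop, not the actual `resource_manager_v1.py` API; the `Request` class and `schedule_running` function are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    is_chunk_prefill: bool


def schedule_running(running_queue, can_allocate):
    """New behavior: on a chunk-prefill block shortage, break immediately
    (no preemption attempt), so every later request in the running queue
    is skipped for this scheduling step."""
    scheduled = []
    for req in running_queue:
        if req.is_chunk_prefill and not can_allocate(req):
            break  # decode requests after this point are skipped this round
        scheduled.append(req)
    return scheduled
```

For example, with a queue `[decode, chunk_prefill, decode]` and a failing allocation check, only the first decode request is scheduled; the trailing decode request is skipped even though it needs no new blocks, which is exactly the latency concern raised above.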
```python
            req_index += 1
            continue
        num_new_block = self.get_new_block_nums(request, num_new_tokens)
        can_schedule_block_num_threshold = self._get_can_schedule_prefill_threshold_block(num_new_block)
```
🟡 Suggestion: add a comment explaining the relationship between `can_schedule_block_num_threshold` and `num_new_block`

The scheduling check here uses the threshold returned by `_get_can_schedule_prefill_threshold_block` (which may be larger than `num_new_block`, since it additionally reserves output-block space for running requests), while the actual allocation at line 924 uses `num_new_block`.
This "judge with a conservative threshold, allocate the actual amount" design is correct (it guarantees reserved decode space), but the logic is not obvious; a comment stating the intent would help future maintenance. For example:

```python
# Use a conservative threshold that reserves output blocks for running decode requests,
# but only allocate the actual needed blocks for this chunk prefill.
can_schedule_block_num_threshold = self._get_can_schedule_prefill_threshold_block(num_new_block)
```
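A minimal sketch of that "conservative check, actual allocation" pattern. The helper name, the reserve of one output block per running decode request, and the parameter names are all assumptions for illustration, not the real `_get_can_schedule_prefill_threshold_block` implementation.

```python
def can_schedule_prefill(num_new_block: int, free_blocks: int,
                         num_running_decode: int,
                         reserve_per_decode: int = 1) -> bool:
    """Admission check for a chunk prefill: the threshold is the blocks it
    actually needs plus output blocks reserved for decode requests already
    running. If the check passes, only num_new_block blocks are allocated;
    the reserved remainder stays free for the running decodes."""
    threshold = num_new_block + num_running_decode * reserve_per_decode
    return free_blocks >= threshold
```

Under these assumptions, a prefill needing 4 blocks with 3 running decodes is admitted only when at least 7 blocks are free, even though it will consume just 4; the gap is what keeps decode requests from starving for KV cache blocks.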
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff            @@
##          release/2.6   #7466   +/- ##
=============================================
  Coverage            ?   73.71%
=============================================
  Files               ?      376
  Lines               ?    52987
  Branches            ?     8275
=============================================
  Hits                ?    39057
  Misses              ?    11189
  Partials            ?     2741
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Choose at least one tag for the PR title from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- If the PR targets a `release` branch, make sure it has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.