
Optimize scheduler for chunk prefill#7466

Open
liyonghua0910 wants to merge 1 commit into PaddlePaddle:release/2.6 from liyonghua0910:release/2.6+20260416_opt_prefill

Conversation

@liyonghua0910
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code; run pre-commit before committing.
  • Add unit tests; if no unit tests are added, please explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 17, 2026

Thanks for your contribution!


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-17 17:29 CST

## 📋 Review Summary

PR overview: optimizes the scheduling policy for chunk prefill. The block-allocation check for chunk prefill requests in the running queue now uses a more conservative threshold (`_get_can_schedule_prefill_threshold_block`), and the preemption logic for the chunk prefill path is removed: when resources are insufficient, the request is skipped directly and scheduling from the WAITING queue is blocked.
Changed files: engine/sched/resource_manager_v1.py
Impact tag: Scheduler

### 📝 PR Convention Check

The PR title is missing a required tag, and the Motivation / Modifications sections are empty; please fill them in.

Suggested title (copy-paste ready):
- [Scheduler][Optimization] Optimize scheduler for chunk prefill

Description template (copy-paste ready):

## Motivation
Optimize the scheduling policy for chunk prefill: use a more conservative block-threshold check and remove chunk-prefill preemption, avoiding unnecessary preemption overhead and improving overall scheduling efficiency.

## Modifications
1. Add a `chunk_prefill_in_running_not_satisfied` flag marking the state where a chunk prefill request in the running queue cannot be satisfied.
2. Change the block-allocation check for chunk prefill from `num_new_block` to the conservative threshold returned by `_get_can_schedule_prefill_threshold_block`.
3. Remove the `_trigger_preempt` call in the chunk-prefill path; break directly when resources are insufficient.
4. Block scheduling from the WAITING queue while chunk prefill is unsatisfied.

### Issues

| Level | File | Summary |
|------|------|------|
| ❓ Question | resource_manager_v1.py:929 | `break` skips all subsequent requests in the running queue (including decode); confirm whether decode latency is affected |
| 🟡 Suggestion | resource_manager_v1.py:920 | the threshold used for the check differs from the amount actually allocated; add a comment explaining the design intent |

### Overall Assessment

The direction of the optimization is sound: it avoids unnecessary preemption overhead in the chunk-prefill path. However, with preemption removed, decode requests queued behind a chunk prefill in the running queue are also skipped, which may increase decode latency under high load; the author should confirm the impact of this behavioral change in real workloads.
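The modifications summarized above can be sketched as a toy scheduling loop (a minimal illustration with hypothetical simplified names such as `threshold_blocks` and `schedule_running`; this is not the actual FastDeploy code):

```python
# Toy model of the described change: a chunk-prefill request in the running
# queue is admitted only if a conservative block threshold is satisfied;
# otherwise the loop breaks, skipping all requests queued behind it.

RESERVED_OUTPUT_BLOCKS_PER_RUNNING = 1  # assumed reservation policy


def threshold_blocks(num_new_block, num_running):
    # Conservative admission threshold: the blocks this chunk actually needs,
    # plus output blocks reserved for every running request (assumption).
    return num_new_block + num_running * RESERVED_OUTPUT_BLOCKS_PER_RUNNING


def schedule_running(running, free_blocks):
    """running: list of (name, kind, num_new_block) tuples.

    Returns (scheduled names, chunk_prefill_in_running_not_satisfied)."""
    scheduled = []
    not_satisfied = False
    for name, kind, need in running:
        if kind == "chunk_prefill":
            if free_blocks >= threshold_blocks(need, len(running)):
                free_blocks -= need  # allocate only the actual need
                scheduled.append(name)
            else:
                # No preemption any more: stop scheduling this round and
                # signal that the WAITING queue must be blocked too.
                not_satisfied = True
                break
        else:  # decode requests need no new prefill blocks here
            scheduled.append(name)
    return scheduled, not_satisfied
```

With ample free blocks, both the prefill and the decode request behind it are scheduled; with too few, the loop stops at the prefill and the flag is raised.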

```python
    # Prepare prefill task
    scheduled_reqs.append(self._prepare_prefill_task(request, num_new_tokens))
else:  # Not enough blocks to allocate
    chunk_prefill_in_running_not_satisfied = True
```


❓ Question: `break` skips scheduling of all subsequent requests in the running queue (including decode requests)

The old code, when a chunk prefill lacked resources, first tried `_trigger_preempt`; after a successful preemption it could continue scheduling subsequent requests, and only broke out when preemption also failed.

Breaking directly now means that if a chunk prefill request in the running queue fails the `can_schedule_block_num_threshold` check (which is more conservative than the actually required `num_new_block`), the decode requests queued behind it are also skipped for a scheduling round.

Under high concurrency with tight resources, could this increase decode latency? Is there online benchmark data validating the impact of this behavioral change?
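The behavioral difference raised in this question can be modeled in a few lines (a hypothetical simplification, not the real scheduler; `allow_preempt` stands in for the removed `_trigger_preempt` path):

```python
# Toy contrast of old vs. new behavior when a chunk prefill does not fit.
# Each request is (name, kind); prefill_fits says whether free blocks
# initially cover the chunk prefill.


def schedule(running, prefill_fits, preempt_succeeds, allow_preempt):
    scheduled = []
    for name, kind in running:
        if kind == "chunk_prefill" and not prefill_fits:
            if allow_preempt and preempt_succeeds:
                prefill_fits = True  # old path: preemption freed blocks
                scheduled.append(name)
                continue
            # New path (or failed preemption): stop the whole loop,
            # skipping every request queued behind this prefill.
            break
        scheduled.append(name)
    return scheduled
```

With the old preempt-and-continue path the decode requests behind the prefill still get scheduled; with preemption disabled, one unsatisfied prefill empties the whole round.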

```python
    req_index += 1
    continue
num_new_block = self.get_new_block_nums(request, num_new_tokens)
can_schedule_block_num_threshold = self._get_can_schedule_prefill_threshold_block(num_new_block)
```


🟡 Suggestion: add a comment explaining the relationship between `can_schedule_block_num_threshold` and `num_new_block`

The check here uses the threshold returned by `_get_can_schedule_prefill_threshold_block` (which may be larger than `num_new_block`, since it additionally reserves output-block space for running requests), but the actual allocation on line 924 uses `num_new_block`.

This design of checking with a conservative threshold while allocating the actual amount is correct (it guarantees reserved space for decode), but the logic is not obvious; a comment explaining the intent would help future maintenance. For example:

```python
# Use a conservative threshold that reserves output blocks for running decode requests,
# but only allocate the actual needed blocks for this chunk prefill.
can_schedule_block_num_threshold = self._get_can_schedule_prefill_threshold_block(num_new_block)
```
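The check/allocate asymmetry described above can be shown numerically (hypothetical numbers and a stand-in `can_schedule_threshold` helper; the real reservation formula lives in `_get_can_schedule_prefill_threshold_block`):

```python
# Numeric illustration of "check with a conservative threshold,
# allocate the actual amount" (assumed reservation of 2 blocks).


def can_schedule_threshold(num_new_block, reserved_for_decode):
    # Admission threshold: actual need plus headroom reserved so that
    # running decode requests can still grow their output.
    return num_new_block + reserved_for_decode


num_new_block = 3
threshold = can_schedule_threshold(num_new_block, reserved_for_decode=2)  # 5

free_blocks = 4
# The admission check uses the conservative threshold (4 >= 5 is False),
# so this chunk prefill is not scheduled even though 4 >= 3 blocks exist.
admitted = free_blocks >= threshold
# Had it been admitted, only num_new_block (3) would actually be
# allocated, leaving the reserved headroom untouched.
```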

@codecov-commenter

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@185708b). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/engine/sched/resource_manager_v1.py | 66.66% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7466   +/-   ##
==============================================
  Coverage               ?   73.71%           
==============================================
  Files                  ?      376           
  Lines                  ?    52987           
  Branches               ?     8275           
==============================================
  Hits                   ?    39057           
  Misses                 ?    11189           
  Partials               ?     2741           
| Flag | Coverage Δ |
|---|---|
| GPU | 73.71% <66.66%> (?) |

Flags with carried forward coverage won't be shown.


Labels: None yet
Projects: None yet
4 participants