[Cherry-Pick][Speculate Decoding][Engine] Cherry-pick #7166, #7349, #7402, #7445 to release/online/20260415 #7447
Merged: freeliuzc merged 4 commits into PaddlePaddle:release/online/20260415 on Apr 16, 2026.
…_stop_value kernels (PaddlePaddle#7166)
- speculate_limit_thinking_content_length: update current_base_step to step_idx+1 (step_idx now records the history count before the current round); remove the incorrect step_idx decrement on accept_num truncation; mark the step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix the can_stop gate to use step_idx_now+accept_num>=min_token_limit; fix the skip check and the pre_ids_idx formula (remove the stale -accept_num offset); use a <= condition so that accept_idx maps directly to the accepted token that ends the stop sequence; fix the accept_tokens index (remove -1).
- Update unit tests for the speculate_set_stop_value_multi_seqs kernel.
…el (PaddlePaddle#7349) Co-authored-by: guanshihui <guanshihui@baidu.com>
…n SpeculativeSampler (PaddlePaddle#7402)
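The step_idx convention described in the #7166 commit message (step_idx counts the tokens generated before the current round, and this round's accepted tokens come right after it) can be illustrated with a pure-Python reference sketch of the two corrected checks. This is not the CUDA kernel code. The function names, the flat list layout (pre_ids[i] is the i-th previously generated token, accept_tokens[0:accept_num] holds this round's tokens), and the thinking-budget parameters max_think_len and think_end_id are hypothetical. The thinking check also assumes that the whole output so far is still inside the thinking span.

```python
from typing import List, Optional, Tuple


def find_stop_in_accepted(
    pre_ids: List[int],        # assumed: pre_ids[i] is the i-th token generated in earlier rounds
    step_idx_now: int,         # number of tokens generated BEFORE this round
    accept_tokens: List[int],  # tokens accepted in this round
    accept_num: int,           # number of valid entries in accept_tokens
    stop_seq: List[int],
    min_token_limit: int,
) -> Optional[int]:
    """Return the index in accept_tokens of the token that completes stop_seq, or None."""
    # can_stop gate: include the tokens accepted in this round, not only the history.
    if step_idx_now + accept_num < min_token_limit:
        return None
    seq_len = len(stop_seq)
    if seq_len == 0:
        return None
    for accept_idx in range(accept_num):
        # Absolute 0-based position of accept_tokens[accept_idx] in the whole output.
        end_pos = step_idx_now + accept_idx
        start_pos = end_pos - seq_len + 1
        if start_pos < 0:
            continue  # not enough tokens yet to hold the whole stop sequence
        matched = True
        for j in range(seq_len):
            pos = start_pos + j
            if pos >= step_idx_now:
                tok = accept_tokens[pos - step_idx_now]  # this round's token, no -1 offset
            else:
                tok = pre_ids[pos]  # earlier token, indexed without any -accept_num offset
            if tok != stop_seq[j]:
                matched = False
                break
        if matched:
            return accept_idx  # maps directly to the accepted token ending the sequence
    return None


def limit_thinking_sketch(
    step_idx: int,              # tokens generated before this round; read-only, never decremented
    accept_tokens: List[int],
    accept_num: int,
    max_think_len: int,         # hypothetical: max thinking tokens, including the closing token
    think_end_id: int,          # hypothetical id of the end-of-thinking token
) -> Tuple[List[int], int]:
    """Force the end-of-thinking token once the budget is hit; return (tokens, new accept_num)."""
    current_base_step = step_idx + 1  # 1-based position of the first token accepted this round
    tokens = list(accept_tokens[:accept_num])
    for i in range(accept_num):
        if tokens[i] == think_end_id:
            break  # the model closed the thinking span on its own
        if current_base_step + i >= max_think_len:
            tokens[i] = think_end_id  # replace this token with the end-of-thinking token
            return tokens[: i + 1], i + 1  # truncate accept_num; step_idx is untouched
    return tokens, accept_num
```

Both functions treat step_idx as a read-only input and never adjust it when accept_num is truncated, in the spirit of the "mark step_idx param as const" change; the exact boundary conventions in the real kernels may differ.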
freeliuzc approved these changes on Apr 16, 2026.
Merged commit 0ac91bf into PaddlePaddle:release/online/20260415.
10 of 14 checks passed.
Motivation
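Cherry-pick 4 PRs that have been merged into, or are pending merge into, the develop branch onto the release/online/20260415 branch. They include speculative decoding (Speculate Decoding) bug fixes and a control command for interrupting reasoning (Interrupt Reasoning).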
Modifications
This PR includes the changes from the following 4 PRs:
#7166 - [Speculative Decoding] fix mtp stop_seqs and limit thinking bugs
Fixes incorrect step_idx semantics in the limit_thinking and set_stop_value kernels.
#7349 - [Speculate Decoding] Fix step_idx semantics in reasoning_phase_token_constraint and speculate set_value kernels
Fixes a bug in the reasoning_phase_token_constraint kernel.
#7402 - [Speculate Decoding] Fix reasoning_phase_token_constraint call args in SpeculativeSampler
Fixes incorrect arguments in the call to reasoning_phase_token_constraint in SpeculativeSampler.
#7445 - [Interrupt reasoning] Add interrupt_requests control command support
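Adds an interrupt_requests command to the control command channel of the PD-disaggregated architecture, so that in-progress inference requests can be actively interrupted from outside.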
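As a rough illustration of what handling a control command such as interrupt_requests involves, the sketch below models a dispatcher that marks in-flight requests as interrupted. Everything here is hypothetical: the message shape ({"cmd": ..., "request_ids": [...]}), the class names, and the flag-based stopping are assumptions for illustration and do not describe FastDeploy's actual control channel or PD-disaggregation code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RequestState:
    request_id: str
    interrupted: bool = False  # hypothetical flag that a decode loop would poll


@dataclass
class ControlChannelSketch:
    """Hypothetical control-command dispatcher (not FastDeploy's API)."""
    running: Dict[str, RequestState] = field(default_factory=dict)

    def handle(self, command: dict) -> dict:
        name = command.get("cmd")
        if name == "interrupt_requests":
            return self._interrupt(command.get("request_ids", []))
        return {"ok": False, "error": f"unknown command: {name}"}

    def _interrupt(self, request_ids: List[str]) -> dict:
        interrupted, not_found = [], []
        for rid in request_ids:
            state = self.running.get(rid)
            if state is None:
                not_found.append(rid)  # already finished or never scheduled
            else:
                state.interrupted = True  # only sets the flag; stopping is up to the decode loop
                interrupted.append(rid)
        return {"ok": True, "interrupted": interrupted, "not_found": not_found}


# Usage: interrupt one running request and one unknown id.
channel = ControlChannelSketch({"a": RequestState("a"), "c": RequestState("c")})
print(channel.handle({"cmd": "interrupt_requests", "request_ids": ["a", "b"]}))
# prints {'ok': True, 'interrupted': ['a'], 'not_found': ['b']}
```

Setting a flag that the engine checks between decoding steps is one simple design choice: it avoids tearing down work in the middle of a step, at the cost of the request stopping only at the next step boundary.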
Usage or Command
No new usage; see the original PRs for details.
Accuracy Tests
These changes do not involve model forward computation or kernel precision changes, so no accuracy tests are needed.
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
… pre-commit before commit.
… release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.