[Feature]【Hackathon 10th Spring No.50】MiniCPM4.1-8B model reproduction #7332
Draft
r-cloudforge wants to merge 1 commit into PaddlePaddle:develop from
Conversation
Thanks for your contribution!

Author:
Thanks for the AI review. The title has been corrected. The review raised no blocking issues, so no further adjustments were made.
Force-pushed from 58c1662 to 7baf689
fastdeploy-bot left a comment:
📋 Review Summary
PR overview: adds MiniCPM4.1-8B model support to FastDeploy, implementing μP (Maximal Update Parametrization) scaling, GQA, LongRoPE, and related features.
Scope of changes: fastdeploy/model_executor/models/minicpm4.py (new model file), tests/model_executor/test_minicpm4.py (new test file), documentation updates.
Impact tag: [Models]
Issues
No blocking issues found.
Overall assessment
The PR is implemented correctly overall, and code quality is good:
- The model architecture is correct, following the Qwen2 model pattern with MiniCPM4-specific μP scaling added
- Existing layers components are reused, with no duplicated implementations
- Registration is automatic via the @ModelRegistry.register_model_class decorator, in line with project conventions
- The 24 unit tests give thorough coverage, using stubs to mock dependencies so the tests run safely on CPU
- Config fields are read with getattr plus defaults, so compatibility is good
- Documentation is complete, including a deployment guide and performance-tuning advice
It is recommended to add an end-to-end test (tests/e2e/test_MiniCPM4_serving.py) to validate the actual inference flow.
Motivation
This PR adds support for deploying the high-performance openbmb/MiniCPM4.1-8B model family in FastDeploy, as required by Hackathon 10th Spring No.50.
MiniCPM4.1-8B is a dense 8B-parameter model from OpenBMB with the following key features:
- GQA with num_key_value_heads=2
- HF architecture name MiniCPMForCausalLM

Modifications
Model Code (fastdeploy/model_executor/models/minicpm4.py)
New model file (516 lines) implementing:
- MiniCPM4MLP: gate/up merged projection with SiLU activation, no bias
- MiniCPM4Attention: GQA with QKVParallelLinear (with_bias=False), neox-style RoPE
- MiniCPM4DecoderLayer: μP residual scaling (scale_depth / √num_hidden_layers)
- MiniCPM4Model: μP embedding scaling (scale_emb), graph optimization support
- MiniCPM4ForCausalLM: μP lm_head scaling, weight mapping (HF model. → FD minicpm4.), registered as MiniCPMForCausalLM
- MiniCPM4PretrainedModel: tensor-parallel mappings (no bias splits)

Unit Tests (tests/model_executor/test_minicpm4.py)
New test file (514 lines, 24 functional tests) with full FastDeploy integration:
- Real instantiation of all model classes (MiniCPM4MLP, MiniCPM4Attention, MiniCPM4DecoderLayer, MiniCPM4Model, MiniCPM4ForCausalLM)
- monkeypatch.setattr stubs for heavy infrastructure (attention backend, parallel linear, embedding, RMSNorm, RoPE, graph opt)
- Coverage of load_state_dict and compute_logits

Documentation
- docs/best_practices/MiniCPM4-8B.md: usage guide with hardware requirements, deployment examples, and performance tuning
- docs/supported_models.md: added a MiniCPM4 entry to the LLM model table

Engineering Highlights
μP 3-Site Scaling — correct implementation of Maximal Update Parametrization at three distinct points, each with a different mathematical operation:
- × scale_emb at the embedding output (a ×12 amplification)
- × scale_depth / √num_hidden_layers, applied independently to both the attention and MLP outputs of every layer, before the residual add
- ÷ (hidden_size / dim_model_base), normalizing the hidden states (÷16) before logit computation

Ordering is critical: residual scaling must happen after each sub-layer output but before the residual addition.
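As a concrete illustration, here is a minimal NumPy sketch of the three scaling sites, including the vocabulary mask applied in compute_logits. All hyperparameter values and the dummy sub-layer are hypothetical stand-ins chosen only to reproduce the factors quoted above, not values from a real MiniCPM4 config:

```python
import math
import numpy as np

# Hypothetical μP hyperparameters (stand-ins, sized so hidden_size / dim_model_base = 16).
scale_emb = 12.0              # site 1 multiplier
scale_depth = 1.4             # numerator of the per-layer residual scale
num_hidden_layers = 32
hidden_size = 64              # kept tiny for the sketch
dim_model_base = 4

residual_scale = scale_depth / math.sqrt(num_hidden_layers)
logit_divisor = hidden_size / dim_model_base   # the ÷16 at site 3

rng = np.random.default_rng(0)
embed = rng.standard_normal((100, hidden_size)).astype(np.float32)  # toy embedding table

def sublayer(h):
    # Stand-in for an attention or MLP block; any map of h works for the sketch.
    return 0.1 * h

token_ids = np.array([3, 7, 42])

# Site 1: the embedding lookup is amplified by scale_emb.
h = embed[token_ids] * scale_emb

# Site 2: each sub-layer output is scaled BEFORE the residual addition.
for _ in range(num_hidden_layers):
    h = h + sublayer(h) * residual_scale   # attention branch
    h = h + sublayer(h) * residual_scale   # MLP branch

# Site 3: hidden states are divided by hidden_size / dim_model_base
# before the lm_head matmul (tied embeddings used as lm_head here).
logits = (h / logit_divisor) @ embed.T

# Vocab masking: when vocab_size was padded during training, logits past the
# original boundary are forced to -inf so padding tokens are never sampled.
ori_vocab_size = 90  # hypothetical original vocabulary size
logits[:, ori_vocab_size:] = -np.inf
```

Note that moving the residual scaling after the addition (i.e. scaling h itself) would shrink the residual stream, not just the sub-layer contribution, which is why the ordering above matters.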
Vocab Masking: logits[:, ori_vocab_size:] = -inf prevents generation of padding tokens at inference time, preserving the original vocabulary boundary when vocab_size was padded during training.
tie_word_embeddings: transposes the embedding weight into lm_head with dtype consistency, matching the MiniCPMForCausalLM HF default.

Design Decisions
- Registered via the @ModelRegistry.register_model_class decorator — no manual imports needed
- μP hyperparameters (scale_emb, scale_depth, dim_model_base) are read from the HF config.json via ModelConfig auto-setattr

Usage or Command
See docs/best_practices/MiniCPM4-8B.md for the full deployment guide.
Accuracy Tests
Unit Tests (24/24 passed on A800 GPU)
- tests/model_executor/test_minicpm4.py (514 lines, 24 functional tests)
- Stubbing via monkeypatch.setattr only — real FastDeploy class instantiation, no MagicMock
- from fastdeploy.model_executor.models.minicpm4 import ... exercises all 6 model classes

Test categories: load_state_dict weight mapping for each component, compute_logits, μP scaling (÷16), vocab masking, lm_head fallback, set_state_dict, model name, tie_word_embeddings, and MiniCPMForCausalLM registration.

AI Studio A800 GPU Validation (SM80)
Tested on Baidu AI Studio with an NVIDIA A800-SXM4-80GB (SM80 Ampere, 81920 MiB), PaddlePaddle 3.3.0, Python 3.10.12: 24/24 tests passed in 2.16s.
Full GPU Inference Validation
To make it easy to verify the model's inference capability, here is a directly runnable GPU validation procedure (single GPU, ≥24 GB of VRAM):
Step 1 — Launch the API server:
```shell
python -m fastdeploy.entrypoints.openai.api_server \
    --model openbmb/MiniCPM4.1-8B \
    --tensor-parallel-size 1 \
    --max-model-len 4096 \
    --max-num-seqs 16
```

Step 2 — Send an inference request:
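The request itself is not included in the PR description here; a minimal OpenAI-style chat request along these lines should work once the server from Step 1 is up (the host, port, and endpoint path below are assumptions — adjust them to match your api_server configuration):

```python
import json
from urllib import request

# Hypothetical endpoint — change host/port to match your server flags.
URL = "http://127.0.0.1:8000/v1/chat/completions"

# Standard OpenAI-style chat-completion payload.
payload = {
    "model": "openbmb/MiniCPM4.1-8B",
    "messages": [{"role": "user", "content": "用一句话介绍一下北京。"}],
    "max_tokens": 128,
    "temperature": 0.7,
}

req = request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```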
Expected behavior: the server starts normally; once the model finishes loading, it responds to inference requests and generates coherent Chinese text with no errors.
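For reference, a successful reply from the OpenAI-compatible endpoint has roughly the standard chat-completion shape below (all field values are illustrative, not captured from a real run):

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "openbmb/MiniCPM4.1-8B",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "北京是中国的首都……"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46}
}
```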
WINT4 Quantization Validation (reduces the VRAM requirement to ~8 GB):
```shell
python -m fastdeploy.entrypoints.openai.api_server \
    --model openbmb/MiniCPM4.1-8B \
    --tensor-parallel-size 1 \
    --quantization wint4 \
    --max-model-len 4096 \
    --max-num-seqs 16
```

Checklist
- @ModelRegistry.register_model_class decorator