🚀 Fast-SAM3D: 3Dfy Anything in Images but Faster

Weilun Feng^*, Mingqiang Wu^*, Zhiliang Chen, Chuanguang Yang^✉, Haotong Qin, Yuqi Li, Xiaokun Liu, Guoxin Fan, Zhulin An^✉, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu

^*Equal Contribution ^✉Corresponding Author

Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, China University of Mining and Technology, ETH Zürich, Shanghai Jiao Tong University

Fast-SAM3D accelerates SAM3D by up to 2.67× while maintaining geometric fidelity and semantic consistency.

[2026.05.23] Fast-TRELLIS transfers the inference-time acceleration design of Fast-SAM3D to TRELLIS for efficient structured 3D generation.

💡 TL;DR

Fast-SAM3D is a training-free acceleration framework for single-view 3D reconstruction that delivers up to 2.67× speedup with negligible quality loss. Our approach dynamically aligns computation with instantaneous generation complexity through three heterogeneity-aware mechanisms.

📰 News

[2026.05.23] 🎉🎉🎉 Fast-TRELLIS, the implementation of Fast-SAM3D in TRELLIS, is released. You can find it on the Fast-TRELLIS branch.
[2026.05.22] A Gradio-based user demo is now available.
[2026.05.01] 🎉 Accepted by ICML 2026.
[2026.03.25] Fixed the bug where mesh merging was not enabled properly.
[2026.03.11] Optimized the code and fixed some known bugs.
[2026.02.05] 🎉 Paper and code released! Check out our paper.

🌟 Highlights

🚀 Training-Free Acceleration: Achieves up to 2.67× speedup for single-object generation and 2.01× for scene generation without any model retraining.
🎯 Heterogeneity-Aware Design: Addresses multi-level heterogeneity in 3D generation pipelines: kinematic distinctiveness, intrinsic sparsity, and spectral variance.
🔧 Plug-and-Play Modules: Three seamless integration modules:
- Modality-Aware Step Caching: Decouples shape evolution from sensitive layout updates
- Joint Spatiotemporal Token Carving: Concentrates refinement on high-entropy regions
- Spectral-Aware Token Aggregation: Adapts decoding resolution to geometric complexity
✨ Quality Preservation: Maintains or even exceeds the original model's geometric fidelity (F-Score: 92.59 vs. 92.34).

🔍 Method Overview

Overview of Fast-SAM3D. Our approach integrates three heterogeneity-aware modules: (1) Modality-Aware Step Caching for decoupling structural evolution from layout updates; (2) Joint Spatiotemporal Token Carving for eliminating redundancy; (3) Spectral-Aware Token Aggregation for adaptive decoding resolution.

Stage 1: Modality-Aware Step Caching

The Sparse Structure Generator exhibits modality heterogeneity: shape tokens evolve smoothly while layout tokens are volatile. We propose:

- Linear Extrapolation for shape tokens using finite-difference prediction
- Momentum-Anchored Smoothing for layout tokens to suppress high-frequency jitter
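
The two ideas above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the first-order formula, and the exponential-moving-average form of the smoothing are assumptions chosen to match the descriptions "finite-difference prediction" and "momentum-anchored". Whether `beta` corresponds to `--ss_momentum_beta` is likewise an assumption.

```python
from typing import List

Vector = List[float]


def extrapolate_shape(prev: Vector, curr: Vector, steps_ahead: int = 1) -> Vector:
    """Predict a future shape feature from the last two computed ones.

    Hypothetical first-order finite difference:
    x[t+k] ~= x[t] + k * (x[t] - x[t-1]).
    """
    return [c + steps_ahead * (c - p) for p, c in zip(prev, curr)]


def smooth_layout(anchor: Vector, update: Vector, beta: float = 0.5) -> Vector:
    """Blend a new layout estimate with a running anchor.

    Hypothetical exponential moving average: a larger beta keeps the
    result closer to the anchor and damps step-to-step jitter.
    """
    return [beta * a + (1.0 - beta) * u for a, u in zip(anchor, update)]


# Shape features that moved from [1, 2] to [2, 4] are assumed to keep moving linearly.
shape_pred = extrapolate_shape([1.0, 2.0], [2.0, 4.0])
# Layout values are pulled halfway back toward the anchor when beta = 0.5.
layout_pred = smooth_layout([0.0, 10.0], [4.0, 0.0], beta=0.5)
```

The split reflects the stated heterogeneity: a smooth trajectory tolerates extrapolation across skipped steps, while a volatile one is safer to damp than to extrapolate.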

Stage 2: Joint Spatiotemporal Token Carving

The SLaT Generator shows intrinsic refinement sparsity: updates concentrate on high-entropy regions. We design:

- Unified Saliency Potential combining temporal dynamics (magnitude & abruptness) and spatial frequency
- Dynamic Adaptive Step Caching with curvature-aware trajectory approximation
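
As a rough illustration of the saliency idea only (the step-caching part is not covered), the sketch below scores scalar tokens on a 1D grid and keeps the highest-scoring ones for refinement. Everything here is an assumption made for the example: the three terms, their equal weights, the neighbor-based stand-in for spatial frequency, and the `keep_ratio` parameter, which should not be read as the meaning of `--slat_carving_ratio`.

```python
from typing import List


def saliency(prev2: List[float], prev1: List[float], curr: List[float]) -> List[float]:
    """Hypothetical per-token saliency from three cues.

    magnitude:  |x_t - x_{t-1}|             (how much the token changed)
    abruptness: |x_t - 2 x_{t-1} + x_{t-2}| (how suddenly the change happened)
    spatial:    |x_t - mean of neighbors|   (a crude local high-frequency proxy)
    """
    n = len(curr)
    scores = []
    for i in range(n):
        magnitude = abs(curr[i] - prev1[i])
        abruptness = abs(curr[i] - 2.0 * prev1[i] + prev2[i])
        left = curr[i - 1] if i > 0 else curr[i]
        right = curr[i + 1] if i < n - 1 else curr[i]
        spatial = abs(curr[i] - 0.5 * (left + right))
        scores.append(magnitude + abruptness + spatial)
    return scores


def select_active(scores: List[float], keep_ratio: float) -> List[int]:
    """Return the indices of the top-scoring tokens, at least one, in index order."""
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Only token 2 changed between steps, so it dominates the saliency map.
scores = saliency([0.0] * 4, [0.0] * 4, [0.0, 0.0, 5.0, 0.0])
active = select_active(scores, keep_ratio=0.25)
```

Tokens outside `active` would reuse earlier results in a carving scheme of this kind, which is where the savings come from.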

Stage 3: Spectral-Aware Token Aggregation

The Mesh Decoder processes dense token sequences. We introduce:

- Spectral Complexity Analysis using High-Frequency Energy Ratio (HFER)
- Instance-Adaptive Aggregation with aggressive compression for simple shapes and detail preservation for complex geometries
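
A minimal sketch of the idea, assuming HFER means the share of non-DC spectral energy that lies above a cutoff frequency. The 1D signal, the cutoff, the three aggregation levels, and the way the two thresholds are used are illustrative assumptions; the values 0.5 and 0.7 are borrowed from the example command's `--mesh_spectral_threshold_low` and `--mesh_spectral_threshold_high`, without claiming the repository interprets them this way.

```python
import cmath
import math
from typing import List


def hfer(signal: List[float], cutoff_fraction: float = 0.5) -> float:
    """Hypothetical High-Frequency Energy Ratio of a 1D signal.

    Uses a naive DFT over the positive, non-DC frequencies and returns the
    fraction of their energy located in the upper part of the spectrum.
    """
    n = len(signal)
    energies = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        energies.append(abs(coeff) ** 2)
    total = sum(energies)
    if total < 1e-12:  # flat signal: no detail to preserve
        return 0.0
    cutoff = int(len(energies) * cutoff_fraction)
    return sum(energies[cutoff:]) / total


def aggregation_level(ratio: float, low: float = 0.5, high: float = 0.7) -> str:
    """Map spectral complexity to a (hypothetical) token aggregation level."""
    if ratio < low:
        return "aggressive"  # simple shape: merge many tokens
    if ratio < high:
        return "moderate"
    return "none"  # complex geometry: keep full resolution


smooth = [math.cos(2.0 * math.pi * t / 8) for t in range(8)]  # one slow oscillation
jagged = [1.0, -1.0] * 4                                      # fastest possible oscillation
levels = (aggregation_level(hfer(smooth)), aggregation_level(hfer(jagged)))
```

The smooth signal lands in the aggressive-compression bucket and the jagged one keeps full resolution, mirroring the simple-versus-complex distinction described above.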

🛠️ Installation

Requirements

Python >= 3.9
PyTorch >= 2.0
CUDA >= 11.8
SAM3D dependencies

Set Up the Fast-SAM3D Environment

If you already have the official SAM3D environment, you can reuse it directly. Otherwise, follow the official environment configuration for Fast-SAM3D below.

# create fastsam3d environment
mamba env create -f environments/default.yml
mamba activate fastsam3d

# for pytorch/cuda dependencies
export PIP_EXTRA_INDEX_URL="https://pypi.ngc.nvidia.com https://download.pytorch.org/whl/cu121"

# install fastsam3d and core dependencies
pip install -e '.[dev]'
pip install -e '.[p3d]' # pytorch3d dependency on pytorch is broken, this 2-step approach solves it

# for inference
export PIP_FIND_LINKS="https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.5.1_cu121.html"
pip install -e '.[inference]'

# patch things that aren't yet in official pip packages
./patching/hydra # https://github.com/facebookresearch/hydra/pull/2863

If you run into difficulties during installation, refer to the more detailed documentation in /doc/Setup.md.

Getting Checkpoints

From HuggingFace

⚠️ Before using Fast-SAM3D, please request access to the checkpoints on the SAM 3D Objects Hugging Face repo. Once your request is accepted, you must be authenticated to download the checkpoints (e.g. run hf auth login after generating an access token). Then run the steps below.

⚠️ SAM 3D Objects is available via Hugging Face globally, except in comprehensively sanctioned jurisdictions. Requests from sanctioned jurisdictions will be rejected.

pip install 'huggingface-hub[cli]<1.0'
TAG=hf
hf download \
  --repo-type model \
  --local-dir checkpoints/${TAG}-download \
  --max-workers 1 \
  facebook/sam-3d-objects
mv checkpoints/${TAG}-download/checkpoints checkpoints/${TAG}
rm -rf checkpoints/${TAG}-download

MoGe

MoGe model download link: https://huggingface.co/Ruicheng/moge-2-vitl-normal/tree/main

After downloading, set the weight path of depth_model in pipeline.yaml to the location of the downloaded MoGe weights.

🚀 Usage

Quick Start

cd Fast-SAM3D
# Object Generation
bash infer.sh
# Scene Generation
bash infer_scene.sh

Acceleration Options

# Customize acceleration strength
python infer.py \
    --image_path examples/image.png \
    --mask_index 1 \
    --output_dir /data/wmq/Fast-SAM3D/Look \
    --ss_cache_stride 3 \
    --ss_warmup 2 \
    --ss_order 1 \
    --ss_momentum_beta 0.5 \
    --slat_thresh 1.5 \
    --slat_warmup 3 \
    --slat_carving_ratio 0.1 \
    --mesh_spectral_threshold_low 0.5 \
    --mesh_spectral_threshold_high 0.7 \
    --enable_acceleration
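
When sweeping acceleration settings, it can be convenient to assemble this command from Python. The helper below is a hypothetical convenience wrapper, not part of the repository. The flag names and default values are copied from the example above; the output directory `outputs/look` is a placeholder, and the assumption that omitting `--enable_acceleration` runs the unaccelerated pipeline is not confirmed by this README.

```python
import shlex
from typing import Dict, List, Optional

# Flag names and values copied from the example command above.
DEFAULT_FLAGS: Dict[str, float] = {
    "ss_cache_stride": 3,
    "ss_warmup": 2,
    "ss_order": 1,
    "ss_momentum_beta": 0.5,
    "slat_thresh": 1.5,
    "slat_warmup": 3,
    "slat_carving_ratio": 0.1,
    "mesh_spectral_threshold_low": 0.5,
    "mesh_spectral_threshold_high": 0.7,
}


def build_infer_command(
    image_path: str,
    mask_index: int,
    output_dir: str,
    overrides: Optional[Dict[str, float]] = None,
    accelerate: bool = True,
) -> List[str]:
    """Build an argument list for infer.py, rejecting flags not shown in the README."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_FLAGS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    flags = {**DEFAULT_FLAGS, **overrides}
    cmd = [
        "python", "infer.py",
        "--image_path", image_path,
        "--mask_index", str(mask_index),
        "--output_dir", output_dir,
    ]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    if accelerate:
        cmd.append("--enable_acceleration")
    return cmd


cmd = build_infer_command("examples/image.png", 1, "outputs/look", {"ss_cache_stride": 4})
print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute it
```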

Scene Generation

python infer_scene.py \
    --image_dir examples_dir \
    --output_dir /data/wmq/Fast-SAM3D/Look-scene \
    --ss_cache_stride 3 \
    --ss_warmup 2 \
    --ss_order 1 \
    --ss_momentum_beta 0.5 \
    --slat_thresh 1.5 \
    --slat_warmup 3 \
    --slat_carving_ratio 0.1 \
    --mesh_spectral_threshold_low 0.5 \
    --mesh_spectral_threshold_high 0.7 \
    --enable_acceleration

Image Directory

At least one RGB image and one mask are required:

├── example/
│   ├── image.png   # RGB_image
│   ├── 0.png       # RGB_mask_1
│   └── 1.png       # RGB_mask_2
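
Before launching a long scene run, the layout can be checked with a few lines of Python. This is a sketch under assumptions inferred from the tree above: the image is named `image.png` and masks are PNG files whose names are integers. The function `find_inputs` is hypothetical and does not exist in the repository.

```python
from pathlib import Path
from typing import List, Tuple


def find_inputs(image_dir: str) -> Tuple[Path, List[Path]]:
    """Return the RGB image and the mask files, sorted by mask index."""
    root = Path(image_dir)
    image = root / "image.png"
    if not image.is_file():
        raise FileNotFoundError(f"missing RGB image: {image}")
    # Masks are assumed to be named 0.png, 1.png, ... as in the example tree.
    masks = sorted(
        (p for p in root.glob("*.png") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )
    if not masks:
        raise FileNotFoundError(f"no mask files found in {root}")
    return image, masks
```

Sorting by integer value rather than by name keeps `10.png` after `2.png`, so mask indices line up with list positions.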

🍔 User demo

A user demo, implemented with Gradio, is now available on GitHub. You can launch it locally by running:

python gradio_demo.py

📊 Results

Quantitative Comparison

Method	Visual ↑	CD ↓	F1@0.05 ↑	vIoU ↑	3D-IoU ↑	Scene Time ↓	Speed ↑
SAM3D	0.369	0.022	92.34	0.543	0.403	462.3s	1.00×
Random Drop	0.264	0.030	83.52	0.327	0.094	402.2s	1.15×
Uniform Merge	0.329	0.023	91.48	0.540	0.367	366.8s	1.26×
Fast3Dcache	0.348	0.022	91.31	0.505	0.051	443.3s	1.04×
TaylorSeer	0.344	0.028	90.95	0.504	0.374	265.6s	1.74×
EasyCache	0.342	0.028	87.06	0.432	0.186	244.9s	1.89×
Fast-SAM3D	0.350	0.022	92.59	0.552	0.375	229.7s	2.01×
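
The Speed column is consistent with dividing the SAM3D baseline scene time by each method's scene time. The short check below reproduces it from the times in the table; the variable and function names are illustrative.

```python
# Scene times in seconds, copied from the table above.
scene_time_s = {
    "SAM3D": 462.3,
    "Random Drop": 402.2,
    "Uniform Merge": 366.8,
    "Fast3Dcache": 443.3,
    "TaylorSeer": 265.6,
    "EasyCache": 244.9,
    "Fast-SAM3D": 229.7,
}


def speedup(baseline_s: float, method_s: float) -> float:
    """Speedup relative to the baseline: values above 1 mean faster."""
    return baseline_s / method_s


baseline = scene_time_s["SAM3D"]
speedups = {name: round(speedup(baseline, t), 2) for name, t in scene_time_s.items()}

for name, value in speedups.items():
    print(f"{name:<14} {value:.2f}x")
```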

Speedup Analysis

Qualitative Comparison

Fast-SAM3D produces results perceptually indistinguishable from SAM3D while generic strategies suffer from structural collapse (Random Drop) or semantic drift (TaylorSeer).

📄 Citation

If you find this work helpful, please consider citing:

@misc{feng2026fastsam3d3dfyimagesfaster,
      title={Fast-SAM3D: 3Dfy Anything in Images but Faster}, 
      author={Weilun Feng and Mingqiang Wu and Zhiliang Chen and Chuanguang Yang and Haotong Qin and Yuqi Li and Xiaokun Liu and Guoxin Fan and Zhulin An and Libo Huang and Yulun Zhang and Michele Magno and Yongjun Xu},
      year={2026},
      eprint={2602.05293},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.05293}, 
}

🙏 Acknowledgements

This project is built upon the excellent SAM3D framework. We thank the authors for their outstanding work in open-world 3D reconstruction.

📜 License

This project is released under the MIT License.

📧 Contact

For questions or suggestions, please open an issue or contact:

Weilun Feng: fengweilun24s@ict.ac.cn
Mingqiang Wu: wumingqiang25e@ict.ac.cn
Chuanguang Yang: yangchuanguang@ict.ac.cn
Zhulin An: anzhulin@ict.ac.cn

⭐ Star us on GitHub if you find this project helpful!
