
# Mini-Inference Engine

CUDA GEMM optimization tutorial and mini inference engine
From naive matrix multiplication to ~85% cuBLAS-class performance on the reference benchmark

English · 简体中文 · Online Docs · Quick Start

CI Docs License: MIT CUDA C++17


## What this repository contains

Mini-Inference Engine is a compact CUDA/C++17 project for learning high-performance GEMM optimization in a realistic inference-engine setting. It keeps the scope intentionally small: matrix multiplication kernels, runtime utilities, benchmarks, tests, and bilingual documentation all live in one traceable codebase.

Core value:

| Area | What to inspect |
| --- | --- |
| GEMM kernels | `src/naive_matmul.cu` through `src/vectorized_gemm.cu` show the optimization path. |
| Runtime components | `include/tensor.h`, `include/inference_engine.h`, `include/memory_pool.h`, and `include/stream_manager.h`. |
| Benchmarks | `benchmarks/benchmark.cpp`, `benchmarks/detailed_benchmark.cu`, and `benchmarks/mnist_demo.cpp`. |
| Specs | `openspec/specs/` defines requirements, architecture, API, data, and testing expectations. |
| Documentation | `docs/en/` and `docs/zh/` provide the tutorial, architecture, API, and tuning guides. |
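The runtime components listed above are standard inference-engine building blocks. As a rough illustration of the kind of facility a header like `include/memory_pool.h` provides, here is a minimal size-bucketed free-list pool; the class name and methods are hypothetical, not the project's actual API, and a real pool would manage CUDA device memory rather than host allocations:

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a size-bucketed memory pool: released blocks are
// cached per size and handed back on the next same-size request, avoiding
// repeated trips to the underlying allocator.
class MemoryPool {
public:
    void* allocate(std::size_t bytes) {
        auto& bucket = free_blocks_[bytes];
        if (!bucket.empty()) {             // reuse a cached block if one exists
            void* p = bucket.back();
            bucket.pop_back();
            return p;
        }
        return ::operator new(bytes);      // otherwise fall back to the system allocator
    }

    void release(void* p, std::size_t bytes) {
        free_blocks_[bytes].push_back(p);  // cache instead of freeing
    }

    ~MemoryPool() {
        for (auto& [size, bucket] : free_blocks_)
            for (void* p : bucket) ::operator delete(p);
    }

private:
    std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;
};
```

The payoff is that a release followed by a same-size allocate is an O(1) vector pop with no allocator round trip, which matters when inference runs allocate identically sized activations every step.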

The headline performance number is hardware-specific. The project uses a conservative reference claim: the best optimized kernel reaches about 85% of cuBLAS-class throughput on the documented RTX 3080 1024×1024 benchmark.
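The optimization path in those kernels centers on tiling: splitting the output matrix into blocks so each loaded tile of the inputs is reused many times from fast memory before being evicted. The CUDA kernels stage tiles in shared memory; the same idea can be sketched on the CPU as cache blocking (function and tile names here are illustrative, not taken from the repository):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// CPU cache-blocking analogue of shared-memory tiling in a GEMM kernel:
// C += A * B for row-major N x N matrices. Each TILE x TILE block of A and B
// is reused across a whole tile of C while it is still hot in cache.
constexpr std::size_t TILE = 32;  // plays the role of the CUDA thread-block tile

void tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, std::size_t N) {
    for (std::size_t i0 = 0; i0 < N; i0 += TILE)
        for (std::size_t k0 = 0; k0 < N; k0 += TILE)
            for (std::size_t j0 = 0; j0 < N; j0 += TILE)
                for (std::size_t i = i0; i < std::min(i0 + TILE, N); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + TILE, N); ++k) {
                        const float a = A[i * N + k];  // reused across the whole j tile
                        for (std::size_t j = j0; j < std::min(j0 + TILE, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

On the GPU the same loop structure appears with `__shared__` staging buffers and a `__syncthreads()` barrier between loading a tile and consuming it; the arithmetic is identical.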


## Quick start

Requirements: CUDA Toolkit 11.0+, CMake 3.18+, a C++17 compiler, and an NVIDIA GPU with compute capability 7.0+.

```bash
git clone https://github.com/LessUp/mini-inference-engine.git
cd mini-inference-engine

# Configure, build, and test the default (debug) preset
cmake --preset default
cmake --build --preset default
ctest --preset default --output-on-failure

# Build the release preset and run the benchmark
cmake --preset release
cmake --build --preset release
./build-release/benchmark
```

GPU tests skip when no CUDA device is available, but building still requires a CUDA toolkit because the library is compiled as a CUDA project.


## Documentation map

| Topic | English | 中文 |
| --- | --- | --- |
| Quick Start | docs/en/QUICK_START.md | docs/zh/QUICK_START.md |
| Architecture | docs/en/ARCHITECTURE.md | docs/zh/ARCHITECTURE.md |
| GEMM Optimization | docs/en/GEMM_OPTIMIZATION.md | docs/zh/GEMM_OPTIMIZATION.md |
| Performance Tuning | docs/en/PERFORMANCE_TUNING.md | docs/zh/PERFORMANCE_TUNING.md |
| API Reference | docs/en/API_REFERENCE.md | docs/zh/API_REFERENCE.md |
| Development Guide | docs/en/CONTRIBUTING.md | docs/zh/CONTRIBUTING.md |

## Engineering workflow

- Source of truth: `openspec/specs/**`.
- Build system: explicit source lists in `CMakeLists.txt`; do not use recursive globbing for source files.
- Formatting: `.clang-format` with a Google-based 4-space style.
- Tests: `tests_host` covers utilities that do not require a GPU device; `tests_gpu` covers CUDA runtime and kernel behavior. Configuring and compiling the project still requires a CUDA Toolkit.
- Branching: keep `master` as the only long-lived branch; use short-lived branches or worktrees for changes and delete them after merge.
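The formatting rule above translates into a `.clang-format` along these lines; this is a sketch of the likely key settings, not the project's actual file:

```yaml
# Google base style with the project's 4-space indent override
BasedOnStyle: Google
IndentWidth: 4
```

Run `clang-format -i` on changed files (or wire it into a pre-commit hook) so diffs stay noise-free.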

See AGENTS.md for the full project-specific AI and engineering workflow.