A High-Performance WebGPU Deep Learning Inference Engine
Zero Dependencies · Hand-Written WGSL · GPU-Accelerated · Type-Safe
Quick Start · Features · Performance · Documentation · Contributing
The smallest, most transparent deep learning inference engine for the web.
| | Tiny-DL-Inference | TensorFlow.js | ONNX Runtime Web |
|---|---|---|---|
| Bundle Size | 58KB | ~2MB | ~1.5MB |
| Dependencies | Zero | Heavy | Moderate |
| Code Transparency | 100% WGSL source | Black box | Black box |
| GPU Control | Direct shader access | Abstracted | Abstracted |
| Kernel Fusion | ✅ Manual fusion | Limited | Limited |
Built for developers who want full control, minimal overhead, and maximum understanding of GPU-based neural network inference.
- Zero Dependencies — No TensorFlow.js or ONNX Runtime. Pure WebGPU with minimal footprint
- Kernel Fusion — Fused Conv2d+Bias+ReLU achieves 3× memory bandwidth reduction
- Zero-Copy Operations — Tensor views with no GPU overhead (< 1μs reshape)
- Hand-Written WGSL — Every operator implemented from scratch in readable WGSL code
- Type Safe — Full TypeScript with strict mode, zero `any` types
- Comprehensive Testing — Property-based testing with fast-check (100+ iterations each)
- Production Ready — Custom error classes, proper GPU resource lifecycle
- Educational — Perfect for studying GPU computing and WebGPU programming
- Browser: Chrome 113+ / Edge 113+ / Safari 18+ (with WebGPU enabled)
- Hardware: GPU with WebGPU support (discrete GPU recommended for best performance)
- Node.js: 18.0+ (for development)
Install the package:

```bash
npm install tiny-dl-inference
```

Run a single operator:

```typescript
import { GPUContext, Tensor, ReLUOperator } from 'tiny-dl-inference';

// 1. Initialize GPU context
const context = new GPUContext();
await context.init();

// 2. Create input tensor
const input = Tensor.fromArray(
  context,
  new Float32Array([1.0, -2.0, 3.0, -4.0]),
  [1, 4, 1, 1] // [batch, channels, height, width]
);

// 3. Run ReLU activation
const relu = new ReLUOperator(context);
const output = await relu.forward([input]);

// 4. Get results
const result = await output.download();
console.log(result); // Float32Array([1, 0, 3, 0])

// 5. Clean up resources
input.destroy();
output.destroy();
context.destroy();
```

Run a full model:

```typescript
import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

// Initialize engine
const context = new GPUContext();
await context.init();
const engine = new InferenceEngine(context);

// Load model from JSON
await engine.loadModel('https://example.com/mnist-model.json');

// Prepare input (MNIST: 1x1x28x28)
const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);

// Run inference
const output = await engine.infer(input);
const predictions = await output.download();

// Get predicted class
const predictedClass = predictions.indexOf(Math.max(...predictions));
console.log('Predicted digit:', predictedClass);

// Clean up
input.destroy();
output.destroy();
engine.dispose();
context.destroy();
```

→ Read the Full Documentation for detailed guides and examples.
Without Fusion (6 memory operations):

```
Read → Conv → Write → Read → Bias → Write → Read → ReLU → Write
```

With Fusion (2 memory operations):

```
Read → Conv+Bias+ReLU → Write
```
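The saving can be illustrated on the CPU with a toy elementwise pipeline (a sketch for intuition only, not the engine's WGSL kernels): running bias and ReLU as separate passes allocates and traverses an intermediate array per pass, while the fused pass reads the input once and writes the output once.

```typescript
// Toy CPU model of the fusion win (illustration only, not the WGSL kernels).
// Separate passes: each pass reads one array and writes a freshly allocated one.
function biasPass(x: Float32Array, bias: number): Float32Array {
  const out = new Float32Array(x.length); // intermediate allocation
  for (let i = 0; i < x.length; i++) out[i] = x[i] + bias;
  return out;
}

function reluPass(x: Float32Array): Float32Array {
  const out = new Float32Array(x.length); // another allocation
  for (let i = 0; i < x.length; i++) out[i] = Math.max(0, x[i]);
  return out;
}

// Fused pass: one read of the input, one write of the output,
// no intermediate tensor ever materialized.
function biasReluFused(x: Float32Array, bias: number): Float32Array {
  const out = new Float32Array(x.length);
  for (let i = 0; i < x.length; i++) out[i] = Math.max(0, x[i] + bias);
  return out;
}

const input = new Float32Array([1.0, -2.0, 3.0, -4.0]);
const separate = reluPass(biasPass(input, 1.0)); // values 2, 0, 4, 0
const fused = biasReluFused(input, 1.0);         // same values, one pass
console.log(separate, fused);
```

The fused variant does the same arithmetic but touches memory a third as often, which is the effect the GPU kernel fusion exploits at scale.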
| Benchmark | Separate Operators | Fused Operator | Improvement |
|---|---|---|---|
| Conv2d 64-channel | 2.34ms | 0.89ms | 2.6× faster |
| Memory Operations | 6 ops | 2 ops | 3× reduction |
| Kernel Launches | 3 | 1 | 66% fewer |
| Intermediate Tensors | 3 allocated | 0 | 100% saved |
```typescript
// Zero GPU overhead - creates a view, not a copy
const flat = tensor.reshape([1, 2352]); // < 1 microsecond
```

| Model | Latency | Device |
|---|---|---|
| MNIST CNN | < 100ms | Chrome 120, RTX 3060 |
| CIFAR-10 | < 150ms | Chrome 120, RTX 3060 |
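The zero-copy reshape shown above is essentially a metadata swap. A minimal sketch of the idea (hypothetical `TensorView` type for illustration; the library's `Tensor` internals may differ) shares the underlying buffer and only validates that the element count is preserved:

```typescript
// Hypothetical minimal view type: reshape only touches shape metadata,
// so it costs O(1) regardless of tensor size.
class TensorView {
  constructor(public data: Float32Array, public shape: number[]) {}

  reshape(newShape: number[]): TensorView {
    const oldSize = this.shape.reduce((a, b) => a * b, 1);
    const newSize = newShape.reduce((a, b) => a * b, 1);
    if (oldSize !== newSize) throw new Error('reshape: element count must match');
    // Same Float32Array instance: no copy, no new GPU buffer.
    return new TensorView(this.data, newShape);
  }
}

const t = new TensorView(new Float32Array(3 * 28 * 28), [1, 3, 28, 28]);
const flat = t.reshape([1, 2352]);
console.log(flat.data === t.data); // true: the buffer is shared
```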
| Operator | Description | Fusion Available |
|---|---|---|
| `Conv2d` | 2D convolution with stride/padding | ✅ Fused with Bias+ReLU |
| `Conv2dBiasReLU` | Conv + Bias + ReLU in a single kernel | ✅ 3× memory reduction |
| Operator | Description |
|---|---|
| `MaxPool` | 2D max pooling with configurable kernel size |
| Operator | Description | Formula |
|---|---|---|
| `ReLU` | Rectified Linear Unit | f(x) = max(0, x) |
| `Softmax` | Normalized exponential (numerically stable) | f(x_i) = e^(x_i) / Σe^(x_j) |
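The "numerically stable" qualifier on Softmax refers to subtracting the maximum logit before exponentiating, so large inputs never overflow to Infinity; a CPU reference of the formula above (a sketch in the spirit of the project's CPU reference implementations) looks like:

```typescript
// Stable softmax: shift by max(x) so exp() never sees a huge argument.
// exp(x_i - m) / Σ exp(x_j - m) is algebraically identical to the naive form.
function softmax(x: Float32Array): Float32Array {
  const m = x.reduce((a, b) => Math.max(a, b), -Infinity);
  const exps = x.map(v => Math.exp(v - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(v => v / sum);
}

// Naive exp(1000) would overflow to Infinity; the shifted version stays finite.
const probs = softmax(new Float32Array([1000, 1000, 0]));
console.log(probs); // values ≈ 0.5, 0.5, 0
```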
| Operator | Description |
|---|---|
| `Dense` | Fully connected layer with optional bias |
| `Flatten` | Zero-copy tensor reshaping |
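For correctness validation, a Dense layer reduces to a matrix–vector product plus an optional bias. A CPU reference in the spirit of the implementations under `src/utils/` might look like this (a sketch; the library's actual signatures may differ):

```typescript
// CPU reference for a fully connected layer: out[j] = Σ_i x[i] * W[j][i] + b[j].
// Weights are assumed stored row-major as [outFeatures, inFeatures].
function denseReference(
  x: Float32Array,        // [inFeatures]
  weights: Float32Array,  // [outFeatures * inFeatures], row-major
  outFeatures: number,
  bias?: Float32Array     // optional [outFeatures]
): Float32Array {
  const inFeatures = x.length;
  const out = new Float32Array(outFeatures);
  for (let j = 0; j < outFeatures; j++) {
    let acc = bias ? bias[j] : 0;
    for (let i = 0; i < inFeatures; i++) {
      acc += x[i] * weights[j * inFeatures + i];
    }
    out[j] = acc;
  }
  return out;
}

// 2 inputs → 2 outputs with an identity weight matrix and a bias.
const y = denseReference(
  new Float32Array([1, 2]),
  new Float32Array([1, 0, 0, 1]), // W = [[1, 0], [0, 1]]
  2,
  new Float32Array([10, 20])
);
console.log(y); // values 11, 22
```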
```typescript
import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

async function classifyMNIST(imageData: Float32Array): Promise<number> {
  const context = new GPUContext();
  try {
    await context.init();
    const engine = new InferenceEngine(context);
    await engine.loadModel('mnist-model.json');

    // Input: 1x1x28x28 (grayscale MNIST)
    const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);

    // Run inference
    const output = await engine.infer(input);
    const predictions = await output.download();

    // Get result
    const predictedDigit = predictions.indexOf(Math.max(...predictions));

    // Cleanup
    input.destroy();
    output.destroy();
    engine.dispose();
    return predictedDigit;
  } finally {
    // Ensure GPU resources are released even if an error occurs
    context.destroy();
  }
}

// Usage
const imageData = new Float32Array(784); // 28x28 pixel data
classifyMNIST(imageData)
  .then(digit => console.log('Recognized digit:', digit))
  .catch(err => console.error('Inference failed:', err));
```

→ See more Examples including custom models, web integration, and performance benchmarking.
| Browser | Minimum Version | Status |
|---|---|---|
| Chrome | 113+ | ✅ Fully Supported |
| Edge | 113+ | ✅ Fully Supported |
| Safari | 18+ (macOS Sonoma+) | |
| Firefox | Behind flag | 🔧 Enable `dom.webgpu.enabled` |
```typescript
if (navigator.gpu) {
  console.log('✅ WebGPU is supported!');
} else {
  console.error('❌ WebGPU not supported in this browser');
}
```

```
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│               (InferenceEngine, ModelLoader)                │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                       Operator Layer                        │
│        (Conv2d, ReLU, MaxPool, Dense, Softmax, etc.)        │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                         Core Layer                          │
│           (GPUContext, Tensor, Memory Management)           │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                       WebGPU Runtime                        │
│                 (WGSL Shaders, GPU Compute)                 │
└─────────────────────────────────────────────────────────────┘
```
```
tiny-dl-inference/
├── openspec/            # OpenSpec spec-driven development (Single Source of Truth)
│   ├── specs/           # Specification documents
│   │   ├── product/     # Product requirements (PRD)
│   │   ├── architecture/# Architecture design specs
│   │   ├── api/         # API specs
│   │   └── testing/     # BDD test specs
├── docs/                # User documentation (bilingual)
│   ├── en/              # English (26 files)
│   └── zh/              # Chinese (27 files)
├── src/                 # Source code
│   ├── core/            # GPUContext, Tensor, error classes
│   ├── operators/       # Neural network operators
│   ├── engine/          # InferenceEngine, ModelLoader
│   └── utils/           # Benchmark, CPU reference implementations
├── tests/               # Test suite (Vitest)
└── examples/            # Demo code (MNIST, benchmark)
```
```bash
# Clone repository
git clone https://github.com/LessUp/tiny-dl-inference.git
cd tiny-dl-inference

# Install dependencies
npm install

# Run type checking
npm run typecheck

# Run tests (134 passing)
npm test

# Build project
npm run build
```

```bash
# Run all tests
npm test

# Run with coverage report
npm run test:coverage

# Run specific test file
npx vitest run tests/operators/Conv2dOperator.test.ts

# Property-based tests (100+ iterations each)
npx vitest run -t "property"
```

Test Coverage:
- ✅ 134 tests passing
- ✅ 13 property-based tests with fast-check
- ✅ CPU reference implementations for correctness validation
- ✅ Target: >90% code coverage (V8)
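Property-based tests assert invariants over many random inputs instead of fixed fixtures. The fast-check tests in the repo follow this pattern; here is a dependency-free sketch of the idea, checking two ReLU invariants (non-negativity and idempotence) against a CPU reference over 100 random arrays:

```typescript
// Minimal property check without fast-check: generate random inputs
// and assert invariants that must hold for every one of them.
function reluReference(x: Float32Array): Float32Array {
  return x.map(v => Math.max(0, v));
}

for (let iter = 0; iter < 100; iter++) {
  const x = Float32Array.from({ length: 64 }, () => Math.random() * 20 - 10);
  const once = reluReference(x);
  const twice = reluReference(once);

  // Property 1: all outputs are non-negative.
  if (once.some(v => v < 0)) throw new Error('ReLU produced a negative value');
  // Property 2: ReLU is idempotent; applying it twice changes nothing.
  if (once.some((v, i) => v !== twice[i])) throw new Error('ReLU is not idempotent');
}
console.log('100 random cases passed');
```

fast-check adds shrinking (minimizing a failing input) on top of this basic loop, which is why the real tests use it.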
- Quick Start Guide — Get up and running in 5 minutes
- Installation — Detailed setup instructions
- Architecture — System design overview
- GPU Context — WebGPU resource management
- Tensors — Multi-dimensional data structures
- Operators — Neural network layers
- Memory Layout — NCHW vs NHWC
- Optimization Guide — Performance tuning
- Kernel Fusion — Custom fused operators
- Custom Operators — Build your own WGSL operators
- Benchmarking — Performance measurement
- GPUContext — Device management
- Tensor — Data structures
- Operators — All operators
- InferenceEngine — High-level API
- MNIST Classification — Handwritten digit recognition
- Custom Model — Build models from scratch
- Web Integration — Browser-based app
- Performance Tuning — Benchmarking guide
- Interactive Playground — Experiment with operators
→ Browse Full Documentation: English | 中文
We welcome contributions! This project follows Spec-Driven Development (SDD) — all changes must be defined in `/specs/` first.
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Review specs in `/specs/` before coding
- Implement your changes
- Test thoroughly (134+ tests)
- Submit a Pull Request
- Contributing Guide — Full development workflow
- AGENTS.md — AI agent development guidelines
- Specs Directory — Single Source of Truth
- TypeScript strict mode (`strict: true`)
- 2-space indentation, single quotes
- Property-based testing with `fast-check`
- Follow existing patterns in `/src/operators/`
This project uses Spec-Driven Development — specifications are the Single Source of Truth:
| Spec | Location | Purpose |
|---|---|---|
| Requirements | `openspec/specs/product/spec.md` | What to build |
| Architecture | `openspec/specs/architecture/spec.md` | How to build it |
| API Contracts | `openspec/specs/api/spec.md` | Interface definitions |
| Test Criteria | `openspec/specs/testing/spec.md` | Acceptance criteria |
See CHANGELOG.md for all releases.
Security:
- Fixed 5 moderate npm vulnerabilities
- Updated vitest to v4.1.4
Performance:
- Kernel fusion: 3× memory reduction
- Zero-copy reshape: < 1μs overhead
- GPU memory leak fixes
MIT License — Free for personal and commercial use.
- 📖 Documentation: https://lessup.github.io/tiny-dl-inference/
- 💻 GitHub Repository: https://github.com/LessUp/tiny-dl-inference
- 🐛 Issue Tracker: https://github.com/LessUp/tiny-dl-inference/issues
- 📦 npm Package: https://www.npmjs.com/package/tiny-dl-inference
Built with ❤️ for the AI community