
Tiny-DL-Inference


A High-Performance WebGPU Deep Learning Inference Engine

Zero Dependencies · Hand-Written WGSL · GPU-Accelerated · Type-Safe

Quick Start · Features · Performance · Documentation · Contributing

English | 简体中文


Why Tiny-DL-Inference?

The smallest, most transparent deep learning inference engine for the web.

| | Tiny-DL-Inference | TensorFlow.js | ONNX Runtime Web |
| --- | --- | --- | --- |
| Bundle Size | 58KB | ~2MB | ~1.5MB |
| Dependencies | Zero | Heavy | Moderate |
| Code Transparency | 100% WGSL source | Black box | Black box |
| GPU Control | Direct shader access | Abstracted | Abstracted |
| Kernel Fusion | ✅ Manual fusion | Limited | Limited |

Built for developers who want full control, minimal overhead, and maximum understanding of GPU-based neural network inference.


Features

🚀 Performance

  • Zero Dependencies — No TensorFlow.js or ONNX Runtime. Pure WebGPU with minimal footprint
  • Kernel Fusion — Fused Conv2d+Bias+ReLU cuts memory traffic 3× (6 memory operations down to 2)
  • Zero-Copy Operations — Tensor views with no GPU overhead (< 1μs reshape)
  • Hand-Written WGSL — Every operator implemented from scratch in readable WGSL code

🛠 Developer Experience

  • Type Safe — Full TypeScript with strict mode, zero any types
  • Comprehensive Testing — Property-based testing with fast-check (100+ iterations each)
  • Production Ready — Custom error classes and explicit GPU resource lifecycle management
  • Educational — Perfect for studying GPU computing and WebGPU programming

Quick Start

Requirements

  • Browser: Chrome 113+ / Edge 113+ / Safari 18+ (with WebGPU enabled)
  • Hardware: GPU with WebGPU support (discrete GPU recommended for best performance)
  • Node.js: 18.0+ (for development)

Installation

npm install tiny-dl-inference

🚀 Try it Online

Open in StackBlitz

First Inference

import { GPUContext, Tensor, ReLUOperator } from 'tiny-dl-inference';

// 1. Initialize GPU context
const context = new GPUContext();
await context.init();

// 2. Create input tensor
const input = Tensor.fromArray(context, 
  new Float32Array([1.0, -2.0, 3.0, -4.0]),
  [1, 4, 1, 1]  // [batch, channels, height, width]
);

// 3. Run ReLU activation
const relu = new ReLUOperator(context);
const output = await relu.forward([input]);

// 4. Get results
const result = await output.download();
console.log(result); // Float32Array([1, 0, 3, 0])

// 5. Cleanup resources
input.destroy();
output.destroy();
context.destroy();

Using InferenceEngine (High-Level API)

import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

// Initialize engine
const context = new GPUContext();
await context.init();

const engine = new InferenceEngine(context);

// Load model from JSON
await engine.loadModel('https://example.com/mnist-model.json');

// Prepare input (MNIST: 1x1x28x28)
const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);

// Run inference
const output = await engine.infer(input);
const predictions = await output.download();

// Get predicted class
const predictedClass = predictions.indexOf(Math.max(...predictions));
console.log('Predicted digit:', predictedClass);

// Cleanup
input.destroy();
output.destroy();
engine.dispose();
context.destroy();

→ Read the Full Documentation for detailed guides and examples.


Performance

Kernel Fusion: 3× Memory Bandwidth Reduction

Without Fusion (6 memory operations):
  Read → Conv → Write → Read → Bias → Write → Read → ReLU → Write

With Fusion (2 memory operations):
  Read → Conv+Bias+ReLU → Write

| Benchmark | Separate Operators | Fused Operator | Improvement |
| --- | --- | --- | --- |
| Conv2d 64-channel | 2.34ms | 0.89ms | 2.6× faster |
| Memory Operations | 6 ops | 2 ops | 3× reduction |
| Kernel Launches | 3 | 1 | 66% fewer |
| Intermediate Tensors | 3 allocated | 0 | 100% saved |
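
At the call site, fusion replaces a chain of operators with a single one. The sketch below is illustrative only: Conv2dBiasReLUOperator and its constructor options are assumed names based on the operator table further down, not documented API (the only operator export confirmed in this README is ReLUOperator).

// Conv2dBiasReLUOperator is an assumed class name (the operator table lists
// "Conv2dBiasReLU"); the options object here is a guess — check the API reference.
import { GPUContext, Tensor, Conv2dBiasReLUOperator } from 'tiny-dl-inference';

const context = new GPUContext();
await context.init();

// Illustrative shapes: 1x3x8x8 input, 16 output channels, 3x3 kernel.
const input = Tensor.fromArray(context, new Float32Array(3 * 8 * 8), [1, 3, 8, 8]);
const weights = Tensor.fromArray(context, new Float32Array(16 * 3 * 3 * 3), [16, 3, 3, 3]);
const bias = Tensor.fromArray(context, new Float32Array(16), [16]);

// One kernel launch: conv, bias add, and ReLU run without writing
// intermediate tensors back to GPU memory.
const fused = new Conv2dBiasReLUOperator(context, { stride: 1, padding: 1 });
const output = await fused.forward([input, weights, bias]);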

Zero-Copy Reshape

// Zero GPU overhead - creates a view, not a copy
const flat = tensor.reshape([1, 2352]);  // < 1 microsecond

First Inference Latency

| Model | Latency | Device |
| --- | --- | --- |
| MNIST CNN | < 100ms | Chrome 120, RTX 3060 |
| CIFAR-10 | < 150ms | Chrome 120, RTX 3060 |

Supported Operators

Convolution

| Operator | Description | Fusion Available |
| --- | --- | --- |
| Conv2d | 2D Convolution with stride/padding | ✅ Fused with Bias+ReLU |
| Conv2dBiasReLU | Conv + Bias + ReLU in single kernel | 3× memory reduction |

Pooling

| Operator | Description |
| --- | --- |
| MaxPool | 2D Max Pooling with configurable kernel size |

Activation Functions

| Operator | Description | Formula |
| --- | --- | --- |
| ReLU | Rectified Linear Unit | f(x) = max(0, x) |
| Softmax | Normalized exponential (numerically stable) | f(x_i) = e^(x_i) / Σ e^(x_j) |
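
The standard way to make softmax numerically stable is to subtract the maximum before exponentiating: e^(x_i − m) / Σ e^(x_j − m) equals e^(x_i) / Σ e^(x_j) for m = max(x), but never overflows for large logits. A minimal CPU sketch of that trick — an illustrative helper, not this library's API:

// Numerically stable softmax: shift by max(x) so Math.exp() never overflows.
function softmaxStable(x: Float32Array): Float32Array {
  const m = Math.max(...x);
  const exps = Float32Array.from(x, (v) => Math.exp(v - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return Float32Array.from(exps, (v) => v / sum);
}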

Fully Connected

| Operator | Description |
| --- | --- |
| Dense | Fully connected layer with optional bias |
| Flatten | Zero-copy tensor reshaping |
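
These two typically combine into a classifier head. In the sketch below, DenseOperator is an assumed class name and the [inputs, outputs] weight layout is a guess; only Tensor.fromArray and reshape are confirmed elsewhere in this README.

// DenseOperator is an assumed export (the table lists "Dense"); the weight
// layout [196, 10] is illustrative, not documented API.
import { GPUContext, Tensor, DenseOperator } from 'tiny-dl-inference';

const context = new GPUContext();
await context.init();

// Flatten is zero-copy: reshape returns a view over the same GPU buffer.
const features = Tensor.fromArray(context, new Float32Array(4 * 7 * 7), [1, 4, 7, 7]);
const flat = features.reshape([1, 196]);

const weights = Tensor.fromArray(context, new Float32Array(196 * 10), [196, 10]);
const bias = Tensor.fromArray(context, new Float32Array(10), [10]);
const logits = await new DenseOperator(context).forward([flat, weights, bias]);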

Complete Example: MNIST Classification

import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

async function classifyMNIST(imageData: Float32Array): Promise<number> {
  const context = new GPUContext();
  
  try {
    await context.init();
    const engine = new InferenceEngine(context);
    await engine.loadModel('mnist-model.json');
    
    // Input: 1x1x28x28 (grayscale MNIST)
    const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);
    
    // Run inference
    const output = await engine.infer(input);
    const predictions = await output.download();
    
    // Get result
    const predictedDigit = predictions.indexOf(Math.max(...predictions));
    
    // Cleanup
    input.destroy();
    output.destroy();
    engine.dispose();
    
    return predictedDigit;
  } finally {
    // Ensure GPU resources are released even if an error occurs
    context.destroy();
  }
}

// Usage
const imageData = new Float32Array(784); // 28x28 pixel data
classifyMNIST(imageData)
  .then(digit => console.log('Recognized digit:', digit))
  .catch(err => console.error('Inference failed:', err));

→ See more Examples including custom models, web integration, and performance benchmarking.


Browser Compatibility

| Browser | Minimum Version | Status |
| --- | --- | --- |
| Chrome | 113+ | ✅ Fully Supported |
| Edge | 113+ | ✅ Fully Supported |
| Safari | 18+ (macOS Sonoma+) | ⚠️ Experimental |
| Firefox | Behind flag | 🔧 Enable dom.webgpu.enabled |

Check WebGPU Support

if (navigator.gpu) {
  console.log('✅ WebGPU is supported!');
} else {
  console.error('❌ WebGPU not supported in this browser');
}
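
Note that navigator.gpu only tells you the API is exposed; an adapter or device can still be unavailable (blocklisted drivers, software rendering). A stricter check using the standard WebGPU API (the helper name is ours):

// Request an actual adapter and device — both can fail even when
// navigator.gpu exists.
async function hasUsableWebGPU(): Promise<boolean> {
  if (!navigator.gpu) return false;
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return false;
  const device = await adapter.requestDevice();
  device.destroy();
  return true;
}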

Project Structure

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    Application Layer                        │
│              (InferenceEngine, ModelLoader)                 │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                    Operator Layer                           │
│    (Conv2d, ReLU, MaxPool, Dense, Softmax, etc.)            │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                      Core Layer                             │
│         (GPUContext, Tensor, Memory Management)             │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                    WebGPU Runtime                           │
│              (WGSL Shaders, GPU Compute)                    │
└─────────────────────────────────────────────────────────────┘

Directory Layout

tiny-dl-inference/
├── openspec/           # OpenSpec spec-driven development (single source of truth)
│   ├── specs/          # Specification documents
│   │   ├── product/    # Product requirements (PRD)
│   │   ├── architecture/ # Architecture design spec
│   │   ├── api/        # API spec
│   │   └── testing/    # BDD testing spec
├── docs/               # User documentation (Bilingual)
│   ├── en/             # English (26 files)
│   └── zh/             # Chinese (27 files)
├── src/                # Source code
│   ├── core/           # GPUContext, Tensor, error classes
│   ├── operators/      # Neural network operators
│   ├── engine/         # InferenceEngine, ModelLoader
│   └── utils/          # Benchmark, CPU reference implementations
├── tests/              # Test suite (Vitest)
└── examples/           # Demo code (MNIST, benchmark)

Development

Setup

# Clone repository
git clone https://github.com/LessUp/tiny-dl-inference.git
cd tiny-dl-inference

# Install dependencies
npm install

# Run type checking
npm run typecheck

# Run tests (134 passing)
npm test

# Build project
npm run build

Testing

# Run all tests
npm test

# Run with coverage report
npm run test:coverage

# Run specific test file
npx vitest run tests/operators/Conv2dOperator.test.ts

# Property-based tests (100+ iterations each)
npx vitest run -t "property"
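
For reference, a fast-check property has this shape — an illustrative CPU-only example, not taken from the repo's suite:

import { test } from 'vitest';
import fc from 'fast-check';

// Illustrative property: ReLU output is non-negative and idempotent.
const reluCPU = (xs: number[]) => xs.map((v) => Math.max(0, v));

test('ReLU properties', () => {
  fc.assert(
    fc.property(fc.array(fc.float({ noNaN: true }), { minLength: 1 }), (xs) => {
      const once = reluCPU(xs);
      const twice = reluCPU(once);
      return once.every((v) => v >= 0) && once.every((v, i) => v === twice[i]);
    }),
    { numRuns: 100 } // matches the "100+ iterations" noted above
  );
});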

Test Coverage:

  • 134 tests passing
  • 13 property-based tests with fast-check
  • CPU reference implementations for correctness validation
  • Target: >90% code coverage (V8 coverage provider)

Documentation

📚 Getting Started

🔧 Core Concepts

🚀 Advanced

📖 API Reference

💡 Examples

🧪 Playground


Chinese Documentation

→ Browse Full Documentation: English | 中文


Contributing

We welcome contributions! This project follows Spec-Driven Development (SDD) — all changes must be defined in openspec/specs/ first.

Quick Start

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Review the specs in openspec/specs/ before coding
  4. Implement your changes
  5. Test thoroughly — all 134+ tests should pass
  6. Submit a Pull Request

Resources

Code Style

  • TypeScript strict mode (strict: true)
  • 2-space indentation, single quotes
  • Property-based testing with fast-check
  • Follow existing patterns in /src/operators/

Specifications

This project uses Spec-Driven Development — specifications are the Single Source of Truth:

| Spec | Location | Purpose |
| --- | --- | --- |
| Requirements | openspec/specs/product/spec.md | What to build |
| Architecture | openspec/specs/architecture/spec.md | How to build it |
| API Contracts | openspec/specs/api/spec.md | Interface definitions |
| Test Criteria | openspec/specs/testing/spec.md | Acceptance criteria |

Changelog

See CHANGELOG.md for all releases.

Latest: v2.0.1 (2026-04-16)

Security:

  • Fixed 5 moderate npm vulnerabilities
  • Updated vitest to v4.1.4

Performance:

  • Kernel fusion: 3× memory reduction
  • Zero-copy reshape: < 1μs overhead
  • GPU memory leak fixes

Full Changelog


License

MIT License — Free for personal and commercial use.




Built with ❤️ for the AI community