A High-Performance WebGPU Deep Learning Inference Engine
Zero Dependencies · Hand-Written WGSL · GPU-Accelerated · Type-Safe
Quick Start · Features · Performance · Documentation · Contributing
The smallest, most transparent deep learning inference engine for the web.
| | Tiny-DL-Inference | TensorFlow.js | ONNX Runtime Web |
|---|---|---|---|
| Bundle Size | 58KB | ~2MB | ~1.5MB |
| Dependencies | Zero | Heavy | Moderate |
| Code Transparency | 100% WGSL source | Black box | Black box |
| GPU Control | Direct shader access | Abstracted | Abstracted |
| Kernel Fusion | ✅ Manual fusion | Limited | Limited |
Built for developers who want full control, minimal overhead, and maximum understanding of GPU-based neural network inference.
- Zero Dependencies — No TensorFlow.js or ONNX Runtime. Pure WebGPU with minimal footprint
- Kernel Fusion — Fused Conv2d+Bias+ReLU achieves 3× memory bandwidth reduction
- Zero-Copy Operations — Tensor views with no GPU overhead (< 1μs reshape)
- Hand-Written WGSL — Every operator implemented from scratch in readable WGSL code
- Type Safe — Full TypeScript with strict mode, zero `any` types
- Comprehensive Testing — Property-based testing with fast-check (100+ iterations each)
- Production Ready — Custom error classes, proper GPU resource lifecycle
- Educational — Perfect for studying GPU computing and WebGPU programming
- Browser: Chrome 113+ / Edge 113+ / Safari 18+ (with WebGPU enabled)
- Hardware: GPU with WebGPU support (discrete GPU recommended for best performance)
- Node.js: 18.0+ (for development)
Install the package:

```bash
npm install tiny-dl-inference
```

Run a single operator:

```typescript
import { GPUContext, Tensor, ReLUOperator } from 'tiny-dl-inference';

// 1. Initialize GPU context
const context = new GPUContext();
await context.init();

// 2. Create input tensor
const input = Tensor.fromArray(
  context,
  new Float32Array([1.0, -2.0, 3.0, -4.0]),
  [1, 4, 1, 1] // [batch, channels, height, width]
);

// 3. Run ReLU activation
const relu = new ReLUOperator(context);
const output = await relu.forward([input]);

// 4. Get results
const result = await output.download();
console.log(result); // Float32Array([1, 0, 3, 0])

// 5. Clean up resources
input.destroy();
output.destroy();
context.destroy();
```

Run a full model:

```typescript
import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

// Initialize engine
const context = new GPUContext();
await context.init();
const engine = new InferenceEngine(context);

// Load model from JSON
await engine.loadModel('https://example.com/mnist-model.json');

// Prepare input (MNIST: 1x1x28x28)
const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);

// Run inference
const output = await engine.infer(input);
const predictions = await output.download();

// Get predicted class
const predictedClass = predictions.indexOf(Math.max(...predictions));
console.log('Predicted digit:', predictedClass);

// Clean up
input.destroy();
output.destroy();
engine.dispose();
context.destroy();
```

→ Read the Full Documentation for detailed guides and examples.
Without Fusion (6 memory operations):

```
Read → Conv → Write → Read → Bias → Write → Read → ReLU → Write
```

With Fusion (2 memory operations):

```
Read → Conv+Bias+ReLU → Write
```
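The saving can be illustrated on the CPU with a toy elementwise pipeline (a sketch for intuition only, not the engine's WGSL kernels): running bias and ReLU as separate passes allocates and traverses an intermediate array per pass, while the fused pass reads the input once and writes the output once.

```typescript
// Toy CPU model of the fusion win (illustration only, not the WGSL kernels).
// Separate passes: each pass reads one array and writes a freshly allocated one.
function biasPass(x: Float32Array, bias: number): Float32Array {
  const out = new Float32Array(x.length); // intermediate allocation
  for (let i = 0; i < x.length; i++) out[i] = x[i] + bias;
  return out;
}

function reluPass(x: Float32Array): Float32Array {
  const out = new Float32Array(x.length); // another allocation
  for (let i = 0; i < x.length; i++) out[i] = Math.max(0, x[i]);
  return out;
}

// Fused pass: one read of the input, one write of the output,
// no intermediate tensor ever materialized.
function biasReluFused(x: Float32Array, bias: number): Float32Array {
  const out = new Float32Array(x.length);
  for (let i = 0; i < x.length; i++) out[i] = Math.max(0, x[i] + bias);
  return out;
}

const input = new Float32Array([1.0, -2.0, 3.0, -4.0]);
const separate = reluPass(biasPass(input, 1.0)); // values 2, 0, 4, 0
const fused = biasReluFused(input, 1.0);         // same values, one pass
console.log(separate, fused);
```

The fused variant does the same arithmetic but touches memory a third as often, which is the effect the GPU kernel fusion exploits at scale.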
| Benchmark | Separate Operators | Fused Operator | Improvement |
|---|---|---|---|
| Conv2d 64-channel | 2.34ms | 0.89ms | 2.6× faster |
| Memory Operations | 6 ops | 2 ops | 3× reduction |
| Kernel Launches | 3 | 1 | 66% fewer |
| Intermediate Tensors | 3 allocated | 0 | 100% saved |
```typescript
// Zero GPU overhead - creates a view, not a copy
const flat = tensor.reshape([1, 2352]); // < 1 microsecond
```

| Model | Latency | Device |
|---|---|---|
| MNIST CNN | < 100ms | Chrome 120, RTX 3060 |
| CIFAR-10 | < 150ms | Chrome 120, RTX 3060 |
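The zero-copy reshape shown above is essentially a metadata swap. A minimal sketch of the idea (hypothetical `TensorView` type for illustration; the library's `Tensor` internals may differ) shares the underlying buffer and only validates that the element count is preserved:

```typescript
// Hypothetical minimal view type: reshape only touches shape metadata,
// so it costs O(1) regardless of tensor size.
class TensorView {
  constructor(public data: Float32Array, public shape: number[]) {}

  reshape(newShape: number[]): TensorView {
    const oldSize = this.shape.reduce((a, b) => a * b, 1);
    const newSize = newShape.reduce((a, b) => a * b, 1);
    if (oldSize !== newSize) throw new Error('reshape: element count must match');
    // Same Float32Array instance: no copy, no new GPU buffer.
    return new TensorView(this.data, newShape);
  }
}

const t = new TensorView(new Float32Array(3 * 28 * 28), [1, 3, 28, 28]);
const flat = t.reshape([1, 2352]);
console.log(flat.data === t.data); // true: the buffer is shared
```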
| Operator | Description | Fusion Available |
|---|---|---|
| `Conv2d` | 2D convolution with stride/padding | ✅ Fused with Bias+ReLU |
| `Conv2dBiasReLU` | Conv + Bias + ReLU in a single kernel | ✅ 3× memory reduction |
| Operator | Description |
|---|---|
| `MaxPool` | 2D max pooling with configurable kernel size |
| Operator | Description | Formula |
|---|---|---|
| `ReLU` | Rectified Linear Unit | f(x) = max(0, x) |
| `Softmax` | Normalized exponential (numerically stable) | f(x_i) = e^(x_i) / Σe^(x_j) |
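The "numerically stable" qualifier on Softmax refers to subtracting the maximum logit before exponentiating, so large inputs never overflow to Infinity; a CPU reference of the formula above (a sketch in the spirit of the project's CPU reference implementations) looks like:

```typescript
// Stable softmax: shift by max(x) so exp() never sees a huge argument.
// exp(x_i - m) / Σ exp(x_j - m) is algebraically identical to the naive form.
function softmax(x: Float32Array): Float32Array {
  const m = x.reduce((a, b) => Math.max(a, b), -Infinity);
  const exps = x.map(v => Math.exp(v - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(v => v / sum);
}

// Naive exp(1000) would overflow to Infinity; the shifted version stays finite.
const probs = softmax(new Float32Array([1000, 1000, 0]));
console.log(probs); // values ≈ 0.5, 0.5, 0
```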
| Operator | Description |
|---|---|
| `Dense` | Fully connected layer with optional bias |
| `Flatten` | Zero-copy tensor reshaping |
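For correctness validation, a Dense layer reduces to a matrix–vector product plus an optional bias. A CPU reference in the spirit of the implementations under `src/utils/` might look like this (a sketch; the library's actual signatures may differ):

```typescript
// CPU reference for a fully connected layer: out[j] = Σ_i x[i] * W[j][i] + b[j].
// Weights are assumed stored row-major as [outFeatures, inFeatures].
function denseReference(
  x: Float32Array,        // [inFeatures]
  weights: Float32Array,  // [outFeatures * inFeatures], row-major
  outFeatures: number,
  bias?: Float32Array     // optional [outFeatures]
): Float32Array {
  const inFeatures = x.length;
  const out = new Float32Array(outFeatures);
  for (let j = 0; j < outFeatures; j++) {
    let acc = bias ? bias[j] : 0;
    for (let i = 0; i < inFeatures; i++) {
      acc += x[i] * weights[j * inFeatures + i];
    }
    out[j] = acc;
  }
  return out;
}

// 2 inputs → 2 outputs with an identity weight matrix and a bias.
const y = denseReference(
  new Float32Array([1, 2]),
  new Float32Array([1, 0, 0, 1]), // W = [[1, 0], [0, 1]]
  2,
  new Float32Array([10, 20])
);
console.log(y); // values 11, 22
```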
```typescript
import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

async function classifyMNIST(imageData: Float32Array): Promise<number> {
  const context = new GPUContext();
  try {
    await context.init();
    const engine = new InferenceEngine(context);
    await engine.loadModel('mnist-model.json');

    // Input: 1x1x28x28 (grayscale MNIST)
    const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);

    // Run inference
    const output = await engine.infer(input);
    const predictions = await output.download();

    // Get result
    const predictedDigit = predictions.indexOf(Math.max(...predictions));

    // Cleanup
    input.destroy();
    output.destroy();
    engine.dispose();
    return predictedDigit;
  } finally {
    // Ensure GPU resources are released even if an error occurs
    context.destroy();
  }
}

// Usage
const imageData = new Float32Array(784); // 28x28 pixel data
classifyMNIST(imageData)
  .then(digit => console.log('Recognized digit:', digit))
  .catch(err => console.error('Inference failed:', err));
```

→ See more Examples including custom models, web integration, and performance benchmarking.
| Browser | Minimum Version | Status |
|---|---|---|
| Chrome | 113+ | ✅ Fully Supported |
| Edge | 113+ | ✅ Fully Supported |
| Safari | 18+ (macOS Sonoma+) | |
| Firefox | Behind flag | 🔧 Enable `dom.webgpu.enabled` |
```typescript
if (navigator.gpu) {
  console.log('✅ WebGPU is supported!');
} else {
  console.error('❌ WebGPU not supported in this browser');
}
```

```
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│               (InferenceEngine, ModelLoader)                │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                       Operator Layer                        │
│        (Conv2d, ReLU, MaxPool, Dense, Softmax, etc.)        │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                         Core Layer                          │
│           (GPUContext, Tensor, Memory Management)           │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                       WebGPU Runtime                        │
│                 (WGSL Shaders, GPU Compute)                 │
└─────────────────────────────────────────────────────────────┘
```
```
tiny-dl-inference/
├── openspec/            # OpenSpec spec-driven development (Single Source of Truth)
│   ├── specs/           # Specification documents
│   │   ├── product/     # Product requirements (PRD)
│   │   ├── architecture/# Architecture design specs
│   │   ├── api/         # API specs
│   │   └── testing/     # BDD test specs
├── docs/                # User documentation (bilingual)
│   ├── en/              # English (26 files)
│   └── zh/              # Chinese (27 files)
├── src/                 # Source code
│   ├── core/            # GPUContext, Tensor, error classes
│   ├── operators/       # Neural network operators
│   ├── engine/          # InferenceEngine, ModelLoader
│   └── utils/           # Benchmark, CPU reference implementations
├── tests/               # Test suite (Vitest)
└── examples/            # Demo code (MNIST, benchmark)
```
```bash
# Clone repository
git clone https://github.com/LessUp/tiny-dl-inference.git
cd tiny-dl-inference

# Install dependencies
npm install

# Run type checking
npm run typecheck

# Run tests (134 passing)
npm test

# Build project
npm run build
```

```bash
# Run all tests
npm test

# Run with coverage report
npm run test:coverage

# Run specific test file
npx vitest run tests/operators/Conv2dOperator.test.ts

# Property-based tests (100+ iterations each)
npx vitest run -t "property"
```

Test Coverage:
- ✅ 134 tests passing
- ✅ 13 property-based tests with fast-check
- ✅ CPU reference implementations for correctness validation
- ✅ Target: >90% code coverage (V8)
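Property-based tests assert invariants over many random inputs instead of fixed fixtures. The fast-check tests in the repo follow this pattern; here is a dependency-free sketch of the idea, checking two ReLU invariants (non-negativity and idempotence) against a CPU reference over 100 random arrays:

```typescript
// Minimal property check without fast-check: generate random inputs
// and assert invariants that must hold for every one of them.
function reluReference(x: Float32Array): Float32Array {
  return x.map(v => Math.max(0, v));
}

for (let iter = 0; iter < 100; iter++) {
  const x = Float32Array.from({ length: 64 }, () => Math.random() * 20 - 10);
  const once = reluReference(x);
  const twice = reluReference(once);

  // Property 1: all outputs are non-negative.
  if (once.some(v => v < 0)) throw new Error('ReLU produced a negative value');
  // Property 2: ReLU is idempotent; applying it twice changes nothing.
  if (once.some((v, i) => v !== twice[i])) throw new Error('ReLU is not idempotent');
}
console.log('100 random cases passed');
```

fast-check adds shrinking (minimizing a failing input) on top of this basic loop, which is why the real tests use it.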
- Quick Start Guide — Get up and running in 5 minutes
- Installation — Detailed setup instructions
- Architecture — System design overview
- GPU Context — WebGPU resource management
- Tensors — Multi-dimensional data structures
- Operators — Neural network layers
- Memory Layout — NCHW vs NHWC
- Optimization Guide — Performance tuning
- Kernel Fusion — Custom fused operators
- Custom Operators — Build your own WGSL operators
- Benchmarking — Performance measurement
- GPUContext — Device management
- Tensor — Data structures
- Operators — All operators
- InferenceEngine — High-level API
- MNIST Classification — Handwritten digit recognition
- Custom Model — Build models from scratch
- Web Integration — Browser-based app
- Performance Tuning — Benchmarking guide
- Interactive Playground — Experiment with operators
→ Browse Full Documentation: English | 中文
We welcome contributions! This project follows Spec-Driven Development (SDD) — all changes must be defined in `/specs/` first.
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Review specs in `/specs/` before coding
- Implement your changes
- Test thoroughly (134+ tests)
- Submit a Pull Request
- Contributing Guide — Full development workflow
- AGENTS.md — AI agent development guidelines
- Specs Directory — Single Source of Truth
- TypeScript strict mode (`strict: true`)
- 2-space indentation, single quotes
- Property-based testing with `fast-check`
- Follow existing patterns in `/src/operators/`
This project uses Spec-Driven Development — specifications are the Single Source of Truth:
| Spec | Location | Purpose |
|---|---|---|
| Requirements | `openspec/specs/product/spec.md` | What to build |
| Architecture | `openspec/specs/architecture/spec.md` | How to build it |
| API Contracts | `openspec/specs/api/spec.md` | Interface definitions |
| Test Criteria | `openspec/specs/testing/spec.md` | Acceptance criteria |
See CHANGELOG.md for all releases.
Security:
- Fixed 5 moderate npm vulnerabilities
- Updated vitest to v4.1.4
Performance:
- Kernel fusion: 3× memory reduction
- Zero-copy reshape: < 1μs overhead
- GPU memory leak fixes
MIT License — Free for personal and commercial use.
- 📖 Documentation: https://lessup.github.io/tiny-dl-inference/
- 💻 GitHub Repository: https://github.com/LessUp/tiny-dl-inference
- 🐛 Issue Tracker: https://github.com/LessUp/tiny-dl-inference/issues
- 📦 npm Package: https://www.npmjs.com/package/tiny-dl-inference
Built with ❤️ for the AI community