rust-tiktoken

The fastest Rust BPE tokenizer, plus its WebAssembly bindings. Drop-in compatible with OpenAI tiktoken and the mainstream open models (Llama 3, DeepSeek, Qwen, Mistral). A hand-written ASCII fast-path makes ASCII text 15–40x faster than tiktoken-rs (≈2x on CJK/Unicode).

Crates in this workspace

Path	Crate / Package	Description
`tiktoken/`	`tiktoken`	Rust BPE tokenizer — 11 encodings, 68 models, multi-provider pricing
`tiktoken-wasm/`	`tiktoken-wasm` (Rust)	WASM binding crate for the above
`tiktoken-wasm/`	`@goliapkg/tiktoken-wasm` (npm)	Same, published to npm via `wasm-pack`

The two crates live in one workspace and are versioned in lockstep — every release bumps and publishes both at the same version number.

Highlights

ASCII fast-path pre-tokenizer — resolves the common ASCII pieces (letters, digits, punctuation, contractions) without invoking the regex engine; 2.3–5.5x faster encode / count on ASCII text for cl100k / o200k / qwen2 / deepseek. Unicode/CJK transparently falls back to the regex.
11 encodings · 68 models · 7 providers — OpenAI (GPT-4/4o/4.1/4.5, GPT-5.x, o1/o3/o4-mini, gpt-oss), Llama 3/4, DeepSeek, Qwen, Mistral; plus USD cost estimation (Anthropic & Google included for pricing).
Lean & portable — arena-based vocabulary, hybrid linear/heap BPE merge, optional rayon parallelism, allocation-free count(), pure Rust with zero C dependencies, a tiny wasm build, and zstd-compressed vocab embedded at compile time.

Full API, supported-model tables, and benchmarks live in the per-crate READMEs: tiktoken/ · tiktoken-wasm/.

Quick start

Rust

[dependencies]
tiktoken = "3.5"

// by encoding name
let enc = tiktoken::get_encoding("cl100k_base").unwrap();
let tokens = enc.encode("hello world");
assert_eq!(enc.decode_to_string(&tokens).unwrap(), "hello world");

// count without allocating a token vector
let n = enc.count("The quick brown fox.");

// or resolve by model name
let enc = tiktoken::encoding_for_model("gpt-4o").unwrap();

WebAssembly (browser / Node.js)

npm install @goliapkg/tiktoken-wasm

import init, { getEncoding } from '@goliapkg/tiktoken-wasm'
await init()
const enc = getEncoding('o200k_base')
const tokens = enc.encode('hello world')

Performance

On an Apple M4 Mac mini, ASCII encode / count is 15–40x faster than tiktoken-rs and ~20–40x faster than Python tiktoken (≈2x on CJK/Unicode, which defers to the regex). Full per-encoding tables and methodology: tiktoken/README.md#performance.

Build

cargo test -p tiktoken
cargo fmt --all --check
cargo clippy --workspace --lib -- -D warnings

# WASM (requires wasm-pack: cargo install wasm-pack)
cd tiktoken-wasm
wasm-pack build --target web --release --scope goliapkg

Release

tiktoken and tiktoken-wasm are versioned in lockstep and released together via git-flow (no PRs):

git flow release start X.Y.Z
# bump versions to X.Y.Z: tiktoken/Cargo.toml, tiktoken-wasm/Cargo.toml
# (and its tiktoken path-dep); finalize both CHANGELOGs
git flow release finish X.Y.Z                       # merge → master, tag vX.Y.Z, back-merge develop
git tag -a tiktoken-wasm-vX.Y.Z vX.Y.Z^{commit} -m "tiktoken-wasm X.Y.Z"
git push origin master develop vX.Y.Z tiktoken-wasm-vX.Y.Z
# tag `v*` publishes the tiktoken crate; `tiktoken-wasm-v*` publishes the wasm crate + npm

License

Licensed under either of MIT or Apache-2.0, at your option.

Name		Name	Last commit message	Last commit date
Latest commit History 89 Commits
.github		.github
tiktoken-wasm		tiktoken-wasm
tiktoken		tiktoken
.gitignore		.gitignore
Cargo.toml		Cargo.toml
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.ja.md		README.ja.md
README.md		README.md
README.zh-CN.md		README.zh-CN.md
deny.toml		deny.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rust-tiktoken

Crates in this workspace

Highlights

Quick start

Rust

WebAssembly (browser / Node.js)

Performance

Build

Release

License

About

Licenses found

Uh oh!

Releases 11

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

rust-tiktoken

Crates in this workspace

Highlights

Quick start

Rust

WebAssembly (browser / Node.js)

Performance

Build

Release

License

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases 11

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages