An educational web UI that lets you see inside a tiny transformer as it generates names, character by character.
Built on top of Andrej Karpathy's microgpt — a ~200-line pure-Python character-level GPT that implements everything from scratch (autograd, attention, training) with zero dependencies. See his blog post for more.
The UI has two tabs.
The Generation tab visualizes each step of name generation through 6 pipeline cards:
- Tokenization — how characters map to token IDs (27 tokens: a-z + [START])
- Embedding — token and position embeddings combined into a 16-number vector
- Attention — heatmaps for each of the 4 attention heads showing which earlier characters the model focuses on
- Combining — per-head outputs (4 groups of 4), blended via wo, then refined through an MLP (64 neurons) into a final 16-dim embedding
- Projection — the final embedding is projected through lm_head into 27 raw logits (one per character), with the top 5 labeled
- Prediction — probability bar chart over all possible next characters (softmax of logits)
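The first stages of the pipeline above can be sketched in plain Python. This is an illustrative sketch, not the code from microgpt.py or inference.py: the token ID order (a-z as 0-25, [START] as 26), the helper names, the random placeholder weights, and the 16-row position table are all assumptions. To keep it short, one head's query and key are stood in for by the first 4 numbers of each embedding rather than by real Q/K projections.

```python
import math
import random

# Assumed vocabulary layout: a-z get IDs 0-25 and [START] gets ID 26.
# The README only states that there are 27 tokens; the order is a guess.
CHARS = "abcdefghijklmnopqrstuvwxyz"
START = len(CHARS)            # 26
VOCAB_SIZE = len(CHARS) + 1   # 27
N_EMBD, N_HEAD = 16, 4
HEAD_DIM = N_EMBD // N_HEAD   # 4 numbers per head
MAX_POSITIONS = 16            # assumed context length


def tokenize(name):
    """Map a lowercase name to token IDs, prefixed with [START]."""
    return [START] + [CHARS.index(c) for c in name]


def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def embed(token_id, pos, wte, wpe):
    """Add the token embedding and the position embedding element-wise."""
    return [t + p for t, p in zip(wte[token_id], wpe[pos])]


def attention_weights(query, keys):
    """Scaled dot-product weights of one query over the given keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Random placeholder weights; a trained model would load these instead.
rng = random.Random(0)
wte = [[rng.gauss(0, 0.02) for _ in range(N_EMBD)] for _ in range(VOCAB_SIZE)]
wpe = [[rng.gauss(0, 0.02) for _ in range(N_EMBD)] for _ in range(MAX_POSITIONS)]

tokens = tokenize("emma")
vectors = [embed(t, i, wte, wpe) for i, t in enumerate(tokens)]

# Stand-in for head 0: slice the first HEAD_DIM numbers of each vector.
head0 = [v[:HEAD_DIM] for v in vectors]
# The last position attends over itself and every earlier position.
weights = attention_weights(head0[-1], head0)
print(tokens, [round(w, 3) for w in weights])
```

Each row of an attention heatmap corresponds to one such `weights` list: one probability per earlier character, summing to 1.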
The second tab displays all 9 weight matrices (~4,192 parameters) as interactive colored heatmaps. Hover over any cell to see its exact value. Matrices are shown in forward-pass order: token and position embeddings, Q/K/V/O attention projections, MLP up/down layers, and the final language model head. Card colors match the corresponding Generation tab stages.
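The ~4,192 figure can be checked with a little arithmetic. The sketch below assumes a single transformer layer, no bias terms, and a context length of 16 positions; none of these are stated here, and the context length in particular is chosen because it makes the total come out to 4,192. Matrix names other than wo and lm_head are hypothetical labels.

```python
# Dimensions taken from the text: 27 tokens, 16-dim embeddings, 64 MLP neurons.
VOCAB, N_EMBD, MLP_HIDDEN = 27, 16, 64
BLOCK_SIZE = 16  # assumption: number of positions in the position table

# (rows, cols) for each of the 9 matrices, in forward-pass order.
shapes = {
    "wte": (VOCAB, N_EMBD),          # token embeddings
    "wpe": (BLOCK_SIZE, N_EMBD),     # position embeddings
    "wq": (N_EMBD, N_EMBD),          # query projection
    "wk": (N_EMBD, N_EMBD),          # key projection
    "wv": (N_EMBD, N_EMBD),          # value projection
    "wo": (N_EMBD, N_EMBD),          # attention output projection
    "mlp_up": (MLP_HIDDEN, N_EMBD),  # MLP up layer
    "mlp_down": (N_EMBD, MLP_HIDDEN),  # MLP down layer
    "lm_head": (VOCAB, N_EMBD),      # final language model head
}

counts = {name: rows * cols for name, (rows, cols) in shapes.items()}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:9s} {n:5d}")
print(f"{'total':9s} {total:5d}")
```

Under these assumptions the MLP accounts for about half of all parameters (2,048), and the four attention projections for a quarter (1,024).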
Requires Python 3.13+ and uv.
# Install dependencies
uv sync
# Train the model and save weights (~1 min)
uv run python save_weights.py
# Start the web UI
uv run python app.py
Then visit http://localhost:5001.
docker build -t microgpt .
docker run -p 5001:5001 microgpt
This uses gunicorn as a production server and doesn't require Python or uv on the host machine. Useful for deploying to a server.
The repo includes a fly.toml config. CI/CD auto-deploy is not currently enabled, but it can be set up with a GitHub Actions workflow.
To set up your own deployment:
- Install the Fly CLI and sign up
- Run fly launch (creates the app and sets the FLY_API_TOKEN GitHub secret)
- Push to main — GitHub Actions handles the rest
To deploy manually: fly deploy
As of March 2026, running on Fly.io cost about $0.20/day with min_machines_running = 1 in fly.toml. This keeps one machine always warm for snappy responses — with 0, cold starts made it noticeably sluggish.
- Generate a Name — generates a full name, animating one character at a time. Click any character to inspect that step's internals.
- Step Through — generates one character at a time. Click "Next Character" to advance. Better for deep exploration.
- Temperature slider — controls randomness. Low = predictable, high = creative. Recomputes probabilities client-side from raw logits (no server call).
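The temperature control amounts to dividing the raw logits by the temperature before applying softmax. The UI does this client-side; the sketch below shows the same math in Python for illustration only, and the function name and sample logits are made up.

```python
import math


def probs_at_temperature(logits, temperature):
    """Softmax over logits scaled by 1/temperature (temperature > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up raw scores for three characters

cold = probs_at_temperature(logits, 0.5)  # low temperature: sharper
base = probs_at_temperature(logits, 1.0)  # plain softmax
hot = probs_at_temperature(logits, 2.0)   # high temperature: flatter

for label, p in (("0.5", cold), ("1.0", base), ("2.0", hot)):
    print(label, [round(v, 3) for v in p])
```

Because only the logits are needed, the probabilities can be recomputed for any slider position without rerunning the model.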
microgpt.py is the original training script (untouched). save_weights.py runs training and serializes the ~4,192 learned parameters to weights.json. inference.py reimplements the forward pass with plain floats (no autograd) and captures intermediates at every layer. app.py serves a Flask API that the single-page frontend calls to generate names and retrieve visualization data.
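The idea of a plain-float forward pass that captures intermediates can be sketched for a single MLP block. This is not the code in inference.py: the function names, the weight keys, the JSON layout, and the use of ReLU as the activation are assumptions made for illustration.

```python
import json


def linear(x, w):
    """Multiply vector x by matrix w, stored as a list of output rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp_with_trace(x, weights):
    """Run a two-layer MLP and record every intermediate vector."""
    trace = {"input": x}
    hidden = linear(x, weights["mlp_up"])
    trace["mlp_pre_activation"] = hidden
    hidden = [max(0.0, h) for h in hidden]  # ReLU (assumed activation)
    trace["mlp_post_activation"] = hidden
    out = linear(hidden, weights["mlp_down"])
    trace["mlp_output"] = out
    return out, trace


# Tiny hand-written weights; the JSON round trip mimics loading them
# from a file such as weights.json (the actual file layout is unknown).
serialized = json.dumps({
    "mlp_up": [[1.0, -1.0], [0.5, 0.5]],
    "mlp_down": [[1.0, 2.0]],
})
weights = json.loads(serialized)

out, trace = mlp_with_trace([1.0, 2.0], weights)
print(json.dumps(trace, indent=2))
```

Returning the trace as a plain dict of lists keeps it directly JSON-serializable, which is convenient when an API has to hand the intermediates to a frontend.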