n0x

The full AI stack, in one browser tab.


What's New (v2.0)

n0x has been hardened for a cleaner public release. The current focus is build reliability, API rate limiting, safer dependency versions, better first-run onboarding, and clearer privacy/security documentation.

Highlights:

Hybrid web search - SearXNG, DuckDuckGo, and Wikipedia by default; optional Brave and Tavily keys for stronger coverage
Local document RAG - vector + BM25 retrieval, RRF fusion, MMR reranking, versioned cache, and fallback extraction
Provider routing - browser, Chrome AI, Ollama, and OpenAI-compatible cloud endpoints in one UI
Production guardrails - CI, lint/typecheck/build scripts, server-side rate limits, and opt-in funnel telemetry
Clear limitations - privacy, security, compatibility, and known-limitation pages document the tradeoffs

See full changelog →

n0x runs LLMs, autonomous agents, document Q&A, Python runtime, image generation, and web search in one browser tab. No server. No account. No API keys required. Open a tab, pick a model, start working.

The default path is fully local: your prompts, files, and model weights never leave your machine. WebGPU handles inference at roughly 35–80 tok/s for small and mid-size models on a normal laptop GPU. If you want more power, switch to Ollama or plug in a cloud API (Groq, OpenRouter, or any OpenAI-compatible endpoint) and you're running the same tool stack against bigger models.

Why this exists

Every AI tool I tried either wanted my data, wanted my money, or both. I wanted something I could open in a browser and just use — no Docker, no Python venv, no sign-up wall, no "you've hit your free tier limit."

So I built n0x. It's an actual AI workstation, not a chatbot wrapper.

What you get

🤖 Pick your backend. Four providers, one interface:

Provider	What runs	Setup	Speed
Browser (WebGPU)	50+ open-source models, from 360M to 70B parameters, on your GPU via WebLLM	Zero. Just pick a model.	10-80 t/s
Ollama	Any model from your local Ollama server	`ollama serve` — auto-detected	Varies
Cloud API	Groq, OpenRouter, any OpenAI-compatible endpoint	Paste key + base URL	100-500 t/s
Chrome AI	Built-in Gemini Nano (experimental)	Chrome 127+ with flags	20-40 t/s
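
"Paste key + base URL" is enough for the Cloud API row because OpenAI-compatible providers share one chat-completions request shape. Here is a minimal sketch of that shape. `buildChatRequest`, the placeholder key, and the model name are hypothetical and not taken from the n0x codebase.

```typescript
// Illustrative sketch: assembles the URL, headers, and body for an
// OpenAI-compatible /chat/completions call. Names here are assumptions.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  // Drop trailing slashes so ".../v1" and ".../v1/" produce the same URL.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    // stream: true asks the provider to send the reply incrementally.
    body: JSON.stringify({ model, messages, stream: true }),
  };
}

// Placeholder key and model name, for illustration only.
const exampleRequest = buildChatRequest(
  "https://api.groq.com/openai/v1/",
  "YOUR_API_KEY",
  "example-model",
  [{ role: "user", content: "Hello" }],
);
```

Pass `exampleRequest.url`, `headers`, and `body` to `fetch` with `method: "POST"` to send it. Swapping providers should only mean changing the base URL, key, and model name.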

Switch between them mid-conversation. Your chat history stays. Auto-routing can pick the provider for you based on query complexity.

🤖 Agent mode. A ReAct reasoning loop that actually works.

The LLM chains tool calls autonomously — web search, document lookup, Python execution, memory recall — and you watch it think in real time, token by token.

What's new:

✅ Handles malformed JSON gracefully (multi-strategy parser)
✅ Loop detection (stops when the same tool is called 3 times)
✅ Context window budgeting (prevents OOM)
✅ Per-tool execution timeouts (30s max)
✅ Cumulative token tracking
✅ Streaming thought process (shows reasoning as it happens)
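
Here is a rough sketch of what a multi-strategy parser for malformed tool-call JSON can look like. The four strategies and the `parseToolCall` name are assumptions for illustration, and n0x's actual parser may differ.

```typescript
// Illustrative sketch only: tries progressively looser ways to find a JSON object.
type ParsedCall = Record<string, unknown>;

function isObject(value: unknown): value is ParsedCall {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function parseToolCall(raw: string): ParsedCall | null {
  // The fence marker is built at runtime so this example stays easy to embed.
  const fence = "`".repeat(3);
  const fenced = new RegExp(fence + "(?:json)?\\s*([\\s\\S]*?)" + fence);

  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  const braced = start !== -1 && end > start ? raw.slice(start, end + 1) : null;

  const candidates: Array<string | null> = [
    raw.trim(), // 1. the whole reply is JSON
    fenced.exec(raw)?.[1]?.trim() ?? null, // 2. JSON inside a fenced block
    braced, // 3. first "{" through last "}"
    braced ? braced.replace(/,\s*([}\]])/g, "$1") : null, // 4. same, minus trailing commas
  ];

  for (const candidate of candidates) {
    if (!candidate) continue;
    try {
      const parsed: unknown = JSON.parse(candidate);
      if (isObject(parsed)) return parsed;
    } catch {
      // Not valid JSON under this strategy; try the next one.
    }
  }
  return null;
}
```

Returning `null` instead of throwing lets the agent loop ask the model to try again rather than crash.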

The trace UI shows every step with timing, tool args, and observations.
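
Loop detection can be sketched as a small counter over consecutive tool calls. This version counts a call as a repeat only when both the tool name and its arguments match the previous call. That is an assumption, and `LoopDetector` is a hypothetical name.

```typescript
// Illustrative sketch: flags the agent loop once the same call repeats 3 times in a row.
interface AgentToolCall {
  tool: string;
  args: Record<string, unknown>;
}

class LoopDetector {
  private lastKey: string | null = null;
  private repeats = 0;
  private readonly maxRepeats: number;

  constructor(maxRepeats = 3) {
    this.maxRepeats = maxRepeats;
  }

  /** Records a call and returns true when the agent should stop. */
  record(call: AgentToolCall): boolean {
    const key = `${call.tool}:${JSON.stringify(call.args)}`;
    this.repeats = key === this.lastKey ? this.repeats + 1 : 1;
    this.lastKey = key;
    return this.repeats >= this.maxRepeats;
  }
}
```

A different call in between resets the counter, so an agent that alternates between search and Python is not cut off.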

Document Q&A. Local-first hybrid RAG.

Drop a PDF, DOCX, TXT, CSV, HTML, or Markdown file into the chat. n0x:

Processing:

✅ Extracts text with a fallback chain
✅ Chunks with sentence-boundary awareness (50% overlap)
✅ Embeds with MiniLM-L6 in a Web Worker (UI stays responsive)
✅ Indexes with Voy vector search + BM25 keyword search
✅ Re-ranks with MMR (Maximum Marginal Relevance) for diverse results
✅ Caches vectors in IndexedDB (instant re-upload)
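
The chunking step in the list above can be sketched like this. It is a simplified take that splits on sentence-ending punctuation, packs sentences up to a character budget, and then steps forward by about half a chunk. `chunkBySentence`, the 500-character default, and the splitting regex are all assumptions.

```typescript
// Illustrative sketch: sentence-aware chunking with roughly 50% overlap.
function chunkBySentence(text: string, maxChars = 500, overlap = 0.5): string[] {
  // Split after ".", "!" or "?" when followed by whitespace.
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let start = 0;
  while (start < sentences.length) {
    let end = start;
    let size = 0;
    // Always take one sentence, then keep adding while the budget allows.
    while (
      end < sentences.length &&
      (end === start || size + sentences[end].length + 1 <= maxChars)
    ) {
      size += sentences[end].length + 1; // +1 for the joining space
      end++;
    }
    chunks.push(sentences.slice(start, end).join(" "));
    if (end === sentences.length) break;
    // Advance by the non-overlapping share of this chunk, at least one sentence.
    start += Math.max(1, Math.ceil((end - start) * (1 - overlap)));
  }
  return chunks;
}
```

Because chunks never cut a sentence in half, a retrieved chunk reads cleanly when pasted into the prompt, and the overlap keeps context that straddles a boundary.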

What's new:

✅ Versioned cache system - old caches auto-invalidate
✅ Type-safe chunk storage - validates all cached data
✅ 100-page PDF limit - prevents OOM on huge documents
✅ Binary file handling - shows helpful message instead of crashing
✅ Fallback extraction - works even if worker crashes
✅ Text sanitization - removes null bytes and control chars
✅ Hybrid search - vector + BM25 fused with RRF (Reciprocal Rank Fusion)
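
Reciprocal Rank Fusion merges ranked lists without needing their scores to be comparable: each document earns 1 / (k + rank) from every list it appears in. Here is a minimal sketch. The function name is hypothetical, and k = 60 is the commonly used default, not a value confirmed from n0x.

```typescript
// Illustrative sketch of Reciprocal Rank Fusion over lists of document ids.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // index is 0-based, so rank = index + 1.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: one list from vector search, one from BM25.
const fusedExample = reciprocalRankFusion([
  ["a", "b", "c"],
  ["b", "d", "a"],
]);
```

In the example, "b" ends up first because it ranks near the top of both lists, ahead of "a", which is first in one list but third in the other.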

Supported formats: PDF (via PDF.js), DOCX (native WASM decompression), TXT, Markdown, CSV (formatted table), HTML (tags stripped)
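
The MMR re-ranking step mentioned in the processing list trades relevance against redundancy. Each pick maximizes lambda × similarity to the query minus (1 − lambda) × similarity to the closest already-picked chunk. Here is a small sketch over plain number arrays. The names and the lambda value are assumptions.

```typescript
// Illustrative sketch of Maximum Marginal Relevance over embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

/** Returns the indexes of the selected docs, in pick order. */
function mmr(query: number[], docs: number[][], topK: number, lambda = 0.5): number[] {
  const selected: number[] = [];
  const remaining = docs.map((_, i) => i);

  while (selected.length < topK && remaining.length > 0) {
    let bestPos = 0;
    let bestScore = -Infinity;
    remaining.forEach((docIndex, pos) => {
      const relevance = cosine(query, docs[docIndex]);
      // Redundancy: similarity to the closest chunk we already picked.
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((s) => cosine(docs[docIndex], docs[s])));
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestPos = pos;
      }
    });
    selected.push(remaining.splice(bestPos, 1)[0]);
  }
  return selected;
}
```

With lambda at 1 this collapses to plain relevance ranking. Lowering it pushes near-duplicate chunks down in favor of ones that add something new.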

Web search. Multi-source search with citations.

Type your query, toggle "Deep Search", and get search context and citations.

Multi-Engine Parallel Search:

🔍 SearXNG (free, privacy-respecting)
🦆 DuckDuckGo (instant answers)
📖 Wikipedia (authoritative content)
🦁 Brave Search (optional API key, excellent quality)
🔬 Tavily (optional API key, research-grade)

What's new:

✅ 5 engines in parallel (all run simultaneously, fastest wins)
✅ Answer synthesis - returns direct answer from top sources
✅ Deep content extraction - Jina Reader fetches page content (up to 2000 chars)
✅ Source citations - URLs + excerpts
✅ Priority fallback - Tavily → Brave → SearXNG → Wikipedia → DDG
✅ Graceful failure - returns useful errors when providers are down
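
One way to read "all engines in parallel" together with "priority fallback" is to query everything at once, then prefer results from the highest-priority engine that returned something. The sketch below shows only that selection step. The types and `pickByPriority` are hypothetical, and this reading of the behavior is an assumption.

```typescript
// Illustrative sketch: choose results by engine priority once all lookups have settled.
type Engine = "tavily" | "brave" | "searxng" | "wikipedia" | "duckduckgo";

interface SearchHit {
  title: string;
  url: string;
  excerpt: string;
}

type Outcome = { ok: true; hits: SearchHit[] } | { ok: false; error: string };

// Order taken from the priority fallback item above.
const PRIORITY: Engine[] = ["tavily", "brave", "searxng", "wikipedia", "duckduckgo"];

function pickByPriority(
  outcomes: Partial<Record<Engine, Outcome>>,
): { engine: Engine | null; hits: SearchHit[]; errors: string[] } {
  const errors: string[] = [];
  for (const engine of PRIORITY) {
    const outcome = outcomes[engine];
    if (!outcome) continue; // engine not configured, e.g. no API key
    if (outcome.ok) {
      if (outcome.hits.length > 0) return { engine, hits: outcome.hits, errors };
    } else {
      errors.push(`${engine}: ${outcome.error}`);
    }
  }
  // Nothing usable: the caller can show the collected errors instead of failing silently.
  return { engine: null, hits: [], errors };
}
```

Collecting errors along the way is what makes a graceful failure message possible when every provider is down.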

No API keys required. Tavily/Brave are optional upgrades for better results.

🐍 Python runtime. Pyodide WASM sandbox.

Code output feeds back into the conversation. If execution fails, the error goes to the LLM automatically for a fix.

What's new:

✅ Better error handling - shows package install failures clearly
✅ Auto-load packages - import numpy just works
✅ Self-healing - failed code triggers automatic retry
✅ Helpful errors - suggests using micropip for manual installs

🎨 Image generation. Multiple engines, free tier.

Type "generate an image of..." and choose your engine:

Pollinations (Flux, z-image-turbo, klein, qwen-image) - Free, fast
AI Horde (Stable Diffusion) - Community-powered, queue-based

What's new:

✅ Smart fallback - tries Pollinations → free tier → Horde
✅ Better error messages - shows provider status
✅ Retry logic - auto-retries on transient failures

🧠 Memory. Persistent, searchable knowledge.

The agent stores and recalls facts across sessions.

How it works:

✅ Hybrid embeddings - unigrams + bigrams + trigrams with TF-IDF weighting
✅ Vector + keyword search - finds semantically similar + exact matches
✅ Auto-save - saves meaningful exchanges (when memory toggle is ON)
✅ Tags - auto-tags with context (chat, search, rag, cloud, local)
✅ IndexedDB storage - persists across sessions
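
To give a feel for the hybrid embeddings, here is a sketch that hashes word unigrams, bigrams, and trigrams into a fixed-size vector with TF-IDF-style weights. Word-level n-grams, the FNV-1a-style hash, and every function name are assumptions. Only the n-gram mix, the TF-IDF weighting, and the 1024 dimensions come from this README.

```typescript
// Illustrative sketch: hashed n-gram embedding with TF-IDF-style weighting.
const DIM = 1024;

function ngrams(text: string): string[] {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const grams: string[] = [];
  for (let n = 1; n <= 3; n++) {
    for (let i = 0; i + n <= words.length; i++) {
      grams.push(words.slice(i, i + n).join(" "));
    }
  }
  return grams;
}

// FNV-1a-style 32-bit hash over UTF-16 code units, reduced to a bucket index.
function bucket(term: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < term.length; i++) {
    h ^= term.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) % DIM;
}

// Smoothed inverse document frequency computed from a set of stored texts.
function buildIdf(docs: string[]): Map<string, number> {
  const df = new Map<string, number>();
  for (const doc of docs) {
    new Set(ngrams(doc)).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  }
  const idf = new Map<string, number>();
  df.forEach((count, term) => idf.set(term, Math.log((1 + docs.length) / (1 + count)) + 1));
  return idf;
}

function embed(text: string, idf: Map<string, number>): number[] {
  const vec = new Array<number>(DIM).fill(0);
  for (const term of ngrams(text)) {
    vec[bucket(term)] += idf.get(term) ?? 1; // unseen terms get a neutral weight
  }
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0)) || 1;
  return vec.map((v) => v / norm); // unit length, so a dot product is cosine similarity
}

function similarity(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}
```

More buckets means fewer unrelated n-grams landing in the same slot, which is the intuition behind moving from 512 to 1024 dimensions.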

What's new:

✅ Respects toggle - only saves when memory is enabled
✅ Better similarity - 1024-dim vectors vs 512-dim (less collision)
✅ Proper cleanup - no more IndexedDB handle leaks

💾 Storage Manager. Clear data without leaving the app.

Browsers cap how much a site can store in IndexedDB (the quota varies by browser and free disk space). The built-in Storage Manager lets you clear:

Chat history
Semantic memory
RAG vector cache
Model weights (WebLLM cache)

What's new:

✅ Blocked state handling - warns if another tab has DB open
✅ Confirmation flow - prevents accidental deletion
✅ Page reload after clear - ensures clean state

🎤 Voice. Speech-to-text and text-to-speech.

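Web Speech API integration. Text-to-speech uses the browser's built-in voices. Offline speech recognition depends on the browser (Chrome's implementation typically sends audio to a network service).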

Press microphone → speak → auto-submit
Toggle TTS → responses read aloud
Interrupt mid-speech

🌳 Branching. Fork conversations.

Click any message → "Branch" → create alternate timeline. Both branches persist in the sidebar.

🎭 Personas. Five system prompts.

Default - Balanced, helpful assistant
Senior Engineer - Code reviews, architecture, best practices
Writer - Creative, storytelling, editing
Tutor - Teaching, explanations, exercises
Analyst - Data analysis, insights, visualization

Each has its own tone, formatting rules, and domain focus.

How it works

                              ┌─────────────────────┐
                              │    Provider Layer    │
                              │  WebGPU · Ollama ·   │
                              │  Cloud · Chrome AI   │
                              └────────┬────────────┘
                                       │
┌──────────┐     ┌───────────┐    ┌────▼────┐
│  User     │────▶  Router    │───▶│  LLM    │
│  Input    │     │           │    │ Stream  │
└──────────┘     │ Auto-Route │    └────┬────┘
                 │ direct /   │         │
                 │ agent /    │    ┌────▼─────────────────────┐
                 │ image      │    │  Agent (ReAct Loop)       │
                 └───────────┘    │  thought → action →       │
                                  │  observation → repeat     │
                                  │                           │
                                  │  Tools:                   │
                                  │   ├ Multi-Engine Search   │
                                  │   │  (5 parallel engines) │
                                  │   ├ Hybrid RAG            │
                                  │   │  (Vector+BM25+MMR)    │
                                  │   ├ Python (Pyodide)      │
                                  │   ├ Memory (IndexedDB)    │
                                  │   └ Image Gen (Multi)     │
                                  └───────────────────────────┘

What's new:

✅ Auto-routing - routes BEFORE context gathering (faster, cheaper)
✅ Retry logic - 2 retries with exponential backoff on failures
✅ Graceful degradation - context fails → continues without context
✅ Resource cleanup - proper IndexedDB/AbortController lifecycle

Everything in the diagram runs in the browser unless you pick Ollama or a cloud provider. The only network calls are optional: search queries, image prompts, and cloud API calls. Disable them and you have a fully air-gapped AI workstation.

Models

50+ models. MLC-compiled, quantized, cached in browser storage after first download. Real inference, not API calls.

Category	Examples	Size	Speed	Use Case
⚡ Tiny	SmolLM2 360M, Qwen 0.5B, TinyLlama 1.1B	360MB–900MB	60-80 t/s	Any device, instant responses
⚖️ Balanced	Qwen 2.5 1.5B (default), Phi-3.5, Llama 3.2 3B, Gemma 2 2B	700MB–2.2GB	35–50 t/s	Best quality/speed balance
🚀 Powerful	Mistral 7B, Qwen 2.5 7B, Llama 3.1 8B, Gemma 2 9B, Qwen 14B	4–10GB	15–25 t/s	High quality, needs VRAM
🧠 Reasoning	DeepSeek R1 distills (1.5B, 7B, 14B, 32B, 70B)	1GB–30GB	10–20 t/s	Chain-of-thought reasoning
💻 Code	Qwen Coder 1.5B/7B/32B, DeepSeek Coder, Qwen Math	800MB–20GB	Varies	Code generation, debugging
🔥 Flagship	Qwen 2.5 32B, Llama 3.3 70B, R1 Llama 70B	10–30GB	8–15 t/s	Near GPT-4 quality

Start with Qwen 2.5 1.5B (~1GB). It loads in seconds on a warm cache and handles most tasks well. Scale up from there.

Run it yourself

Chrome or Edge (WebGPU required). Node 18+.

git clone https://github.com/ixchio/n0x.git
cd n0x
npm install
npm run dev

Open localhost:3000. First launch downloads the default model (~1GB) — after that it loads from cache instantly.

Optional env vars

# Better search (optional - everything works without these)
TAVILY_API_KEY=tvly-xxxxx        # Research-grade search results
BRAVE_API_KEY=BSA-xxxxx          # Excellent search quality

# Image generation (optional)
POLLINATIONS_API_KEY=xxxxx       # Higher rate limits, no watermarks

All are optional. n0x works 100% free with no API keys.

Performance

Local (WebGPU)

Qwen 2.5 1.5B: 40-50 t/s, 1GB, good quality
Qwen 2.5 7B: 15-25 t/s, 4GB, excellent quality
Llama 3.3 70B: 8-12 t/s, 30GB, near GPT-4 quality

Cloud (Groq - Free Tier)

Llama 3.3 70B: 200-300 t/s, GPT-4 level
Mixtral 8x7B: 300-400 t/s, very capable
Llama 3.1 8B: 500+ t/s

RAG Performance

Indexing: ~1s per 100 pages
Re-upload: Instant (cached)
Search: <100ms
Cache size: ~500KB per document

Web Search

Total time: 2-5s
Parallel engines: All run simultaneously
Fastest wins: Returns as soon as best result arrives

Privacy

On the default local path, your prompts, documents, and model weights stay in your browser.

What leaves your machine:

Search queries → DuckDuckGo/SearXNG/Wikipedia, plus Brave/Tavily if you add keys and Jina Reader for page extraction (if you use search)
Image prompts → Pollinations or AI Horde (if you generate images)
Cloud API calls → Your chosen provider (if you use cloud mode)

Turn off search + images + cloud = 100% air-gapped. Funnel telemetry is opt-in, so nothing is sent unless you enable it.

What's stored locally:

Chat history (IndexedDB)
Memory (IndexedDB)
RAG vectors (IndexedDB)
Model weights (Cache API)

Total storage: 0.5-30GB depending on models/documents. Clear anytime via Storage Manager.

Stack

Frontend: Next.js 14 · React 18 · TypeScript · Tailwind CSS · Framer Motion
AI/ML: WebLLM (WebGPU) · Transformers.js · Voy · MiniLM-L6
Runtime: Pyodide (Python WASM)
Storage: IndexedDB · Zustand
Search: Tavily · Brave · SearXNG · DuckDuckGo · Wikipedia · Jina Reader
Image: Pollinations · AI Horde

Roadmap

See ROADMAP.md for 30+ planned features including:

🎤 Voice interface upgrade (Whisper.cpp, wake word)
🖼️ Multi-modal RAG (OCR, image understanding)
🕸️ Knowledge graph RAG (entity/relationship extraction)
🤖 Custom agents (user-created, shareable)
📱 Mobile PWA (offline support, native features)
🔌 Plugin system (GitHub, Notion, Slack integrations)
🎥 Video understanding (upload videos, ask questions)

Vote on features →

Contributing

Contributions welcome! See CONTRIBUTING.md.

Quick start:

Fork the repo
Create a branch: git checkout -b feature/amazing-feature
Make changes
Test thoroughly (see TESTING_GUIDE.md)
Submit PR

Bug reports: Open an issue with:

What you did
What happened
Browser console errors
Browser/OS version

Credits

Built by ixchio with contributions from the community.

Powered by:

MLC-LLM - WebGPU inference
Transformers.js - Embeddings
Pyodide - Python in WASM
Voy - Vector search
PDF.js - PDF parsing

License

MIT © ixchio

Free. Local. Private. Powerful.

No sign-up. No API keys. No data collection.

Try n0x →
