Run local models from your macOS menu bar — behind an OpenAI-compatible API.
Website · Install · How it works · Develop
Dakodeon is a tiny menu bar app. Start a local
llama-server router and point any agent — OpenCode,
Zed, or your own scripts — at http://127.0.0.1:8080/v1. Your agent chooses the model by id;
Dakodeon shows what's loaded and manages the downloads.
It bundles no runtime and no weights. It drives the llama.cpp and hf tools already
on your machine, so the app itself stays tiny.
brew install --cask emin93/tap/dakodeonNote
Requirements: macOS 14+, with llama-server and hf on your PATH.
brew install llama.cpp
pip install -U "huggingface_hub[cli]" # provides `hf`| 🧭 Menu bar control | Start/stop the server and see the active model from a slim panel. |
| 📦 Model manager | A Settings window shows each model's download status — download, cancel, delete, or reveal weights in Finder. |
| 🔄 Selection in your agent | Clients like OpenCode select a model by id; llama-server routes to that profile and keeps one loaded at a time. The app has no model picker. |
| 🧹 Clean shutdown | Quitting the app stops llama-server. |
| 🚀 Native defaults | llama-server loads each model's trained context and embedded chat template. |
Dakodeon launches llama-server in router mode and exposes the standard
OpenAI-compatible endpoints. GET /v1/models returns the available profile ids,
such as gemma4-12b-it-qat and gemma4-31b-it-qat. Chat requests route by the
JSON model field, so switching models in a client like OpenCode also moves the
active model Dakodeon shows in the menu.
POST http://127.0.0.1:8080/v1/chat/completions
GET http://127.0.0.1:8080/v1/modelsModel files download to the shared Hugging Face cache via hf; the app resolves the
local GGUF paths and points the server at them — nothing is copied or duplicated.
Sizes and LFS hashes come from hf models list <repo> -R --json. Models are shown by
their Hugging Face repository — the same id clients send.
Profiles are curated in code at
Sources/Dakodeon/Catalog.swift. Each ModelProfile
declares its weights, an optional draft / MTP model, and any extra llama-server flags.
The app exposes no per-user configuration — to add a model, append an entry:
ModelProfile(
id: "gemma4-12b-it-qat",
weights: ModelAsset(repo: "unsloth/gemma-4-12B-it-qat-GGUF", file: "gemma-4-12B-it-qat-UD-Q4_K_XL.gguf"),
draft: ModelAsset(repo: "unsloth/gemma-4-12B-it-qat-GGUF", file: "mtp-gemma-4-12B-it.gguf"),
extraArguments: ["-ngl", "999", "--spec-type", "draft-mtp"]
)Bundled today
| Profile | Quant | Draft | Download |
|---|---|---|---|
| Gemma 4 12B IT QAT | UD-Q4_K_XL | MTP | 6.97 GB |
| Gemma 4 31B IT QAT | UD-Q4_K_XL | MTP | 17.57 GB |
make run # build, package, and launch the .app
make dist # build the signed Dakodeon.app bundle
make zip # build dist/Dakodeon.zip (release artifact)| File | Responsibility |
|---|---|
Catalog.swift |
Curated model profiles + types |
ModelStore.swift |
Download / delete / status via the hf cache |
ServerController.swift |
llama-server lifecycle, active-model sync, shutdown |
MenuView.swift |
The menu bar panel |
SettingsView.swift |
Model-management window |
DakodeonApp.swift |
App entry, scenes, and icons |
MIT for the app. Model weights remain under their own licenses — the bundled Gemma model follows the Gemma Terms of Use.
