
graff-agent: Hermes-class harness in a NEW repo, prerequisites in codegraff #34

@justrach


Thesis

graff-agent is a new, separate repo that turns the existing graff binary into a Hermes-class personal agent — self-improving, multi-platform, scheduled, runtime-pluggable. It does this by depending on @codegraff/sdk (the N-API package on release/0.1.53) from npm and adding the surface area Hermes has and we don't.

This issue lives in codegraff because the prerequisite work is here: a small set of SDK additions need to ship before the new repo is buildable. Once those land, graff-agent proceeds independently.


Hermes inventory (full deep dive)

Sourced from a structured pass over the hermes-agent wiki — every line below is a real subsystem with concrete file paths.

1. Agent core

  • AIAgent class in run_agent.py — the central orchestrator. AIAgent.run_conversation() is the main loop.
  • Provider/model resolution in hermes_cli/runtime_provider.py.
  • Auxiliary LLM client (agent/auxiliary_client.py) for vision, summarization, web extraction — i.e. side tasks use a cheap model.

2. Skills system (agentskills.io compatible)

  • 70+ bundled skills, optional skills hub.
  • skill_manage tool with actions create, patch, edit, write_file, delete, remove_file (see acp_adapter/tools.py).
  • Skills live under ~/.hermes/skills/<name>/ with SKILL.md + optional references/, templates/, scripts/ — same layout we already use under .forge/skills/.

3. Closed learning loop (the differentiator)

  • Autonomous skill creation: triggered after skills.creation_nudge_interval tool-calling iterations. Agent calls skill_manage action=create and writes to ~/.hermes/skills/.
  • Skill self-improvement during use: triggered when user corrects style/workflow or a non-trivial technique emerges. Priority order: patch loaded skill → update umbrella skill → add support file under umbrella → create new umbrella. Patches use old_string/new_string.
  • Periodic memory nudges: _MEMORY_REVIEW_PROMPT, _SKILL_REVIEW_PROMPT, _COMBINED_REVIEW_PROMPT injected periodically inside AIAgent (run_agent.py). _summarize_background_review_actions produces user-facing summaries.
  • FTS5 session search: SQLite at ~/.hermes/state.db with FTS5 over conversation content. session_search tool. Summarization step uses Gemini Flash.
  • Memory files: MEMORY.md and USER.md orchestrated by agent/memory_manager.py.
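The old_string/new_string patch style is easiest to reason about as an exact substring replacement. Below is a minimal TypeScript sketch of how a graff-side skill patch could be applied to a SKILL.md body. `applySkillPatch` is a hypothetical name, and the "must match exactly once" rule is our assumption rather than something confirmed from Hermes' source.

```typescript
// Hypothetical helper: applies a Hermes-style { old_string, new_string } patch
// to the text of a SKILL.md file. Assumption: old_string must occur exactly
// once, so an ambiguous patch is rejected instead of guessed at.
interface SkillPatch {
  old_string: string;
  new_string: string;
}

function applySkillPatch(content: string, patch: SkillPatch): string {
  if (patch.old_string.length === 0) {
    throw new Error("old_string must be non-empty");
  }
  const first = content.indexOf(patch.old_string);
  if (first === -1) {
    throw new Error("old_string not found in skill file");
  }
  // A second (possibly overlapping) match means the target is ambiguous.
  if (content.indexOf(patch.old_string, first + 1) !== -1) {
    throw new Error("old_string matches more than once; include more context");
  }
  return (
    content.slice(0, first) +
    patch.new_string +
    content.slice(first + patch.old_string.length)
  );
}

// Usage: tighten one step of a skill after a user correction.
const skill = "# deploy\n1. Run tests\n2. Push to main\n";
const patched = applySkillPatch(skill, {
  old_string: "2. Push to main",
  new_string: "2. Open a PR against main",
});
console.log(patched);
```

Rejecting ambiguous matches keeps self-improvement edits reviewable: a patch either lands on one known spot or fails loudly.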

4. User modeling — Honcho

  • Plugin under plugins/memory/honcho/ implementing the MemoryProvider ABC.
  • Stores: session summary, user representation, AI peer card, persistent conclusions.
  • Two-layer injection into system prompt every turn: base layer (cadence: contextCadence) + dialectic LLM layer (dialecticCadence).
  • Tools: honcho_profile, honcho_search, honcho_context, honcho_reasoning, honcho_conclude.
  • Cold-start vs warm-start prompt strategies based on prior session existence.
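One way to read the two cadences: both layers are injected every turn, but each is only recomputed on its own cadence. The sketch below assumes `contextCadence` and `dialecticCadence` mean "refresh every N turns", with turn 0 as the first turn of a session so a cold start builds both layers. That interpretation and the `layersToRefresh` name are assumptions for illustration.

```typescript
// Sketch: decide which Honcho-style prompt layers to recompute on a turn.
// Assumption: each cadence means "refresh every N turns"; turn 0 is the
// first turn of a session, so both layers are built on a cold start.
interface CadenceConfig {
  contextCadence: number;   // base layer: summary, user representation, peer card
  dialecticCadence: number; // dialectic LLM layer (costs an extra model call)
}

interface LayerPlan {
  base: boolean;
  dialectic: boolean;
}

function layersToRefresh(turn: number, cfg: CadenceConfig): LayerPlan {
  if (cfg.contextCadence < 1 || cfg.dialecticCadence < 1) {
    throw new Error("cadences must be >= 1");
  }
  return {
    base: turn % cfg.contextCadence === 0,
    dialectic: turn % cfg.dialecticCadence === 0,
  };
}

// Usage: cheap base layer every turn, dialectic layer every third turn.
const cfg: CadenceConfig = { contextCadence: 1, dialecticCadence: 3 };
for (let turn = 0; turn < 4; turn++) {
  console.log(turn, layersToRefresh(turn, cfg));
}
```

Turns that skip a refresh would reuse the cached layer text, which keeps the dialectic call off the hot path.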

5. Messaging gateway

  • GatewayRunner in gateway/run.py — long-running daemon, started/stopped via hermes gateway start|stop.
  • BasePlatformAdapter in gateway/platforms/base.py defines the contract. Required: connect, disconnect, send, send_typing, get_chat_info. Optional: send_document, send_voice, send_image_file, etc.
  • Adapters: telegram.py, discord.py, slack.py, plus WhatsApp, Signal, Email.
  • Session keys: agent:main:{platform}:{chat_type}:{chat_id}, built via gateway/session.build_session_key().
  • Per-user group sessions: group_sessions_per_user (default true) — each sender in a group gets isolated state.
  • State persistence: SQLite (~/.hermes/state.db) for metadata + JSONL transcripts in ~/.hermes/sessions/.
  • Streaming: progressive message edits when platform supports it (SUPPORTS_MESSAGE_EDITING), driven by streaming.transport: edit.
  • Voice: auto-TTS gated by voice.auto_tts + per-chat /voice on|off|tts.
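The session-key scheme is simple enough to pin down in a few lines. This is a TypeScript sketch of building and parsing `agent:main:{platform}:{chat_type}:{chat_id}`; the function names and validation rules are assumptions, not Hermes' `build_session_key()`. The per-user group variant is not specified above, so it is deliberately left out.

```typescript
// Sketch of the session-key format agent:main:{platform}:{chat_type}:{chat_id}.
// Validation rules here are assumptions: platform and chat_type act as
// delimiters, so they must not contain ':'; chat_id may (e.g. Matrix room ids).
interface SessionKeyParts {
  platform: string;
  chatType: string;
  chatId: string;
}

const PREFIX = "agent:main:";

function buildSessionKey(p: SessionKeyParts): string {
  for (const field of [p.platform, p.chatType]) {
    if (field.length === 0 || field.includes(":")) {
      throw new Error(`invalid session key component: "${field}"`);
    }
  }
  if (p.chatId.length === 0) throw new Error("chatId must be non-empty");
  return `${PREFIX}${p.platform}:${p.chatType}:${p.chatId}`;
}

function parseSessionKey(key: string): SessionKeyParts {
  if (!key.startsWith(PREFIX)) throw new Error(`not a session key: ${key}`);
  const rest = key.slice(PREFIX.length);
  const i = rest.indexOf(":");
  const j = rest.indexOf(":", i + 1);
  if (i <= 0 || j <= i + 1 || j === rest.length - 1) {
    throw new Error(`malformed session key: ${key}`);
  }
  // chatId keeps everything after the second ':' so ids containing ':' survive.
  return {
    platform: rest.slice(0, i),
    chatType: rest.slice(i + 1, j),
    chatId: rest.slice(j + 1),
  };
}

const key = buildSessionKey({ platform: "telegram", chatType: "group", chatId: "-100123" });
console.log(key); // agent:main:telegram:group:-100123
```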

6. Cron scheduler

  • cron/jobs.py + cron/scheduler.py.
  • Jobs stored as JSON, support multiple schedule formats, can attach skills and scripts, deliver to any platform.

7. Subagent / parallelization

  • delegate_task tool spawns isolated subagents that share parent's iteration budget (no runaway loops).
  • execute_code tool: agent writes Python that calls Hermes tools via RPC — collapses multi-step pipelines into a single inference call.
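The "shared iteration budget" rule for delegate_task is what prevents runaway nested delegation. A minimal TypeScript sketch, assuming the budget is a single counter object passed by reference to the parent and every subagent; `IterationBudget` and `runAgentLoop` are hypothetical names.

```typescript
// Sketch: one iteration budget shared by a parent agent and every subagent it
// delegates to, so nested delegation cannot loop past the parent's limit.
class IterationBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  /** Consumes one iteration; returns false once the shared budget is spent. */
  tryConsume(): boolean {
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}

// Each agent runs until it finishes its own work or the shared budget runs out.
function runAgentLoop(name: string, wantedSteps: number, budget: IterationBudget, log: string[]): number {
  let steps = 0;
  while (steps < wantedSteps && budget.tryConsume()) {
    steps += 1;
    log.push(`${name}:${steps}`);
  }
  return steps;
}

const budget = new IterationBudget(5);
const log: string[] = [];
runAgentLoop("parent", 2, budget, log);
// Two delegated subagents draw from the same pool.
runAgentLoop("sub-a", 2, budget, log);
runAgentLoop("sub-b", 10, budget, log); // only 1 iteration is left for sub-b
console.log(log, budget.remaining);
```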

8. Seven terminal backends

  • All implement BaseEnvironment ABC under tools/environments/.
  • Backends: LocalEnvironment, DockerEnvironment, SSHEnvironment, SingularityEnvironment, ModalEnvironment, plus Daytona and Vercel Sandbox.
  • Selection driven by terminal.backend in ~/.hermes/config.yaml (or TERMINAL_ENV env var). _get_env_config() + _create_environment() in tools/terminal_tool.py.

9. Trajectory recording + Atropos RL

  • environments/hermes_base_env.py: HermesAgentBaseEnv extends Atropos BaseEnv.
  • Trajectory recording substrate for evaluation and RL training.

10. Provider/model routing

  • hermes_cli/runtime_provider.py resolves provider + model with cost/speed/quality preferences.
  • Supports Nous Portal, OpenRouter, OpenAI, NVIDIA NIM, MiniMax, Kimi, z.ai, Hugging Face, custom endpoints.

11. Plugin system

  • Four discovery sources: bundled (<repo>/plugins/), user (~/.hermes/plugins/), project (./.hermes/plugins/ if HERMES_ENABLE_PROJECT_PLUGINS=1), and pip entry points (hermes_agent.plugins).
  • PluginManager in hermes_cli/plugins.py runs discover_and_load(). Later sources override earlier on collision.
  • Plugins register via register(ctx):
    • Lifecycle hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end.
    • New tools: ctx.register_tool(...).
    • CLI subcommands: ctx.register_cli_command(...).
  • Specialized: MemoryProvider (only one active), context engines (only one active).
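The collision rule ("later sources override earlier") plus `register(ctx)` is what a TS plugin layer would need to reproduce. A sketch under assumptions: `PluginSource`, `discoverPlugins`, `Registry` and the `PluginContext` shape are hypothetical TS analogs, not Hermes' `PluginManager` API.

```typescript
// Sketch: sources are loaded in order (bundled, user, project, entry points)
// and a later source wins when two plugins share a name. Each surviving
// plugin then registers hooks and tools through a context object.
type HookName =
  | "pre_tool_call" | "post_tool_call"
  | "pre_llm_call" | "post_llm_call"
  | "on_session_start" | "on_session_end";

interface PluginContext {
  registerHook(hook: HookName, fn: (payload: unknown) => void): void;
  registerTool(name: string, fn: (args: unknown) => unknown): void;
}

interface Plugin {
  name: string;
  register(ctx: PluginContext): void;
}

interface PluginSource {
  label: string; // e.g. "bundled", "user", "project", "entry-points"
  plugins: Plugin[];
}

function discoverPlugins(sources: PluginSource[]): Map<string, { plugin: Plugin; from: string }> {
  const byName = new Map<string, { plugin: Plugin; from: string }>();
  for (const source of sources) {
    for (const plugin of source.plugins) {
      // Map.set overwrites, so a later source wins on a name collision.
      byName.set(plugin.name, { plugin, from: source.label });
    }
  }
  return byName;
}

class Registry implements PluginContext {
  hooks = new Map<HookName, Array<(payload: unknown) => void>>();
  tools = new Map<string, (args: unknown) => unknown>();

  registerHook(hook: HookName, fn: (payload: unknown) => void): void {
    const list = this.hooks.get(hook) ?? [];
    list.push(fn);
    this.hooks.set(hook, list);
  }

  registerTool(name: string, fn: (args: unknown) => unknown): void {
    this.tools.set(name, fn);
  }
}

// Usage: a user plugin shadows a bundled plugin with the same name.
const mk = (name: string, tool: string): Plugin => ({
  name,
  register: (ctx) => ctx.registerTool(tool, () => tool),
});
const loaded = discoverPlugins([
  { label: "bundled", plugins: [mk("notes", "notes_v1")] },
  { label: "user", plugins: [mk("notes", "notes_v2")] },
]);
const registry = new Registry();
loaded.forEach(({ plugin }) => plugin.register(registry));
console.log(loaded.get("notes")?.from, Array.from(registry.tools.keys()));
```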

12. Other notable subsystems

  • Context compression: agent/context_compressor.py summarizes turns near limits.
  • Prompt caching: agent/prompt_caching.py applies Anthropic cache breakpoints.
  • ACP server: acp_adapter/ exposes Hermes as IDE-native agent over stdio/JSON-RPC for VS Code, Zed, JetBrains.
  • Voice mode + TTS as first-class.

What this repo (codegraff) has today

| Subsystem | Status | File |
|---|---|---|
| Agent core / main loop | yes | crates/forge_app/src/agent_executor.rs, app.rs |
| Provider routing + DTO transforms | yes (richer than Hermes) | crates/forge_app/src/dto/{anthropic,google,openai}/ |
| MCP host | yes | crates/forge_infra (paginated tools, structured results) |
| TUI | yes | crates/codegraff-tui/src/main.rs (~7900 LOC) |
| Slash command palette | yes | recent v0.1.5 work |
| Subagent w/ model override | yes | v0.1.5 |
| Trajectory recording | yes | forge_app/src/trajectory_recorder.rs, forge_repo/src/trajectory/ |
| /trace, /resume | yes | v0.1.5 |
| Skills (read-only, agentskills.io layout) | yes | forge_domain/src/skill.rs, forge_services/src/tool_services/skill.rs, .forge/skills/ |
| Context compression | yes | forge_app/src/compact.rs (931 LOC) |
| Prompt caching | yes | transforms in OpenAI/Anthropic DTO |
| N-API SDK | yes (on release/0.1.53) | sdk/typescript/ |
| Headless mode | yes | forge_main/src/main.rs (-p, stdin, --conversation-id) |
| Conversation dump/load | yes | forge_main/src/cli.rs (conversation dump/show) |

Gap analysis — Hermes vs codegraff today

| Hermes subsystem | codegraff status | Where it gets built |
|---|---|---|
| Agent core | yes | |
| Skills (read-only) | yes | |
| Skill mutation API | no | codegraff (Part A) |
| Skill self-improvement loop | no | graff-agent (uses mutation API) |
| Autonomous skill creation nudges | no | graff-agent (review prompt injection) |
| MEMORY.md / USER.md files | no | graff-agent (memory manager sidecar) |
| FTS5 session search | no | codegraff (Part A: virtual table + SDK) |
| Periodic review prompts (memory + skill) | no | graff-agent (cron-driven prompt injection) |
| Honcho / dialectic user modeling | no | graff-agent (memory provider plugin) |
| Pluggable memory provider trait | no | codegraff (Part A) |
| Plugin system (lifecycle hooks, register_tool, register_cli) | no | codegraff (Part A, minimal first) |
| Messaging gateway daemon | no | graff-agent (per-platform adapters) |
| Per-platform adapters (Telegram, Discord, Slack, WhatsApp, Signal, Email) | no | graff-agent |
| Session-key routing (agent:main:{platform}:{chat_type}:{chat_id}) | no | graff-agent |
| Group-chat per-user sessions | no | graff-agent |
| Streaming message edits per platform | no | graff-agent |
| Voice memo transcription + auto-TTS | no | graff-agent |
| Cron scheduler | no | graff-agent |
| Pluggable execution environments (BaseEnvironment trait) | no | codegraff (trait) + graff-agent (Docker/SSH/Modal/Daytona/Vercel adapters) |
| delegate_task parallelism | partial (subagents exist) | codegraff polish (already mostly here) |
| execute_code tool (Python RPC into agent tools) | no | graff-agent (Node/TS sidecar) |
| ACP server (IDE integration) | no | codegraff (separate roadmap) |
| Auxiliary LLM client (cheap model for side tasks) | partial | codegraff (formalize) |
| Trajectory subscription stream | partial (write-only via recorder) | codegraff (Part A: broadcast → SDK) |
| Pending-nudges queue | no | codegraff (Part A: table + SDK) |
| Conversation export/import (JSONL transcripts) | partial (JSON dump exists) | codegraff (already close) |

Two repos, clear separation

┌──────────────────────────────────────┐    ┌──────────────────────────────────────┐
│  codegraff  (this repo)              │    │  graff-agent  (NEW separate repo)    │
│                                      │    │                                      │
│  Rust workspace:                     │    │  TypeScript / Node monorepo:         │
│   - graff binary (TUI)               │    │   - depends on @codegraff/sdk        │
│   - @codegraff/sdk (N-API)           │──▶ │     from npm — no Rust toolchain     │
│   - SQLite + diesel + FTS5           │ npm│     required to develop              │
│   - MCP host                         │    │                                      │
│   - skill mutation API               │    │  Packages:                           │
│   - trajectory broadcast             │    │   - graff-gateway (per-platform)     │
│   - pending_nudges queue             │    │   - graff-cron                       │
│   - memory_provider trait            │    │   - graff-memd  (learning loop)      │
│   - plugin system (minimal)          │    │   - graff-honcho (memory provider)   │
│   - exec_environment trait           │    │   - graff-runtimes (Docker/SSH/...)  │
│                                      │    │   - graff-shared                     │
│  Stays focused on agent core +       │    │                                      │
│  the substrate everything plugs into.│    │  Ships its own CLI (`graff-agent`)   │
└──────────────────────────────────────┘    │  that orchestrates the daemons.      │
                                            └──────────────────────────────────────┘

Why split:

  • graff-agent developers don't need a Rust toolchain — npm install pulls prebuilt N-API binaries.
  • codegraff keeps shipping the core and SDK on its own cadence; graff-agent ships gateway/cron/memd independently.
  • A bug in the Telegram adapter never blocks a release of the core agent.
  • TS ecosystem (Bun, npm, platform SDKs that mostly target JS) without compromising the Rust monorepo.
  • Issues, releases, roadmaps stay scoped.

Part A — work that lands in codegraff (this repo)

These are the prerequisites that make graff-agent buildable. Each becomes a follow-up issue.

A.1 — SDK additions (sdk/typescript/src/lib.rs + wire.rs)

  • Trajectory subscription stream: graff.trajectory.subscribe(conversationId): AsyncIterable<TrajectoryEvent>. Tokio broadcast → N-API ThreadsafeFunction.
  • Skill mutation API: graff.skills.create | update | patch | writeFile | delete | removeFile. Mirrors Hermes' skill_manage actions (update corresponds to Hermes' edit).
  • Compact pipeline exposed: graff.conversations.compact(cid): Promise<Digest>, reusing forge_app/src/compact.rs.
  • Pending-nudges queue: graff.nudges.enqueue(cid, message).
  • User-profile API: graff.user.facts.{list, upsert, delete}.
  • Recall API: graff.search.recall(query, opts): Promise<RecallHit[]> over the FTS5 table.
  • Context-engine + memory-provider hooks: a register-time interface so a memory provider (e.g. graff-honcho) can inject system-prompt context every turn (onSystemPromptAssemble).
  • Lifecycle hooks: onPreToolCall, onPostToolCall, onPreLLMCall, onPostLLMCall, onSessionStart, onSessionEnd. Sidecars subscribe to them; this is part 1 of the minimal plugin system.
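On the JS side, a ThreadsafeFunction delivers events as push-style callbacks, while the proposed `graff.trajectory.subscribe()` returns an `AsyncIterable`. The bridge between the two is a small buffered queue. This is a sketch, assuming a hypothetical `TrajectoryEvent` shape and a hypothetical `createEventStream` helper; it is not the SDK's implementation.

```typescript
// Sketch: adapting a push-style native callback into an AsyncIterable.
// TrajectoryEvent's shape and createEventStream are assumptions.
interface TrajectoryEvent {
  conversationId: string;
  kind: string; // e.g. "ToolCall", "TaskComplete"
}

interface EventStream<T> {
  push(event: T): void; // called from the native callback
  close(): void;        // called when the broadcast channel closes
  iterable: AsyncIterable<T>;
}

function createEventStream<T>(): EventStream<T> {
  const buffered: T[] = [];
  const waiting: Array<(r: IteratorResult<T>) => void> = [];
  let closed = false;

  return {
    push(event) {
      if (closed) return;
      // Hand the event straight to a pending consumer, otherwise buffer it.
      const resolve = waiting.shift();
      if (resolve) resolve({ value: event, done: false });
      else buffered.push(event);
    },
    close() {
      closed = true;
      // Wake any pending consumers with a final "done" result.
      for (const resolve of waiting.splice(0)) resolve({ value: undefined, done: true });
    },
    iterable: {
      [Symbol.asyncIterator](): AsyncIterator<T> {
        return {
          next(): Promise<IteratorResult<T>> {
            if (buffered.length > 0) {
              return Promise.resolve({ value: buffered.shift() as T, done: false });
            }
            if (closed) return Promise.resolve({ value: undefined, done: true });
            return new Promise((resolve) => waiting.push(resolve));
          },
        };
      },
    },
  };
}

// Usage: a sidecar consuming events with for-await.
async function collectKinds(events: AsyncIterable<TrajectoryEvent>): Promise<string[]> {
  const kinds: string[] = [];
  for await (const ev of events) kinds.push(ev.kind);
  return kinds;
}

const stream = createEventStream<TrajectoryEvent>();
const collected = collectKinds(stream.iterable);
// Simulate the native side delivering events, then closing the channel.
stream.push({ conversationId: "c1", kind: "ToolCall" });
setTimeout(() => {
  stream.push({ conversationId: "c1", kind: "TaskComplete" });
  stream.close();
}, 0);
```

The buffer is unbounded here; a real binding would likely need a cap or a lagged-consumer policy, similar to what Tokio's broadcast channel does on its side.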

A.2 — Backing changes in forge_app / forge_repo

  • pending_nudges table + poll point in the conversation loop.
  • user_profile table.
  • FTS5 virtual table over trajectory_events (or a derived messages table) + diesel migration.
  • MemoryProvider trait in forge_domain + injection point in system-prompt assembly (mirrors Hermes' Layer-3 Honcho block).
  • ExecutionEnvironment trait + Local impl. (Docker/SSH/Modal/etc. land in graff-agent.)
  • Skill repository write path (SkillRepository::write, delete, patch). YAML frontmatter parsing for progressive disclosure.

A.3 — Release

  • Wire npm publish step in .github/workflows/sdk-typescript.yml (CI already builds the matrix; assemble job stops short of publish).
  • First published release: @codegraff/sdk@0.2.0.

A.4 — graff CLI surface (small)

  • graff conversation export <cid> → JSONL transcript (Hermes-style ~/.hermes/sessions/).
  • graff doctor for environment + provider sanity (Hermes parity).
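JSONL suits transcripts because each line is an independent JSON record that can be appended to and streamed. A sketch of what an export might serialize; the `TranscriptMessage` fields are assumptions, not the current `conversation dump` format.

```typescript
// Sketch of a JSONL transcript: one JSON object per line. The message fields
// are hypothetical; the real export would follow graff's conversation model.
interface TranscriptMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  timestamp: string; // ISO-8601
}

function toJsonl(messages: TranscriptMessage[]): string {
  // JSON.stringify escapes embedded newlines, so each record stays on one line.
  return messages.map((m) => JSON.stringify(m)).join("\n") + (messages.length ? "\n" : "");
}

function fromJsonl(text: string): TranscriptMessage[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as TranscriptMessage);
}

const msgs: TranscriptMessage[] = [
  { role: "user", content: "hello\nworld", timestamp: "2025-01-01T00:00:00Z" },
  { role: "assistant", content: "hi", timestamp: "2025-01-01T00:00:01Z" },
];
const jsonl = toJsonl(msgs);
console.log(jsonl);
```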

Part B — graff-agent repo (new, future)

Layout once Part A ships:

graff-agent/
├── package.json                # pnpm workspaces root
├── packages/
│   ├── graff-shared/           # session keys, SQLite schema, types
│   ├── graff-memd/             # closed learning loop daemon
│   ├── graff-gateway/          # platform adapters
│   │   ├── src/platforms/
│   │   │   ├── base.ts         # PlatformAdapter abstract
│   │   │   ├── telegram.ts
│   │   │   ├── discord.ts
│   │   │   ├── slack.ts
│   │   │   ├── whatsapp.ts
│   │   │   ├── signal.ts
│   │   │   └── email.ts
│   │   └── src/runner.ts       # GatewayRunner equivalent
│   ├── graff-cron/             # jobs.json + scheduler
│   ├── graff-honcho/           # memory provider plugin
│   ├── graff-runtimes/         # Docker, SSH, Modal, Daytona, Vercel
│   └── graff-agent/            # `graff-agent` CLI: start/stop/setup
├── docs/
└── examples/

B.1 — graff-memd (closed learning loop)

Mirrors Hermes' nudge + review system, file-by-file analog:

| Hermes file | graff-memd equivalent |
|---|---|
| _MEMORY_REVIEW_PROMPT in run_agent.py | packages/graff-memd/src/prompts/memory-review.ts |
| _SKILL_REVIEW_PROMPT | prompts/skill-review.ts |
| _COMBINED_REVIEW_PROMPT | prompts/combined-review.ts |
| _summarize_background_review_actions | digest/summarize.ts |
| agent/context_compressor.py | reuse graff.conversations.compact() from SDK |
| agent/memory_manager.py | memory/manager.ts (owns MEMORY.md and USER.md in ~/.graff/) |

Loop: subscribe to trajectory events → on TaskComplete, run review prompt via runAgent({ model: cheapModel }) → if agent calls skill mutation tools, they hit the SDK API → on cron tick, inject pending_nudges for review prompts.
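The loop above can be sketched with the SDK modeled as an interface, since none of the Part A calls exist yet. `MemdSdk`, `runReview`, `enqueueNudge`, the prompt strings, and the `"ToolCall"` event kind are all hypothetical stand-ins; only the TaskComplete trigger, the cheap model, and the cron-driven nudges come from the description.

```typescript
// Sketch of the graff-memd control flow with the proposed SDK stubbed out.
interface MemdEvent {
  kind: "TaskComplete" | "ToolCall"; // "ToolCall" is a hypothetical second kind
  conversationId: string;
}

interface MemdSdk {
  runReview(conversationId: string, prompt: string, model: string): Promise<void>;
  enqueueNudge(conversationId: string, message: string): void;
}

const CHEAP_MODEL = "cheap-model"; // placeholder, not a real model id
const SKILL_REVIEW_PROMPT = "Review this task; patch or create skills if a reusable technique emerged.";
const MEMORY_REVIEW_PROMPT = "Review recent sessions; update MEMORY.md / USER.md if needed.";

async function runMemdLoop(events: AsyncIterable<MemdEvent>, sdk: MemdSdk): Promise<void> {
  for await (const ev of events) {
    // Only finished tasks trigger a review. Any skill mutation tool calls the
    // review agent makes would go through the SDK's skill API.
    if (ev.kind === "TaskComplete") {
      await sdk.runReview(ev.conversationId, SKILL_REVIEW_PROMPT, CHEAP_MODEL);
    }
  }
}

// On each cron tick, queue a memory-review nudge; the conversation loop picks
// it up at its pending_nudges poll point.
function onCronTick(activeConversations: string[], sdk: MemdSdk): void {
  for (const cid of activeConversations) sdk.enqueueNudge(cid, MEMORY_REVIEW_PROMPT);
}

// Usage with a fake SDK that records calls.
const calls: string[] = [];
const fakeSdk: MemdSdk = {
  async runReview(cid) { calls.push(`review:${cid}`); },
  enqueueNudge(cid) { calls.push(`nudge:${cid}`); },
};
async function* demoEvents(): AsyncGenerator<MemdEvent> {
  yield { kind: "ToolCall", conversationId: "c1" };
  yield { kind: "TaskComplete", conversationId: "c1" };
}
const finished = runMemdLoop(demoEvents(), fakeSdk).then(() => {
  onCronTick(["c1", "c2"], fakeSdk);
  return calls;
});
```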

B.2 — graff-gateway

Direct port of Hermes' gateway/:

  • PlatformAdapter abstract (matches Hermes' BasePlatformAdapter).
  • Session key format identical: agent:main:{platform}:{chat_type}:{chat_id}.
  • Per-platform: webhook (preferred) + long-poll fallback. Streaming message edits where supported. Voice-in via Whisper.
  • SQLite at ~/.graff/gateway.db for (platform, chat_id) → conversation_id mapping; main conversation state stays in @codegraff/sdk's SQLite.
  • Order: Telegram → Discord → Slack → WhatsApp → Signal → Email.
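A TS `PlatformAdapter` mirroring the BasePlatformAdapter contract might look like the sketch below, with streaming edits gated on a capability flag. Method names are camelCased versions of the Hermes ones; `editMessage`, `supportsMessageEditing` (standing in for SUPPORTS_MESSAGE_EDITING), `streamReply`, and all signatures are assumptions.

```typescript
// Sketch of an abstract adapter: required methods are abstract, optional
// media/edit capabilities are optional methods adapters may implement.
interface ChatInfo { chatId: string; chatType: string; title?: string }

abstract class PlatformAdapter {
  abstract readonly platform: string;
  supportsMessageEditing = false;

  abstract connect(): Promise<void>;
  abstract disconnect(): Promise<void>;
  /** Sends text and returns the platform's message id. */
  abstract send(chatId: string, text: string): Promise<string>;
  abstract sendTyping(chatId: string): Promise<void>;
  abstract getChatInfo(chatId: string): Promise<ChatInfo>;

  sendDocument?(chatId: string, path: string): Promise<string>;
  sendVoice?(chatId: string, path: string): Promise<string>;
  editMessage?(chatId: string, messageId: string, text: string): Promise<void>;
}

// Streams a reply: edit one message in place when supported, otherwise send
// the full text once at the end.
async function streamReply(adapter: PlatformAdapter, chatId: string, chunks: string[]): Promise<void> {
  const edit = adapter.supportsMessageEditing ? adapter.editMessage?.bind(adapter) : undefined;
  let text = "";
  let messageId: string | undefined;
  for (const chunk of chunks) {
    text += chunk;
    if (!edit) continue;
    if (messageId === undefined) messageId = await adapter.send(chatId, text);
    else await edit(chatId, messageId, text);
  }
  if (!edit) await adapter.send(chatId, text);
}

// Usage: an in-memory adapter that logs what would hit the platform API.
class FakeAdapter extends PlatformAdapter {
  readonly platform = "fake";
  log: string[] = [];
  constructor(editable: boolean) {
    super();
    this.supportsMessageEditing = editable;
  }
  async connect() {}
  async disconnect() {}
  async send(_chatId: string, text: string) { this.log.push(`send:${text}`); return "m1"; }
  async sendTyping(_chatId: string) {}
  async getChatInfo(chatId: string) { return { chatId, chatType: "dm" }; }
  async editMessage(_chatId: string, _messageId: string, text: string) { this.log.push(`edit:${text}`); }
}

const editable = new FakeAdapter(true);
const plain = new FakeAdapter(false);
const streamed = streamReply(editable, "c1", ["Hel", "lo"])
  .then(() => streamReply(plain, "c1", ["Hel", "lo"]));
```

Keeping edit support as a runtime flag, rather than only checking whether the method exists, lets one adapter disable edits per chat or per rate-limit state.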

B.3 — graff-cron

  • ~/.graff/cron/jobs.json (Hermes uses JSON not TOML — match for portability).
  • Multiple schedule formats (cron, every-N, at-time).
  • Jobs can attach skill: <name> + script: <path> + delivery: {platform, chat_id}.
  • Output captured to ~/.graff/cron/runs/<job_id>/<run_id>.jsonl.
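A jobs.json entry and its next-run computation could be typed as below. The skill/script/delivery fields come from the bullets above; `id`, `prompt`, and the schedule encodings are assumptions. Cron expressions are deliberately left to a cron-parsing library rather than reimplemented here.

```typescript
// Sketch of a graff-cron job record and next-run logic for two of the three
// schedule formats. Field names other than skill/script/delivery are assumed.
type Schedule =
  | { kind: "every"; everyMs: number }  // every-N
  | { kind: "at"; atIso: string }       // one-shot at a time
  | { kind: "cron"; expr: string };     // e.g. "0 9 * * 1-5"

interface CronJob {
  id: string;
  prompt: string;
  schedule: Schedule;
  skill?: string;
  script?: string;
  delivery?: { platform: string; chat_id: string };
}

/** Returns the next run time in epoch ms, or null if the job will not run again. */
function nextRun(job: CronJob, lastRunMs: number | null, nowMs: number): number | null {
  const s = job.schedule;
  switch (s.kind) {
    case "every":
      // First run fires immediately; later runs are spaced by everyMs.
      return lastRunMs === null ? nowMs : lastRunMs + s.everyMs;
    case "at": {
      const at = Date.parse(s.atIso);
      if (Number.isNaN(at)) throw new Error(`bad timestamp: ${s.atIso}`);
      return lastRunMs === null ? at : null; // one-shot
    }
    case "cron":
      throw new Error("cron expressions need a parser; out of scope for this sketch");
  }
}

const job: CronJob = {
  id: "daily-digest",
  prompt: "Summarize yesterday's sessions",
  schedule: { kind: "every", everyMs: 24 * 60 * 60 * 1000 },
  skill: "digest",
  delivery: { platform: "telegram", chat_id: "12345" },
};
const t0 = Date.parse("2025-01-01T00:00:00Z");
console.log(nextRun(job, null, t0) === t0);
```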

B.4 — graff-honcho

  • Implements the codegraff MemoryProvider trait (added in Part A) via the SDK plugin hooks.
  • Stores: session summary, user representation, peer card, persistent conclusions (mirrors Hermes).
  • Tools registered via plugin: honcho_profile, honcho_search, honcho_context, honcho_reasoning, honcho_conclude.
  • Two-layer system-prompt injection (base + dialectic).

B.5 — graff-runtimes

  • TS interface ExecutionEnvironment (matches the codegraff Rust trait, talks to it via SDK).
  • Adapters: Docker, SSH, Singularity, Modal, Daytona, VercelSandbox. (Local stays in codegraff.)
  • Each adapter is its own subpackage so users only install what they need.
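Because each adapter ships as its own subpackage, selection by backend name works naturally as a registry that subpackages populate when imported. A sketch under assumptions: the Rust trait's methods are not designed yet, so `execute`/`cleanup`, `registerEnvironment`, `createEnvironment`, and the `dry-run` backend are all hypothetical.

```typescript
// Sketch: a TS ExecutionEnvironment plus a name-keyed registry. Each adapter
// subpackage (docker, ssh, modal, ...) would call registerEnvironment on
// import, so only installed backends are selectable.
interface ExecResult { stdout: string; stderr: string; exitCode: number }

interface ExecutionEnvironment {
  readonly name: string;
  execute(command: string, opts?: { cwd?: string; timeoutMs?: number }): Promise<ExecResult>;
  cleanup(): Promise<void>;
}

type EnvFactory = (config: Record<string, unknown>) => ExecutionEnvironment;

const factories = new Map<string, EnvFactory>();

function registerEnvironment(name: string, factory: EnvFactory): void {
  if (factories.has(name)) throw new Error(`backend "${name}" already registered`);
  factories.set(name, factory);
}

function createEnvironment(name: string, config: Record<string, unknown> = {}): ExecutionEnvironment {
  const factory = factories.get(name);
  if (!factory) {
    const known = Array.from(factories.keys()).join(", ") || "none";
    throw new Error(`unknown backend "${name}" (installed: ${known})`);
  }
  return factory(config);
}

// Stand-in backend for demonstration: reports commands instead of running them.
registerEnvironment("dry-run", () => ({
  name: "dry-run",
  async execute(command) {
    return { stdout: `would run: ${command}`, stderr: "", exitCode: 0 };
  },
  async cleanup() {},
}));

const env = createEnvironment("dry-run");
const ran = env.execute("ls -la");
```

Failing with the list of installed backends makes a missing `npm install` for, say, the Docker adapter obvious at startup rather than at first command.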

B.6 — graff-agent CLI

Mirrors Hermes' top-level CLI:

  • graff-agent: starts an interactive session by shelling out to graff for the TUI.
  • graff-agent gateway start|stop|setup
  • graff-agent cron list|add|run|delete
  • graff-agent skills hub (community hub fetch)
  • graff-agent doctor
  • graff-agent setup (full wizard)

Roadmap

Phase 0 — codegraff Part A (this issue, broken into ~10 sub-issues)
Land all SDK + backing changes; cut @codegraff/sdk@0.2.0 to npm.

Phase 1 — graff-agent repo bootstrap
Empty repo, pnpm workspaces, CI, @codegraff/sdk@0.2.0 integrated, smoke test.

Phase 2 — graff-memd v0

  • FTS5 recall via SDK
  • Session-end digester (compact pipeline)
  • Skill self-improvement loop (review-gated)
  • Memory + skill review nudges
  • MEMORY.md / USER.md files

Phase 3 — graff-gateway Telegram only

  • Adapter contract + runner
  • Webhook + streaming edits + voice-in
  • Cross-platform session-key map

Phase 4 — graff-cron

Phase 5 — More platforms (Discord, Slack, WhatsApp, Signal, Email)

Phase 6 — graff-honcho memory provider

Phase 7 — graff-runtimes (Docker first, then SSH, then sandbox providers)

Phase 8 — execute_code tool, ACP server (parallel tracks once core lands)


Open questions (carried)

  • graff-memd language: TS via SDK (consistency, faster iteration) vs Rust (zero N-API overhead, direct DB). Default: TS. Rewrite later if profiling demands.
  • Skill format: stay 100% on agentskills.io spec, or extend with version + confidence for the self-improvement loop? Decide when skill mutation API lands.
  • Memory provider activation: only-one-active (Hermes' rule) vs stacked? Mirror Hermes for now.
  • Should the gateway's per-platform SQLite live in ~/.graff/ or share @codegraff/sdk's DB? Probably separate, joined by conversation_id.

Out of scope here

  • Copying Hermes code — graff-agent is an independent re-implementation atop graff.
  • RL trajectory generation / Atropos environments — recording substrate exists; training is its own workstream.
  • Building any graff-agent package in this issue — that work happens in the new repo. This issue tracks only Part A.

Sub-issues (filed)

Tracked here for execution order. All target release/0.1.53.

P0 — substrate (do first; everything else depends on these)

P1 — live data flow (graff-memd needs these to react in real time)

P2 — extensibility (gateway / cron / memd / honcho / runtimes plug in here)

P3 — release gate

P4 — CLI polish

Once all of P0–P3 land and @codegraff/sdk@0.2.0 is on npm, the new graff-agent repo is unblocked and Phase 1 begins there.
