Sprout × mesh-llm: in-process mesh node (serve/consume) + relay admission#798
Open
tlongwell-block wants to merge 30 commits into
Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
…787) Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
* mari/mesh-relay-trust: Add relay-owned mesh status publication Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Adds the Settings > Compute panel against Max's frozen mesh-llm Tauri command contract. Builds against the typed shapes but does not yet wire to live commands; those land in Max's lane (mesh_availability, mesh_node_status, mesh_start_node, mesh_stop_node, mesh_installed_models, mesh_agent_preset). UI will resolve when his commands ship.
- desktop/src/features/mesh-compute/types.ts: type mirrors of Max's frozen command surface, sourced from the 2026-05-29 freeze posts.
- desktop/src/features/mesh-compute/api.ts: typed wrappers around invokeTauri for each command. mesh_search_models is reserved as a signature-only export for v2; calling it throws.
- desktop/src/features/mesh-compute/classifyModelRef.ts + test: pure ref-classification matching mesh runtime/mod.rs:3390 (catalog / hf:// / local path). Drives the inline 'Looks like a …' hint.
- hooks/useMeshAvailability.ts: 5s slow poll + focus refresh.
- hooks/useMeshNodeStatus.ts: 750ms poll while transitioning, 4s otherwise, so lifecycle changes don't stall.
- ui/MeshComputeSettingsCard.tsx: the rebuilt Share-compute surface. No raw mesh knobs (publish/auto/discovery), no kind:xxxx language, no endpoint id on the primary surface. Advanced is collapsed and carries Max VRAM + console URL. Footer states the architectural invariants (no public Nostr publish, no auto-discovery, no out-of-relay sharing) so a privacy-aware user trusts the toggle.
- SettingsPanels.tsx: new 'compute' section after 'channel-templates' using the Cpu icon.
Not yet implemented (queued):
- The Create-Agent 'Relay mesh' flow that pre-selects sprout-agent and pre-fills env vars via mesh_agent_preset. Pending agreement with Max on flow-vs-picker shape.
- Managed-agent row rendering for the typed LlmAuth (-32001) failure → 'Relay mesh denied this agent — check membership.'
Verified: pnpm typecheck, pnpm lint, pnpm test (348/348), pnpm check. Commands will fail at runtime until Max's mesh_* commands land; that's expected.
Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
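For readers who have not opened classifyModelRef.ts, here is a minimal sketch of the three-way classification the commit describes (catalog / hf:// / local path). The kind names and the path heuristics below are assumptions made for illustration; they are not copied from the real helper or from mesh runtime/mod.rs.

```typescript
// Hypothetical kind names; the real ModelRefKind lives in classifyModelRef.ts.
type ModelRefKind = "catalog" | "huggingface" | "local-path";

// Assumed rules (illustrative only, not the mesh runtime's actual logic):
//  - "hf://..." is a Hugging Face reference
//  - a leading "/", "./", "../" or "~/", or a ".gguf" suffix, means a local path
//  - any other non-empty string is treated as a catalog name
function classifyModelRef(raw: string): ModelRefKind | null {
  const ref = raw.trim();
  if (ref.length === 0) return null; // nothing typed yet, so no hint
  if (ref.startsWith("hf://")) return "huggingface";
  const looksLikePath =
    /^(\/|\.\/|\.\.\/|~\/)/.test(ref) || ref.toLowerCase().endsWith(".gguf");
  if (looksLikePath) return "local-path";
  return "catalog";
}
```

Keeping the function pure (string in, kind out) is what lets the inline hint be unit-tested without a running mesh node.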
…etection
Pure helpers + tests for the Create-Agent 'Run on relay mesh' flow Eva blessed. Not yet integrated with CreateAgentDialog; that integration lands when Max's mesh_agent_preset() command is on the integration tip and types compile against the real Tauri surface. Doing the helpers first keeps the dialog diff a one-liner-per-setter when it's time.
- meshAgentPresetPatch(): turn a MeshAgentPreset into the flat field patch the dialog applies via Object.assign-style fan-out across acpCommand / agentCommand / agentArgs / mcpCommand / model / envVars setters. Returns owned copies so the caller cannot mutate the preset.
- detectMeshPresetOverrides(): which user-set fields would the preset overwrite? Returns human-readable labels for the 'Using Relay mesh — overrides this persona's model' honest-over-silent copy Eva named as a requirement. Empty/null values are not treated as 'set'; a fresh draft is purely additive.
Verified: pnpm typecheck, pnpm check, pnpm test (358/358 including 10 new applyMeshAgentPreset tests covering: patch field mapping, defensive copy, no-override on empty/matching draft, override reporting for model/runtime/env-vars, env-var same-value-no-report, additive env-var-no-report, empty-string treated like null).
Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
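A rough sketch of how the two helpers could behave, based only on the behaviour listed in the commit message (defensive copy, empty treated like null, same-value and additive env vars not reported). The `MeshAgentPreset` and `AgentDraft` shapes and the label strings are assumptions; the real types come from the Tauri surface.

```typescript
// Hypothetical shapes for illustration; the real ones come from tauriMesh.ts.
interface MeshAgentPreset {
  acpCommand: string;
  agentCommand: string;
  agentArgs: string[];
  mcpCommand: string;
  model: string;
  envVars: Record<string, string>;
}
type MeshAgentPresetPatch = MeshAgentPreset;

interface AgentDraft {
  model?: string | null;
  agentCommand?: string | null;
  envVars?: Record<string, string>;
}

// Copy the array and the env map so callers cannot mutate the preset.
function meshAgentPresetPatch(preset: MeshAgentPreset): MeshAgentPresetPatch {
  return { ...preset, agentArgs: [...preset.agentArgs], envVars: { ...preset.envVars } };
}

// Empty strings count as "not set", same as null/undefined.
function isSet(value: string | null | undefined): value is string {
  return value != null && value !== "";
}

// Report only fields the user set AND the preset would change.
function detectMeshPresetOverrides(draft: AgentDraft, patch: MeshAgentPresetPatch): string[] {
  const labels: string[] = [];
  if (isSet(draft.model) && draft.model !== patch.model) labels.push("model");
  if (isSet(draft.agentCommand) && draft.agentCommand !== patch.agentCommand) labels.push("runtime");
  for (const [key, value] of Object.entries(draft.envVars ?? {})) {
    // Env vars the preset does not touch are additive and never reported.
    if (isSet(value) && key in patch.envVars && patch.envVars[key] !== value) {
      labels.push(`env var ${key}`);
    }
  }
  return labels;
}
```

An empty result means the preset is purely additive for that draft, so the dialog can skip the override warning.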
…mesh flow
Now that Max's foundation is on the integration tip, swap the scaffolded
type/api shims for the real `tauriMesh.ts` surface and ship the
Create-Agent flow Eva blessed.
- Delete `mesh-compute/types.ts` and `mesh-compute/api.ts`. All call
sites now import from `@/shared/api/tauriMesh` directly. `ModelRefKind`
(presentational) moves inline into `classifyModelRef.ts` where it's
used. Single source of truth.
- `MeshComputeSettingsCard` and the polling hooks now consume Max's
exact `MeshAvailability` / `MeshNodeStatus` / `MeshModelOption`
shapes. Card behavior unchanged.
- `RelayMeshAgentSection`: new component, the 'Run on relay mesh' flow
entry inside CreateAgentDialog. Renders as a rounded section with
a toggle + model dropdown. Greyed with mesh availability's
`reason` when `available === false`. On model pick, calls
`mesh_agent_preset(modelId)` and exposes both `modelId` and the
resolved `MeshAgentPresetPatch` to the parent so it can fan out
into existing setters. Renders the override warning
('Using Relay mesh overrides this agent's model') when
`detectMeshPresetOverrides` reports any clashing fields.
- `CreateAgentDialog`: adds `useMesh` + `meshModelId` state, hides the
backend 'Run on' select and the ACP runtime field when `useMesh`
is on, fans the preset out into the existing acpCommand /
agentCommand / agentArgs / mcpCommand / envVars setters via the
section's onModelIdChange callback, and includes `model:` in the
submit input. Relay-mesh always uses local backend (sprout-agent
+ OpenAI-compat env vars); `isProviderMode` is suppressed.
Submit guard blocks until a model is picked.
Verified locally:
- pnpm typecheck clean
- pnpm check clean (biome lint + format + file-size guard)
- pnpm test 358/358 (incl. 9 classifyModelRef + 10 applyMeshAgentPreset
tests; no new tests for the UI section — render testing for this
scope was disproportionate vs. typecheck + manual demo)
Queued for follow-up (out of this commit):
- ManagedAgentRow / agent-failure render: when `lastError` starts with
'Agent reported error: llm auth:' render 'Relay mesh denied this
agent — check your relay membership.' The seam is already shipped
in Max's commit (`-32001` + log-tail capture into `last_error`);
no UI currently renders `lastError` at all, so the friendly copy
is additive and can ship separately.
Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
…ference runbook) Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Closes the loop on Max's auth-failure seam. When a managed agent exits
with an HTTP 401/403 from the OpenAI-compatible mesh endpoint:
sprout-agent raises AgentError::LlmAuth (json_rpc_code -32001,
Display prefix 'llm auth: ...').
sprout-acp wraps it as 'Agent reported error: llm auth: ...'.
managed_agents/storage.rs recovers that line from read_log_tail and
writes it into ManagedAgent.lastError on nonzero exit.
ManagedAgentRow (new) reads lastError via friendlyAgentLastError,
promotes the auth-failure case to 'Relay mesh denied this
agent — check your relay membership.' rendered in
destructive color under the Status block. Generic
lastError content passes through verbatim so unrelated
failures still surface their text.
- desktop/src/features/agents/lib/friendlyAgentLastError.ts: pure
classifier returning { severity: 'denied' | 'generic'; copy } or
null. Matches both the sprout-acp wrap and the unwrapped
sprout-agent prefix; substring matches inside other messages are
NOT promoted (no lying about unrelated crashes).
- friendlyAgentLastError.test.mjs: 8 tests covering null/empty, both
matched prefixes, generic passthrough, whitespace trimming,
substring-not-at-start anti-promotion, and non-auth 'Agent reported
error: llm:' staying generic.
- ManagedAgentRow.tsx StatusBlock: renders the friendly copy when
non-null, with destructive coloring for denial and muted-foreground
for generic. Both row variants (expandable + non-expandable) thread
the same friendlyError through.
v1 limitation, explicitly named in the helper's doc comment: the typed
-32001 code from sprout-agent never reaches desktop structurally — ACP's
ObserverHandle is process-local in the child. The recovered string is
the seam we render against. Follow-up to make this fully structural is
ACP status file / desktop-owned observer sink.
Verified:
- pnpm typecheck clean
- pnpm check clean (lint + format + file-size, 528 files)
- pnpm test 366/366 (was 358; +8 new friendlyAgentLastError tests)
Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
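The classifier described in this commit can be sketched as below. The return shape, the two prefixes, and the denial copy are taken from the commit message; the constant names and the exact trimming order are assumptions, and the dash in the copy is written as a `\u2014` escape.

```typescript
type FriendlyAgentError = { severity: "denied" | "generic"; copy: string };

// Both the sprout-acp wrap and the bare sprout-agent prefix, per the commit message.
const AUTH_PREFIXES = ["Agent reported error: llm auth:", "llm auth:"];
const DENIED_COPY = "Relay mesh denied this agent \u2014 check your relay membership.";

function friendlyAgentLastError(lastError: string | null | undefined): FriendlyAgentError | null {
  const text = (lastError ?? "").trim();
  if (text.length === 0) return null; // nothing to render

  // Prefix match only: an "llm auth:" substring inside some other
  // message must not be promoted to a membership denial.
  const isAuthFailure = AUTH_PREFIXES.some((prefix) => text.startsWith(prefix));
  if (isAuthFailure) return { severity: "denied", copy: DENIED_COPY };

  // Unrelated failures pass through verbatim.
  return { severity: "generic", copy: text };
}
```

Because the typed -32001 code never reaches desktop, the match is on the recovered string; swapping this for a structural check later would only change the predicate, not the return shape.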
M1 proof that the mesh serve path loads a GGUF and generates over the local OpenAI endpoint, on real hardware. Single-node serve-and-self-consume (publish=false, auto_join=false, mDNS): the one-box loopback.
Verified: Qwen3.6-35B-A3B IQ4_XS loads into Metal on M4 Max, /v1/chat/completions returns 200 with coherent tokens.
NOTE: sets console_ui(true) to work around a serve::start readiness deadlock in mesh bd16da4 (status-poll vs headless console bind); see docs/mesh-llm-local-build.md. Fold into the e2e #[ignore] matrix once the upstream readiness fix lands.
Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
…lapsible-if)
Two -D warnings clippy errors under rustc 1.95 that would block CI:
- storage.rs: moved meaningful_agent_error_from_log above #[cfg(test)] mod tests
- mesh_llm/mod.rs: collapsed inner if into a match guard
No logic change.
Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
…esh crates, paste advisory)
cargo-deny fails on the pinned mesh-llm dependency tree:
- several crates use the Unlicense license (not previously in the allowlist)
- six mesh-llm-* workspace crates omit a per-crate license field; the mesh repo is MIT OR Apache-2.0 (workspace Cargo.toml + Apache-2.0 LICENSE): clarified
- RUSTSEC-2024-0436: paste unmaintained, transitive via iroh → netlink
advisories/bans/licenses/sources all pass locally after this. The clarify entries and paste ignore are removable once mesh sets per-crate license fields upstream.
Signed-off-by: npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
d7b4fe3 to
1903ea8
Compare
Resolves version-bump conflicts (CHANGELOG/package.json/tauri.conf/pubspec/Cargo.toml → 0.3.5) and the lib.rs file-size budget (kept 780 to cover both main's SIGINT handlers and our mesh command registrations). Cargo.lock regenerated with our mesh deps on top of main. No mesh logic touched; the deleted meshClassifyModelRef wrapper stays deleted. cargo-deny green.
Signed-off-by: npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
- trust rows take MEMBER_NSEC/STRANGER_NSEC instead of Keys::generate() (a generated key's membership is undefined, so it can't assert "member sees status")
- live_agent_completes_chat_over_mesh: real env-gated completion assertion via MESH_OPENAI_BASE; skips (not silent-passes) when no live endpoint
- live_split_model_completes: panics "not implemented" so --ignored can never report it green without a real multi-node split harness
Addresses Perci's e2e blocker: rows must not pass as tests while only eprintln.
… docs
- iroh_relay: add verify_bearer_rejects_expired_timestamp covering the ±60s window (defense vs observed-token replay outside the live admit moment).
- iroh_relay: doc-comment the ViaOwner deny arm: v1 explicitly rejects NIP-OA owner-delegated agents at iroh admission even when HTTP would admit them, keeping the mesh-compute trust boundary legibly tighter and pointing at the NIP-OA scope follow-up.
- mesh_status_publisher: SproutMeshStatus doc: endpoint_addr and mesh_id are dial metadata, not access grants; iroh admission via NIP-98 → relay membership is the only gate.
Addresses Eva's N3 punch-list item. The third bullet from the original proposal (ViaOwner-deny unit test) was scoped down to a doc-comment: asserting it requires a real AppState (db+redis+typesense via identity_archive::test_state), which fails the minimalness bar for a two-line pattern-match arm.
Signed-off-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
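To make the ±60s window concrete, here is a small sketch of the freshness check that a test like verify_bearer_rejects_expired_timestamp would exercise. The relay code is Rust; this TypeScript version, the function name, and the inclusive boundary are assumptions for illustration only.

```typescript
// Assumed skew allowance, matching the ±60s window named in the commit.
const MAX_SKEW_SECONDS = 60;

// `createdAt` is the bearer event's timestamp and `now` the relay clock,
// both as Unix seconds. Tokens too old OR too far in the future are rejected,
// so a captured token is useless outside the live admit moment.
function isWithinAdmissionWindow(createdAt: number, now: number): boolean {
  return Math.abs(now - createdAt) <= MAX_SKEW_SECONDS;
}
```

Taking `now` as a parameter instead of reading the clock inside the function is what makes the expired case cheap to test.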
added 2 commits
May 29, 2026 20:02
- Resolve the mesh-llm rev from Cargo.lock instead of hardcoding `bd16da4` in two workflow files; a dependency bump no longer needs a lockstep CI edit.
- Cache the prebuilt llama native libraries with actions/cache (restore/save, house style) keyed on the resolved rev + OS + backend, so the expensive build is skipped on a hit and invalidated automatically on a rev change.
- docs: the cached CI build is shipped, not a "follow-up"; update to match.
Addresses Perci N4 + Sami N5.
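As an illustration of resolving a pinned rev from Cargo.lock, the sketch below scans for a git `source` line and returns the commit hash after the `#`. The workflow itself presumably does this in shell; the function name and the repo-fragment matching are assumptions, and only the general `source = "git+<url>#<sha>"` lockfile shape is relied on.

```typescript
// Find the locked commit for a git dependency whose URL contains `repoFragment`.
// Returns the full 40-character hash, or null when no such entry exists.
function resolveGitRev(lockText: string, repoFragment: string): string | null {
  for (const line of lockText.split("\n")) {
    // Git sources look like: source = "git+https://host/org/repo?rev=...#<sha>"
    const match = /^source = "git\+([^"#]+)#([0-9a-f]{40})"$/.exec(line.trim());
    const url = match?.[1];
    const sha = match?.[2];
    if (url !== undefined && sha !== undefined && url.includes(repoFragment)) {
      return sha;
    }
  }
  return null;
}

// The short form used in cache keys and checkout paths is the first 7 characters.
function shortRev(sha: string): string {
  return sha.slice(0, 7);
}
```

Keying the cache on the resolved hash means a dependency bump changes the key on its own, with no workflow edit.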
- find_progressish_reason matches a typed phase/status/state/stage field instead of stringify-and-grep over the whole payload, so an unrelated field mentioning "preparing" can't pin the health badge to degraded forever (Sami N1).
- looks_like_model_ref drops the Qwen/Llama/GGUF name allowlist (missed Mistral/Phi/Gemma, false-positived on family substrings); a bare string is a ref only via hf:// scheme or .gguf ext; structured refs come through the typed model_id/modelRef/id path (Sami N2).
- Tests in sibling mod_tests.rs (#[path]) to keep mod.rs under the 500-line budget; pin both fixes incl. the unrelated-field regression.
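The two fixes can be illustrated with a sketch like the following. The real functions are Rust in mesh_llm/mod.rs; this TypeScript rendering, the camelCase names, and the progress vocabulary beyond "preparing" are assumptions.

```typescript
// Only these typed fields are consulted, per the commit message.
const PROGRESS_FIELDS = ["phase", "status", "state", "stage"] as const;
// Hypothetical vocabulary; only "preparing" is named in the commit.
const PROGRESS_WORDS = ["preparing", "loading", "downloading"];

// Look at the named fields only, never at a stringified payload, so an
// unrelated field that mentions "preparing" cannot produce a match.
function findProgressishReason(payload: Record<string, unknown>): string | null {
  for (const field of PROGRESS_FIELDS) {
    const value = payload[field];
    if (typeof value === "string" && PROGRESS_WORDS.includes(value.toLowerCase())) {
      return value;
    }
  }
  return null;
}

// A bare string counts as a model ref only via scheme or extension;
// there is no model-family name allowlist.
function looksLikeModelRef(candidate: string): boolean {
  const ref = candidate.trim();
  return ref.startsWith("hf://") || ref.toLowerCase().endsWith(".gguf");
}
```

The unrelated-field case is the regression worth pinning: `{ note: "still preparing docs" }` must yield no reason.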
Comment on lines +500 to +513
    run: |
      set -euo pipefail
      cargo fetch --manifest-path desktop/src-tauri/Cargo.toml
      SHORT='${{ steps.mesh_rev.outputs.short }}'
      MESH_ROOT=$(find "${CARGO_HOME:-$HOME/.cargo}/git/checkouts" -path "*/$SHORT" -type d -name "$SHORT" | head -1)
      if [[ -z "$MESH_ROOT" ]]; then
        echo "::error::mesh-llm checkout for $SHORT not found after cargo fetch"
        exit 1
      fi
      export LLAMA_STAGE_BACKEND=metal
      export LLAMA_STAGE_BUILD_DIR="$GITHUB_WORKSPACE/.cache/mesh-llama/build-stage-abi-metal"
      export CMAKE_OSX_DEPLOYMENT_TARGET=10.15
      "$MESH_ROOT/scripts/prepare-llama.sh" pinned
      "$MESH_ROOT/scripts/build-llama.sh" -DCMAKE_OSX_DEPLOYMENT_TARGET=10.15
Comment on lines +95 to +108
    run: |
      set -euo pipefail
      cargo fetch --manifest-path desktop/src-tauri/Cargo.toml
      SHORT='${{ steps.mesh_rev.outputs.short }}'
      MESH_ROOT=$(find "${CARGO_HOME:-$HOME/.cargo}/git/checkouts" -path "*/$SHORT" -type d -name "$SHORT" | head -1)
      if [[ -z "$MESH_ROOT" ]]; then
        echo "::error::mesh-llm checkout for $SHORT not found after cargo fetch"
        exit 1
      fi
      export LLAMA_STAGE_BACKEND=metal
      export LLAMA_STAGE_BUILD_DIR="$GITHUB_WORKSPACE/.cache/mesh-llama/build-stage-abi-metal"
      export CMAKE_OSX_DEPLOYMENT_TARGET=10.15
      "$MESH_ROOT/scripts/prepare-llama.sh" pinned
      "$MESH_ROOT/scripts/build-llama.sh" -DCMAKE_OSX_DEPLOYMENT_TARGET=10.15
FWIW, I think we may be able to do a better gate with mesh-side admission control rather than bespoke relay hosting (this means relays see nothing in the clear and have no bearing on security): Mesh-LLM/mesh-llm#589 just merged the ability to do that, which may help. The reason for relays: the closer they are to the connecting devices, the more likely QUIC direct connections are to establish, which gives low latency.
Sprout × mesh-llm: run an LLM on one machine, use it from another

Embeds an in-process mesh-llm node (SDK pinned to `bd16da4`) in sprout-relay (rendezvous + admission) and Sprout desktop (serve + consume). Gated entirely by relay membership: no new auth protocol, no new crypto.

What it does
- […] `kind:30621`, relay-signed, relay-only) + gates admission via NIP-98 → relay membership.

Proven on hardware
Qwen3.6-35B-A3B (IQ4_XS) loads into Metal on an M4 Max and serves real `/v1/chat/completions` (200 OK, `finish_reason=stop`, coherent tokens) over the exact `mesh_llm_sdk::serve` path desktop uses. See `crates/sprout-relay/examples/mesh_serve_smoke.rs` (single-node loopback) and the `#[ignore]` acceptance matrix in `crates/sprout-test-client/tests/e2e_mesh_llm.rs`.
CI automates the trust/denial assertions; live-inference + split ship as opt-in `#[ignore]` tests with a runbook (CI can't host native multi-node inference yet).

Security posture (honest, not airtight)
- […] `iroh_relay_url` from NIP-11 + a fresh NIP-98 bearer before starting mesh, and fails closed otherwise (`e9ba42b9`). This kills the `effective_relay_urls([])` → public-relay fallback.
- `bd16da4` still performs unconditional public STUN (Google/Cloudflare/STUNProtocol) on start and may inject the discovered public IP into invite tokens. No SDK knob to disable it yet; documented in `docs/mesh-llm-local-build.md`, upstream fix pending. We do not claim "no third-party infra" is airtight in v1.
- `serve::start` deadlocks with `console_ui(false)` at this rev (status-poll vs headless bind); desktop sets `console_ui(true)` until upstream #736 fixes headless readiness.

Notes
- […] `bd16da4`) for reproducibility.
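The fail-closed start described under Security posture can be sketched as a pure decision function. Everything named here (`MeshStartInputs`, `decideMeshStart`, the reason strings) is hypothetical and only illustrates the idea that a missing relay URL or bearer must stop the node, never fall back to an empty relay list.

```typescript
// Hypothetical inputs: the relay URL advertised via NIP-11 and a fresh NIP-98 bearer.
interface MeshStartInputs {
  irohRelayUrl?: string | null;
  bearer?: string | null;
}

type MeshStartDecision =
  | { ok: true; relayUrls: string[]; bearer: string }
  | { ok: false; reason: string };

function decideMeshStart(inputs: MeshStartInputs): MeshStartDecision {
  const url = inputs.irohRelayUrl?.trim() ?? "";
  const bearer = inputs.bearer?.trim() ?? "";
  // Fail closed: refuse to start instead of passing an empty relay list,
  // which is what would trigger a public-relay fallback.
  if (url === "") return { ok: false, reason: "relay did not advertise iroh_relay_url" };
  if (bearer === "") return { ok: false, reason: "no fresh NIP-98 bearer" };
  return { ok: true, relayUrls: [url], bearer };
}
```

Returning a tagged result instead of throwing lets the caller surface the reason in the UI, for example next to a greyed-out toggle.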