Skip to content

feat: read-only sandbox observability in chat — live browser + workspace file explorer#1890

Merged
larryro merged 24 commits into
mainfrom
feat/sandbox-chat-observability
Jun 16, 2026
Merged

feat: read-only sandbox observability in chat — live browser + workspace file explorer#1890
larryro merged 24 commits into
mainfrom
feat/sandbox-chat-observability

Conversation

@larryro

@larryro larryro commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds read-only observability for external-agent sandbox sessions to the chat page: a live view of the agent's browser and a read-only file explorer of its workspace — surfaced only on external-agent threads (which have no side panes today, so these are additive). Also lands the /user workspace-layout refactor that both build on.

The agent runs in a per-(org,user) sandbox session container and frequently drives a headless Chromium via Playwright MCP; until now the user could only read the text/tool timeline. This makes the agent's work visible without granting any control.

What's included

Part C — clean /user layout (a9c88113b): renames the session data root /workspace/user, separating the user working area (/user/workspace, the agent CWD) from I/O (uploads/, output/) and platform plumbing (/user/.runtime/). No migration — the session sandbox is unreleased; local data is wiped + the image rebuilt.

Part A — live browser view (ca4ab4492 + fixes): headed Chromium + Xvfb + x11vnc in the runtime image, streamed to a noVNC client over an authenticated WebSocket chain (runnerd raw tunnel → spawner relay → platform WS termination with cookie/org auth → Caddy). Read-only is guaranteed at the source (x11vnc -viewonly); the framebuffer never leaves loopback. Gated off by default behind the SANDBOX_BROWSER_VIEW operator flag. CJK/emoji fonts + --test-type infobar fix included.

Part B — workspace file explorer (cfea8b64b + fixes): read-only tree of /user/workspace via the existing runnerd /fs APIs, with a file viewer. Browse + view + download; stopped session → "resume to browse".

This session's polish:

  • Replay-401 fix (08b2faa43): the spawner's HMAC had no nonce, so deterministic empty-body GETs (e.g. sessionIsAlive) collided in the replay cache. Added a per-request nonce (backward-compatible: legacy nonce-less requests still verify).
  • Composer declutter + gating (8f7da945b, 4de310b3f): removed the redundant Workspace-files/Live-browser pills (desktop uses right-edge strips, mobile uses the + menu); centralized the show/hide gate in one useSandboxPanesAvailable hook, fixing a previously-ungated strip that showed on non-sandbox threads.
  • Viewer UX (4898e6837): Download is always available for any file; Preview is an explicit on-demand action; Markdown renders user-friendly (with a Source toggle).
  • Mobile (d9e2b386f, b6b9d1024, 9197db274): viewport-gated panes (no desktop backdrop bleed), +-menu entries, no header-button overlap in the Sheet, left-tree/right-viewer layout.

Security

  • Read-only is enforced at the VNC source (-viewonly), not just in the client.
  • Cross-org viewing rejected before the WS upgrade (session→org ∈ cookie orgs).
  • Framebuffer stays on loopback; session containers publish no ports; runnerd is the sole reader.
  • The new HMAC nonce keeps replay protection intact (a verbatim replay still collides in the cache).

Verification

  • Live E2E driven through a real browser (desktop + iPhone CDP device emulation): live VNC stream of the agent browsing, read-only confirmed; file tree + preview + download; markdown rendered; stopped-session resume state; mobile +-menu → Sheet.
  • Ran an adversarial multi-dimension review of the diff (nonce security / gating regressions / viewer correctness / completeness) with each finding independently verified — gating clean; the doc/comment + orphaned-i18n-key findings it confirmed are fixed in this PR.

Pre-PR checklist

  • Ran bun run check — green except one pre-existing known-flaky test, convex/integrations/slack/http_actions.test.ts (convexTest doesn't register components under turbo parallel load; passes 8/8 in isolation, unrelated to this PR). All TS lint/typecheck and 72,500+ tests otherwise pass.
  • N/A — no hand-rolled skeletons added.
  • Updated services/platform/messages/{en,de,fr}.json (new viewer keys; de-CH inherits de).
  • N/A — Docs: the parent sandbox-session / external-agent chat feature is unreleased and entirely undocumented (no docs/ pages exist for it); these are observability additions to it, so there is no doc page to extend. Docs will land with the feature's documentation pass.
  • N/A — no docs pages touched.
  • N/A — @tale/docs unaffected.
  • N/A — README unaffected.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added live browser streaming view for external agent sessions—watch read-only mirroring of the agent's browser in real-time.
    • Added workspace files explorer pane to browse and preview files from the agent's workspace during chat sessions.
    • Added mobile support for workspace files and live browser features via modal dialogs.
    • Integrated sandbox observability panes into composer for external-agent threads, with toggle controls in the mode menu.

larryro added 13 commits June 15, 2026 23:07
…ing under .runtime

Separate the clean user working area from platform plumbing so it can be
surfaced read-only in chat (workspace file explorer, follow-up). The data
root /workspace becomes /user:

  /user/workspace        agent CWD + repo clone target (was /workspace/repo)
  /user/uploads,output   user I/O
  /user/.runtime/{home,deps,tale}  HOME, pip/npm targets, steer/wrappers/skills
  TMPDIR -> /tmp (off the persistent volume)

WORKSPACE_ROOT, the entrypoint skeleton+env (both daemon and one-shot modes),
run_agent workdir default, staging dests, steer/skills paths, the k8s backend,
docker-session-args bind target + volume tmpfs, and the adapter
CLAUDE_CONFIG_DIR/TALE_STEER_DIR all move accordingly. Per-org pip/npm/bun
caches stay at /cache/* (independent of HOME). No data migration — the session
sandbox feature is unreleased.

sandbox 283 pass, sandbox-runtime/daemon 49 pass, agent_adapters 30 pass; tsc clean.
…l-agent sandboxes

Adds a 'Workspace files' side pane (external-agent threads only) that browses
the live session workspace under /user, read-only, with file preview + download.

Backend:
- sandbox/workspace_files.ts: resolveBrowsableSession + resolveBrowsableSessionForUser
  internalQueries — the single canAccessThread auth boundary (cross-org access
  throws), shared by both the list action and the download route, and reusable by
  the live-browser feature. Owner/status resolution mirrors getThreadSandboxState.
- node_only/sandbox/workspace_files.ts: listWorkspaceDir public action (server-side
  filtering: drops node_modules/.git, hides dotfiles unless showHidden, dirs-first
  sort, 1000-entry cap with truncated flag) + readWorkspaceFileBytes internalAction.
- http.ts: GET /api/sandbox/workspace_file — cookie-authed, org-scoped, content +
  download (Content-Disposition), 401/403/404/409/429.
- session_client.ts: sessionReadFile helper; rate_limiter: security:workspace-file.

Frontend:
- workspace-files-pane.tsx (lazy file tree: one listWorkspaceDir per directory
  expand; file viewer via the content route — Shiki for text, object-URL images,
  download-only for binaries/too-large; show-hidden + refresh; resume-to-browse
  empty state), workspace-files-context.tsx, workspace-files-toggle.tsx (composer
  pill, gated to external-agent + a session). Mobile via Sheet.
- i18n: chat.workspaceFiles.* in en/de/fr (de-CH falls back to de).

Read-only (no edit/delete). Reads the live container FS, distinct from the
curated threadFiles. tsc + oxlint clean; resolver unit tests 7/7.
…in chat

Adds a 'Live browser' side pane (external-agent threads only) that streams the
agent's headed Chromium read-only over VNC-through-WebSocket. OFF by default
(operator flag SANDBOX_BROWSER_VIEW on BOTH the spawner and platform).

End-to-end chain (raw RFB; one browser-facing WS termination at the platform):
  Xvfb :99 <- headed Chromium (CDP 127.0.0.1:9222, MCP attaches via --cdp-endpoint)
  x11vnc 127.0.0.1:5900 -viewonly (airtight read-only; agent drives via CDP, not X)
  -> runnerd GET /screencast (raw HTTP/1.1 Upgrade tunnel, x-tale-runnerd-token; zero-dep)
  -> spawner GET /v1/sessions/:id/screencast (Bun WS server, HMAC; WS<->raw-tunnel bridge)
  -> platform GET /screencast/:threadId (Bun WS, cookie->org via screencast-auth httpAction
     reusing resolveBrowsableSessionForUser; WS<->WS relay)
  -> browser noVNC RFB(viewOnly) <canvas>

Image/runtime (A1): Dockerfile adds xvfb/x11vnc/fonts; entrypoint start_browser_stack
(supervised Xvfb + x11vnc -localhost -viewonly + headed Chromium, fail-open readiness
poll) gated on TALE_BROWSER_CDP=1; adapter swaps Playwright MCP to --cdp-endpoint when
spec.browserCdp; tale-playwright-mcp shim skips proxy flags in CDP mode; config.browserView
-> --env TALE_BROWSER_CDP; run_external_agent sets browserCdp from SANDBOX_BROWSER_VIEW.

runnerd (A2): server.on('upgrade') /screencast raw byte tunnel to 127.0.0.1:5900,
activeScreencasts counter in /healthz, single-teardown, 502 when x11vnc down.

spawner (A3): Bun WS server, HMAC-before-upgrade, Bun.connect raw upgrade to runnerd,
101 parse + binary relay + bufferedAmount backpressure; sweepExpired exemption when
activeScreencasts>0 (a watched session isn't reaped).

platform (A4): /api/sandbox/screencast-auth httpAction (the auth oracle) + server.ts
wrapper-fetch WS termination + Bun WS client to spawner (HMAC headers) + relay;
Caddy /screencast/* route + encode exclusion.

frontend (A5): live-browser-pane (noVNC RFB, viewOnly + pointer-events-none + View-only
badge, connecting/streaming/disconnected/error/empty states), live-browser-toggle,
live-browser-context (mutually exclusive with the workspace-files pane); @novnc/novnc dep;
i18n chat.liveBrowser.* en/de/fr.

sandbox 302, daemon 57, agent_adapters 32 pass; platform tsc + oxlint + i18n tests clean.
…rate-limit bucket, dev mirror

Found + fixed while running the read-only sandbox-observability feature
end-to-end in a browser:

- Desktop modal-backdrop bug: the Workspace-files + Live-browser panes opened
  BOTH the desktop docked pane AND the mobile Sheet on desktop. A Sheet's
  md:hidden only hides its content via CSS — Radix still portals the Dialog
  backdrop, which covered the screen + intercepted clicks. Add a useIsMobile
  hook and mutually gate the docked panes (desktop) vs the Sheets (mobile).
- /api/sandbox/screencast-auth referenced rate-limit bucket
  'security:screencast-auth' that was never defined → 500 on every live-browser
  auth probe (tsc/lint can't catch a missing runtime config key). Define it.
- serve-screencast Vite dev plugin: bun dev runs Vite, not server.ts, so the
  prod /screencast WS isn't served in dev (like serve-webdav/watch-examples
  mirror their server.ts routes). Mirror it, reusing screencast-relay.ts's
  HMAC helpers so the contract can't drift. + @types/ws devdep.

Live E2E verified (fresh sandbox): clean /user tree, lazy expand, file viewer;
live VNC stream of the agent's headed Chromium navigating example.com,
read-only.
…ox infobar

The live browser view rendered CJK text as tofu boxes (e.g. google.com.hk's
Chinese UI) because the runtime image only shipped Latin fonts. Add
fonts-noto-cjk + fonts-noto-color-emoji so real-world / non-Latin pages render
correctly in the VNC stream.

Also pass the headed Chromium --test-type + --disable-infobars +
--no-first-run/--no-default-browser-check so the 'unsupported command-line
flag: --no-sandbox' infobar (and first-run prompts) don't eat the top of every
streamed page.

Verified live: agent opens google.com.hk, types a search, reads results — the
stream shows readable Chinese (no tofu) and no infobar. Image +~190 MB (CJK).
The 'Workspace files' panel rooted at the /user data root, so it listed the
sibling uploads/ and output/ I/O dirs alongside the working tree. Root it at
/user/workspace instead — the panel now shows only the agent's working files
(cloned repos, created files), matching the 'Workspace files' label. uploads/
and output/ are platform I/O staging, not workspace contents.

Verified live: panel shows the agent's hello.txt directly, no uploads/output.
… explorer

When a session's backend is evicted (idle/crash/reconcile lag) the Convex row
can still read 'active' for a short window — a phantom. The explorer then hit
the spawner, got a 404 (no registry entry), and rendered it as an empty folder
(listing) or 'missing or too large' (file read) — misleading, since the file
exists and the sandbox is just not running.

listWorkspaceDir + readWorkspaceFileBytes now probe sessionIsAlive when the
spawner 404s: a dead backend surfaces as sessionRunning:false (list) / a 409
session_not_running (read) → the existing 'resume to browse' UI — while a live
session still distinguishes a genuinely missing/oversized file (404).

Verified: with a live session the read returns 200; the happy path is
unchanged. (The session loss in testing was self-inflicted — deleting the
spawner's .spawner.lock under a running spawner triggers a restart loop that
evicts sessions; resolved by a clean restart.)
…ht viewer

The explorer stacked the tree above the file viewer (top/bottom). Switch to the
conventional file-explorer layout — tree on the left, viewer on the right — on
desktop (`md:flex-row`, tree a 1/3 / 160–280px left sidebar with a right
border), while keeping the stacked layout under `md` where a narrow screen has
no room for two columns. Bump the default pane width 480→560 so both columns
breathe.

Verified live: tree (tale/, hello.txt) on the left, 'hi from sandbox' rendered
in the viewer on the right.
…obile

The Workspace-files / Live-browser composer pills sit at x>=545 in a 390px
viewport (verified under iPhone device emulation), so they're unreachable
on mobile. Surface both toggles in the always-reachable "+" composer menu
under a "Sandbox" section, gated the same as the pills (external-agent
thread + an existing session).

Fix the mobile Sheet header overlap: the pane body rendered its Show-hidden
/ Refresh actions in the header row, but the Sheet also drew its own absolute
top-right close button on top of Refresh. Render the pane's own close button
in every variant (it was previously desktop-only) and pass `hideClose` to
both mobile Sheets so the close lives in the header row with the other
actions instead of colliding (ESC / overlay-tap still dismiss). The now-dead
`embedded` body prop is removed.
…lse-positives

The spawner's replay protection keys its short-TTL cache on the request
signature, but the signature was fully deterministic for a given
(method, path, ms-timestamp, body). An empty-body GET to a fixed path —
notably the `sessionIsAlive` liveness probe — therefore signs the SAME
string when two probes land in the same millisecond, so the second was
rejected as `replay` (a spurious 401). This surfaced as the workspace
file explorer logging "sandbox session get failed (401)" on a stopped
session, and is a latent race on the phantom-heal path generally.

Bind a fresh per-request nonce into the signed string
(`METHOD\npath\nts\nnonce\nsha256(body)`), sent as `x-tale-sandbox-nonce`.
Distinct requests now get distinct signatures (no false collision) while a
verbatim replay of a captured request still hits the cache. The nonce is
OPTIONAL in `verify` — when the header is absent the legacy nonce-less
string is verified — so a not-yet-updated client (e.g. spawner_client.ts)
or a rolling-deploy window keeps authenticating.

Updates both platform signers (session_client, screencast-relay) and the
spawner verifier + authorize(), with tests for the nonce, the verbatim
replay, and the legacy fallback.
… via one hook

The Workspace-files / Live-browser composer pills duplicated the open
affordance that already exists — the right-edge collapse strips on desktop
and the `+`-menu entries on mobile — and crowded the composer row (the
pills also overflow off-screen under `md`). Remove both pills and their
components.

Centralize the gate in a new `useSandboxPanesAvailable` hook (external-agent
agent + a thread sandbox session, any lifecycle). chat.tsx now mounts the
desktop panes and the mobile Sheets only when it returns true, and the
`+`-menu uses the same hook — so a normal chat thread never shows a stray
strip or a portalled backdrop. The previously-ungated desktop strip is now
gated correctly.
…ew, rendered markdown

Restructure the file viewer around a persistent toolbar:
- Download is ALWAYS available for any selected file (any size/type), not
  only on the binary/too-large card as before. The icon sits inside the <a>
  (not Button's `icon` prop) so `asChild` no longer forwards className onto a
  Fragment (React dev warning).
- Preview is an explicit on-demand action (no auto-fetch on select) with a
  proper AbortController + object-URL lifecycle; the toolbar button hides once
  content is shown so it can't collide with a renderer's own toggle.
- Markdown (md/mdx/markdown) opens rendered/user-friendly via
  RenderableFileViewer's Preview mode (gains an optional `defaultMode`; with a
  Source toggle to view raw); everything else stays syntax-highlighted source.

Also split the misleading reuse of `workspaceFiles.emptyDir` ("This folder is
empty.") — the no-file-selected placeholder now uses a dedicated
`selectFile` string. New i18n keys (preview, previewHint, previewUnavailable,
selectFile) added to en/de/fr (de-CH inherits de).
…he composer pills

Deleting the Workspace-files / Live-browser composer pills left their
`workspaceFiles.toggleTooltip` and `liveBrowser.toggleTooltip` strings
unreferenced, which fails the "every key in base files is referenced by
source" i18n usage test. Remove them from en/de/fr (toggleLabel stays —
the pane strips + `+`-menu still use it; de-CH never had these).
deps.touch();
// pipe() applies backpressure (pause the source when the dest buffer fills,
// resume on drain) on both legs.
socket.pipe(vnc);
@@ -0,0 +1,158 @@
'use node';
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds read-only sandbox observability for external-agent chats: a workspace files pane and a live browser pane backed by new authenticated HTTP and WebSocket flows. It introduces CDP-based browser attachment for agent execution, adds nonce-aware request signing for screencast/auth requests, and wires runnerd and platform relays for browser streaming. It also migrates sandbox runtime paths and related contracts from /workspace to /user, updates session/container startup accordingly, and refreshes tests, docs, and translations for the new behavior.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Possibly related PRs

  • tale-project/tale#1841 — Both PRs modify the sandbox Kubernetes pod/backend path and workspace-mount code paths.
  • tale-project/tale#1880 — Both PRs change Claude Code adapter buildExec and related adapter typing/configuration.
  • tale-project/tale#1872 — Both PRs update the agent-adapter/session-exec path used when building persistent agent runs.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sandbox-chat-observability

Warning

Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/platform/app/features/chat/components/composer-mode-menu.tsx`:
- Around line 241-252: Remove the hardcoded English defaultValue fallback
strings from the tChat function calls in the menu label definitions. For the
workspace files menu item with the tChat('workspaceFiles.toggleLabel', ...) call
and the live browser menu item with the tChat('liveBrowser.toggleLabel', ...)
call, delete the defaultValue property and its string value from each call,
leaving only the translation key as the function parameter. This ensures
translations are enforced and no English fallbacks can surface to end users.

In `@services/platform/app/features/workspace/components/live-browser-pane.tsx`:
- Around line 391-400: The resize separator elements in both pane components are
mouse-only and lack keyboard accessibility. In
services/platform/app/features/workspace/components/live-browser-pane.tsx at
lines 391-400, add tabIndex={0} to make the separator focusable, implement
onKeyDown handler to support arrow-key resize operations (tracking width
changes), and add aria-valuemin, aria-valuemax, and aria-valuenow ARIA
attributes reflecting the current pane width state. Apply identical keyboard and
ARIA value attribute changes to the separator in
services/platform/app/features/workspace/components/workspace-files-pane.tsx at
lines 1133-1142 to maintain parity between components and ensure keyboard users
can both focus and operate both resize controls.
- Around line 75-130: The RFB lifecycle effect is not properly guarded by the
sessionActive flag and does not include it in the dependency list, which allows
WebSocket connections to remain open when sessionActive becomes false. Add an
early return guard at the beginning of the useEffect (after the containerRef and
window checks) that returns if sessionActive is false, and include sessionActive
in the dependency array at the end alongside threadId and connectNonce. This
ensures the effect properly respects the session state and cleanup runs when
sessionActive changes.

In
`@services/platform/app/features/workspace/components/workspace-files-pane.tsx`:
- Around line 570-607: The TreeRowButton function sets tabIndex to 0 only when
isActive is true, which causes all treeitem buttons to have tabIndex=-1 on
initial load when nothing is active, breaking keyboard navigation. Modify the
tabIndex assignment logic to ensure the first tree item (or a designated default
item) is always keyboard-reachable with tabIndex=0, while inactive items retain
tabIndex=-1. This requires passing additional context to the TreeRowButton
component to identify which item should be initially focusable (e.g., an
isFirstItem prop or similar indicator), then updating the tabIndex assignment to
use that context alongside the isActive state.
- Around line 964-999: The Refresh button at line 964 becomes ineffective once
the session stops because the condition at line 997 causes WorkspaceFileTree to
unmount when sessionRunning is false, meaning any refreshNonce change has no
component to consume it. Disable the Refresh Button by adding a disabled
attribute that evaluates to true when sessionRunning is false, preventing users
from attempting a refresh action that cannot succeed while the WorkspaceFileTree
is unmounted.

In `@services/platform/convex/agents/external_agent/run_external_agent.ts`:
- Line 140: The BROWSER_VIEW_ENABLED constant uses a strict string equality
check for the SANDBOX_BROWSER_VIEW environment variable, accepting only the
literal value '1', while the sandbox service uses boolEnvOpt() which accepts
multiple boolean-like values ('true', '1', 'yes', 'on') in a case-insensitive
manner. This causes a mismatch where the platform layer and sandbox layer
interpret the same environment variable differently. Update the
BROWSER_VIEW_ENABLED assignment to use the boolEnvOpt() function (the same
function used in the sandbox service) or implement matching logic that accepts
the same range of boolean-like values to ensure consistent behavior across both
layers.

In `@services/platform/convex/http.ts`:
- Around line 576-588: The catch blocks at both affected sites in
services/platform/convex/http.ts are catching all errors and returning 403,
which masks genuine operational failures. At the anchor site (lines 576-588) and
the sibling site (lines 714-726), replace the generic catch clause with a
specific check for UnauthorizedError, returning 403 only for that error type,
while allowing other unexpected errors to propagate naturally so they are
properly handled as 500 errors by the framework. This matches the established
pattern used elsewhere in the codebase where UnauthorizedError is explicitly
distinguished from other exceptions.

In `@services/platform/convex/node_only/sandbox/run_agent.ts`:
- Around line 104-110: The `browserCdp` field was added to
`RunAgentInSessionArgs` and is being forwarded to `buildExec`, but the `args`
validator schema in `runAgentInSession` does not include this field, preventing
it from being passed through the action wrapper. Add `browserCdp` as an optional
boolean field to the `args` validator in `runAgentInSession` to match the field
definition in `RunAgentInSessionArgs` and enable the field to flow through to
`buildExec` when calls are made through the action wrapper.

In `@services/platform/convex/node_only/sandbox/workspace_files.ts`:
- Around line 65-70: The current path validation using simple string checks for
`..` and NUL bytes does not enforce the `/user/workspace` boundary properly. At
services/platform/convex/node_only/sandbox/workspace_files.ts lines 65-70 in the
function containing the `sessionListFiles` call, normalize `args.path` relative
to `WORKSPACE_ROOT` and reject any resulting path that falls outside
`/user/workspace`. At
services/platform/convex/node_only/sandbox/workspace_files.ts lines 144-145
before the `sessionReadFile` call, apply the same normalization and boundary
check to `args.path`. At services/platform/convex/http.ts lines 523-530 before
passing the `path` query parameter to the internal action, normalize it and
verify it does not escape `/user/workspace`. Use a canonical path resolution
function (not string prefix matching) to properly detect and block directory
traversal and ensure all entry points enforce the workspace boundary
consistently.
- Around line 148-156: Remove the `as const` assertions in the return statements
where status is set to 'ok', 'missing', and 'gone'. Create const variable
assignments before the return statements (such as const okStatus = 'ok', const
missingStatus = 'missing', const goneStatus = 'gone') and use these variables in
place of the inline string literals with `as const`. TypeScript will
automatically infer the narrow literal types from const assignments without
requiring the `as const` assertions.

In `@services/platform/convex/sandbox/workspace_files.test.ts`:
- Around line 62-63: The test file contains two separate issues: First, the eq
function on lines 62-63 uses unnecessary type casts (as string) because it
accepts field as a generic string. Refactor the eq function to use a
discriminated union type for the field parameter, accepting either 'ownerType'
or 'ownerId' as separate overloads or a union type, so that the value parameter
is properly inferred as string without casting. Second, at lines 80-87, the cast
as unknown as QueryHandler is used because the mock of internalQuery returns the
config object directly instead of wrapping it in a QueryHandler like the actual
framework does. Add a one-line comment before the forUser variable assignment
explaining this framework-generated gap: that Convex generated functions wrap
internalQuery results while this mock returns the config directly.

In `@services/platform/lib/screencast-relay.ts`:
- Around line 152-171: The JSDoc comment for the `toArrayBuffer` function
overstates the guarantee by claiming it always returns "a fresh, non-shared
`ArrayBuffer`" when in fact line 159 returns the input `ArrayBuffer` directly
without copying. Update the JSDoc comment to clarify that the function returns
fresh copies only for ArrayBufferView and string inputs, but returns the input
`ArrayBuffer` unchanged when a plain `ArrayBuffer` is passed in. This correction
maintains the actual behavior while making the documentation accurate.

In `@services/platform/vite-plugins/serve-screencast.ts`:
- Line 81: The `decodeURIComponent` call at the threadId assignment can throw a
synchronous exception if the percent-encoded string is malformed, causing the
upgrade handler to crash. Wrap the `decodeURIComponent(match[1])` call in a
try-catch block to handle malformed paths gracefully. In the catch block, reject
the request (returning an appropriate error response) instead of allowing the
exception to propagate and crash the handler.
- Around line 56-58: Remove the type assertion with `as { sessionId?: string }`
from the res.json() call in the screencast-auth response handling. Instead,
implement runtime validation to check the structure of the response body before
extracting the sessionId. Validate that the body is an object and that
sessionId, if present, is a string. Return null if validation fails or the
sessionId is invalid. This ensures that malformed responses cannot propagate
invalid sessionIds into relay path construction at this system boundary.
- Around line 109-137: The pending array in the browser.on('message') event
handler grows without limit while the spawner socket is not open, which can
cause unbounded memory growth if a fast sender sends many messages before the
spawner opens. Add a check before pushing data to the pending array in the
browser.on('message') handler to cap the maximum size of pending messages. If
the pending array reaches a configured size limit, either drop the message,
close the browser connection, or log a warning to prevent memory exhaustion from
a runaway sender.

In `@services/sandbox-runtime/entrypoint.sh`:
- Around line 352-363: The curl command in the readiness probe loop that checks
the CDP endpoint at http://127.0.0.1:9222/json/version lacks a timeout
parameter. If the endpoint accepts the connection but stalls without responding,
the script can block indefinitely at this curl call. Add a timeout flag (such as
--max-time or --connect-timeout) to the curl invocation to ensure the probe
fails fast and continues to the next iteration or timeout path, maintaining the
intended fail-open behavior.

In `@services/sandbox/src/volume.ts`:
- Around line 5-6: The comment at lines 5-6 in volume.ts incorrectly states that
the container workspace uses `--tmpfs /user`, but the runtime has been updated
to bind-mount `/user` via `type=bind` (as implemented in docker-args.ts). Update
this comment to accurately reflect that the workspace is now bind-mounted rather
than using tmpfs, so that developers understand the current mounting strategy
and avoid future confusion when reviewing or debugging workspace volume
handling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2c513cbb-c2e8-4fd1-9fd9-92a22e8f86ca

📥 Commits

Reviewing files that changed from the base of the PR and between 678661b and 4de310b.

⛔ Files ignored due to path filters (2)
  • bun.lock is excluded by !**/*.lock
  • services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**
📒 Files selected for processing (88)
  • packages/agent_adapters/src/build-exec.test.ts
  • packages/agent_adapters/src/claude_code/adapter.ts
  • packages/agent_adapters/src/types.ts
  • services/platform/app/features/chat/components/chat-input.tsx
  • services/platform/app/features/chat/components/composer-mode-menu.tsx
  • services/platform/app/features/chat/hooks/use-sandbox-panes.ts
  • services/platform/app/features/workspace/components/live-browser-context.tsx
  • services/platform/app/features/workspace/components/live-browser-pane.tsx
  • services/platform/app/features/workspace/components/workspace-files-context.tsx
  • services/platform/app/features/workspace/components/workspace-files-pane.tsx
  • services/platform/app/features/workspace/viewers/renderable-file-viewer.tsx
  • services/platform/app/hooks/use-is-mobile.ts
  • services/platform/app/routes/dashboard/$id/chat.tsx
  • services/platform/convex/agents/external_agent/run_external_agent.ts
  • services/platform/convex/http.ts
  • services/platform/convex/lib/rate_limiter/index.ts
  • services/platform/convex/node_only/sandbox/helpers/session_client.ts
  • services/platform/convex/node_only/sandbox/helpers/spawner_client.ts
  • services/platform/convex/node_only/sandbox/integration_skills.ts
  • services/platform/convex/node_only/sandbox/internal_actions.ts
  • services/platform/convex/node_only/sandbox/run_agent.ts
  • services/platform/convex/node_only/sandbox/steer_delivery.ts
  • services/platform/convex/node_only/sandbox/steer_files.test.ts
  • services/platform/convex/node_only/sandbox/steer_files.ts
  • services/platform/convex/node_only/sandbox/workspace_files.ts
  • services/platform/convex/sandbox/wire.ts
  • services/platform/convex/sandbox/workspace_files.test.ts
  • services/platform/convex/sandbox/workspace_files.ts
  • services/platform/lib/screencast-relay.test.ts
  • services/platform/lib/screencast-relay.ts
  • services/platform/messages/de.json
  • services/platform/messages/en.json
  • services/platform/messages/fr.json
  • services/platform/package.json
  • services/platform/server.test.ts
  • services/platform/server.ts
  • services/platform/types/novnc.d.ts
  • services/platform/vite-plugins/serve-screencast.ts
  • services/platform/vite.config.ts
  • services/proxy/Caddyfile
  • services/sandbox-runtime/Dockerfile
  • services/sandbox-runtime/daemon/src/exec-manager.ts
  • services/sandbox-runtime/daemon/src/file-ops.test.ts
  • services/sandbox-runtime/daemon/src/main.ts
  • services/sandbox-runtime/daemon/src/protocol.ts
  • services/sandbox-runtime/daemon/src/screencast-tunnel.test.ts
  • services/sandbox-runtime/daemon/src/screencast-tunnel.ts
  • services/sandbox-runtime/daemon/src/steer-hook.test.ts
  • services/sandbox-runtime/docker-entrypoint.sh
  • services/sandbox-runtime/entrypoint.sh
  • services/sandbox-runtime/tale-playwright-mcp
  • services/sandbox/docs/kubernetes.md
  • services/sandbox/docs/sessions.md
  • services/sandbox/src/auth.test.ts
  • services/sandbox/src/auth.ts
  • services/sandbox/src/backend/kubernetes/exec-spec.test.ts
  • services/sandbox/src/backend/kubernetes/k8s-backend.test.ts
  • services/sandbox/src/backend/kubernetes/k8s-backend.ts
  • services/sandbox/src/backend/kubernetes/k8s-harvest.ts
  • services/sandbox/src/backend/kubernetes/k8s-pod-spec.test.ts
  • services/sandbox/src/backend/kubernetes/k8s-pod-spec.ts
  • services/sandbox/src/backend/kubernetes/k8s-protocol.ts
  • services/sandbox/src/backend/kubernetes/k8s-session-backend.test.ts
  • services/sandbox/src/backend/kubernetes/k8s-session-backend.ts
  • services/sandbox/src/backend/kubernetes/k8s-session-pod-spec.test.ts
  • services/sandbox/src/backend/kubernetes/k8s-session-pod-spec.ts
  • services/sandbox/src/backend/kubernetes/k8s-stage.ts
  • services/sandbox/src/backend/local-workspace-run.ts
  • services/sandbox/src/config.ts
  • services/sandbox/src/docker-args.test.ts
  • services/sandbox/src/docker-args.ts
  • services/sandbox/src/exec-common.ts
  • services/sandbox/src/exec-response.ts
  • services/sandbox/src/server.test.ts
  • services/sandbox/src/server.ts
  • services/sandbox/src/session/docker-session-args.test.ts
  • services/sandbox/src/session/docker-session-args.ts
  • services/sandbox/src/session/runnerd-protocol.ts
  • services/sandbox/src/session/screencast-relay.test.ts
  • services/sandbox/src/session/screencast-relay.ts
  • services/sandbox/src/session/session-routes.test.ts
  • services/sandbox/src/session/session-routes.ts
  • services/sandbox/src/spawn-prior-outputs.test.ts
  • services/sandbox/src/spawn-staging.test.ts
  • services/sandbox/src/types.ts
  • services/sandbox/src/validate-request.ts
  • services/sandbox/src/volume.ts
  • services/sandbox/src/wire.ts

Comment on lines +241 to +252
label: tChat('workspaceFiles.toggleLabel', {
defaultValue: 'Workspace files',
}),
icon: FolderOpen,
selected: files.isOpen,
onClick: () => files.toggle(),
},
{
type: 'item',
label: tChat('liveBrowser.toggleLabel', {
defaultValue: 'Live browser',
}),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove hardcoded English fallback strings from localized menu labels.

Line 241 and Line 250 add English defaultValue literals that can surface to end users and bypass locale enforcement. Use translation keys only here.

As per coding guidelines, "Never hardcode user-facing strings in JSX or code. Always use the translation hook useT(namespace) from services/platform/lib/i18n/client.tsx."

Suggested fix
-          label: tChat('workspaceFiles.toggleLabel', {
-            defaultValue: 'Workspace files',
-          }),
+          label: tChat('workspaceFiles.toggleLabel'),
...
-          label: tChat('liveBrowser.toggleLabel', {
-            defaultValue: 'Live browser',
-          }),
+          label: tChat('liveBrowser.toggleLabel'),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/app/features/chat/components/composer-mode-menu.tsx` around
lines 241 - 252, Remove the hardcoded English defaultValue fallback strings from
the tChat function calls in the menu label definitions. For the workspace files
menu item with the tChat('workspaceFiles.toggleLabel', ...) call and the live
browser menu item with the tChat('liveBrowser.toggleLabel', ...) call, delete
the defaultValue property and its string value from each call, leaving only the
translation key as the function parameter. This ensures translations are
enforced and no English fallbacks can surface to end users.

Source: Coding guidelines

Comment on lines +75 to +130
useEffect(() => {
// Never construct RFB during SSR or before the host node mounts.
const container = containerRef.current;
if (!container || typeof window === 'undefined') return undefined;

setStatus('connecting');

let rfb: RFB | null = null;
try {
rfb = new RFB(container, buildScreencastUrl(threadId), {});
// Read-only viewer knobs (set as instance props per noVNC docs).
rfb.viewOnly = true;
rfb.scaleViewport = true;
rfb.clipViewport = true;
rfb.focusOnClick = false;
rfb.background = '#000';
rfbRef.current = rfb;
} catch (err) {
console.error('[live-browser] RFB construction failed', err);
setStatus('error');
return undefined;
}

const handleConnect = () => setStatus('streaming');
const handleDisconnect = (event: CustomEvent<{ clean?: boolean }>) => {
// `detail.clean` true = the server (or our own disconnect) closed
// cleanly; false = a transport/protocol failure.
setStatus(event.detail?.clean ? 'disconnected' : 'error');
};
const handleSecurityFailure = (event: CustomEvent) => {
// Auth / org-scope rejection at the WS handshake (cookie failed, or the
// thread isn't the user's). Surface as a generic error — never as
// "disconnected", so the empty/retry copy stays honest.
console.warn('[live-browser] RFB security failure', event.detail);
setStatus('error');
};

rfb.addEventListener('connect', handleConnect);
rfb.addEventListener('disconnect', handleDisconnect);
rfb.addEventListener('securityfailure', handleSecurityFailure);

return () => {
rfb?.removeEventListener('connect', handleConnect);
rfb?.removeEventListener('disconnect', handleDisconnect);
rfb?.removeEventListener('securityfailure', handleSecurityFailure);
try {
rfb?.disconnect();
} catch (err) {
// disconnect() can throw if the socket never opened — log, don't mask
// the unmount.
console.warn('[live-browser] RFB disconnect during cleanup', err);
}
rfbRef.current = null;
};
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [threadId, connectNonce]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the RFB lifecycle with sessionActive and include it in dependencies.

At Line 75, the connect effect still runs when sessionActive is false, and because sessionActive is omitted from the dependency list, an existing socket can stay open after the pane switches to the empty state. That contradicts the intended gate and can strand WebSocket connections.

Suggested fix
-  useEffect(() => {
+  useEffect(() => {
+    if (!sessionActive) {
+      try {
+        rfbRef.current?.disconnect();
+      } catch (err) {
+        console.warn('[live-browser] RFB disconnect during inactive gate', err);
+      }
+      rfbRef.current = null;
+      return;
+    }
+
     // Never construct RFB during SSR or before the host node mounts.
     const container = containerRef.current;
     if (!container || typeof window === 'undefined') return undefined;
@@
-  }, [threadId, connectNonce]);
+  }, [threadId, connectNonce, sessionActive]);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/app/features/workspace/components/live-browser-pane.tsx`
around lines 75 - 130, The RFB lifecycle effect is not properly guarded by the
sessionActive flag and does not include it in the dependency list, which allows
WebSocket connections to remain open when sessionActive becomes false. Add an
early return guard at the beginning of the useEffect (after the containerRef and
window checks) that returns if sessionActive is false, and include sessionActive
in the dependency array at the end alongside threadId and connectNonce. This
ensures the effect properly respects the session state and cleanup runs when
sessionActive changes.

Comment on lines +391 to +400
<div
ref={resizeRef}
onMouseDown={handleMouseDown}
className="absolute top-0 -left-1 z-10 h-full w-2 cursor-col-resize"
role="separator"
aria-orientation="vertical"
aria-label={t('liveBrowser.paneClose', {
defaultValue: 'Resize live browser panel',
})}
/>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Custom resize separators are mouse-only and not keyboard operable.

Both pane resizers expose role="separator" but only bind onMouseDown, so keyboard users can focus neither operation path nor value changes.

  • services/platform/app/features/workspace/components/live-browser-pane.tsx#L391-L400: add tabIndex={0}, arrow-key resize handling, and aria-valuemin/aria-valuemax/aria-valuenow tied to width state.
  • services/platform/app/features/workspace/components/workspace-files-pane.tsx#L1133-L1142: apply the same keyboard+ARIA value support to keep parity and prevent accessibility drift.

As per coding guidelines, “Ensure all interactive elements are keyboard-reachable.”

📍 Affects 2 files
  • services/platform/app/features/workspace/components/live-browser-pane.tsx#L391-L400 (this comment)
  • services/platform/app/features/workspace/components/workspace-files-pane.tsx#L1133-L1142
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/app/features/workspace/components/live-browser-pane.tsx`
around lines 391 - 400, The resize separator elements in both pane components
are mouse-only and lack keyboard accessibility. In
services/platform/app/features/workspace/components/live-browser-pane.tsx at
lines 391-400, add tabIndex={0} to make the separator focusable, implement
onKeyDown handler to support arrow-key resize operations (tracking width
changes), and add aria-valuemin, aria-valuemax, and aria-valuenow ARIA
attributes reflecting the current pane width state. Apply identical keyboard and
ARIA value attribute changes to the separator in
services/platform/app/features/workspace/components/workspace-files-pane.tsx at
lines 1133-1142 to maintain parity between components and ensure keyboard users
can both focus and operate both resize controls.

Source: Coding guidelines

Comment on lines +570 to +607
function TreeRowButton({
isActive,
depth,
onClick,
title,
ariaLabel,
ariaExpanded,
dataDirPath,
dataParentPath,
children,
}: TreeRowButtonProps) {
const state = isActive
? 'bg-muted text-foreground'
: 'text-muted-foreground hover:bg-muted/60 hover:text-foreground';
return (
<button
type="button"
role="treeitem"
aria-selected={isActive}
aria-level={depth + 1}
tabIndex={isActive ? 0 : -1}
onClick={onClick}
title={title}
aria-label={ariaLabel}
aria-expanded={ariaExpanded}
data-dir-path={dataDirPath}
data-parent-path={dataParentPath ?? undefined}
style={{ paddingLeft: `${0.5 + depth * 0.75}rem` }}
className={cn(
'flex w-full items-center gap-1.5 rounded-md px-2 py-1 text-left text-xs',
state,
'focus-visible:ring-ring focus-visible:ring-1 focus-visible:outline-none',
)}
>
{children}
</button>
);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tree keyboard entry is broken when no item is selected.

At Line 590, tabIndex is 0 only for isActive. On initial load, nothing is active, so all treeitem buttons become -1 and keyboard users cannot tab into the tree.

As per coding guidelines, “Ensure all interactive elements are keyboard-reachable.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/app/features/workspace/components/workspace-files-pane.tsx`
around lines 570 - 607, The TreeRowButton function sets tabIndex to 0 only when
isActive is true, which causes all treeitem buttons to have tabIndex=-1 on
initial load when nothing is active, breaking keyboard navigation. Modify the
tabIndex assignment logic to ensure the first tree item (or a designated default
item) is always keyboard-reachable with tabIndex=0, while inactive items retain
tabIndex=-1. This requires passing additional context to the TreeRowButton
component to identify which item should be initially focusable (e.g., an
isFirstItem prop or similar indicator), then updating the tabIndex assignment to
use that context alongside the isActive state.

Source: Coding guidelines

Comment on lines +964 to +999
onClick={() => setRefreshNonce((n) => n + 1)}
aria-label={t('workspaceFiles.refresh', {
defaultValue: 'Refresh',
})}
>
<RefreshCw className="size-3.5" />
</Button>
</Tooltip>
{/* Rendered in every variant. On desktop it's the pane's close; in
the mobile Sheet (`embedded`) it replaces the Sheet's own absolute
top-right close (suppressed via `hideClose`) so it can't collide
with the Show-hidden / Refresh actions in this same row. */}
<Tooltip
content={t('workspaceFiles.paneClose', {
defaultValue: 'Close files panel',
})}
side="bottom"
>
<Button
variant="ghost"
size="icon"
className="size-7"
onClick={close}
aria-label={t('workspaceFiles.paneClose', {
defaultValue: 'Close files panel',
})}
>
<X className="size-3.5" />
</Button>
</Tooltip>
</div>
</div>

{!sessionRunning ? (
<SessionStoppedState />
) : (

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stopped-state gating makes Refresh ineffective and traps the pane.

At Line 997, once sessionRunning flips false, WorkspaceFileTree unmounts. After that, Line 964’s Refresh only bumps refreshNonce with no mounted tree to consume it, so the pane can’t recover without closing/reopening.

Suggested fix
-            <Button
+            <Button
               variant="ghost"
               size="icon"
               className="size-7"
-              onClick={() => setRefreshNonce((n) => n + 1)}
+              onClick={() => {
+                setSessionRunning(true);
+                setRefreshNonce((n) => n + 1);
+              }}
               aria-label={t('workspaceFiles.refresh', {
                 defaultValue: 'Refresh',
               })}
             >
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
onClick={() => setRefreshNonce((n) => n + 1)}
aria-label={t('workspaceFiles.refresh', {
defaultValue: 'Refresh',
})}
>
<RefreshCw className="size-3.5" />
</Button>
</Tooltip>
{/* Rendered in every variant. On desktop it's the pane's close; in
the mobile Sheet (`embedded`) it replaces the Sheet's own absolute
top-right close (suppressed via `hideClose`) so it can't collide
with the Show-hidden / Refresh actions in this same row. */}
<Tooltip
content={t('workspaceFiles.paneClose', {
defaultValue: 'Close files panel',
})}
side="bottom"
>
<Button
variant="ghost"
size="icon"
className="size-7"
onClick={close}
aria-label={t('workspaceFiles.paneClose', {
defaultValue: 'Close files panel',
})}
>
<X className="size-3.5" />
</Button>
</Tooltip>
</div>
</div>
{!sessionRunning ? (
<SessionStoppedState />
) : (
onClick={() => {
setSessionRunning(true);
setRefreshNonce((n) => n + 1);
}}
aria-label={t('workspaceFiles.refresh', {
defaultValue: 'Refresh',
})}
>
<RefreshCw className="size-3.5" />
</Button>
</Tooltip>
{/* Rendered in every variant. On desktop it's the pane's close; in
the mobile Sheet (`embedded`) it replaces the Sheet's own absolute
top-right close (suppressed via `hideClose`) so it can't collide
with the Show-hidden / Refresh actions in this same row. */}
<Tooltip
content={t('workspaceFiles.paneClose', {
defaultValue: 'Close files panel',
})}
side="bottom"
>
<Button
variant="ghost"
size="icon"
className="size-7"
onClick={close}
aria-label={t('workspaceFiles.paneClose', {
defaultValue: 'Close files panel',
})}
>
<X className="size-3.5" />
</Button>
</Tooltip>
</div>
</div>
{!sessionRunning ? (
<SessionStoppedState />
) : (
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/app/features/workspace/components/workspace-files-pane.tsx`
around lines 964 - 999, The Refresh button at line 964 becomes ineffective once
the session stops because the condition at line 997 causes WorkspaceFileTree to
unmount when sessionRunning is false, meaning any refreshNonce change has no
component to consume it. Disable the Refresh Button by adding a disabled
attribute that evaluates to true when sessionRunning is false, preventing users
from attempting a refresh action that cannot succeed while the WorkspaceFileTree
is unmounted.

Comment on lines +56 to +58
// oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion
const body = (await res.json()) as { sessionId?: string };
return body.sessionId ?? null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate screencast-auth JSON at runtime; don’t trust an assertion at this boundary.

This endpoint is a system boundary. The current cast can accept malformed bodies and propagate an invalid sessionId into relay path construction.

Proposed fix
-    // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion
-    const body = (await res.json()) as { sessionId?: string };
-    return body.sessionId ?? null;
+    const body: unknown = await res.json();
+    if (body === null || typeof body !== 'object') return null;
+    const sessionId =
+      'sessionId' in body &&
+      typeof body.sessionId === 'string' &&
+      body.sessionId.length > 0
+        ? body.sessionId
+        : null;
+    return sessionId;
As per coding guidelines: “Validate at every system boundary” and “Never use `as`, `any`, or `unknown` type assertions in TypeScript.”
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion
const body = (await res.json()) as { sessionId?: string };
return body.sessionId ?? null;
const body: unknown = await res.json();
if (body === null || typeof body !== 'object') return null;
const sessionId =
'sessionId' in body &&
typeof body.sessionId === 'string' &&
body.sessionId.length > 0
? body.sessionId
: null;
return sessionId;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/vite-plugins/serve-screencast.ts` around lines 56 - 58,
Remove the type assertion with `as { sessionId?: string }` from the res.json()
call in the screencast-auth response handling. Instead, implement runtime
validation to check the structure of the response body before extracting the
sessionId. Validate that the body is an object and that sessionId, if present,
is a string. Return null if validation fails or the sessionId is invalid. This
ensures that malformed responses cannot propagate invalid sessionIds into relay
path construction at this system boundary.

Source: Coding guidelines

// Not our path — leave the socket untouched so Vite's HMR upgrade
// handler (and any other) can claim it.
if (!match) return;
const threadId = decodeURIComponent(match[1]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard decodeURIComponent to prevent upgrade-handler crashes on malformed paths.

A malformed percent-encoded thread id can throw synchronously and abort the upgrade handler. Reject bad paths instead of letting the exception bubble.

Proposed fix
-          const threadId = decodeURIComponent(match[1]);
+          let threadId: string;
+          try {
+            threadId = decodeURIComponent(match[1]);
+          } catch {
+            socket.write('HTTP/1.1 400 Bad Request\r\nConnection: close\r\n\r\n');
+            socket.destroy();
+            return;
+          }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/vite-plugins/serve-screencast.ts` at line 81, The
`decodeURIComponent` call at the threadId assignment can throw a synchronous
exception if the percent-encoded string is malformed, causing the upgrade
handler to crash. Wrap the `decodeURIComponent(match[1])` call in a try-catch
block to handle malformed paths gracefully. In the catch block, reject the
request (returning an appropriate error response) instead of allowing the
exception to propagate and crash the handler.

Comment on lines +109 to +137
let closed = false;
const pending: RawData[] = [];
const teardown = (): void => {
if (closed) return;
closed = true;
try {
spawner.close();
} catch (err) {
console.warn('[serve-screencast] spawner close failed:', err);
}
try {
browser.close();
} catch (err) {
console.warn('[serve-screencast] browser close failed:', err);
}
};

spawner.on('open', () => {
for (const buf of pending) spawner.send(buf);
pending.length = 0;
});
// spawner → browser (framebuffer); browser → spawner (RFB input, tiny).
spawner.on('message', (data) => {
if (browser.readyState === WebSocket.OPEN) browser.send(data);
});
browser.on('message', (data) => {
if (spawner.readyState === WebSocket.OPEN) spawner.send(data);
else pending.push(data);
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cap pre-open browser→spawner buffering to avoid unbounded memory growth.

pending grows without limit while the spawner socket is not open. A fast sender can exhaust memory before teardown.

Proposed fix
+        const PENDING_MAX_BYTES = 256 * 1024;
         let closed = false;
         const pending: RawData[] = [];
+        let pendingBytes = 0;
+        const rawDataBytes = (data: RawData): number =>
+          typeof data === 'string'
+            ? Buffer.byteLength(data)
+            : Array.isArray(data)
+              ? data.reduce((n, b) => n + b.byteLength, 0)
+              : data.byteLength;
@@
         spawner.on('open', () => {
           for (const buf of pending) spawner.send(buf);
           pending.length = 0;
+          pendingBytes = 0;
         });
@@
         browser.on('message', (data) => {
           if (spawner.readyState === WebSocket.OPEN) spawner.send(data);
-          else pending.push(data);
+          else {
+            const next = pendingBytes + rawDataBytes(data);
+            if (next > PENDING_MAX_BYTES) {
+              console.warn('[serve-screencast] pending upstream buffer exceeded cap');
+              teardown();
+              return;
+            }
+            pending.push(data);
+            pendingBytes = next;
+          }
         });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/vite-plugins/serve-screencast.ts` around lines 109 - 137,
The pending array in the browser.on('message') event handler grows without limit
while the spawner socket is not open, which can cause unbounded memory growth if
a fast sender sends many messages before the spawner opens. Add a check before
pushing data to the pending array in the browser.on('message') handler to cap
the maximum size of pending messages. If the pending array reaches a configured
size limit, either drop the message, close the browser connection, or log a
warning to prevent memory exhaustion from a runaway sender.

Comment thread services/sandbox-runtime/entrypoint.sh Outdated
Comment on lines +352 to +363
# Block on CDP readiness, fail-open. A failed curl must NOT abort the script
# (set -e) — the loop swallows it and the timeout path only WARNs.
_i=0
while [ "$_i" -lt 30 ]; do
if curl -fsS "http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then
echo "[entrypoint] live browser view ready (CDP 127.0.0.1:9222, VNC 127.0.0.1:5900, view-only)"
return 0
fi
_i=$((_i + 1))
sleep 0.5
done
echo "[entrypoint] WARN: headed Chromium CDP not ready within ~15s; continuing (runnerd starts; Playwright MCP --cdp-endpoint will retry the connection)" >&2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a hard timeout to the CDP readiness probe.

Line 356 uses curl -fsS with no timeout. If the local endpoint accepts but stalls, startup can block indefinitely and runnerd never starts, which breaks the intended fail-open behavior.

Suggested fix
-    if curl -fsS "http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then
+    if curl --connect-timeout 1 --max-time 1 -fsS \
+      "http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Block on CDP readiness, fail-open. A failed curl must NOT abort the script
# (set -e) — the loop swallows it and the timeout path only WARNs.
_i=0
while [ "$_i" -lt 30 ]; do
if curl -fsS "http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then
echo "[entrypoint] live browser view ready (CDP 127.0.0.1:9222, VNC 127.0.0.1:5900, view-only)"
return 0
fi
_i=$((_i + 1))
sleep 0.5
done
echo "[entrypoint] WARN: headed Chromium CDP not ready within ~15s; continuing (runnerd starts; Playwright MCP --cdp-endpoint will retry the connection)" >&2
# Block on CDP readiness, fail-open. A failed curl must NOT abort the script
# (set -e) — the loop swallows it and the timeout path only WARNs.
_i=0
while [ "$_i" -lt 30 ]; do
if curl --connect-timeout 1 --max-time 1 -fsS \
"http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then
echo "[entrypoint] live browser view ready (CDP 127.0.0.1:9222, VNC 127.0.0.1:5900, view-only)"
return 0
fi
_i=$((_i + 1))
sleep 0.5
done
echo "[entrypoint] WARN: headed Chromium CDP not ready within ~15s; continuing (runnerd starts; Playwright MCP --cdp-endpoint will retry the connection)" >&2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox-runtime/entrypoint.sh` around lines 352 - 363, The curl
command in the readiness probe loop that checks the CDP endpoint at
http://127.0.0.1:9222/json/version lacks a timeout parameter. If the endpoint
accepts the connection but stalls without responding, the script can block
indefinitely at this curl call. Add a timeout flag (such as --max-time or
--connect-timeout) to the curl invocation to ensure the probe fails fast and
continues to the next iteration or timeout path, maintaining the intended
fail-open behavior.

Comment on lines +5 to 6
// container itself uses a `--tmpfs /user` for the workspace, so there is
// no per-call workspace volume to manage.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stale workspace mount comment (--tmpfs /user is no longer accurate).

Line 5 says the runtime workspace is --tmpfs /user, but the runtime now bind-mounts /user via type=bind in services/sandbox/src/docker-args.ts. Update this comment to avoid future debugging confusion.

Suggested patch
-// container itself uses a `--tmpfs /user` for the workspace, so there is
+// container itself bind-mounts the per-exec host workspace at `/user`, so there is
 // no per-call workspace volume to manage.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// container itself uses a `--tmpfs /user` for the workspace, so there is
// no per-call workspace volume to manage.
// container itself bind-mounts the per-exec host workspace at `/user`, so there is
// no per-call workspace volume to manage.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/volume.ts` around lines 5 - 6, The comment at lines 5-6
in volume.ts incorrectly states that the container workspace uses `--tmpfs
/user`, but the runtime has been updated to bind-mount `/user` via `type=bind`
(as implemented in docker-args.ts). Update this comment to accurately reflect
that the workspace is now bind-mounted rather than using tmpfs, so that
developers understand the current mounting strategy and avoid future confusion
when reviewing or debugging workspace volume handling.

larryro added 11 commits June 16, 2026 12:41
Selecting a previewable file now fetches and renders its source
immediately instead of stopping at the "Select Preview" placeholder;
markdown opens on Source (its Source/Preview toggle stays available).
Binary / non-previewable files keep the download-only notice, and the
toolbar Preview button becomes the load-retry affordance (disabled
while loading).
The Bun audit gate (Security workflow) began failing on three HIGH
production advisories published after main last passed:
  - form-data <4.0.6  (GHSA-hmw2-7cc7-3qxx, CRLF injection)
  - ws <8.21.0        (GHSA-96hv-2xvq-fx4p, memory-exhaustion DoS)
  - vite <=8.0.15     (GHSA-fx2h-pf6j-xcff, server.fs.deny bypass)

All three are transitive (vite is also a direct devDep). Pin the patched
versions via the existing root `overrides` block — the `ws` override was
itself pinning the vulnerable 8.20.1 — and bump the direct vite devDep in
root + platform to match. `bun audit --prod --audit-level=high` is clean.
The read-only live browser view adds Xvfb + x11vnc and the Noto CJK/emoji
font set to the sandbox-runtime image so non-Latin pages render glyphs,
not tofu. That pushes the CI-measured size to ~2.56 GB, over the old
2450 MB budget (Validate images failed at 2564 MB). Raise it to 2820 MB
(measured + ~10% headroom, per the existing budget rationale) and document
the new components.
`useVoiceModeEffective` subscribed to `getVoiceModeEffective` via
convex/react `useQuery`, which re-throws on a server error. Under load the
query can hit Convex's 1s execution limit; the throw then propagates to
the page error boundary and replaces the whole chat with "Something went
wrong" (observed crashing the arena E2E across all retries).

Voice mode is non-essential, so switch to the non-throwing `useConvexQuery`
wrapper (errors land in `.error`, not the boundary) and degrade to
"voice off" on failure, logging once. The hook's public `VoiceModeState`
contract is unchanged.
…pecs

All three failed deterministically on shards 1/4 (on main too), rooted in
the org being shared across specs / message-persist timing:

- automation: the settle-wait only handled "test row present" or "empty";
  a sibling spec leaves other automations in the shared org — a third
  state neither locator matches -> 60s timeout. Settle on the table count
  footer (covers any non-empty list) or the empty state instead.

- projects-depth: the project key is derived from the name's word initials
  ("E2E Depth <suffix>" -> "ED<x>"), dropping the unique suffix, so create
  collided on PROJECT_KEY_TAKEN in the shared org. Supply an explicit
  unique key from the suffix.

- chat-depth: getSharedThread snapshots messages to _creationTime <=
  sharedAt; the spec enabled sharing before the message was persisted
  downstream, so the shared view was empty. Wait for the turn to complete
  (message persisted) before sharing.
…sient timeout

Under load the self-hosted Convex backend can blow the hard 1s query limit on
getThreadMeta — its RLS path makes a cross-component isOrgMember hop to the
Better Auth member table, amplified on the local backend. convex/react's
useQuery re-throws that into the page error boundary and blanks the whole chat.
Fatal for arena mode, whose split view lives in ephemeral state a remount can't
restore (so an E2E reload can't recover it).

Move the three remaining getThreadMeta consumers (use-message-processing,
plan-approval-card, external-agent-mode-toggle) onto the non-throwing
useConvexQuery wrapper — the same migration chat-interface already uses — so a
transient timeout degrades to a loading state and the reactive query recovers
on the next tick. All three already read the result via optional chaining, so
undefined-on-error is a no-op. Test mock mirrors the registry into the wrapper's
`{ data }` shape.
Saving/deleting a project secret is a Convex ACTION (encrypt-then-upsert) which
— unlike a mutation — is not re-sent when the backend drops the WS mid-flight
(the CI backend intermittently 1011s under load). The lost action left the
dialog open with no success toast, failing the spec. Retry the idempotent
action until the durable state holds (dialog closed / row gone) instead of
waiting on the ephemeral toast.
The managed Chromium for the live-browser view (SANDBOX_BROWSER_VIEW) could wedge with its CDP HTTP listener still answering while no CDP session could attach: connectOverCDP connected then timed out, the blind restart loop never recycled a hung-but-alive browser, and the agent looped browser_navigate with no recovery. The profile also lived on ephemeral tmpfs, so site logins never survived a restart.

entrypoint: move the profile to the persistent /user/.runtime/browser-profile (survives turns, idle-stop/resume, container restart) and supervise Chromium with a self-healing loop — clear the singleton lock + crash-restore state before each (re)launch (preserves logins) and expose a pid/reset control channel.

runnerd: a real CDP liveness probe (HTTP + Target.getTargets round-trip, not just /json/version), /browser/{restart,reset,close-pages}, a browser-health field in /healthz, and a per-exec pre-flight self-heal so the agent never attaches to a wedged browser.

spawner + platform: proxy the browser controls, close the turn's tabs on Stop (logins preserved), gate the idle reaper on CDP health, and add a public resetThreadBrowser action. agent: browser-recovery guidance so a mid-turn CDP error doesn't loop. UI: a Reset-browser affordance (wipes the profile) in the live-browser pane.
`CdpHealth` is only used locally (the waitHealthy/probe return type) and
imported by no other module, so the `export` tripped the Knip gate. Make it a
local interface — no behaviour change; re-export if a cross-module consumer
appears.
entrypoint.sh runs under `set -e`, but the managed-Chromium supervisor loop must survive its child exiting non-zero: `wait "$_bpid"` returns 137 when runnerd SIGKILLs Chromium to recycle a wedged/reset browser, which under errexit terminated the supervisor subshell — so after the FIRST recycle the browser was never relaunched (CDP stayed dead, pid file went stale). Disable errexit inside the loop (`set +e`); every command there is already best-effort guarded. Found via live container E2E: wedge → recycle now relaunches and CDP recovers, logins preserved.
setProjectSecret (and the guardrails secret_box) require a 32-byte
ENCRYPTION_SECRET_HEX on the Convex deployment. Unlike INSTANCE_SECRET /
BETTER_AUTH_SECRET it had no insecure-local default, so the E2E backend ran
without it and every secret save threw "ENCRYPTION_SECRET_HEX environment
variable is not set" — projects-depth's secret step failed deterministically
(masked until the create-key collision was fixed and the test reached it).

Derive a stable 32-byte key from INSTANCE_SECRET in the dev orchestrator (same
pattern as ensureWebdavHmacKey) and push it via ORCHESTRATOR_MANAGED_KEYS so
dev + E2E exercise secrets with zero setup; the key is stable across restarts
so already-encrypted rows still decrypt. Explicit .env wins; prod sets a real
key.
@larryro larryro merged commit 55bc8f7 into main Jun 16, 2026
40 checks passed
@larryro larryro deleted the feat/sandbox-chat-observability branch June 16, 2026 07:13
yannickmonney added a commit that referenced this pull request Jun 16, 2026
Pick up #1890 (read-only sandbox observability in chat). Resolutions mirror the
prior merge: old-location e2e specs (automation/chat-depth/projects-depth) and
tests/container-image-test.sh stay deleted (relocated to tests/e2e + rewritten
as TS integration tests); bun.lock regenerated from the merged package.json,
preserving the vite 8.0.16 / ws 8.21.0 / form-data 4.0.6 security overrides.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant