feat: read-only sandbox observability in chat — live browser + workspace file explorer by larryro · Pull Request #1890 · tale-project/tale

larryro · 2026-06-16T04:10:47Z

Summary

Adds read-only observability for external-agent sandbox sessions to the chat page: a live view of the agent's browser and a read-only file explorer of its workspace — surfaced only on external-agent threads (which have no side panes today, so these are additive). Also lands the /user workspace-layout refactor that both build on.

The agent runs in a per-(org,user) sandbox session container and frequently drives a headless Chromium via Playwright MCP; until now the user could only read the text/tool timeline. This makes the agent's work visible without granting any control.

What's included

Part C — clean /user layout (a9c88113b): renames the session data root /workspace → /user, separating the user working area (/user/workspace, the agent CWD) from I/O (uploads/, output/) and platform plumbing (/user/.runtime/). No migration — the session sandbox is unreleased; local data is wiped + the image rebuilt.

Part A — live browser view (ca4ab4492 + fixes): headed Chromium + Xvfb + x11vnc in the runtime image, streamed to a noVNC client over an authenticated WebSocket chain (runnerd raw tunnel → spawner relay → platform WS termination with cookie/org auth → Caddy). Read-only is guaranteed at the source (x11vnc -viewonly); the framebuffer never leaves loopback. Gated off by default behind the SANDBOX_BROWSER_VIEW operator flag. CJK/emoji fonts + --test-type infobar fix included.

Part B — workspace file explorer (cfea8b64b + fixes): read-only tree of /user/workspace via the existing runnerd /fs APIs, with a file viewer. Browse + view + download; stopped session → "resume to browse".

This session's polish:

Replay-401 fix (08b2faa43): the spawner's HMAC had no nonce, so deterministic empty-body GETs (e.g. sessionIsAlive) collided in the replay cache. Added a per-request nonce (backward-compatible: legacy nonce-less requests still verify).
Composer declutter + gating (8f7da945b, 4de310b3f): removed the redundant Workspace-files/Live-browser pills (desktop uses right-edge strips, mobile uses the + menu); centralized the show/hide gate in one useSandboxPanesAvailable hook, fixing a previously-ungated strip that showed on non-sandbox threads.
Viewer UX (4898e6837): Download is always available for any file; Preview is an explicit on-demand action; Markdown renders user-friendly (with a Source toggle).
Mobile (d9e2b386f, b6b9d1024, 9197db274): viewport-gated panes (no desktop backdrop bleed), +-menu entries, no header-button overlap in the Sheet, left-tree/right-viewer layout.

Security

Read-only is enforced at the VNC source (-viewonly), not just in the client.
Cross-org viewing rejected before the WS upgrade (session→org ∈ cookie orgs).
Framebuffer stays on loopback; session containers publish no ports; runnerd is the sole reader.
The new HMAC nonce keeps replay protection intact (a verbatim replay still collides in the cache).

Verification

Live E2E driven through a real browser (desktop + iPhone CDP device emulation): live VNC stream of the agent browsing, read-only confirmed; file tree + preview + download; markdown rendered; stopped-session resume state; mobile +-menu → Sheet.
Ran an adversarial multi-dimension review of the diff (nonce security / gating regressions / viewer correctness / completeness) with each finding independently verified — gating clean; the doc/comment + orphaned-i18n-key findings it confirmed are fixed in this PR.

Pre-PR checklist

Ran bun run check — green except one pre-existing known-flaky test, convex/integrations/slack/http_actions.test.ts (convexTest doesn't register components under turbo parallel load; passes 8/8 in isolation, unrelated to this PR). All TS lint/typecheck and 72,500+ tests otherwise pass.
N/A — no hand-rolled skeletons added.
Updated services/platform/messages/{en,de,fr}.json (new viewer keys; de-CH inherits de).
N/A — Docs: the parent sandbox-session / external-agent chat feature is unreleased and entirely undocumented (no docs/ pages exist for it); these are observability additions to it, so there is no doc page to extend. Docs will land with the feature's documentation pass.
N/A — no docs pages touched.
N/A — @tale/docs unaffected.
N/A — README unaffected.

Summary by CodeRabbit

Release Notes

New Features
- Added live browser streaming view for external agent sessions—watch read-only mirroring of the agent's browser in real-time.
- Added workspace files explorer pane to browse and preview files from the agent's workspace during chat sessions.
- Added mobile support for workspace files and live browser features via modal dialogs.
- Integrated sandbox observability panes into composer for external-agent threads, with toggle controls in the mode menu.

…ing under .runtime Separate the clean user working area from platform plumbing so it can be surfaced read-only in chat (workspace file explorer, follow-up). The data root /workspace becomes /user: /user/workspace agent CWD + repo clone target (was /workspace/repo) /user/uploads,output user I/O /user/.runtime/{home,deps,tale} HOME, pip/npm targets, steer/wrappers/skills TMPDIR -> /tmp (off the persistent volume) WORKSPACE_ROOT, the entrypoint skeleton+env (both daemon and one-shot modes), run_agent workdir default, staging dests, steer/skills paths, the k8s backend, docker-session-args bind target + volume tmpfs, and the adapter CLAUDE_CONFIG_DIR/TALE_STEER_DIR all move accordingly. Per-org pip/npm/bun caches stay at /cache/* (independent of HOME). No data migration — the session sandbox feature is unreleased. sandbox 283 pass, sandbox-runtime/daemon 49 pass, agent_adapters 30 pass; tsc clean.

…l-agent sandboxes Adds a 'Workspace files' side pane (external-agent threads only) that browses the live session workspace under /user, read-only, with file preview + download. Backend: - sandbox/workspace_files.ts: resolveBrowsableSession + resolveBrowsableSessionForUser internalQueries — the single canAccessThread auth boundary (cross-org access throws), shared by both the list action and the download route, and reusable by the live-browser feature. Owner/status resolution mirrors getThreadSandboxState. - node_only/sandbox/workspace_files.ts: listWorkspaceDir public action (server-side filtering: drops node_modules/.git, hides dotfiles unless showHidden, dirs-first sort, 1000-entry cap with truncated flag) + readWorkspaceFileBytes internalAction. - http.ts: GET /api/sandbox/workspace_file — cookie-authed, org-scoped, content + download (Content-Disposition), 401/403/404/409/429. - session_client.ts: sessionReadFile helper; rate_limiter: security:workspace-file. Frontend: - workspace-files-pane.tsx (lazy file tree: one listWorkspaceDir per directory expand; file viewer via the content route — Shiki for text, object-URL images, download-only for binaries/too-large; show-hidden + refresh; resume-to-browse empty state), workspace-files-context.tsx, workspace-files-toggle.tsx (composer pill, gated to external-agent + a session). Mobile via Sheet. - i18n: chat.workspaceFiles.* in en/de/fr (de-CH falls back to de). Read-only (no edit/delete). Reads the live container FS, distinct from the curated threadFiles. tsc + oxlint clean; resolver unit tests 7/7.

…in chat Adds a 'Live browser' side pane (external-agent threads only) that streams the agent's headed Chromium read-only over VNC-through-WebSocket. OFF by default (operator flag SANDBOX_BROWSER_VIEW on BOTH the spawner and platform). End-to-end chain (raw RFB; one browser-facing WS termination at the platform): Xvfb :99 <- headed Chromium (CDP 127.0.0.1:9222, MCP attaches via --cdp-endpoint) x11vnc 127.0.0.1:5900 -viewonly (airtight read-only; agent drives via CDP, not X) -> runnerd GET /screencast (raw HTTP/1.1 Upgrade tunnel, x-tale-runnerd-token; zero-dep) -> spawner GET /v1/sessions/:id/screencast (Bun WS server, HMAC; WS<->raw-tunnel bridge) -> platform GET /screencast/:threadId (Bun WS, cookie->org via screencast-auth httpAction reusing resolveBrowsableSessionForUser; WS<->WS relay) -> browser noVNC RFB(viewOnly) <canvas> Image/runtime (A1): Dockerfile adds xvfb/x11vnc/fonts; entrypoint start_browser_stack (supervised Xvfb + x11vnc -localhost -viewonly + headed Chromium, fail-open readiness poll) gated on TALE_BROWSER_CDP=1; adapter swaps Playwright MCP to --cdp-endpoint when spec.browserCdp; tale-playwright-mcp shim skips proxy flags in CDP mode; config.browserView -> --env TALE_BROWSER_CDP; run_external_agent sets browserCdp from SANDBOX_BROWSER_VIEW. runnerd (A2): server.on('upgrade') /screencast raw byte tunnel to 127.0.0.1:5900, activeScreencasts counter in /healthz, single-teardown, 502 when x11vnc down. spawner (A3): Bun WS server, HMAC-before-upgrade, Bun.connect raw upgrade to runnerd, 101 parse + binary relay + bufferedAmount backpressure; sweepExpired exemption when activeScreencasts>0 (a watched session isn't reaped). platform (A4): /api/sandbox/screencast-auth httpAction (the auth oracle) + server.ts wrapper-fetch WS termination + Bun WS client to spawner (HMAC headers) + relay; Caddy /screencast/* route + encode exclusion. frontend (A5): live-browser-pane (noVNC RFB, viewOnly + pointer-events-none + View-only badge, connecting/streaming/disconnected/error/empty states), live-browser-toggle, live-browser-context (mutually exclusive with the workspace-files pane); @novnc/novnc dep; i18n chat.liveBrowser.* en/de/fr. sandbox 302, daemon 57, agent_adapters 32 pass; platform tsc + oxlint + i18n tests clean.

…rate-limit bucket, dev mirror Found + fixed while running the read-only sandbox-observability feature end-to-end in a browser: - Desktop modal-backdrop bug: the Workspace-files + Live-browser panes opened BOTH the desktop docked pane AND the mobile Sheet on desktop. A Sheet's md:hidden only hides its content via CSS — Radix still portals the Dialog backdrop, which covered the screen + intercepted clicks. Add a useIsMobile hook and mutually gate the docked panes (desktop) vs the Sheets (mobile). - /api/sandbox/screencast-auth referenced rate-limit bucket 'security:screencast-auth' that was never defined → 500 on every live-browser auth probe (tsc/lint can't catch a missing runtime config key). Define it. - serve-screencast Vite dev plugin: bun dev runs Vite, not server.ts, so the prod /screencast WS isn't served in dev (like serve-webdav/watch-examples mirror their server.ts routes). Mirror it, reusing screencast-relay.ts's HMAC helpers so the contract can't drift. + @types/ws devdep. Live E2E verified (fresh sandbox): clean /user tree, lazy expand, file viewer; live VNC stream of the agent's headed Chromium navigating example.com, read-only.

…ox infobar The live browser view rendered CJK text as tofu boxes (e.g. google.com.hk's Chinese UI) because the runtime image only shipped Latin fonts. Add fonts-noto-cjk + fonts-noto-color-emoji so real-world / non-Latin pages render correctly in the VNC stream. Also pass the headed Chromium --test-type + --disable-infobars + --no-first-run/--no-default-browser-check so the 'unsupported command-line flag: --no-sandbox' infobar (and first-run prompts) don't eat the top of every streamed page. Verified live: agent opens google.com.hk, types a search, reads results — the stream shows readable Chinese (no tofu) and no infobar. Image +~190 MB (CJK).

The 'Workspace files' panel rooted at the /user data root, so it listed the sibling uploads/ and output/ I/O dirs alongside the working tree. Root it at /user/workspace instead — the panel now shows only the agent's working files (cloned repos, created files), matching the 'Workspace files' label. uploads/ and output/ are platform I/O staging, not workspace contents. Verified live: panel shows the agent's hello.txt directly, no uploads/output.

… explorer When a session's backend is evicted (idle/crash/reconcile lag) the Convex row can still read 'active' for a short window — a phantom. The explorer then hit the spawner, got a 404 (no registry entry), and rendered it as an empty folder (listing) or 'missing or too large' (file read) — misleading, since the file exists and the sandbox is just not running. listWorkspaceDir + readWorkspaceFileBytes now probe sessionIsAlive when the spawner 404s: a dead backend surfaces as sessionRunning:false (list) / a 409 session_not_running (read) → the existing 'resume to browse' UI — while a live session still distinguishes a genuinely missing/oversized file (404). Verified: with a live session the read returns 200; the happy path is unchanged. (The session loss in testing was self-inflicted — deleting the spawner's .spawner.lock under a running spawner triggers a restart loop that evicts sessions; resolved by a clean restart.)

…ht viewer The explorer stacked the tree above the file viewer (top/bottom). Switch to the conventional file-explorer layout — tree on the left, viewer on the right — on desktop (`md:flex-row`, tree a 1/3 / 160–280px left sidebar with a right border), while keeping the stacked layout under `md` where a narrow screen has no room for two columns. Bump the default pane width 480→560 so both columns breathe. Verified live: tree (tale/, hello.txt) on the left, 'hi from sandbox' rendered in the viewer on the right.

…obile The Workspace-files / Live-browser composer pills sit at x>=545 in a 390px viewport (verified under iPhone device emulation), so they're unreachable on mobile. Surface both toggles in the always-reachable "+" composer menu under a "Sandbox" section, gated the same as the pills (external-agent thread + an existing session). Fix the mobile Sheet header overlap: the pane body rendered its Show-hidden / Refresh actions in the header row, but the Sheet also drew its own absolute top-right close button on top of Refresh. Render the pane's own close button in every variant (it was previously desktop-only) and pass `hideClose` to both mobile Sheets so the close lives in the header row with the other actions instead of colliding (ESC / overlay-tap still dismiss). The now-dead `embedded` body prop is removed.

…lse-positives The spawner's replay protection keys its short-TTL cache on the request signature, but the signature was fully deterministic for a given (method, path, ms-timestamp, body). An empty-body GET to a fixed path — notably the `sessionIsAlive` liveness probe — therefore signs the SAME string when two probes land in the same millisecond, so the second was rejected as `replay` (a spurious 401). This surfaced as the workspace file explorer logging "sandbox session get failed (401)" on a stopped session, and is a latent race on the phantom-heal path generally. Bind a fresh per-request nonce into the signed string (`METHOD\npath\nts\nnonce\nsha256(body)`), sent as `x-tale-sandbox-nonce`. Distinct requests now get distinct signatures (no false collision) while a verbatim replay of a captured request still hits the cache. The nonce is OPTIONAL in `verify` — when the header is absent the legacy nonce-less string is verified — so a not-yet-updated client (e.g. spawner_client.ts) or a rolling-deploy window keeps authenticating. Updates both platform signers (session_client, screencast-relay) and the spawner verifier + authorize(), with tests for the nonce, the verbatim replay, and the legacy fallback.

… via one hook The Workspace-files / Live-browser composer pills duplicated the open affordance that already exists — the right-edge collapse strips on desktop and the `+`-menu entries on mobile — and crowded the composer row (the pills also overflow off-screen under `md`). Remove both pills and their components. Centralize the gate in a new `useSandboxPanesAvailable` hook (external-agent agent + a thread sandbox session, any lifecycle). chat.tsx now mounts the desktop panes and the mobile Sheets only when it returns true, and the `+`-menu uses the same hook — so a normal chat thread never shows a stray strip or a portalled backdrop. The previously-ungated desktop strip is now gated correctly.

…ew, rendered markdown Restructure the file viewer around a persistent toolbar: - Download is ALWAYS available for any selected file (any size/type), not only on the binary/too-large card as before. The icon sits inside the <a> (not Button's `icon` prop) so `asChild` no longer forwards className onto a Fragment (React dev warning). - Preview is an explicit on-demand action (no auto-fetch on select) with a proper AbortController + object-URL lifecycle; the toolbar button hides once content is shown so it can't collide with a renderer's own toggle. - Markdown (md/mdx/markdown) opens rendered/user-friendly via RenderableFileViewer's Preview mode (gains an optional `defaultMode`; with a Source toggle to view raw); everything else stays syntax-highlighted source. Also split the misleading reuse of `workspaceFiles.emptyDir` ("This folder is empty.") — the no-file-selected placeholder now uses a dedicated `selectFile` string. New i18n keys (preview, previewHint, previewUnavailable, selectFile) added to en/de/fr (de-CH inherits de).

…he composer pills Deleting the Workspace-files / Live-browser composer pills left their `workspaceFiles.toggleTooltip` and `liveBrowser.toggleTooltip` strings unreferenced, which fails the "every key in base files is referenced by source" i18n usage test. Remove them from en/de/fr (toggleLabel stays — the pane strips + `+`-menu still use it; de-CH never had these).

+    deps.touch();
+    // pipe() applies backpressure (pause the source when the dest buffer fills,
+    // resume on drain) on both legs.
+    socket.pipe(vnc);


@@ -0,0 +1,158 @@
+'use node';


coderabbitai · 2026-06-16T04:27:38Z

📝 Walkthrough

Walkthrough

This PR adds read-only sandbox observability for external-agent chats: a workspace files pane and a live browser pane backed by new authenticated HTTP and WebSocket flows. It introduces CDP-based browser attachment for agent execution, adds nonce-aware request signing for screencast/auth requests, and wires runnerd and platform relays for browser streaming. It also migrates sandbox runtime paths and related contracts from /workspace to /user, updates session/container startup accordingly, and refreshes tests, docs, and translations for the new behavior.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

sandbox/k8s: stage attestation (prestage.json) lives on a runner-writable path #1847 — This PR updates the same sandbox Kubernetes path contract area, including TALE_DIR and related stage/harvest runtime paths under /user/.runtime/tale.

Possibly related PRs

tale-project/tale#1841 — Both PRs modify the sandbox Kubernetes pod/backend path and workspace-mount code paths.
tale-project/tale#1880 — Both PRs change Claude Code adapter buildExec and related adapter typing/configuration.
tale-project/tale#1872 — Both PRs update the agent-adapter/session-exec path used when building persistent agent runs.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/sandbox-chat-observability

Warning

Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption.

coderabbitai

Actionable comments posted: 17

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/platform/app/features/chat/components/composer-mode-menu.tsx`:
- Around line 241-252: Remove the hardcoded English defaultValue fallback
strings from the tChat function calls in the menu label definitions. For the
workspace files menu item with the tChat('workspaceFiles.toggleLabel', ...) call
and the live browser menu item with the tChat('liveBrowser.toggleLabel', ...)
call, delete the defaultValue property and its string value from each call,
leaving only the translation key as the function parameter. This ensures
translations are enforced and no English fallbacks can surface to end users.

In `@services/platform/app/features/workspace/components/live-browser-pane.tsx`:
- Around line 391-400: The resize separator elements in both pane components are
mouse-only and lack keyboard accessibility. In
services/platform/app/features/workspace/components/live-browser-pane.tsx at
lines 391-400, add tabIndex={0} to make the separator focusable, implement
onKeyDown handler to support arrow-key resize operations (tracking width
changes), and add aria-valuemin, aria-valuemax, and aria-valuenow ARIA
attributes reflecting the current pane width state. Apply identical keyboard and
ARIA value attribute changes to the separator in
services/platform/app/features/workspace/components/workspace-files-pane.tsx at
lines 1133-1142 to maintain parity between components and ensure keyboard users
can both focus and operate both resize controls.
- Around line 75-130: The RFB lifecycle effect is not properly guarded by the
sessionActive flag and does not include it in the dependency list, which allows
WebSocket connections to remain open when sessionActive becomes false. Add an
early return guard at the beginning of the useEffect (after the containerRef and
window checks) that returns if sessionActive is false, and include sessionActive
in the dependency array at the end alongside threadId and connectNonce. This
ensures the effect properly respects the session state and cleanup runs when
sessionActive changes.

In
`@services/platform/app/features/workspace/components/workspace-files-pane.tsx`:
- Around line 570-607: The TreeRowButton function sets tabIndex to 0 only when
isActive is true, which causes all treeitem buttons to have tabIndex=-1 on
initial load when nothing is active, breaking keyboard navigation. Modify the
tabIndex assignment logic to ensure the first tree item (or a designated default
item) is always keyboard-reachable with tabIndex=0, while inactive items retain
tabIndex=-1. This requires passing additional context to the TreeRowButton
component to identify which item should be initially focusable (e.g., an
isFirstItem prop or similar indicator), then updating the tabIndex assignment to
use that context alongside the isActive state.
- Around line 964-999: The Refresh button at line 964 becomes ineffective once
the session stops because the condition at line 997 causes WorkspaceFileTree to
unmount when sessionRunning is false, meaning any refreshNonce change has no
component to consume it. Disable the Refresh Button by adding a disabled
attribute that evaluates to true when sessionRunning is false, preventing users
from attempting a refresh action that cannot succeed while the WorkspaceFileTree
is unmounted.

In `@services/platform/convex/agents/external_agent/run_external_agent.ts`:
- Line 140: The BROWSER_VIEW_ENABLED constant uses a strict string equality
check for the SANDBOX_BROWSER_VIEW environment variable, accepting only the
literal value '1', while the sandbox service uses boolEnvOpt() which accepts
multiple boolean-like values ('true', '1', 'yes', 'on') in a case-insensitive
manner. This causes a mismatch where the platform layer and sandbox layer
interpret the same environment variable differently. Update the
BROWSER_VIEW_ENABLED assignment to use the boolEnvOpt() function (the same
function used in the sandbox service) or implement matching logic that accepts
the same range of boolean-like values to ensure consistent behavior across both
layers.

In `@services/platform/convex/http.ts`:
- Around line 576-588: The catch blocks at both affected sites in
services/platform/convex/http.ts are catching all errors and returning 403,
which masks genuine operational failures. At the anchor site (lines 576-588) and
the sibling site (lines 714-726), replace the generic catch clause with a
specific check for UnauthorizedError, returning 403 only for that error type,
while allowing other unexpected errors to propagate naturally so they are
properly handled as 500 errors by the framework. This matches the established
pattern used elsewhere in the codebase where UnauthorizedError is explicitly
distinguished from other exceptions.

In `@services/platform/convex/node_only/sandbox/run_agent.ts`:
- Around line 104-110: The `browserCdp` field was added to
`RunAgentInSessionArgs` and is being forwarded to `buildExec`, but the `args`
validator schema in `runAgentInSession` does not include this field, preventing
it from being passed through the action wrapper. Add `browserCdp` as an optional
boolean field to the `args` validator in `runAgentInSession` to match the field
definition in `RunAgentInSessionArgs` and enable the field to flow through to
`buildExec` when calls are made through the action wrapper.

In `@services/platform/convex/node_only/sandbox/workspace_files.ts`:
- Around line 65-70: The current path validation using simple string checks for
`..` and NUL bytes does not enforce the `/user/workspace` boundary properly. At
services/platform/convex/node_only/sandbox/workspace_files.ts lines 65-70 in the
function containing the `sessionListFiles` call, normalize `args.path` relative
to `WORKSPACE_ROOT` and reject any resulting path that falls outside
`/user/workspace`. At
services/platform/convex/node_only/sandbox/workspace_files.ts lines 144-145
before the `sessionReadFile` call, apply the same normalization and boundary
check to `args.path`. At services/platform/convex/http.ts lines 523-530 before
passing the `path` query parameter to the internal action, normalize it and
verify it does not escape `/user/workspace`. Use a canonical path resolution
function (not string prefix matching) to properly detect and block directory
traversal and ensure all entry points enforce the workspace boundary
consistently.
- Around line 148-156: Remove the `as const` assertions in the return statements
where status is set to 'ok', 'missing', and 'gone'. Create const variable
assignments before the return statements (such as const okStatus = 'ok', const
missingStatus = 'missing', const goneStatus = 'gone') and use these variables in
place of the inline string literals with `as const`. TypeScript will
automatically infer the narrow literal types from const assignments without
requiring the `as const` assertions.

In `@services/platform/convex/sandbox/workspace_files.test.ts`:
- Around line 62-63: The test file contains two separate issues: First, the eq
function on lines 62-63 uses unnecessary type casts (as string) because it
accepts field as a generic string. Refactor the eq function to use a
discriminated union type for the field parameter, accepting either 'ownerType'
or 'ownerId' as separate overloads or a union type, so that the value parameter
is properly inferred as string without casting. Second, at lines 80-87, the cast
as unknown as QueryHandler is used because the mock of internalQuery returns the
config object directly instead of wrapping it in a QueryHandler like the actual
framework does. Add a one-line comment before the forUser variable assignment
explaining this framework-generated gap: that Convex generated functions wrap
internalQuery results while this mock returns the config directly.

In `@services/platform/lib/screencast-relay.ts`:
- Around line 152-171: The JSDoc comment for the `toArrayBuffer` function
overstates the guarantee by claiming it always returns "a fresh, non-shared
`ArrayBuffer`" when in fact line 159 returns the input `ArrayBuffer` directly
without copying. Update the JSDoc comment to clarify that the function returns
fresh copies only for ArrayBufferView and string inputs, but returns the input
`ArrayBuffer` unchanged when a plain `ArrayBuffer` is passed in. This correction
maintains the actual behavior while making the documentation accurate.

In `@services/platform/vite-plugins/serve-screencast.ts`:
- Line 81: The `decodeURIComponent` call at the threadId assignment can throw a
synchronous exception if the percent-encoded string is malformed, causing the
upgrade handler to crash. Wrap the `decodeURIComponent(match[1])` call in a
try-catch block to handle malformed paths gracefully. In the catch block, reject
the request (returning an appropriate error response) instead of allowing the
exception to propagate and crash the handler.
- Around line 56-58: Remove the type assertion with `as { sessionId?: string }`
from the res.json() call in the screencast-auth response handling. Instead,
implement runtime validation to check the structure of the response body before
extracting the sessionId. Validate that the body is an object and that
sessionId, if present, is a string. Return null if validation fails or the
sessionId is invalid. This ensures that malformed responses cannot propagate
invalid sessionIds into relay path construction at this system boundary.
- Around line 109-137: The pending array in the browser.on('message') event
handler grows without limit while the spawner socket is not open, which can
cause unbounded memory growth if a fast sender sends many messages before the
spawner opens. Add a check before pushing data to the pending array in the
browser.on('message') handler to cap the maximum size of pending messages. If
the pending array reaches a configured size limit, either drop the message,
close the browser connection, or log a warning to prevent memory exhaustion from
a runaway sender.

In `@services/sandbox-runtime/entrypoint.sh`:
- Around line 352-363: The curl command in the readiness probe loop that checks
the CDP endpoint at http://127.0.0.1:9222/json/version lacks a timeout
parameter. If the endpoint accepts the connection but stalls without responding,
the script can block indefinitely at this curl call. Add a timeout flag (such as
--max-time or --connect-timeout) to the curl invocation to ensure the probe
fails fast and continues to the next iteration or timeout path, maintaining the
intended fail-open behavior.

In `@services/sandbox/src/volume.ts`:
- Around line 5-6: The comment at lines 5-6 in volume.ts incorrectly states that
the container workspace uses `--tmpfs /user`, but the runtime has been updated
to bind-mount `/user` via `type=bind` (as implemented in docker-args.ts). Update
this comment to accurately reflect that the workspace is now bind-mounted rather
than using tmpfs, so that developers understand the current mounting strategy
and avoid future confusion when reviewing or debugging workspace volume
handling.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2c513cbb-c2e8-4fd1-9fd9-92a22e8f86ca

📥 Commits

Reviewing files that changed from the base of the PR and between 678661b and 4de310b.

⛔ Files ignored due to path filters (2)

bun.lock is excluded by !**/*.lock
services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**

📒 Files selected for processing (88)

packages/agent_adapters/src/build-exec.test.ts
packages/agent_adapters/src/claude_code/adapter.ts
packages/agent_adapters/src/types.ts
services/platform/app/features/chat/components/chat-input.tsx
services/platform/app/features/chat/components/composer-mode-menu.tsx
services/platform/app/features/chat/hooks/use-sandbox-panes.ts
services/platform/app/features/workspace/components/live-browser-context.tsx
services/platform/app/features/workspace/components/live-browser-pane.tsx
services/platform/app/features/workspace/components/workspace-files-context.tsx
services/platform/app/features/workspace/components/workspace-files-pane.tsx
services/platform/app/features/workspace/viewers/renderable-file-viewer.tsx
services/platform/app/hooks/use-is-mobile.ts
services/platform/app/routes/dashboard/$id/chat.tsx
services/platform/convex/agents/external_agent/run_external_agent.ts
services/platform/convex/http.ts
services/platform/convex/lib/rate_limiter/index.ts
services/platform/convex/node_only/sandbox/helpers/session_client.ts
services/platform/convex/node_only/sandbox/helpers/spawner_client.ts
services/platform/convex/node_only/sandbox/integration_skills.ts
services/platform/convex/node_only/sandbox/internal_actions.ts
services/platform/convex/node_only/sandbox/run_agent.ts
services/platform/convex/node_only/sandbox/steer_delivery.ts
services/platform/convex/node_only/sandbox/steer_files.test.ts
services/platform/convex/node_only/sandbox/steer_files.ts
services/platform/convex/node_only/sandbox/workspace_files.ts
services/platform/convex/sandbox/wire.ts
services/platform/convex/sandbox/workspace_files.test.ts
services/platform/convex/sandbox/workspace_files.ts
services/platform/lib/screencast-relay.test.ts
services/platform/lib/screencast-relay.ts
services/platform/messages/de.json
services/platform/messages/en.json
services/platform/messages/fr.json
services/platform/package.json
services/platform/server.test.ts
services/platform/server.ts
services/platform/types/novnc.d.ts
services/platform/vite-plugins/serve-screencast.ts
services/platform/vite.config.ts
services/proxy/Caddyfile
services/sandbox-runtime/Dockerfile
services/sandbox-runtime/daemon/src/exec-manager.ts
services/sandbox-runtime/daemon/src/file-ops.test.ts
services/sandbox-runtime/daemon/src/main.ts
services/sandbox-runtime/daemon/src/protocol.ts
services/sandbox-runtime/daemon/src/screencast-tunnel.test.ts
services/sandbox-runtime/daemon/src/screencast-tunnel.ts
services/sandbox-runtime/daemon/src/steer-hook.test.ts
services/sandbox-runtime/docker-entrypoint.sh
services/sandbox-runtime/entrypoint.sh
services/sandbox-runtime/tale-playwright-mcp
services/sandbox/docs/kubernetes.md
services/sandbox/docs/sessions.md
services/sandbox/src/auth.test.ts
services/sandbox/src/auth.ts
services/sandbox/src/backend/kubernetes/exec-spec.test.ts
services/sandbox/src/backend/kubernetes/k8s-backend.test.ts
services/sandbox/src/backend/kubernetes/k8s-backend.ts
services/sandbox/src/backend/kubernetes/k8s-harvest.ts
services/sandbox/src/backend/kubernetes/k8s-pod-spec.test.ts
services/sandbox/src/backend/kubernetes/k8s-pod-spec.ts
services/sandbox/src/backend/kubernetes/k8s-protocol.ts
services/sandbox/src/backend/kubernetes/k8s-session-backend.test.ts
services/sandbox/src/backend/kubernetes/k8s-session-backend.ts
services/sandbox/src/backend/kubernetes/k8s-session-pod-spec.test.ts
services/sandbox/src/backend/kubernetes/k8s-session-pod-spec.ts
services/sandbox/src/backend/kubernetes/k8s-stage.ts
services/sandbox/src/backend/local-workspace-run.ts
services/sandbox/src/config.ts
services/sandbox/src/docker-args.test.ts
services/sandbox/src/docker-args.ts
services/sandbox/src/exec-common.ts
services/sandbox/src/exec-response.ts
services/sandbox/src/server.test.ts
services/sandbox/src/server.ts
services/sandbox/src/session/docker-session-args.test.ts
services/sandbox/src/session/docker-session-args.ts
services/sandbox/src/session/runnerd-protocol.ts
services/sandbox/src/session/screencast-relay.test.ts
services/sandbox/src/session/screencast-relay.ts
services/sandbox/src/session/session-routes.test.ts
services/sandbox/src/session/session-routes.ts
services/sandbox/src/spawn-prior-outputs.test.ts
services/sandbox/src/spawn-staging.test.ts
services/sandbox/src/types.ts
services/sandbox/src/validate-request.ts
services/sandbox/src/volume.ts
services/sandbox/src/wire.ts

coderabbitai · 2026-06-16T04:27:41Z

+          label: tChat('workspaceFiles.toggleLabel', {
+            defaultValue: 'Workspace files',
+          }),
+          icon: FolderOpen,
+          selected: files.isOpen,
+          onClick: () => files.toggle(),
+        },
+        {
+          type: 'item',
+          label: tChat('liveBrowser.toggleLabel', {
+            defaultValue: 'Live browser',
+          }),


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove hardcoded English fallback strings from localized menu labels.

Line 241 and Line 250 add English defaultValue literals that can surface to end users and bypass locale enforcement. Use translation keys only here.

As per coding guidelines, "Never hardcode user-facing strings in JSX or code. Always use the translation hook useT(namespace) from services/platform/lib/i18n/client.tsx."

Suggested fix

- label: tChat('workspaceFiles.toggleLabel', { - defaultValue: 'Workspace files', - }), + label: tChat('workspaceFiles.toggleLabel'), ... - label: tChat('liveBrowser.toggleLabel', { - defaultValue: 'Live browser', - }), + label: tChat('liveBrowser.toggleLabel'),

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/app/features/chat/components/composer-mode-menu.tsx` around lines 241 - 252, Remove the hardcoded English defaultValue fallback strings from the tChat function calls in the menu label definitions. For the workspace files menu item with the tChat('workspaceFiles.toggleLabel', ...) call and the live browser menu item with the tChat('liveBrowser.toggleLabel', ...) call, delete the defaultValue property and its string value from each call, leaving only the translation key as the function parameter. This ensures translations are enforced and no English fallbacks can surface to end users.

Source: Coding guidelines

coderabbitai · 2026-06-16T04:27:41Z

+  useEffect(() => {
+    // Never construct RFB during SSR or before the host node mounts.
+    const container = containerRef.current;
+    if (!container || typeof window === 'undefined') return undefined;
+
+    setStatus('connecting');
+
+    let rfb: RFB | null = null;
+    try {
+      rfb = new RFB(container, buildScreencastUrl(threadId), {});
+      // Read-only viewer knobs (set as instance props per noVNC docs).
+      rfb.viewOnly = true;
+      rfb.scaleViewport = true;
+      rfb.clipViewport = true;
+      rfb.focusOnClick = false;
+      rfb.background = '#000';
+      rfbRef.current = rfb;
+    } catch (err) {
+      console.error('[live-browser] RFB construction failed', err);
+      setStatus('error');
+      return undefined;
+    }
+
+    const handleConnect = () => setStatus('streaming');
+    const handleDisconnect = (event: CustomEvent<{ clean?: boolean }>) => {
+      // `detail.clean` true = the server (or our own disconnect) closed
+      // cleanly; false = a transport/protocol failure.
+      setStatus(event.detail?.clean ? 'disconnected' : 'error');
+    };
+    const handleSecurityFailure = (event: CustomEvent) => {
+      // Auth / org-scope rejection at the WS handshake (cookie failed, or the
+      // thread isn't the user's). Surface as a generic error — never as
+      // "disconnected", so the empty/retry copy stays honest.
+      console.warn('[live-browser] RFB security failure', event.detail);
+      setStatus('error');
+    };
+
+    rfb.addEventListener('connect', handleConnect);
+    rfb.addEventListener('disconnect', handleDisconnect);
+    rfb.addEventListener('securityfailure', handleSecurityFailure);
+
+    return () => {
+      rfb?.removeEventListener('connect', handleConnect);
+      rfb?.removeEventListener('disconnect', handleDisconnect);
+      rfb?.removeEventListener('securityfailure', handleSecurityFailure);
+      try {
+        rfb?.disconnect();
+      } catch (err) {
+        // disconnect() can throw if the socket never opened — log, don't mask
+        // the unmount.
+        console.warn('[live-browser] RFB disconnect during cleanup', err);
+      }
+      rfbRef.current = null;
+    };
+    // eslint-disable-next-line react-hooks/exhaustive-deps
+  }, [threadId, connectNonce]);


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the RFB lifecycle with sessionActive and include it in dependencies.

At Line 75, the connect effect still runs when sessionActive is false, and because sessionActive is omitted from the dependency list, an existing socket can stay open after the pane switches to the empty state. That contradicts the intended gate and can strand WebSocket connections.

Suggested fix

- useEffect(() => { + useEffect(() => { + if (!sessionActive) { + try { + rfbRef.current?.disconnect(); + } catch (err) { + console.warn('[live-browser] RFB disconnect during inactive gate', err); + } + rfbRef.current = null; + return; + } + // Never construct RFB during SSR or before the host node mounts. const container = containerRef.current; if (!container || typeof window === 'undefined') return undefined; @@ - }, [threadId, connectNonce]); + }, [threadId, connectNonce, sessionActive]);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/app/features/workspace/components/live-browser-pane.tsx` around lines 75 - 130, The RFB lifecycle effect is not properly guarded by the sessionActive flag and does not include it in the dependency list, which allows WebSocket connections to remain open when sessionActive becomes false. Add an early return guard at the beginning of the useEffect (after the containerRef and window checks) that returns if sessionActive is false, and include sessionActive in the dependency array at the end alongside threadId and connectNonce. This ensures the effect properly respects the session state and cleanup runs when sessionActive changes.

coderabbitai · 2026-06-16T04:27:41Z

+      <div
+        ref={resizeRef}
+        onMouseDown={handleMouseDown}
+        className="absolute top-0 -left-1 z-10 h-full w-2 cursor-col-resize"
+        role="separator"
+        aria-orientation="vertical"
+        aria-label={t('liveBrowser.paneClose', {
+          defaultValue: 'Resize live browser panel',
+        })}
+      />


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Custom resize separators are mouse-only and not keyboard operable.

Both pane resizers expose role="separator" but only bind onMouseDown, so keyboard users can focus neither operation path nor value changes.

services/platform/app/features/workspace/components/live-browser-pane.tsx#L391-L400: add tabIndex={0}, arrow-key resize handling, and aria-valuemin/aria-valuemax/aria-valuenow tied to width state.

services/platform/app/features/workspace/components/workspace-files-pane.tsx#L1133-L1142: apply the same keyboard+ARIA value support to keep parity and prevent accessibility drift.

As per coding guidelines, “Ensure all interactive elements are keyboard-reachable.”

📍 Affects 2 files

services/platform/app/features/workspace/components/live-browser-pane.tsx#L391-L400 (this comment)

services/platform/app/features/workspace/components/workspace-files-pane.tsx#L1133-L1142

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/app/features/workspace/components/live-browser-pane.tsx` around lines 391 - 400, The resize separator elements in both pane components are mouse-only and lack keyboard accessibility. In services/platform/app/features/workspace/components/live-browser-pane.tsx at lines 391-400, add tabIndex={0} to make the separator focusable, implement onKeyDown handler to support arrow-key resize operations (tracking width changes), and add aria-valuemin, aria-valuemax, and aria-valuenow ARIA attributes reflecting the current pane width state. Apply identical keyboard and ARIA value attribute changes to the separator in services/platform/app/features/workspace/components/workspace-files-pane.tsx at lines 1133-1142 to maintain parity between components and ensure keyboard users can both focus and operate both resize controls.

Source: Coding guidelines

coderabbitai · 2026-06-16T04:27:41Z

+function TreeRowButton({
+  isActive,
+  depth,
+  onClick,
+  title,
+  ariaLabel,
+  ariaExpanded,
+  dataDirPath,
+  dataParentPath,
+  children,
+}: TreeRowButtonProps) {
+  const state = isActive
+    ? 'bg-muted text-foreground'
+    : 'text-muted-foreground hover:bg-muted/60 hover:text-foreground';
+  return (
+    <button
+      type="button"
+      role="treeitem"
+      aria-selected={isActive}
+      aria-level={depth + 1}
+      tabIndex={isActive ? 0 : -1}
+      onClick={onClick}
+      title={title}
+      aria-label={ariaLabel}
+      aria-expanded={ariaExpanded}
+      data-dir-path={dataDirPath}
+      data-parent-path={dataParentPath ?? undefined}
+      style={{ paddingLeft: `${0.5 + depth * 0.75}rem` }}
+      className={cn(
+        'flex w-full items-center gap-1.5 rounded-md px-2 py-1 text-left text-xs',
+        state,
+        'focus-visible:ring-ring focus-visible:ring-1 focus-visible:outline-none',
+      )}
+    >
+      {children}
+    </button>
+  );
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tree keyboard entry is broken when no item is selected.

At Line 590, tabIndex is 0 only for isActive. On initial load, nothing is active, so all treeitem buttons become -1 and keyboard users cannot tab into the tree.

As per coding guidelines, “Ensure all interactive elements are keyboard-reachable.”

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/app/features/workspace/components/workspace-files-pane.tsx` around lines 570 - 607, The TreeRowButton function sets tabIndex to 0 only when isActive is true, which causes all treeitem buttons to have tabIndex=-1 on initial load when nothing is active, breaking keyboard navigation. Modify the tabIndex assignment logic to ensure the first tree item (or a designated default item) is always keyboard-reachable with tabIndex=0, while inactive items retain tabIndex=-1. This requires passing additional context to the TreeRowButton component to identify which item should be initially focusable (e.g., an isFirstItem prop or similar indicator), then updating the tabIndex assignment to use that context alongside the isActive state.

Source: Coding guidelines

coderabbitai · 2026-06-16T04:27:41Z

+              onClick={() => setRefreshNonce((n) => n + 1)}
+              aria-label={t('workspaceFiles.refresh', {
+                defaultValue: 'Refresh',
+              })}
+            >
+              <RefreshCw className="size-3.5" />
+            </Button>
+          </Tooltip>
+          {/* Rendered in every variant. On desktop it's the pane's close; in
+              the mobile Sheet (`embedded`) it replaces the Sheet's own absolute
+              top-right close (suppressed via `hideClose`) so it can't collide
+              with the Show-hidden / Refresh actions in this same row. */}
+          <Tooltip
+            content={t('workspaceFiles.paneClose', {
+              defaultValue: 'Close files panel',
+            })}
+            side="bottom"
+          >
+            <Button
+              variant="ghost"
+              size="icon"
+              className="size-7"
+              onClick={close}
+              aria-label={t('workspaceFiles.paneClose', {
+                defaultValue: 'Close files panel',
+              })}
+            >
+              <X className="size-3.5" />
+            </Button>
+          </Tooltip>
+        </div>
+      </div>
+
+      {!sessionRunning ? (
+        <SessionStoppedState />
+      ) : (


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stopped-state gating makes Refresh ineffective and traps the pane.

At Line 997, once sessionRunning flips false, WorkspaceFileTree unmounts. After that, Line 964’s Refresh only bumps refreshNonce with no mounted tree to consume it, so the pane can’t recover without closing/reopening.

Suggested fix

- <Button + <Button variant="ghost" size="icon" className="size-7" - onClick={() => setRefreshNonce((n) => n + 1)} + onClick={() => { + setSessionRunning(true); + setRefreshNonce((n) => n + 1); + }} aria-label={t('workspaceFiles.refresh', { defaultValue: 'Refresh', })} >

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

onClick={() => setRefreshNonce((n) => n + 1)}

aria-label={t('workspaceFiles.refresh', {

defaultValue: 'Refresh',

})}

>

<RefreshCw className="size-3.5" />

</Button>

</Tooltip>

{/* Rendered in every variant. On desktop it's the pane's close; in

the mobile Sheet (`embedded`) it replaces the Sheet's own absolute

top-right close (suppressed via `hideClose`) so it can't collide

with the Show-hidden / Refresh actions in this same row. */}

<Tooltip

content={t('workspaceFiles.paneClose', {

defaultValue: 'Close files panel',

})}

side="bottom"

>

<Button

variant="ghost"

size="icon"

className="size-7"

onClick={close}

aria-label={t('workspaceFiles.paneClose', {

defaultValue: 'Close files panel',

})}

>

<X className="size-3.5" />

</Button>

</Tooltip>

</div>

</div>

{!sessionRunning ? (

<SessionStoppedState />

) : (

onClick={() => {

setSessionRunning(true);

setRefreshNonce((n) => n + 1);

}}

aria-label={t('workspaceFiles.refresh', {

defaultValue: 'Refresh',

})}

>

<RefreshCw className="size-3.5" />

</Button>

</Tooltip>

{/* Rendered in every variant. On desktop it's the pane's close; in

the mobile Sheet (`embedded`) it replaces the Sheet's own absolute

top-right close (suppressed via `hideClose`) so it can't collide

with the Show-hidden / Refresh actions in this same row. */}

<Tooltip

content={t('workspaceFiles.paneClose', {

defaultValue: 'Close files panel',

})}

side="bottom"

>

<Button

variant="ghost"

size="icon"

className="size-7"

onClick={close}

aria-label={t('workspaceFiles.paneClose', {

defaultValue: 'Close files panel',

})}

>

<X className="size-3.5" />

</Button>

</Tooltip>

</div>

</div>

{!sessionRunning ? (

<SessionStoppedState />

) : (

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/app/features/workspace/components/workspace-files-pane.tsx` around lines 964 - 999, The Refresh button at line 964 becomes ineffective once the session stops because the condition at line 997 causes WorkspaceFileTree to unmount when sessionRunning is false, meaning any refreshNonce change has no component to consume it. Disable the Refresh Button by adding a disabled attribute that evaluates to true when sessionRunning is false, preventing users from attempting a refresh action that cannot succeed while the WorkspaceFileTree is unmounted.

coderabbitai · 2026-06-16T04:27:42Z

+    // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion
+    const body = (await res.json()) as { sessionId?: string };
+    return body.sessionId ?? null;


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate screencast-auth JSON at runtime; don’t trust an assertion at this boundary.

This endpoint is a system boundary. The current cast can accept malformed bodies and propagate an invalid sessionId into relay path construction.

Proposed fix

- // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion - const body = (await res.json()) as { sessionId?: string }; - return body.sessionId ?? null; + const body: unknown = await res.json(); + if (body === null || typeof body !== 'object') return null; + const sessionId = + 'sessionId' in body && + typeof body.sessionId === 'string' && + body.sessionId.length > 0 + ? body.sessionId + : null; + return sessionId;

As per coding guidelines: “Validate at every system boundary” and “Never use `as`, `any`, or `unknown` type assertions in TypeScript.”

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion

const body = (await res.json()) as { sessionId?: string };

return body.sessionId ?? null;

const body: unknown = await res.json();

if (body === null || typeof body !== 'object') return null;

const sessionId =

'sessionId' in body &&

typeof body.sessionId === 'string' &&

body.sessionId.length > 0

? body.sessionId

: null;

return sessionId;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/vite-plugins/serve-screencast.ts` around lines 56 - 58, Remove the type assertion with `as { sessionId?: string }` from the res.json() call in the screencast-auth response handling. Instead, implement runtime validation to check the structure of the response body before extracting the sessionId. Validate that the body is an object and that sessionId, if present, is a string. Return null if validation fails or the sessionId is invalid. This ensures that malformed responses cannot propagate invalid sessionIds into relay path construction at this system boundary.

Source: Coding guidelines

coderabbitai · 2026-06-16T04:27:42Z

+          // Not our path — leave the socket untouched so Vite's HMR upgrade
+          // handler (and any other) can claim it.
+          if (!match) return;
+          const threadId = decodeURIComponent(match[1]);


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard decodeURIComponent to prevent upgrade-handler crashes on malformed paths.

A malformed percent-encoded thread id can throw synchronously and abort the upgrade handler. Reject bad paths instead of letting the exception bubble.

Proposed fix

- const threadId = decodeURIComponent(match[1]); + let threadId: string; + try { + threadId = decodeURIComponent(match[1]); + } catch { + socket.write('HTTP/1.1 400 Bad Request\r\nConnection: close\r\n\r\n'); + socket.destroy(); + return; + }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/vite-plugins/serve-screencast.ts` at line 81, The `decodeURIComponent` call at the threadId assignment can throw a synchronous exception if the percent-encoded string is malformed, causing the upgrade handler to crash. Wrap the `decodeURIComponent(match[1])` call in a try-catch block to handle malformed paths gracefully. In the catch block, reject the request (returning an appropriate error response) instead of allowing the exception to propagate and crash the handler.

coderabbitai · 2026-06-16T04:27:42Z

+        let closed = false;
+        const pending: RawData[] = [];
+        const teardown = (): void => {
+          if (closed) return;
+          closed = true;
+          try {
+            spawner.close();
+          } catch (err) {
+            console.warn('[serve-screencast] spawner close failed:', err);
+          }
+          try {
+            browser.close();
+          } catch (err) {
+            console.warn('[serve-screencast] browser close failed:', err);
+          }
+        };
+
+        spawner.on('open', () => {
+          for (const buf of pending) spawner.send(buf);
+          pending.length = 0;
+        });
+        // spawner → browser (framebuffer); browser → spawner (RFB input, tiny).
+        spawner.on('message', (data) => {
+          if (browser.readyState === WebSocket.OPEN) browser.send(data);
+        });
+        browser.on('message', (data) => {
+          if (spawner.readyState === WebSocket.OPEN) spawner.send(data);
+          else pending.push(data);
+        });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cap pre-open browser→spawner buffering to avoid unbounded memory growth.

pending grows without limit while the spawner socket is not open. A fast sender can exhaust memory before teardown.

Proposed fix

+ const PENDING_MAX_BYTES = 256 * 1024; let closed = false; const pending: RawData[] = []; + let pendingBytes = 0; + const rawDataBytes = (data: RawData): number => + typeof data === 'string' + ? Buffer.byteLength(data) + : Array.isArray(data) + ? data.reduce((n, b) => n + b.byteLength, 0) + : data.byteLength; @@ spawner.on('open', () => { for (const buf of pending) spawner.send(buf); pending.length = 0; + pendingBytes = 0; }); @@ browser.on('message', (data) => { if (spawner.readyState === WebSocket.OPEN) spawner.send(data); - else pending.push(data); + else { + const next = pendingBytes + rawDataBytes(data); + if (next > PENDING_MAX_BYTES) { + console.warn('[serve-screencast] pending upstream buffer exceeded cap'); + teardown(); + return; + } + pending.push(data); + pendingBytes = next; + } });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/vite-plugins/serve-screencast.ts` around lines 109 - 137, The pending array in the browser.on('message') event handler grows without limit while the spawner socket is not open, which can cause unbounded memory growth if a fast sender sends many messages before the spawner opens. Add a check before pushing data to the pending array in the browser.on('message') handler to cap the maximum size of pending messages. If the pending array reaches a configured size limit, either drop the message, close the browser connection, or log a warning to prevent memory exhaustion from a runaway sender.

coderabbitai · 2026-06-16T04:27:42Z

+  # Block on CDP readiness, fail-open. A failed curl must NOT abort the script
+  # (set -e) — the loop swallows it and the timeout path only WARNs.
+  _i=0
+  while [ "$_i" -lt 30 ]; do
+    if curl -fsS "http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then
+      echo "[entrypoint] live browser view ready (CDP 127.0.0.1:9222, VNC 127.0.0.1:5900, view-only)"
+      return 0
+    fi
+    _i=$((_i + 1))
+    sleep 0.5
+  done
+  echo "[entrypoint] WARN: headed Chromium CDP not ready within ~15s; continuing (runnerd starts; Playwright MCP --cdp-endpoint will retry the connection)" >&2


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a hard timeout to the CDP readiness probe.

Line 356 uses curl -fsS with no timeout. If the local endpoint accepts but stalls, startup can block indefinitely and runnerd never starts, which breaks the intended fail-open behavior.

Suggested fix

- if curl -fsS "http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then + if curl --connect-timeout 1 --max-time 1 -fsS \ + "http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Block on CDP readiness, fail-open. A failed curl must NOT abort the script

# (set -e) — the loop swallows it and the timeout path only WARNs.

_i=0

while [ "$_i" -lt 30 ]; do

if curl -fsS "http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then

echo "[entrypoint] live browser view ready (CDP 127.0.0.1:9222, VNC 127.0.0.1:5900, view-only)"

return 0

fi

_i=$((_i + 1))

sleep 0.5

done

echo "[entrypoint] WARN: headed Chromium CDP not ready within ~15s; continuing (runnerd starts; Playwright MCP --cdp-endpoint will retry the connection)" >&2

# Block on CDP readiness, fail-open. A failed curl must NOT abort the script

# (set -e) — the loop swallows it and the timeout path only WARNs.

_i=0

while [ "$_i" -lt 30 ]; do

if curl --connect-timeout 1 --max-time 1 -fsS \

"http://127.0.0.1:9222/json/version" >/dev/null 2>&1; then

echo "[entrypoint] live browser view ready (CDP 127.0.0.1:9222, VNC 127.0.0.1:5900, view-only)"

return 0

fi

_i=$((_i + 1))

sleep 0.5

done

echo "[entrypoint] WARN: headed Chromium CDP not ready within ~15s; continuing (runnerd starts; Playwright MCP --cdp-endpoint will retry the connection)" >&2

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/sandbox-runtime/entrypoint.sh` around lines 352 - 363, The curl command in the readiness probe loop that checks the CDP endpoint at http://127.0.0.1:9222/json/version lacks a timeout parameter. If the endpoint accepts the connection but stalls without responding, the script can block indefinitely at this curl call. Add a timeout flag (such as --max-time or --connect-timeout) to the curl invocation to ensure the probe fails fast and continues to the next iteration or timeout path, maintaining the intended fail-open behavior.

coderabbitai · 2026-06-16T04:27:42Z

+// container itself uses a `--tmpfs /user` for the workspace, so there is
 // no per-call workspace volume to manage.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stale workspace mount comment (--tmpfs /user is no longer accurate).

Line 5 says the runtime workspace is --tmpfs /user, but the runtime now bind-mounts /user via type=bind in services/sandbox/src/docker-args.ts. Update this comment to avoid future debugging confusion.

Suggested patch

-// container itself uses a `--tmpfs /user` for the workspace, so there is +// container itself bind-mounts the per-exec host workspace at `/user`, so there is // no per-call workspace volume to manage.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// container itself uses a `--tmpfs /user` for the workspace, so there is

// no per-call workspace volume to manage.

// container itself bind-mounts the per-exec host workspace at `/user`, so there is

// no per-call workspace volume to manage.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/sandbox/src/volume.ts` around lines 5 - 6, The comment at lines 5-6 in volume.ts incorrectly states that the container workspace uses `--tmpfs /user`, but the runtime has been updated to bind-mount `/user` via `type=bind` (as implemented in docker-args.ts). Update this comment to accurately reflect that the workspace is now bind-mounted rather than using tmpfs, so that developers understand the current mounting strategy and avoid future confusion when reviewing or debugging workspace volume handling.

Selecting a previewable file now fetches and renders its source immediately instead of stopping at the "Select Preview" placeholder; markdown opens on Source (its Source/Preview toggle stays available). Binary / non-previewable files keep the download-only notice, and the toolbar Preview button becomes the load-retry affordance (disabled while loading).

The Bun audit gate (Security workflow) began failing on three HIGH production advisories published after main last passed: - form-data <4.0.6 (GHSA-hmw2-7cc7-3qxx, CRLF injection) - ws <8.21.0 (GHSA-96hv-2xvq-fx4p, memory-exhaustion DoS) - vite <=8.0.15 (GHSA-fx2h-pf6j-xcff, server.fs.deny bypass) All three are transitive (vite is also a direct devDep). Pin the patched versions via the existing root `overrides` block — the `ws` override was itself pinning the vulnerable 8.20.1 — and bump the direct vite devDep in root + platform to match. `bun audit --prod --audit-level=high` is clean.

The read-only live browser view adds Xvfb + x11vnc and the Noto CJK/emoji font set to the sandbox-runtime image so non-Latin pages render glyphs, not tofu. That pushes the CI-measured size to ~2.56 GB, over the old 2450 MB budget (Validate images failed at 2564 MB). Raise it to 2820 MB (measured + ~10% headroom, per the existing budget rationale) and document the new components.

`useVoiceModeEffective` subscribed to `getVoiceModeEffective` via convex/react `useQuery`, which re-throws on a server error. Under load the query can hit Convex's 1s execution limit; the throw then propagates to the page error boundary and replaces the whole chat with "Something went wrong" (observed crashing the arena E2E across all retries). Voice mode is non-essential, so switch to the non-throwing `useConvexQuery` wrapper (errors land in `.error`, not the boundary) and degrade to "voice off" on failure, logging once. The hook's public `VoiceModeState` contract is unchanged.

…pecs All three failed deterministically on shards 1/4 (on main too), rooted in the org being shared across specs / message-persist timing: - automation: the settle-wait only handled "test row present" or "empty"; a sibling spec leaves other automations in the shared org — a third state neither locator matches -> 60s timeout. Settle on the table count footer (covers any non-empty list) or the empty state instead. - projects-depth: the project key is derived from the name's word initials ("E2E Depth <suffix>" -> "ED<x>"), dropping the unique suffix, so create collided on PROJECT_KEY_TAKEN in the shared org. Supply an explicit unique key from the suffix. - chat-depth: getSharedThread snapshots messages to _creationTime <= sharedAt; the spec enabled sharing before the message was persisted downstream, so the shared view was empty. Wait for the turn to complete (message persisted) before sharing.

…sient timeout Under load the self-hosted Convex backend can blow the hard 1s query limit on getThreadMeta — its RLS path makes a cross-component isOrgMember hop to the Better Auth member table, amplified on the local backend. convex/react's useQuery re-throws that into the page error boundary and blanks the whole chat. Fatal for arena mode, whose split view lives in ephemeral state a remount can't restore (so an E2E reload can't recover it). Move the three remaining getThreadMeta consumers (use-message-processing, plan-approval-card, external-agent-mode-toggle) onto the non-throwing useConvexQuery wrapper — the same migration chat-interface already uses — so a transient timeout degrades to a loading state and the reactive query recovers on the next tick. All three already read the result via optional chaining, so undefined-on-error is a no-op. Test mock mirrors the registry into the wrapper's `{ data }` shape.

Saving/deleting a project secret is a Convex ACTION (encrypt-then-upsert) which — unlike a mutation — is not re-sent when the backend drops the WS mid-flight (the CI backend intermittently 1011s under load). The lost action left the dialog open with no success toast, failing the spec. Retry the idempotent action until the durable state holds (dialog closed / row gone) instead of waiting on the ephemeral toast.

The managed Chromium for the live-browser view (SANDBOX_BROWSER_VIEW) could wedge with its CDP HTTP listener still answering while no CDP session could attach: connectOverCDP connected then timed out, the blind restart loop never recycled a hung-but-alive browser, and the agent looped browser_navigate with no recovery. The profile also lived on ephemeral tmpfs, so site logins never survived a restart. entrypoint: move the profile to the persistent /user/.runtime/browser-profile (survives turns, idle-stop/resume, container restart) and supervise Chromium with a self-healing loop — clear the singleton lock + crash-restore state before each (re)launch (preserves logins) and expose a pid/reset control channel. runnerd: a real CDP liveness probe (HTTP + Target.getTargets round-trip, not just /json/version), /browser/{restart,reset,close-pages}, a browser-health field in /healthz, and a per-exec pre-flight self-heal so the agent never attaches to a wedged browser. spawner + platform: proxy the browser controls, close the turn's tabs on Stop (logins preserved), gate the idle reaper on CDP health, and add a public resetThreadBrowser action. agent: browser-recovery guidance so a mid-turn CDP error doesn't loop. UI: a Reset-browser affordance (wipes the profile) in the live-browser pane.

`CdpHealth` is only used locally (the waitHealthy/probe return type) and imported by no other module, so the `export` tripped the Knip gate. Make it a local interface — no behaviour change; re-export if a cross-module consumer appears.

entrypoint.sh runs under `set -e`, but the managed-Chromium supervisor loop must survive its child exiting non-zero: `wait "$_bpid"` returns 137 when runnerd SIGKILLs Chromium to recycle a wedged/reset browser, which under errexit terminated the supervisor subshell — so after the FIRST recycle the browser was never relaunched (CDP stayed dead, pid file went stale). Disable errexit inside the loop (`set +e`); every command there is already best-effort guarded. Found via live container E2E: wedge → recycle now relaunches and CDP recovers, logins preserved.

setProjectSecret (and the guardrails secret_box) require a 32-byte ENCRYPTION_SECRET_HEX on the Convex deployment. Unlike INSTANCE_SECRET / BETTER_AUTH_SECRET it had no insecure-local default, so the E2E backend ran without it and every secret save threw "ENCRYPTION_SECRET_HEX environment variable is not set" — projects-depth's secret step failed deterministically (masked until the create-key collision was fixed and the test reached it). Derive a stable 32-byte key from INSTANCE_SECRET in the dev orchestrator (same pattern as ensureWebdavHmacKey) and push it via ORCHESTRATOR_MANAGED_KEYS so dev + E2E exercise secrets with zero setup; the key is stable across restarts so already-encrypted rows still decrypt. Explicit .env wins; prod sets a real key.

Pick up #1890 (read-only sandbox observability in chat). Resolutions mirror the prior merge: old-location e2e specs (automation/chat-depth/projects-depth) and tests/container-image-test.sh stay deleted (relocated to tests/e2e + rewritten as TS integration tests); bun.lock regenerated from the merged package.json, preserving the vite 8.0.16 / ws 8.21.0 / form-data 4.0.6 security overrides.

larryro added 13 commits June 15, 2026 23:07

github-code-quality Bot found potential problems Jun 16, 2026

View reviewed changes

coderabbitai Bot requested changes Jun 16, 2026

View reviewed changes

larryro added 11 commits June 16, 2026 12:41

larryro merged commit 55bc8f7 into main Jun 16, 2026
40 checks passed

larryro deleted the feat/sandbox-chat-observability branch June 16, 2026 07:13

coderabbitai Bot mentioned this pull request Jun 16, 2026

feat(platform): agent-initiated browser human takeover #1895

Merged

7 tasks

-    // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion
-    const body = (await res.json()) as { sessionId?: string };
-    return body.sessionId ?? null;
+    const body: unknown = await res.json();
+    if (body === null || typeof body !== 'object') return null;
+    const sessionId =
+      'sessionId' in body &&
+      typeof body.sessionId === 'string' &&
+      body.sessionId.length > 0
+        ? body.sessionId
+        : null;
+    return sessionId;

		// container itself uses a `--tmpfs /user` for the workspace, so there is
		// no per-call workspace volume to manage.

Conversation

larryro commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's included

Security

Verification

Pre-PR checklist

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 16, 2026

Walkthrough

Estimated code review effort

Possibly related issues

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

larryro commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading