feat(sandbox): persistent agent sessions for Claude Code#1872
Conversation
| @@ -0,0 +1,180 @@ | |||
| 'use node'; | |||
| @@ -0,0 +1,266 @@ | |||
| 'use node'; | |||
📝 WalkthroughWalkthroughThe PR adds a new Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 24
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@compose.yml`:
- Around line 691-692: The compose entry uses an external image declaration
image: maximhq/bifrost:${BIFROST_VERSION:-1.4.8} but sets pull_policy:
${PULL_POLICY:-build}, which is invalid for external images; update the
pull_policy default to a valid external-image policy (e.g., change
${PULL_POLICY:-build} to ${PULL_POLICY:-if-not-present} or
${PULL_POLICY:-always}) and ensure the pull_policy key in the bifrost service no
longer defaults to "build" or relies on PULL_POLICY being "build".
In `@packages/agent_adapters/src/build-exec.test.ts`:
- Around line 11-16: The test fixture constant `base` is implicitly typed;
change its declaration to validate against `AgentRunSpec` using the TypeScript
`satisfies` operator (i.e., declare the object and append `satisfies
AgentRunSpec`) so the object is checked for conformance while preserving
inference — update the `base` definition in build-exec.test.ts accordingly and
remove any explicit casts or annotations if present.
In `@packages/agent_adapters/src/claude_code/parse.ts`:
- Around line 117-125: Extract the model string once into a local variable
before constructing the events entry (e.g., const modelStr =
str(message?.model)) and use modelStr in both the conditional spread and the
value ({ ...(modelStr ? { model: modelStr } : {}) }) instead of calling
str(message?.model) twice; apply this change in the block that builds the events
object (where events.push is called and num(...) uses usage fields) to preserve
behavior and avoid duplicate calls.
In `@packages/agent_adapters/src/opencode/parse.ts`:
- Around line 67-85: The handler for type === 'tool_use' currently normalizes
events even when toolUseId is empty (toolUseId = str(part?.id) ??
str(part?.callID) ?? ''), which can corrupt correlations; change the logic in
the type === 'tool_use' branch (around maybeStart(ev), toolUseId, and AgentEvent
creation) to detect a missing/empty toolUseId and in that case avoid emitting
normalized 'tool-use' or 'tool-result' events — instead push the raw event (or
skip normalization) and return; if toolUseId is present, continue with the
existing normalization and creation of AgentEvent objects as before.
In `@packages/agent_adapters/src/types.ts`:
- Around line 15-34: OpenCodeAdapter.buildExec currently places
AgentRunSpec.prompt into argv (violating the SessionExecSpec contract that
prompt must be sent via stdin), so change OpenCodeAdapter.buildExec to set the
exec object's stdin to spec.prompt (matching ClaudeCodeAdapter) and remove the
prompt from the args array; update the corresponding test in build-exec.test.ts
to assert stdin contains the prompt instead of it appearing in argv and ensure
any helper/constructors that build the SessionExecSpec still omit prompt from
argv.
In `@services/platform/convex/node_only/sandbox/helpers/session_client.ts`:
- Around line 252-258: The parseData helper currently swallows JSON.parse errors
silently; update the parseData<T>(data: string): T | null function to either log
the parse error at debug/trace level using the module logger (include the raw
data and error.message) or, if silence is intentional, add a clear comment
explaining that malformed SSE payloads are expected and why they should be
ignored; reference the parseData function to implement the change so debugging
information is available when needed.
In `@services/platform/convex/node_only/sandbox/run_agent.ts`:
- Around line 116-143: The AbortController instance named controller is created
but never triggered; either wire it into cancellation or remove it: if you
intend to cancel the sessionExec early (e.g., on args.timeoutMs expiry or an
external cancel), call controller.abort() at the appropriate place (for example
from a timeout handler or by exposing controller to the caller) so sessionExec
receives an abort; otherwise remove the controller and stop passing
controller.signal to sessionExec to avoid unused overhead. Ensure references are
to AbortController, controller, sessionExec, and args.timeoutMs when making the
change.
- Around line 49-54: The return schema currently uses v.string() for the status
field which is too permissive; change the schema in the returns object (the
validator in run_agent.ts that defines status) to
v.union(v.literal('completed'), v.literal('cancelled'), v.literal('failed')) so
the Convex boundary enforces only the three implemented statuses (the code paths
returning 'completed','cancelled','failed' around the existing return logic);
update the status validator accordingly to catch invalid values at validation
time.
- Around line 55-68: The handler currently force-casts args.agentSlug to
AgentSlug for getAgentAdapter which violates the no-"as" rule; add a type guard
function (e.g., isAgentSlug(value): value is AgentSlug) that checks the runtime
string against the allowed AgentSlug union, use that guard to validate
args.agentSlug before calling getAgentAdapter, and handle the invalid case
(throw an error or return a clear Convex-friendly failure) instead of using
"as"; update the handler to call getAgentAdapter(args.agentSlug) only after the
guard passes so the compile-time type is correct.
In `@services/platform/convex/node_only/sandbox/session_credentials.ts`:
- Around line 22-28: Remove the unused ResolvedCredential interface to avoid
dead code: locate the ResolvedCredential type declaration and delete it (it
isn’t referenced anywhere; the resolveSessionCredentials action returns { env,
git } directly), or alternatively change resolveSessionCredentials’s return type
to explicitly use ResolvedCredential if you intend to keep the interface; prefer
removing the interface unless you plan to use it elsewhere or add a comment
explaining its future purpose.
- Around line 92-99: getDecryptedCredentials can throw and currently will abort
the whole operation; wrap the call to getDecryptedCredentials(...) (the block
producing `decrypted` and then calling `pickSecret(decrypted)`) in a try-catch,
log a warning including the grant `slug` and the error (matching the existing
missing-credential warning pattern), and continue the loop when decryption fails
so one corrupted credential won’t break session creation.
In `@services/platform/convex/sandbox/wire.ts`:
- Around line 166-179: Add a compile-time parity guard that asserts the exported
SandboxSessionProfile union matches the cross-service literal definition
(sandboxSessionProfileLiterals) using the project's Equal<> pattern; import the
Equal helper if needed and place the assertion after the
SandboxSessionProfile/type export so TypeScript fails the build if the two
literal sets drift.
In `@services/sandbox-runtime/daemon/src/env-store.ts`:
- Around line 17-23: The constructor(seed?: Record<string,string>) is currently
only applying isDeniedEnvName filtering and bypasses the same caps enforced by
patch(), causing oversized values or too many entries at startup; update the
constructor to enforce the same entry count and value-size limits used by
patch(): iterate seed entries honoring isDeniedEnvName, then for each entry
check the max entries cap (same limit used by patch()), enforce per-value size
limit (same byte/char limit and trimming/rejection behavior), and stop/skip
additional entries once the cap is reached so the in-memory this.store is
populated under the same constraints as patch() to keep behavior consistent with
the patch() logic.
In `@services/sandbox-runtime/daemon/src/exec-manager.ts`:
- Around line 195-200: The empty catch blocks in exec-manager.ts around the
process.kill call (when negative pid is used to signal a process group) and the
other similar block at the 231-235 region must not swallow errors; update those
catches to log the exception with context (include child.pid, signal, and
operation like "kill process group" or "stdio wiring/timeout cancel") using
console.warn or console.error and include the error.stack or message, or rethrow
if the caller must handle it; reference the child.pid/child and signal variables
and ensure the log provides clear context so failures in kill/stdio/timeout
paths are visible.
- Around line 240-255: The stdout/stderr handlers (child.stdout.on and
child.stderr.on) allow a chunk that crosses the byte cap to be emitted in full
because they check bytes >= req.*MaxBytes before adding chunk.byteLength; update
the logic to compute remaining = req.*MaxBytes - currentBytes and if remaining
<= 0 set *Trunc = true and return, otherwise if chunk.byteLength > remaining
emit only the first 'remaining' bytes (mark *Trunc = true after emitting the
truncated portion) and add remaining to currentBytes; use Buffer.slice or
equivalent to emit the truncated chunk via ringEmit and ensure
stdoutBytes/stderrBytes reflect the actual bytes emitted.
- Around line 327-332: Replace the unsafe assertion in emitRingLine by parsing
into unknown, performing a discriminator-based narrowing on the parsed value
(check it's an object, has a string 't' property matching the RunnerdExecEvent
discriminator and the required fields for that event shape), and only call
emit(...) with the now-narrowed RunnerdExecEvent; remove the eslint suppression.
Implement this as a small type-guard or runtime validator for RunnerdExecEvent
(referencing the RunnerdExecEvent type and its 't' discriminator) and use it to
validate the JSON.parse result before emitting, handling invalid data in the
catch/error path instead of asserting.
In `@services/sandbox-runtime/daemon/src/file-ops.ts`:
- Around line 70-80: The code currently fully buffers the response with await
res.arrayBuffer() before checking size which can OOM; update the fetch handling
around the fetch(item.url) response by first checking
res.headers.get('content-length') against FETCH_MAX_BYTES and skipping if too
large, then use res.body?.getReader() to stream-read chunks while maintaining a
running byte count and canceling/aborting the reader when total >
FETCH_MAX_BYTES (pushing skipped with { path: item.path, reason: 'too_large' }),
handle the case where res.body is missing by pushing skipped with { path:
item.path, reason: 'no_body' }, and only construct the final Buffer from
accumulated chunks when the size check passes.
In `@services/sandbox-runtime/daemon/src/main.ts`:
- Around line 239-250: The JSON parsed from readBody is assigned to a typed
variable without runtime validation, risking malformed input to envStore.patch;
update the POST /env handler to treat JSON.parse result as unknown, validate
that the parsed value has optional properties "set" as Record<string,string> and
"unset" as string[] (or sanitize them) before calling envStore.patch, and return
400 on invalid shapes (use the existing sendJson to respond). Reference
readBody, the local variable body, envStore.patch, touch, and sendJson when
locating and updating the handler.
In `@services/sandbox-runtime/daemon/src/protocol.ts`:
- Line 22: The denylist check uses an `as` cast which violates the TS rule
banning `as`; remove the cast and perform a cast-free check by replacing
`(RUNNERD_ENV_DENYLIST as readonly string[]).includes(name)` with a type-safe
alternative such as using `RUNNERD_ENV_DENYLIST.some(v => v === name)` or widen
the declaration of RUNNERD_ENV_DENYLIST so `includes(name)` is already valid;
update the condition in the function that references `name` and
`RUNNERD_ENV_DENYLIST` accordingly.
In `@services/sandbox-runtime/Dockerfile`:
- Around line 81-85: The Dockerfile currently swallows all Playwright install
failures by ending the fallback chain with "|| true", risking an image without
Chromium; update the RUN block that sets PLAYWRIGHT_BROWSERS_PATH and runs
"/opt/agents/bin/mcp-server-playwright --help" and the two "playwright install
chromium" commands to remove the final "|| true" and add a verification step
(e.g., test for the Chromium binary under PLAYWRIGHT_BROWSERS_PATH or run "npx
--prefix /opt/agents playwright --version" / "playwright show-trace" equivalent)
and exit non‑zero if verification fails so the build fails when Chromium is not
installed.
In `@services/sandbox/docs/sessions.md`:
- Around line 44-50: Update the "Tier 2 — managed-entry credentials" paragraph
to remove the claim that git credentials are fetched per-operation with
immediate revocation; instead state the current v1 helper is env-backed and that
per-operation fetching is a planned follow-up, and clarify that revocation
timing depends on the env-backed lifecycle rather than being immediate.
Reference the "Tier 2 — managed-entry credentials" heading and the audited event
name "sandboxCredentialAccess" so reviewers can locate and adjust the sentence
that currently asserts per-operation fetch and immediate revocation.
In `@services/sandbox/src/backend/docker/docker-session-backend.ts`:
- Around line 52-53: Ensure uid/gid are validated instead of silently parsing
profile.user: add strict validation in config.ts when reading SANDBOX_AGENT_USER
(use a /^\d+:\d+$/ match, parse uid/gid, ensure both >= 1 and throw a clear
Error if invalid) and export the validated "user" string and numeric uid/gid;
then update docker-session-backend.ts to consume the validated values (replace
the raw Number(profile.user.split(...)) logic) and add a defensive check before
calling chown (verify Number.isInteger(uid) && Number.isInteger(gid) && uid >= 1
&& gid >= 1) to avoid NaN or root defaults in runtime.
In `@services/sandbox/src/backend/kubernetes/k8s-session-backend.ts`:
- Around line 86-99: The timeout budget is not shared: change the flow after
waitForEndpoint to compute remainingMs = spec.createdAtMs +
this.cfg.session.createHealthTimeoutMs - Date.now(), throw a timeout error (or
immediately destroy and throw) if remainingMs <= 0, and pass remainingMs into
waitForRunnerd instead of this.cfg.session.createHealthTimeoutMs (keep using {
baseUrl: endpoint, token: this.tokenFor(spec.sessionId) } and the same error
handling around destroySession); use the waitForEndpoint and waitForRunnerd
symbols and spec.createdAtMs/cfg.session.createHealthTimeoutMs to locate the
code to update.
In `@services/sandbox/src/session/session-routes.ts`:
- Around line 88-113: The catch in sweepExpired (inside
services/sandbox/src/session/session-routes.ts) swallows runnerdHealth
errors—update the catch to log the error (e.g., console.warn or console.error)
including context like the session id and endpoint returned from registry.list()
and the error object, but keep the current behavior of not re-throwing so
reaping is unchanged; reference the tokenFor, runnerdHealth, registry.list, and
backend.destroySession usages to locate the block to modify.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 361ef54a-e4ea-40a2-a6a2-d5bb8871e6bb
⛔ Files ignored due to path filters (3)
bun.lockis excluded by!**/*.lockservices/platform/convex/_generated/api.d.tsis excluded by!**/_generated/**services/sandbox-runtime/daemon/bun.lockis excluded by!**/*.lock
📒 Files selected for processing (73)
compose.ymlpackage.jsonpackages/agent_adapters/.oxlintrc.jsonpackages/agent_adapters/README.mdpackages/agent_adapters/fixtures/claude_code/issue-to-pr.jsonlpackages/agent_adapters/fixtures/opencode/issue-to-pr.jsonlpackages/agent_adapters/package.jsonpackages/agent_adapters/scripts/record-fixture.mdpackages/agent_adapters/src/build-exec.test.tspackages/agent_adapters/src/claude_code/adapter.tspackages/agent_adapters/src/claude_code/parse.test.tspackages/agent_adapters/src/claude_code/parse.tspackages/agent_adapters/src/events.tspackages/agent_adapters/src/index.tspackages/agent_adapters/src/jsonl.tspackages/agent_adapters/src/opencode/adapter.tspackages/agent_adapters/src/opencode/parse.test.tspackages/agent_adapters/src/opencode/parse.tspackages/agent_adapters/src/registry.tspackages/agent_adapters/src/types.tspackages/agent_adapters/tsconfig.jsonservices/platform/convex/node_only/sandbox/bifrost_admin.tsservices/platform/convex/node_only/sandbox/helpers/session_client.tsservices/platform/convex/node_only/sandbox/run_agent.tsservices/platform/convex/node_only/sandbox/session_credentials.tsservices/platform/convex/sandbox/session_mutations.tsservices/platform/convex/sandbox/sessions_schema.tsservices/platform/convex/sandbox/wire.tsservices/platform/convex/schema.tsservices/sandbox-egress/entrypoint.shservices/sandbox-runtime/Dockerfileservices/sandbox-runtime/daemon/.oxlintrc.jsonservices/sandbox-runtime/daemon/package.jsonservices/sandbox-runtime/daemon/src/env-store.test.tsservices/sandbox-runtime/daemon/src/env-store.tsservices/sandbox-runtime/daemon/src/exec-manager.test.tsservices/sandbox-runtime/daemon/src/exec-manager.tsservices/sandbox-runtime/daemon/src/file-ops.test.tsservices/sandbox-runtime/daemon/src/file-ops.tsservices/sandbox-runtime/daemon/src/main.tsservices/sandbox-runtime/daemon/src/protocol.tsservices/sandbox-runtime/daemon/tsconfig.jsonservices/sandbox-runtime/entrypoint.shservices/sandbox-runtime/managed-settings.jsonservices/sandbox-runtime/tale-git-credentialservices/sandbox/docs/sessions.mdservices/sandbox/src/backend/docker/docker-session-backend.tsservices/sandbox/src/backend/index.tsservices/sandbox/src/backend/kubernetes/exec-spec.test.tsservices/sandbox/src/backend/kubernetes/k8s-backend.test.tsservices/sandbox/src/backend/kubernetes/k8s-pod-spec.test.tsservices/sandbox/src/backend/kubernetes/k8s-session-backend.tsservices/sandbox/src/backend/kubernetes/k8s-session-pod-spec.test.tsservices/sandbox/src/backend/kubernetes/k8s-session-pod-spec.tsservices/sandbox/src/backend/types.tsservices/sandbox/src/config.tsservices/sandbox/src/docker-args.test.tsservices/sandbox/src/http-util.tsservices/sandbox/src/server.tsservices/sandbox/src/session/docker-session-args.test.tsservices/sandbox/src/session/docker-session-args.tsservices/sandbox/src/session/runnerd-client.tsservices/sandbox/src/session/runnerd-protocol.tsservices/sandbox/src/session/session-naming.tsservices/sandbox/src/session/session-registry.tsservices/sandbox/src/session/session-routes.test.tsservices/sandbox/src/session/session-routes.tsservices/sandbox/src/session/session-test-config.tsservices/sandbox/src/session/validate-session.tsservices/sandbox/src/sse.tsservices/sandbox/src/types.tsservices/sandbox/src/wire.tstests/container-sandbox-runtime-test.sh
| image: maximhq/bifrost:${BIFROST_VERSION:-1.4.8} | ||
| pull_policy: ${PULL_POLICY:-build} |
There was a problem hiding this comment.
pull_policy: build is invalid for external images.
The bifrost service uses an external image (maximhq/bifrost) but inherits pull_policy: ${PULL_POLICY:-build}. The build policy is meaningless without a build: block — Docker Compose will fail or behave unexpectedly when PULL_POLICY=build.
For external images, use always or if-not-present:
🐛 Proposed fix
bifrost:
image: maximhq/bifrost:${BIFROST_VERSION:-1.4.8}
- pull_policy: ${PULL_POLICY:-build}
+ pull_policy: always
container_name: tale-bifrost📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| image: maximhq/bifrost:${BIFROST_VERSION:-1.4.8} | |
| pull_policy: ${PULL_POLICY:-build} | |
| image: maximhq/bifrost:${BIFROST_VERSION:-1.4.8} | |
| pull_policy: always |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@compose.yml` around lines 691 - 692, The compose entry uses an external image
declaration image: maximhq/bifrost:${BIFROST_VERSION:-1.4.8} but sets
pull_policy: ${PULL_POLICY:-build}, which is invalid for external images; update
the pull_policy default to a valid external-image policy (e.g., change
${PULL_POLICY:-build} to ${PULL_POLICY:-if-not-present} or
${PULL_POLICY:-always}) and ensure the pull_policy key in the bifrost service no
longer defaults to "build" or relies on PULL_POLICY being "build".
| const base: AgentRunSpec = { | ||
| prompt: 'Fix issue #1 and open a PR', | ||
| model: 'claude-sonnet-4-6', | ||
| gateway: { baseUrl: 'http://bifrost:8080', token: 'sk-bf-test' }, | ||
| workdir: '/workspace/repo', | ||
| }; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Prefer satisfies for test fixture type validation.
The base test fixture uses implicit typing. Per coding guidelines, test fixtures should use the satisfies operator to validate type conformance while preserving inference.
♻️ Proposed fix
-const base: AgentRunSpec = {
+const base = {
prompt: 'Fix issue `#1` and open a PR',
model: 'claude-sonnet-4-6',
gateway: { baseUrl: 'http://bifrost:8080', token: 'sk-bf-test' },
workdir: '/workspace/repo',
-};
+} satisfies AgentRunSpec;Based on learnings: In TypeScript test fixtures, prefer the 'satisfies' operator over 'as' type assertions or explicit annotations for config objects and test data. 'Satisfies' provides compile-time validation that the value conforms to the type without altering the inferred type, improving type safety.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const base: AgentRunSpec = { | |
| prompt: 'Fix issue #1 and open a PR', | |
| model: 'claude-sonnet-4-6', | |
| gateway: { baseUrl: 'http://bifrost:8080', token: 'sk-bf-test' }, | |
| workdir: '/workspace/repo', | |
| }; | |
| const base = { | |
| prompt: 'Fix issue `#1` and open a PR', | |
| model: 'claude-sonnet-4-6', | |
| gateway: { baseUrl: 'http://bifrost:8080', token: 'sk-bf-test' }, | |
| workdir: '/workspace/repo', | |
| } satisfies AgentRunSpec; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/agent_adapters/src/build-exec.test.ts` around lines 11 - 16, The
test fixture constant `base` is implicitly typed; change its declaration to
validate against `AgentRunSpec` using the TypeScript `satisfies` operator (i.e.,
declare the object and append `satisfies AgentRunSpec`) so the object is checked
for conformance while preserving inference — update the `base` definition in
build-exec.test.ts accordingly and remove any explicit casts or annotations if
present.
Source: Learnings
| events.push({ | ||
| type: 'usage', | ||
| ...(str(message?.model) ? { model: str(message?.model) } : {}), | ||
| inputTokens: num(usage.input_tokens), | ||
| outputTokens: num(usage.output_tokens), | ||
| cacheReadTokens: num(usage.cache_read_input_tokens), | ||
| cacheWriteTokens: num(usage.cache_creation_input_tokens), | ||
| }); | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Minor: str(message?.model) called twice.
The model string extraction is performed twice in the same expression. Extract to a variable for clarity and efficiency:
+ const model = str(message?.model);
events.push({
type: 'usage',
- ...(str(message?.model) ? { model: str(message?.model) } : {}),
+ ...(model ? { model } : {}),
inputTokens: num(usage.input_tokens),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| events.push({ | |
| type: 'usage', | |
| ...(str(message?.model) ? { model: str(message?.model) } : {}), | |
| inputTokens: num(usage.input_tokens), | |
| outputTokens: num(usage.output_tokens), | |
| cacheReadTokens: num(usage.cache_read_input_tokens), | |
| cacheWriteTokens: num(usage.cache_creation_input_tokens), | |
| }); | |
| } | |
| const model = str(message?.model); | |
| events.push({ | |
| type: 'usage', | |
| ...(model ? { model } : {}), | |
| inputTokens: num(usage.input_tokens), | |
| outputTokens: num(usage.output_tokens), | |
| cacheReadTokens: num(usage.cache_read_input_tokens), | |
| cacheWriteTokens: num(usage.cache_creation_input_tokens), | |
| }); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/agent_adapters/src/claude_code/parse.ts` around lines 117 - 125,
Extract the model string once into a local variable before constructing the
events entry (e.g., const modelStr = str(message?.model)) and use modelStr in
both the conditional spread and the value ({ ...(modelStr ? { model: modelStr }
: {}) }) instead of calling str(message?.model) twice; apply this change in the
block that builds the events object (where events.push is called and num(...)
uses usage fields) to preserve behavior and avoid duplicate calls.
| if (type === 'tool_use') { | ||
| const events = this.maybeStart(ev); | ||
| const state = obj(part?.state); | ||
| const status = str(state?.status); | ||
| const toolUseId = str(part?.id) ?? str(part?.callID) ?? ''; | ||
| if (status === 'completed' || status === 'error') { | ||
| const out: AgentEvent = { type: 'tool-result', toolUseId }; | ||
| if (state?.output !== undefined) out.output = state.output; | ||
| if (status === 'error') out.isError = true; | ||
| events.push(out); | ||
| } else { | ||
| events.push({ | ||
| type: 'tool-use', | ||
| toolUseId, | ||
| toolName: str(part?.tool) ?? '', | ||
| input: state?.input, | ||
| }); | ||
| } | ||
| return events; |
There was a problem hiding this comment.
Do not emit normalized tool events with an empty toolUseId.
Line 71 falls back to '' when both part.id and part.callID are missing. That can corrupt downstream tool-use/tool-result correlation by merging malformed events under the same key. Treat missing IDs as non-normalizable (e.g., emit raw and skip normalized tool events).
Proposed fix
- const toolUseId = str(part?.id) ?? str(part?.callID) ?? '';
+ const toolUseId = str(part?.id) ?? str(part?.callID);
+ if (!toolUseId) {
+ events.push({ type: 'raw', agent: 'opencode', payload: ev });
+ return events;
+ }
if (status === 'completed' || status === 'error') {
const out: AgentEvent = { type: 'tool-result', toolUseId };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/agent_adapters/src/opencode/parse.ts` around lines 67 - 85, The
handler for type === 'tool_use' currently normalizes events even when toolUseId
is empty (toolUseId = str(part?.id) ?? str(part?.callID) ?? ''), which can
corrupt correlations; change the logic in the type === 'tool_use' branch (around
maybeStart(ev), toolUseId, and AgentEvent creation) to detect a missing/empty
toolUseId and in that case avoid emitting normalized 'tool-use' or 'tool-result'
events — instead push the raw event (or skip normalization) and return; if
toolUseId is present, continue with the existing normalization and creation of
AgentEvent objects as before.
| export interface AgentRunSpec { | ||
| prompt: string; | ||
| /** Gateway model id (e.g. an org-allowlisted model). */ | ||
| model?: string; | ||
| /** Resume handle captured from a prior run's `run-started`/`result` | ||
| * (Claude session_id / OpenCode sessionID). Continues the same agent | ||
| * conversation in the same workspace. */ | ||
| agentSessionId?: string; | ||
| /** Agent loop cap; defaults to 40 (matches the platform agent maxSteps). */ | ||
| maxTurns?: number; | ||
| /** Extra system-prompt text appended to the agent's defaults. */ | ||
| systemPromptAppend?: string; | ||
| gateway: GatewayTarget; | ||
| /** Working directory inside the session (e.g. /workspace/repo). */ | ||
| workdir: string; | ||
| /** Enable the in-container Playwright MCP server. Default true for the | ||
| * agent profile; entry points pass false for headless/no-browser tasks to | ||
| * save the per-turn tool-definition token overhead. */ | ||
| browserMcp?: boolean; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how prompts are passed to agents in both adapters
echo "=== Checking Claude Code adapter ==="
rg -A5 'buildExec.*prompt' packages/agent_adapters/src/claude_code/
echo "=== Checking OpenCode adapter ==="
rg -A5 'buildExec.*prompt|argv\.push.*prompt' packages/agent_adapters/src/opencode/
echo "=== Checking if stdin is used for prompts ==="
rg 'stdin.*prompt|prompt.*stdin' packages/agent_adapters/src/Repository: tale-project/tale
Length of output: 1045
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Locate SessionExecSpec and related fields ==="
rg -n "export interface SessionExecSpec|interface SessionExecSpec|SessionExecSpec" packages/agent_adapters/src/types.ts
echo "--- types.ts (around SessionExecSpec) ---"
sed -n '1,120p' packages/agent_adapters/src/types.ts | nl -ba
echo "=== Inspect OpenCode adapter around argv.push(spec.prompt) ==="
rg -n "argv\.push\(spec\.prompt\)|buildExec|stdin:" packages/agent_adapters/src/opencode/adapter.ts
echo "--- opencode/adapter.ts (relevant section) ---"
sed -n '1,220p' packages/agent_adapters/src/opencode/adapter.ts | nl -ba
echo "=== Inspect build-exec test expectations and implementation ==="
rg -n "prompt rides stdin|stdin\).*base\.prompt|buildExec" packages/agent_adapters/src/build-exec.test.ts
echo "--- build-exec.test.ts (relevant section) ---"
sed -n '1,220p' packages/agent_adapters/src/build-exec.test.ts | nl -ba
rg -n "function buildExec|buildExec\(" packages/agent_adapters/src | head -n 50
echo "=== If build-exec.ts exists, show it ==="
ls -1 packages/agent_adapters/src | rg "build-exec|buildExec"
if [ -f packages/agent_adapters/src/build-exec.ts ]; then
sed -n '1,240p' packages/agent_adapters/src/build-exec.ts | nl -ba
fi
if [ -f packages/agent_adapters/src/build-exec.tsx ]; then
sed -n '1,240p' packages/agent_adapters/src/build-exec.tsx | nl -ba
fi
echo "=== Inspect Claude Code adapter how it sets stdin ==="
rg -n "stdin:\s*spec\.prompt|argv.*spec\.prompt|buildExec" packages/agent_adapters/src/claude_code/adapter.ts
echo "--- claude_code/adapter.ts (relevant section) ---"
sed -n '1,240p' packages/agent_adapters/src/claude_code/adapter.ts | nl -baRepository: tale-project/tale
Length of output: 305
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== List agent_adapters/src ==="
ls -la packages/agent_adapters/src
echo "=== types.ts: show SessionExecSpec + AgentRunSpec contracts ==="
sed -n '1,120p' packages/agent_adapters/src/types.ts | cat -n
echo "=== Show SessionExecSpec usage/buildExec ==="
rg -n "buildExec\\(spec: AgentRunSpec\\)|SessionExecSpec" packages/agent_adapters/src/types.ts packages/agent_adapters/src -S
echo "=== opencode/adapter.ts: show relevant argv/stdin handling ==="
sed -n '1,220p' packages/agent_adapters/src/opencode/adapter.ts | cat -n
echo "=== claude_code/adapter.ts: show relevant argv/stdin handling ==="
sed -n '1,220p' packages/agent_adapters/src/claude_code/adapter.ts | cat -n
echo "=== build-exec.*: show implementation + tests ==="
ls -la packages/agent_adapters/src | rg -n "build-exec"
rg -n "buildExec\\(" packages/agent_adapters/src -S
sed -n '1,220p' packages/agent_adapters/src/build-exec.test.ts | cat -n
echo "=== locate any build-exec implementation file ==="
fd -t f "build-exec*" packages/agent_adapters/src
fd -t f "buildExec*" packages/agent_adapters/srcRepository: tale-project/tale
Length of output: 15791
Avoid passing AgentRunSpec.prompt via argv in OpenCodeAdapter (process-list data leak).
SessionExecSpeccontract explicitly says the prompt is on stdin, never argv (packages/agent_adapters/src/types.ts).OpenCodeAdapter.buildExecviolates this by pushing the prompt as a trailing positional argument (packages/agent_adapters/src/opencode/adapter.ts), andbuild-exec.test.tsasserts this behavior—whereasClaudeCodeAdaptercorrectly usesstdin: spec.prompt.- Change
OpenCodeAdapterto pass the prompt viastdin(and update the corresponding test) so sensitive prompts don’t end up in process arguments.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/agent_adapters/src/types.ts` around lines 15 - 34,
OpenCodeAdapter.buildExec currently places AgentRunSpec.prompt into argv
(violating the SessionExecSpec contract that prompt must be sent via stdin), so
change OpenCodeAdapter.buildExec to set the exec object's stdin to spec.prompt
(matching ClaudeCodeAdapter) and remove the prompt from the args array; update
the corresponding test in build-exec.test.ts to assert stdin contains the prompt
instead of it appearing in argv and ensure any helper/constructors that build
the SessionExecSpec still omit prompt from argv.
| RUN PLAYWRIGHT_BROWSERS_PATH=/opt/ms-playwright \ | ||
| /opt/agents/bin/mcp-server-playwright --help >/dev/null 2>&1 || true \ | ||
| && npx --prefix /opt/agents playwright install chromium 2>/dev/null \ | ||
| || node /opt/agents/lib/node_modules/@playwright/mcp/node_modules/.bin/playwright install chromium 2>/dev/null \ | ||
| || true |
There was a problem hiding this comment.
Silent Playwright install failure may leave image without Chromium.
The fallback chain terminates with || true, so the build succeeds even if all three install attempts fail. Consider adding a verification step or removing the final || true so the build fails if Chromium isn't installed.
💡 Suggested verification
# Chromium for Playwright MCP (system deps already apt-installed above).
RUN PLAYWRIGHT_BROWSERS_PATH=/opt/ms-playwright \
/opt/agents/bin/mcp-server-playwright --help >/dev/null 2>&1 || true \
&& npx --prefix /opt/agents playwright install chromium 2>/dev/null \
|| node /opt/agents/lib/node_modules/@playwright/mcp/node_modules/.bin/playwright install chromium 2>/dev/null \
- || true
+ || true \
+ && test -d /opt/ms-playwright/chromium-* \
+ && echo "Chromium installed successfully"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/sandbox-runtime/Dockerfile` around lines 81 - 85, The Dockerfile
currently swallows all Playwright install failures by ending the fallback chain
with "|| true", risking an image without Chromium; update the RUN block that
sets PLAYWRIGHT_BROWSERS_PATH and runs "/opt/agents/bin/mcp-server-playwright
--help" and the two "playwright install chromium" commands to remove the final
"|| true" and add a verification step (e.g., test for the Chromium binary under
PLAYWRIGHT_BROWSERS_PATH or run "npx --prefix /opt/agents playwright --version"
/ "playwright show-trace" equivalent) and exit non‑zero if verification fails so
the build fails when Chromium is not installed.
| - **Tier 2 — managed-entry credentials** (integration secrets — git tokens, | ||
| DB passwords, third-party API keys — that can't be transparently proxied): | ||
| enter the sandbox, but only through one managed pipeline — explicit | ||
| per-session grant (default empty) → broker fetch (never baked into | ||
| env/image/PodSpec) → audited (`sandboxCredentialAccess`) → revoked on | ||
| destroy (git creds fetched per-operation, so revocation is immediate). | ||
|
|
There was a problem hiding this comment.
Align Tier-2 git revocation docs with current implementation.
Line 49 currently claims per-operation git credential fetch with immediate revocation, but the helper implementation is env-backed in v1 and explicitly calls per-operation fetch a follow-up. Please update this section to avoid overstating current security guarantees.
📝 Proposed doc correction
- env/image/PodSpec) → audited (`sandboxCredentialAccess`) → revoked on
- destroy (git creds fetched per-operation, so revocation is immediate).
+ env/image/PodSpec) → audited (`sandboxCredentialAccess`) → revoked on
+ destroy (v1 git creds are env-backed in-session; per-operation fetch is a
+ documented follow-up).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - **Tier 2 — managed-entry credentials** (integration secrets — git tokens, | |
| DB passwords, third-party API keys — that can't be transparently proxied): | |
| enter the sandbox, but only through one managed pipeline — explicit | |
| per-session grant (default empty) → broker fetch (never baked into | |
| env/image/PodSpec) → audited (`sandboxCredentialAccess`) → revoked on | |
| destroy (git creds fetched per-operation, so revocation is immediate). | |
| - **Tier 2 — managed-entry credentials** (integration secrets — git tokens, | |
| DB passwords, third-party API keys — that can't be transparently proxied): | |
| enter the sandbox, but only through one managed pipeline — explicit | |
| per-session grant (default empty) → broker fetch (never baked into | |
| env/image/PodSpec) → audited (`sandboxCredentialAccess`) → revoked on | |
| destroy (v1 git creds are env-backed in-session; per-operation fetch is a | |
| documented follow-up). |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/sandbox/docs/sessions.md` around lines 44 - 50, Update the "Tier 2 —
managed-entry credentials" paragraph to remove the claim that git credentials
are fetched per-operation with immediate revocation; instead state the current
v1 helper is env-backed and that per-operation fetching is a planned follow-up,
and clarify that revocation timing depends on the env-backed lifecycle rather
than being immediate. Reference the "Tier 2 — managed-entry credentials" heading
and the audited event name "sandboxCredentialAccess" so reviewers can locate and
adjust the sentence that currently asserts per-operation fetch and immediate
revocation.
| const uid = Number(profile.user.split(':')[0] ?? '65534'); | ||
| const gid = Number(profile.user.split(':')[1] ?? '65534'); |
There was a problem hiding this comment.
Validate uid/gid extraction to prevent silent failures.
The uid/gid extraction from profile.user assumes a valid "uid:gid" format but doesn't validate the result. If profile.user is malformed:
"invalid"(no colon) → uid =Number("invalid")=NaN, causing chown to fail":"(empty parts) → uid = gid =Number("")=0(root), a potential security issue
The chown failure is caught and logged (line 60-64) but the session continues, potentially with incorrect permissions.
🛡️ Proposed validation in config.ts
Add validation when loading SANDBOX_AGENT_USER in config.ts (around line 191):
// In config.ts, after line 191:
const agentUser = process.env.SANDBOX_AGENT_USER ?? '10001:10001';
const userMatch = agentUser.match(/^(\d+):(\d+)$/);
if (!userMatch) {
throw new Error(
`SANDBOX_AGENT_USER must be in 'uid:gid' format; got: ${JSON.stringify(agentUser)}`,
);
}
const uid = Number(userMatch[1]);
const gid = Number(userMatch[2]);
if (uid < 1 || gid < 1) {
throw new Error(
`SANDBOX_AGENT_USER uid and gid must be >= 1 (non-root); got: ${agentUser}`,
);
}
// Then use: user: agentUserThis validates at config load time rather than at runtime, providing clearer feedback to operators.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/sandbox/src/backend/docker/docker-session-backend.ts` around lines
52 - 53, Ensure uid/gid are validated instead of silently parsing profile.user:
add strict validation in config.ts when reading SANDBOX_AGENT_USER (use a
/^\d+:\d+$/ match, parse uid/gid, ensure both >= 1 and throw a clear Error if
invalid) and export the validated "user" string and numeric uid/gid; then update
docker-session-backend.ts to consume the validated values (replace the raw
Number(profile.user.split(...)) logic) and add a defensive check before calling
chown (verify Number.isInteger(uid) && Number.isInteger(gid) && uid >= 1 && gid
>= 1) to avoid NaN or root defaults in runtime.
| // Poll runnerd readiness via the Pod IP (which appears once scheduled). | ||
| try { | ||
| const endpoint = await this.waitForEndpoint( | ||
| spec.sessionId, | ||
| spec.createdAtMs + this.cfg.session.createHealthTimeoutMs, | ||
| ); | ||
| await waitForRunnerd( | ||
| { baseUrl: endpoint, token: this.tokenFor(spec.sessionId) }, | ||
| this.cfg.session.createHealthTimeoutMs, | ||
| ); | ||
| } catch (err) { | ||
| await this.destroySession(spec.sessionId); | ||
| throw err; | ||
| } |
There was a problem hiding this comment.
Timeout budget is not shared between waitForEndpoint and waitForRunnerd.
waitForEndpoint consumes part of the timeout budget waiting for Pod IP assignment, but waitForRunnerd receives the full createHealthTimeoutMs rather than the remaining time. If Pod scheduling takes 15s of a 30s budget, runnerd still gets 30s instead of 15s, potentially doubling the effective timeout.
Proposed fix: pass remaining budget to waitForRunnerd
// Poll runnerd readiness via the Pod IP (which appears once scheduled).
try {
+ const deadline = spec.createdAtMs + this.cfg.session.createHealthTimeoutMs;
const endpoint = await this.waitForEndpoint(
spec.sessionId,
- spec.createdAtMs + this.cfg.session.createHealthTimeoutMs,
+ deadline,
);
+ const remainingMs = Math.max(0, deadline - Date.now());
await waitForRunnerd(
{ baseUrl: endpoint, token: this.tokenFor(spec.sessionId) },
- this.cfg.session.createHealthTimeoutMs,
+ remainingMs,
);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Poll runnerd readiness via the Pod IP (which appears once scheduled). | |
| try { | |
| const endpoint = await this.waitForEndpoint( | |
| spec.sessionId, | |
| spec.createdAtMs + this.cfg.session.createHealthTimeoutMs, | |
| ); | |
| await waitForRunnerd( | |
| { baseUrl: endpoint, token: this.tokenFor(spec.sessionId) }, | |
| this.cfg.session.createHealthTimeoutMs, | |
| ); | |
| } catch (err) { | |
| await this.destroySession(spec.sessionId); | |
| throw err; | |
| } | |
| // Poll runnerd readiness via the Pod IP (which appears once scheduled). | |
| try { | |
| const deadline = spec.createdAtMs + this.cfg.session.createHealthTimeoutMs; | |
| const endpoint = await this.waitForEndpoint( | |
| spec.sessionId, | |
| deadline, | |
| ); | |
| const remainingMs = Math.max(0, deadline - Date.now()); | |
| await waitForRunnerd( | |
| { baseUrl: endpoint, token: this.tokenFor(spec.sessionId) }, | |
| remainingMs, | |
| ); | |
| } catch (err) { | |
| await this.destroySession(spec.sessionId); | |
| throw err; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/sandbox/src/backend/kubernetes/k8s-session-backend.ts` around lines
86 - 99, The timeout budget is not shared: change the flow after waitForEndpoint
to compute remainingMs = spec.createdAtMs +
this.cfg.session.createHealthTimeoutMs - Date.now(), throw a timeout error (or
immediately destroy and throw) if remainingMs <= 0, and pass remainingMs into
waitForRunnerd instead of this.cfg.session.createHealthTimeoutMs (keep using {
baseUrl: endpoint, token: this.tokenFor(spec.sessionId) } and the same error
handling around destroySession); use the waitForEndpoint and waitForRunnerd
symbols and spec.createdAtMs/cfg.session.createHealthTimeoutMs to locate the
code to update.
| async sweepExpired(nowMs: number = Date.now()): Promise<number> { | ||
| let reaped = 0; | ||
| for (const s of this.registry.list()) { | ||
| let expired = nowMs > s.expiresAtMs; | ||
| if (!expired) { | ||
| try { | ||
| const health = await runnerdHealth({ | ||
| baseUrl: s.endpoint, | ||
| token: this.tokenFor(s.sessionId), | ||
| }); | ||
| expired = nowMs - health.lastActivityAtMs > s.idleTimeoutMs; | ||
| } catch { | ||
| // runnerd unreachable — leave for a later sweep (a transient blip | ||
| // shouldn't reap a session; the TTL is the hard backstop). | ||
| } | ||
| } | ||
| if (expired) { | ||
| await this.backend.destroySession(s.sessionId).catch((err) => { | ||
| console.warn('[sandbox.session] sweep destroy failed:', err); | ||
| }); | ||
| this.registry.delete(s.sessionId); | ||
| reaped += 1; | ||
| } | ||
| } | ||
| return reaped; | ||
| } |
There was a problem hiding this comment.
Log runnerd health check errors in the sweep.
The catch block at lines 99-102 has only a comment but no logging, violating the coding guideline: "No empty catch blocks in JavaScript/TypeScript — log with console.warn/console.error or re-throw to avoid hiding bugs." While the comment explains the business logic for ignoring transient failures, logging the error would still aid debugging without changing the reaping behavior.
🔍 Proposed fix to add logging
try {
const health = await runnerdHealth({
baseUrl: s.endpoint,
token: this.tokenFor(s.sessionId),
});
expired = nowMs - health.lastActivityAtMs > s.idleTimeoutMs;
- } catch {
+ } catch (err) {
+ console.warn('[sandbox.session] sweep health check failed (ignoring):', err);
// runnerd unreachable — leave for a later sweep (a transient blip
// shouldn't reap a session; the TTL is the hard backstop).
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async sweepExpired(nowMs: number = Date.now()): Promise<number> { | |
| let reaped = 0; | |
| for (const s of this.registry.list()) { | |
| let expired = nowMs > s.expiresAtMs; | |
| if (!expired) { | |
| try { | |
| const health = await runnerdHealth({ | |
| baseUrl: s.endpoint, | |
| token: this.tokenFor(s.sessionId), | |
| }); | |
| expired = nowMs - health.lastActivityAtMs > s.idleTimeoutMs; | |
| } catch { | |
| // runnerd unreachable — leave for a later sweep (a transient blip | |
| // shouldn't reap a session; the TTL is the hard backstop). | |
| } | |
| } | |
| if (expired) { | |
| await this.backend.destroySession(s.sessionId).catch((err) => { | |
| console.warn('[sandbox.session] sweep destroy failed:', err); | |
| }); | |
| this.registry.delete(s.sessionId); | |
| reaped += 1; | |
| } | |
| } | |
| return reaped; | |
| } | |
| async sweepExpired(nowMs: number = Date.now()): Promise<number> { | |
| let reaped = 0; | |
| for (const s of this.registry.list()) { | |
| let expired = nowMs > s.expiresAtMs; | |
| if (!expired) { | |
| try { | |
| const health = await runnerdHealth({ | |
| baseUrl: s.endpoint, | |
| token: this.tokenFor(s.sessionId), | |
| }); | |
| expired = nowMs - health.lastActivityAtMs > s.idleTimeoutMs; | |
| } catch (err) { | |
| console.warn('[sandbox.session] sweep health check failed (ignoring):', err); | |
| // runnerd unreachable — leave for a later sweep (a transient blip | |
| // shouldn't reap a session; the TTL is the hard backstop). | |
| } | |
| } | |
| if (expired) { | |
| await this.backend.destroySession(s.sessionId).catch((err) => { | |
| console.warn('[sandbox.session] sweep destroy failed:', err); | |
| }); | |
| this.registry.delete(s.sessionId); | |
| reaped += 1; | |
| } | |
| } | |
| return reaped; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/sandbox/src/session/session-routes.ts` around lines 88 - 113, The
catch in sweepExpired (inside services/sandbox/src/session/session-routes.ts)
swallows runnerdHealth errors—update the catch to log the error (e.g.,
console.warn or console.error) including context like the session id and
endpoint returned from registry.list() and the error object, but keep the
current behavior of not re-throwing so reaping is unchanged; reference the
tokenFor, runnerdHealth, registry.list, and backend.destroySession usages to
locate the block to modify.
Source: Coding guidelines
| @@ -0,0 +1,230 @@ | |||
| 'use node'; | |||
eb69142 to
d58751e
Compare
Entry-agnostic adapters that turn a normalized agent-run request into a sandbox session exec (argv/env/stdin) and parse each coding agent's native stdout stream into one AgentEvent union. Covers Claude Code (stream-json, usage deduped by message id) and OpenCode (run --format json); generalizes to Codex/Gemini via the registry. Zero runtime deps; consumed by the platform, never by the sandbox service. Registers the package + the runnerd daemon as workspaces.
…ay infra) Turn the one-shot sandbox into a persistent 'remote computer' for coding agents. New /v1/sessions API (create/get/list/destroy/exec(SSE)/cancel/attach/ env/files) proxied to runnerd, a small control daemon that is PID 1 of a long-lived session container (single-file runnerd.mjs, run by the image's Node 24). Docker + exec-free Kubernetes session backends; the registry is a cache, the backend objects + runnerd's activity clock are the source of truth (boot re-adopt + TTL/idle sweep). Single runtime image extended (single-image decision): pinned Claude Code, OpenCode, gh, Playwright+Chromium+@playwright/mcp, git/rg/fd, plus an agent user (uid 10001) selected per profile so bypassPermissions is allowed; one-shot path keeps uid 65534 and is byte-identical (docker-args snapshot unchanged). egress allowlist gains github.com/api.github.com; a Bifrost LLM-gateway compose service is the only path from an in-sandbox agent to an LLM. The /v1/execute one-shot path is untouched; SSE + HMAC auth helpers are shared.
Platform-side glue for persistent sandbox sessions. New tables (sandboxSessions with a polymorphic owner driving lifecycle/access/quota, sandboxSessionTokens, sandboxSessionOps progress rows, sandboxCredentialAccess audit) + session mutations (per-owner/per-org reserve, watchdog, token + progress upsert). node_only helpers: session_client (HMAC platform->spawner client), bifrost_admin (governance API: mint/revoke session virtual key, usage pull, provider provision), session_credentials (Tier-2 broker: decrypt explicit integration grants -> session env, audited), run_agent (progress bridge: buildExec -> sessionExec -> @tale/agent-adapters parser -> throttled sandboxSessionOps so any entry point renders live progress). usageLedger stays the single billing truth; Bifrost spend is the hard cutoff + sync source. Node-runtime files carry 'use node'; api.d.ts regenerated.
Adds a new `primaryBehavior: 'external-agent'` that routes a whole chat turn
to a coding agent (Claude Code or OpenCode) running in a sandbox session,
instead of the platform's own tool loop — the thread IS the session, so the
user chats with the agent directly and follow-ups --resume the same session.
Mirrors the image-generation direct-action precedent (start_agent_chat branch
-> scheduled action -> saveMessage back into the thread).
- schema: 'external-agent' primaryBehavior + agentKind ('claude-code' |
'opencode'); tool-loop fields rejected for it (lib/shared/schemas/agents.ts),
threaded through SerializableAgentConfig / AgentJsonConfig / the
runAgentGeneration arg validator.
- run_external_agent: per-thread session reuse (owner = thread), per-turn
Bifrost virtual key (minted + revoked), --resume via captured agent session
id, final reply saved to the thread; live tool-use streams to
sandboxSessionOps via run_agent.
- session_queries + session_teardown; thread-delete cascade destroys the
thread's sessions (cascade_helpers.ts).
- two built-in agents: Claude Code + OpenCode (engine = which agent you pick;
model from the model picker). Distinct names avoid the existing 'Coder'.
Engine is fixed per agent (single agentKind), so no new composer control.
New trilingual page under platform/agents covering the two built-in external agents — how to use them, the locked-down sandbox (egress allowlist, gateway LLM access, scoped GitHub credentials), engine-vs-model selection, and cost. Adds the nav entry + regenerated search-index frontmatter manifest.
- platform/web/docs Dockerfiles: COPY the new packages/agent_adapters and services/sandbox-runtime/daemon manifests (bun install fails on any workspace listed in the root package.json but absent from the context); re-include daemon/package.json past the services/sandbox-runtime/ dockerignore excludes - sandbox-runtime: opencode release asset is .tar.gz (the .zip 404s); groupadd/useradd need absolute /usr/sbin paths (the ENV PATH override drops sbin); bump Claude Code 1.0.44 -> 2.1.173 (1.0.x rejects the adapter's --include-partial-messages and exits before the agent starts) - compose.yml: the bifrost tag needs the v prefix (1.4.8 does not exist on Docker Hub); healthcheck via busybox wget (the image ships no curl, so the service sat permanently unhealthy) - compose.bifrost-dev.yml: loopback publish of :8080 for host bun-dev convex to reach the management API - container-sandbox-runtime-test.sh: mount a tmpfs /workspace mirroring the spawner contract (baked dir is owned by 65534; agent-uid HOME checks and the runnerd boot probe fail on the bare image)
sweepHostSessionDirs reaped ANY dir in the session root past the one-shot watchdog cutoff (~10 min mtime), including persistent ses-* agent-session workspaces — the 5-min periodic sweep deleted a running session's bind-mount source, wedging the container (every new exec then fails spawn). session-naming.ts documented the ses- prefix skip as the contract, but the sweep never implemented it. Session workspace lifecycle stays with destroySession + the session TTL/idle sweep.
Fixes found by a live chat -> Claude Code -> GitHub PR run (produced #1875): - bifrost_admin: rewrite the VK mint body for the actual v1.4.8 governance API (name required, provider_configs is a per-provider array, budget is singular and rejects reset_duration 'never', team_id/customer_id are mutually exclusive FKs — attribution moves into the key name, response is {virtual_key:{id,value}}) - toGatewayModelRef: Bifrost routes provider/model and rejects Tale's colon-qualified refs; upstreams don't understand the @quant qualifier either — translate at the gateway boundary only - run_external_agent: split EXTERNAL_AGENT_GATEWAY_URL (gateway as seen from inside the session container) from BIFROST_URL (management plane as seen from convex) so host bun-dev convex doesn't leak a host-only URL into the sandbox; wire the until-now-uncalled Tier-2 credential broker (github grant per turn via sessionEnvPatch) so the agent can push branches and open PRs - run_agent: expose runAgentInSessionImpl for direct same-process calls — a ctx.runAction hop is killed at ~300s in self-hosted Convex (the parent's finally then revokes the VK under the still-running agent); forward systemPromptAppend, which previously threw ArgumentValidationError for any agent with instructions - claude-code.json: add openrouter:deepseek/deepseek-v4-flash to supportedModels (verified working through the /anthropic translation path)
…& resume scoping Closes the backend gaps the 2026-06-11 E2E surfaced (verified live on bifrost v1.4.8): - Provider provisioning: rewrite provisionProviders for the real v1.4.8 API (POST create-if-absent; native-provider names only, no base_url override — overriding it 500s the native openrouter route); call it once per session create from the org's own provider files (loadOrgGatewayProviders). - Gateway auth (was unenforced — v1.4.8 ignores the BIFROST_* env vars): applyGatewayConfig pushes client.enforce_auth_on_inference=true (inference now requires a minted VK) and, when BIFROST_ADMIN_PASSWORD is set, auth_config admin Basic-auth over /api/* (management plane no longer anonymous). managementHeaders sends Basic, not the dead bearer. - Model refs: toGatewayModelRef also strips the @quant qualifier the upstream rejects. - Usage attribution: read per-turn spend from the VK budget (v1.4.8 has no /usage endpoint; the agent stream reports 0 tokens through non-Anthropic gateways) with a short bounded poll for the async budget, and write one usageLedger row (agent/model/provider granular). No cron. - Resume scoping: latestAgentSessionId takes sinceStartedAt so a destroyed+recreated session (deterministic id) can't resume a prior session's conversation; teardown also deletes its ops. - gatewayBaseUrl split from management BIFROST_URL; systemPromptAppend now forwarded; runAgentInSessionImpl direct-call avoids the ~300s action-RPC cap.
External-agent (Claude Code / OpenCode) turns showed only a 'Thinking…' spinner until the final reply, even though run_agent streams tool-use/text into sandboxSessionOps every 500ms. Now the chat subscribes to that progress and renders a live timeline.
- New public query sandbox.session_queries_public.getActiveSessionOp({threadId}) (org/thread RLS via canAccessThread) exposes the latest in-flight agent-run op.
- useSessionProgress hook + ExternalAgentLiveTimeline render it in place of the bare ThinkingIndicator while status==='running'; the saved message takes over on completion. Normal chat is untouched (null when there's no sandbox session).
- buildExternalAgentParts maps the op's recentEvents (AgentEvents) to the existing thought-timeline parts shape, reusing ToolStepRow/InlineReasoning — no new row components.
- format-tool-detail learns the Claude Code tools (Bash/Read/Edit/Grep/…) so each row shows the command/path, and ToolStepRow gains an expander revealing the full input + output (clamped) so you see what each Bash actually ran and returned.
External-agent sandboxes were thread-owned (sessionId thr-<threadId>, one per chat) with a per-org cap of 2, so every new chat spun up a fresh sandbox and two chats exhausted the org (the recurring '429 org session cap reached'). Switch to one persistent sandbox per user PER ORG, entered from any of their threads.
- Ownership: run_external_agent owns the session by ('user', '<org>:<user>') with sessionId usr-<userId-prefix>-<fnv(org:user)> — org-scoped so the same user in another org gets an isolated sandbox (verified: org-A workspace is never reachable from org B). Falls back to thread ownership only when no userId is present. sessionIdForUser/userOwnerId live in session_naming.ts.
- Shared workspace: all of a user's threads share one /workspace (a repo cloned / deps installed in one chat are visible in the next — a real personal dev box).
- Per-thread conversation: each thread keeps its OWN Claude Code CLI session. sandboxSessionOps gains a threadId (+ by_threadId index); latestAgentSessionId resumes by threadId (still bounded by the session's createdAt), so thread A and B are independent conversations inside the one container.
- Live-progress query (getActiveSessionOp) now keys off threadId, decoupled from session naming.
- Cascade: thread-delete prunes that thread's ops (deleteOpsForThread) but no longer tears down the shared sandbox; legacy thread-owned sessions are still destroyed for back-compat; the user sandbox idle/TTL-reaps on its own.
- Caps: per-owner stays 1 (one sandbox per user); raise the per-org cap (convex 4->50, spawner default 2->50) so the host-RAM-bound SANDBOX_MAX_SESSIONS + idle reaping are the real limit, not an artificial per-org number.
No data migration: old thr-* sessions stop being reused and idle-reap; ops without threadId are ignored by the threadId-filtered resume. Verified live: two chats as one user share /workspace and reuse ONE sandbox/container with no 429; unit tests cover threadId-scoped resume and org-isolated session ids.
…istory Completed external-agent turns kept only their plain finalText, so the tool-call timeline (which commands ran + their output) was lost once the turn ended — the live view reads a capped (20) rolling op buffer that the UI only renders while running. Now the turn is saved as a proper assistant message whose content carries the full timeline as AI-SDK parts: run_agent accumulates the complete ordered event list (uncapped, alongside the live buffer) and buildAssistantContent maps it to text + tool-call + tool-result parts (paired by toolUseId; outputs clamped to 16 KB so a big clone/diff can't blow the 1 MB doc cap). @convex-dev/agent's listUIMessages/toUIMessages reconstructs the tool-<name> UI parts the existing history renderer already shows — so a reloaded thread shows every Bash/Read with its command + output, expandable, with no new rendering path. Agent text blocks map to text parts (the message body), NOT reasoning, and finalText is appended only if the stream didn't already emit it — fixing the duplicate where the answer showed both in a 'Thinking' block and as the reply. Trivial (tool-less) turns stay a plain string. Verified live: after a full page reload, completed turns show their 3-tool timeline; expanding a step shows the command + its output; the answer renders once.
bun was only in the daemon-build stage (to bundle runnerd); the final runtime image had no bun on PATH, so an in-sandbox coding agent couldn't run bun install/test/scripts or bunx — a real gap for bun-based projects (Tale itself uses bun.lock/bunfig.toml). Copy the bun binary from the already-pulled oven/bun image (Debian/glibc, matches the base) into /usr/local/bin + symlink bunx; add a conformance assertion. Verified: bun 1.3.14 + bunx resolve in the built image alongside Node 24.
npm/pip caches live on shared per-org disk volumes (/cache/npm, /cache/pip) reused across sessions, but bun had no managed cache — with BUN_INSTALL_CACHE_DIR unset it fell to ~/.bun inside the per-user workspace (unmanaged, not shared, grows the workspace). Add a /cache/bun volume alongside pip/npm: bunCacheVolumeName + ensureCacheVolume, the BUN_INSTALL_CACHE_DIR=/cache/bun env, and the mount in the session args. Verified earlier that bun honors HOME/TMPDIR and does NOT touch /tmp during install (the full Tale monorepo, 1928 pkgs, installed cleanly on the 512M tmpfs); this is the consistency/cache-reuse fix, not a correctness one. Verified live: a fresh session container carries BUN_INSTALL_CACHE_DIR=/cache/bun + the mount.
…files The per-org pip/npm/bun cache volume is shared by both session profiles — the one-shot default profile (uid 65534) and the agent session profile (uid 10001). ensureCacheVolume chowned the volume root to 65534:65534, which locked the agent profile out: 'bun install' as uid 10001 failed with 'bun is unable to write files to tempdir: AccessDenied' (bun stages its install tempdir on the cache filesystem), and npm/pip caching silently degraded the same way. Set the volume root to 1777 (sticky world-writable, like /tmp) instead of chowning to a single uid, so every same-org sandbox uid can write its own cache entries. The per-org volume is the isolation boundary (R2.3), so intra-org world-write is acceptable.
The egress proxy no longer ships a 7-host default allowlist. With SANDBOX_EGRESS_ALLOWLIST unset or empty, tinyproxy renders no Filter directive at all and tunnels CONNECT :443 to any public host — tinyproxy 1.11.x treats an empty filter file with FilterDefaultDeny as deny-all and exits on a missing one, so omitting the directive is the only clean open configuration. Setting the variable still renders the same default-deny hostname filter as before; the 7-host registry set lives on as the documented lockdown example in ensure-env and the hardening doc. Unchanged and now load-bearing in open mode: the IP-layer SSRF firewall (IMDS + link-local + RFC1918 REJECT), the --internal sandbox network (the proxy stays the runtime containers' only outbound path), ConnectPort 443, and the Bifrost NO_PROXY path (LLM metering unaffected). The template header must not spell the placeholder in dollar-brace form: envsubst expands it inside comments too, which duplicated the filter block as bare directives. Docs (en/de/fr): default-open story across hardening, overview, container architecture, troubleshooting, compose reference, contributing, run-code policy, and skills — also fixing the long-standing false claim that the egress allowlist is configured in the Governance UI. kubernetes.md now states explicitly that the operator NetworkPolicy is the only egress enforcement on k8s. Verified live: open mode reaches arbitrary HTTPS (200) while IMDS/RFC1918 probes are rejected and non-443 CONNECT gets 403; allowlist mode still 403-filters non-listed hosts.
…to the chat message Stage 1 of decoupling external-agent turns from a single HTTP connection. Create a streaming 'pending' assistant message before the run and patch it on every progress flush (onTimeline) with the timeline-so-far, then finalize to success/failed. The persisted message — not the ephemeral, capped sandboxSessionOps buffer — becomes the durable record, so a cancelled/timed-out/ disconnected turn keeps its full tool history instead of being clobbered by a bare error string. The catch path now appends the failure reason to the partial timeline rather than replacing it. Unifies buildAssistantContent + the onTimeline/result types on the agent component's Infer<vAssistantContent> (the type saveMessage/updateMessage/ listUIMessages speak) so the chain typechecks without casts.
…exec Stage 2. Clicking Stop scheduled cancelExecutionsForThread, which scans only the one-shot sandboxExecutions table — the external-agent path writes sandboxSessionOps, so Stop never reached the in-sandbox agent process (it only died as a side effect of the caller disconnecting at the 660s timeout). Add sessionCancelExec (POST /v1/sessions/:id/exec/:execId/cancel) + a listRunningOpsByThread query + a cancelSessionExecsForThread action, and cascade it from cancelGeneration alongside the one-shot cancel. The cancel makes runnerd emit a terminal 'cancelled' event, so the run's own finalize persists the partial timeline (Stage 1) and marks the message failed — losslessly.
Stage 3. Thread an explicit, env-configurable per-turn timeout (EXTERNAL_AGENT_TURN_TIMEOUT_MS, default 25min) from run_external_agent through runAgentInSessionImpl into the exec body, so the sandbox (runnerd) exec deadline and the caller's SSE fetch share ONE budget instead of the old hardcoded 660s. The fetch gets that budget + a 60s grace so it receives the terminal result rather than aborting first. Bounded by the 30min Convex action ceiling while a turn is one held connection; the cross-action continuation (later stage) lifts it toward the sandbox execMaxTimeoutMs (2h).
…e Anthropic endpoints External-agent (Claude Code) turns route through the Bifrost gateway, which previously skipped any provider not in a hardcoded native allowlist — so a direct custom provider (e.g. DeepSeek) failed to mint a key. Now any provider is provisioned: standard Bifrost names keep native dispatch; everything else becomes a custom upstream. Add a provider/model `apiFormat` field (wire format: openai | anthropic, default openai). `anthropic` provisions the custom provider as base_provider_type anthropic against its `/anthropic` endpoint, which preserves Claude Code's web_search server tool — the OpenAI-translated path dropped it via Bifrost's Responses→Chat fallback. Verified live: the /anthropic gateway call returns server_tool_use + web_search_tool_result (DeepSeek native web search fires). - Per-model gateway routing: each custom model's effective (baseUrl, apiFormat, key) is its own Bifrost upstream (`<slug>__<model>`), via one resolveGatewayRouting shared by the adapter, mint, and provisioner — so model-level overrides finally route on the agent path. - base_provider_type is immutable in Bifrost; ensureProviderConfig deletes + recreates a record when it must change (openai↔anthropic), and now surfaces Bifrost response bodies in errors. - UI: provider edit panel gains an API format selector (default openai), validated against standard-named providers. Chat path is scaffolded only — apiFormat is plumbed with a documented seam for a future createAnthropic; anthropic providers are agent-path-only for now (fall back to the OpenAI-compatible client in chat).
…ages
Follow-ups sent while an external-agent turn is generating ("queue mode")
bypassed the normal send path (enqueueMessage + early return), so they set
no scrollIntentRef and rendered no optimistic bubble. Result: the message
appeared only after a server round-trip via the subscription, and the
viewport never auto-scrolled to show it.
Route the queue send through the same optimistic-message mechanism every
normal send uses: capture the reconciliation baseline, set scrollIntentRef
to 'smooth' immediately before setPendingMessage (no await between) so the
bubble mounts in the same commit and the scroll machine's MutationObserver
snaps it to the viewport top, then roll the bubble back on QUEUE_FULL /
failure. usePendingMessages reconciles on the last-message key (not role),
so the optimistic bubble swaps cleanly for the real message even with a
streaming-assistant baseline mid-generation.
- exec-manager: finalize execs on 'close' with a bounded grace fallback armed on 'exit', plus a settled-guard on the stdout/stderr handlers, so a trailing chunk can't emit after the terminal exit event or mutate the replay ring. - runnerd-client: cancel no longer swallows transport errors (the route's evict->404/502 path was dead code); AbortSignal.timeout on the short RPCs and a baked-in 5s health-probe timeout; cap the NDJSON accumulator at 1MB. - session-routes: roll back the backend session when resolveEndpoint throws on create (was a silent leak); source SessionInfo.state from the registry rather than a hardcoded 'ready'; handleDestroy now deletes-before-await and destroys the backend exactly once. - listSandboxesForOrg: catch UnauthorizedError -> null for non-members, matching the query's documented contract.
getThreadSandboxState exposes a thread's live sandbox lifecycle (creating/active/degraded/stopped + pinned) under the same canAccessThread RLS as the message queries. useThreadSandboxState feeds a SandboxStateIndicator badge in the composer, composed with the live op (useSessionProgress) to show Working / Sleeping / Starting / Recovering / Ready / Always-on. en/de/fr added.
oxlint --type-aware flagged three no-unsafe-type-assertion errors:
- agent_message_parts.extractResultText: narrow each block via `in` + typeof
(map param annotated `unknown` so the chain narrows from object, not any)
instead of `b as { text }`.
- thought-step-types.subAgentSteps: type the `out.steps` assertion as
SubAgentStep[] at the object boundary so Array.isArray narrows it directly,
dropping the any-sourced `steps as SubAgentStep[]`.
…ompact
The composer's sandbox pill was a grey outline chip lost among the other
grey toolbar controls, and in the warm state it read "Ready" — never
conveying that the thread runs in a sandbox VM. Reworked per cross-product
convention (VS Code/Codespaces/Gitpod):
- Identity-first: the chip always reads "Sandbox"; the lifecycle rides on
color + icon + tooltip (state word kept in the accessible name), so it
always conveys "runs in a VM" without widening the label — fixes the
crowded-toolbar truncation ("Sandbox · W…").
- Filled semantic color instead of a ghost outline so it stops blending
into the adjacent pickers; three calm buckets — green = alive
(working/ready), slate = dormant (sleeping / not-yet-provisioned, a
normal recoverable state, not an alarming red/yellow), orange =
transitional (starting/recovering). Only "working" animates.
Adds a filled "slate" Badge variant for the dormant state.
External agents (Claude Code / OpenCode) run in a sandbox VM. Show a muted "Sandbox" marker on those rows in the agent picker so the user can tell which agents are sandboxed before selecting one. Reuses the existing per-row labelBadge slot; the missing-integration hint keeps priority.
The OpenCode external-agent UX isn't ready to put in front of users yet, so remove it from the user-facing surfaces while keeping the runtime support in the image and code for later: - Delete the opencode.json example agent so fresh deployments don't surface it. - Rework the external-agent page (en/de/fr) to document Claude Code as the single built-in external agent, rather than just deleting the word. - Drop OpenCode from the tale-daemon local-CLI list (en/de/fr), leaving Claude Code and Codex; the daemon still drives the opencode CLI if configured. - Regenerate the docs frontmatter manifest. Docs test suite (structural parity, terminology, prose, loanword) passes.
Four checks were red on PR #1872; each is addressed: Bun audit + Trivy fs scan — a newly-published advisory (GHSA-gv7w-rqvm-qjhr, HIGH) flags esbuild <0.28.1. It reaches the tree only transitively (convex pins 0.27.0, tsx ~0.27.0) and is fixable, so per the repo security policy it's a bump, not a suppression. Pin esbuild 0.28.1 via the existing root `overrides` block (same mechanism already used for axios/ws/follow-redirects); this dedupes the two esbuild trees into one. Repo-wide — also unblocks main's next security run. Verified convex's esbuild bundle API still works under 0.28.1. Unit (platform): - session_client.test.ts imported `bun:test`, which the vitest server project can't resolve -> switch to `vitest` (APIs are identical). - start_agent_chat.test.ts: the new settleQueueOnTurnEnd sweep calls `.withIndex(...).collect()`, but the hand-rolled db mock only had `.first`. Add `collect` (-> []) and `delete` to the mock. - i18n orphan: chat.sandbox.ariaLabel was added to en/de/fr, but the pill builds its accessible name from sandbox.label + state, so the key is dead. Remove it from all three locales. Knip: - @tale/agent-adapters is imported by platform convex node modules but was not declared -> add it to services/platform dependencies. - Unused exports in non-mirror impl files (sse, validate-session, session-registry, runnerd-client, file-ops) are only used in-file -> drop the `export`. Delete the dead sandboxSessionExecSseEventLiterals (no consumer, no convex-side mirror counterpart). - Exclude the runnerd-protocol.ts <-> daemon/protocol.ts wire-contract mirrors from the dead-export sweep: each side consumes a different subset of the shared contract, so the members unused on one side are used on the other and must stay in sync. Verified locally: bun audit, knip, format:check, turbo typecheck, turbo lint, full platform suite (72,379), sandbox 241, daemon 42, agent_adapters 24.
Mid-turn stdin steering went silent again for a sub-case the prior quiet-idle work missed: an agent that launches a background workflow and then calls TaskOutput(block=true) to wait is parked inside an in-flight main-level tool. Neither quiet-idle trigger fired — Trigger #1 requires inflightToolUses === 0 (the in-flight TaskOutput keeps it at 1) plus a text last-event, and Trigger #2 requires an inflight Task sub-agent. So the exec was never flagged idle, the queued message took the file/hook path, and no tool/Stop boundary fires during the block — it landed only when the block timed out (minutes), never the prompt ~5s over stdin. Add a third deliverable-idle posture: a background task pending while the only main-level tools in flight are blocking task-reads (TaskOutput), tracked in inflightWaitTools (mirrors inflightSubAgents). Also drop lastMainEventWasText from the background-task trigger — with inflightToolUses === 0 the last main event is necessarily a completed text OR tool-result, so the flag only excluded the tool-result case (the TaskOutput block=false blind spot); it stays for the stdin-confirm path. The three postures move into a pure quietIdleDecision helper so this logic — a delivery blind spot for the third time — gets direct unit coverage. Platform-only (hot-pushed; no image rebuild), but a session container still needs the existing /stdin daemon + stream-json adapter for stdin delivery; old containers fall back to file and stall through the block.
…ed turns External-agent turns falsely failed on long tasks, and the system could not run an agent past ~24h: platform infra was treated as the owner of the agent's life. A draining Convex action that died (hard 600s action-ceiling kill / crash) was declared "abandoned" by the recovery watchdog, which marked the message failed, revoked the VK, and cancelled the exec — killing a still-working agent. Invert ownership: the sandbox exec (runnerd) is the source of truth; the platform is a pure I/O conduit that mirrors the stream and relays input, and NEVER proactively terminates the agent. Turns are now duration-independent (a week, a month) — bounded only by the agent finishing, a user Stop, or a genuine-orphan reap. runnerd (daemon): - Exec deadline is a SLIDING window re-armed on every attach (extendDeadline) — an actively-drained exec runs unbounded; only a genuinely orphaned one (no attach for the window) is reaped. Detach-grace removed (exec survives a dropped platform connection). GET /execs/:id returns running / exited(+exitCode) / gone. spawner: proxy the exec-status route (runnerdExecStatus + handleExecStatus). platform: - Recovery watchdog flipped destructive -> restorative: probe the exec; running -> re-attach/resume (claimRecoveryResume single-claimant gate), never cancel/fail/revoke a live exec; exited/gone -> finalize with the agent's real outcome + a real metadata.error (no more blank "Something went wrong"). - Handoff window 25min -> 480s (under the measured 600s local action ceiling); exec deadline 1h sliding (re-armed each handoff). consumeExecSse idle-read timeout re-attaches a wedged socket without spending the reconnect budget. - Session reaper skips a session with a running turn + refreshes lifetime each seam; MAX_CONTINUATIONS cap removed; budget over-limit never kills (the VK's gateway cap 402s instead). Checkpoint-less resume; bounded cross-seam maps. Verified: tsc + type-aware lint clean; 132 platform + 24 daemon + 5 new unit tests green; live: status endpoint + sliding-deadline reap (4s) + loop-invariant soak (10 handoff cycles, 60/60 ticks lossless, exec survived 61s under an 8s deadline via re-arm-on-attach).
The Base URL cell used an inline <span> with `truncate`, but overflow:hidden and max-width have no effect on inline elements — only white-space:nowrap applied, so long URLs overflowed the fixed-width cell into the neighboring Models column. Add `block` so truncate clips to the cell with an ellipsis, matching the canonical createTextColumn pattern.
A review of the agent-sessions work flagged 14 issues. Verification recalibrated the severities (0 critical, 0 high) but confirmed five with a real kernel; fix them as one pass. C2 (orphaned VK): destroySandbox / teardownThreadSessions marked the session tokens revoked but never deleted the live Bifrost VK. A destroy racing a live turn deletes the op row the per-turn finalize + recovery watchdog key on, so the VK was left a spendable credential. revokeTokensForSession now returns the bifrostKeyIds it revoked, and both 'use node' teardown actions DELETE each via revokeVirtualKey. C4 (PVC leak): the K8s session Secret create sat outside the cleanupFailedCreate envelope, so a Secret failure leaked the just-created (owner-reference-less) workspace PVC. Wrap it in the same try/catch as the Pod create. The backend constructor now takes an injectable client (mirrors KubernetesBackend) for test. C1 (dead watchdog): recoverStuckSessions had no caller, so a leaked session row never aged out of the per-org active-session cap. Wire it into crons.ts (every 5 min); it already exempts stopped rows and rows with a running agent-run op. C5 (double steer): deliverViaStdin built its stdin payload from [...fileRows, ...queuedRows] with no dedup, so a row that flipped queued->delivered between the two non-atomic queue reads was injected twice. Dedup queuedRows against fileRows by messageId. H8 (contradictory result): a process that raced the exec deadline but exited 0 reported status:'completed' alongside errorCode:'TIMEOUT'. Gate the TIMEOUT marker on exitCode !== 0 at both emit sites. Tests: VK-revoke contract + running-op exemption (session_lifecycle), PVC cleanup on Secret failure (new k8s-session-backend.test), clean-exit-vs-deadline (session-routes). tsc + type-aware lint clean; platform + sandbox suites green.
Reasoning now collapses into the single thought header — a chevron reveals all reasoning blocks in order — and the inline timeline shows only action rows (tools, routing). Previously each reasoning segment rendered its own "Thinking" collapsible, so a reasoning-only turn showed two stacked brain icons and a tool turn showed several repeated "Thinking" rows. - ThoughtHeader gains a reserved-width leading slot: a chevron when expandable, else a same-width invisible spacer. The gap-shell ThinkingIndicator is never expandable, so the brain/label keep the same x-offset across the gap->bubble handoff — zero jitter. - MessageThoughtHeader owns reasoning: collapsed-by-default, sticky, reveals all blocks (redacted ones show the neutral note) via the shared ReasoningStepRow + STEP_INDENT. - message-bubble passes reasoningSteps + headerOwnsReasoning; message-segments skips inline reasoning when the header owns it; external-agent-live-timeline mirrors it so the live and saved views match. Also removes the "Still working" stall-silence affordance: drops use-stalled-silence, WORKING_STALL_MS, the `working` ThoughtActivity variant, the lastEventAt threading, and the thoughtProcess.working strings (en/de/fr).
…ning) Adversarially verified all 34 review findings against the code; this lands the one genuine bug plus the cheap, high-value hardening the review surfaced. The remaining findings were false positives or out-of-scope deferrals. - exec-manager: writeStdin refuses on a non-writable held-stdin pipe (new isStdinWritable guard) and the pipe error handler nulls the stream, so a write to a dead-but-still-live child no longer reports ok:true. The platform now falls back to file staging instead of silently dropping a steer message (the real bug). + regression test for the guard. - exec-manager: error listeners on child.stdout/stderr so a pipe error can't crash the multi-session daemon. - main: explicit server.requestTimeout/headersTimeout (request-receipt only; long exec streams unaffected) to bound slow clients. - protocol: drop the dead RUNNERD_DETACH_GRACE_MS export. - docker-session-backend: log the two failed-create cleanup catches instead of swallowing them. - bifrost_admin: scrub upstream response bodies via sanitizeError before they enter errors/logs; log before degrading to null/[] on a non-OK governance/keys read. - session_mutations: org-scope setSessionPinned like its siblings (+ caller + cross-org isolation test). - agent_adapters: log dropped unparseable NDJSON lines in both parsers; correct the stale jsonl doc comment.
A persistent agent session went permanently dark ("Thinking" forever,
queued messages unanswered) once its cumulative stdout crossed the 5MB
SANDBOX_STDOUT_MAX_BYTES cap: runnerd silently dropped every further
stdout frame for the exec's life while the agent kept running and
finishing. Two cumulative accumulators on the output-byte axis caused it
— runnerd's truncation cap and the spawner's stdoutChunks buffer — both
incidentally bounded by the 5MB cap. This is the same monotonic-ceiling
class the unbounded-turn refactor removed on the time axis, missed on the
output-byte axis.
Add an explicit collectOutput flag (default true). The agent run passes
false: its output is consumed live off the SSE (the collected result
buffer is unused for it), so the spawner skips accumulation and tells
runnerd the cap is unlimited (0). One-shot session commands keep
collecting under the 5MB cap.
- daemon: stdoutMaxBytes/stderrMaxBytes <= 0 means unlimited (the 256KB
ring still bounds memory); add a per-exec one-time truncation warn so
the one-shot path's truncation is no longer silent.
- daemon: bound a slow/stalled consumer's response buffer
(RUNNERD_CONSUMER_BUFFER_MAX_BYTES) — it reconnects via /attach and
replays from the ring. Restores the only memory bound the cap gave.
- spawner: collectOutput=false skips stdout/stderr accumulation and
sends stdoutMaxBytes/stderrMaxBytes 0 to runnerd.
- platform: the agent exec sends collectOutput=false.
Verified on rebuilt images: an 8MB stdout exec with cap 0 streams all
8,000,000 bytes untruncated; the 5MB control still truncates.
Verified the CodeRabbit/CodeQL review on PR #1872 against current code (the branch had advanced well past the reviewed commit) and applied the findings that were still valid. compose: - bifrost is the only external-image service, so the repo-wide pull_policy: build default is invalid for it; default to `missing`. packages/agent_adapters: - validate the buildExec test fixture with `satisfies` instead of an annotation; extract the doubled str(message?.model) call. - opencode parser: a tool_use part with no id/callID was normalized under an empty toolUseId, corrupting tool-use/tool-result pairing — forward it as a raw event instead. Regression test added. services/sandbox-runtime/daemon: - env-store seed now honours the same entry/size caps as patch(). - exec output cap was checked before adding the chunk, so a chunk crossing the cap was emitted in full; clip it to the remaining budget so total output never overshoots. Regression test added. - drop the `as RunnerdExecEvent` ring-line cast for a discriminator type guard, and the `as readonly string[]` denylist cast for `.some`. - stream URL fetches with a running byte cap (Content-Length precheck + reader cancel) instead of buffering the whole body before the size check. - shape-validate POST /env bodies before envStore.patch; 400 on malformed input. - log the previously-empty kill / stdin-end catch blocks. services/platform/convex (sandbox): - narrow runAgentInSession's status return validator (and the result type) to the four statuses it actually returns; drop the redundant `as AgentSlug` cast. - credential broker: a decrypt failure on one grant no longer aborts the whole session create — warn and skip, like a missing credential. - add the missing Equal<> parity guard for the session-profile literals. - comment the intentional silence in the SSE parseData catch. services/sandbox (spawner): - validate SANDBOX_AGENT_USER as uid:gid >= 1 at config load and guard the workspace chown, so a malformed value can't land on root. - k8s create: share one health-check deadline between waitForEndpoint and waitForRunnerd so a slow scheduler can't double-spend the budget. - log sweep health-probe failures; correct the Tier-2 doc to describe the env-backed v1 git-credential lifecycle (not per-op revocation). Deliberately not changed: - CodeQL "unknown directive 'use node'": required by Convex for Node runtime modules; removing it would break deployment. - CodeQL "useless conditional !drainActive": a reachable re-check after an await (the drain can resolve mid-await), not dead code. - OpenCode prompt as a positional arg: the `opencode run` CLI has no stdin prompt source, unlike Claude Code; documented in the adapter. - AbortController wiring, dead ResolvedCredential interface, and the Dockerfile Chromium verification were already resolved on the branch.
|
Worked through the CodeRabbit + CodeQL review against current Fixed
Already resolved on-branchAbortController is wired to three cancellation paths; the dead Deliberately not changed
All touched workspaces pass typecheck + lint + tests (agent_adapters 26, sandbox 245, daemon 49, platform sandbox suites green). |
| execId: args.execId, | ||
| }), | ||
| ]); | ||
| if (!drainActive || handoff || eofSent) return; |
…r-handling, tests) Independently verified the 82-finding review against the actual code: the "9 critical + 39 high" reduced to 0 critical / 1 high on reassessment, and 16 were false positives. Fixed the real, impactful items. Security / contract: - exec-manager: register an error listener on close-mode stdin so an async EPIPE can't crash the daemon (the one confirmed high) - docker session backend: route seed env via runnerd POST /env instead of `docker run --env`, so it's no longer exposed through `docker inspect` (honors the SessionSpec.env contract; subsumes the silent env-drop) - start_agent_chat: validate attachment fileIds belong to the org before building presigned URLs (cross-tenant IDOR) - run_external_agent: make the gateway auth posture fail-closed instead of swallowing failures (was fail-open on VK enforcement) - k8s session backend: re-throw non-404 delete errors so the reaper retries; the TTL sweep keeps the registry entry on a stop-throw - compose / CLI: cap_drop ALL on sandbox-egress, add back only NET_ADMIN, DAC_OVERRIDE, CHOWN, SETUID, SETGID (verified live against the image) Races / correctness: - session create: reserve the sessionId synchronously so a concurrent create can't make the loser destroy the winner's container - approvals: FSM guard so only a pending approval can be approved/rejected - run_external_agent: re-read message status before the failover patch so a post-success side-effect failure can't overwrite a sealed segment - generate_response: anchor the cancel check on generationStartTime so a cancel during the front-load window isn't missed (phantom reply) - TTL reaper: probe runnerd liveExecs so a re-adopted busy session past TTL isn't reaped mid-task - agent_message_parts: preserve the Task isError signal; trim before the finalText dedup - session_client: normalize CRLF in the SSE decode (spawner parity) Error-handling (CLAUDE.md): - chat_turn_generate / sandbox cleanup: log instead of swallowing; guarantee process.exit in the shutdown path Perf / robustness: - add a by_sessionId_and_execId index + switch the five point lookups to it (removes an O(n) scan on the 500ms progress flush) - steer/stdin payloads truncate on UTF-8 codepoint boundaries - SANDBOX_TOKEN trimmed on server + both clients (whitespace = unset) - step-rows: aria-controls on the expand toggle Tests: - session_ops_coverage: upsert / finalize / reserve / abandoned-ops / running-by-thread (race-safety paths that had zero coverage) - session-routes: exec-cancel/status/adopt + a beforeEach reset Deferred (structural; an interim would regress or under-fix): listSandboxesForOrg spend counter, external-agent budget reservation, per-thread stream-buffer freeze.
…time size budget
The smoke stack runs with PULL_POLICY=never, which collapses bifrost's
`pull_policy: ${PULL_POLICY:-missing}` to never — so compose aborted with
"No such image: maximhq/bifrost" since the third-party image is neither built
by us nor on GHCR. Pre-pull the pinned tag (resolved from compose.yml) before
the stack starts.
The persistent-agent-session work bundles the full coding stack (Node + bun +
uv + Claude Code + OpenCode + Playwright MCP + Chromium + gh) into the single
sandbox-runtime image, which now lands ~1.87 GB in CI. Raise its size budget
from 900 MB to 2100 MB (measured size + ~10% headroom) to match.
…resolves it run_agent.ts imports @tale/agent-adapters, which the convex bundler compiles inline (the package exports raw .ts, so it can't be an externalPackage). The builder stage copied only packages/ui source, so in the image the node_modules/@tale/agent-adapters symlink dangled and every convex deploy failed with "Could not resolve @tale/agent-adapters" — the platform retried forever and never went healthy, timing out the smoke test at 300s. Copy the package source in the builder stage, mirroring packages/ui.
What
Transforms the sandbox from one-shot code execution into a persistent
"remote computer" that can run coding agents (Claude Code, OpenCode) — the
base service for "give an agent a task in a sandbox" (e.g. GitHub issue →
Claude Code → PR). Entry-point UI (chat tool / workflow node) is intentionally
out of scope per the agreed plan; the service API is entry-agnostic so any
entry reuses it.
How it fits together
/v1/sessionsAPI on the spawner (create/get/list/destroy + exec(SSE)/cancel/attach + env/files), proxied to runnerd — a small control daemon
that is PID 1 of a long-lived session container (single-file
runnerd.mjs,run by the image's Node 24). No
kubectl execanywhere; the K8s backendstays exec-free (Pod-IP HTTP). The registry is a cache; backend
labels/annotations + runnerd's activity clock are the source of truth (boot
re-adopt + TTL/idle sweep).
OpenCode,
gh, Playwright + Chromium +@playwright/mcp, git/rg/fd, plus anagentuser (uid 10001, non-root sobypassPermissionsis allowed) selectedper profile. The one-shot path keeps uid 65534 and is byte-identical
(docker-args snapshot unchanged).
@tale/agent-adapters— entry-agnostic package:buildExec(spec)→session exec (argv/env/stdin), plus a parser turning each agent's native
stdout (Claude
stream-json, OpenCode JSONL) into one normalizedAgentEventunion (usage deduped by message id).
path from an in-sandbox agent to an LLM. Platform mints a session-scoped
virtual key (budget + model allowlist), revoked on destroy. Raw provider keys
never enter the sandbox.
sandboxSessions(polymorphic owner drivinglifecycle/access/quota) + token/progress/credential-audit tables; reserve +
watchdog mutations;
session_client(HMAC platform→spawner),bifrost_admin(governance API),
session_credentials(Tier-2 integration-credential broker→ session env, audited), and
run_agent(progress bridge: exec → adapterparser → throttled
sandboxSessionOpsso any entry renders live progress).usageLedgerstays the single billing truth.Tiered secret model is documented in
services/sandbox/docs/sessions.md:Tier-0 (platform secrets) never enter the sandbox; Tier-1 (LLM keys) proxied;
Tier-2 (integration creds) enter only via explicit per-session grant + broker +
audit + revoke-on-destroy.
Tests
12 — runnerd exec/timeout/cancel/attach + file ops, docker-session-args &
k8s-session-pod-spec snapshots, the session route layer against a fake
runnerd, adapter golden fixtures +
buildExecsnapshots.tsc0 errors;convex codegenbundles clean (new node-runtimemodules carry
'use node').Needs live infra (follow-up, can't run in this env): docker-build image
conformance (
tests/container-sandbox-runtime-test.sh, syntax-checked), kinde2e for the K8s session lifecycle, and a secret-gated agent smoke (real
Bifrost + provider key).
Pre-PR checklist
bun run check. All workspaces this PR touches (sandbox, platform,agent_adapters, runnerd daemon) pass. The only failures are a known
turbo-load
mock.moduleflake in@tale/cli(pull-image.test.ts— passes6/6 in isolation, confirmed) and
@tale/raginterrupted by a local timeout;neither workspace is touched by this PR.
h-[…]— N/A (no UI in this PR).services/platform/messages/{en,de,fr}.json— N/A (no UI messagekeys added).
/docs/{en,de,fr}/— follow-up. This PR adds operator envvars + a compose service but no end-user-reachable surface (entry points
deferred); operator docs for the Bifrost service +
SANDBOX_*/BIFROST_*env land with the entry-point PR.
Summary by CodeRabbit
Release Notes
New Features
Infrastructure Updates