Skip to content

feat(sandbox): docker + exec-free kubernetes dual execution backend#1841

Merged
larryro merged 13 commits into
mainfrom
feat/sandbox-execution-backend
Jun 11, 2026
Merged

feat(sandbox): docker + exec-free kubernetes dual execution backend#1841
larryro merged 13 commits into
mainfrom
feat/sandbox-execution-backend

Conversation

@larryro

@larryro larryro commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

What

Make the sandbox service deployable on both Docker Compose (today) and Kubernetes (Helm), selected by SANDBOX_BACKEND=docker|kubernetes. The Compose path is byte-identical to before; the K8s path is a native Pod-per-exec backend that needs no docker.sock, no host-path workspace, and no --internal egress bridge.

Architecture

A coarse ExecutionBackend.execute(cfg, req, opts) → ExecuteResponse seam:

  • DockerBackend — the Compose path, unchanged. Stages onto a host bind-mount, docker run, harvests back. The orchestration lives in local-workspace-run.ts; existing unit tests are the parity proof.
  • KubernetesBackendexec-free, HTTP-only. Every spawner↔Pod interaction is plain HTTP (createNamespacedPod / readNamespacedPodLog / deleteNamespacedPod) + presigned-URL I/O done inside the Pod.

Why exec-free

Deep local e2e on kind found the K8s exec websocket unreliable under Bun (intermittent Expected 101 upgrade failures; ~0% on a stressed cluster). @kubernetes/client-node's Log.log / readNamespacedPodLog use a plain HTTPS fetch (reliable); only Exec uses isomorphic-ws (the failure point). So the transport uses only HTTP K8s ops — no websocket anywhere.

Exec-free Pod (one Pod, shared /workspace emptyDir, 3 containers)

  • stage initContainer (spawner image): downloads inputs from presigned URLs, writes the multi-step wrapper + prior-stage attestation. Completes before the runner — no sentinel. Required-input failure → PRE_STAGE_FAILED.
  • runner (runtime image): command override runs /entrypoint.sh as a child of sh -c (not exec) so the wrapper captures the exit code to a file; stderr → a file (K8s logs stay clean stdout for phase parsing). No credentials, no callbacks. No runtime-image change.
  • harvest sidecar (spawner image): owns the token + upload slots, enforces the user timeout, uploads /workspace/output via presigned slots + EP1/EP2, and prints one __TALE_RESULT__ line the spawner reads back from its logs.

Security

The per-exec Secret (presigned URLs + SANDBOX_TOKEN + caps) is mounted only into stage/harvest, never the runner — regression-tested. The runner has no token and no URLs.

Horizontal scale

The result rides the harvest container's logs, read by the owning spawner replica — no callback to a Service VIP, no cross-replica affinity. The spawner stays HPA-able; cancel = delete-by-deterministic-name from any replica.

Platform side

Unchanged. The spawner still emits the same SSE phase/stdout/stderr/result events; EP1/EP2 are backend-agnostic.

Verification

  • Docker path: bun run check green; staging/prior-output unit tests unchanged.
  • K8s path — unit: Pod shape + the Secret-never-on-runner invariant, the ExecSpec Secret payload round-trip, and the __TALE_RESULT__ result-line protocol. 156 tests pass, typecheck/lint/format clean.
  • K8s path — on a real cluster (kind, runc): drove the real KubernetesBackend through the actual executeRequest dispatcher.
    • Reliability bar met: 60/60, 50/50, +80 @ concurrency-4 — 100%, deterministic (~2.2–3.3 s each, zero flakes). No exec websocket → no flakiness.
    • basic (output uploaded, sha matched) / failure (exit 7 → RUNTIME_ERROR) / multistep (steps[] wrapper + output harvested).
    • Cancel mid-flight → cancelled, Pod + Secret cleaned up.
    • Runtime security, verified on a live Pod: the runner has no Secret mount / no SANDBOX_TOKEN / no presigned URLs; stage + harvest do.
    • Multi-spawner scale: two independent driver processes, disjoint Pods, no interference (no cross-replica state; cancel = delete-by-deterministic-name).
  • Bug found + fixed by the e2e: the runner log-follow left a detached AbortError on cancel under Bun (client-node 1.4.0's Log.log pipes the body but never invokes the done-callback). Replaced the stream with discrete readNamespacedPodLog polling, so the whole k8s path is request/response HTTP with nothing to abort. Re-validated clean.
  • Prod image build fixed: services/sandbox/bun.lock (the standalone lock the Dockerfile builder uses) was missing @kubernetes/client-node — regenerated so --frozen-lockfile passes.

The exec-free transport is validated on real k8s — the original "don't enable in production until tested" blocker is cleared. What remains before a production cutover is ops/integration, not transport reliability: a smoke against the real platform storage + EP1/EP2 in-cluster (the e2e used an in-cluster mock for those; the code that calls them is backend-agnostic and already production-proven on the docker path), plus the operator-applied gVisor RuntimeClass + egress NetworkPolicy (documented, not yet exercised here).

Deploy contract

services/sandbox/docs/kubernetes.md documents the RBAC (namespaced Role; pods create/get/list/delete, pods/log get, secrets create/update/delete; no pods/exec), the required env (SANDBOX_SPAWNER_IMAGE, NODE_EXTRA_CA_CERTS, …), and the recommended egress NetworkPolicy. Helm manifests are authored separately by the deployment owner.

@larryro

larryro commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Update: validated the backend end-to-end locally against a kind cluster with the real tale-sandbox-runtime image (real Python 3.12 execution, live PHASE markers, separated stderr, harvested json.dump output — 5/5 consistent runs). This closed two of the listed follow-ups and surfaced two real bugs, now fixed in cdaa57e:

  • imagePullPolicy: IfNotPresent on runner + holder — K8s defaults Always for :latest, which re-pulls the image on every execution and ignores a locally-loaded image (kind/air-gapped). IfNotPresent matches the docker path (pull once, reuse).
  • Bun + @kubernetes/client-node abort resilience — the aborted Log.log stream rejected with a detached AbortError under Bun and intermittently crashed the process; now handled at the source via the log done-callback, with pod-status reads + the log-follow connect wrapped in the same retry as the exec ops, plus an unhandledRejection backstop in server.ts.

Remaining follow-ups: log-follow mid-stream reconnect (only the connect is retried today), cacheMode=pvc (needs RWX storage), and the egress NetworkPolicy/RBAC contract doc (Phase 3). Local e2e is reproducible via token-auth + NODE_EXTRA_CA_CERTS against kind.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces a pluggable execution backend abstraction for the sandbox spawner, refactoring it from Docker-only to support both Docker and Kubernetes runtimes. A new ExecutionBackend interface defines the contract between orchestration and runtime-specific concerns: workspace creation, cache management, launch, cancellation, and orphan sweeping. Two concrete implementations wire Docker and Kubernetes into this interface. The Docker backend reuses existing utilities and host-directory workspaces. The Kubernetes backend builds on a new Kubernetes client wrapper, provides exec-based file transport, constructs hardened pod specs, manages PVC-backed caches, and orchestrates pod lifecycle with timeout/cancellation racing. Configuration loading is updated to support backend selection and K8s-specific settings. Server startup, execution orchestration, and cleanup logic are refactored to be backend-agnostic, receiving the backend instance as a parameter and delegating runtime decisions to it.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • tale-project/tale#1727: Implements the original Docker-based sandbox execution pipeline that this PR refactors behind a pluggable backend abstraction, sharing core modules in services/sandbox/src/server.ts, services/sandbox/src/spawn.ts, and services/sandbox/src/cleanup.ts.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.21% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title clearly and concisely summarizes the main change: introducing a dual execution backend for both Docker and Kubernetes in the sandbox service.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering architecture, implementation details, security, verification, and deployment contract with clear sections addressing the objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sandbox-execution-backend

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/sandbox/src/backend/index.ts`:
- Around line 19-29: The exported type barrel (CacheStores, ExecutionBackend,
HealthResult, LaunchSpec, RunningExecution, RunOptions, RunResult, SweepOptions,
Workspace) is flagged as unused by knip; either remove this re-export block from
services/sandbox/src/backend/index.ts and update all call sites to import those
types directly from './types.ts', or add a knip configuration/annotation to mark
these symbols as intentional public API; locate the export block in index.ts
(the listed type names) and perform the chosen change consistently across the
codebase to satisfy the linter.

In `@services/sandbox/src/backend/kubernetes/k8s-cache.ts`:
- Around line 36-44: The empty catch for
client.core.readNamespacedPersistentVolumeClaim hides real errors; update the
catch to inspect the thrown error (e.g., check error.statusCode / error.code /
error.response?.status) and only fall through to create when the error indicates
a 404/NotFound; for any other error log it with console.error (including the
error and context like name/namespace) and rethrow or return a failure so
permission/network/API errors aren’t masked; ensure the change is applied around
the readNamespacedPersistentVolumeClaim call and the subsequent create path so
only true "not found" errors trigger creation.

In `@services/sandbox/src/backend/kubernetes/k8s-client.ts`:
- Around line 201-206: The custom Writable "stdout" ignores backpressure by
calling void untar.stdin.write(chunk); instead of handling the boolean return;
update the write implementation in the Writable returned for stdout to check the
result of untar.stdin.write(chunk) and if it returns false, pause calling cb()
until untar.stdin emits 'drain' (attach a one-time 'drain' listener that invokes
cb), otherwise call cb() immediately; ensure you also handle errors on
untar.stdin to avoid leaking listeners.

In `@services/sandbox/src/backend/kubernetes/k8s-pod-spec.test.ts`:
- Around line 11-35: Replace the explicit type annotation on the test fixture
with the TypeScript satisfies operator so the object literal remains fully
inferred but validated against SpawnerConfig; specifically update the cfg
declaration (symbol: cfg) in the k8s-pod-spec.test.ts test to use "satisfies
SpawnerConfig" instead of ": SpawnerConfig" to get compile-time conformance
without changing the inferred type.

In `@services/sandbox/src/backend/kubernetes/k8s-pod-spec.ts`:
- Around line 37-46: The exported type SandboxPodInput is flagged unused
externally; if it's only used as the parameter type for buildSandboxPod, make it
non-exported by removing the export modifier from SandboxPodInput so it remains
a local type, and update any local references (e.g., the inp: SandboxPodInput
parameter on buildSandboxPod) accordingly; alternatively, if you intend it to be
public, add a file-level Knip suppression or export a runtime value that
references the type to prevent the unused-export warning—pick one approach and
apply it consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6be0f5f5-dbbe-4fef-a693-08021f284d79

📥 Commits

Reviewing files that changed from the base of the PR and between 08ca625 and 88eb30b.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • services/sandbox/package.json
  • services/sandbox/src/backend/docker/docker-backend.ts
  • services/sandbox/src/backend/docker/host-dir-workspace.ts
  • services/sandbox/src/backend/index.ts
  • services/sandbox/src/backend/kubernetes/k8s-backend.ts
  • services/sandbox/src/backend/kubernetes/k8s-cache.ts
  • services/sandbox/src/backend/kubernetes/k8s-client.ts
  • services/sandbox/src/backend/kubernetes/k8s-pod-spec.test.ts
  • services/sandbox/src/backend/kubernetes/k8s-pod-spec.ts
  • services/sandbox/src/backend/kubernetes/k8s-workspace.ts
  • services/sandbox/src/backend/types.ts
  • services/sandbox/src/cleanup.ts
  • services/sandbox/src/config.ts
  • services/sandbox/src/docker-args.test.ts
  • services/sandbox/src/server.ts
  • services/sandbox/src/spawn.ts
  • services/sandbox/src/types.ts

Comment thread services/sandbox/src/backend/index.ts Outdated
Comment on lines +19 to +29
export type {
CacheStores,
ExecutionBackend,
HealthResult,
LaunchSpec,
RunningExecution,
RunOptions,
RunResult,
SweepOptions,
Workspace,
} from './types.ts';

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove or justify the unused type re-export barrel to unblock lint.

This export block is currently failing CI (knip reports unused exported types on Line 20-Line 28). Either remove these re-exports and import directly from ./types.ts at use-sites, or explicitly configure knip to treat them as intentional public API.

As per coding guidelines, “Avoid barrel files in TypeScript — direct imports keep the dependency graph legible and tree-shaking honest.”

🧰 Tools
🪛 GitHub Actions: Lint / 2_Knip.txt

[error] 20-28: knip: Unused exported types detected (9). Unused type exports: CacheStores, ExecutionBackend, LaunchSpec, RunningExecution, RunOptions, RunResult, SweepOptions, Workspace.

🪛 GitHub Actions: Lint / Knip

[error] 20-20: knip: Unused exported type 'CacheStores'.


[error] 21-21: knip: Unused exported type 'ExecutionBackend'.


[error] 23-23: knip: Unused exported type 'LaunchSpec'.


[error] 24-24: knip: Unused exported type 'RunningExecution'.


[error] 25-25: knip: Unused exported type 'RunOptions'.


[error] 26-26: knip: Unused exported type 'RunResult'.


[error] 27-27: knip: Unused exported type 'SweepOptions'.


[error] 28-28: knip: Unused exported type 'Workspace'.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/backend/index.ts` around lines 19 - 29, The exported
type barrel (CacheStores, ExecutionBackend, HealthResult, LaunchSpec,
RunningExecution, RunOptions, RunResult, SweepOptions, Workspace) is flagged as
unused by knip; either remove this re-export block from
services/sandbox/src/backend/index.ts and update all call sites to import those
types directly from './types.ts', or add a knip configuration/annotation to mark
these symbols as intentional public API; locate the export block in index.ts
(the listed type names) and perform the chosen change consistently across the
codebase to satisfy the linter.

Sources: Coding guidelines, Pipeline failures

Comment on lines +36 to +44
try {
await client.core.readNamespacedPersistentVolumeClaim({
name,
namespace: client.namespace,
});
return; // already exists
} catch {
// Not found (or transient) — fall through to create.
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Empty catch block violates guidelines and may hide errors.

The catch block at line 42-44 is empty except for a comment. Per coding guidelines, empty catch blocks should log with console.warn or console.error. Additionally, this catch assumes all errors mean "not found," but errors could include permission failures, network issues, or API errors that should be surfaced rather than silently retried with a create operation.

🛡️ Proposed fix to log non-404 errors
  try {
    await client.core.readNamespacedPersistentVolumeClaim({
      name,
      namespace: client.namespace,
    });
    return; // already exists
- } catch {
-   // Not found (or transient) — fall through to create.
+ } catch (err) {
+   // 404 expected if PVC doesn't exist; log other errors but still attempt create
+   const code = typeof err === 'object' && err !== null && 'code' in err ? err.code : undefined;
+   if (code !== 404) {
+     console.warn(`[sandbox.k8s] PVC read failed (code=${code}), will attempt create:`, err);
+   }
  }

As per coding guidelines, no empty catch blocks in JavaScript/TypeScript — log with console.warn/console.error or re-throw to avoid hiding bugs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/backend/kubernetes/k8s-cache.ts` around lines 36 - 44,
The empty catch for client.core.readNamespacedPersistentVolumeClaim hides real
errors; update the catch to inspect the thrown error (e.g., check
error.statusCode / error.code / error.response?.status) and only fall through to
create when the error indicates a 404/NotFound; for any other error log it with
console.error (including the error and context like name/namespace) and rethrow
or return a failure so permission/network/API errors aren’t masked; ensure the
change is applied around the readNamespacedPersistentVolumeClaim call and the
subsequent create path so only true "not found" errors trigger creation.

Source: Coding guidelines

Comment on lines +201 to +206
const stdout = new Writable({
write(chunk, _enc, cb) {
void untar.stdin.write(chunk);
cb();
},
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff

Respect backpressure in stream write.

Line 203 ignores the boolean return value of untar.stdin.write(chunk) with a void cast. If the untar process is slower than the incoming tar stream, this can cause unbounded memory growth in the stdin buffer. The Writable callback should wait for drain events when write returns false.

♻️ Potential fix using backpressure-aware writes
    const stdout = new Writable({
-     write(chunk, _enc, cb) {
-       void untar.stdin.write(chunk);
-       cb();
+     write(chunk, _enc, cb) {
+       if (!untar.stdin.write(chunk)) {
+         untar.stdin.once('drain', cb);
+       } else {
+         cb();
+       }
      },
    });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/backend/kubernetes/k8s-client.ts` around lines 201 -
206, The custom Writable "stdout" ignores backpressure by calling void
untar.stdin.write(chunk); instead of handling the boolean return; update the
write implementation in the Writable returned for stdout to check the result of
untar.stdin.write(chunk) and if it returns false, pause calling cb() until
untar.stdin emits 'drain' (attach a one-time 'drain' listener that invokes cb),
otherwise call cb() immediately; ensure you also handle errors on untar.stdin to
avoid leaking listeners.

Comment on lines +11 to +35
const cfg: SpawnerConfig = {
backend: 'kubernetes',
port: 8003,
sandboxToken: 'test',
runtimeImage: 'tale-sandbox-runtime:test',
runtime: 'runc',
k8s: {
namespace: 'tale-sandbox',
runtimeClassName: 'gvisor',
holderImage: 'busybox:1.36',
cacheMode: 'none',
},
defaultTimeoutMs: 30_000,
maxTimeoutMs: 300_000,
maxConcurrent: 4,
hostSessionRoot: '/var/lib/tale-sandbox/sessions',
cacheVolumePrefix: { pip: 'pip', npm: 'npm' },
egressNetwork: 'tale-sandbox-net',
egressProxy: 'http://sandbox-egress:3128',
stdoutMaxBytes: 5_242_880,
stderrMaxBytes: 5_242_880,
outputFileMaxBytes: 52_428_800,
outputTotalMaxBytes: 104_857_600,
maxRequestBodyBytes: 262_144,
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Prefer satisfies operator for test fixture.

The test configuration uses a type annotation (cfg: SpawnerConfig), but test fixtures benefit from the satisfies operator which provides compile-time validation while preserving the inferred type. This aligns with the TypeScript guidelines for test data.

♻️ Suggested refactor
-const cfg: SpawnerConfig = {
+const cfg = {
   backend: 'kubernetes',
   port: 8003,
   // ... rest of config
-};
+} satisfies SpawnerConfig;

Based on learnings, test fixtures (including TypeScript test data like this SpawnerConfig object) should prefer the 'satisfies' operator over type annotations. 'Satisfies' provides compile-time validation that the value conforms to the type without altering the inferred type, improving type safety.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const cfg: SpawnerConfig = {
backend: 'kubernetes',
port: 8003,
sandboxToken: 'test',
runtimeImage: 'tale-sandbox-runtime:test',
runtime: 'runc',
k8s: {
namespace: 'tale-sandbox',
runtimeClassName: 'gvisor',
holderImage: 'busybox:1.36',
cacheMode: 'none',
},
defaultTimeoutMs: 30_000,
maxTimeoutMs: 300_000,
maxConcurrent: 4,
hostSessionRoot: '/var/lib/tale-sandbox/sessions',
cacheVolumePrefix: { pip: 'pip', npm: 'npm' },
egressNetwork: 'tale-sandbox-net',
egressProxy: 'http://sandbox-egress:3128',
stdoutMaxBytes: 5_242_880,
stderrMaxBytes: 5_242_880,
outputFileMaxBytes: 52_428_800,
outputTotalMaxBytes: 104_857_600,
maxRequestBodyBytes: 262_144,
};
const cfg = {
backend: 'kubernetes',
port: 8003,
sandboxToken: 'test',
runtimeImage: 'tale-sandbox-runtime:test',
runtime: 'runc',
k8s: {
namespace: 'tale-sandbox',
runtimeClassName: 'gvisor',
holderImage: 'busybox:1.36',
cacheMode: 'none',
},
defaultTimeoutMs: 30_000,
maxTimeoutMs: 300_000,
maxConcurrent: 4,
hostSessionRoot: '/var/lib/tale-sandbox/sessions',
cacheVolumePrefix: { pip: 'pip', npm: 'npm' },
egressNetwork: 'tale-sandbox-net',
egressProxy: 'http://sandbox-egress:3128',
stdoutMaxBytes: 5_242_880,
stderrMaxBytes: 5_242_880,
outputFileMaxBytes: 52_428_800,
outputTotalMaxBytes: 104_857_600,
maxRequestBodyBytes: 262_144,
} satisfies SpawnerConfig;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/backend/kubernetes/k8s-pod-spec.test.ts` around lines 11
- 35, Replace the explicit type annotation on the test fixture with the
TypeScript satisfies operator so the object literal remains fully inferred but
validated against SpawnerConfig; specifically update the cfg declaration
(symbol: cfg) in the k8s-pod-spec.test.ts test to use "satisfies SpawnerConfig"
instead of ": SpawnerConfig" to get compile-time conformance without changing
the inferred type.

Source: Learnings

Comment on lines +37 to +46
export interface SandboxPodInput {
executionId: string;
organizationId: string;
language: Extract<Language, 'python' | 'node' | 'polyglot'>;
/** Absolute path the runtime entrypoint will exec (see docker-args.ts). */
entryPath: string;
startedAtMs: number;
/** Per-org cache PVC names; only consumed when cfg.k8s.cacheMode === 'pvc'. */
cache?: CacheStores;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for SandboxPodInput usage across the codebase
rg -n 'SandboxPodInput' --type=ts -C2

Repository: tale-project/tale

Length of output: 931


Address Knip unused-export for SandboxPodInput

  • SandboxPodInput appears only in services/sandbox/src/backend/kubernetes/k8s-pod-spec.ts (as the inp: SandboxPodInput parameter type for buildSandboxPod); no other file references the SandboxPodInput identifier.
  • Either make SandboxPodInput non-exported if it’s not meant to be an externally consumed public type, or adjust/suppress Knip so exported types used only in type signatures aren’t flagged as unused.
🧰 Tools
🪛 GitHub Actions: Lint / 2_Knip.txt

[error] 37-37: knip: Unused exported type detected. Unused interface export: SandboxPodInput.

🪛 GitHub Actions: Lint / Knip

[error] 37-37: knip: Unused exported interface 'SandboxPodInput'.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/backend/kubernetes/k8s-pod-spec.ts` around lines 37 -
46, The exported type SandboxPodInput is flagged unused externally; if it's only
used as the parameter type for buildSandboxPod, make it non-exported by removing
the export modifier from SandboxPodInput so it remains a local type, and update
any local references (e.g., the inp: SandboxPodInput parameter on
buildSandboxPod) accordingly; alternatively, if you intend it to be public, add
a file-level Knip suppression or export a runtime value that references the type
to prevent the unused-export warning—pick one approach and apply it
consistently.

Source: Pipeline failures

@larryro larryro changed the title feat(sandbox): docker + native kubernetes dual execution backend feat(sandbox): docker + exec-free kubernetes dual execution backend Jun 10, 2026
larryro added 13 commits June 11, 2026 09:05
…re k8s

Introduce an ExecutionBackend seam so the spawner can target both Docker
Compose (now) and Kubernetes (later), selected by SANDBOX_BACKEND (default
'docker'). DockerBackend wraps the UNCHANGED docker-args/spawn-util/volume
modules plus the cleanup helpers; spawn.ts (executeRequest/cancelExecution),
server.ts, and cleanup.ts now route runtime launch, cancel, health, image
warm, orphan sweep, and shutdown through the backend. Staging/harvest still
operate on a local Workspace dir, so the Compose path is behavior-preserving
(the docker-args snapshot, spawn-util, and staging unit tests are unchanged
and green; 123 tests pass).

Two intentional, benign deviations vs the prior code (confirmed by an
adversarial parity review): the no-op 'docker rm' against a never-created
container on pre-launch error paths is elided (a container only exists after
launch, so real removals are unaffected); the periodic-sweep stale threshold
is captured a few ms earlier. Neither changes which containers or session
dirs get reaped. A failed 'docker ps' still short-circuits the host-dir
sweep, matching prior behavior.

Phase 1 of the docker+k8s dual-backend plan; KubernetesBackend lands in Phase 2.
Implements SANDBOX_BACKEND=kubernetes: each execution runs as a Pod via
@kubernetes/client-node. The spawner stays the trusted broker — it stages
inputs into a local dir, tars them INTO the runtime Pod's /workspace via a
holder sidecar, follows the runner's logs, reaps the exit code from Pod
status, tars /workspace/output back OUT, and spawn.ts then runs its existing
harvest/upload/attestation over the local dir. SANDBOX_TOKEN, output byte
caps, and input attestation never enter the Pod.

Transport (exec-tar broker + holder sidecar) and the @kubernetes/client-node
streaming integration under Bun were validated live against a kind cluster
end to end (stage -> run -> harvest; exit code, live PHASE markers, separated
stderr, harvested output all correct). No runtime-image change: the runner's
sentinel-wait + stderr redirect are done via a Pod command override.

New: backend/kubernetes/{k8s-pod-spec(.test),k8s-client,k8s-cache,k8s-workspace,k8s-backend}.ts.
Pod spec mirrors docker-args (resources, securityContext, seccomp, fsGroup,
runtimeClassName for gVisor); 13 pod-spec unit tests. Config: SANDBOX_K8S_*
(namespace, runtimeClass, holderImage), SANDBOX_CACHE (none|pvc). Workspace
finalizeStaging() lifted out of stageWorkspace so the chown is docker-only
(k8s re-owns via tar-in).

Auth note (Bun): Bun's fetch ignores client-cert kubeconfigs, so the backend
uses SA bearer-token auth (in-cluster loadFromCluster, or SANDBOX_K8S_SERVER/
TOKEN/CAFILE for dev) + NODE_EXTRA_CA_CERTS for CA trust. Egress isolation is
operator-applied (NetworkPolicy + egress proxy); init() logs a reminder.

Phase 2 of the docker+k8s dual-backend plan.
…silience)

Found + fixed by validating the backend end-to-end against a kind cluster
with the REAL tale-sandbox-runtime image (real Python 3.12 execution, PHASE
markers, separated stderr, harvested output — 5/5 consistent):

- imagePullPolicy: IfNotPresent on runner + holder. Default 'Always' on a
  :latest tag re-pulls the image on EVERY execution and ignores a locally
  loaded image (kind load / air-gapped); IfNotPresent matches the docker path
  (pull once at boot, reuse).

- Bun + @kubernetes/client-node abort resilience: the aborted Log.log stream
  rejected with a detached AbortError under Bun and intermittently crashed the
  process — now handled at the source via the log done-callback. Pod-status
  reads and the log-follow connect are wrapped in the same retry as the exec
  ops. server.ts gains an unhandledRejection backstop so one execution's stray
  async error can't take down the long-running spawner.
Lift the pure/shared execution helpers out of spawn.ts into a new
exec-common.ts so both DockerBackend and the planned exec-free
Kubernetes in-Pod stage/harvest modes can reuse them. spawn.ts keeps
only the in-flight registry, cancelExecution, and executeRequest.

Behavior is byte-identical: the moved helpers are unchanged and the
existing spawn-staging / spawn-prior-outputs tests (now importing from
exec-common) still pass. Foundation for the exec-free K8s transport.
Replace the Phase-1 fine-grained createWorkspace/ensureCacheStore/launch
methods on ExecutionBackend with a single execute(cfg, req, opts) →
ExecuteResponse. The local-workspace orchestration (stage → run → phase
parse + live tail → harvest → response assembly) moves verbatim out of
spawn.ts into a shared runLocalWorkspaceExecution helper that
DockerBackend.execute delegates to. spawn.ts is now a thin dispatcher:
request validation + the in-flight registry + cancelExecution.

The fine-grained trio is now the DockerBackend-internal LocalWorkspaceRuntime
contract (still satisfied by both backends). The exec-tar KubernetesBackend
delegates to the same helper for now; the exec-free HTTP-only rewrite that
needs the coarse seam lands in the following commits.

Behavior is byte-identical — the existing spawn-staging / spawn-prior-outputs
tests (136 pass) plus typecheck/lint/format are the parity proof.
Replace the Phase-2 exec-tar transport (which proved unreliable under Bun —
intermittent websocket upgrade failures) with an exec-free design that uses
only plain HTTP K8s operations. The spawner does control-plane only;
workspace I/O moves INTO the Pod via trusted helper containers.

Pod shape (one Pod, shared /workspace emptyDir, no PVC required):
- stage initContainer (spawner image, k8s-stage.ts): downloads inputs from
  presigned URLs via the same stageWorkspace the docker path uses; writes the
  prior-stage attestation. Completes before the runner — no sentinel handshake.
- runner (runtime image): command override runs /entrypoint.sh as a child of
  sh -c (not exec) so the wrapper captures the exit code into a file; stderr
  redirected to a file (K8s logs are stdout-only for clean phase parsing).
  No credentials, no callbacks.
- harvest sidecar (spawner image, k8s-harvest.ts): owns the token + upload
  slots, enforces the user timeout, uploads /workspace/output via the same
  harvestOutputDir, and prints one __TALE_RESULT__ line the spawner reads back
  from its logs.

Security: the per-exec Secret (presigned URLs + SANDBOX_TOKEN + caps) is
mounted ONLY into stage/harvest, never the runner (asserted in pod-spec test).

Scale: the result rides the harvest container's logs, read by the owning
replica — no cross-replica callback, no affinity. Cancel = delete-by-name.

- new: exec-spec.ts (Secret payload), k8s-protocol.ts (result-line contract),
  k8s-stage.ts + k8s-harvest.ts (in-Pod entry modes); createStreamScanner
  extracted to exec-common and shared by both backends.
- k8s-client.ts: drop exec/tar helpers, keep makeK8sClient + log read/follow.
- k8s-pod-spec.ts: 3-container shape; SANDBOX_SPAWNER_IMAGE replaces holderImage.
- removed: k8s-workspace.ts (spawner-local staging is gone).

typecheck/lint/format green; 156 unit tests pass. K8s e2e on a healthy cluster
(the 50+ run reliability bar) is still pending — kind image-load is broken for
fresh clusters, so on-cluster validation is deferred.
- Set TMPDIR (not just HOME) on the stage/harvest helper containers so Bun
  has a writable scratch dir under readOnlyRootFilesystem.
- Retry the harvest log read up to 3x on a missing result line to absorb
  kubelet→API log-propagation lag after the harvest container terminates.
Replace the K8s runner log-follow (Log.log streaming) with discrete polling of
readNamespacedPodLog. @kubernetes/client-node 1.4.0's Log.log pipes the
response body internally and never invokes the done-callback, so aborting the
follow on cancel/timeout left a detached AbortError rejection (survived only by
server.ts's unhandledRejection backstop). Polling makes the ENTIRE k8s path use
only discrete request/response HTTP calls — create/read/read-log/delete Pod —
so there is no long-lived fetch to abort under Bun. Live PHASE + stdout deltas
come from periodic readPodLog (limitBytes-capped, suffix-diffed); the canonical
stdout is the final full read. Drops followLogs + the dead done-callback +
the separate waitForHarvestDone (folded into the same poll loop).

Verified on kind: cancel mid-flight is now clean (no detached AbortError);
reliability sweeps stay 100% (50/50, 60/60) with PHASE/stdout intact.
The k8s backend added @kubernetes/client-node to services/sandbox/package.json
but never regenerated services/sandbox/bun.lock — the standalone lock the
Dockerfile builder stage installs with --frozen-lockfile (it COPYs only the
sandbox package + lock, in isolation from the workspace root lock). The stale
lock made the production image build fail with 'lockfile had changes, but
lockfile is frozen'. Regenerated; --frozen-lockfile now passes in isolation.
…intra-file symbols

k8s-stage.ts / k8s-harvest.ts are launched by absolute path inside the
Pod (STAGE_ENTRY / HARVEST_ENTRY), never imported, so knip flagged them
and everything only they import as unused. Register them as knip
entries; drop the export keyword from symbols only used within their
defining file; trim the backend/index.ts re-exports to HealthResult,
the only type consumed via the index (the rest import ./types.ts
directly).
@kubernetes/client-node pushed the sandbox image to 330 MB, over its
300 MB budget. Strip *.d.ts/*.d.mts/*.d.cts/*.map from production
node_modules and remove bun's install cache in the same layer — bun
hardlinks node_modules out of that cache, so deleting only the
node_modules links frees ~2 MiB while the cache keeps the content;
removing both shrinks the flattened image from 320 MiB to 284 MiB
(measured). Budget moves to 320 MB: ~10% headroom over the optimized
size while staying below the un-stripped 330 MB so a silently broken
strip still trips the check.
…findings)

Fixes the 1 critical + 14 major findings (plus high-value minors) from the
two-round multi-agent review of this branch. Highlights:

- sweepOrphans no longer reaps live executions' pods: terminal reaps gate on
  isLive + a finishedAt grace; stale reaps clamp to the worst-case execution
  lifetime (startup + maxTimeout + harvest backstop + slack). Per-exec Secrets
  get a GC path (sweep lists by label; podless orphans reaped by age).
- Every K8s API call carries an AbortSignal timeout (per-call middleware);
  withRetry stops retrying definitive 4xx; transient readPod failures no
  longer convert a healthy run into SPAWNER_UNAVAILABLE.
- New shared exec-response.ts constructors route every terminal response on
  BOTH backends; normative outcome table documented on
  ExecutionBackend.execute and pinned by exec-response.test.ts.
- Runner-dead short-circuit: a runner container that dies without writing the
  exit-code file (cgroup group OOM kill, eviction) is detected via the new
  __TALE_HARVEST_STARTED__ marker and classified promptly (OOM/137 in ~13s on
  kind, was full-timeout TIMEOUT/124); the marker also recovers the real exit
  code when the harvest later crashes.
- Cancel redesign: local cancel is abort-only (execute() finishes its log
  reads before its finally deletes the pod — cancelled stdout is now
  deterministic; docker kills the container via an in-wait abort listener);
  cross-replica cancel reaches backend.cancel() when the id isn't local.
- Duplicate executionId on another replica can no longer tear down the live
  owner (ownership-gated finally + pod-probing secret-409 handling).
- Harvest hardening: per-entry stat/read guards in harvestOutputDir (timeout
  harvest races live writers), fd-bounded stderr read (unbounded stderr was
  OOM-killing the harvest container), in-harvest crash catch + fatal fallback
  result line, exec-spec numeric validation (NaN-deadline guard), deadline-
  based result-line reads, atomic results.json flushes.
- Classification alignment: staging failures return SPAWNER_UNAVAILABLE on
  both backends with the stage log tail in the message (PRE_STAGE_FAILED
  stays action-side per wire.ts); docker daemon errors stop classifying as
  user code; k8s backstop expiry is TIMEOUT/124, not HARVEST_READ_FAILED.
- Workspace emptyDir bounded by SANDBOX_K8S_WORKSPACE_SIZE_LIMIT (4Gi); boot
  failures exit instead of leaving a zombie process; empty catches resolved;
  RBAC contract gains secrets `list` (docs updated).

206 unit tests green (50 new: contract suite, sweep matrix, duplicate-409
stub flows, classify matrix, walk resilience, protocol markers, retry
predicate). Validated live on kind: 13/13 scenarios including OOM
short-circuit latency, cancel stdout determinism, duplicate dispatch with the
owner unharmed, and a mid-flight sweep sparing the live pod; zero leftover
pods/Secrets after all paths.

Deferred (issues #1847-#1851): stage-attestation forgeability, executeMs
semantics, Bun TLS-knob verification, log-rotation windowing, ulimit parity.
@larryro larryro force-pushed the feat/sandbox-execution-backend branch from 99cfe77 to 4349520 Compare June 11, 2026 01:12
@larryro larryro merged commit 2142edf into main Jun 11, 2026
32 checks passed
@larryro larryro deleted the feat/sandbox-execution-backend branch June 11, 2026 01:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant