fix(k8s): add per-process ulimit parity with docker backend by larryro · Pull Request #1875 · tale-project/tale

larryro · 2026-06-11T09:09:12Z

Summary

Fixes #1851 — the K8s runner pod now applies the same per-process ulimits as the Docker backend, making resource enforcement behavior consistent across backends.

Docker reference (docker-args.ts:117-135)

| Docker flag | ulimit equivalent | Purpose |
| --- | --- | --- |
| `--pids-limit=128` | `ulimit -u 128` | Max processes per user |
| `--ulimit fsize=104857600` | `ulimit -f 204800` | Max file size (100 MB in 512-byte blocks) |
| `--ulimit cpu=600` | `ulimit -t 600` | CPU time in seconds (soft only) |
| `--ulimit core=0` | `ulimit -c 0` | No core dumps |
| `--ulimit nofile=1024:4096` | `ulimit -n 1024` | Open file descriptors (soft only) |
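
The `fsize` row is the only one that needs a unit conversion: Docker takes the limit in bytes, while the table expresses `ulimit -f` in 512-byte blocks. A minimal sketch of that arithmetic follows; `bytesToUlimitBlocks` and `ULIMIT_BLOCK_BYTES` are hypothetical names used for illustration and are not taken from the repository.

```typescript
// Hypothetical helper: converts a byte count (as passed to Docker's
// `--ulimit fsize=`) into the 512-byte blocks this PR uses for `ulimit -f`.
const ULIMIT_BLOCK_BYTES = 512;

function bytesToUlimitBlocks(bytes: number): number {
  if (!Number.isInteger(bytes) || bytes < 0) {
    throw new RangeError(`expected a non-negative integer, got ${bytes}`);
  }
  // Round down so the shell limit never exceeds the Docker limit.
  return Math.floor(bytes / ULIMIT_BLOCK_BYTES);
}

const dockerFsizeBytes = 100 * 1024 * 1024; // 104857600, i.e. 100 MB
console.log(bytesToUlimitBlocks(dockerFsizeBytes)); // 204800
```

The block size is shell-dependent (bash, for instance, documents `ulimit -f` in 1024-byte increments outside POSIX mode), so the constant is worth checking against the shell that actually runs in the runner image.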

Approach

Added ulimit calls to RUNNER_WRAPPER before the entrypoint invocation. This is the most practical option from the issue's list:

- Zero dependencies: busybox sh supports ulimit, so no image changes or external tooling are needed
- Works under both runc and gVisor: it doesn't depend on a specific RuntimeClass
- Directly mirrors Docker: same limits, same semantics

The pod-level securityContext already prevents ulimit elevation, so the soft limits applied here are authoritative.
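
As a rough illustration of this approach, the wrapper can be thought of as an ordered list of shell lines with the limits placed ahead of the entrypoint. The sketch below is not the repository's `RUNNER_WRAPPER`: `buildWrapper`, `ULIMIT_LINES`, and the `exec /entrypoint.sh "$@"` line are assumptions made for the example (only the `/entrypoint.sh` path appears in the PR's tests).

```typescript
// Illustrative only: the real RUNNER_WRAPPER in k8s-pod-spec.ts may differ.
// One ulimit call per line keeps each limit independent of the others.
const ULIMIT_LINES: readonly string[] = [
  'ulimit -u 128',    // max processes (mirrors --pids-limit=128)
  'ulimit -f 204800', // max file size in 512-byte blocks (100 MB)
  'ulimit -t 600',    // CPU time in seconds
  'ulimit -c 0',      // no core dumps
  'ulimit -n 1024',   // open file descriptors
];

// Assumed entrypoint invocation; only the path is taken from the PR's tests.
const ENTRYPOINT_LINE = 'exec /entrypoint.sh "$@"';

function buildWrapper(preamble: readonly string[] = []): string {
  // Resource limits are inherited by child processes, so they have to be
  // applied in the shell before it hands control to the entrypoint.
  return [...preamble, ...ULIMIT_LINES, ENTRYPOINT_LINE].join('\n');
}

console.log(buildWrapper());
```

Keeping the limits in one array makes the ordering explicit and gives tests a single list to iterate over.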

Tests

All 19 existing k8s-pod-spec.test.ts tests pass. Added ulimit-specific assertions verifying:

- All five ulimit lines are present
- They appear before the entrypoint invocation in the wrapper script

Co-Authored-By: Claude <noreply@anthropic.com>

Summary by CodeRabbit

Bug Fixes
- Improved resource management in the sandbox runner with proper per-process limits (process count, file size, CPU time, and open files)
Tests
- Enhanced test coverage for resource limit validation in the runner environment

The docker runtime enforces --pids-limit=128, --ulimit fsize=100MB, --ulimit cpu=600, --ulimit core=0, and nofile=1024:4096 (docker-args.ts:117-135). The k8s runner pod had no per-process equivalent, so a fork-heavy or single-giant-file workload behaved differently across backends (silent file-skip vs prompt RUNTIME_ERROR).

Fix: add ulimit calls to RUNNER_WRAPPER before the entrypoint invocation. Busybox sh supports ulimit, making this a zero-dependency mechanism that works under both runc and gVisor without any image changes or external tooling.

Map:
docker --pids-limit=128 → ulimit -u 128
docker --ulimit fsize=100MB → ulimit -f 204800 (100 MB in 512-byte blocks)
docker --ulimit cpu=600 → ulimit -t 600
docker --ulimit core=0 → ulimit -c 0
docker --ulimit nofile=1024:4096 → ulimit -n 1024 (soft only; the k8s Pod securityContext already prevents ulimit elevation, and the hard limit is bounded by /etc/security/limits.d in the image)

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai · 2026-06-11T09:13:17Z

Walkthrough

This PR implements per-process resource limits in the Kubernetes runner container to achieve parity with the docker backend's resource constraints. The RUNNER_WRAPPER script now applies ulimit settings for process count, file size, CPU time, core dumps, and open-file limits before invoking the container entrypoint. Corresponding test assertions verify that these limits are correctly configured and ordered.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding ulimit parity between Kubernetes and Docker backends. |
| Description check | ✅ Passed | The description includes summary, Docker reference table, implementation approach, and test details. Pre-merge checklist is present but not filled in. |
| Linked Issues check | ✅ Passed | Code changes fully implement issue `#1851` requirements: all five Docker ulimit equivalents are added to RUNNER_WRAPPER with correct values and assertions verify their presence. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to issue `#1851`: modifications only add ulimit calls to RUNNER_WRAPPER and test assertions, with no unrelated alterations. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/sandbox/src/backend/kubernetes/k8s-pod-spec.test.ts`:
- Around line 146-149: The test currently only asserts that 'ulimit -u' appears
before '/entrypoint.sh' which misses regressions for the second ulimit; update
the assertion on c.command?.[2] to also verify that 'ulimit -n 1024' appears
before '/entrypoint.sh' (and optionally assert 'ulimit -u' index is less than
'ulimit -n 1024' to preserve order), i.e. compute indices for 'ulimit -u',
'ulimit -n 1024', and '/entrypoint.sh' and assert both ulimit indices are <
entrypointIndex (and ulimitUIndex < ulimitNIndex if you want strict ordering).

In `@services/sandbox/src/backend/kubernetes/k8s-pod-spec.ts`:
- Around line 103-104: The RUNNER_WRAPPER currently runs `ulimit -u ...` which
the target /bin/sh rejects causing startup failure; update the RUNNER_WRAPPER to
avoid failing on unsupported ulimit flags by either removing the `-u` limit,
executing the commands under a shell that supports `-u` (e.g., bash) when
available, or guarding the call so unsupported options do not cause non‑zero
exit (e.g., test shell capability or append a fallback like `|| true` to the
`ulimit -u` invocation). Modify the RUNNER_WRAPPER definition (the
constant/variable named RUNNER_WRAPPER in k8s-pod-spec.ts) to implement one of
these safe approaches so the wrapper does not abort startup when `/bin/sh`
rejects `-u`.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c823273b-bd57-4a5e-9040-9d58c3bd2b8f

📥 Commits

Reviewing files that changed from the base of the PR and between d230306 and 9d4b65b.

📒 Files selected for processing (2)

services/sandbox/src/backend/kubernetes/k8s-pod-spec.test.ts
services/sandbox/src/backend/kubernetes/k8s-pod-spec.ts

coderabbitai · 2026-06-11T09:13:20Z

+    // ulimit lines appear BEFORE the entrypoint invocation.
+    expect(c.command?.[2].indexOf('ulimit -u')).toBeLessThan(
+      c.command?.[2].indexOf('/entrypoint.sh'),
+    );


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Order assertion is incomplete for the second ulimit line

Line 147 only verifies ulimit -u appears before /entrypoint.sh. If ulimit -n 1024 moves below entrypoint, this test still passes while parity regresses.

Proposed test hardening

     expect(c.command?.[2].indexOf('ulimit -u')).toBeLessThan(
       c.command?.[2].indexOf('/entrypoint.sh'),
     );
+    expect(c.command?.[2].indexOf('ulimit -n 1024')).toBeLessThan(
+      c.command?.[2].indexOf('/entrypoint.sh'),
+    );
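
One subtlety with `indexOf`-based ordering checks: a missing substring yields `-1`, which is always less than a valid entrypoint index, so an ordering assertion on its own passes even if a ulimit line has been deleted. A minimal sketch of a helper that checks presence and order together is shown here; `findOutOfOrder` is a hypothetical name and the sample script is made up for the example.

```typescript
// Returns the needles that are missing from `script` or that do not
// appear before `marker`. An empty result means the ordering holds.
function findOutOfOrder(
  script: string,
  needles: readonly string[],
  marker: string,
): string[] {
  const markerIndex = script.indexOf(marker);
  if (markerIndex < 0) {
    throw new Error(`marker not found: ${marker}`);
  }
  return needles.filter((needle) => {
    const index = script.indexOf(needle);
    // index < 0 means "missing", which a bare toBeLessThan check would accept.
    return index < 0 || index >= markerIndex;
  });
}

// Made-up wrapper where `ulimit -n` has drifted below the entrypoint
// and `ulimit -c 0` is absent altogether.
const sampleScript = ['ulimit -u 128', '/entrypoint.sh', 'ulimit -n 1024'].join('\n');
const offenders = findOutOfOrder(
  sampleScript,
  ['ulimit -u', 'ulimit -n 1024', 'ulimit -c 0'],
  '/entrypoint.sh',
);
console.log(offenders); // contains 'ulimit -n 1024' and 'ulimit -c 0'
```

In the test file this could back a single assertion such as `expect(findOutOfOrder(...)).toEqual([])`, assuming the Jest-style `expect` the existing tests appear to use.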

coderabbitai · 2026-06-11T09:13:20Z

+  `ulimit -u 128 -f 204800 -t 600 -c 0`,
+  `ulimit -n 1024`,


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verifies shell behavior for the current combined-vs-split ulimit form.
# Expected: combined form fails or behaves unexpectedly; split form succeeds.
if command -v busybox >/dev/null 2>&1; then
  echo "== busybox sh =="
  busybox sh -c 'ulimit -u 128 -f 204800 -t 600 -c 0; echo "combined_exit=$?"' || true
  busybox sh -c 'ulimit -u 128; ulimit -f 204800; ulimit -t 600; ulimit -c 0; ulimit -n 1024; echo "split_exit=$?"'
fi

echo "== system /bin/sh =="
/bin/sh -c 'ulimit -u 128 -f 204800 -t 600 -c 0; echo "combined_exit=$?"' || true
/bin/sh -c 'ulimit -u 128; ulimit -f 204800; ulimit -t 600; ulimit -c 0; ulimit -n 1024; echo "split_exit=$?"'

Repository: tale-project/tale

Length of output: 205

Fix ulimit invocation in RUNNER_WRAPPER to prevent startup failure

In this environment, /bin/sh rejects ulimit -u with Illegal option -u (exit code 2), so the combined form fails and the split form still errors on the -u step—splitting alone won’t resolve it.

Update the wrapper to only set limits supported by the target shell/container (or guard/ignore unsupported ulimit -u), especially if the wrapper uses set -e/fails on non-zero.
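
One of the options raised in this comment, guarding each call so that an unsupported flag does not abort startup, might look like the following. This is a sketch under assumptions: `guardedUlimit` is a hypothetical helper, not code from the PR, and ignoring a rejected limit trades enforcement for availability, which may or may not be acceptable for the sandbox.

```typescript
// Hypothetical helper: emits a shell line that applies a limit but does not
// fail the wrapper if the shell rejects the flag (e.g. "Illegal option -u").
function guardedUlimit(flag: string, value: number): string {
  if (!/^[a-z]$/.test(flag)) {
    throw new Error(`unexpected ulimit flag: ${flag}`);
  }
  // The `|| echo` branch keeps the exit status at zero while still leaving
  // a trace on stderr that the limit was not applied.
  return `ulimit -${flag} ${value} 2>/dev/null || echo "ulimit -${flag} unsupported" >&2`;
}

const guardedLines = [
  guardedUlimit('u', 128),
  guardedUlimit('f', 204800),
  guardedUlimit('t', 600),
  guardedUlimit('c', 0),
  guardedUlimit('n', 1024),
];
console.log(guardedLines.join('\n'));
```

Logging the skipped flag rather than appending a bare `|| true` keeps a parity gap visible in the pod logs instead of hiding it.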


@quant

Fixes found by a live chat -> Claude Code -> GitHub PR run (produced #1875):
- bifrost_admin: rewrite the VK mint body for the actual v1.4.8 governance API (name required; provider_configs is a per-provider array; budget is singular and rejects reset_duration 'never'; team_id/customer_id are mutually exclusive FKs, with attribution moving into the key name; response is {virtual_key:{id,value}})
- toGatewayModelRef: Bifrost routes provider/model and rejects Tale's colon-qualified refs; upstreams don't understand the @quant qualifier either, so translate at the gateway boundary only
- run_external_agent: split EXTERNAL_AGENT_GATEWAY_URL (gateway as seen from inside the session container) from BIFROST_URL (management plane as seen from convex) so host bun-dev convex doesn't leak a host-only URL into the sandbox; wire the until-now-uncalled Tier-2 credential broker (github grant per turn via sessionEnvPatch) so the agent can push branches and open PRs
- run_agent: expose runAgentInSessionImpl for direct same-process calls, since a ctx.runAction hop is killed at ~300s in self-hosted Convex (the parent's finally then revokes the VK under the still-running agent); forward systemPromptAppend, which previously threw ArgumentValidationError for any agent with instructions
- claude-code.json: add openrouter:deepseek/deepseek-v4-flash to supportedModels (verified working through the /anthropic translation path)

coderabbitai Bot requested changes Jun 11, 2026

larryro wants to merge 1 commit into main from fix/issue-1851-k8s-ulimit-parity
