Problem Statement
The openshell-sandbox crate on main is a ~17k-line monolith that bundles three fundamentally different concerns into one binary:
- Network policy enforcement — HTTP CONNECT proxy, OPA evaluation, TLS interception, credential injection, bypass detection (~12k lines across proxy.rs, opa.rs, l7/, identity.rs, etc.)
- Linux process isolation — network namespaces, Landlock, seccomp BPF, privilege dropping, process spawning, SSH server (~4k lines across sandbox/linux/, process.rs, ssh.rs)
- Orchestration — gRPC gateway communication, policy loading, supervisor sessions, container lifecycle (~3k lines in lib.rs, policy.rs, grpc_client.rs)
This monolithic design forces a single deployment topology: all three concerns run in one container. Different environments have fundamentally different isolation requirements:
- A Kubernetes cluster with gVisor or Kata already provides kernel-level sandboxing — the runtime isolation primitives are redundant and add unnecessary privilege requirements.
- A bare-metal deployment needs the full stack.
- A lightweight egress filter (no shell access needed) only needs the proxy.
- An edge deployment might want the proxy on a gateway node and the runtime on worker nodes.
There is no way to use the proxy independently of the runtime, or to deploy them on separate trust boundaries.
Proposed Design
Split openshell-sandbox into three crates with clear, independently deployable boundaries:
openshell-proxy — Network Policy Enforcement (Data Plane)
Standalone HTTP CONNECT proxy deployable as its own container image:
- OPA-based per-request policy evaluation (allow/deny/log)
- TLS interception via in-memory CA for L7 inspection
- Provider credential injection for authorized outbound requests
- Bypass detection and denial aggregation
- L7-aware inference routing (OpenAI, Anthropic, etc.)
- Binary identity binding (TOFU fingerprinting)
- gRPC control API — receives policy, credentials, and routes at runtime
Starts in deny-all mode by default. A controller (gateway, operator, or custom) pushes configuration via the control plane. This makes the proxy usable outside the OpenShell gateway entirely — any system that needs an egress policy filter can use it.
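To make the deny-all default concrete, here is a minimal sketch, in Rust, of how the proxy could hold its pushed configuration. The names (ProxyConfig, ProxyState, push_config) are illustrative placeholders, not the crate's actual API:

```rust
// Minimal sketch of the deny-all default; ProxyConfig, ProxyState, and
// push_config are illustrative names, not the real openshell-proxy API.
use std::sync::{Arc, RwLock};

struct ProxyConfig {
    /// Hosts the controller has explicitly allowed; empty until pushed.
    allowed_hosts: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum PolicyDecision {
    Allow,
    Deny,
}

/// Shared proxy state: starts with no configuration installed and is
/// replaced in place whenever the control plane pushes a new one.
struct ProxyState {
    config: RwLock<Option<ProxyConfig>>,
}

impl ProxyState {
    fn new() -> Arc<Self> {
        Arc::new(Self { config: RwLock::new(None) })
    }

    /// Called by the control plane (in-process call or gRPC handler).
    fn push_config(&self, cfg: ProxyConfig) {
        *self.config.write().unwrap() = Some(cfg);
    }

    /// Per-request evaluation: with no configuration, everything is denied.
    fn evaluate(&self, host: &str) -> PolicyDecision {
        match self.config.read().unwrap().as_ref() {
            None => PolicyDecision::Deny, // deny-all before the first push
            Some(cfg) if cfg.allowed_hosts.iter().any(|h| h.as_str() == host) => {
                PolicyDecision::Allow
            }
            Some(_) => PolicyDecision::Deny,
        }
    }
}

fn main() {
    let state = ProxyState::new();
    assert_eq!(state.evaluate("api.anthropic.com"), PolicyDecision::Deny);
    state.push_config(ProxyConfig {
        allowed_hosts: vec!["api.anthropic.com".into()],
    });
    assert_eq!(state.evaluate("api.anthropic.com"), PolicyDecision::Allow);
}
```

Until the controller's first push, every evaluation falls through to Deny, which is what makes the proxy safe to start before its control plane is reachable.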
openshell-runtime — Process Isolation Primitives
Linux-specific sandboxing enforcement library:
- Network namespace isolation
- Landlock filesystem restrictions (Linux 5.13+)
- Seccomp BPF syscall filtering
- Privilege dropping and capability management
- Process spawning and entrypoint lifecycle
- Embedded SSH server (russh) for shell/exec access
Introduces a CredentialProvider trait to decouple SSH credential injection from the proxy's internal state.
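A minimal sketch of what that trait could look like, assuming a simple lookup-by-username shape; the trait name comes from this proposal, while the method signature and the StaticCredentials example are placeholders:

```rust
// Sketch of the CredentialProvider idea: the trait name is from the proposal,
// but the method shape and the StaticCredentials impl are assumed here.
use std::collections::HashMap;

/// Resolves credentials for an SSH login without the runtime reaching into
/// the proxy's (or orchestrator's) internal state.
pub trait CredentialProvider: Send + Sync {
    /// Returns the secret to verify for `username`, or None if unknown.
    fn credential_for(&self, username: &str) -> Option<String>;
}

/// Example provider backed by a fixed map, e.g. handed over by orchestration.
pub struct StaticCredentials(HashMap<String, String>);

impl CredentialProvider for StaticCredentials {
    fn credential_for(&self, username: &str) -> Option<String> {
        self.0.get(username).cloned()
    }
}

fn main() {
    let provider = StaticCredentials(
        [("agent".to_string(), "s3cret".to_string())].into_iter().collect(),
    );
    // The embedded SSH server would hold a `Box<dyn CredentialProvider>` and
    // call it during auth, regardless of where the credentials come from.
    assert_eq!(provider.credential_for("agent").as_deref(), Some("s3cret"));
}
```

Any host of the runtime can hand the SSH server whatever implementation fits its topology, so the runtime never needs a reference to proxy internals.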
openshell-sandbox — Orchestration (Composition Layer)
Thin glue that composes proxy + runtime for the traditional single-container topology. Depends on both crates:
- gRPC communication with the gateway
- Policy loading and propagation to both proxy and runtime
- Supervisor session management
- Container lifecycle coordination
No behavioral changes — this is the same binary produced today, just with cleaner internal boundaries.
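Roughly, the composition layer reduces to wiring the two crates together in one process. The crate and item names below (openshell_proxy::Proxy, openshell_runtime::Sandbox, and their methods) are stand-ins for whatever the split actually exposes:

```rust
// Hypothetical shape of the composition crate. The inner modules stand in for
// the real openshell-proxy and openshell-runtime crates; every function name
// is a placeholder used only to show how the pieces would be wired.
mod openshell_proxy {
    pub struct Proxy;
    impl Proxy {
        /// Start with no policy installed (deny-all).
        pub fn start_deny_all() -> Self {
            Proxy
        }
        /// Install a policy pushed from the gateway.
        pub fn apply_policy(&self, _rego: &str) {}
    }
}

mod openshell_runtime {
    pub struct Sandbox;
    impl Sandbox {
        /// Apply netns + Landlock + seccomp before anything untrusted runs.
        pub fn isolate() -> Self {
            Sandbox
        }
        /// Spawn the workload entrypoint inside the sandbox.
        pub fn spawn(&self, _entrypoint: &str) {}
    }
}

/// Same single-container behavior as today, expressed as composition:
/// bring the proxy up first, then isolate and start the workload behind it.
fn run_single_container(policy: &str, entrypoint: &str) {
    let proxy = openshell_proxy::Proxy::start_deny_all();
    proxy.apply_policy(policy);
    let sandbox = openshell_runtime::Sandbox::isolate();
    sandbox.spawn(entrypoint);
}

fn main() {
    run_single_container("package openshell.policy", "/bin/bash");
}
```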
Control Plane Interface
The proxy exposes a ProxyControl trait with two implementations:
- In-process — zero-copy, direct method calls. Used when proxy + runtime run in the same process.
- gRPC client/server — for remote control. Used when the proxy runs as a sidecar or standalone service.
This trait abstraction allows any topology to push policy/credentials/routes to the proxy uniformly.
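A sketch of the trait and its two implementations follows; the exact method set, message types, and the gRPC client plumbing are assumptions made for illustration:

```rust
// Sketch of the ProxyControl abstraction. The trait name comes from the
// proposal; the methods, message types, and the gRPC stub are assumed.
use std::sync::{Arc, Mutex};

pub struct PolicyBundle {
    pub rego: String,
}

pub struct Credential {
    pub provider: String,
    pub token: String,
}

/// Uniform control-plane surface: any controller pushes policy and
/// credentials through this trait, wherever the proxy happens to run.
pub trait ProxyControl {
    fn push_policy(&self, policy: PolicyBundle) -> Result<(), String>;
    fn push_credentials(&self, creds: Vec<Credential>) -> Result<(), String>;
}

/// In-process implementation: direct method calls into shared proxy state,
/// used when proxy and runtime live in the same binary.
pub struct InProcessControl {
    pub policy: Arc<Mutex<Option<PolicyBundle>>>,
}

impl ProxyControl for InProcessControl {
    fn push_policy(&self, policy: PolicyBundle) -> Result<(), String> {
        *self.policy.lock().unwrap() = Some(policy);
        Ok(())
    }
    fn push_credentials(&self, _creds: Vec<Credential>) -> Result<(), String> {
        Ok(()) // would hand credentials to the injection layer
    }
}

/// Remote implementation: the same trait, but every call would become a gRPC
/// request to a standalone proxy (the actual client stub is elided here).
pub struct GrpcControl {
    pub endpoint: String,
}

impl ProxyControl for GrpcControl {
    fn push_policy(&self, _policy: PolicyBundle) -> Result<(), String> {
        Err(format!("stub: would call PushPolicy on {}", self.endpoint))
    }
    fn push_credentials(&self, _creds: Vec<Credential>) -> Result<(), String> {
        Err(format!("stub: would call PushCredentials on {}", self.endpoint))
    }
}

fn main() {
    // A controller can drive either topology through the same trait object.
    let controls: Vec<Box<dyn ProxyControl>> = vec![
        Box::new(InProcessControl { policy: Arc::new(Mutex::new(None)) }),
        Box::new(GrpcControl { endpoint: "http://proxy:9090".into() }),
    ];
    for control in &controls {
        let _ = control.push_policy(PolicyBundle { rego: "package openshell".into() });
    }
}
```

Because the controller only ever talks to a Box&lt;dyn ProxyControl&gt; (or a generic parameter), switching a deployment from single-container to sidecar is a wiring change rather than a code change.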
Deployment Topologies Enabled
| Topology | Proxy | Runtime | Use Case |
| --- | --- | --- | --- |
| Single container (status quo) | In-process | In-process | Docker, Podman — full isolation in one unit |
| Two-pod sidecar | Standalone pod | Not needed (gVisor/Kata provides isolation) | Kubernetes with hardware-backed sandboxing |
| Gateway-embedded | Library dependency | None | Lightweight policy-only enforcement |
| Standalone egress filter | Standalone binary | None | Non-OpenShell workloads that need egress control |
| Edge / split-node | Standalone on gateway node | Standalone on worker node | Resource-constrained or multi-node setups |
Pros
- Independent deployability — the proxy runs as its own container image without pulling in Landlock, seccomp, SSH, or any Linux isolation code
- Topology flexibility — operators compose only what their environment needs; no wasted resources
- Smaller attack surface per component — the proxy pod needs no privileged capabilities; the runtime has no network stack surface
- Faster build iteration — changing OPA policy evaluation doesn't recompile SSH; changing seccomp rules doesn't recompile TLS
- Reusability beyond OpenShell — the proxy is a generic egress policy filter usable by any workload (Kubernetes sidecar, systemd service, etc.)
- Clear security boundary — trusted proxy runs at a different trust level than the untrusted workload, which can be enforced by deployment (separate pods, VMs, nodes)
- Focused testing — each crate has a well-defined responsibility with clear unit-test boundaries and fewer integration dependencies
- Container image size — proxy image carries no SSH server, no Landlock, no seccomp; runtime needs no TLS, no OPA, no HTTP parsing
- Version independence — proxy and runtime can be released independently (with control-plane protocol versioning)
Cons
- Coordination complexity — multi-component topologies require service discovery, health-check sequencing, and CA certificate distribution between components
- Operational surface area — more container images to build, version, scan, and ship
- Network hop latency — inter-pod proxy adds a network round-trip vs. in-process loopback (~0.1ms per request, but measurable at high throughput)
- Secret distribution — CA certificates and credentials must be shared via Kubernetes Secrets / volume mounts rather than in-memory handoff
- Distributed debugging — logs span multiple pods; correlating a proxy denial with the process that triggered it requires structured log correlation (trace IDs)
- Version skew risk — independently released proxy and runtime must maintain protocol compatibility; requires a versioned control-plane contract
- Cold-start overhead — two-component topologies need both components scheduled and ready before the sandbox is usable (sequential readiness)
- Increased CI complexity — three crates × multiple images × compatibility matrix increases the test surface
Example: gVisor Two-Pod Topology
As a concrete example, consider a Kubernetes deployment using gVisor for workload isolation:
- Proxy pod — runs the openshell-proxy binary in a trusted (non-gVisor) pod. No elevated privileges needed. Receives policy via gRPC from the gateway.
- Agent pod — runs the user workload under a gVisor RuntimeClass. Environment variables (HTTP_PROXY, HTTPS_PROXY, SSL_CERT_FILE) route all traffic through the proxy service.
- NetworkPolicy — a native Kubernetes resource restricts agent egress to the proxy and kube-dns only. The CNI enforces this at the host layer, beneath the gVisor sandbox, and gVisor prevents raw-socket bypass from inside the workload.
- CA Secret — proxy's generated CA certificate is distributed via a Kubernetes Secret mounted into the agent pod for TLS inspection trust.
This topology is impossible with the monolithic openshell-sandbox because the proxy cannot be deployed without the runtime, and gVisor makes the runtime redundant.
Alternatives Considered
Feature flags in a single crate — Less deployment flexibility: the default build still compiles and ships unused code, and a proxy-only container image cannot be produced without significant cfg complexity.
Microkernel plugin architecture — Over-engineered for two stable, well-defined responsibilities. The proxy/runtime boundary maps directly to a trust boundary — it's not arbitrary.
Separate repositories — Too much overhead. These crates share proto definitions, the OCSF logging framework, and release cadence. Monorepo with separate crates provides the right balance.
Agent Investigation
Explored crates/openshell-sandbox/ on main:
- src/lib.rs: 3,143 lines — orchestration, gRPC client, policy loading, supervisor sessions. Imports all modules directly.
- src/proxy.rs: 4,990 lines — the HTTP CONNECT proxy implementation, connection handling, tunnel management.
- src/opa.rs: 4,556 lines — OPA/Rego policy engine, rule compilation, evaluation.
- src/l7/: ~3,165 lines — TLS interception (522), L7 routing (1,881), inference (762), plus GraphQL/REST/path modules.
- src/ssh.rs: 1,533 lines — embedded russh SSH server, channel handling, credential verification.
- src/process.rs: 841 lines — process spawning, lifecycle, signal management.
- src/sandbox/linux/: ~2,098 lines — Landlock (480), seccomp (672), netns (946).
- Total monolith: ~17k lines, 82 dependencies in Cargo.toml.
The proxy subsystem (proxy.rs + opa.rs + l7/ + identity.rs + provider_credentials.rs + bypass_monitor.rs + denial_aggregator.rs) accounts for ~70% of the crate by line count but has zero dependency on the Linux isolation modules. The coupling is entirely through lib.rs orchestration — the split boundary is clean.