Problem Statement
OpenShell sandbox pods require CAP_SYS_ADMIN, CAP_NET_ADMIN, CAP_SYS_PTRACE, CAP_SYSLOG, and runAsUser: 0. On Red Hat OpenShift, the default restricted-v2 SecurityContextConstraint drops all capabilities, enforces runAsNonRoot: true, and sets allowPrivilegeEscalation: false. Granting a custom SCC with these capabilities weakens the cluster's security posture and requires cluster-admin approval -- a non-starter for many enterprise deployments.
This means OpenShell cannot be deployed on OpenShift (or any managed Kubernetes platform with enforced pod security standards at the restricted level) without a security exception that many platform teams cannot allow.
Related: #873 (roadmap for local workstation drivers), #882 (Podman driver / CRI-O compatibility), #579 (closed -- reduce SYS_ADMIN/SYS_PTRACE), #586 (closed -- graceful degradation without netns, decided fail-closed), #398 (CDI for GPU injection, prerequisite for OpenShift GPU support).
Proposed Design
Add a Platform variant to the NetworkMode enum. When active, the sandbox supervisor skips network namespace creation, bypass monitoring, and iptables rules, and instead binds the CONNECT proxy to loopback. The Kubernetes driver omits elevated capabilities and runAsUser: 0 from the pod spec. Egress control is enforced by a Kubernetes NetworkPolicy emitted by the driver. The OPA policy engine, L7 inspection, inference routing, and credential injection continue to function through the loopback proxy.
Capability elimination path
| Requirement | Current Location | Platform Mode Alternative |
| --- | --- | --- |
| Network namespace + veth (SYS_ADMIN, NET_ADMIN) | sandbox/linux/netns.rs | Skip entirely; Kubernetes NetworkPolicy provides L3/L4 egress control |
| /proc/<pid>/exe resolution (SYS_PTRACE) | procfs.rs | Use shareProcessNamespace: true on the pod; proxy and sandbox share a PID namespace, so same-UID /proc reads work without ptrace |
| Privilege drop via setuid/setgid (root) | process.rs pre_exec | Container starts as non-root; no privilege drop needed |
| Landlock PathFd opening (root) | sandbox/linux/mod.rs Phase 1 | Run Phase 1 as the pod's non-root user; degrade gracefully via existing best_effort mode for inaccessible paths |
| dmesg bypass detection (SYSLOG) | bypass_monitor.rs | Disabled; already degrades gracefully. Unnecessary when NetworkPolicy enforces egress outside the pod's trust boundary |
| Supervisor sideload via hostPath | driver.rs:703-732 | Bake supervisor into the sandbox image or use an emptyDir init container |
| Workspace init container (root) | driver.rs:821-898 | Run as the image's default non-root user (image should have read access to /sandbox) |
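The shareProcessNamespace row deserves a concrete illustration: once the proxy and sandbox share a PID namespace and run as the same UID, resolving a peer's binary is an ordinary readlink. A minimal sketch follows -- the function name echoes procfs.rs, but the real implementation differs:

```rust
use std::io;
use std::path::PathBuf;

/// Resolve a process's binary via /proc/<pid>/exe.
/// For a same-UID process in a shared PID namespace this is a plain
/// readlink; no CAP_SYS_PTRACE is required.
fn binary_path(pid: u32) -> io::Result<PathBuf> {
    std::fs::read_link(format!("/proc/{pid}/exe"))
}

fn main() -> io::Result<()> {
    // Demonstrated on our own PID, which is trivially same-UID.
    println!("{}", binary_path(std::process::id())?.display());
    Ok(())
}
```

Cross-user reads of /proc/<pid>/exe are what require SYS_PTRACE; keeping proxy and sandbox at the same UID sidesteps that entirely.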
Key architectural change
Move network isolation from "inside the sandbox pod" to "platform-provided, before the pod starts." The CONNECT proxy continues running on 127.0.0.1:3128 for cooperative L7 inspection, OPA policy evaluation, credential injection, and inference routing. Kubernetes NetworkPolicy acts as the hard L3/L4 enforcement backstop at the CNI level.
This is architecturally consistent with how service meshes operate alongside NetworkPolicy in production Kubernetes clusters: the sidecar proxy handles L7 for cooperative traffic, the platform handles L3/L4 for all traffic.
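To make the backstop concrete, here is a sketch of the kind of egress NetworkPolicy the driver could emit. The labels, namespace selector, and CIDR are illustrative assumptions, not the driver's actual output:

```yaml
# Illustrative egress backstop for sandbox pods in Platform mode.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: openshell-sandbox-egress
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: sandbox   # assumed label
  policyTypes:
    - Egress
  egress:
    # DNS to the cluster resolver
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allowed external destinations (example CIDR)
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24
      ports:
        - protocol: TCP
          port: 443
```

Note that traffic to the loopback proxy never leaves the pod and is not subject to NetworkPolicy, while the proxy's own upstream connections are -- so the backstop still bounds all egress from the pod.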
Implementation sketch
Add Platform to the NetworkMode enum in policy.rs and a network_enforcement field to the proto SandboxPolicy message (backward-compatible: default zero value = current Namespace mode)
In run_sandbox() (lib.rs), add a Platform branch that skips netns creation and bypass monitoring, binds proxy to loopback (existing fallback path at proxy.rs:158), and still starts the OPA engine
In spawn_impl() (process.rs), skip setns() when netns_fd is None (already handled), set proxy env vars to 127.0.0.1:3128, skip privilege drop when already non-root (already handled at process.rs:431-454)
In the Kubernetes driver (driver.rs), conditionally omit capabilities.add and runAsUser: 0, and emit an egress NetworkPolicy for sandbox pods
In seccomp.rs, include Platform alongside Proxy in the allow_inet decision (seccomp works without root via no_new_privs)
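The branch points above can be sketched as follows. This is a hypothetical illustration: the variant and function names echo the issue text, but the signatures, the veth address, and the Allow/Block semantics shown here are assumptions:

```rust
// Hypothetical sketch of the proposed NetworkMode::Platform gating.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NetworkMode {
    Block,
    Proxy,
    Allow,
    Platform, // proposed: platform-enforced egress, loopback proxy
}

impl NetworkMode {
    /// Does this mode create a network namespace (needs SYS_ADMIN/NET_ADMIN)?
    fn needs_netns(self) -> bool {
        matches!(self, NetworkMode::Proxy)
    }

    /// Does this mode run the in-pod bypass monitor (needs SYSLOG)?
    fn needs_bypass_monitor(self) -> bool {
        matches!(self, NetworkMode::Proxy)
    }

    /// Address the CONNECT proxy binds to in this mode, if any.
    fn proxy_bind_addr(self) -> Option<&'static str> {
        match self {
            NetworkMode::Proxy => Some("10.200.0.1:3128"), // veth host IP (illustrative)
            NetworkMode::Platform => Some("127.0.0.1:3128"), // loopback fallback
            NetworkMode::Block | NetworkMode::Allow => None,
        }
    }

    /// Should the seccomp filter allow AF_INET sockets?
    fn allow_inet(self) -> bool {
        matches!(self, NetworkMode::Proxy | NetworkMode::Allow | NetworkMode::Platform)
    }
}

fn main() {
    let mode = NetworkMode::Platform;
    println!(
        "netns={} monitor={} bind={:?} inet={}",
        mode.needs_netns(),
        mode.needs_bypass_monitor(),
        mode.proxy_bind_addr(),
        mode.allow_inet()
    );
}
```

The point of routing every decision through one enum is that Platform mode becomes a handful of match arms rather than a parallel code path.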
What still works in Platform mode
Seccomp BPF -- prctl(PR_SET_NO_NEW_PRIVS) and seccomp(SET_MODE_FILTER) do not require any capability
Landlock -- restrict_self() works via no_new_privs path; Phase 1 PathFd opening works for user-readable paths, degrades gracefully otherwise
OPA policy evaluation -- the Rego rules are completely decoupled from the network namespace; they operate on abstract JSON input
L7 inspection -- for cooperative clients honoring HTTP_PROXY
Process identity binding -- preserved via shareProcessNamespace: true
What is reduced in Platform mode
Non-cooperative process enforcement: Processes ignoring HTTP_PROXY can attempt direct connections. NetworkPolicy is the enforcement boundary, not the network namespace. This is the primary security trade-off.
L7 inspection coverage: Only applies to cooperative proxy traffic. Non-proxy traffic gets L3/L4 enforcement only.
Bypass detection: No iptables LOG rules, no /dev/kmsg monitoring. Replaced by NetworkPolicy deny logging at the CNI level.
Scope boundaries:
This does NOT remove the existing InPod mode -- it remains the default for Docker/K3s deployments
Platform mode trades some defense-in-depth (no in-pod netns isolation) for deployability on locked-down platforms
The platform's NetworkPolicy enforcement is the network isolation layer in this mode
Landlock + seccomp remain fully functional (both work under no_new_privs)
Alternatives Considered
Runtime capability probing -- Auto-detect whether CAP_NET_ADMIN is available and fall back to Platform mode. Rejected: implicit behavior is harder to reason about, test, and debug. A failed ip netns add could be transient rather than a capability restriction. Explicit configuration is recommended.
NetworkPolicy-only (no in-pod proxy) -- Eliminate the CONNECT proxy entirely. Rejected: loses OPA per-binary policy evaluation, L7 inspection, inference routing, credential injection, and denial aggregator -- all core OpenShell features.
User namespaces -- Map container root to an unprivileged host UID. Rejected: Kubernetes user namespace support is alpha (KEP-127), not available on OpenShift, and the seccomp filter currently blocks CLONE_NEWUSER.
Custom SCC grant -- Just grant the capabilities. This is what we'd have to do today, but platform teams reject it because it weakens the namespace's security posture. Not a solution for enterprise adoption.
gVisor RuntimeClass -- Referenced in #4 (Evaluate sandbox isolation options: gVisor runtime, Firecracker microVMs, or cluster-in-VM). Would eliminate in-pod namespace manipulation via syscall interception. Not available on OpenShift without a custom RuntimeClass and cluster-admin involvement.
Agent Investigation
Investigation performed with a coding agent pointed at the repo. Skills loaded: create-spike, generate-sandbox-policy. Full findings below.
Architecture overview
The sandbox employs a defense-in-depth model with six layers, three of which require elevated capabilities:
- Seccomp BPF: prctl(PR_SET_NO_NEW_PRIVS) + seccomp(SET_MODE_FILTER) -- no capability required
- Landlock: restrict_self(), with best_effort fallback -- no capability required
- Network namespace + iptables: ip netns add, ip link add, setns() -- requires SYS_ADMIN and NET_ADMIN
- Process identity resolution: /proc/<pid>/exe, /proc/<pid>/fd/, shareProcessNamespace -- requires SYS_PTRACE across users
- Bypass monitoring: dmesg --follow -- requires SYSLOG
Code references
- crates/openshell-driver-kubernetes/src/driver.rs:1100-1113 -- capabilities.add: ["SYS_ADMIN", "NET_ADMIN", "SYS_PTRACE", "SYSLOG"]
- crates/openshell-driver-kubernetes/src/driver.rs:748-804 -- apply_supervisor_sideload() forces runAsUser: 0
- crates/openshell-driver-kubernetes/src/driver.rs:821-898 -- workspace init container, runAsUser: 0
- crates/openshell-driver-kubernetes/src/driver.rs:703-732 -- hostPath supervisor sideload (blocked by restricted-v2)
- crates/openshell-sandbox/src/policy.rs:59-65 -- NetworkMode enum: Block, Proxy, Allow -- no Platform variant
- crates/openshell-sandbox/src/policy.rs:98-119 -- TryFrom<ProtoSandboxPolicy> unconditionally forces NetworkMode::Proxy
- crates/openshell-sandbox/src/lib.rs:376-412 -- NetworkMode::Proxy -- fatal failure if caps unavailable
- crates/openshell-sandbox/src/lib.rs:423-481 -- Proxy mode
- crates/openshell-sandbox/src/process.rs:144-262 -- spawn_impl(): setns() at 236, drop_privileges() at 245, Landlock+seccomp at 255
- crates/openshell-sandbox/src/process.rs:171-193 -- gated on NetworkMode::Proxy
- crates/openshell-sandbox/src/sandbox/linux/seccomp.rs:28-44 -- prctl(PR_SET_NO_NEW_PRIVS) + apply_filter() -- confirmed no root needed
- crates/openshell-sandbox/src/sandbox/linux/seccomp.rs:29 -- allow_inet decision based on network mode
- crates/openshell-sandbox/src/sandbox/linux/netns.rs:53-178 -- NetworkNamespace::create() -- requires root + CAP_NET_ADMIN
- crates/openshell-sandbox/src/sandbox/linux/netns.rs:252-331 -- install_bypass_rules() -- iptables inside netns
- crates/openshell-sandbox/src/bypass_monitor.rs:117-292 -- spawn() -- requires CAP_SYSLOG
- crates/openshell-sandbox/src/procfs.rs:49-79 -- binary_path() -- /proc/<pid>/exe, needs SYS_PTRACE across users
- crates/openshell-sandbox/src/procfs.rs:276-315 -- find_pid_by_socket_inode() -- /proc/<pid>/fd/ scanning
- crates/openshell-sandbox/src/proxy.rs:143-159 -- start_with_bind_addr() -- proxy binds to veth host IP or loopback
- proto/sandbox.proto:17-28 -- SandboxPolicy message -- no network mode field currently
- proto/compute_driver.proto -- DriverSandboxTemplate.platform_config -- existing opaque extensibility point
- deploy/helm/openshell/templates/networkpolicy.yaml -- existing NetworkPolicy template
OPA/Rego decoupling
(from generate-sandbox-policy investigation)
The Rego rules (crates/openshell-sandbox/data/sandbox-policy.rego) have zero dependency on the network namespace. They evaluate against an abstract input JSON object containing host, port, binary_path, and ancestors. The coupling to in-pod networking exists solely in how that input is constructed -- specifically, process identity resolution via /proc. The OPA engine, policy loading, hot-reload, L7 inspection chain, credential injection, and SSRF protection are all mode-agnostic.
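For illustration, an input document of that shape might look like the following -- all field values here are hypothetical:

```json
{
  "host": "api.example.com",
  "port": 443,
  "binary_path": "/usr/bin/curl",
  "ancestors": ["/usr/bin/bash", "/usr/local/bin/supervisor"]
}
```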
Proto extensibility
The SandboxPolicy proto message can be extended with backward-compatible fields:
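A sketch of such an extension -- the enum, field name, and field number are illustrative assumptions, not the final schema:

```protobuf
// Illustrative only; names and field numbers are assumptions.
enum NetworkEnforcement {
  NETWORK_ENFORCEMENT_NAMESPACE = 0;  // zero value = current in-pod netns mode
  NETWORK_ENFORCEMENT_PLATFORM = 1;   // proposed Platform mode
}

message SandboxPolicy {
  // ... existing fields unchanged ...
  NetworkEnforcement network_enforcement = 12;  // illustrative field number
}
```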
Default zero value preserves current behavior. The existing DriverSandboxTemplate.platform_config (google.protobuf.Struct) can carry Kubernetes-specific configuration without touching the core policy schema.
Existing patterns followed
- NetworkMode enum gating: the codebase uses matches!(policy.network.mode, NetworkMode::Proxy) extensively
- Graceful degradation: Landlock BestEffort, bypass monitor None return
- Proxy loopback fallback: proxy.rs:158 already handles binding to 127.0.0.1:3128
- Helm chart conditional rendering: existing networkpolicy.yaml via .Values.networkPolicy.enabled
- OCSF event emission for security state changes
Scope assessment
Complexity: High
Confidence: Medium (core approach is sound; design decisions needed for NetworkPolicy reconciliation, init container alternatives, identity resolution degradation)
Estimated files to change: 12-15
Issue type: feat
Risks & open questions:
NetworkPolicy operates at IP/port level, not per-binary or per-request -- fundamental security downgrade from in-pod proxy model. Proxy on loopback + NetworkPolicy as backstop is the mitigation. How much trust do we place in NetworkPolicy as sole enforcement?
Dynamic NetworkPolicy updates: if OPA network policies change at runtime, the driver needs a reconciliation loop to keep Kubernetes NetworkPolicy in sync. Significant new subsystem.
Init container without root: workspace persistence init container (driver.rs:821-898) uses runAsUser: 0. Needs alternative seeding strategy.
Landlock without root: Phase 1 opens PathFds as root. Without root, BestEffort mode handles inaccessible paths gracefully. Should Platform mode force BestEffort?
hostPath volume for supervisor sideload (driver.rs:703-732) is also blocked by restricted-v2. Supervisor must be baked into the sandbox image or use emptyDir.
Proxy bypass: without netns, processes ignoring HTTP_PROXY connect directly. NetworkPolicy is the only enforcement. Prominently document this trade-off.
Checklist
I've reviewed existing issues and the architecture docs
This is a design proposal, not a "please build this" request