
feat: Support restricted SecurityContextConstraints for managed Kubernetes platforms #899

@rdwj

Description


Problem Statement

OpenShell sandbox pods require CAP_SYS_ADMIN, CAP_NET_ADMIN, CAP_SYS_PTRACE, CAP_SYSLOG, and runAsUser: 0. On Red Hat OpenShift, the default restricted-v2 SecurityContextConstraint drops all capabilities, enforces runAsNonRoot: true, and sets allowPrivilegeEscalation: false. Granting a custom SCC with these capabilities weakens the cluster's security posture and requires cluster-admin approval -- a non-starter for many enterprise deployments.

This means OpenShell cannot be deployed on OpenShift (or any managed Kubernetes platform with enforced pod security standards at the restricted level) without a security exception that many platform teams cannot allow.

Related: #873 (roadmap for local workstation drivers), #882 (Podman driver / CRI-O compatibility), #579 (closed -- reduce SYS_ADMIN/SYS_PTRACE), #586 (closed -- graceful degradation without netns, decided fail-closed), #398 (CDI for GPU injection, prerequisite for OpenShift GPU support).

Proposed Design

Add a Platform variant to the NetworkMode enum. When active, the sandbox supervisor skips network namespace creation, bypass monitoring, and iptables rules, and instead binds the CONNECT proxy to loopback. The Kubernetes driver omits elevated capabilities and runAsUser: 0 from the pod spec. Egress control is enforced by a Kubernetes NetworkPolicy emitted by the driver. The OPA policy engine, L7 inspection, inference routing, and credential injection continue to function through the loopback proxy.

Capability elimination path

| Requirement | Current Location | Platform Mode Alternative |
| --- | --- | --- |
| Network namespace + veth (SYS_ADMIN, NET_ADMIN) | `sandbox/linux/netns.rs` | Skip entirely; Kubernetes NetworkPolicy provides L3/L4 egress control |
| `/proc/<pid>/exe` resolution (SYS_PTRACE) | `procfs.rs` | Use `shareProcessNamespace: true` on the pod; proxy and sandbox share a PID namespace, so same-UID `/proc` reads work without ptrace |
| Privilege drop via setuid/setgid (root) | `process.rs` `pre_exec` | Container starts as non-root; no privilege drop needed |
| Landlock PathFd opening (root) | `sandbox/linux/mod.rs` Phase 1 | Run Phase 1 as the pod's non-root user; degrade gracefully via existing `best_effort` mode for inaccessible paths |
| dmesg bypass detection (SYSLOG) | `bypass_monitor.rs` | Disabled; already degrades gracefully. Unnecessary when NetworkPolicy enforces egress outside the pod's trust boundary |
| Supervisor sideload via hostPath | `driver.rs:703-732` | Bake supervisor into the sandbox image or use an emptyDir init container |
| Workspace init container (root) | `driver.rs:821-898` | Run as the image's default non-root user (image should have read access to `/sandbox`) |
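Put together, the pod spec the Kubernetes driver would emit in Platform mode could look like the following restricted-v2-compatible fragment. This is an illustrative sketch, not the driver's actual output; the container name, image tag, and labels are placeholders.

```yaml
# Illustrative Platform-mode sandbox pod fragment: passes restricted-v2
# (and PSS "restricted") with no added capabilities and no root user.
spec:
  shareProcessNamespace: true         # same-UID /proc reads without SYS_PTRACE
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: sandbox                   # placeholder name
      image: openshell-sandbox:latest # supervisor baked in, no hostPath sideload
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]               # nothing added back
```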

Key architectural change

Move network isolation from "inside the sandbox pod" to "platform-provided, before the pod starts." The CONNECT proxy continues running on 127.0.0.1:3128 for cooperative L7 inspection, OPA policy evaluation, credential injection, and inference routing. Kubernetes NetworkPolicy acts as the hard L3/L4 enforcement backstop at the CNI level.

This is architecturally consistent with how service meshes operate alongside NetworkPolicy in production Kubernetes clusters: the sidecar proxy handles L7 for cooperative traffic, the platform handles L3/L4 for all traffic.
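As a sketch of that L3/L4 backstop, the egress NetworkPolicy the driver might emit could look like the following. The name, pod label, and destination CIDR are illustrative assumptions; the real selector would come from the driver's pod labels.

```yaml
# Illustrative default-deny egress policy for sandbox pods: cluster DNS
# plus explicitly allowed destinations; everything else dropped at the CNI.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: openshell-sandbox-egress   # placeholder name
spec:
  podSelector:
    matchLabels:
      app: openshell-sandbox       # assumed pod label
  policyTypes: ["Egress"]
  egress:
    - to:                          # allow cluster DNS
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
    - to:                          # example allowed destination
        - ipBlock:
            cidr: 203.0.113.0/24   # placeholder CIDR
      ports:
        - protocol: TCP
          port: 443
```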

Implementation sketch

  1. Add Platform to the NetworkMode enum in policy.rs and a network_enforcement field to the proto SandboxPolicy message (backward-compatible: default zero value = current Namespace mode)
  2. In run_sandbox() (lib.rs), add a Platform branch that skips netns creation and bypass monitoring, binds proxy to loopback (existing fallback path at proxy.rs:158), and still starts the OPA engine
  3. In spawn_impl() (process.rs), skip setns() when netns_fd is None (already handled), set proxy env vars to 127.0.0.1:3128, skip privilege drop when already non-root (already handled at process.rs:431-454)
  4. In the Kubernetes driver (driver.rs), conditionally omit capabilities.add and runAsUser: 0, and emit an egress NetworkPolicy for sandbox pods
  5. In seccomp.rs, include Platform alongside Proxy in the allow_inet decision (seccomp works without root via no_new_privs)
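Steps 1-3 can be sketched as follows. The `NetworkPlan`/`plan_for` names are hypothetical stand-ins for the real `run_sandbox()` internals, and the veth host IP is a placeholder; the point is the branching, not the API.

```rust
// Sketch: what a Platform variant gates on and off, assuming
// hypothetical NetworkPlan/plan_for names (not the real run_sandbox API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NetworkMode {
    Block,
    Proxy,
    Allow,
    Platform, // new: platform-enforced egress, loopback proxy
}

struct NetworkPlan {
    create_netns: bool,
    start_bypass_monitor: bool,
    proxy_bind: Option<&'static str>,
}

fn plan_for(mode: NetworkMode) -> NetworkPlan {
    match mode {
        NetworkMode::Proxy => NetworkPlan {
            create_netns: true,
            start_bypass_monitor: true,
            proxy_bind: Some("10.200.0.1:3128"), // veth host IP (placeholder)
        },
        NetworkMode::Platform => NetworkPlan {
            create_netns: false,
            start_bypass_monitor: false,
            proxy_bind: Some("127.0.0.1:3128"), // existing loopback fallback
        },
        NetworkMode::Block | NetworkMode::Allow => NetworkPlan {
            create_netns: false,
            start_bypass_monitor: false,
            proxy_bind: None,
        },
    }
}
```

In both Proxy and Platform branches the OPA engine and proxy stack still start; only the namespace plumbing differs.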

What still works in Platform mode

  • Seccomp BPF -- prctl(PR_SET_NO_NEW_PRIVS) and seccomp(SET_MODE_FILTER) do not require any capability
  • Landlock -- restrict_self() works via no_new_privs path; Phase 1 PathFd opening works for user-readable paths, degrades gracefully otherwise
  • OPA policy evaluation -- the Rego rules are completely decoupled from the network namespace; they operate on abstract JSON input
  • L7 inspection -- for cooperative clients honoring HTTP_PROXY
  • Credential injection, inference routing, SSRF protection -- all proxy features, unchanged
  • Process identity binding -- preserved via shareProcessNamespace: true

What is reduced in Platform mode

  • Non-cooperative process enforcement: Processes ignoring HTTP_PROXY can attempt direct connections. NetworkPolicy is the enforcement boundary, not the network namespace. This is the primary security trade-off.
  • L7 inspection coverage: Only applies to cooperative proxy traffic. Non-proxy traffic gets L3/L4 enforcement only.
  • Bypass detection: No iptables LOG rules, no /dev/kmsg monitoring. Replaced by NetworkPolicy deny logging at the CNI level.

Scope boundaries:

  • This does NOT remove the existing in-pod namespace (Proxy) mode -- it remains the default for Docker/K3s deployments
  • Platform mode trades some defense-in-depth (no in-pod netns isolation) for deployability on locked-down platforms
  • The platform's NetworkPolicy enforcement is the network isolation layer in this mode
  • Landlock + seccomp remain fully functional (both work under no_new_privs)

Alternatives Considered

  1. Runtime capability probing -- Auto-detect whether CAP_NET_ADMIN is available and fall back to Platform mode. Rejected: implicit behavior is harder to reason about, test, and debug. A failed ip netns add could be transient rather than a capability restriction. Explicit configuration is recommended.

  2. NetworkPolicy-only (no in-pod proxy) -- Eliminate the CONNECT proxy entirely. Rejected: loses OPA per-binary policy evaluation, L7 inspection, inference routing, credential injection, and denial aggregator -- all core OpenShell features.

  3. User namespaces -- Map container root to an unprivileged host UID. Rejected: Kubernetes user namespace support is alpha (KEP-127), not available on OpenShift, and the seccomp filter currently blocks CLONE_NEWUSER.

  4. Custom SCC grant -- Just grant the capabilities. This is what we'd have to do today, but platform teams reject it because it weakens the namespace's security posture. Not a solution for enterprise adoption.

  5. gVisor RuntimeClass -- Referenced in #4 (Evaluate sandbox isolation options: gVisor runtime, Firecracker microVMs, or cluster-in-VM). Would eliminate in-pod namespace manipulation via syscall interception. Not available on OpenShift without a custom RuntimeClass and cluster-admin involvement.

Agent Investigation

Investigation performed with a coding agent pointed at the repo. Skills loaded: create-spike, generate-sandbox-policy. Full findings below.

Architecture overview

The sandbox employs a defense-in-depth model with six layers, four of which require elevated capabilities:

| Layer | Mechanism | Requires Elevated Caps | Platform Mode |
| --- | --- | --- | --- |
| Seccomp BPF | `prctl(PR_SET_NO_NEW_PRIVS)` + `seccomp(SET_MODE_FILTER)` | No | Works unchanged |
| Landlock LSM | Phase 1 PathFds + Phase 2 `restrict_self()` | Phase 1 needs root for restricted paths | Degrades gracefully via `best_effort` |
| Network namespace + veth | `ip netns add`, `ip link add`, `setns()` | Yes (SYS_ADMIN, NET_ADMIN) | Replaced by NetworkPolicy |
| iptables bypass detection | OUTPUT chain LOG + REJECT rules | Yes (NET_ADMIN) | Disabled |
| Process identity via procfs | `/proc/<pid>/exe`, `/proc/<pid>/fd/` | Yes (SYS_PTRACE for cross-user) | Works via `shareProcessNamespace` |
| Bypass monitor via dmesg | `dmesg --follow` | Yes (SYSLOG) | Disabled |

Code references

| Location | Description |
| --- | --- |
| `crates/openshell-driver-kubernetes/src/driver.rs:1100-1113` | Hardcoded `capabilities.add: ["SYS_ADMIN", "NET_ADMIN", "SYS_PTRACE", "SYSLOG"]` |
| `crates/openshell-driver-kubernetes/src/driver.rs:748-804` | `apply_supervisor_sideload()` forces `runAsUser: 0` |
| `crates/openshell-driver-kubernetes/src/driver.rs:821-898` | Workspace init container also `runAsUser: 0` |
| `crates/openshell-driver-kubernetes/src/driver.rs:703-732` | Supervisor sideload via hostPath volume (also blocked by restricted-v2) |
| `crates/openshell-sandbox/src/policy.rs:59-65` | `NetworkMode` enum: `Block`, `Proxy`, `Allow` -- no `Platform` variant |
| `crates/openshell-sandbox/src/policy.rs:98-119` | `TryFrom<ProtoSandboxPolicy>` unconditionally forces `NetworkMode::Proxy` |
| `crates/openshell-sandbox/src/lib.rs:376-412` | Netns creation gated on `NetworkMode::Proxy` -- fatal failure if caps unavailable |
| `crates/openshell-sandbox/src/lib.rs:423-481` | Proxy startup, identity cache, OPA engine gated on Proxy mode |
| `crates/openshell-sandbox/src/process.rs:144-262` | `spawn_impl()`: `setns()` at 236, `drop_privileges()` at 245, Landlock+seccomp at 255 |
| `crates/openshell-sandbox/src/process.rs:171-193` | Proxy URL env var injection gated on `NetworkMode::Proxy` |
| `crates/openshell-sandbox/src/sandbox/linux/seccomp.rs:28-44` | Seccomp: `prctl(PR_SET_NO_NEW_PRIVS)` + `apply_filter()` -- confirmed no root needed |
| `crates/openshell-sandbox/src/sandbox/linux/seccomp.rs:29` | `allow_inet` decision based on network mode |
| `crates/openshell-sandbox/src/sandbox/linux/netns.rs:53-178` | `NetworkNamespace::create()` -- requires root + CAP_NET_ADMIN |
| `crates/openshell-sandbox/src/sandbox/linux/netns.rs:252-331` | `install_bypass_rules()` -- iptables inside netns |
| `crates/openshell-sandbox/src/bypass_monitor.rs:117-292` | `spawn()` -- requires CAP_SYSLOG |
| `crates/openshell-sandbox/src/procfs.rs:49-79` | `binary_path()` -- `/proc/<pid>/exe`, needs SYS_PTRACE across users |
| `crates/openshell-sandbox/src/procfs.rs:276-315` | `find_pid_by_socket_inode()` -- `/proc/<pid>/fd/` scanning |
| `crates/openshell-sandbox/src/proxy.rs:143-159` | `start_with_bind_addr()` -- proxy binds to veth host IP or loopback |
| `proto/sandbox.proto:17-28` | `SandboxPolicy` message -- no network mode field currently |
| `proto/compute_driver.proto` | `DriverSandboxTemplate.platform_config` -- existing opaque extensibility point |
| `deploy/helm/openshell/templates/networkpolicy.yaml` | Existing ingress-only NetworkPolicy |

OPA/Rego decoupling (from generate-sandbox-policy investigation)

The Rego rules (crates/openshell-sandbox/data/sandbox-policy.rego) have zero dependency on the network namespace. They evaluate against an abstract input JSON object containing host, port, binary_path, and ancestors. The coupling to in-pod networking exists solely in how that input is constructed -- specifically, process identity resolution via /proc. The OPA engine, policy loading, hot-reload, L7 inspection chain, credential injection, and SSRF protection are all mode-agnostic.
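For illustration, an input document of the shape the Rego rules evaluate. Only the field names come from the investigation above; the values are made up.

```json
{
  "host": "api.example.com",
  "port": 443,
  "binary_path": "/usr/bin/curl",
  "ancestors": ["/usr/bin/bash", "/sandbox/supervisor"]
}
```

Nothing in this input depends on how the connection reached the proxy, which is why the policy layer carries over to Platform mode unchanged; only the `binary_path`/`ancestors` resolution path (procfs) changes.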

Proto extensibility

The SandboxPolicy proto message can be extended with backward-compatible fields:

```proto
enum NetworkEnforcementMode {
  NETWORK_ENFORCEMENT_NAMESPACE = 0;  // default, backward-compatible
  NETWORK_ENFORCEMENT_PLATFORM = 1;
}

message PlatformNetworkConfig {
  string network_policy_name = 1;
  string network_policy_namespace = 2;
  bool shared_pid_namespace = 3;
  string proxy_listen_addr = 4;
}
```

Default zero value preserves current behavior. The existing DriverSandboxTemplate.platform_config (google.protobuf.Struct) can carry Kubernetes-specific configuration without touching the core policy schema.
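Backward compatibility then falls out of proto3 defaults. A sketch of the decoding rule, using the enum values proposed above; the Rust type and function are hypothetical, not the existing `TryFrom` implementation:

```rust
// Proto3 decodes an unset enum field as 0 and preserves unknown values
// as raw integers; mapping anything other than 1 to the namespace
// behavior keeps old clients and old messages on the current code path.
#[derive(Debug, PartialEq, Eq)]
enum NetworkEnforcement {
    Namespace, // NETWORK_ENFORCEMENT_NAMESPACE = 0 (default)
    Platform,  // NETWORK_ENFORCEMENT_PLATFORM = 1
}

fn decode_enforcement(raw: i32) -> NetworkEnforcement {
    match raw {
        1 => NetworkEnforcement::Platform,
        _ => NetworkEnforcement::Namespace, // zero and unknown values
    }
}
```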

Existing patterns followed

  • NetworkMode enum gating pattern: codebase uses matches!(policy.network.mode, NetworkMode::Proxy) extensively
  • Graceful degradation: Landlock BestEffort, bypass monitor None return
  • Proxy loopback fallback: proxy.rs:158 already handles binding to 127.0.0.1:3128
  • Helm chart conditional rendering: existing networkpolicy.yaml via .Values.networkPolicy.enabled
  • OCSF event emission for security state changes
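Extending the first of those patterns is mostly mechanical. A sketch, with the enum and function as illustrative stand-ins for the real policy types:

```rust
#[derive(Clone, Copy)]
enum NetworkMode {
    Block,
    Proxy,
    Allow,
    Platform,
}

// Sites that currently gate on Proxy would widen the match wherever the
// proxy/OPA stack should also run in Platform mode.
fn proxy_stack_enabled(mode: NetworkMode) -> bool {
    matches!(mode, NetworkMode::Proxy | NetworkMode::Platform)
}
```

Sites that gate namespace-specific work (netns creation, iptables, bypass monitor) would keep matching `Proxy` alone.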

Scope assessment

  • Complexity: High
  • Confidence: Medium (core approach is sound; design decisions needed for NetworkPolicy reconciliation, init container alternatives, identity resolution degradation)
  • Estimated files to change: 12-15
  • Issue type: feat

Risks & open questions:

  1. NetworkPolicy operates at IP/port level, not per-binary or per-request -- fundamental security downgrade from in-pod proxy model. Proxy on loopback + NetworkPolicy as backstop is the mitigation. How much trust do we place in NetworkPolicy as sole enforcement?
  2. Dynamic NetworkPolicy updates: if OPA network policies change at runtime, the driver needs a reconciliation loop to keep Kubernetes NetworkPolicy in sync. Significant new subsystem.
  3. Init container without root: workspace persistence init container (driver.rs:821-898) uses runAsUser: 0. Needs alternative seeding strategy.
  4. Landlock without root: Phase 1 opens PathFds as root. Without root, BestEffort mode handles inaccessible paths gracefully. Should Platform mode force BestEffort?
  5. hostPath volume for supervisor sideload (driver.rs:703-732) is also blocked by restricted-v2. Supervisor must be baked into the sandbox image or use emptyDir.
  6. Proxy bypass: without netns, processes ignoring HTTP_PROXY connect directly. NetworkPolicy is the only enforcement. Prominently document this trade-off.

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
