Kubernetes User Namespace Isolation for Sandbox Pods #982

@mrunalp

Description

Problem Statement

OpenShell sandbox pods currently run with host-level capabilities (SYS_ADMIN, NET_ADMIN). While the supervisor drops privileges for child processes, a container escape vulnerability would land the attacker as root on the host with these capabilities active. Kubernetes v1.36 graduated user namespace support to GA (spec.hostUsers: false), which maps container UID 0 to an unprivileged host UID and makes capabilities container-scoped. This is a significant defense-in-depth improvement that OpenShell should support.

Proposed Design

Two-layer configuration for enabling user namespaces on sandbox pods:

  • Cluster-wide default: enable_user_namespaces field on the server Config / KubernetesComputeConfig, exposed via the OPENSHELL_ENABLE_USER_NAMESPACES environment variable and the server.enableUserNamespaces Helm value. Defaults to false.
  • Per-sandbox override: optional bool user_namespaces field on the SandboxTemplate proto message. When set, overrides the cluster default. Translated to platform_config.host_users for the Kubernetes driver.
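
The precedence rule above (a per-sandbox override, when set, wins over the cluster-wide default) can be sketched as follows; the function name and signature are illustrative, not the actual OpenShell API:

```rust
/// Illustrative sketch of the two-layer lookup: a per-sandbox
/// `user_namespaces` override, when present, wins over the
/// cluster-wide `enable_user_namespaces` default.
fn effective_user_namespaces(
    cluster_default: bool,          // Config.enable_user_namespaces
    sandbox_override: Option<bool>, // SandboxTemplate.user_namespaces
) -> bool {
    sandbox_override.unwrap_or(cluster_default)
}

fn main() {
    // Cluster default off, sandbox explicitly opts in.
    assert!(effective_user_namespaces(false, Some(true)));
    // Cluster default on, sandbox unset: default applies.
    assert!(effective_user_namespaces(true, None));
    // A sandbox can also opt out of a cluster-wide default.
    assert!(!effective_user_namespaces(true, Some(false)));
}
```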

Pod spec changes when enabled:

  • spec.hostUsers: false is set on sandbox pods, activating Kubernetes user namespace isolation.
  • The capability list is extended with SETUID, SETGID, and DAC_READ_SEARCH (matching the Podman driver). These are needed because the bounding set is reset inside a user namespace: SETUID/SETGID for the supervisor to drop privileges, DAC_READ_SEARCH for cross-UID /proc/<pid>/fd/ access in network policy enforcement.
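
A minimal sketch of the resulting pod spec fragment; `hostUsers` and the capability names are Kubernetes-defined, while the container name is illustrative:

```yaml
# Sketch of a sandbox pod spec with user namespaces enabled.
spec:
  hostUsers: false              # activate Kubernetes user namespace isolation
  containers:
    - name: sandbox             # illustrative container name
      securityContext:
        capabilities:
          add:
            - SYS_ADMIN
            - NET_ADMIN
            - SYS_PTRACE
            - SYSLOG
            - SETUID            # supervisor privilege drop
            - SETGID            # supervisor privilege drop
            - DAC_READ_SEARCH   # cross-UID /proc/<pid>/fd/ access
```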

What stays the same:

  • Seccomp filters (CLONE_NEWUSER block remains — we still don't want nested user namespaces from sandboxed processes).
  • Landlock filesystem restrictions (unprivileged, no capabilities needed).
  • Supervisor privilege-drop logic.
  • Init containers and volume mounts (ID-mapped mounts handle ownership transparently).

Components involved:

  • proto/openshell.proto — SandboxTemplate.user_namespaces field
  • crates/openshell-core/src/config.rs — Config.enable_user_namespaces
  • crates/openshell-driver-kubernetes/src/config.rs — KubernetesComputeConfig.enable_user_namespaces
  • crates/openshell-driver-kubernetes/src/driver.rs — pod spec generation (hostUsers, capabilities), new platform_config_bool helper
  • crates/openshell-server/src/cli.rs — CLI arg / env var
  • crates/openshell-server/src/compute/mod.rs — build_platform_config translation
  • crates/openshell-server/src/lib.rs — config wiring
  • deploy/helm/openshell/values.yaml and templates/statefulset.yaml — Helm plumbing
  • docs/security/best-practices.mdx — user-facing documentation
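
The platform_config_bool helper mentioned above reads an optional boolean out of the opaque platform_config structure. A sketch, with a plain map standing in for the actual prost Struct type:

```rust
use std::collections::HashMap;

/// Stand-in for the opaque `platform_config` Struct: string keys
/// mapped to values (just bools here, for the sketch).
type PlatformConfig = HashMap<String, bool>;

/// Illustrative version of the `platform_config_bool` helper:
/// Some(value) if the key is present, None if absent so the
/// caller can fall back to the cluster-wide default.
fn platform_config_bool(config: &PlatformConfig, key: &str) -> Option<bool> {
    config.get(key).copied()
}

fn main() {
    let mut config = PlatformConfig::new();
    // `host_users: false` requests user namespace isolation.
    config.insert("host_users".to_string(), false);
    assert_eq!(platform_config_bool(&config, "host_users"), Some(false));
    // Missing key: caller falls back to the cluster default.
    assert_eq!(platform_config_bool(&config, "other_knob"), None);
}
```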

Additional changes:

  • Supervisor hostPath volume type changed from DirectoryOrCreate to Directory (the path is always pre-provisioned; DirectoryOrCreate could fail under user namespaces when the mapped UID can't create host directories).
  • A warn! is emitted when GPU and user namespaces are both active on the same sandbox (NVIDIA device plugin compatibility is unverified).
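
The hostPath change amounts to the following volume definition; the volume name and path are illustrative:

```yaml
# Supervisor hostPath volume: `Directory` fails fast if the
# pre-provisioned path is missing, instead of attempting a
# host-side mkdir that the mapped UID may not be permitted to do.
volumes:
  - name: supervisor                        # illustrative volume name
    hostPath:
      path: /var/lib/openshell/supervisor   # illustrative path
      type: Directory                       # was: DirectoryOrCreate
```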

Alternatives Considered

  1. Always-on user namespaces (no opt-in): Rejected because user namespaces require Kubernetes 1.33+ (beta) or 1.36+ (GA), a supporting container runtime (containerd 2.0+, CRI-O 1.25+), and Linux 5.12+ with a filesystem that supports ID-mapped mounts. Forcing it on would break existing deployments on older clusters.

  2. Per-sandbox only (no cluster default): Rejected because operators deploying to a capable cluster should be able to enable user namespaces once for all sandboxes rather than setting it on each sandbox creation request.

  3. Typed field on DriverSandboxTemplate instead of platform_config passthrough: Rejected because host_users is Kubernetes-specific. The existing platform_config opaque Struct is the correct place for platform-specific knobs, matching the pattern used by runtime_class_name and annotations.

Agent Investigation

Explored the full configuration flow from proto through server to K8s driver:

  • Pod spec construction in crates/openshell-driver-kubernetes/src/driver.rs (sandbox_template_to_k8s, apply_supervisor_sideload)
  • Current capability set (SYS_ADMIN, NET_ADMIN, SYS_PTRACE, SYSLOG) and why each is needed
  • Podman driver's user namespace handling in crates/openshell-driver-podman/src/container.rs (adds SETUID, SETGID, DAC_READ_SEARCH — same pattern adopted here)
  • Seccomp filter's CLONE_NEWUSER block in crates/openshell-sandbox/src/sandbox/linux/seccomp.rs (remains active)
  • Network namespace creation in crates/openshell-sandbox/src/sandbox/linux/netns.rs (uses nsenter instead of ip netns exec to avoid sysfs remount, which requires real CAP_SYS_ADMIN in the host user namespace)
  • Helm chart env var wiring pattern in deploy/helm/openshell/templates/statefulset.yaml

Validated end-to-end on:

  • OCP 4.22 (K8s 1.35.3, CRI-O 1.35, RHEL CoreOS, kernel 5.14): full SSH tunnel, workspace init, sandbox command execution with non-identity UID mapping (0 → 3285581824)
  • Native K8s v1.37 (CRI-O 1.36, Fedora, kernel 6.19): pod spec and UID mapping verified
  • mise run cluster (k3s-in-Docker): pod spec verified, runtime fails due to nested overlayfs lacking ID-mapped mount support (expected and documented)
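
The non-identity UID mapping can be confirmed from inside a running sandbox by inspecting the process's uid_map; a sketch of the check:

```shell
# From inside the sandbox container: /proc/self/uid_map shows how
# container UIDs map to host UIDs.
cat /proc/self/uid_map
# With hostUsers: false, the host-side start is a large unprivileged
# UID (e.g. "0 3285581824 65536": container UID 0 maps to host UID
# 3285581824 over a 65536-UID range), rather than the identity
# mapping "0 0 4294967295" seen without user namespaces.
```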

Known limitations:

  • Does not work in Docker-in-Docker / k3s-in-Docker dev clusters (nested overlayfs lacks MOUNT_ATTR_IDMAP support)
  • GPU + user namespaces compatibility is unverified (warning emitted)
  • Requires Linux 5.12+ and a supporting container runtime

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
