Problem Statement
OpenShell sandbox pods currently run with host-level capabilities (SYS_ADMIN, NET_ADMIN). While the supervisor drops privileges for child processes, a container escape vulnerability would land the attacker as root on the host with these capabilities active. Kubernetes v1.36 graduated user namespace support to GA (spec.hostUsers: false), which maps container UID 0 to an unprivileged host UID and makes capabilities container-scoped. This is a significant defense-in-depth improvement that OpenShell should support.
Proposed Design
Two-layer configuration for enabling user namespaces on sandbox pods:
- Cluster-wide default: enable_user_namespaces field on the server Config / KubernetesComputeConfig, exposed via the OPENSHELL_ENABLE_USER_NAMESPACES environment variable and the server.enableUserNamespaces Helm value. Defaults to false.
- Per-sandbox override: optional bool user_namespaces field on the SandboxTemplate proto message. When set, overrides the cluster default. Translated to platform_config.host_users for the Kubernetes driver.
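The override precedence described above can be sketched as follows; the function name and signature are illustrative, not the actual OpenShell API:

```rust
/// Effective user-namespace setting for one sandbox: the per-sandbox
/// `user_namespaces` field, when set, overrides the cluster-wide
/// `enable_user_namespaces` default. (Hypothetical helper for illustration.)
fn user_namespaces_enabled(cluster_default: bool, sandbox_override: Option<bool>) -> bool {
    sandbox_override.unwrap_or(cluster_default)
}

fn main() {
    // Cluster default off, sandbox opts in.
    assert!(user_namespaces_enabled(false, Some(true)));
    // Sandbox unset inherits the cluster default.
    assert!(user_namespaces_enabled(true, None));
    // Sandbox can also opt out of a cluster-wide default.
    assert!(!user_namespaces_enabled(true, Some(false)));
    println!("ok");
}
```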
Pod spec changes when enabled:
- spec.hostUsers: false is set on sandbox pods, activating Kubernetes user namespace isolation.
- The capability list is extended with SETUID, SETGID, and DAC_READ_SEARCH (matching the Podman driver). These are needed because the bounding set is reset inside a user namespace: SETUID/SETGID for the supervisor to drop privileges, DAC_READ_SEARCH for cross-UID /proc/<pid>/fd/ access in network policy enforcement.
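Assuming standard Kubernetes pod spec fields, the enabled shape looks roughly like this (container name and layout are illustrative; the capability set combines the current list with the additions above):

```yaml
spec:
  hostUsers: false             # activates Kubernetes user namespace isolation
  containers:
    - name: sandbox            # illustrative
      securityContext:
        capabilities:
          add:
            - SYS_ADMIN
            - NET_ADMIN
            - SYS_PTRACE
            - SYSLOG
            - SETUID           # supervisor privilege drop
            - SETGID
            - DAC_READ_SEARCH  # cross-UID /proc/<pid>/fd/ access
```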
What stays the same:
- Seccomp filters (the CLONE_NEWUSER block remains — we still don't want nested user namespaces from sandboxed processes).
- Landlock filesystem restrictions (unprivileged, no capabilities needed).
- Supervisor privilege-drop logic.
- Init containers and volume mounts (ID-mapped mounts handle ownership transparently).
Components involved:
- proto/openshell.proto — SandboxTemplate.user_namespaces field
- crates/openshell-core/src/config.rs — Config.enable_user_namespaces
- crates/openshell-driver-kubernetes/src/config.rs — KubernetesComputeConfig.enable_user_namespaces
- crates/openshell-driver-kubernetes/src/driver.rs — pod spec generation (hostUsers, capabilities), new platform_config_bool helper
- crates/openshell-server/src/cli.rs — CLI arg / env var
- crates/openshell-server/src/compute/mod.rs — build_platform_config translation
- crates/openshell-server/src/lib.rs — config wiring
- deploy/helm/openshell/values.yaml and templates/statefulset.yaml — Helm plumbing
- docs/security/best-practices.mdx — user-facing documentation
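A minimal sketch of what the new platform_config_bool helper could look like. The real driver reads from the opaque protobuf Struct (prost_types); a stdlib stand-in type is used here to keep the sketch self-contained, and all names besides the helper itself are hypothetical:

```rust
use std::collections::BTreeMap;

// Stand-in for the protobuf Struct value kind; the real code would
// match on `prost_types::value::Kind` instead.
#[derive(Clone, Debug)]
enum ConfigValue {
    Bool(bool),
    Str(String),
}

type PlatformConfig = BTreeMap<String, ConfigValue>;

/// Returns the boolean knob if the key is present and actually a bool,
/// `None` otherwise, so the caller can fall back to the cluster default.
fn platform_config_bool(config: &PlatformConfig, key: &str) -> Option<bool> {
    match config.get(key) {
        Some(ConfigValue::Bool(b)) => Some(*b),
        _ => None,
    }
}

fn main() {
    let mut cfg = PlatformConfig::new();
    cfg.insert("host_users".into(), ConfigValue::Bool(false));
    cfg.insert("runtime_class_name".into(), ConfigValue::Str("kata".into()));

    // `host_users: false` requests user namespaces for this pod.
    assert_eq!(platform_config_bool(&cfg, "host_users"), Some(false));
    // Missing or non-bool keys yield None.
    assert_eq!(platform_config_bool(&cfg, "absent"), None);
    assert_eq!(platform_config_bool(&cfg, "runtime_class_name"), None);
    println!("ok");
}
```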
Additional changes:
- Supervisor hostPath volume type changed from DirectoryOrCreate to Directory (the path is always pre-provisioned; DirectoryOrCreate could fail under user namespaces when the mapped UID can't create host directories).
- A warn! is emitted when GPU and user namespaces are both active on the same sandbox (NVIDIA device plugin compatibility is unverified).
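The hostPath change above amounts to a one-field diff in the volume definition; the volume name and path here are illustrative placeholders, not the actual values:

```yaml
volumes:
  - name: supervisor                        # illustrative
    hostPath:
      path: /var/lib/openshell/supervisor   # illustrative
      type: Directory                       # was: DirectoryOrCreate
```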
Alternatives Considered
- Always-on user namespaces (no opt-in): Rejected because user namespaces require Kubernetes 1.33+ (beta) or 1.36+ (GA), a supporting container runtime (containerd 2.0+, CRI-O 1.25+), and Linux 5.12+ with a filesystem that supports ID-mapped mounts. Forcing it on would break existing deployments on older clusters.
- Per-sandbox only (no cluster default): Rejected because operators deploying to a capable cluster should be able to enable user namespaces once for all sandboxes rather than setting it on each sandbox creation request.
- Typed field on DriverSandboxTemplate instead of platform_config passthrough: Rejected because host_users is Kubernetes-specific. The existing platform_config opaque Struct is the correct place for platform-specific knobs, matching the pattern used by runtime_class_name and annotations.
Agent Investigation
Explored the full configuration flow from proto through server to K8s driver:
- Pod spec construction in crates/openshell-driver-kubernetes/src/driver.rs (sandbox_template_to_k8s, apply_supervisor_sideload)
- Current capability set (SYS_ADMIN, NET_ADMIN, SYS_PTRACE, SYSLOG) and why each is needed
- Podman driver's user namespace handling in crates/openshell-driver-podman/src/container.rs (adds SETUID, SETGID, DAC_READ_SEARCH — same pattern adopted here)
- Seccomp filter's CLONE_NEWUSER block in crates/openshell-sandbox/src/sandbox/linux/seccomp.rs (remains active)
- Network namespace creation in crates/openshell-sandbox/src/sandbox/linux/netns.rs (uses nsenter instead of ip netns exec to avoid sysfs remount, which requires real CAP_SYS_ADMIN in the host user namespace)
- Helm chart env var wiring pattern in deploy/helm/openshell/templates/statefulset.yaml
Validated end-to-end on:
- OCP 4.22 (K8s 1.35.3, CRI-O 1.35, RHEL CoreOS, kernel 5.14): full SSH tunnel, workspace init, sandbox command execution with non-identity UID mapping (0 → 3285581824)
- Native K8s v1.37 (CRI-O 1.36, Fedora, kernel 6.19): pod spec and UID mapping verified
- mise run cluster (k3s-in-Docker): pod spec verified, runtime fails due to nested overlayfs lacking ID-mapped mount support (expected and documented)
Known limitations:
- Does not work in Docker-in-Docker / k3s-in-Docker dev clusters (nested overlayfs lacks MOUNT_ATTR_IDMAP support)
- GPU + user namespaces compatibility is unverified (warning emitted)
- Requires Linux 5.12+ and a supporting container runtime
Checklist