
Proposal: Split Supervisor and Agent into Separate Pods with gVisor Isolation #981

@marwinski

Description


Problem Statement

Security concerns as documented in NVIDIA/OpenShell#899.

Proposed Design

RFC: Split Supervisor and Agent into Separate Pods with gVisor Isolation

Status: Draft
Date: 2026-04-26
Authors: @marwinski, @gehoern, @kon-angelo


0. Executive Summary

This proposal extends the Kubernetes compute driver of OpenShell with an alternative deployment mode that significantly strengthens sandbox security while preserving the core OpenShell feature set.

In the current architecture, the supervisor and agent share a single pod. The supervisor runs as root with elevated capabilities — a design that works but creates a wide blast radius if the agent escapes its confinement, and is incompatible with restricted Kubernetes environments such as those provided by OpenShift and Gardener.

The proposed split-pod model separates the supervisor and agent into distinct pods. The agent pod runs with zero capabilities, as non-root, and optionally under a gVisor or Kata RuntimeClass that interposes a userspace kernel between the agent and the host. A Kubernetes NetworkPolicy restricts the agent's network access to the supervisor's HTTP CONNECT proxy — the same proxy that enforces L7 policy, credential injection, and inference routing today.

Key OpenShell features — Landlock filesystem restrictions, SSH access, and the NSSH1 authentication protocol — are retained by injecting a trusted OpenShell binary into the agent pod via the Kubernetes image volumes mechanism. This sideloaded binary runs as the pod entrypoint, configures Landlock (on non-gVisor runtimes), starts the SSH server, and then exec-s the agent process. It requires no elevated capabilities and no cooperation from the agent image. Each feature (sideloading, Landlock, SSH) is independently configurable and can be disabled per sandbox policy.

The existing single-pod (InPod) mode remains the default and is unchanged. The split-pod model is opt-in, targeting enterprise deployments, managed Kubernetes platforms with restrictive security policies, and environments where stronger workload isolation is required. The only feature lost in the transition is per-binary destination restrictions, which depended on a shared PID namespace between supervisor and agent.

1. Motivation and Criticism of Current Architecture

This RFC addresses security concerns with the current single-pod sandbox architecture, as raised in NVIDIA/OpenShell#899 and internal review.

1.1 Current Architecture

Today, the supervisor and agent workload run in a single Kubernetes pod. The supervisor (PID 1) is privileged (CAP_SYS_ADMIN, CAP_NET_ADMIN, CAP_SYS_PTRACE, CAP_SYSLOG, runAsUser: 0). It creates a network namespace, spawns the agent process inside it, and applies Landlock filesystem restrictions and seccomp filters. All Linux namespaces except the network namespace are shared between supervisor and agent.
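As an illustrative sketch, the current supervisor container's security context looks roughly like this (reconstructed from the capability list above; the exact manifest generated by the driver may differ):

```yaml
# Current InPod supervisor security context (illustrative sketch)
securityContext:
  runAsUser: 0          # root
  capabilities:
    add:
      - SYS_ADMIN       # network namespace setup
      - NET_ADMIN       # in-pod iptables rules
      - SYS_PTRACE      # read /proc/<pid>/exe for binary identity
      - SYSLOG          # dmesg-based bypass detection
```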

1.2 Security Concerns

Privilege escalation via shared namespaces. The supervisor runs as root with elevated capabilities in the same pod as the agent. While the agent process itself runs unprivileged, a rogue SUID binary in the container image could allow the agent to escalate to root privileges within the same set of namespaces. The supervisor's elevated capabilities then become reachable.

Implicit trust in the container image. The current model requires trusting the agent container image — at least partially. Landlock restricts filesystem access, but the cluster operator must trust that agent builders do not introduce SUID binaries, world-writable sensitive paths, or other misconfigurations. This is a somewhat shaky trust boundary: the platform claims to sandbox untrusted code, yet relies on the image being well-constructed.

Weak workload isolation. The agent workload runs directly on the host kernel with only seccomp and Landlock as confinement layers. A kernel exploit from the agent process has direct impact on the host. Stronger isolation — such as that provided by gVisor (syscall interception via a userspace kernel) or Kata Containers (lightweight VM boundary) — would significantly reduce the blast radius of a container escape.

Enterprise deployment blockers (Issue #899). OpenShift and other managed Kubernetes platforms enforce restricted-v2 SecurityContextConstraints that drop all capabilities and enforce runAsNonRoot. The current architecture cannot deploy on these platforms without a security exception that most enterprise platform teams are reluctant to grant.
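For reference, a container acceptable under restricted-v2-style policies must declare roughly the following security context (a sketch of the constraints, not an OpenShift-specific manifest):

```yaml
# Sketch: securityContext compatible with restricted-v2-style policies
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL             # no capabilities, the opposite of the current supervisor
  seccompProfile:
    type: RuntimeDefault
```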

Log stream conflation. The supervisor and agent share stdout. OCSF security events, supervisor operational logs, and agent application logs are interleaved in a single stream. A malicious agent could emit lines that mimic OCSF events, poisoning the security audit trail. Splitting into separate pods makes this a non-issue — each pod has its own log stream.


2. Proposed Architecture

2.1 Overview

Split the current single-pod design into two distinct workloads:

┌─────────────────────────────────────────────────────────────┐
│                    Kubernetes Cluster                       │
│                                                             │
│  ┌──────────────────────┐     ┌───────────────────────────┐ │
│  │   Supervisor Pod     │     │     Agent Pod (gVisor)    │ │
│  │                      │     │                           │ │
│  │  ┌─────────────────┐ │     │  ┌─────────────────────┐  │ │
│  │  │ CONNECT Proxy   │◄├─────┤──│ Agent Process       │  │ │
│  │  │ (L7 inspection) │ │     │  │ (untrusted code)    │  │ │
│  │  ├─────────────────┤ │     │  └─────────────────────┘  │ │
│  │  │ OPA Engine      │ │     │  ┌─────────────────────┐  │ │
│  │  ├─────────────────┤ │     │  │ Sideloaded Binary   │  │ │
│  │  │ Credential Vault│ │     │  │ (Landlock, SSH)     │  │ │
│  │  ├─────────────────┤ │     │  └─────────────────────┘  │ │
│  │  │ Inference Router│ │     │                           │ │
│  │  ├─────────────────┤ │     │  RuntimeClass: gvisor     │ │
│  │  │ OCSF Logger     │ │     │  No capabilities          │ │
│  │  └─────────────────┘ │     │  Non-root                 │ │
│  │                      │     │  Untrusted image          │ │
│  │  Trusted image       │     └───────────────────────────┘ │
│  │  Minimal capabilities│                                   │
│  └──────────────────────┘                                   │
│                                                             │
│  ┌─────────────────────────────────────────────────────────┐│
│  │              NetworkPolicy                              ││
│  │  Agent Pod → Supervisor Pod:3128  (ALLOW)               ││
│  │  Agent Pod → *                    (DENY)                ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘

Supervisor Pod: Runs the trusted proxy, OPA engine, credential resolver, inference router, and OCSF logger. This is an OpenShell-controlled image.
It is meant to run as the peer to a single agent pod (1:1 sidecar); however, we envision that it may also run as a single shared instance (with replicas) serving multiple agent pods.

Agent Pod: Runs the untrusted agent workload. The container image is fully untrusted. The pod runs with no capabilities, as non-root, under a gVisor RuntimeClass (when available), or sandboxed inside a Kata container. It can only reach the supervisor proxy via Kubernetes NetworkPolicy.

2.2 Agent Pod

The agent pod is the untrusted execution environment. Design principles:

  • No elevated capabilities. The pod spec requests zero Linux capabilities. Compatible with restricted-v2 SCC on OpenShift.
  • Non-root. Runs as the image's default non-root user or a UID specified by the platform.
  • gVisor RuntimeClass. When available, the pod runs under runsc (gVisor's OCI runtime). This interposes a userspace kernel between the agent and the host kernel, providing defense-in-depth against kernel exploits. When gVisor is not available, the pod runs on the standard runtime with reduced isolation (documented trade-off).
  • Untrusted image. The image is provided by the agent deployer. No assumption is made about its contents. No SUID binaries, no OpenShell components baked in, no trust required.
  • Proxy configuration. The agent pod is configured with HTTP_PROXY / HTTPS_PROXY environment variables pointing to the supervisor pod's proxy endpoint (e.g., http://<supervisor-svc>:3128). Cooperative clients honor these variables; non-cooperative traffic is blocked by NetworkPolicy.
  • Network isolation. A Kubernetes NetworkPolicy restricts all egress from the agent pod to only the supervisor pod on port 3128. All other egress is denied at the CNI level. This is the hard enforcement boundary.
  • Supervisor sideloading via image volumes. The platform injects an OpenShell supervisor binary into the agent pod using the Kubernetes image volumes feature. This binary is sourced from a trusted OpenShell image, runs as the pod entrypoint, and can set up Landlock filesystem restrictions and an SSH server before exec-ing the agent process. The sideloaded binary does not require root or elevated capabilities — landlock_restrict_self() is an unprivileged operation. Landlock, SSH, and the sideloaded binary itself are independently configurable and can each be enabled or disabled per sandbox policy.
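Taken together, the principles above yield an agent pod spec along these lines (the Service name and image references are illustrative; runtimeClassName is omitted on clusters without gVisor):

```yaml
# Sketch of an agent pod spec implementing the design principles above
apiVersion: v1
kind: Pod
metadata:
  labels:
    openshell.ai/role: agent            # matched by the NetworkPolicy in §2.4
spec:
  runtimeClassName: gvisor              # optional; requires runsc on nodes
  containers:
    - name: agent
      image: registry.example.com/customer/agent:latest   # fully untrusted
      securityContext:
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]                 # zero capabilities
      env:
        - name: HTTP_PROXY              # cooperative clients honor these;
          value: http://supervisor-svc:3128
        - name: HTTPS_PROXY             # non-cooperative traffic is blocked
          value: http://supervisor-svc:3128   # by NetworkPolicy
```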

2.3 Supervisor Pod

The supervisor pod is the trusted control plane for one or more agent pods. A 1:1 relationship initially appears simpler, since it removes the need for the agent to authenticate; in environments with many agents, however, a 1:N relationship appears preferable. It runs an OpenShell-managed image containing:

  • HTTP CONNECT Proxy: Receives all agent egress traffic. Terminates TLS (MITM) for L7 inspection. Tunnels allowed traffic to upstream destinations.
  • OPA Policy Engine: Evaluates per-request network policy (host, port, protocol rules). Hot-reloads policy from the gateway.
  • Credential Resolver: Replaces openshell:resolve:env:* placeholders in HTTP requests with real secrets fetched from the gateway. Secrets never reach the agent pod.
  • Inference Router: Routes inference.local requests to configured LLM backends. Enforces inference-specific policy (model allowlists, rate limits).
  • OCSF Logger: Emits structured security events (network decisions, policy violations, credential usage) to a dedicated log stream.

Deployment topologies:

  • 1:1 (sidecar-like): One supervisor pod per agent pod. Simplest model. Supervisor lifecycle tied to agent lifecycle. Higher resource overhead.
  • 1:N (shared supervisor): One supervisor deployment (with replicas for HA) serving multiple agent pods. Lower resource overhead. Requires routing agent identity through the proxy (e.g., via client certificate or token header). Enables centralized policy management and credential caching.

The choice of topology is a deployment configuration, not an architectural constraint. The proxy protocol is the same in both cases.

2.4 Network Policy Enforcement

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress-to-supervisor
spec:
  podSelector:
    matchLabels:
      openshell.ai/role: agent
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              openshell.ai/role: supervisor
      ports:
        - protocol: TCP
          port: 3128
    - to:  # DNS resolution
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53

This NetworkPolicy is the hard enforcement boundary. Even if the agent ignores HTTP_PROXY, it cannot reach any destination other than the supervisor proxy and cluster DNS. The supervisor then applies L7 policy on top.

Important: NetworkPolicy enforcement depends on the CNI plugin. The Kubernetes driver MUST verify that the cluster's CNI supports NetworkPolicy (e.g., Calico, Cilium, Antrea) and warn or fail if it does not.

2.5 Supervisor CA Trust

For TLS interception (MITM), the supervisor generates an ephemeral CA certificate. In the current architecture, this CA is injected into the agent's trust store at process spawn time. In the split-pod model:

  • Option A (init container): An init container in the agent pod fetches the supervisor's CA cert and writes it to a shared volume. The agent container mounts this volume and trusts the CA.
  • Option B (configmap/secret): The supervisor publishes its CA cert to a Kubernetes Secret. The agent pod mounts it.
  • Option C (cooperative agent image): The agent image is expected to trust the supervisor CA via REQUESTS_CA_BUNDLE or SSL_CERT_FILE environment variables pointing to a mounted cert. No image modification needed.

Option C is preferred, as it is purely configuration-driven and requires no modification of the agent image.
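Option C can be expressed entirely in the agent pod spec; a sketch (volume, Secret, and path names are hypothetical):

```yaml
# Agent container fragment: trust the supervisor CA via environment only
env:
  - name: SSL_CERT_FILE              # honored by OpenSSL-based clients
    value: /etc/openshell/ca/ca.crt
  - name: REQUESTS_CA_BUNDLE         # honored by Python requests
    value: /etc/openshell/ca/ca.crt
volumeMounts:
  - name: supervisor-ca
    mountPath: /etc/openshell/ca
    readOnly: true
---
# Pod-level volume backed by the Secret the supervisor publishes
volumes:
  - name: supervisor-ca
    secret:
      secretName: supervisor-ca      # hypothetical Secret name
```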


3. What Changes, What Stays, What Is Lost

3.1 Preserved Features

  • L7 HTTP inspection: Proxy inspects CONNECT tunnels identically. OPA rules unchanged.
  • Credential injection: Proxy resolves placeholders in HTTP headers/body. Agent never sees real secrets.
  • Inference routing: inference.local routed by proxy to configured backends.
  • OCSF security logging: Supervisor emits all network decision events. Clean separation from agent stdout.
  • Seccomp BPF: Applied per-pod by Kubernetes. No capabilities required (no_new_privs). Can be specified as a seccompProfile in the pod's securityContext.
  • OPA policy evaluation: Entirely decoupled from network namespace. Evaluates abstract JSON input.
  • NetworkPolicy L3/L4 enforcement: Hard egress restriction at CNI level. Replaces in-pod netns isolation.

3.2 Changed Features

  • Network isolation: network namespace (kernel) → Kubernetes NetworkPolicy (CNI). Impact: non-cooperative processes blocked by NetworkPolicy instead of netns; L3/L4 only for non-proxy traffic.
  • Process identity binding: /proc/<pid>/exe via SYS_PTRACE → lost in cross-pod model (see §3.3). Impact: per-binary policy evaluation degraded.
  • Bypass detection: iptables LOG + dmesg monitor → NetworkPolicy deny logging (CNI-dependent). Impact: CNI must support deny logging (Cilium Hubble, Calico logs).
  • Supervisor sideloading: hostPath volume from node → image volume from trusted OpenShell image. Impact: eliminates hostPath dependency; cleaner separation; platform-controlled.
  • Landlock filesystem sandbox: applied by supervisor in child process → applied by sideloaded binary in agent pod (see §4). Impact: optional; works on standard runtimes (unprivileged); unavailable under gVisor.
  • SSH access: embedded SSH server in supervisor → sideloaded binary runs SSH server in agent pod (see §5). Impact: optional; deployers can opt out; kubectl exec available as alternative.

3.3 Lost Features

Per-binary destination restrictions. In the current model, the proxy resolves which binary opened a network connection by reading /proc/<pid>/exe from the supervisor (which shares a PID namespace with the agent). In the split-pod model, the supervisor and agent are in different pods with different PID namespaces. The supervisor proxy sees TCP connections from the agent pod's IP but cannot determine which binary initiated the connection.

Possible mitigations:

  • Cooperative reporting: The agent-side proxy client (if present) includes binary identity in a custom HTTP header (e.g., X-OpenShell-Binary: /usr/bin/curl). The supervisor can use this for advisory policy evaluation but MUST NOT trust it for enforcement — the agent is untrusted and can forge headers.
  • Shared PID namespace (limited): Kubernetes shareProcessNamespace: true shares PID namespace within a pod, not across pods. This does not help in the split model.
  • eBPF-based identity (future): A node-level eBPF program could tag network connections with the originating binary's path. This is out of scope for this RFC but noted as a future direction.
  • Accept the trade-off: Per-binary restrictions are an in-pod defense-in-depth feature. With the agent image treated as fully untrusted anyway, the value of per-binary restrictions is reduced — a malicious image can trivially rename binaries or use LD_PRELOAD to bypass binary identity checks. NetworkPolicy + L7 proxy inspection provides the enforceable boundary.

Recommendation: Accept the loss of per-binary enforcement. Document it as a known trade-off. The proxy still enforces per-host/port policy and L7 rules for cooperative traffic. NetworkPolicy enforces L3/L4 for all traffic.


4. Landlock in the Split Model

4.1 Landlock Without Root (Unprivileged Containers)

Landlock is explicitly designed for unprivileged use. The landlock_restrict_self() syscall works under no_new_privs and does not require any Linux capability. However, the current OpenShell implementation uses a two-phase approach:

  • Phase 1 (prepare): Opens PathFd handles to all allowed paths. Currently runs as root to access paths the agent user cannot read (e.g., /usr/sbin).
  • Phase 2 (enforce): Calls restrict_self(). Does not require root.

In an unprivileged container, Phase 1 can only open paths readable by the container's UID. Paths outside the UID's access will fail. The existing best_effort mode handles this gracefully — inaccessible paths are skipped with a warning, and Landlock is applied for the paths that could be opened.

Verdict: Landlock works in unprivileged containers with degraded coverage (best-effort mode). With the image volume sideloading approach (§2.2), the platform injects the OpenShell binary into the agent pod as the entrypoint. This binary sets up Landlock before exec-ing the agent process — the same flow as the current single-pod model, minus root. No action required from the agent deployer.

4.2 Landlock Under gVisor

gVisor implements its own syscall table in a userspace kernel (the "sentry"). As of the current release, gVisor does not implement the Landlock syscalls (landlock_create_ruleset, landlock_add_rule, landlock_restrict_self); these syscalls return ENOSYS inside a gVisor-sandboxed container.

This means Landlock cannot be used inside gVisor pods. However, this is not necessarily a problem:

  • gVisor provides its own filesystem isolation. The gVisor sentry interposes on all filesystem operations. The runsc OCI runtime supports restricting filesystem access via OCI spec mounts (read-only, masked paths, etc.).
  • The agent image is untrusted anyway. In this architecture, we do not trust the agent image to correctly apply Landlock. Filesystem restrictions for the agent pod should come from the pod spec (read-only root filesystem, volume mounts) and gVisor's own isolation, not from in-process Landlock.

4.3 Recommendation

Recommended filesystem restriction mechanism per runtime:

  • Standard (runc): Landlock (best-effort, unprivileged) + seccomp + read-only mounts
  • gVisor (runsc): gVisor VFS isolation + read-only mounts + seccomp (gVisor-compatible profile)

Landlock is an optional, platform-managed feature. On non-gVisor runtimes, the platform injects the OpenShell sideloaded binary via Kubernetes image volumes. This binary applies Landlock rules at startup (best-effort mode) before exec-ing the agent process. The agent deployer does not need to include any OpenShell components in their image — the platform handles injection transparently. The deployer can disable Landlock via sandbox policy if it interferes with their workload.

For gVisor pods, Landlock is unavailable (gVisor does not implement the Landlock syscalls). Filesystem isolation is configured via the pod spec and gVisor's runtime configuration instead. This is controlled by the platform, not the agent image.
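On clusters where runsc is installed on the nodes, the gVisor RuntimeClass referenced above is defined once, cluster-wide; a minimal sketch:

```yaml
# Cluster-wide RuntimeClass mapping the "gvisor" name to the runsc handler
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc    # runsc must be configured in containerd/CRI-O on each node
```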


5. SSH Access

5.1 Current Mechanism

The supervisor embeds an SSH server (russh) that listens on port 2222. Users authenticate via the NSSH1 handshake protocol (time-bounded shared secret). The SSH session spawns a shell in the agent's network namespace with dropped privileges.

5.2 Sideloaded SSH in Split Model

With the image volume sideloading approach (§2.2), the SSH server is part of the sideloaded OpenShell binary injected into the agent pod by the platform. The binary runs as the pod entrypoint, starts the SSH server, and then exec-s the agent process. This preserves the current NSSH1 authentication flow and user experience without requiring the agent deployer to include any OpenShell components in their image.

Key properties:

  • Platform-controlled. The SSH binary comes from a trusted OpenShell image volume. The agent deployer's image is still fully untrusted.
  • No capabilities required. The SSH server binds to port 2222 (unprivileged). No root or elevated capabilities needed.
  • Optional / opt-out. Agent deployers can disable SSH via sandbox policy if they don't need interactive access or want to avoid Falco alerts for SSH activity inside the cluster.
  • Network path. Inbound SSH connections require an additional NetworkPolicy rule and a Service exposing port 2222 on the agent pod. These are only created when SSH is enabled.
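When SSH is enabled, the driver would additionally create objects along these lines (the Service name is illustrative, and the ingress rule shown is deliberately broad — the actual source restriction depends on where SSH clients connect from):

```yaml
# Service exposing the agent pod's unprivileged SSH port
apiVersion: v1
kind: Service
metadata:
  name: agent-ssh              # hypothetical name
spec:
  selector:
    openshell.ai/role: agent
  ports:
    - protocol: TCP
      port: 2222
---
# Additional NetworkPolicy rule admitting inbound SSH
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-ssh-ingress
spec:
  podSelector:
    matchLabels:
      openshell.ai/role: agent
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 2222
```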

5.3 Alternatives

kubectl exec (fallback). When SSH is disabled, users access the agent via kubectl exec or the Kubernetes API. This is the standard Kubernetes-native approach — no image modification, no additional network exposure. However, it requires Kubernetes RBAC and lacks the NSSH1 authentication protocol.

5.4 Recommendation

Enable SSH via the sideloaded binary by default. Deployers can opt out via sandbox policy (ssh: disabled). When SSH is disabled, kubectl exec is the fallback. Document that SSH in the cluster may trigger Falco alerts and provide guidance on configuring exceptions.


6. Observability Benefits

Splitting supervisor and agent into separate pods provides clear observability wins:

6.1 Log Stream Separation

  • Supervisor stdout (supervisor pod): OCSF security events, proxy decisions, credential usage
  • Supervisor stderr (supervisor pod): operational errors, proxy diagnostics
  • Agent stdout (agent pod): agent application logs
  • Agent stderr (agent pod): agent application errors

A log shipper (Fluentd, Fluent Bit, Vector) can collect each stream independently. OCSF events cannot be spoofed by the agent — they originate from a different pod with a different log source identifier. This eliminates the log poisoning attack vector where a malicious agent emits fake OCSF events to its stdout.

6.2 gVisor + Falco Integration

gVisor exposes a runtime monitoring interface that Falco can consume. With the agent running under gVisor, Falco can inspect:

  • Syscall patterns (anomalous file access, network attempts, process spawning)
  • File integrity monitoring inside the gVisor sandbox
  • Unexpected binary execution

This provides the same class of runtime security monitoring that Falco provides for standard containers, but with the added isolation boundary of gVisor's userspace kernel.

6.3 Resource Attribution

Separate pods enable accurate resource metering per agent workload:

  • CPU/memory usage attributed to the correct pod
  • Network traffic metered per pod by CNI
  • Kubernetes HPA can scale agent and supervisor independently

7. Agent Identity in Shared Supervisor Topology

In the 1:N topology (one supervisor serving multiple agents), the supervisor must identify which agent is making each proxy request.

7.1 Identification Mechanisms

Source IP mapping. The supervisor maintains a mapping of agent pod IPs to agent identities. When a CONNECT request arrives from 10.0.1.42, the supervisor looks up which agent pod owns that IP. This mapping is maintained via the Kubernetes API (watch pod events) or via the gateway's sandbox registry.

Client certificate (mTLS). Each agent pod is provisioned with a unique client certificate (via Kubernetes Secrets or cert-manager). The agent's HTTP_PROXY configuration includes the client cert. The supervisor validates the cert and extracts the agent identity from the CN/SAN.

Token header. A per-agent bearer token is injected as an environment variable. The agent's HTTP client includes it in a Proxy-Authorization header. Simpler than mTLS but less robust (token can be exfiltrated by the agent and replayed from a different context).
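The token-header mechanism needs no custom client support: many HTTP clients (curl, Python requests) translate user-info in the proxy URL into a Proxy-Authorization header automatically. A sketch of the agent pod environment (Secret name hypothetical; $(VAR) expansion refers to the previously defined variable):

```yaml
# Per-agent token injected as proxy credentials
env:
  - name: OPENSHELL_AGENT_TOKEN
    valueFrom:
      secretKeyRef:
        name: agent-proxy-token      # hypothetical per-agent Secret
        key: token
  - name: HTTP_PROXY
    # clients send the user-info as a Proxy-Authorization header
    value: http://agent:$(OPENSHELL_AGENT_TOKEN)@supervisor-svc:3128
```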

7.2 Recommendation

Use the token header as the primary identification mechanism, as Proxy-Authorization is part of the HTTP standard and is supported by stock HTTP clients without extra tooling.


8. Deployment and Migration

8.1 Backward Compatibility

This RFC does NOT remove the existing single-pod (InPod) deployment mode. The current architecture remains the default for:

  • Docker Desktop / local development
  • K3s clusters managed by openshell bootstrap
  • Environments where gVisor is not available and elevated capabilities are acceptable

The split-pod model is a new NetworkMode variant (e.g., Platform or SplitPod) selected via configuration.

8.2 Configuration

# sandbox policy (gateway-side)
sandbox:
  network_mode: platform   # "proxy" (current) | "platform" (split-pod)
  runtime_class: gvisor     # optional, defaults to cluster default
  supervisor:
    topology: shared        # "dedicated" (1:1) | "shared" (1:N)
    replicas: 2
  agent:
    sideload: enabled       # "enabled" | "disabled" — inject OpenShell binary via image volume
    ssh: enabled            # "enabled" | "disabled" — requires sideload: enabled
    landlock: best_effort   # "disabled" | "best_effort" — requires sideload: enabled, non-gVisor only

The sideload setting controls whether the platform injects the OpenShell binary into the agent pod via Kubernetes image volumes. When enabled, the binary acts as the pod entrypoint and can manage Landlock and SSH. When disabled, the agent pod runs the deployer's image entrypoint directly — Landlock and SSH are unavailable, and the pod relies solely on NetworkPolicy, gVisor, and pod spec restrictions for isolation.

ssh and landlock are independently togglable but both require sideload: enabled. Setting sideload: disabled implicitly disables both.

8.3 Kubernetes Driver Changes

The Kubernetes driver needs to:

  1. Generate two pod specs (or a pod + deployment) instead of one when network_mode: platform.
  2. Emit NetworkPolicy restricting agent egress to supervisor.
  3. Omit elevated capabilities from agent pod spec.
  4. Manage supervisor lifecycle — create/update/delete supervisor deployment, or create per-agent supervisor pods.
  5. Inject proxy configuration into agent pod environment (HTTP_PROXY, HTTPS_PROXY, NO_PROXY, SSL_CERT_FILE).
  6. Publish supervisor CA cert via Secret or ConfigMap.
  7. Verify CNI NetworkPolicy support and warn if unavailable.
  8. Inject sideloaded binary via image volume when sideload: enabled. Override the agent container's command to the sideloaded binary path. Configure the binary with the appropriate Landlock and SSH settings.
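Steps 3, 5, and 8 together shape the agent container spec roughly as follows (paths and the sideload image reference are illustrative; the image volume source requires a Kubernetes version with the feature enabled):

```yaml
# Sketch: agent container with sideloaded OpenShell binary as entrypoint
containers:
  - name: agent
    image: registry.example.com/customer/agent:latest
    command: ["/openshell/bin/sideload"]   # overrides image entrypoint (step 8)
    securityContext:
      capabilities:
        drop: ["ALL"]                      # no elevated capabilities (step 3)
    env:                                   # proxy configuration (step 5)
      - name: HTTP_PROXY
        value: http://supervisor-svc:3128
    volumeMounts:
      - name: openshell-sideload
        mountPath: /openshell/bin
        readOnly: true
volumes:
  - name: openshell-sideload
    image:                                 # Kubernetes image volume (step 8)
      reference: ghcr.io/openshell/sideload:latest   # hypothetical trusted image
      pullPolicy: IfNotPresent
```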

9. Security Analysis

9.1 Threat Model Comparison

  • Kernel exploit from agent. Current: direct host kernel access; seccomp + Landlock mitigate. Proposed: gVisor interposes userspace kernel; host kernel attack surface drastically reduced.
  • SUID privilege escalation. Current: agent shares namespaces with root supervisor; escalation possible. Proposed: agent pod has no capabilities, no root; SUID binaries ineffective (no_new_privs, gVisor).
  • Container escape. Current: standard runc isolation. Proposed: gVisor provides additional isolation layer.
  • Image supply chain attack. Current: image partially trusted. Proposed: image fully untrusted; OpenShell binary injected via platform-controlled image volume, not baked into agent image.
  • Network policy bypass (non-cooperative). Current: netns + iptables block all non-proxy traffic. Proposed: NetworkPolicy blocks at CNI level; equivalent enforcement, different layer.
  • Log poisoning. Current: agent stdout interleaved with OCSF events; spoofable. Proposed: separate pods, separate log streams; not spoofable.
  • Credential exfiltration. Current: secrets resolved in proxy, never in agent env. Proposed: identical; proxy resolves secrets, agent sees placeholders only.
  • Per-binary policy bypass. Current: binary identity via /proc; spoofable via symlinks but detectable. Proposed: lost; agent can use any binary; mitigated by whole-pod network restriction.

9.2 Trust Boundaries

┌─────────────────────────────────────────┐
│            Trust Boundary 1              │
│         (Platform / Cluster)             │
│                                          │
│  ┌────────────────────┐                  │
│  │  Trust Boundary 2  │                  │
│  │   (Supervisor)     │                  │
│  │   - Proxy          │  NetworkPolicy   │
│  │   - OPA            │◄────────────────►│
│  │   - Credentials    │                  │
│  └────────────────────┘                  │
│                                          │
│  ┌────────────────────┐                  │
│  │  Trust Boundary 3  │                  │
│  │   (Agent / gVisor) │                  │
│  │   - UNTRUSTED      │                  │
│  │   - No capabilities│                  │
│  │   - No secrets     │                  │
│  └────────────────────┘                  │
└─────────────────────────────────────────┘

9.3 Key Security Property

No mistake by the agent deployer can compromise the overall system. The agent image is untrusted. The agent pod has no capabilities. Network access is restricted by platform-enforced NetworkPolicy. Secrets are resolved in the supervisor, never exposed to the agent. gVisor provides kernel-level isolation. The only cooperation expected from the agent is honoring HTTP_PROXY — and non-cooperation is handled by NetworkPolicy denial, not by trusting the agent to behave.


10. Open Questions

  1. NetworkPolicy as sole enforcement. NetworkPolicy depends on the CNI plugin. Not all CNIs implement it (e.g., Flannel without Calico). Should the driver refuse to create agent pods if NetworkPolicy enforcement cannot be verified?

  2. Supervisor scaling in shared topology. How does the supervisor handle agent pod churn? Does it watch Kubernetes pod events, or does the gateway push agent registrations?

  3. gVisor availability. gVisor requires the runsc runtime to be installed on nodes and a RuntimeClass to be defined. How do we handle clusters where gVisor is not available? Fall back to standard runtime with documented reduced isolation?

  4. Init container for workspace seeding. The current workspace init container runs as root. In the split model, the agent pod is non-root. How is the workspace PVC initialized? Options: (a) the agent image includes workspace content at build time, (b) an init container runs as the agent's non-root UID, (c) the supervisor provisions the PVC via a separate job.

  5. DNS policy. The NetworkPolicy allows DNS (port 53) egress. A malicious agent could use DNS tunneling for data exfiltration. Should DNS be restricted to supervisor-mediated resolution only?

  6. Latency impact. In the current model, the proxy is on the loopback or a local veth. In the split model, proxy traffic crosses the pod network (overlay). What is the latency impact for high-throughput agent workloads?


11. Summary

This RFC proposes splitting the OpenShell sandbox into a trusted supervisor pod and an untrusted agent pod, connected via Kubernetes NetworkPolicy and an HTTP CONNECT proxy. The agent pod can optionally run under gVisor for strong kernel-level isolation.

The key trade-offs:

Gained:

  • gVisor kernel isolation
  • Fully untrusted agent images
  • OpenShift / restricted SCC compatibility
  • Clean log stream separation
  • Falco runtime monitoring via gVisor
  • Independent resource scaling
  • No elevated capabilities for agent
  • Landlock preserved via sideloaded binary (non-gVisor, optional)
  • SSH preserved via sideloaded binary (optional, opt-out)

Trade-offs:

  • Per-binary destination restrictions lost
  • In-pod network namespace enforcement replaced by NetworkPolicy
  • iptables-based bypass detection replaced by CNI-level deny logging
  • Single-pod simplicity given up
  • Landlock unavailable under gVisor (replaced by gVisor VFS + pod spec)

The existing InPod mode is preserved as the default for environments where gVisor is unavailable and elevated capabilities are acceptable. The split-pod model is an opt-in deployment mode for enterprise and security-sensitive environments.

Alternatives Considered

We compared the proposal against the current single-pod design; the threat model comparison in §9.1 summarizes the differences.

Agent Investigation

We used an agent to investigate the codebase and prepare the design. We also performed substantial "human" review, which makes us believe the proposal is plausible and implementable.

We are ready and willing to implement this proposal.

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
