Improve UX when spawning the agent pod fails

Part of the OSS mirrord flow is creating a k8s Job running a Pod with the mirrord-agent container ([here](https://github.com/metalbear-co/mirrord/blob/81e715586e3ff04307bf4ce50ea4d31e6a674a4e/mirrord/kube/src/api/container/job.rs#L31)).

Starting that pod can fail for multiple reasons. Our current logic there is minimal: we only wait until `status.phase == "Running"`. When the agent pod cannot be spawned, this usually results in a generic timeout error (the timeout is enforced somewhere up the call stack).

We should:
1. Fail early if the agent pod moves to the `Failed` phase. This can happen due to cluster conditions. We should extract `status.reason` and `status.message` and include them in the error message presented to the user.
2. Fail early if the agent pod moves to the `Succeeded` phase. This should never happen, and should be reported to the user as a bug.
3. Fail early if the agent pod is deleted while in the `Pending` phase. This usually means that the user does not have sufficient permissions to spawn the agent pod in the cluster. The error message presented to the user should mention [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) as a probable cause of the failure, and suggest trying out mirrord for Teams (similar to [this](https://github.com/metalbear-co/mirrord/blob/81e715586e3ff04307bf4ce50ea4d31e6a674a4e/mirrord/cli/src/error.rs#L317)).
4. Every 10s that the agent pod remains stuck in the `Pending` phase, we should issue a `Progress::warning`. The warning should state that the agent pod startup is taking longer than expected, and include the `status.containerStatuses.[].state` of the agent container. See [container states](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-states) for reference.
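
The four rules above could be sketched as a single pure decision function, which would keep the logic testable without a cluster. This is only an illustration, not the existing mirrord code: `PodSnapshot`, `PodEvent`, `Decision`, `decide` and `WARN_INTERVAL` are hypothetical names, the snapshot is a simplified stand-in for the real Kubernetes pod status, and the handling of a pod deleted outside the `Pending` phase is an assumption that the list above does not cover.

```rust
use std::time::Duration;

/// Simplified stand-in for the agent pod's status (hypothetical type).
#[derive(Debug, Clone, Default)]
pub struct PodSnapshot {
    /// Mirrors `status.phase`.
    pub phase: Option<String>,
    /// Mirrors `status.reason`.
    pub reason: Option<String>,
    /// Mirrors `status.message`.
    pub message: Option<String>,
    /// Pre-rendered state of the agent container, e.g. "waiting: ImagePullBackOff".
    pub agent_container_state: Option<String>,
}

/// What the watcher observed (hypothetical type).
#[derive(Debug, Clone)]
pub enum PodEvent {
    Updated(PodSnapshot),
    Deleted(PodSnapshot),
}

/// What the caller should do next (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// The pod is `Running`; proceed as today.
    Ready,
    /// Keep watching; optionally surface a warning via `Progress::warning`.
    KeepWaiting { warning: Option<String> },
    /// Stop waiting and show this message to the user.
    Fail(String),
}

/// Interval between "startup is slow" warnings (rule 4).
pub const WARN_INTERVAL: Duration = Duration::from_secs(10);

/// Maps one observed event to a decision.
/// `since_last_warning` is the time elapsed since the wait started or since
/// the previous warning was issued, whichever is later.
pub fn decide(event: &PodEvent, since_last_warning: Duration) -> Decision {
    match event {
        // Rule 3: deleted while still pending.
        PodEvent::Deleted(pod) if pod.phase.as_deref() == Some("Pending") => Decision::Fail(
            "agent pod was deleted while Pending; this often means insufficient \
             permissions to spawn it, with Pod Security Admission as a probable \
             cause. Consider trying mirrord for Teams."
                .to_string(),
        ),
        // Assumption: any other deletion before readiness is also fatal.
        PodEvent::Deleted(_) => {
            Decision::Fail("agent pod was deleted before it became ready".to_string())
        }
        PodEvent::Updated(pod) => match pod.phase.as_deref() {
            Some("Running") => Decision::Ready,
            // Rule 1: include `status.reason` and `status.message`.
            Some("Failed") => Decision::Fail(format!(
                "agent pod failed: {} ({})",
                pod.reason.as_deref().unwrap_or("unknown reason"),
                pod.message.as_deref().unwrap_or("no message"),
            )),
            // Rule 2: should never happen, report as a bug.
            Some("Succeeded") => Decision::Fail(
                "agent pod exited before it was ready; this is a bug, please report it"
                    .to_string(),
            ),
            // Rule 4: `Pending` (or a missing/unknown phase), warn periodically.
            _ => {
                let warning = (since_last_warning >= WARN_INTERVAL).then(|| {
                    format!(
                        "agent pod startup is taking longer than expected \
                         (agent container state: {})",
                        pod.agent_container_state.as_deref().unwrap_or("unknown"),
                    )
                });
                Decision::KeepWaiting { warning }
            }
        },
    }
}
```

The caller would feed each watch event into `decide`, reset its warning timer whenever a warning is returned, and leave the existing outer timeout in place as a last resort.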

Improve UX when spawning the agent pod fails #3578
