docs: update GPU operator guide with Civo-tuned flags and H100 NVLink…#211
Merged
## Summary
Updates the GPU Clusters doc (`kubernetes/advanced/gpu-config.md`) so customers get a reliable install on the first try on Civo's GPU image, and so the single-GPU H100 case, which currently fails without a workaround, has clear guidance.

## What changed

- Replaced the `helm install --generate-name` snippet with a `helm upgrade --install` command that passes `toolkit.enabled=false` (the NVIDIA container toolkit is already baked into Civo's GPU image), plus `driver.enabled=true`, `devicePlugin.enabled=true`, `gfd.enabled=true`, `operator.defaultRuntime=containerd`, and `validator.cuda.runtimeClassName=nvidia`. Added a table explaining what each flag does.
- Added an `nvidia-kernel-config` ConfigMap with `NVreg_NvLinkDisable=1` and the matching `driver.kernelModuleConfig.name` Helm flag, with a warning not to apply this on multi-H100 nodes.
- Added a tip to pin the chart with `--version 25.10.1`.
- The `CrashLoopBackOff` troubleshooting entry now points at the NVLink section; added guidance for pods stuck `Pending` on `nvidia.com/gpu`.

## Why
The previous install example produced a working but suboptimal setup on Civo clusters (the Operator layered its own container toolkit on top of the one Civo already provides), and single-GPU H100 nodes failed with no customer-facing explanation. These edits bring the doc in line with what our internal testing has validated.
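For reference, the flags described above can be combined into a single idempotent install command. This is a sketch rather than the doc's exact snippet: the release name `gpu-operator`, the `nvidia` chart repo alias, and the namespace are assumptions, not taken from the PR.

```shell
# Add NVIDIA's Helm repo (the usual source for the GPU Operator chart).
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Idempotent install/upgrade with the Civo-tuned flags.
# toolkit.enabled=false: the container toolkit is already in Civo's GPU image.
helm upgrade --install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --version 25.10.1 \
  --set toolkit.enabled=false \
  --set driver.enabled=true \
  --set devicePlugin.enabled=true \
  --set gfd.enabled=true \
  --set operator.defaultRuntime=containerd \
  --set validator.cuda.runtimeClassName=nvidia
```

Using `helm upgrade --install` instead of `helm install --generate-name` means reruns converge on the same release rather than creating a new one each time.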
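The single-H100 NVLink workaround might look like the following sketch. The ConfigMap key name `nvidia.conf` is an assumption about how the operator consumes kernel-module options, and the doc's warning applies: do not use this on multi-H100 nodes.

```shell
# Single-GPU H100 workaround only (do NOT apply on multi-H100 nodes):
# ship NVreg_NvLinkDisable=1 as a kernel-module option via a ConfigMap.
# The key name "nvidia.conf" is an assumption, not from the PR.
kubectl create configmap nvidia-kernel-config \
  --namespace gpu-operator \
  --from-literal=nvidia.conf='options nvidia NVreg_NvLinkDisable=1'

# Point the driver container at the ConfigMap, keeping all other values.
helm upgrade --install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --reuse-values \
  --set driver.kernelModuleConfig.name=nvidia-kernel-config
```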