NVIDIA GPUs on Kubernetes: How They Work Under the Hood
Author
Nikolay Penkov
Modern AI, ML, and compute workloads rely heavily on GPU acceleration. While NVIDIA makes GPU support in Docker relatively straightforward, configuring GPUs with containerd (the default runtime in Kubernetes and Minikube) requires a few careful steps.
This guide helps you understand how GPUs are handled under the hood in Kubernetes and walks you through installing NVIDIA drivers, configuring the NVIDIA Container Toolkit, and integrating it with containerd.
The big picture
The Kubernetes device plugin advertises GPUs as extended resources (like nvidia.com/gpu, or nvidia.com/mig-1g.5gb for MIG). The kubelet updates the node's capacity, and the scheduler places pods on nodes that satisfy their requests. At container start, the kubelet asks the device plugin to Allocate the devices, then hands the container runtime a ready-to-run spec.
On the node, you need the NVIDIA kernel driver and the NVIDIA Container Toolkit. The toolkit provides nvidia-container-runtime and the hooks/CLI (libnvidia-container) that inject device nodes, libraries, and env into the container.
Newer clusters commonly rely on CDI (the Container Device Interface), which lets the runtime add GPU access using standard device descriptions. With NVIDIA's CDI support, this reduces the need for a special runtime class.
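To make this concrete, the toolkit can generate a CDI spec for the node (for example with `sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml`), and CDI-aware runtimes apply the listed container edits whenever a device from the spec is requested. The excerpt below is a hand-written illustration of the spec's shape, not real output; the actual file depends on your driver and hardware.

```yaml
# Illustrative excerpt of a CDI spec (e.g. /etc/cdi/nvidia.yaml).
# Field values here are examples only.
cdiVersion: "0.6.0"
kind: nvidia.com/gpu
devices:
  - name: "0"                  # requested as device nvidia.com/gpu=0
    containerEdits:
      deviceNodes:
        - path: /dev/nvidia0
containerEdits:                # edits applied for any device from this spec
  deviceNodes:
    - path: /dev/nvidiactl
    - path: /dev/nvidia-uvm
```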
Many teams install everything via the NVIDIA GPU Operator, which automates drivers, the toolkit, the device plugin, GPU Feature Discovery (node labels), the DCGM exporter, and an optional MIG manager. You can check our blog post that explains in detail how to set up the GPU Operator on Minikube.

What gets exposed to pods?
Resource model seen by Kubernetes
Extended resources on the node:
- Whole GPUs: nvidia.com/gpu
- MIG slices (A100/H100, etc.): nvidia.com/mig-1g.5gb, nvidia.com/mig-2g.10gb, … (depends on how the GPU is partitioned).
You request/limit these the same way you request CPUs/memory. The scheduler only places the pod where capacity exists.
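For example, a minimal pod that requests one whole GPU might look like the sketch below. The image tag is only an example; pick a CUDA image compatible with your installed driver.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # example tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # scheduled only onto a node with a free GPU
```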
Inside the container
Character devices: /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, etc.
CUDA/NVML libraries mounted in, plus environment variables like NVIDIA_VISIBLE_DEVICES (older path) or CDI device references (newer path). These are wired in by the NVIDIA Container Toolkit (runtime/hook/CLI) or by CDI-aware runtimes.
Install NVIDIA GPU drivers
Before installing anything, ensure your GPU is recognized by Ubuntu. You can list the available driver versions with the following command:

```shell
sudo ubuntu-drivers list
```
You will see something like:
```
nvidia-driver-470
nvidia-driver-470-server
nvidia-driver-535
nvidia-driver-535-open
nvidia-driver-535-server
nvidia-driver-535-server-open
nvidia-driver-550
nvidia-driver-550-open
nvidia-driver-550-server
nvidia-driver-550-server-open
```
Pick a valid version for your system and install the appropriate NVIDIA driver. We'll let Ubuntu handle the selection automatically with the following command:

```shell
sudo ubuntu-drivers install
```
Verify GPU setup
After the installation finishes, reboot your system and check whether the NVIDIA driver is working correctly:
```shell
nvidia-smi
```
This command reports GPU usage statistics like temperature, memory, and power consumption, and can also be used to control GPU settings such as power limits and compute modes. If you see output like the following, the installation was successful:
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:26:00.0  On |                  N/A |
|  0%   44C    P8             13W /  170W |       9MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
Verify Kernel Modules and Driver Files
NVIDIA kernel modules are drivers that connect the Linux kernel to NVIDIA GPU hardware, enabling graphics rendering and GPU computation. The main module nvidia manages core GPU operations and communication with user-space tools. Supporting modules like nvidia_modeset and nvidia_drm handle display configuration and integrate with the Linux graphics stack, while nvidia_uvm provides unified memory access for CUDA workloads. Together, they ensure your system can fully utilize the GPU for both display and compute tasks.
Check loaded NVIDIA kernel modules
After installing the NVIDIA drivers, it’s important to verify that the kernel modules have been correctly loaded. These modules form the critical link between the Linux kernel and your GPU hardware — without them, the GPU won’t be accessible to container runtimes or CUDA applications.
To check that the modules are loaded, run:
```shell
lsmod | grep nvidia
```
If the kernel modules are loaded correctly, you should see something like:
```
nvidia_uvm           2179072  28
nvidia_drm            139264  0
nvidia_modeset       1814528  1 nvidia_drm
nvidia              14381056  36 nvidia_uvm,nvidia_modeset
```
Check driver version
Verifying the installed driver version ensures compatibility with your GPU hardware and the CUDA or container runtime you plan to use. To check the current NVIDIA driver version, run:
```shell
cat /proc/driver/nvidia/version
```
This command displays details about the loaded driver, including its version number and build information. Confirm that it matches the version you intended to install, since mismatched or outdated drivers can cause issues with GPU detection or containerized workloads. A properly configured system produces output like this:
```
NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  580.95.05  Release Build  (dvs-builder@U22-I3-B17-02-5)  Tue Sep 23 09:55:41 UTC 2025
GCC version:  gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
```
Install the Container Runtime CLI (crictl)
The Container Runtime CLI is a lightweight command-line tool for interacting directly with container runtimes such as containerd or CRI-O. It’s especially useful for debugging Kubernetes nodes, checking container statuses, and inspecting images when kubectl alone isn’t sufficient. You can think of it as the equivalent of docker for low-level container runtimes.
While docker interacts with the Docker Engine (which bundles its own runtime and tooling), crictl communicates directly with CRI-compatible runtimes like containerd or CRI-O — the same runtimes Kubernetes uses under the hood.
Let's install crictl v1.34.0 with:
```shell
VERSION="v1.34.0"
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz
sudo tar zxvf crictl-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
rm -f crictl-$VERSION-linux-amd64.tar.gz
```
Verify it with:
```shell
sudo crictl info | grep runtimeType
```
You’ll get output showing which container runtime crictl is connected to:
```
# On a system using containerd, you’ll typically see:
"runtimeType": "io.containerd.runc.v2"

# If you’re using a different runtime like CRI-O, it might show something like:
"runtimeType": "cri-o"
```
Install the NVIDIA Container Toolkit
The NVIDIA Container Toolkit provides container runtimes (hooks and libraries) that make GPUs visible inside containers.
Follow NVIDIA’s official instructions:
```shell
# Add NVIDIA repo and key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Update and install
sudo apt update
sudo apt install -y nvidia-container-toolkit
```
Configure containerd to Use the NVIDIA Runtime
By default, containerd starts containers with a generic runtime that doesn’t expose GPUs. NVIDIA GPU support requires a special runtime layer (from the NVIDIA Container Toolkit) that:
- Mounts GPU device nodes (e.g., /dev/nvidia0, /dev/nvidia-uvm) into containers.
- Injects NVIDIA user-space libraries (libcuda, libnvidia-ml, etc.) and driver-matched components.
- Applies the right OCI hooks and cgroup settings so CUDA and drivers work reliably and securely inside containers.
```shell
# Configure containerd to recognize the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=containerd

# Restart containerd
sudo systemctl restart containerd
sudo systemctl status containerd

# (Optional) Make NVIDIA the default runtime
sudo nvidia-ctk runtime configure --runtime=containerd --set-as-default
```
Lastly, we can check that the NVIDIA runtime was added by running the command below:
```shell
sudo cat /etc/containerd/config.toml | grep "containerd.runtimes.nvidia"
```
Expected output:
```
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi.options]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy.options]
```
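If you didn't make NVIDIA the default runtime, Kubernetes can select it per pod via a RuntimeClass whose handler name matches the `nvidia` runtime key in containerd's config. A minimal sketch (the pod image is just an example):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia       # must match the runtimes.nvidia entry in config.toml
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  runtimeClassName: nvidia   # run this pod with the NVIDIA runtime
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # example tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```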
References
* [NVIDIA Container Toolkit Docs](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
* [NVIDIA GPU Operator GitHub](https://github.com/NVIDIA/gpu-operator)
* [NVIDIA CUDA Docker Hub](https://hub.docker.com/r/nvidia/cuda/tags)
* [Enabling GPUs in the Container Runtime Ecosystem (NVIDIA Blog)](https://developer.nvidia.com/blog/gpu-containers-runtime/)
* [Kubernetes Device Plugin Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/)