K3s is a lightweight Kubernetes distribution designed for simplicity, fast deployments, and low resource overhead, which makes it well suited to resource-constrained environments and edge devices. Combined with the NVIDIA GPU Operator, it lets containerized workloads access the host's GPUs, so developers can run GPU-accelerated applications efficiently.

This guide provides a step-by-step tutorial on installing K3s and configuring it with NVIDIA GPU Operator on Ubuntu 22.04.

Prerequisites

  • An Ubuntu 22.04 server with an NVIDIA GPU
  • At least 8 GB of RAM and 20 GB of free disk space for GPU-accelerated workloads
  • Root or sudo privileges

Step 1: Verify GPU Availability

Ensuring that your NVIDIA GPU is detected and operational is a critical first step, since both the NVIDIA Container Toolkit and the GPU Operator depend on a working NVIDIA driver. Run the following command to check:

nvidia-smi

Output.

Mon Nov 25 04:45:16 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40-2Q                  On  |   00000000:06:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       1MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

If the command fails or no GPU is listed, make sure the GPU is properly seated and powered and that the NVIDIA driver is installed.
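
If nvidia-smi itself is not found, the driver is likely not installed. On Ubuntu, one common approach (a sketch; the appropriate driver package varies by GPU model) is to let the ubuntu-drivers tool select and install one, then reboot:

ubuntu-drivers autoinstall
reboot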

Step 2: Disable Swap

Kubernetes expects swap to be disabled so it can manage memory accurately. With swap enabled, the scheduler cannot reason reliably about available memory, which can lead to unpredictable behavior.

swapoff -a
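
The change takes effect immediately. Confirm that swap is off with the free command; the Swap row should read 0B.

free -h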

To disable swap permanently, edit the /etc/fstab file and comment out the swap entry.
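
For example, the following sed command comments out every fstab line that references a swap device. This is a convenience sketch; it writes a backup to /etc/fstab.bak, and you should review the file afterwards.

sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab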

Disabling swap ensures that Kubernetes nodes allocate workloads against real memory, which improves stability and avoids unpredictable eviction behavior.

Step 3: Install K3s

K3s packages Kubernetes as a single binary and strips out components that are unnecessary on edge devices and in development environments, making it faster and easier to deploy than a traditional Kubernetes distribution.

Run the following command to download and execute the K3s installation script.

curl -sfL https://get.k3s.io | sh -

This command installs containerd as the default container runtime and configures essential services.
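
The installer also honors environment variables for customization. For example, to pin the K3s release used later in this guide instead of installing the latest stable version:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.30.6+k3s1 sh -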

Check the status of the K3s service:

systemctl status k3s

Output.

● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2024-11-25 04:46:26 UTC; 33s ago
       Docs: https://k3s.io
    Process: 9098 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null (code=exited, status=0/SUCCESS)
    Process: 9100 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 9101 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 9102 (k3s-server)
      Tasks: 86
     Memory: 1.3G
        CPU: 21.304s
     CGroup: /system.slice/k3s.service
             ├─ 9102 "/usr/local/bin/k3s server"

Export the KUBECONFIG environment variable to access the K3s cluster.

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
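
This setting lasts only for the current shell session. To make it persistent, append the export to your shell profile:

echo 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml' >> ~/.bashrc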

List the nodes in your K3s cluster.

kubectl get nodes

Output.

NAME     STATUS   ROLES                  AGE   VERSION
ubuntu   Ready    control-plane,master   83s   v1.30.6+k3s1

Step 4: Install NVIDIA Container Toolkit

The NVIDIA Container Toolkit allows containerized applications to access GPU resources. It is a critical component for running GPU workloads in Kubernetes.

Add the NVIDIA repository.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Update the repository index.

apt-get update

Install the container toolkit.

apt-get install -y nvidia-container-toolkit

Check the installed version of the NVIDIA Container CLI:

nvidia-container-cli --version

Output.

cli-version: 1.17.2
lib-version: 1.17.2

Configure Docker for GPU support. This step assumes Docker is installed on the host and is used here only to verify the toolkit; K3s runs its own embedded containerd, which the GPU Operator's toolkit component configures in Step 6.

nvidia-ctk runtime configure --runtime=docker

Restart Docker to apply the configuration:

systemctl restart docker

Run a container with GPU access to verify the setup:

docker run --rm --gpus all nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04 nvidia-smi

Output.

==========
== CUDA ==
==========

CUDA Version 12.6.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Mon Nov 25 04:57:39 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40-2Q                  On  |   00000000:06:00.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       1MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Step 5: Install Helm

Helm simplifies Kubernetes deployments by managing application packaging and upgrades.

Install Helm using the script below.

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Verify the Helm installation.

helm version

Output.

version.BuildInfo{Version:"v3.16.3", GitCommit:"cfd07493f46efc9debd9cc1b02a0961186df7fdf", GitTreeState:"clean", GoVersion:"go1.22.7"}

Step 6: Deploy NVIDIA GPU Operator

The NVIDIA GPU Operator automates the management of NVIDIA GPU drivers, monitoring tools, and runtime in Kubernetes.

Add the NVIDIA Helm repository.

helm repo add nvidia https://nvidia.github.io/gpu-operator

Update the repository.

helm repo update

Install GPU operator.

helm install --wait --generate-name nvidia/gpu-operator
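
Since this server already has the NVIDIA driver and container toolkit installed (Steps 1 and 4), you can optionally tell the operator not to manage them itself. A variant using the chart's driver.enabled and toolkit.enabled values:

helm install --wait --generate-name nvidia/gpu-operator \
  --set driver.enabled=false \
  --set toolkit.enabled=false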

Verify the deployment.

kubectl get pods | grep nvidia

This command will list all NVIDIA-related pods in the cluster.

Note that it may take several minutes for every pod to reach the Running state; in the snapshot below, some pods are still initializing.

nvidia-container-toolkit-daemonset-cl2s5                          1/1     Running    0          2m36s
nvidia-cuda-validator-r6pl4                                       0/1     Evicted    0          2m26s
nvidia-dcgm-exporter-bt8jp                                        1/1     Running    0          2m35s
nvidia-device-plugin-daemonset-gtgwv                              1/1     Running    0          2m35s
nvidia-operator-validator-cbd9z                                   0/1     Init:2/4   0          2m36s

Ensure that the NVIDIA GPU Operator has configured the nodes correctly.

kubectl describe nodes | grep nvidia

You should see information about the GPU resources (such as the nvidia.com/gpu capacity) that the NVIDIA GPU Operator manages.
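
As a final check, you can schedule a pod that requests a GPU and runs nvidia-smi inside the cluster. Below is a minimal sketch; the pod name gpu-test is arbitrary, and the image is the same CUDA image used in Step 4. Save it as gpu-test.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1

Apply the manifest and read the pod's logs once it completes:

kubectl apply -f gpu-test.yaml
kubectl logs gpu-test

You should see the same nvidia-smi table as in Step 1, this time produced from inside a Kubernetes pod.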

Conclusion

You’ve successfully installed K3s with the NVIDIA GPU Operator on Ubuntu 22.04. This setup lets you run GPU-accelerated workloads on a lightweight Kubernetes cluster, enabling efficient deployment of machine learning, AI, and other compute-intensive applications. Try deploying K3s on Atlantic.Net GPU Hosting.