Nvidia GPU Driver Setup: Essential Steps for AI Developers
In this AI-boom era, LLMs are probably what every company is talking about. Many organizations want LLM solutions to play a bigger role in their business, whether that's building chatbots, RAG systems, and so on. What follows is that these solutions often need to live in the company's own infrastructure, whether on-premises or with a cloud provider, because of requirements that data must not be sent to external LLM providers like OpenAI for processing.
That creates new tasks for infrastructure people like us: provisioning GPU machines. Getting the machine is only the start, though; there is still plenty to configure, such as installing drivers and other tools. In this article, we'll walk through basic machine preparation for infra teams, so that the GPU instances we create are as ready to use as possible.
I'll preface this by saying that the GPUs we mainly use are from Nvidia, currently one of the market leaders with the largest user base, so everything here is told from the Nvidia side. Ready? Let's go!
Install Driver
Installing the Nvidia GPU driver isn't as difficult or complex as you might think. We can follow Nvidia's documentation, adjusting a few parameters to match our chosen OS and CPU architecture. This article won't cover every topic in that document, but it will cover a basic installation that gets the GPU working.
Nvidia driver installation document
1. Prepare Required Parameters
The Supported Linux Distributions table tells us which Linux versions support driver installation, and it gives us three parameters to note down for the next steps:
- $distro
- $arch
- $arch_ext
Suppose we use Ubuntu 22.04 LTS on an x86 machine. Looking up the values in the table:

The values will be:
- $distro = ubuntu2204
- $arch = x86_64
- $arch_ext = amd64
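If you want to keep these values handy for later commands, you can put them in shell variables (the variable names here simply mirror the placeholders from the table):
# Example values for Ubuntu 22.04 LTS on an x86_64 machine
distro=ubuntu2204
arch=x86_64
arch_ext=amd64
# They can then be substituted into later commands, e.g. the keyring URL:
echo "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/cuda-keyring_1.1-1_all.deb"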
2. Choose Installation Guide According to Linux Distribution
This step is about picking the driver installation method for our OS. Since my example uses Ubuntu, we'll look at section 10 of the document, which explains the Ubuntu installation steps.

What we need to do is:
- Follow all Pre-installation steps
- Install kernel headers and development packages
sudo apt install linux-headers-$(uname -r)
- Choose an installation method: Local Repository or Network Repository. I'll go with the Network Repository.
- Download and install the cuda-keyring package. In the URL https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb, replace $distro and $arch with the values from the table, then run:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
- Install Driver
sudo apt install nvidia-open
After completing this step, we'll have the Nvidia driver (including the CUDA driver library) installed on our machine. We can verify it with:
nvidia-smi
The output will show the number of active GPUs along with basic utilization.

nvidia-smi output
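Beyond the default table view, nvidia-smi can also report specific fields in CSV form, which is handy for quick scripting:
# Query a few common fields; these are standard nvidia-smi query properties
nvidia-smi --query-gpu=name,driver_version,utilization.gpu,memory.used,memory.total --format=csv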
- Install CUDA Toolkit
sudo apt install cuda-toolkit
CUDA Toolkit Installation
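One note: the apt packages normally place the toolkit under /usr/local/cuda (a symlink to the versioned directory), and nvcc isn't on PATH by default. A minimal sketch, assuming that default location:
# Add the CUDA toolkit binaries to PATH, then verify the compiler is visible
echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
source ~/.bashrc
nvcc --version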
That's it! We now have a Linux Ubuntu machine with a GPU, ready to use.
NVIDIA Container Toolkit
Simply put, the NVIDIA Container Toolkit is a set of tools and libraries that allow containers to use GPUs. We'll start with installing it. This example focuses on Docker, which I believe is the container engine most people use, and again uses Ubuntu for the installation.
Prerequisites:
- Container engine (Docker, Containerd)
- Nvidia GPU Driver
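Before starting, you can quickly confirm both prerequisites are in place:
# Driver check: prints the installed driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Container engine check
docker --version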
- Configure the production repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
- Update the packages list from the repository
sudo apt-get update
- Install the NVIDIA Container Toolkit packages
sudo apt-get install -y nvidia-container-toolkit
With these three steps, the NVIDIA Container Toolkit is installed. The next step is configuring our container engine to use it.
Docker Configuration
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
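To confirm Docker can now see the GPU, you can run a quick test container, along the lines of the sample workload in the toolkit documentation:
# The stock ubuntu image has no driver inside; the toolkit injects nvidia-smi from the host
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi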
For other container engines, you can check the documentation for more details.
As for Kubernetes, I'll make that a separate major topic, because there's a dedicated tool that suits K8s better than installing the GPU driver and configuring the toolkit on every node.
Monitoring Tools
Once everything is installed and in use, the next thing we should do is monitor usage. The simplest starting point is running nvidia-smi to see how many GPUs there are and how busy they are, but that's not detailed enough, so I'd like to recommend a few other tools as options.
Nvitop
An interactive CLI for viewing detailed GPU usage. Installation is very easy:
pip3 install --upgrade nvitop
Then just use the command:
nvitop
And you'll get an interactive UI displayed in the terminal.


From there we can drill down into each running process.
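If your distribution restricts pip installs into the system Python, installing nvitop in a virtual environment works just as well. A sketch, with ~/nvitop-venv as an arbitrary example location:
# Create an isolated environment and install nvitop into it
sudo apt install -y python3-venv
python3 -m venv ~/nvitop-venv
~/nvitop-venv/bin/pip install --upgrade nvitop
~/nvitop-venv/bin/nvitop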
NVIDIA DCGM
An official tool from Nvidia for extracting metrics from a GPU cluster through an API; it's commonly paired with the Go-based dcgm-exporter to expose those metrics.
To install it on a regular VM, you need the GPU driver and a Docker engine along with the Nvidia Container Toolkit installed first.
As for how to use it, I'll save that for a separate full article, because getting the most out of DCGM means combining it with several other tools such as Prometheus and a Grafana dashboard.
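As a small preview, metrics are often exposed by running the dcgm-exporter container and scraping it; a rough sketch (the image tag is a placeholder here, so check NVIDIA's NGC registry for a current release):
# Run the exporter with GPU access and publish its metrics port
sudo docker run -d --gpus all --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:<current-tag>
# It serves Prometheus-format metrics on port 9400
curl localhost:9400/metrics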
Final Summary
By now, everyone who has read this far should be able to provision GPU VMs that are ready for their team to use. Once you get the hang of it, I think it's easier than installing some services. As for the remaining content, please keep following; it should help you build infrastructure from dev to production.