Self-Hosted LLMs for Enterprise #1
Generative AI has become a daily assistant for many people, whether for writing code, answering questions, or summarizing reports. As a result, many organizations are looking into running LLMs internally for privacy, flexibility, and cost control. This series walks you through setting up such a system step by step, from installing drivers to serving LLMs via an API on your own Ubuntu machine.
The infrastructure in this article comes from AWS, and we'll use an EC2 instance as the demo machine.
For the instance type, we'll use g5g.xlarge, which comes with an NVIDIA T4G GPU on an arm64 (Graviton2) platform.
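If you prefer launching the instance from the command line, here is a minimal sketch using the AWS CLI; the AMI ID, key pair, and security group are placeholders to replace with your own (pick an Ubuntu 24.04 arm64 AMI):
# Hypothetical example: every ID below is a placeholder
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g5g.xlarge \
  --key-name my-keypair \
  --security-group-ids sg-0123456789abcdef0 \
  --count 1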
1. Find the $distro and $arch values matching your system
Open the comparison table in the official NVIDIA CUDA Installation Guide for Linux (https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
From our demo machine example:
- OS: Ubuntu 24.04 LTS
- Architecture: arm64
We get the values:
$distro = ubuntu2404
$arch = sbsa
$arch_ext = sbsa
If your machine has different specs, look up the values that match your distribution and architecture.
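Not sure what your machine reports? Two standard commands reveal the distribution and the CPU architecture (on an arm64 machine, uname prints aarch64):
# Show the distribution name and version
grep PRETTY_NAME /etc/os-release
# Show the CPU architecture (aarch64 = arm64)
uname -m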
2. Install the NVIDIA keyring using the $distro and $arch values from the previous step
# Example: If using Ubuntu 24.04 + ARM64 (from step 1)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/sbsa/cuda-keyring_1.1-1_all.deb
# Install keyring
sudo dpkg -i cuda-keyring_1.1-1_all.deb
# Update apt index
sudo apt update
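The download URL follows a fixed pattern, so the same step works for other distro/architecture combinations; substitute the values from step 1 (note that the keyring version, 1.1-1 here, may change over time):
# General pattern: fill in $distro and $arch from step 1
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb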
3. Install NVIDIA Proprietary Driver and CUDA Toolkit
sudo apt install cuda-drivers
sudo apt install cuda-toolkit
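The driver package builds kernel modules that typically aren't loaded until the next boot, so a reboot before testing is the safest route. The toolkit installs under /usr/local/cuda by default, which gives a quick way to confirm it landed:
# Reboot so the newly built kernel modules are loaded
sudo reboot
# After reconnecting, confirm the CUDA compiler is present
/usr/local/cuda/bin/nvcc --version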
4. Check Driver Operation
nvidia-smi
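A successful run prints a table showing the driver version, the CUDA version it supports, and the detected GPU (an NVIDIA T4G on a g5g instance). For a terser check, or for use in scripts, the query flags come in handy:
# Print only the GPU name, driver version, and total memory
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv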
Part 1 Summary
In this part, you have:
- Checked system information to select the correct driver version
- Connected Ubuntu to NVIDIA Repository
- Installed the NVIDIA proprietary GPU driver with the apt command
- Verified GPU operation with nvidia-smi
If you followed this, your machine is now ready for GPU use.
Next: Using GPU with Docker Container
In the next part, we'll look at how to:
- Configure Docker to use GPU correctly
- Install nvidia-container-toolkit
- Prepare the environment for running an LLM API, whether working from home or within an organization
Don't miss the next part!