GPU Workloads: VMs vs Containers
GPU computing for AI/ML has unique requirements. Let's explore how VMs and Containers handle GPUs and which approach works best for different scenarios.
GPU Virtualization Technologies
GPU Passthrough (VMs)
The entire GPU is assigned to a single VM (a host setup sketch follows the pros and cons below):
┌─────────────────────────────────────┐
│             VM with GPU             │
│   ┌─────────────────────────────┐   │
│   │      AI/ML Application      │   │
│   ├─────────────────────────────┤   │
│   │        CUDA / cuDNN         │   │
│   ├─────────────────────────────┤   │
│   │        NVIDIA Driver        │   │
│   └─────────────────────────────┘   │
├─────────────────────────────────────┤
│             Hypervisor              │
│                  ↓                  │
│         [PCIe Passthrough]          │
│                  ↓                  │
│             GPU (A100)              │
└─────────────────────────────────────┘
Pros:
- Full GPU performance (100%)
- All GPU features available
- Mature, stable technology
Cons:
- One GPU per VM only
- GPU memory not shareable
- Expensive (dedicated GPU per workload)
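On a Linux/KVM host, passthrough typically means detaching the GPU from the host driver and binding it to vfio-pci so the hypervisor can hand the whole device to one VM. A minimal sketch, assuming IOMMU is already enabled in firmware and on the kernel command line; the PCI address and vendor:device ID below are placeholders for your own hardware:
# Find the GPU's PCI address and vendor:device ID (example values shown)
lspci -nn | grep -i nvidia
# 65:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100] [10de:20b0]
# Bind the device to vfio-pci instead of the host NVIDIA driver
sudo modprobe vfio-pci
echo vfio-pci | sudo tee /sys/bus/pci/devices/0000:65:00.0/driver_override
echo 0000:65:00.0 | sudo tee /sys/bus/pci/devices/0000:65:00.0/driver/unbind
echo 0000:65:00.0 | sudo tee /sys/bus/pci/drivers_probe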
vGPU (Virtual GPU)
GPU is partitioned and shared among multiple VMs:
┌─────────┐ ┌─────────┐ ┌─────────┐
│   VM1   │ │   VM2   │ │   VM3   │
│  vGPU   │ │  vGPU   │ │  vGPU   │
│  (8GB)  │ │  (8GB)  │ │  (8GB)  │
└────┬────┘ └────┬────┘ └────┬────┘
     └───────────┼───────────┘
          ┌──────┴──────┐
          │    vGPU     │
          │   Manager   │
          └──────┬──────┘
          ┌──────┴──────┐
          │ A10 (24GB)  │
          └─────────────┘
Technologies:
- NVIDIA vGPU (formerly GRID)
- NVIDIA MIG (Multi-Instance GPU)
- AMD MxGPU
Pros:
- Share expensive GPU hardware
- Isolated GPU memory per VM
- Good for inference workloads
Cons:
- Licensing costs (NVIDIA vGPU)
- Performance overhead (5-15%)
- Not all GPUs support it
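If the NVIDIA vGPU host driver is installed on the hypervisor, nvidia-smi gains a vgpu subcommand for inspecting the guests sharing each physical GPU (output fields vary by driver version):
# List vGPUs currently running on this host (requires the vGPU host driver)
nvidia-smi vgpu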
NVIDIA MIG (Multi-Instance GPU)
Hardware-level GPU partitioning (A100, H100); a provisioning sketch follows the pros and cons below:
┌─────────────────────────────────────┐
│             NVIDIA A100             │
├───────────┬───────────┬─────────────┤
│   MIG 1   │   MIG 2   │    MIG 3    │
│ (1g.5gb)  │ (2g.10gb) │  (3g.20gb)  │
│           │           │             │
│   └─VM1   │   └─VM2   │    └─VM3    │
└───────────┴───────────┴─────────────┘
Pros:
- Hardware isolation
- Guaranteed resources
- No vGPU licensing
Cons:
- Only high-end GPUs (A100, H100)
- Fixed partition sizes
- Limited flexibility
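The partition layout in the diagram above can be created with stock NVIDIA tooling. A minimal sketch, assuming an idle, MIG-capable A100; the profile names match the diagram:
# Enable MIG mode on GPU 0 (a GPU reset or reboot may be required)
sudo nvidia-smi -i 0 -mig 1
# Create the three GPU instances and matching compute instances
sudo nvidia-smi mig -cgi 1g.5gb,2g.10gb,3g.20gb -C
# List the resulting MIG devices; their UUIDs are what workloads target
nvidia-smi -L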
GPUs in Containers
NVIDIA Container Toolkit
Containers access GPUs through the NVIDIA Container Toolkit:
┌─────────────────────────────────────┐
│              Container              │
│   ┌─────────────────────────────┐   │
│   │      AI/ML Application      │   │
│   ├─────────────────────────────┤   │
│   │        CUDA / cuDNN         │   │
│   └─────────────────────────────┘   │
├─────────────────────────────────────┤
│      NVIDIA Container Runtime       │
├─────────────────────────────────────┤
│        NVIDIA Driver (Host)         │
├─────────────────────────────────────┤
│                 GPU                 │
└─────────────────────────────────────┘
# Run container with GPU access
docker run --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
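The --gpus flag can also expose a subset of devices, which is how containers are pinned to specific GPUs on a multi-GPU host (same image as above; the odd-looking quoting around device= is required by Docker):
# Expose only GPU 0 to the container
docker run --gpus '"device=0"' nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# Or request a count of GPUs rather than specific devices
docker run --gpus 2 nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi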
Pros:
- Near-native GPU performance
- Easy to set up
- Works with any NVIDIA GPU
Cons:
- Shared GPU memory space
- No hardware isolation
- Requires careful resource management
Time-Slicing GPUs
Multiple containers share a single GPU via time-slicing, with the driver interleaving their work in short slices (a sample Kubernetes configuration follows the lists below):
Timeline:
├── Container A ──┤── Container B ──┤── Container A ──┤
      (100ms)           (100ms)           (100ms)
Good for:
- Development environments
- Light inference workloads
- Cost optimization
Not good for:
- Training large models
- Real-time inference
- Predictable latency
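On Kubernetes, time-slicing is usually enabled through the NVIDIA device plugin. A minimal sketch of its config file, assuming a plugin version with the time-slicing feature; replicas: 4 advertises each physical GPU as four schedulable nvidia.com/gpu resources:
# Write a time-slicing config for the NVIDIA k8s-device-plugin
cat <<'EOF' > time-slicing-config.yaml
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
EOF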
Performance Comparison
GPU Passthrough VM vs Container
| Metric | VM (Passthrough) | Container |
|---|---|---|
| GPU Performance | 100% | ~100% |
| Setup Complexity | High | Low |
| Flexibility | Low | High |
| Isolation | Hardware | Process |
| Memory Overhead | High (VM OS) | Low |
Benchmark: Training BERT
Environment: NVIDIA A100, 40GB
Workload: Fine-tuning BERT-large
Performance vs bare metal:
┌──────────────────────────────────────┐
│ VM (GPU Passthrough)    │  98.5%     │
│ Container (native)      │  99.8%     │
│ Container (time-slice)  │  45-70%    │
│ VM (vGPU)               │  85-92%    │
└──────────────────────────────────────┘
Use Case Recommendations
Choose VMs with GPU Passthrough When:
- Multi-tenant GPU cloud - Different customers need isolation
- Compliance requirements - PCI-DSS, HIPAA workloads
- Long-running training - Dedicated GPU for days/weeks
- Windows GPU workloads - CUDA on Windows
Choose VMs with vGPU/MIG When:
- Inference services - Many small models
- Development environments - Shared GPU for developers
- GPU oversubscription - More users than GPUs
Choose Containers for GPU When:
- Kubernetes deployments - Cloud-native ML platforms (see the pod sketch after this list)
- CI/CD pipelines - Quick GPU testing
- Batch processing - Short-lived GPU jobs
- Microservices - Inference APIs at scale
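As a sketch of the Kubernetes side, here is a minimal pod that requests one GPU; the pod name is illustrative, and the nvidia.com/gpu resource assumes the NVIDIA device plugin is installed in the cluster:
# Launch a throwaway pod that runs nvidia-smi on one GPU
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.0.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF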
Float16's Approach
At Float16, we offer multiple GPU access patterns:
IaaS (VM-style)
- Full GPU passthrough
- SSH access
- Install anything
- Best for: Training, custom environments
PaaS (Container-style)
- Managed containers
- GPU-enabled pods
- Pre-configured environments
- Best for: Deployment, scaling
AaaS (API-style)
- No GPU management
- REST API access
- Pay per request
- Best for: Quick inference, prototypes
Best Practices for GPU Workloads
1. Match Workload to Platform
| Workload | Recommended | Why |
|---|---|---|
| LLM Training | VM/IaaS | Long-running, needs stability |
| Model Serving | Container | Scale up/down quickly |
| Development | Container/vGPU | Cost-efficient sharing |
| Production Inference | Container + K8s | Auto-scaling, orchestration |
2. Optimize GPU Memory
import torch

# Don't load everything into GPU memory up front
model = load_model()              # placeholder for your own model loader
model.to("cuda")                  # move weights to the GPU only when needed

# Trade compute for memory during training (Hugging Face-style models)
model.gradient_checkpointing_enable()

# Release cached blocks back to the driver when done
torch.cuda.empty_cache()
3. Use GPU Monitoring
Track GPU utilization to right-size instances:
# Real-time monitoring
nvidia-smi dmon -s u
# Key metrics to watch
# - GPU Utilization
# - Memory Usage
# - Temperature
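For right-sizing decisions over longer windows, nvidia-smi can also log the same metrics in CSV form (the interval and file name here are just examples):
# Sample utilization, memory, and temperature every 5 seconds into a CSV log
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total,temperature.gpu \
  --format=csv -l 5 >> gpu_usage.csv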
Summary
| Approach | Isolation | Performance | Flexibility | Cost |
|---|---|---|---|---|
| VM + Passthrough | Excellent | 100% | Low | High |
| VM + vGPU | Good | 85-95% | Medium | Medium |
| VM + MIG | Excellent | 95-100% | Low | Medium |
| Container | Process | 99-100% | High | Low |
| Container + Time-slice | Process | Variable | High | Lowest |
For most AI/ML workloads today, containers with proper GPU access provide the best balance of performance, flexibility, and cost. Use VMs when strong isolation or specific OS requirements are needed.
Congratulations!
You've completed the VM vs Container course! You now understand:
- How VMs and Containers work
- Their strengths and weaknesses
- How they handle GPU workloads
- When to use each approach
Ready to apply this knowledge? Explore Float16's GPU platform to deploy your AI workloads using the right technology for your needs.