Dedicated GPU Instances Deep Dive
Dedicated GPU instances give you exclusive access to GPU hardware. Let's understand when and how to use them effectively.
What is a Dedicated GPU Instance?
┌─────────────────────────────────────────────────────────┐
│ Your Dedicated GPU Instance │
│ │
│ Hardware: NVIDIA A100 80GB │
│ vCPUs: 32 │
│ RAM: 256GB │
│ Storage: 1TB NVMe │
│ │
│ Access: │
│ • Full SSH access │
│ • Root privileges │
│ • Install any software │
│ • Persistent storage │
│ │
│ Billing: $X.XX per hour │
│ (Charged whether you use it or not) │
└─────────────────────────────────────────────────────────┘
Types of Dedicated GPU Instances
On-Demand Instances
┌─────────────────────────────────────────────────────────┐
│ ON-DEMAND │
│ │
│ • Start anytime (if available) │
│ • Stop anytime │
│ • Pay by the hour │
│ • No commitment │
│ • Highest flexibility │
│ • Highest per-hour cost │
│ │
│ Example: $4.00/hour for A100 │
└─────────────────────────────────────────────────────────┘
Reserved Instances
┌─────────────────────────────────────────────────────────┐
│ RESERVED │
│ │
│ • Commit for 1-3 years │
│ • Guaranteed availability │
│ • 30-60% discount vs on-demand │
│ • Pay upfront or monthly │
│ • Best for predictable workloads │
│ │
│ Example: $2.40/hour for A100 (1-year commitment) │
└─────────────────────────────────────────────────────────┘
Spot/Preemptible Instances
┌─────────────────────────────────────────────────────────┐
│ SPOT/PREEMPTIBLE │
│ │
│ • Use the provider's spare capacity │
│ • 60-90% discount vs on-demand │
│ • Can be terminated on short notice │
│ • Best for fault-tolerant workloads │
│ • Requires checkpointing │
│ │
│ Example: $1.00/hour for A100 (when available) │
└─────────────────────────────────────────────────────────┘
Dedicated Instance Workflow
Typical Usage Pattern
1. PROVISION
Request instance → Wait for allocation → Instance ready
(Seconds to minutes)
2. CONFIGURE
SSH in → Install dependencies → Download models
(Minutes to hours)
3. RUN WORKLOADS
Training, inference, experiments
(Hours to days)
4. TERMINATE OR KEEP RUNNING
Stop to save costs or keep for continuous use
Infrastructure as Code
# Using infrastructure as code
# (`cloud` stands in for a provider SDK client; the exact API varies by provider)
instance = cloud.create_instance(
    gpu_type="A100-80GB",
    gpu_count=8,
    cpu_cores=128,
    memory_gb=1024,
    storage_gb=4000,
    region="us-east-1",
)

# SSH access: confirm the GPUs are visible
instance.ssh("nvidia-smi")

# Run training
instance.run("python train.py")
Full Control Benefits
1. Custom Environment
# Install anything you need
sudo apt install custom-package
pip install specific-version==1.2.3
# Compile from source
git clone https://github.com/custom/repo
cd repo && make install
# Configure system
sudo sysctl -w vm.swappiness=10
2. Persistent Storage
┌─────────────────────────────────────────────────────────┐
│ Persistent Storage │
│ │
│ /data/models/ ← Pre-loaded models │
│ /data/datasets/ ← Training data │
│ /data/checkpoints/ ← Saved training states │
│ /data/outputs/ ← Results │
│ │
│ Persists across reboots │
│ Can be snapshotted │
└─────────────────────────────────────────────────────────┘
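Snapshotting can be scripted as well. A minimal sketch reusing the hypothetical `cloud`/`instance` SDK from the earlier example; the `snapshot` and `list_snapshots` methods and the 14-day retention policy are assumptions, not a specific provider's API:

from datetime import date, datetime, timedelta

# Take a dated snapshot of the persistent data volume
instance.snapshot(volume="/data", name=f"data-{date.today().isoformat()}")

# Prune snapshots older than 14 days to keep snapshot costs bounded
cutoff = datetime.now() - timedelta(days=14)
for snap in instance.list_snapshots(volume="/data"):
    if snap.created_at < cutoff:
        snap.delete()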
3. Networking Control
┌─────────────────────────────────────────────────────────┐
│ Network Configuration │
│ │
│ • Private VPC/Subnet │
│ • Custom firewall rules │
│ • VPN connectivity │
│ • Direct connect to data center │
│ • Multi-GPU communication (NVLink, InfiniBand) │
└─────────────────────────────────────────────────────────┘
Dedicated Instance Pricing
Cost Components
┌─────────────────────────────────────────────────────────┐
│ Monthly Cost Breakdown │
│ │
│ GPU Instance (A100): $2,880/month ($4/hr × 720hr) │
│ Storage (1TB): $100/month │
│ Network Egress (500GB): $40/month │
│ Backups/Snapshots: $50/month │
│ ───────────────────────────────────────────────── │
│ Total: $3,070/month │
└─────────────────────────────────────────────────────────┘
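That arithmetic is worth keeping in a script so the estimate updates as instance counts and rates change; a minimal sketch using the illustrative figures from the box:

# Reproduce the monthly cost breakdown above (illustrative prices, not quotes)
HOURS_PER_MONTH = 720  # 30 days × 24 hours

costs = {
    "gpu_instance": 4.00 * HOURS_PER_MONTH,  # A100 on-demand at $4/hr
    "storage_1tb": 100,
    "egress_500gb": 40,
    "backups_snapshots": 50,
}

total = sum(costs.values())
print(f"Total: ${total:,.0f}/month")  # -> Total: $3,070/month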
Cost Comparison by GPU
| GPU | VRAM | On-Demand/hr | Reserved/hr | Spot/hr |
|---|---|---|---|---|
| T4 | 16GB | $0.50 | $0.30 | $0.15 |
| A10 | 24GB | $1.50 | $0.90 | $0.45 |
| A100-40GB | 40GB | $3.00 | $1.80 | $0.80 |
| A100-80GB | 80GB | $4.00 | $2.40 | $1.00 |
| H100 | 80GB | $6.00 | $3.60 | $1.50 |
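Which tier is cheapest depends on utilization: a reserved instance bills every hour of the term, so at low utilization on-demand can win despite the higher hourly rate. A quick comparison using the illustrative A100-80GB prices from the table:

HOURS_PER_MONTH = 720

def monthly_cost(on_demand_rate, reserved_rate, utilization):
    """utilization = fraction of the month the GPU actually runs."""
    on_demand = on_demand_rate * HOURS_PER_MONTH * utilization  # pay per used hour
    reserved = reserved_rate * HOURS_PER_MONTH                  # billed regardless
    return on_demand, reserved

# A100-80GB: $4.00/hr on-demand vs $2.40/hr reserved
for util in (0.3, 0.6, 0.9):
    od, rs = monthly_cost(4.00, 2.40, util)
    print(f"{util:.0%}: on-demand ${od:,.0f} vs reserved ${rs:,.0f}")

With these rates the break-even is 60% utilization ($2.40 / $4.00); above it, reserved wins.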
When Dedicated Instances Excel
1. Training Workloads
Model Training Characteristics:
• Runs for hours/days continuously
• Sustained, near-100% GPU utilization
• Needs consistent performance
• Requires checkpointing
• Large data I/O
Dedicated Benefits:
• No cold starts
• Consistent performance
• Can optimize infrastructure
• Cost-effective at high utilization
2. High-Throughput Inference
Scenario: 1000+ requests/second
Serverless:
• Auto-scaling overhead
• Cold start queuing
• Variable latency
Dedicated:
• Consistent sub-50ms latency
• Pre-warmed models
• Optimized batching (sketched below)
• Predictable capacity
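A minimal sketch of that batching idea: requests accumulate in a queue and are flushed as a single GPU call when the batch fills or a short deadline expires. `run_model` is a placeholder for the real inference call:

import queue
import threading
import time
from concurrent.futures import Future

MAX_BATCH = 32
MAX_WAIT_S = 0.005  # 5 ms batching window
_pending = queue.Queue()  # items are (payload, Future) pairs

def run_model(batch):
    # Placeholder: one forward pass over the whole batch
    return [f"result:{item}" for item in batch]

def _batching_loop():
    while True:
        batch = [_pending.get()]  # block until the first request arrives
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:  # fill until size cap or deadline
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                batch.append(_pending.get(timeout=timeout))
            except queue.Empty:
                break
        payloads, futures = zip(*batch)
        for fut, result in zip(futures, run_model(list(payloads))):
            fut.set_result(result)  # wake the waiting caller

threading.Thread(target=_batching_loop, daemon=True).start()

def infer(payload):
    fut = Future()
    _pending.put((payload, fut))
    return fut.result()  # caller blocks until its batch has run

Larger batches amortize the per-call overhead; the deadline caps the latency cost of waiting for stragglers.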
3. Multi-GPU Training
┌─────────────────────────────────────────────────────────┐
│ 8x A100 Training Cluster │
│ │
│ GPU 0 ←──NVLink──→ GPU 1 ←──NVLink──→ GPU 2 ... │
│ │
│ Distributed Training: │
│ • Data parallelism │
│ • Model parallelism │
│ • Pipeline parallelism │
│ │
│ Requires dedicated, connected instances │
└─────────────────────────────────────────────────────────┘
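Of the three, data parallelism is the usual starting point. A minimal PyTorch DistributedDataParallel sketch, assuming it is launched with `torchrun --nproc_per_node=8 train.py` on the 8x A100 node; the model and training loop are placeholders:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")  # NCCL uses NVLink where available
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):  # placeholder training loop
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).square().mean()
        loss.backward()  # DDP all-reduces gradients across GPUs here
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()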
4. Latency-Critical Applications
Real-time Requirements:
├── Gaming: < 16ms (60fps)
├── Video: < 33ms (30fps)
├── Voice: < 100ms
└── Interactive Chat: < 500ms
Dedicated provides:
• Guaranteed capacity
• No cold starts
• Predictable latency
• No request queuing
5. Sensitive/Regulated Workloads
Compliance Requirements:
• Data cannot leave your control
• Audit trail requirements
• Specific security configurations
• Data residency requirements
Dedicated provides:
• Full infrastructure control
• Encryption configuration
• Network isolation
• Audit logging
Optimization Strategies
1. Right-Sizing
Don't over-provision:
Workload needs 20GB VRAM
├── A100-80GB → 40GB wasted, $4/hr
├── A100-40GB → 20GB wasted, $3/hr
└── A10-24GB → 4GB headroom, $1.50/hr ✓
Save ~62% by choosing the right GPU
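A sketch of that selection, using the illustrative on-demand prices from the table above; the 10% VRAM headroom is an assumption:

GPUS = [  # (name, vram_gb, on_demand_per_hr) -- illustrative prices
    ("T4", 16, 0.50),
    ("A10", 24, 1.50),
    ("A100-40GB", 40, 3.00),
    ("A100-80GB", 80, 4.00),
    ("H100", 80, 6.00),
]

def cheapest_fit(required_vram_gb, headroom=1.10):
    """Cheapest GPU with at least `headroom` × the required VRAM."""
    candidates = [g for g in GPUS if g[1] >= required_vram_gb * headroom]
    if not candidates:
        raise ValueError("No single GPU fits; consider sharding the model.")
    return min(candidates, key=lambda g: g[2])

print(cheapest_fit(20))  # -> ('A10', 24, 1.5), matching the example above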
2. Time-Based Scheduling
# Stop instances during off-hours
schedule = {
    "weekdays": {"start": "08:00", "stop": "20:00"},
    "weekends": "off",
}

# 12 hours × 5 days = 60 hours/week
# vs 168 hours/week (always on)
# Save ~64% on compute costs
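Enforcing the schedule is a small automation job. A sketch reusing the hypothetical `cloud` SDK from earlier (the `state`, `start`, and `stop` members are assumptions), meant to run periodically, e.g. every 15 minutes from cron:

from datetime import datetime

def apply_schedule(instance, schedule):
    now = datetime.now()
    if now.weekday() >= 5:  # Saturday or Sunday
        desired = "stopped" if schedule["weekends"] == "off" else "running"
    else:
        window = schedule["weekdays"]
        # "HH:MM" strings compare correctly in lexicographic order
        in_hours = window["start"] <= now.strftime("%H:%M") < window["stop"]
        desired = "running" if in_hours else "stopped"

    if desired == "stopped" and instance.state == "running":
        instance.stop()   # hypothetical SDK call; compute billing stops
    elif desired == "running" and instance.state == "stopped":
        instance.start()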
3. Spot Instance Strategy
┌─────────────────────────────────────────────────────────┐
│ Spot Instance Best Practices │
│ │
│ 1. Checkpoint frequently │
│ save_checkpoint(model, optimizer, epoch, step) │
│ │
│ 2. Use multiple availability zones │
│ Try zone-a, fallback to zone-b, zone-c │
│ │
│ 3. Mix spot + on-demand │
│ Base capacity: On-demand │
│ Burst capacity: Spot │
│ │
│ 4. Automate recovery │
│ Detect termination → Save state → Restart │
└─────────────────────────────────────────────────────────┘
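Point 1 in code: a minimal PyTorch checkpoint helper matching the signature in the box. The write-then-rename step is a common convention (not a library feature) so a preemption mid-write never corrupts the latest checkpoint:

import os
import torch

CKPT_PATH = "/data/checkpoints/latest.pt"

def save_checkpoint(model, optimizer, epoch, step):
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "epoch": epoch,
        "step": step,
    }
    tmp = CKPT_PATH + ".tmp"
    torch.save(state, tmp)
    os.replace(tmp, CKPT_PATH)  # atomic rename: never a half-written file

def load_checkpoint(model, optimizer):
    if not os.path.exists(CKPT_PATH):
        return 0, 0  # fresh start
    state = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"], state["step"]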
4. Multi-Tenancy Within Instance
Run multiple workloads on one GPU:
┌─────────────────────────────────────────────────────────┐
│ A100 80GB │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Model A (20GB) │ │ Model B (15GB) │ │
│ │ Inference API │ │ Batch Processing│ │
│ └──────────────────┘ └──────────────────┘ │
│ │
│ 45GB still available │
│ Share GPU cost across workloads │
└─────────────────────────────────────────────────────────┘
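For PyTorch workloads, one lightweight way to enforce the split is a per-process memory cap; a sketch (MIG or NVIDIA MPS provide harder isolation when workloads must not interfere):

import torch

# Cap this process at 25% of the A100's 80GB (~20GB for Model A) so the
# co-located batch job can safely claim its own share of the same GPU.
torch.cuda.set_per_process_memory_fraction(0.25, device=0)

model = torch.nn.Linear(4096, 4096).cuda()  # placeholder workload
# Allocations beyond the cap raise an out-of-memory error in this process
# instead of consuming memory the neighboring workload was counting on.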
Summary
DEDICATED GPU IS GREAT FOR:
✓ Training workloads
✓ High-throughput inference
✓ Multi-GPU clusters
✓ Latency-critical applications
✓ Regulated/sensitive workloads
✓ High utilization (>50%)
DEDICATED GPU CHALLENGES:
✗ Higher management overhead
✗ Pay for idle time
✗ Requires capacity planning
✗ Manual scaling
✗ Infrastructure expertise needed
What's Next?
In the next chapter, we'll build a decision framework to help you choose between serverless and dedicated GPU for your specific use case.