Chapter 3: Dedicated GPU

Dedicated GPU Instances Deep Dive

Understanding dedicated GPU instances - full control, predictable performance, and when reserved GPU capacity makes economic sense.

Dedicated GPU instances give you exclusive access to GPU hardware. Let's understand when and how to use them effectively.

What is a Dedicated GPU Instance?

┌─────────────────────────────────────────────────────────┐
│              Your Dedicated GPU Instance                 │
│                                                         │
│  Hardware: NVIDIA A100 80GB                             │
│  vCPUs: 32                                              │
│  RAM: 256GB                                             │
│  Storage: 1TB NVMe                                      │
│                                                         │
│  Access:                                                │
│  • Full SSH access                                      │
│  • Root privileges                                      │
│  • Install any software                                 │
│  • Persistent storage                                   │
│                                                         │
│  Billing: $X.XX per hour                                │
│  (Charged whether you use it or not)                    │
└─────────────────────────────────────────────────────────┘

Types of Dedicated GPU Instances

On-Demand Instances

┌─────────────────────────────────────────────────────────┐
│                    ON-DEMAND                             │
│                                                         │
│  • Start anytime (if available)                         │
│  • Stop anytime                                         │
│  • Pay by the hour                                      │
│  • No commitment                                        │
│  • Highest flexibility                                  │
│  • Highest per-hour cost                               │
│                                                         │
│  Example: $4.00/hour for A100-80GB                      │
└─────────────────────────────────────────────────────────┘

Reserved Instances

┌─────────────────────────────────────────────────────────┐
│                    RESERVED                              │
│                                                         │
│  • Commit for 1-3 years                                 │
│  • Guaranteed availability                              │
│  • 30-60% discount vs on-demand                        │
│  • Pay upfront or monthly                              │
│  • Best for predictable workloads                      │
│                                                         │
│  Example: $2.40/hour for A100-80GB (1-year commitment)  │
└─────────────────────────────────────────────────────────┘

Spot/Preemptible Instances

┌─────────────────────────────────────────────────────────┐
│                   SPOT/PREEMPTIBLE                       │
│                                                         │
│  • Use provider's spare capacity                        │
│  • 60-90% discount vs on-demand                        │
│  • Can be reclaimed on short notice                    │
│  • Best for fault-tolerant workloads                   │
│  • Requires checkpointing                              │
│                                                         │
│  Example: $1.00/hour for A100-80GB (when available)     │
└─────────────────────────────────────────────────────────┘

Dedicated Instance Workflow

Typical Usage Pattern

1. PROVISION
   Request instance → Wait for allocation → Instance ready
   (Seconds to minutes)

2. CONFIGURE
   SSH in → Install dependencies → Download models
   (Minutes to hours)

3. RUN WORKLOADS
   Training, inference, experiments
   (Hours to days)

4. TERMINATE OR KEEP RUNNING
   Stop to save costs or keep for continuous use

Infrastructure as Code

# Provisioning with a generic infrastructure-as-code client
# ("cloud" is a placeholder; real SDKs differ by provider)
instance = cloud.create_instance(
    gpu_type="A100-80GB",
    gpu_count=8,
    cpu_cores=128,
    memory_gb=1024,
    storage_gb=4000,
    region="us-east-1",
)

# Verify the GPUs are visible over SSH
instance.ssh("nvidia-smi")

# Launch a training job
instance.run("python train.py")

Full Control Benefits

1. Custom Environment

# Install anything you need
sudo apt install custom-package
pip install specific-version==1.2.3

# Compile from source
git clone https://github.com/custom/repo
cd repo && make install

# Configure system
sudo sysctl -w vm.swappiness=10

2. Persistent Storage

┌─────────────────────────────────────────────────────────┐
│              Persistent Storage                          │
│                                                         │
│  /data/models/          ← Pre-loaded models             │
│  /data/datasets/        ← Training data                 │
│  /data/checkpoints/     ← Saved training states         │
│  /data/outputs/         ← Results                       │
│                                                         │
│  Persists across reboots                                │
│  Can be snapshotted                                     │
└─────────────────────────────────────────────────────────┘
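
Since the mount persists, a common habit is to create this layout once and point every job at it. A minimal sketch of that setup, assuming the /data paths shown above:

# Create the persistent layout once; every job writes beneath it
from pathlib import Path

DATA = Path("/data")
for sub in ("models", "datasets", "checkpoints", "outputs"):
    (DATA / sub).mkdir(parents=True, exist_ok=True)

run_dir = DATA / "outputs" / "run_001"  # survives reboots, snapshot-friendly
run_dir.mkdir(exist_ok=True)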

3. Networking Control

┌─────────────────────────────────────────────────────────┐
│              Network Configuration                       │
│                                                         │
│  • Private VPC/Subnet                                   │
│  • Custom firewall rules                                │
│  • VPN connectivity                                     │
│  • Direct connect to data center                        │
│  • Multi-GPU communication (NVLink, InfiniBand)         │
└─────────────────────────────────────────────────────────┘
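
On a multi-GPU instance it is worth verifying the interconnect yourself. nvidia-smi topo -m prints the link matrix between GPUs; a quick check from Python (assumes nvidia-smi is on the PATH):

# Inspect GPU interconnect topology
import subprocess

topo = subprocess.run(
    ["nvidia-smi", "topo", "-m"],
    capture_output=True, text=True, check=True,
)
print(topo.stdout)  # NV# entries indicate NVLink; PHB/PIX indicate PCIe paths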

Dedicated Instance Pricing

Cost Components

┌─────────────────────────────────────────────────────────┐
│              Monthly Cost Breakdown                      │
│                                                         │
│  GPU Instance (A100):     $2,880/month ($4/hr × 720hr) │
│  Storage (1TB):           $100/month                    │
│  Network Egress (500GB):  $40/month                     │
│  Backups/Snapshots:       $50/month                     │
│  ─────────────────────────────────────────────────      │
│  Total:                   $3,070/month                  │
└─────────────────────────────────────────────────────────┘
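
The breakdown is simple arithmetic, and a small helper makes it easy to re-run with your own rates (all figures here are illustrative):

# Reproduce the monthly breakdown above; all rates are illustrative
HOURS_PER_MONTH = 720

costs = {
    "gpu_instance": 4.00 * HOURS_PER_MONTH,  # $4/hr A100, always on
    "storage_1tb": 100.00,
    "egress_500gb": 40.00,
    "backups_snapshots": 50.00,
}
total = sum(costs.values())
print(f"Total: ${total:,.2f}/month")  # -> Total: $3,070.00/month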

Cost Comparison by GPU

GPU          VRAM    On-Demand/hr    Reserved/hr    Spot/hr
T4           16GB    $0.50           $0.30          $0.15
A10          24GB    $1.50           $0.90          $0.45
A100-40GB    40GB    $3.00           $1.80          $0.80
A100-80GB    80GB    $4.00           $2.40          $1.00
H100         80GB    $6.00           $3.60          $1.50
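
A useful rule of thumb falls out of this table: a reservation pays off once utilization exceeds the ratio of the reserved rate to the on-demand rate. A quick check using the A100-80GB rates above:

# Break-even utilization: reserved (billed 24/7) vs. on-demand (billed per use)
on_demand, reserved = 4.00, 2.40  # A100-80GB $/hr from the table

break_even = reserved / on_demand
print(f"Reserved wins above {break_even:.0%} utilization")  # -> 60%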

When Dedicated Instances Excel

1. Training Workloads

Model Training Characteristics:
• Runs for hours/days continuously
• 100% GPU utilization
• Needs consistent performance
• Requires checkpointing
• Large data I/O

Dedicated Benefits:
• No cold starts
• Consistent performance
• Can optimize infrastructure
• Cost-effective at high utilization

2. High-Throughput Inference

Scenario: 1000+ requests/second

Serverless:
• Auto-scaling overhead
• Cold start queuing
• Variable latency

Dedicated:
• Consistent sub-50ms latency
• Pre-warmed models
• Optimized batching
• Predictable capacity
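
Dedicated capacity makes aggressive batching practical because the model stays warm. Here is a minimal micro-batching sketch; run_inference is a placeholder for your model call, not a specific serving framework:

# Collect requests briefly, then run them as one batch
import queue
import time

requests = queue.Queue()

def batch_worker(max_batch=32, max_wait_ms=5):
    while True:
        batch = [requests.get()]  # block until the first request arrives
        deadline = time.monotonic() + max_wait_ms / 1000
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
        run_inference(batch)  # hypothetical batched forward pass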

3. Multi-GPU Training

┌─────────────────────────────────────────────────────────┐
│           8x A100 Training Cluster                       │
│                                                         │
│  GPU 0 ←──NVLink──→ GPU 1 ←──NVLink──→ GPU 2 ...       │
│                                                         │
│  Distributed Training:                                  │
│  • Data parallelism                                     │
│  • Model parallelism                                    │
│  • Pipeline parallelism                                 │
│                                                         │
│  Requires dedicated, connected instances                │
└─────────────────────────────────────────────────────────┘
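
In PyTorch, the standard pattern on a cluster like this is DistributedDataParallel over NCCL, which rides the NVLink/InfiniBand fabric. A minimal sketch (MyModel is a placeholder), launched with torchrun --nproc_per_node=8 train.py:

# Minimal DDP setup; one process per GPU
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")  # NCCL uses NVLink/InfiniBand
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = MyModel().cuda()  # MyModel is a placeholder for your network
model = DDP(model, device_ids=[local_rank])
# ...training loop: gradients all-reduce across the GPUs automatically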

4. Latency-Critical Applications

Real-time Requirements:
├── Gaming: < 16ms (60fps)
├── Video: < 33ms (30fps)
├── Voice: < 100ms
└── Interactive Chat: < 500ms

Dedicated provides:
• Guaranteed capacity
• No cold starts
• Predictable latency
• No request queuing

5. Sensitive/Regulated Workloads

Compliance Requirements:
• Data cannot leave your control
• Audit trail requirements
• Specific security configurations
• Data residency requirements

Dedicated provides:
• Full infrastructure control
• Encryption configuration
• Network isolation
• Audit logging

Optimization Strategies

1. Right-Sizing

Don't over-provision:

Workload needs 20GB VRAM
├── A100-80GB → 40GB wasted, $4/hr
├── A100-40GB → 20GB wasted, $3/hr
└── A10-24GB  → 4GB headroom, $1.50/hr ✓

Save 62% by choosing the right GPU
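
Automating that choice against a rate table is straightforward; the rates below are the illustrative on-demand ones from earlier:

# Pick the cheapest GPU with enough VRAM plus headroom
GPUS = [  # (name, vram_gb, on_demand_per_hr)
    ("T4", 16, 0.50),
    ("A10", 24, 1.50),
    ("A100-40GB", 40, 3.00),
    ("A100-80GB", 80, 4.00),
]

def right_size(required_vram_gb, headroom_gb=2):
    fits = [g for g in GPUS if g[1] >= required_vram_gb + headroom_gb]
    return min(fits, key=lambda g: g[2]) if fits else None

print(right_size(20))  # -> ('A10', 24, 1.5)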

2. Time-Based Scheduling

# Stop instances during off-hours
# (illustrative config, consumed by a scheduler such as the sketch below)
schedule = {
    "weekdays": {"start": "08:00", "stop": "20:00"},
    "weekends": "off",
}

# 12 hours × 5 days = 60 hours/week
# vs. 168 hours/week (always on)
# Save 64% on compute costs
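
A scheduler consuming that config might look like the sketch below; instance.start() and instance.stop() stand in for your provider's SDK calls:

# Decide whether the instance should be running right now
from datetime import datetime

def should_run(schedule, now=None):
    now = now or datetime.now()
    if now.weekday() >= 5:  # Saturday or Sunday
        return schedule.get("weekends") != "off"
    day = schedule["weekdays"]
    return day["start"] <= now.strftime("%H:%M") < day["stop"]

if should_run(schedule):
    instance.start()  # hypothetical SDK call
else:
    instance.stop()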

3. Spot Instance Strategy

┌─────────────────────────────────────────────────────────┐
│           Spot Instance Best Practices                   │
│                                                         │
│  1. Checkpoint frequently                               │
│     save_checkpoint(model, optimizer, epoch, step)      │
│                                                         │
│  2. Use multiple availability zones                     │
│     Try zone-a, fall back to zone-b, zone-c            │
│                                                         │
│  3. Mix spot + on-demand                               │
│     Base capacity: On-demand                           │
│     Burst capacity: Spot                               │
│                                                         │
│  4. Automate recovery                                   │
│     Detect termination → Save state → Restart          │
└─────────────────────────────────────────────────────────┘
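
Practices 1 and 4 combine naturally: trap the termination notice, finish the current step, save, and exit. A sketch assuming PyTorch and a SIGTERM-style notice (some providers use a metadata endpoint instead):

# Checkpoint when the provider signals termination
import signal
import torch

stop_requested = False

def on_sigterm(signum, frame):
    global stop_requested
    stop_requested = True  # let the current step finish cleanly

signal.signal(signal.SIGTERM, on_sigterm)

def save_checkpoint(model, optimizer, epoch, step):
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch,
         "step": step},
        "/data/checkpoints/last.pt",  # the persistent mount from earlier
    )

# In the training loop:
#   if stop_requested:
#       save_checkpoint(model, optimizer, epoch, step)
#       break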

4. Multi-Tenancy Within Instance

Run multiple workloads on one GPU:

┌─────────────────────────────────────────────────────────┐
│                    A100 80GB                             │
│  ┌──────────────────┐  ┌──────────────────┐            │
│  │  Model A (20GB)  │  │  Model B (15GB)  │            │
│  │  Inference API   │  │  Batch Processing│            │
│  └──────────────────┘  └──────────────────┘            │
│                                                         │
│  45GB still available                                   │
│  Share GPU cost across workloads                        │
└─────────────────────────────────────────────────────────┘
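
To keep co-located workloads from starving each other, one option is to cap each process's share of VRAM. PyTorch exposes a per-process cap (hardware-level partitioning via NVIDIA MIG is the stricter alternative); a sketch for the split above:

# Cap this process's VRAM so a co-tenant can share the GPU
import torch

torch.cuda.set_per_process_memory_fraction(0.25, device=0)  # ~20GB of 80GB
# model = load_model_a()  # hypothetical loader, now bounded to its slice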

Summary

DEDICATED GPU IS GREAT FOR:
✓ Training workloads
✓ High-throughput inference
✓ Multi-GPU clusters
✓ Latency-critical applications
✓ Regulated/sensitive workloads
✓ High utilization (>50%)

DEDICATED GPU CHALLENGES:
✗ Higher management overhead
✗ Pay for idle time
✗ Requires capacity planning
✗ Manual scaling
✗ Infrastructure expertise needed

What's Next?

In the next chapter, we'll build a decision framework to help you choose between serverless and dedicated GPU for your specific use case.