Chapter 4 of 8

CPU vs GPU for LLM Serving

A comparison of CPUs and GPUs for LLM inference

GPU Advantages

- Parallel processing: thousands of cores work on matrix operations simultaneously
- High memory bandwidth: 3.35 TB/s on an H100 versus roughly 100 GB/s on an x86 CPU
- Specialized Tensor Cores: dedicated hardware for the matrix multiplications that dominate AI workloads
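
Memory bandwidth matters because generating each token requires reading the model weights from memory. The sketch below is a rough back-of-the-envelope estimate, not a benchmark: it assumes single-stream decoding is purely bandwidth-bound and ignores compute, KV-cache traffic, and batching. The function name and the 7B model with FP16 and 4-bit weights are illustrative assumptions; the bandwidth figures are the ones quoted above.

```python
def decode_tokens_per_sec(bandwidth_gb_s: float,
                          params_billions: float,
                          bytes_per_param: float) -> float:
    """Upper bound on single-stream decode speed if weight reads dominate.

    Assumption: every generated token reads all weights once from memory.
    """
    model_size_gb = params_billions * bytes_per_param  # 1e9 params * bytes = GB
    return bandwidth_gb_s / model_size_gb


H100_GB_S = 3350.0  # 3.35 TB/s, from the text above
CPU_GB_S = 100.0    # ~100 GB/s, from the text above

for name, bandwidth in [("H100", H100_GB_S), ("CPU", CPU_GB_S)]:
    fp16 = decode_tokens_per_sec(bandwidth, 7, 2.0)  # 7B model, 2 bytes/param
    int4 = decode_tokens_per_sec(bandwidth, 7, 0.5)  # 7B model, 0.5 bytes/param
    print(f"{name}: ~{fp16:.0f} tok/s at FP16, ~{int4:.0f} tok/s at 4-bit")
```

Under these assumptions the GPU-to-CPU gap equals the bandwidth ratio. Real servers batch many requests together, so aggregate throughput can exceed this single-stream bound.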

CPU Use Cases

- Small models (< 1B parameters)
- Low traffic: cost-effective for sporadic usage
- Edge deployment: runs where no GPU is available
- Development and testing: easier setup

Performance Comparison

Metric	GPU (H100)	CPU (x86)	GPU/CPU ratio
Tokens/sec (7B model)	500+	20-50	10-25x
Memory bandwidth	3.35 TB/s	~100 GB/s	~33x
Cost per hour	$2-4	$0.10-0.50	~10x (GPU costs more)
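
Hourly price alone can mislead, so it helps to convert the table's figures into cost per million tokens. The sketch below is illustrative arithmetic only: it assumes the hardware is fully utilized for the whole hour and takes 500 tokens/sec as the GPU figure. The function name is hypothetical.

```python
def cost_per_million_tokens(cost_per_hour: float, tokens_per_sec: float) -> float:
    """Dollar cost to generate one million tokens at full utilization."""
    tokens_per_hour = tokens_per_sec * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000


# GPU: $2-4 per hour at 500 tokens/sec (table figures)
gpu_best = cost_per_million_tokens(2.0, 500)
gpu_worst = cost_per_million_tokens(4.0, 500)

# CPU: $0.10-0.50 per hour at 20-50 tokens/sec (table figures)
cpu_best = cost_per_million_tokens(0.10, 50)
cpu_worst = cost_per_million_tokens(0.50, 20)

print(f"GPU: ${gpu_best:.2f} to ${gpu_worst:.2f} per million tokens")
print(f"CPU: ${cpu_best:.2f} to ${cpu_worst:.2f} per million tokens")
```

With these inputs the per-token costs overlap at full load, which suggests the CPU's advantage comes mainly from low or sporadic traffic, where a GPU would sit idle while still being billed.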

Decision Framework

Use GPU when:
- Model > 7B parameters
- Throughput > 100 req/min
- Required latency < 500 ms

Use CPU when:
- Model < 3B parameters
- Traffic is sporadic
- Budget is constrained
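
The rules above can be sketched as a small helper. This is one possible reading, not a definitive policy: it assumes any single GPU condition is enough to choose a GPU, that a CPU needs a small model plus either sporadic traffic or a tight budget, and that everything in between (for example a 3B-7B model) should be benchmarked on both. The `Workload` class and `recommend_hardware` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    params_billions: float      # model size in billions of parameters
    requests_per_min: float     # expected request rate
    latency_budget_ms: float    # maximum acceptable latency
    sporadic_traffic: bool = False
    tight_budget: bool = False


def recommend_hardware(w: Workload) -> str:
    """Apply the chapter's thresholds; returns 'gpu', 'cpu', or 'benchmark both'."""
    # Assumption: any one GPU condition is sufficient on its own.
    if w.params_billions > 7 or w.requests_per_min > 100 or w.latency_budget_ms < 500:
        return "gpu"
    # Assumption: CPU needs a small model plus at least one cost/traffic reason.
    if w.params_billions < 3 and (w.sporadic_traffic or w.tight_budget):
        return "cpu"
    # The thresholds leave a gap that the rules do not cover.
    return "benchmark both"


print(recommend_hardware(Workload(13, 20, 2000)))                        # large model
print(recommend_hardware(Workload(1, 5, 2000, sporadic_traffic=True)))   # small, sporadic
print(recommend_hardware(Workload(5, 50, 1000)))                         # falls in the gap
```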