Chapter 4 of 8 • 1 min read
Chapter 4: CPU vs GPU for LLM Serving
A comparison of CPU and GPU for LLM inference
GPU Advantages
- Parallel Processing - Thousands of cores for matrix operations
- High Memory Bandwidth - H100: 3.35 TB/s vs CPU: ~100 GB/s
- Specialized Tensor Cores - Optimized for AI workloads
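The bandwidth gap above translates directly into a throughput ceiling: autoregressive decoding is memory-bandwidth-bound, because generating each token requires streaming the full weight set through memory. A rough sketch of that upper bound, using the bandwidth figures from the bullets (the single-pass weight-read model is a simplification; batching and quantization change the picture):

```python
# Back-of-envelope model: single-stream decode reads all weights once per
# token, so tokens/sec is capped at (memory bandwidth) / (weight bytes).

def max_tokens_per_sec(bandwidth_gb_s: float, params_b: float,
                       bytes_per_param: int = 2) -> float:
    """Theoretical decode ceiling for a dense model (fp16 by default)."""
    weight_gb = params_b * bytes_per_param  # e.g. 7B params x 2 bytes = 14 GB
    return bandwidth_gb_s / weight_gb

# H100 (~3350 GB/s) vs a typical server CPU (~100 GB/s) on a 7B fp16 model
gpu_ceiling = max_tokens_per_sec(3350, 7)  # ~239 tok/s single-stream
cpu_ceiling = max_tokens_per_sec(100, 7)   # ~7 tok/s single-stream
print(f"GPU: {gpu_ceiling:.0f} tok/s, CPU: {cpu_ceiling:.0f} tok/s")
```

Note that batched serving amortizes each weight read across many requests, which is how real GPU deployments exceed this single-stream ceiling (see the 500+ tok/s figure below).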
CPU Use Cases
- Small models (< 1B parameters)
- Low traffic - Cost effective for sporadic usage
- Edge deployment - No GPU required
- Development/testing - Easier setup
Performance Comparison
| Metric | GPU (H100) | CPU (x86) | Ratio (GPU : CPU) |
|---|---|---|---|
| Tokens/sec (7B) | 500+ | 20-50 | 10-25x |
| Memory Bandwidth | 3.35 TB/s | ~100 GB/s | ~33x |
| Cost/hour | $2-4 | $0.10-0.50 | ~10x |
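Hourly price alone is misleading; what matters is cost per token served. A quick sketch using midpoint figures from the table (illustrative numbers, not vendor quotes):

```python
# Cost per million tokens = hourly price / tokens served per hour, scaled.
# Figures are midpoints from the comparison table, purely illustrative.

def usd_per_million_tokens(cost_per_hour: float, tokens_per_sec: float) -> float:
    return cost_per_hour / (tokens_per_sec * 3600) * 1_000_000

gpu_cost = usd_per_million_tokens(3.00, 500)  # H100 at $3/hr, 500 tok/s
cpu_cost = usd_per_million_tokens(0.30, 30)   # CPU at $0.30/hr, 30 tok/s
print(f"GPU: ${gpu_cost:.2f}/M tokens, CPU: ${cpu_cost:.2f}/M tokens")
```

At full utilization the GPU is actually cheaper per token, which is why the CPU's cost advantage in the table only holds for sporadic traffic, where the GPU would sit idle.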
Decision Framework
Use GPU when:
- Model > 7B parameters
- Throughput > 100 req/min
- Latency < 500ms required
Use CPU when:
- Model < 3B parameters
- Traffic is sporadic
- Budget is tight
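The rules of thumb above can be encoded as a small helper. The thresholds come straight from the lists; treating the 3-7B gap as "benchmark both" and reading "sporadic" as under ~10 req/min are my assumptions, not part of the framework:

```python
def choose_backend(params_b: float, req_per_min: float,
                   latency_budget_ms: float, sporadic: bool = False) -> str:
    """Apply the chapter's heuristics; thresholds are rough, not hard limits."""
    # GPU triggers: large model, high throughput, or tight latency budget
    if params_b > 7 or req_per_min > 100 or latency_budget_ms < 500:
        return "gpu"
    # CPU fits: small model with sporadic traffic (~<10 req/min assumed here)
    if params_b < 3 and (sporadic or req_per_min < 10):
        return "cpu"
    return "benchmark both"  # 3-7B middle ground: measure before committing

print(choose_backend(7, 200, 300))                 # high-traffic, tight latency
print(choose_backend(1, 5, 2000, sporadic=True))   # small model, light load
```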