- Zero Infrastructure Management.
- Forget about provisioning, drivers, or Docker builds. Just upload your code or model, and we handle the setup and runtime on high-performance NVIDIA H100s.
- Instant Start, No Cold Boots.
- Launch jobs in seconds with near-zero init time. All runtimes are optimized to skip cold starts — perfect for both real-time inference and quick experiments.
- Efficient, Pay-Per-Use Execution.
- Only pay when your code is running. Whether it’s a 10-second inference or multi-hour training, serverless scheduling with on-demand and spot pricing keeps costs under control.
Serverless, But Actually Built for AI
Most serverless platforms weren’t designed for the needs of AI workloads. Float16 changes that with GPU-native, developer-friendly serverless that just works.
| Feature | Traditional Serverless | Float16 Serverless |
|---|---|---|
| Startup Time | Cold starts (slow, minutes) | ⚡ Instant start, no cold boots |
| GPU Access | Limited or unavailable | ✅ High-performance NVIDIA H100s |
| Code Compatibility | Requires container/image setup | 🧠 Run `.py` scripts directly |
| Model & Weight Handling | Manual load in every run | 🪄 Pre-loaded weights and cache |
| Pricing Model | Flat rate / idle costs | 💰 True pay-per-use |
| Dev Workflow | Designed for generic workloads | 🎯 Built for AI training & inference |
| Batch / Spot Job Support | Rare / manual configuration | 🖥️ Built-in spot mode support |
How It Works
Get Started with Simple Steps
Deploy Mode: Get a persistent endpoint for continuous access

```shell
float16 deploy app.py
```

Run Mode: Run a one-off compute job and get the results back

```shell
float16 run app.py
```
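As an illustration of what these commands operate on, `app.py` can be an ordinary Python script; the body below is a toy sketch, since the platform runs plain `.py` files directly, and the `predict` function is a hypothetical stand-in for a real model call, not part of any Float16 SDK:

```python
# app.py - a plain Python script of the kind `float16 run` executes.
# predict() is a toy stand-in for a model forward pass; no Float16-specific
# imports or GPU code are assumed here.

def predict(values):
    """Pretend inference: scale each input by a fixed 'weight'."""
    weight = 2.0
    return [v * weight for v in values]

if __name__ == "__main__":
    # `float16 run app.py` would execute this entry point and return the output.
    print(predict([1.0, 2.0, 3.0]))
```

The same file works for both modes: `float16 run` executes it once, while `float16 deploy` keeps it available behind an endpoint.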
Perfect For
AI Development & Testing
Fast iteration on your ML experiments. Perfect for quick model adjustments and rapid testing cycles without infrastructure overhead.
Periodic Model Inference
Run predictions exactly when needed. Ideal for batch processing and occasional inference tasks without paying for idle time.
Research Projects
Focus on research, not infrastructure. Great for academic work and experiments with varying computational demands.
Prototype Deployment
Test ideas without long-term commitments. Suitable for MVPs and proof-of-concepts that need professional-grade GPU power.
Serverless GPUs with True Pay-Per-Use Pricing
Start instantly with per-second billing on H100 GPUs and pay only for what you use — no setup, no idle costs. Whether you're deploying LLMs or running batch training jobs, our pricing is designed to scale with your workload.
Price

| GPU Type | On-demand | Spot |
|---|---|---|
| H100 | $0.006 / sec | $0.0012 / sec |

- Storage: $5.184 / GB / month
- CPU & Memory: included
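With per-second billing, costs are easy to estimate up front. The sketch below assumes a hypothetical 10-minute job and simply multiplies its duration by the H100 rates listed above:

```python
# Back-of-the-envelope cost estimate using the published per-second H100 rates.
ON_DEMAND_PER_SEC = 0.006   # $ / sec, from the pricing table
SPOT_PER_SEC = 0.0012       # $ / sec, from the pricing table

def job_cost(duration_sec, rate_per_sec):
    """Pay-per-use: you are billed only for the seconds the job actually runs."""
    return duration_sec * rate_per_sec

ten_minutes = 10 * 60  # 600 seconds
print(f"on-demand: ${job_cost(ten_minutes, ON_DEMAND_PER_SEC):.2f}")  # $3.60
print(f"spot:      ${job_cost(ten_minutes, SPOT_PER_SEC):.2f}")       # $0.72
```

At these rates a spot job costs one fifth of the on-demand price, which is why batch and fault-tolerant workloads are a natural fit for spot mode.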