
Serverless GPU

GPU Computing on Demand - Pay Only When You Compute

Transform your AI workflow with instant GPU access - no server management, no hourly fees

Zero Infrastructure Management.
Forget about provisioning, drivers, or Docker builds. Just upload your code or model, and we handle the setup, drivers, and runtime on high-performance H100s.
Instant Start, No Cold Boots.
Launch jobs in seconds with near-zero init time. All runtimes are optimized to skip cold starts — perfect for both real-time inference and quick experiments.
Efficient, Pay-Per-Use Execution.
Only pay when your code is running. Whether it’s a 10-second inference call or a multi-hour training run, serverless scheduling with on-demand and spot pricing keeps costs under control.

Serverless, But Actually Built for AI

Most serverless platforms weren’t designed for the needs of AI workloads. Float16 changes that with GPU-native, developer-friendly serverless that just works.

| | Traditional Serverless | Float16 Serverless |
|---|---|---|
| Startup Time | Cold starts (slow, minutes) | ⚡ Instant start, no cold boots |
| GPU Access | Limited or unavailable | ✅ High-performance NVIDIA H100s |
| Code Compatibility | Requires container/image setup | 🧠 Run `.py` scripts directly |
| Model & Weight Handling | Manual load in every run | 🪄 Pre-loaded weights and cache |
| Pricing Model | Flat rate / idle costs | 💰 True pay-per-use |
| Dev Workflow | Designed for generic workloads | 🎯 Built for AI training & inference |
| Batch / Spot Job Support | Rare / manual configuration | 🖥️ Built-in spot mode support |

How It Works

Get Started with Simple Steps

Deploy Mode: Get an endpoint for continuous access

float16 deploy app.py

Run Mode: Quick compute and get results

float16 run app.py
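
The page does not show what `app.py` contains. As a rough illustration only, a script of the kind these commands take could look like the sketch below. It assumes PyTorch may be present in the runtime, which the page does not confirm, and the function name `describe_device` is made up for this example.

```python
# app.py: hypothetical example script, not taken from Float16's documentation.
import importlib.util


def describe_device() -> str:
    """Report which compute device this script can see."""
    # Assumption: PyTorch may or may not be installed in the runtime.
    if importlib.util.find_spec("torch") is None:
        return "cpu (torch not installed)"

    import torch

    if torch.cuda.is_available():
        # Name of the first visible GPU, e.g. an H100 on a GPU worker.
        return f"cuda: {torch.cuda.get_device_name(0)}"
    return "cpu"


if __name__ == "__main__":
    print(f"Running on {describe_device()}")
```

Run locally, the script falls back to reporting the CPU, so it can be checked before it is submitted with either command.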

Perfect For

AI Development & Testing

Fast iteration on your ML experiments. Perfect for quick model adjustments and rapid testing cycles without infrastructure overhead.

Periodic Model Inference

Run predictions exactly when needed. Ideal for batch processing and occasional inference tasks without paying for idle time.

Research Projects

Focus on research, not infrastructure. Great for academic work and experiments with varying computational demands.

Prototype Deployment

Test ideas without long-term commitments. Suitable for MVPs and proof-of-concepts that need professional-grade GPU power.

Serverless GPUs with True Pay-Per-Use Pricing

Start instantly with per-second billing on H100 GPUs and pay only for what you use — no setup, no idle costs. Whether you're deploying LLMs or running batch training jobs, our pricing is designed to scale with your workload.

Price

| GPU Types | On-demand | Spot |
|---|---|---|
| H100 | $0.006 / sec | $0.0012 / sec |

Storage: $5.184 / GB / Month

CPU & Memory: included
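
To make the per-second rates concrete, here is a minimal cost estimator using the listed prices. It is a sketch under assumptions the page does not state: a single H100, billing that scales exactly with seconds used, and no minimum charge or rounding. The function names are illustrative, not part of any Float16 API.

```python
from decimal import Decimal

# Rates copied from the price list (USD per second on one H100).
H100_RATE_PER_SEC = {
    "on-demand": Decimal("0.006"),
    "spot": Decimal("0.0012"),
}
# Storage rate copied from the price list (USD per GB per month).
STORAGE_PER_GB_MONTH = Decimal("5.184")


def gpu_cost(seconds, mode="on-demand"):
    """Estimated GPU cost in USD, assuming exact per-second billing."""
    return H100_RATE_PER_SEC[mode] * Decimal(str(seconds))


def storage_cost(gigabytes, months=1):
    """Estimated storage cost in USD for a number of GB over whole months."""
    return STORAGE_PER_GB_MONTH * Decimal(str(gigabytes)) * Decimal(str(months))


if __name__ == "__main__":
    jobs = [("10-second inference", 10), ("1-hour training run", 3600)]
    for label, seconds in jobs:
        for mode in H100_RATE_PER_SEC:
            print(f"{label} ({mode}): ${gpu_cost(seconds, mode):.4f}")
    print(f"10 GB stored for one month: ${storage_cost(10):.2f}")
```

At these rates a 10-second inference costs $0.06 on-demand, and a one-hour job costs $21.60 on-demand or $4.32 on spot, so the spot rate is one fifth of the on-demand rate.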

Your First Serverless GPU Task

Zero setup. Zero idle costs. Pure computing power.