GPU Platform Overview
Float16 Cloud's GPU Instance service provides dedicated GPU resources with full SSH access for AI and machine learning workloads.
Available GPU Types
| GPU | Provider | Region | On-Demand | Spot | Storage |
|---|---|---|---|---|---|
| H100 | Float16 | Thailand | $4.32/hr | $2.16/hr | $1.00/GB/mo |
View current pricing at GPU Instance > Pricing in the dashboard.
Creating an Instance
Via Dashboard
- Navigate to GPU Instance > Create Instance
- Choose deployment type:
  - Base VM - Full SSH access to a GPU instance
  - One-Click Deployment - Deploy vLLM models instantly
- Configure your instance:
  - Project Name (optional) - A friendly name for your project
  - Instance Type - Select GPU type (e.g., H100)
  - Volume Size - 50GB to 10,000GB persistent storage
- Click Create Instance
Base VM
Base VM provides full SSH access to a GPU instance with pre-configured CUDA and ML frameworks. Ideal for:
- Custom development environments
- Training jobs
- Running custom services
One-Click Deployment
Deploy vLLM models instantly with preset or custom models. See One-Click Deployment for details.
Instance Lifecycle
GPU instances support lifecycle management:
| Action | Description |
|---|---|
| Start | Launch a new instance |
| Stop | Pause compute (only storage cost charged) |
| Resume | Continue from where you left off |
| Terminate | Permanently delete instance and resources |
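The lifecycle actions above can be sketched as a small state map (illustrative only; the state names mirror the table, not an official Float16 API):

```python
# Illustrative sketch of the instance lifecycle from the table above.
# These states and transitions mirror the dashboard actions; they are
# not an official Float16 API.
TRANSITIONS = {
    "created": {"Start"},
    "running": {"Stop", "Terminate"},
    "stopped": {"Resume", "Terminate"},  # only storage is billed here
    "terminated": set(),                 # permanent: nothing follows
}

NEXT_STATE = {
    "Start": "running",
    "Stop": "stopped",
    "Resume": "running",
    "Terminate": "terminated",
}

def apply(state: str, action: str) -> str:
    """Return the next state, or raise if the action is not allowed."""
    if action not in TRANSITIONS[state]:
        raise ValueError(f"cannot {action} from state {state!r}")
    return NEXT_STATE[action]
```

The key point the map captures: Stop and Resume cycle between running and stopped, while Terminate is one-way.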
Cost Savings with Stop/Resume
When an instance is stopped:
- No compute cost is charged
- Only volume storage cost applies ($1.00/GB/mo)
- Your data and environment are preserved
This allows you to save costs while preserving your work.
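To make the savings concrete, here is a rough monthly cost comparison using the rates quoted above ($4.32/hr on-demand H100 compute, $1.00/GB/mo storage). This is a sketch that assumes a 30-day month:

```python
# Rough monthly cost comparison for a 100 GB H100 instance, using the
# rates quoted above: $4.32/hr on-demand compute, $1.00/GB/mo storage.
# Assumes a 30-day month for simplicity.
COMPUTE_PER_HOUR = 4.32      # H100 on-demand, USD
STORAGE_PER_GB_MONTH = 1.00  # USD
HOURS_PER_MONTH = 24 * 30

def monthly_cost(volume_gb: int, hours_running: float) -> float:
    """Compute + storage cost for one month, in USD."""
    return hours_running * COMPUTE_PER_HOUR + volume_gb * STORAGE_PER_GB_MONTH

running_all_month = monthly_cost(100, HOURS_PER_MONTH)  # 720 h running
stopped_all_month = monthly_cost(100, 0)                # stopped: storage only
print(round(running_all_month, 2))  # 3210.4
print(round(stopped_all_month, 2))  # 100.0
```

Stopping the same 100 GB instance for a month costs $100 in storage instead of over $3,000 in compute plus storage.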
Storage
Persistent Volumes
Each instance includes persistent storage that survives instance restarts:
- Size: 50GB to 10,000GB
- Backed by: NetApp Trident
- Cost: $1.00/GB/month
Volume Management
Manage volumes at GPU Instance > Volume:
- View total volumes and storage usage
- Create standalone volumes
- Monitor volume health status
Connecting to Your Instance
After creating a Base VM instance:
- Go to GPU Instance > Instances
- Find your running instance
- Copy the SSH command provided
- Connect via terminal:
  `ssh root@<your-instance-ip>`
Endpoint Proxy
Access services running on your GPU instances via secure proxy endpoints:
- Format: `https://proxy-instance.float16.cloud/{task_id}/{port}/{path}`
- Ports: 3000-4000 supported
- Compatible with: vLLM, custom APIs, Jupyter, and more
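A small helper for building proxy URLs in the format above (a sketch; the `task_id` value shown is a placeholder for the ID your instance displays in the dashboard):

```python
# Build a proxy endpoint URL in the documented format:
#   https://proxy-instance.float16.cloud/{task_id}/{port}/{path}
# The task_id used below is a placeholder; substitute the ID shown
# for your instance in the dashboard.
BASE = "https://proxy-instance.float16.cloud"

def proxy_url(task_id: str, port: int, path: str = "") -> str:
    """Return the proxy URL for a service listening on the given port."""
    if not 3000 <= port <= 4000:
        raise ValueError("only ports 3000-4000 are supported")
    return f"{BASE}/{task_id}/{port}/{path.lstrip('/')}"

print(proxy_url("my-task-id", 3000, "v1/models"))
# https://proxy-instance.float16.cloud/my-task-id/3000/v1/models
```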
For vLLM deployments, use the OpenAI Python SDK:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://proxy-instance.float16.cloud/{task_id}/3000/v1",
)
```
Billing
- Billing starts when the instance is created
- Billing stops when the instance is terminated
- Minimum increment: 1 minute for compute
- Stopped instances: Only storage cost charged
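As a sketch of how per-minute billing works out in practice, the pro-rating below applies the on-demand H100 rate with the stated 1-minute minimum increment (an illustration, not Float16's exact metering logic):

```python
import math

COMPUTE_PER_HOUR = 4.32  # H100 on-demand rate from the pricing table

def compute_charge(seconds_running: float) -> float:
    """Approximate compute charge in USD, billed in 1-minute increments.

    This pro-rating illustrates the stated 1-minute minimum increment;
    it is not Float16's exact metering logic.
    """
    minutes = max(1, math.ceil(seconds_running / 60))
    return minutes * COMPUTE_PER_HOUR / 60

print(round(compute_charge(90 * 60), 2))  # 90 minutes of H100: 6.48
```

Even a 30-second run is billed for one full minute under the minimum increment.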
Next Steps
- One-Click Deployment - Deploy LLM models instantly
- Volumes & Storage - Learn more about storage options