# AI Services Overview
Float16 provides AI capabilities through two main services: GPU Instance for dedicated LLM deployments and Serverless GPU for pre-configured AI templates.
## AI Capabilities
### One-Click Deployment
Deploy vLLM models instantly on dedicated GPU instances:
- Preset Models: Qwen, Llama, Typhoon, Gemma, GLM, and more
- Custom Models: Deploy any HuggingFace model
- vLLM Framework: High-throughput serving with PagedAttention
- OpenAI-Compatible API: Use with OpenAI Python SDK
Access via GPU Instance > Create Instance > One-Click Deployment
Learn more about One-Click Deployment
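Since the deployed endpoint is OpenAI-compatible, it can be called with the OpenAI Python SDK or with plain HTTP. Below is a minimal stdlib-only sketch of building a chat-completions request; the base URL, API key, and model name are placeholders, not confirmed Float16 values — substitute the endpoint proxy URL shown on your instance page.

```python
import json
import urllib.request

# Placeholders -- replace with your instance's endpoint proxy URL and key.
BASE_URL = "https://your-endpoint-proxy.example/v1"
API_KEY = "your-api-key"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions request for a deployed model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request("Qwen/Qwen2.5-7B-Instruct", "Hello!")
# response = urllib.request.urlopen(req)  # uncomment against a live endpoint
```

The same request shape works through the OpenAI SDK by pointing `base_url` at the endpoint proxy.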
### vLLM Playground
Interactive environment for testing deployed models:
- Tool Calling: Test function calling with example tools
- Structured Outputs: JSON Schema, Regex patterns, Choice constraints
- Typhoon OCR: Extract text from Thai/English documents
- View Code: Copy Python, cURL, or JSON examples
The playground is available after deploying a model via One-Click Deployment.
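The structured-output modes in the playground map onto vLLM's guided decoding. As a sketch, a JSON Schema constraint can be passed in the request body via vLLM's `guided_json` field (the model name and schema here are illustrative, not Float16-specific):

```python
import json

# Illustrative JSON Schema the model's output must conform to.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
}

# Request body for a vLLM OpenAI-compatible server; "guided_json" is a
# vLLM-specific guided-decoding parameter, not part of the OpenAI spec.
body = {
    "model": "your-deployed-model",  # placeholder
    "messages": [{"role": "user", "content": "Weather in Bangkok as JSON."}],
    "guided_json": schema,
}
print(json.dumps(body, indent=2))
```

When using the OpenAI SDK, vLLM-specific fields like `guided_json` go in `extra_body`. The playground's View Code tab shows the equivalent Python, cURL, and JSON forms.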
### Blueprints
Pre-configured AI templates for Serverless GPU:
| Category | Templates |
|---|---|
| Text | Qwen3 30B A3B, Gemma3-27b, Typhoon2.1-gemma3-12b |
| OCR | Thai Document OCR |
| Genomics | Parabricks fq2bam |
Access via Serverless GPU > Blueprint
### Typhoon OCR
Extract text from Thai and English documents:
- Thai Document OCR Blueprint: Available in Serverless GPU
- Typhoon OCR in vLLM Playground: Test with deployed models
- Document Processing: Images and scanned documents
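For image and scanned-document input, an OpenAI-style vision message embedding the file as a base64 data URI is one way to structure the request. A minimal sketch — the model name and prompt are placeholders, not confirmed Float16 values:

```python
import base64

# Build an OpenAI-style vision request body with the document embedded as a
# base64 data URI. Model name and prompt text are illustrative placeholders.
def build_ocr_body(image_bytes: bytes, model: str = "typhoon-ocr") -> dict:
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text from this document."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encoded}"}},
            ],
        }],
    }

body = build_ocr_body(b"\x89PNG...")  # placeholder bytes, not a real image
```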
## Resources
Float16 provides additional AI tools:
| Resource | Description | URL |
|---|---|---|
| Chatbot | Chat interface to interact with LLM models | chat.float16.cloud |
| Quantize | Compare LLM inference speeds with different quantization | quantize.float16.cloud |
| Colab | Online Python code editor | colab.float16.cloud |
| Prompt | Create, run and share prompts | prompt.float16.cloud |
## Getting Started
### Deploy an LLM
1. Sign in at app.float16.cloud
2. Navigate to GPU Instance > Create Instance
3. Select the One-Click Deployment tab
4. Choose a preset model or add a custom HuggingFace model
5. Configure volume size and click Create Instance
6. Access your model via the endpoint proxy
### Use a Blueprint
1. Navigate to Serverless GPU > Blueprint
2. Browse templates by category (Text, OCR, Genomics)
3. Click on a blueprint to view details
4. Test with the chat interface
5. Click Deploy Blueprint
## Pricing
| Service | Pricing |
|---|---|
| GPU Instance (H100) | $4.32/hr (On-Demand), $2.16/hr (Spot) |
| Serverless GPU (H100) | $4.32/hr ($0.0012/sec) |
| Storage | $1.00/GB/month |
View pricing at GPU Instance > Pricing or Serverless GPU > Pricing.
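The per-second Serverless GPU rate in the table is simply the hourly rate divided by 3600, which makes short invocations easy to estimate:

```python
# Serverless GPU (H100) is billed per second; the per-second price is the
# hourly rate divided by 3600 seconds.
HOURLY_RATE_USD = 4.32
PER_SECOND_USD = HOURLY_RATE_USD / 3600  # = $0.0012/sec

# Example: cost of a 90-second serverless invocation.
invocation_cost = PER_SECOND_USD * 90  # = $0.108
```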
## Next Steps
- One-Click Deployment - Deploy vLLM models
- vLLM Playground - Test your models
- Blueprints - Use pre-configured templates
- Typhoon OCR - Process documents