AI Services Overview

Explore Float16's AI capabilities including LLM deployment, vLLM Playground, and OCR

Float16 provides AI capabilities through two main services: GPU Instance for dedicated LLM deployments and Serverless GPU for pre-configured AI templates.

AI Capabilities

One-Click Deployment

Deploy vLLM models instantly on dedicated GPU instances:

  • Preset Models: Qwen, Llama, Typhoon, Gemma, GLM, and more
  • Custom Models: Deploy any HuggingFace model
  • vLLM Framework: High-throughput serving with PagedAttention
  • OpenAI-Compatible API: Use with OpenAI Python SDK

Access via GPU Instance > Create Instance > One-Click Deployment
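
Because the deployment exposes an OpenAI-compatible API, the endpoint can be called with plain HTTP as well as with the OpenAI SDK. A minimal standard-library sketch; the endpoint URL, API key, and model id below are hypothetical placeholders:

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your instance's endpoint proxy and key.
ENDPOINT = "https://YOUR-INSTANCE.float16.cloud/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def build_payload(model: str, prompt: str) -> dict:
    """Standard OpenAI-style chat-completion body, as accepted by vLLM."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload to the endpoint and return the model's reply."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires a running instance:
# print(chat("Qwen/Qwen3-30B-A3B", "Hello!"))
```

The same request body works unchanged with the OpenAI Python SDK by pointing `base_url` at the endpoint proxy.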

Learn more about One-Click Deployment

vLLM Playground

Interactive environment for testing deployed models:

  • Tool Calling: Test function calling with example tools
  • Structured Outputs: JSON Schema, Regex patterns, Choice constraints
  • Typhoon OCR: Extract text from Thai/English documents
  • View Code: Copy Python, cURL, or JSON examples

The playground is available after deploying a model via One-Click Deployment.
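
The structured-output modes can also be exercised from code rather than the playground UI. A sketch of a JSON-Schema-constrained request body; `guided_json` is vLLM's guided-decoding field, and the schema, prompt, and model id here are illustrative assumptions:

```python
import json

# Illustrative schema: constrain the model to emit a name/age object.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# vLLM's OpenAI-compatible server accepts guided-decoding fields such as
# "guided_json" alongside the standard chat-completion body.
payload = {
    "model": "Qwen/Qwen3-30B-A3B",  # any deployed model
    "messages": [
        {"role": "user", "content": "Extract the person: Alice is 30 years old."}
    ],
    "guided_json": person_schema,
}

print(json.dumps(payload, indent=2))
```

Regex and choice constraints follow the same pattern with vLLM's `guided_regex` and `guided_choice` fields.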

Explore the vLLM Playground

Blueprints

Pre-configured AI templates for Serverless GPU:

  • Text: Qwen3 30B A3B, Gemma3-27b, Typhoon2.1-gemma3-12b
  • OCR: Thai Document OCR
  • Genomics: Parabricks fq2bam

Access via Serverless GPU > Blueprint

Learn about Blueprints

Typhoon OCR

Extract text from Thai and English documents:

  • Thai Document OCR Blueprint: Available in Serverless GPU
  • Typhoon OCR in vLLM Playground: Test with deployed models
  • Document Processing: Images and scanned documents
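
Vision-style OCR requests embed the document image directly in the chat message. A hedged sketch of building such a request body; the model id and prompt wording are assumptions, and the placeholder bytes stand in for a real scanned page:

```python
import base64

def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL for an OpenAI-style message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

def build_ocr_payload(image_bytes: bytes) -> dict:
    """Chat-completion body pairing an instruction with the document image."""
    return {
        "model": "typhoon-ocr",  # illustrative model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text from this document."},
                {"type": "image_url",
                 "image_url": {"url": image_to_data_url(image_bytes)}},
            ],
        }],
    }

# Stand-in bytes; in practice read your page, e.g. open("doc.png", "rb").read()
payload = build_ocr_payload(b"\x89PNG...")
```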

Learn about Typhoon OCR

Resources

Float16 provides additional AI tools:

  • Chatbot (chat.float16.cloud): Chat interface for interacting with LLM models
  • Quantize (quantize.float16.cloud): Compare LLM inference speeds across quantization levels
  • Colab (colab.float16.cloud): Online Python code editor
  • Prompt (prompt.float16.cloud): Create, run, and share prompts

Getting Started

Deploy an LLM

  1. Sign in at app.float16.cloud
  2. Navigate to GPU Instance > Create Instance
  3. Select the One-Click Deployment tab
  4. Choose a preset model or add a custom HuggingFace model
  5. Configure volume size and click Create Instance
  6. Access your model via the endpoint proxy
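
After step 6, the endpoint proxy can be sanity-checked by listing the served models; `/v1/models` is the standard OpenAI-compatible route, and the base URL and key below are placeholders:

```python
import json
import urllib.request

def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """GET request for the OpenAI-compatible model listing."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def list_models(base_url: str, api_key: str) -> list[str]:
    """Return the ids of the models the instance is currently serving."""
    with urllib.request.urlopen(build_models_request(base_url, api_key)) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# Requires a running instance (placeholder URL):
# print(list_models("https://YOUR-INSTANCE.float16.cloud", "YOUR_API_KEY"))
```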

Use a Blueprint

  1. Navigate to Serverless GPU > Blueprint
  2. Browse templates by category (Text, OCR, Parabricks)
  3. Click on a blueprint to view details
  4. Test with the chat interface
  5. Click Deploy Blueprint

Pricing

  • GPU Instance (H100): $4.32/hr On-Demand, $2.16/hr Spot
  • Serverless GPU (H100): $4.32/hr, billed per second ($0.0012/sec)
  • Storage: $1.00/GB/month
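
The per-second serverless rate is the hourly rate divided by 3600, which makes short invocations easy to cost out; a quick check of the published rates:

```python
HOURLY_RATE = 4.32        # USD per hour, Serverless GPU (H100)
PER_SECOND_RATE = 0.0012  # USD per second; equals 4.32 / 3600

def serverless_cost(seconds: float) -> float:
    """Cost in USD of a serverless GPU invocation billed per second."""
    return round(seconds * PER_SECOND_RATE, 4)

# The two published rates agree: 0.0012 * 3600 == 4.32
assert abs(PER_SECOND_RATE * 3600 - HOURLY_RATE) < 1e-9
```

For example, a 90-second invocation costs 90 × $0.0012 = $0.108.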

View pricing at GPU Instance > Pricing or Serverless GPU > Pricing.

Next Steps

Tags: ai, llm, ocr, inference, overview
Last updated: February 1, 2025 · 3 min read