AI Services Overview

Explore Float16's AI capabilities including LLM deployment, vLLM Playground, and OCR

Float16 provides AI capabilities through two main services: GPU Instance for dedicated LLM deployments and Serverless GPU for pre-configured AI templates.

AI Capabilities

One-Click Deployment

Deploy vLLM models instantly on dedicated GPU instances:

  • Preset Models: Qwen, Llama, Typhoon, Gemma, GLM, and more
  • Custom Models: Deploy any HuggingFace model
  • vLLM Framework: High-throughput serving with PagedAttention
  • OpenAI-Compatible API: Use with OpenAI Python SDK

Access via GPU Instance > Create Instance > One-Click Deployment
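
Because the deployment exposes an OpenAI-compatible API, the endpoint can be called with plain HTTP as well as with the OpenAI SDK. A minimal standard-library sketch; the endpoint URL, API key, and model id below are hypothetical placeholders:

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your instance's endpoint proxy and key.
ENDPOINT = "https://YOUR-INSTANCE.float16.cloud/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def build_payload(model: str, prompt: str) -> dict:
    """Standard OpenAI-style chat-completion body, as accepted by vLLM."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload to the endpoint and return the model's reply."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires a running instance:
# print(chat("Qwen/Qwen3-30B-A3B", "Hello!"))
```

The same request body works unchanged with the OpenAI Python SDK by pointing `base_url` at the endpoint proxy.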

Learn more about One-Click Deployment

vLLM Playground

Interactive environment for testing deployed models:

  • Tool Calling: Test function calling with example tools
  • Structured Outputs: JSON Schema, Regex patterns, Choice constraints
  • Typhoon OCR: Extract text from Thai/English documents
  • View Code: Copy Python, cURL, or JSON examples

The playground is available after deploying a model via One-Click Deployment.
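
The structured-output modes can also be exercised from code rather than the playground UI. A sketch of a JSON-Schema-constrained request body; `guided_json` is vLLM's guided-decoding field, and the schema, prompt, and model id here are illustrative assumptions:

```python
import json

# Illustrative schema: constrain the model to emit a name/age object.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# vLLM's OpenAI-compatible server accepts guided-decoding fields such as
# "guided_json" alongside the standard chat-completion body.
payload = {
    "model": "Qwen/Qwen3-30B-A3B",  # any deployed model
    "messages": [
        {"role": "user", "content": "Extract the person: Alice is 30 years old."}
    ],
    "guided_json": person_schema,
}

print(json.dumps(payload, indent=2))
```

Regex and choice constraints follow the same pattern with vLLM's `guided_regex` and `guided_choice` fields.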

Explore the vLLM Playground

Blueprints

Pre-configured AI templates for Serverless GPU:

  • Text: Qwen3 30B A3B, Gemma3-27b, Typhoon2.1-gemma3-12b
  • OCR: Thai Document OCR
  • Genomics: Parabricks fq2bam

Access via Serverless GPU > Blueprint

Learn about Blueprints

Typhoon OCR

Extract text from Thai and English documents:

  • Thai Document OCR Blueprint: Available in Serverless GPU
  • Typhoon OCR in vLLM Playground: Test with deployed models
  • Document Processing: Images and scanned documents
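
Vision-style OCR requests embed the document image directly in the chat message. A hedged sketch of building such a request body; the model id and prompt wording are assumptions, and the placeholder bytes stand in for a real scanned page:

```python
import base64

def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL for an OpenAI-style message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

def build_ocr_payload(image_bytes: bytes) -> dict:
    """Chat-completion body pairing an instruction with the document image."""
    return {
        "model": "typhoon-ocr",  # illustrative model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text from this document."},
                {"type": "image_url",
                 "image_url": {"url": image_to_data_url(image_bytes)}},
            ],
        }],
    }

# Stand-in bytes; in practice read your page, e.g. open("doc.png", "rb").read()
payload = build_ocr_payload(b"\x89PNG...")
```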

Learn about Typhoon OCR

Resources

Float16 provides additional AI tools:

  • Chatbot (chat.float16.cloud): Chat interface for interacting with LLM models
  • Quantize (quantize.float16.cloud): Compare LLM inference speeds across quantization levels
  • Colab (colab.float16.cloud): Online Python code editor
  • Prompt (prompt.float16.cloud): Create, run, and share prompts

Getting Started

Deploy an LLM

  1. Sign in at app.float16.cloud
  2. Navigate to GPU Instance > Create Instance
  3. Select the One-Click Deployment tab
  4. Choose a preset model or add a custom HuggingFace model
  5. Configure volume size and click Create Instance
  6. Access your model via the endpoint proxy
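
After step 6, the endpoint proxy can be sanity-checked by listing the served models; `/v1/models` is the standard OpenAI-compatible route, and the base URL and key below are placeholders:

```python
import json
import urllib.request

def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """GET request for the OpenAI-compatible model listing."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def list_models(base_url: str, api_key: str) -> list[str]:
    """Return the ids of the models the instance is currently serving."""
    with urllib.request.urlopen(build_models_request(base_url, api_key)) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# Requires a running instance (placeholder URL):
# print(list_models("https://YOUR-INSTANCE.float16.cloud", "YOUR_API_KEY"))
```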

Use a Blueprint

  1. Navigate to Serverless GPU > Blueprint
  2. Browse templates by category (Text, OCR, Parabricks)
  3. Click on a blueprint to view details
  4. Test with the chat interface
  5. Click Deploy Blueprint

Pricing

  • GPU Instance (H100): $4.32/hr On-Demand, $2.16/hr Spot
  • Serverless GPU (H100): $4.32/hr, billed per second ($0.0012/sec)
  • Storage: $1.00/GB/month
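
The per-second serverless rate is the hourly rate divided by 3600, which makes short invocations easy to cost out; a quick check of the published rates:

```python
HOURLY_RATE = 4.32        # USD per hour, Serverless GPU (H100)
PER_SECOND_RATE = 0.0012  # USD per second; equals 4.32 / 3600

def serverless_cost(seconds: float) -> float:
    """Cost in USD of a serverless GPU invocation billed per second."""
    return round(seconds * PER_SECOND_RATE, 4)

# The two published rates agree: 0.0012 * 3600 == 4.32
assert abs(PER_SECOND_RATE * 3600 - HOURLY_RATE) < 1e-9
```

For example, a 90-second invocation costs 90 × $0.0012 = $0.108.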

View pricing at GPU Instance > Pricing or Serverless GPU > Pricing.

Next Steps

Tags: ai, llm, ocr, inference, overview
Last updated: February 1, 2025 · 3 min read