Quick Start Guide
This guide will help you deploy your first GPU instance on Float16 Cloud in just a few minutes.
Prerequisites
Before you begin, make sure you have:
- A Google account to sign in to Float16 Cloud
- Basic familiarity with terminal/command line (for GPU Instance SSH access)
- An SSH key pair for GPU Instance access (if you don't have one, generate it with ssh-keygen -t ed25519)
Step 1: Create Your Account
- Visit app.float16.cloud
- Click Sign in with Google to create your account using Google SSO
- Authorize Float16 Cloud to access your Google account
- You'll be automatically redirected to the dashboard
Step 2: Explore the Dashboard
After signing in, you'll see the Float16 dashboard with two main services:
Services
- Serverless GPU - Instantly access powerful GPUs, pay only for usage, and run AI workloads without managing infrastructure
- GPU Instance - Create dedicated GPU instances with SSH access for development and training
Resources
- Colab - Online tool to write and execute Python code through the browser
- Quantize - Compare LLM inference speeds with different quantization techniques
- Chatbot - Chat interface to interact with LLM models
- Prompt - Create, run and share prompts with your colleagues
- Apply Coupon - Redeem coupon codes to get credits
Step 3: Create Your First GPU Instance
Option A: Serverless GPU (Recommended for Quick Start)
- Click Serverless GPU in the sidebar
- Go to Projects tab
- Click Create Project
- Select a GPU instance type from the dropdown
- Enter a project name (optional)
- Click Create Project
Option B: GPU Instance with SSH Access
- Click GPU Instance in the sidebar
- Go to Create Instance tab
- Choose between:
  - Base VM - Full SSH access to a GPU instance
  - One-Click Deployment - Deploy vLLM models instantly
- Select an instance type (e.g., H100)
- Configure volume size (50-10000 GB)
- Click Create Instance
Step 4: Connect to GPU Instance (Base VM)
If you created a Base VM instance with SSH access:
- Go to GPU Instance > Instances
- Find your running instance
- Copy the SSH command provided
- Connect via terminal:

```shell
ssh root@<your-instance-ip>
```
Step 5: Run Your First GPU Workload
Test that everything is working:
```shell
nvidia-smi
```
You should see your GPU information displayed.
Run a simple PyTorch test:

```python
import torch

# Check if CUDA is available before querying the GPU
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

    # Run a simple tensor operation on the GPU
    x = torch.randn(1000, 1000).cuda()
    y = torch.matmul(x, x)
    print(f"Tensor shape: {y.shape}")
```
Alternative Quick Starts
Deploy an LLM with One-Click Deployment
Deploy vLLM models instantly without managing infrastructure:
- Navigate to GPU Instance > Create Instance
- Select the One-Click Deployment tab
- Choose from preset models:
  - Qwen - Alibaba's large language models (8B to 235B)
  - Llama - Meta's Llama 3.3 70B
  - Typhoon - SCB10X models including OCR
  - GLM - ZAI's GLM 4.7 Flash
- Or add a custom HuggingFace model
- Configure volume size and click Create Instance
Access your model via the proxy endpoint:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://proxy-instance.float16.cloud/{task_id}/3000/v1"
)

response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
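Before wiring up a client, you can sanity-check that the deployment is live. A minimal stdlib sketch of such a check, assuming the proxy's OpenAI-compatible API exposes the standard /models route (as vLLM servers typically do); the task_id and API key are placeholders:

```python
import json
import urllib.request

def models_request(task_id: str, api_key: str) -> urllib.request.Request:
    # An OpenAI-compatible server lists its deployed models at GET {base}/models;
    # the returned model id is what you pass as `model` in chat requests.
    url = f"https://proxy-instance.float16.cloud/{task_id}/3000/v1/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})

req = models_request("YOUR_TASK_ID", "YOUR_API_KEY")
# json.load(urllib.request.urlopen(req)) would return the model list.
```

If this request returns a model entry, the endpoint is ready for chat completions.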
Extract Text with OCR
Process documents with Typhoon OCR:
- Navigate to AI Services > OCR
- Upload a document or image
- View extracted text
Or use the API:
```shell
curl -X POST https://api.float16.cloud/v1/ocr/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf"
```
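The same call can be made from Python using only the standard library. A sketch, where the endpoint and the "file" form field come from the curl example and the rest is generic multipart encoding:

```python
import mimetypes
import urllib.request
import uuid

def build_ocr_request(api_key: str, filename: str, file_bytes: bytes) -> urllib.request.Request:
    # Build a multipart/form-data POST equivalent to the curl example above.
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + file_bytes + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        "https://api.float16.cloud/v1/ocr/extract",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

req = build_ocr_request("YOUR_API_KEY", "document.pdf", b"%PDF-")
# urllib.request.urlopen(req) would send it and return the extracted text.
```

In practice you would read file_bytes from disk (e.g., open("document.pdf", "rb").read()) before building the request.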
Train with ML Frameworks
Launch a pre-configured training environment:
- Navigate to Instances > Create
- Select a framework image:
  - float16/tao:6.0 - Computer vision
  - float16/monai:1.5.1 - Medical imaging
  - float16/nemo:2.6.1 - Speech/NLP
- Choose your GPU and launch
Next Steps
Now that you have your first instance running, explore:
- GPU Platform Guide - Learn advanced features
- Serverless Deployment - Deploy models as APIs
- AI Services - LLMs, OCR, and more
- ML Training - TAO, MONAI, NeMo
- One-Click Deployment - Deploy models instantly
- API Reference - Automate with our API
Troubleshooting
Can't connect via SSH?
- Verify your SSH key is added correctly
- Check that the instance is in "Running" state
- Ensure your IP isn't blocked by firewall rules
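To rule out a network problem quickly, check whether the instance's SSH port is reachable at all. A small stdlib helper (the host is a placeholder for your instance IP):

```python
import socket

def ssh_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    # Return True if a TCP connection to host:port succeeds.
    # A False result points at a firewall block or a stopped instance,
    # rather than an SSH key problem.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: ssh_port_open("<your-instance-ip>")
```

If the port is open but authentication still fails, re-check the SSH key rather than the network.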
GPU not detected?
- Try rebooting the instance
- Check the instance logs in the dashboard
- Contact support if the issue persists