Typhoon OCR

Typhoon OCR is a vision-language OCR model that extracts text from Thai and English documents. Deploy it with One-Click Deployment, then use it through the Typhoon OCR Playground or the OpenAI-compatible API.

Features

Thai and English Support: Native support for both languages
Structured Markdown Output: Clean Markdown without explanations or extra text
HTML Tables: Tables rendered in HTML format
LaTeX Equations: Mathematical equations in LaTeX syntax
Figure Descriptions: Images and charts described in Thai
Checkbox Support: Form checkboxes rendered as ☐ (unchecked) and ☑ (checked)

Deploying Typhoon OCR

Navigate to GPU Instance > Create Instance
Select the One-Click Deployment tab
Choose Typhoon-ocr1.5-2b from the preset models
Configure volume size (minimum 100 GB)
Click Create Instance

Model Information

Property	Value
Model	Typhoon-ocr1.5-2b
Provider	SCB10X
Capabilities	text, vision, typhoon-ocr
Size	6 GB
Recommended Image Size	Max. 1800 px on the longest side

Using the OCR Playground

Navigate to GPU Instance > Instances
Click View on your Typhoon OCR instance (vLLM tag)
Select the Playground tab
Upload your document (PNG, JPG, or PDF)
View the extracted text in markdown format

Playground Interface

Setting	Description
Server Status	Health indicator with refresh button
Port	Port selector (default 3900)
Model	Shows `/model/typhoon-ai/typhoon-ocr1.5-2b`
Clear	Reset the current document
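The Model field shows the model ID that API requests must use. As a minimal sketch, you can confirm which ID a running instance serves by calling the standard OpenAI-compatible `GET /v1/models` endpoint that vLLM exposes. The helper names are hypothetical, and the URL assumes the same proxy pattern and default port 3900 used in the API examples on this page.

```python
import json
import urllib.request


def models_url(instance_id: str, port: int = 3900) -> str:
    """Build the /v1/models URL, assuming the proxy pattern used on this page."""
    return f"https://proxy-instance.float16.cloud/{instance_id}/{port}/v1/models"


def model_ids(payload: dict) -> list[str]:
    """Extract the model IDs from a /v1/models response body."""
    return [model["id"] for model in payload.get("data", [])]


def fetch_model_ids(instance_id: str, port: int = 3900, timeout: float = 10.0) -> list[str]:
    """Query the running server; raises URLError/HTTPError if it is not reachable."""
    with urllib.request.urlopen(models_url(instance_id, port), timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

When the instance is ready, `fetch_model_ids("your-instance-id")` should return a list containing `/model/typhoon-ai/typhoon-ocr1.5-2b`; a connection or HTTP error suggests the server is not up yet.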

Supported File Types

Format	Notes
PNG	Best for screenshots and digital documents
JPG	Good for photos and scanned documents
PDF	Document files

Recommended: Resize images so that the longest side is at most 1800 px for optimal results.
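A minimal sketch of computing target dimensions that respect the 1800 px limit while keeping the aspect ratio, assuming you only ever downscale (the function name `fit_within` is hypothetical):

```python
def fit_within(width: int, height: int, max_dim: int = 1800) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_dim.

    The aspect ratio is preserved, and images that already fit are
    returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    # round() keeps the ratio close to the original; max(1, ...) avoids zero-size sides
    return max(1, round(width * scale)), max(1, round(height * scale))


# An A4 page scanned at 300 DPI is about 2480 x 3508 pixels
print(fit_within(2480, 3508))  # (1273, 1800)
```

If you use Pillow, `Image.thumbnail((1800, 1800))` applies a similar rule in place: it preserves the aspect ratio and never enlarges the image.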

API Usage

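Each instance serves an OpenAI-compatible API through vLLM on port 3900. In the examples below, replace the instance ID placeholder with your own instance ID.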

Python Example

from openai import OpenAI
import base64

# Replace with the ID of your Typhoon OCR GPU instance
INSTANCE_ID = "your-instance-id"

client = OpenAI(
    base_url=f"https://proxy-instance.float16.cloud/{INSTANCE_ID}/3900/v1",
    api_key="not-needed"  # the vLLM server doesn't require an API key
)

# Read an image file and return its contents as a base64 string
def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_base64 = encode_image("your_document.png")

# Typhoon OCR prompt
prompt = """Extract all text from the image.

Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.

Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap in <figure>...</figure> with descriptions in Thai.
- Page Numbers: Wrap in <page_number>...</page_number>.
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes."""

response = client.chat.completions.create(
    model="/model/typhoon-ai/typhoon-ocr1.5-2b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"}
                }
            ]
        }
    ],
    max_tokens=10000,
    temperature=0,
    stream=True
)

# Stream the OCR result, skipping chunks that carry no text
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
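The example above labels every upload as `image/png`, even for JPG files. The server may still decode it correctly, but a matching MIME type is cleaner. A minimal sketch of a drop-in helper (the name `to_data_url` is hypothetical) that derives the MIME type from the file extension with Python's standard `mimetypes` module; it assumes the chat API only accepts images, so PDF pages would need to be rendered to PNG or JPG first:

```python
import base64
import mimetypes


def to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL whose MIME type matches its extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime not in ("image/png", "image/jpeg"):
        # Assumption: only images are sent; render PDF pages to PNG/JPG beforehand
        raise ValueError(f"unsupported file type for {image_path!r}: {mime}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

To use it, set the image entry to `{"type": "image_url", "image_url": {"url": to_data_url("your_document.jpg")}}`.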

cURL Example

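# First, encode your image to base64 without line breaks:
#   macOS: base64 -i your_document.png -o image_base64.txt
#   Linux: base64 -w 0 your_document.png > image_base64.txt
# Paste the result in place of <YOUR_BASE64_IMAGE> and replace {instance_id} with your instance ID.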

curl -X POST "https://proxy-instance.float16.cloud/{instance_id}/3900/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "/model/typhoon-ai/typhoon-ocr1.5-2b",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Extract all text from the image..."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<YOUR_BASE64_IMAGE>"
          }
        }
      ]
    }
  ],
  "max_tokens": 10000,
  "temperature": 0,
  "stream": true
}'
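Pasting a long base64 string into the command line is error-prone, and large images can exceed shell argument limits. A minimal sketch (function names are hypothetical; it assumes a PNG input, as in the cURL example) that writes the same request body to a file using only Python's standard library:

```python
import base64
import json


def build_request(image_path: str, prompt: str) -> dict:
    """Build the chat-completions body used in the cURL example."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "/model/typhoon-ai/typhoon-ocr1.5-2b",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        "max_tokens": 10000,
        "temperature": 0,
        "stream": True,
    }


def write_request(image_path: str, prompt: str, output_path: str) -> None:
    """Serialize the request as UTF-8 JSON so Thai text in the prompt is kept as-is."""
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(build_request(image_path, prompt), f, ensure_ascii=False)
```

After calling `write_request("your_document.png", prompt, "request.json")` with the full Typhoon OCR prompt from the Python example, replace the `-d '{...}'` argument in the cURL command with `--data-binary @request.json`.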

Output Format

Typhoon OCR outputs structured markdown with special formatting:

Tables

Tables are rendered in HTML format:

<table>
  <tr>
    <th>Header 1</th>
    <th>Header 2</th>
  </tr>
  <tr>
    <td>Data 1</td>
    <td>Data 2</td>
  </tr>
</table>
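If you need the table data itself, the HTML can be read back with Python's standard `html.parser`. A minimal sketch (the class and function names are hypothetical) that returns each table as a list of rows of cell text; it assumes simple tables like the one above and ignores `rowspan`, `colspan`, and nested tables:

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableParser(HTMLParser):
    """Collect the cell text of every <table> as a list of rows."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None and self.tables:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell; whitespace between tags is dropped
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(markdown: str) -> List[List[List[str]]]:
    parser = TableParser()
    parser.feed(markdown)
    parser.close()
    return parser.tables
```

The resulting rows can be passed directly to `csv.writer(...).writerows(...)` to export a table.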

Equations

Mathematical equations use LaTeX syntax:

Inline: $E = mc^2$
Block: $$\int_0^\infty e^{-x^2} dx$$
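A minimal regex-based sketch (the function name is hypothetical) for pulling block and inline LaTeX out of the output. It is a heuristic: a line containing ordinary dollar amounts, such as prices, can be misread as inline math.

```python
import re

BLOCK_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
INLINE_MATH = re.compile(r"\$([^$\n]+?)\$")


def extract_equations(markdown: str) -> tuple[list[str], list[str]]:
    """Return (block, inline) LaTeX snippets found in OCR Markdown."""
    block = [m.strip() for m in BLOCK_MATH.findall(markdown)]
    # Remove block math first so its $$ delimiters are not re-read as inline math
    remainder = BLOCK_MATH.sub(" ", markdown)
    inline = [m.strip() for m in INLINE_MATH.findall(remainder)]
    return block, inline
```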

Figures

Images and charts are wrapped with descriptions:

<figure>
  Description of the image/chart in Thai
</figure>
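A minimal sketch (function names are hypothetical) for collecting the figure descriptions, or removing them when you only want the page's own text. It assumes the tags appear exactly as `<figure>` and `</figure>`, without attributes:

```python
import re

FIGURE = re.compile(r"<figure>(.*?)</figure>", re.DOTALL)


def extract_figures(markdown: str) -> list[str]:
    """Return the text of each <figure>...</figure> block, whitespace-trimmed."""
    return [m.strip() for m in FIGURE.findall(markdown)]


def strip_figures(markdown: str) -> str:
    """Remove figure blocks, leaving the surrounding text untouched."""
    return FIGURE.sub("", markdown)
```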

Page Numbers

Page numbers are marked with custom tags:

<page_number>1</page_number>
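A minimal sketch (the function name is hypothetical) that separates the page number from the body text. The number is kept as a string because printed page numbers are not always Arabic digits:

```python
import re
from typing import Optional, Tuple

PAGE_NUMBER = re.compile(r"<page_number>\s*(.*?)\s*</page_number>", re.DOTALL)


def pop_page_number(markdown: str) -> Tuple[Optional[str], str]:
    """Return (page number text or None, Markdown with the first tag removed)."""
    match = PAGE_NUMBER.search(markdown)
    if match is None:
        return None, markdown
    cleaned = (markdown[:match.start()] + markdown[match.end():]).strip()
    return match.group(1), cleaned
```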

Checkboxes

Form checkboxes use Unicode characters:

Unchecked: ☐
Checked: ☑
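A minimal sketch (the function name is hypothetical) for turning checkbox output into form values. It assumes each label follows its box and runs to the next box or the end of the line; labels written before the box are not captured.

```python
import re

# ☐ (U+2610) marks an unchecked box, ☑ (U+2611) a checked one
CHECKBOX = re.compile(r"([\u2610\u2611])[ \t]*([^\u2610\u2611\n]*)")


def parse_checkboxes(markdown: str) -> list[tuple[str, bool]]:
    """Return (label, checked) pairs for every checkbox in the text."""
    return [(label.strip(), box == "\u2611") for box, label in CHECKBOX.findall(markdown)]
```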

Best Practices

For Best Results

Image Size: Use images with maximum 1800px dimension
Image Quality: Higher-resolution source images produce better results, within the 1800 px limit above
Lighting: Ensure even lighting without shadows
Orientation: Documents should be upright

API Parameters

Parameter	Recommended Value	Description
`max_tokens`	10000	Allow sufficient tokens for full extraction
`temperature`	0	Use deterministic output for consistent results
`stream`	true	Enable streaming for real-time output
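With `stream` set to true, the Markdown arrives in pieces. A minimal sketch (the function name `collect_stream` is hypothetical) that joins the streamed deltas into one string so the result can be saved or post-processed:

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices or an empty delta; skip them
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

For example, `markdown = collect_stream(client.chat.completions.create(..., stream=True))`. If you set `stream` to false instead, the full text is available as `response.choices[0].message.content`.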

Pricing

Typhoon OCR uses GPU Instance pricing:

Instance	On-Demand	Spot (Save 50%)	Storage
H100	$4.32/hr	$2.16/hr	$1.00/GB/mo

View current pricing at GPU Instance > Pricing.
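As a worked example with the rates in the table above, assuming compute is billed in proportion to running hours and storage in proportion to the provisioned volume size (function names are hypothetical; check the Pricing page for current rates):

```python
# Rates from the pricing table (USD); they may change
H100_ON_DEMAND_PER_HR = 4.32
H100_SPOT_PER_HR = 2.16
STORAGE_PER_GB_MONTH = 1.00


def compute_cost(hours: float, spot: bool = False) -> float:
    """GPU cost for the given running time, rounded to cents."""
    rate = H100_SPOT_PER_HR if spot else H100_ON_DEMAND_PER_HR
    return round(hours * rate, 2)


def storage_cost(volume_gb: float, months: float = 1.0) -> float:
    """Volume cost for the given size and duration, rounded to cents."""
    return round(volume_gb * STORAGE_PER_GB_MONTH * months, 2)


print(compute_cost(8))             # 8 h on-demand: 34.56
print(compute_cost(8, spot=True))  # 8 h on spot: 17.28
print(storage_cost(100))           # minimum 100 GB volume for one month: 100.0
```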

Next Steps

LLM Deployment - Deploy other models
vLLM Playground - Test your models
AI Services Overview - Explore all AI capabilities