Typhoon OCR

Extract text from Thai and English documents using vision-language OCR

Typhoon OCR is a vision-language OCR model that extracts text from Thai and English documents. Deploy it with One-Click Deployment, then use it through the Typhoon OCR Playground or the OpenAI-compatible API.

Features

  • Thai and English Support: Native support for both languages
  • Structured Markdown Output: Clean markdown with proper formatting
  • HTML Tables: Tables rendered in HTML format
  • LaTeX Equations: Mathematical equations in LaTeX syntax
  • Figure Descriptions: Images and charts described in Thai
  • Checkbox Support: Handles form checkboxes

Deploying Typhoon OCR

  1. Navigate to GPU Instance > Create Instance
  2. Select the One-Click Deployment tab
  3. Choose Typhoon-ocr1.5-2b from the preset models
  4. Configure volume size (minimum 100 GB)
  5. Click Create Instance

Model Information

| Property | Value |
|---|---|
| Model | Typhoon-ocr1.5-2b |
| Provider | SCB10X |
| Capabilities | text, vision, typhoon-ocr |
| Size | 6 GB |
| Recommended Image Size | Maximum dimension of 1800 px |

Using the OCR Playground

  1. Navigate to GPU Instance > Instances
  2. Click View on your Typhoon OCR instance (vLLM tag)
  3. Select the Playground tab
  4. Upload your document (PNG, JPG, or PDF)
  5. View the extracted text in markdown format

Playground Interface

| Setting | Description |
|---|---|
| Server Status | Health indicator with refresh button |
| Port | Port selector (default 3900) |
| Model | Shows /model/typhoon-ai/typhoon-ocr1.5-2b |
| Clear | Reset the current document |

Supported File Types

| Format | Notes |
|---|---|
| PNG | Best for screenshots and digital documents |
| JPG | Good for photos and scanned documents |
| PDF | Document files |

Recommended: Use images with a maximum dimension of 1800 px for optimal results.
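
The size recommendation can be applied before upload by computing a target size that keeps the aspect ratio. The following is a minimal sketch, assuming the 1800 px limit applies to the longer side of the image; `fit_within` is a hypothetical helper name, and the actual resizing would be done with an imaging library of your choice.

```python
def fit_within(width: int, height: int, max_side: int = 1800) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_side.

    Assumption: the documented 1800 px limit refers to the longer side.
    Images that already fit are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never return a zero-sized dimension
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(3600, 2400))  # a 3600x2400 scan is halved to fit
    print(fit_within(1200, 800))   # already within the limit, left as-is
```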

API Usage

Access Typhoon OCR via the OpenAI-compatible API.

Python Example

from openai import OpenAI
import base64

# Replace {instance_id} with the ID of your Typhoon OCR instance
client = OpenAI(
    base_url="https://proxy-instance.float16.cloud/{instance_id}/3900/v1",
    api_key="not-needed"  # placeholder; vLLM doesn't require an API key
)

# Read and encode your image
def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_base64 = encode_image("your_document.png")

# Typhoon OCR prompt
prompt = """Extract all text from the image.

Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.

Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap in <figure>...</figure> with descriptions in Thai.
- Page Numbers: Wrap in <page_number>...</page_number>.
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes."""

response = client.chat.completions.create(
    model="/model/typhoon-ai/typhoon-ocr1.5-2b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"}
                }
            ]
        }
    ],
    max_tokens=10000,
    temperature=0,
    stream=True
)

# Stream the OCR result
for chunk in response:
    # Some chunks may carry no choices or an empty delta; skip them
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
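
The example above hard-codes `image/png` in the data URL, which would mislabel a JPG upload. As a hedged sketch, the MIME type can be guessed from the file extension with the standard library; `to_data_url` is a hypothetical helper, not part of the Typhoon OCR or OpenAI API.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_path: str) -> str:
    """Build a base64 data URL whose MIME type matches the file extension."""
    mime, _ = mimetypes.guess_type(image_path)
    # Only image types belong in an image_url part; reject anything else
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"unsupported or unknown image type: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed as the `url` value of the `image_url` content part in place of the f-string used above.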

cURL Example

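# First, encode your image to base64:
#   macOS: base64 -i your_document.png -o image_base64.txt
#   Linux: base64 -w 0 your_document.png > image_base64.txt   (-w 0 disables line wrapping)
# Then replace <YOUR_BASE64_IMAGE> below with the encoded string.
# -N turns off curl's output buffering so streamed chunks appear as they arrive.

curl -N -X POST "https://proxy-instance.float16.cloud/{instance_id}/3900/v1/chat/completions" \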
  -H "Content-Type: application/json" \
  -d '{
  "model": "/model/typhoon-ai/typhoon-ocr1.5-2b",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Extract all text from the image..."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<YOUR_BASE64_IMAGE>"
          }
        }
      ]
    }
  ],
  "max_tokens": 10000,
  "temperature": 0,
  "stream": true
}'
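
Pasting a large base64 string into the shell command is error-prone and can exceed command-line length limits. One possible workaround, sketched here under the assumption that the input is a PNG, is to write the request body to a file and let curl read it with `-d @payload.json`. `write_payload` and the file name are hypothetical.

```python
import base64
import json
from pathlib import Path

MODEL = "/model/typhoon-ai/typhoon-ocr1.5-2b"


def write_payload(image_path: str, prompt: str, out_path: str = "payload.json") -> str:
    """Write a chat-completions request body with the image embedded as a data URL."""
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    payload = {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        # Assumption: the input file is a PNG
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 10000,
        "temperature": 0,
        "stream": True,
    }
    Path(out_path).write_text(json.dumps(payload), encoding="utf-8")
    return out_path
```

The file can then replace the inline body in the command above: keep the URL and `Content-Type` header, and pass `-d @payload.json` instead of the quoted JSON.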

Output Format

Typhoon OCR outputs structured markdown with special formatting:

Tables

Tables are rendered in HTML format:

<table>
  <tr>
    <th>Header 1</th>
    <th>Header 2</th>
  </tr>
  <tr>
    <td>Data 1</td>
    <td>Data 2</td>
  </tr>
</table>
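
Because tables arrive as HTML inside the markdown, downstream code usually needs them as rows and cells. The following is a minimal sketch using only the standard library; `TableParser` and `parse_tables` are hypothetical names, and nested tables or `rowspan`/`colspan` attributes are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> in the input as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def parse_tables(markdown: str) -> list:
    """Return all tables found in the OCR output."""
    parser = TableParser()
    parser.feed(markdown)
    parser.close()
    return parser.tables
```

For the sample table above, `parse_tables` yields one table with a header row and one data row.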

Equations

Mathematical equations use LaTeX syntax:

  • Inline: $E = mc^2$
  • Block: $$\int_0^\infty e^{-x^2} dx$$
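
To pull the equations out of the output, block equations can be matched first and removed before looking for inline ones. This is a rough sketch with regular expressions; `extract_equations` is a hypothetical helper, and it would misread literal dollar signs such as prices as equation delimiters.

```python
import re

# $$...$$ may span several lines, so DOTALL lets "." match newlines
BLOCK = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# A single $ that is not part of $$ on either side
INLINE = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)")


def extract_equations(markdown: str):
    """Return (inline_equations, block_equations) found in the OCR output."""
    blocks = [m.strip() for m in BLOCK.findall(markdown)]
    # Remove block equations so their delimiters are not matched as inline
    without_blocks = BLOCK.sub(" ", markdown)
    inline = [m.strip() for m in INLINE.findall(without_blocks)]
    return inline, blocks
```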

Figures

Images and charts are wrapped with descriptions:

<figure>
  Description of the image/chart in Thai
</figure>

Page Numbers

Page numbers are marked with custom tags:

<page_number>1</page_number>
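
The `<figure>` and `<page_number>` tags are not standard markdown, so a post-processing step may want to read or strip them. A minimal sketch with regular expressions follows; the function names are hypothetical, and page numbers are returned as strings because a page may be numbered with non-digits.

```python
import re

FIGURE = re.compile(r"<figure>(.*?)</figure>", re.DOTALL)
PAGE_NUMBER = re.compile(r"<page_number>(.*?)</page_number>", re.DOTALL)


def extract_figures(markdown: str) -> list:
    """Return the description text of every <figure> block."""
    return [m.strip() for m in FIGURE.findall(markdown)]


def extract_page_number(markdown: str):
    """Return the first page number as a string, or None if absent."""
    match = PAGE_NUMBER.search(markdown)
    return match.group(1).strip() if match else None


def strip_page_numbers(markdown: str) -> str:
    """Remove <page_number> tags and their content from the text."""
    return PAGE_NUMBER.sub("", markdown)
```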

Checkboxes

Form checkboxes use Unicode characters:

  • Unchecked: ☐
  • Checked: ☑
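
Since checkboxes are plain Unicode characters, form results can be summarized with simple string operations. The sketch below counts both states and, as one possible convention, rewrites lines that start with a checkbox as GitHub-style task list items; both function names are hypothetical.

```python
CHECKED = "☑"
UNCHECKED = "☐"


def checkbox_summary(markdown: str) -> dict:
    """Count checked and unchecked boxes in the OCR output."""
    return {
        "checked": markdown.count(CHECKED),
        "unchecked": markdown.count(UNCHECKED),
    }


def to_task_list(markdown: str) -> str:
    """Rewrite lines starting with a checkbox as '- [x]' / '- [ ]' items."""
    lines = []
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(CHECKED):
            line = "- [x] " + stripped[1:].strip()
        elif stripped.startswith(UNCHECKED):
            line = "- [ ] " + stripped[1:].strip()
        lines.append(line)
    return "\n".join(lines)
```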

Best Practices

For Best Results

  1. Image Size: Use images with a maximum dimension of 1800 px
  2. Image Quality: Use sharp, high-resolution source images within that size limit
  3. Lighting: Ensure even lighting without shadows
  4. Orientation: Documents should be upright

API Parameters

| Parameter | Recommended Value | Description |
|---|---|---|
| max_tokens | 10000 | Allow sufficient tokens for full extraction |
| temperature | 0 | Use deterministic output for consistent results |
| stream | true | Enable streaming for real-time output |
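
With `stream` set to true, the result arrives as a sequence of chunks rather than a single message, so code that needs the full markdown has to join the pieces. The following sketch assumes chunks shaped like those of the OpenAI Python client (`chunk.choices[0].delta.content`); `collect_stream` and `fake_chunk` are hypothetical helpers, the latter existing only so the sketch runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Skip chunks without choices or with an empty delta
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute layout as a real chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    stream = [fake_chunk("# Ti"), fake_chunk(None), fake_chunk("tle")]
    print(collect_stream(stream))
```

Passing the `response` object from the Python example to `collect_stream` would return the whole page as one string instead of printing it incrementally.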

Pricing

Typhoon OCR uses GPU Instance pricing:

| Instance | On-Demand | Spot (Save 50%) | Storage |
|---|---|---|---|
| H100 | $4.32/hr | $2.16/hr | $1.00/GB/mo |

View current pricing at GPU Instance > Pricing.
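
As a rough illustration of how the listed rates combine, the sketch below estimates a monthly bill from GPU hours and volume size. It assumes storage is billed for the whole month regardless of GPU hours and uses the H100 rates listed above, which may change; `estimate_monthly_cost` is a hypothetical helper.

```python
ON_DEMAND_PER_HOUR = 4.32     # USD, H100 on-demand rate listed above
SPOT_PER_HOUR = 2.16          # USD, H100 spot rate listed above
STORAGE_PER_GB_MONTH = 1.00   # USD per GB per month


def estimate_monthly_cost(gpu_hours: float, volume_gb: float, spot: bool = False) -> float:
    """Estimate one month's cost in USD: GPU time plus storage.

    Assumption: storage is charged for the full month.
    """
    rate = SPOT_PER_HOUR if spot else ON_DEMAND_PER_HOUR
    return round(gpu_hours * rate + volume_gb * STORAGE_PER_GB_MONTH, 2)


if __name__ == "__main__":
    # 10 GPU hours with the minimum 100 GB volume
    print(estimate_monthly_cost(10, 100))             # on-demand
    print(estimate_monthly_cost(10, 100, spot=True))  # spot
```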

Last updated: February 1, 2025