Typhoon OCR
Typhoon OCR is a vision-language OCR model for extracting text from Thai and English documents. Deploy it with One-Click Deployment, then use it through the Typhoon OCR Playground or the OpenAI-compatible API.
Features
- Thai and English Support: Native support for both languages
- Structured Markdown Output: Clean markdown with proper formatting
- HTML Tables: Tables rendered in HTML format
- LaTeX Equations: Mathematical equations in LaTeX syntax
- Figure Descriptions: Images and charts described in Thai
- Checkbox Support: Handles form checkboxes
Deploying Typhoon OCR
- Navigate to GPU Instance > Create Instance
- Select the One-Click Deployment tab
- Choose Typhoon-ocr1.5-2b from the preset models
- Configure volume size (minimum 100 GB)
- Click Create Instance
Model Information
| Property | Value |
|---|---|
| Model | Typhoon-ocr1.5-2b |
| Provider | SCB10X |
| Capabilities | text, vision, typhoon-ocr |
| Size | 6 GB |
| Recommended Image Size | Up to 1800px on the largest dimension |
Using the OCR Playground
- Navigate to GPU Instance > Instances
- Click View on your Typhoon OCR instance (vLLM tag)
- Select the Playground tab
- Upload your document (PNG, JPG, or PDF)
- View the extracted text in markdown format
Playground Interface
| Setting | Description |
|---|---|
| Server Status | Health indicator with refresh button |
| Port | Port selector (default 3900) |
| Model | Shows /model/typhoon-ai/typhoon-ocr1.5-2b |
| Clear | Reset the current document |
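The Server Status and Model fields can also be checked from code. The sketch below assumes the proxy URL pattern used in the API examples on this page, and it assumes that the standard `/v1/models` route of an OpenAI-compatible server is reachable through that proxy. The helper names `base_url` and `list_models` are illustrative, not part of any SDK.

```python
import json
import urllib.request


def base_url(instance_id: str, port: int = 3900) -> str:
    """Build the proxy base URL (pattern copied from the API examples on this page)."""
    return f"https://proxy-instance.float16.cloud/{instance_id}/{port}/v1"


def list_models(instance_id: str, port: int = 3900, timeout: float = 10.0) -> list:
    """Return the model IDs the server reports.

    Assumption: the proxy forwards GET /v1/models to the server.
    Raises urllib.error.URLError if the server cannot be reached.
    """
    url = f"{base_url(instance_id, port)}/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload["data"]]
```

If the call succeeds, the returned list would be expected to contain the same model path that the Playground shows in its Model field.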
Supported File Types
| Format | Notes |
|---|---|
| PNG | Best for screenshots and digital documents |
| JPG | Good for photos and scanned documents |
| PDF | Document files |
Recommended: For optimal results, use images whose largest dimension is at most 1800px.
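One way to apply the size recommendation is to shrink oversized images before upload. The sketch below is a minimal example, not an official preprocessing step: `fit_within` and `resize_for_ocr` are hypothetical helper names, and the file-resizing part assumes the third-party Pillow library is installed.

```python
def fit_within(width: int, height: int, max_side: int = 1800) -> tuple:
    """Return (width, height) scaled down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_for_ocr(src_path: str, dst_path: str, max_side: int = 1800) -> None:
    """Shrink an image file with Pillow (assumed installed: pip install pillow)."""
    from PIL import Image  # imported lazily so fit_within works without Pillow

    with Image.open(src_path) as img:
        new_size = fit_within(*img.size, max_side=max_side)
        if new_size != img.size:
            img = img.resize(new_size)
        img.save(dst_path)


print(fit_within(3600, 2400))  # (1800, 1200)
print(fit_within(1000, 800))   # (1000, 800)
```

The aspect ratio is preserved, and images that already fit are left unchanged.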
API Usage
Access Typhoon OCR via the OpenAI-compatible API.
Python Example
from openai import OpenAI
import base64

client = OpenAI(
    base_url="https://proxy-instance.float16.cloud/{instance_id}/3900/v1",
    api_key="not-needed"  # vLLM doesn't require API key
)

# Read and encode your image
def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_base64 = encode_image("your_document.png")

# Typhoon OCR prompt
prompt = """Extract all text from the image.
Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.
Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap in <figure>...</figure> with descriptions in Thai.
- Page Numbers: Wrap in <page_number>...</page_number>.
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes."""

response = client.chat.completions.create(
    model="/model/typhoon-ai/typhoon-ocr1.5-2b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"}
                }
            ]
        }
    ],
    max_tokens=10000,
    temperature=0,
    stream=True
)

# Stream the OCR result
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
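The Python example above hardcodes `image/png` in the data URL. For JPG input, the MIME type should probably match the file. The sketch below shows one way to do that with the standard library; `to_data_url` is a hypothetical helper name, and restricting it to PNG and JPEG is an assumption based on the image formats listed on this page.

```python
import base64
import mimetypes


def to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime not in ("image/png", "image/jpeg"):
        raise ValueError(f"unsupported image type for {image_path!r}: {mime}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed as the `url` value of the `image_url` content part in place of the f-string used above.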
cURL Example
# First, encode your image to base64:
# base64 -i your_document.png -o image_base64.txt
curl -X POST "https://proxy-instance.float16.cloud/{instance_id}/3900/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/model/typhoon-ai/typhoon-ocr1.5-2b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Extract all text from the image..."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,<YOUR_BASE64_IMAGE>"
            }
          }
        ]
      }
    ],
    "max_tokens": 10000,
    "temperature": 0,
    "stream": true
  }'
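Pasting a large base64 string into the command line by hand is awkward and can exceed shell argument limits. As a sketch of an alternative, the request body can be written to a file first and passed to cURL from there. `write_payload` and the `payload.json` file name are hypothetical; the prompt text is kept abbreviated exactly as in the cURL example and should be replaced with the full prompt.

```python
import base64
import json


def write_payload(image_path: str, out_path: str = "payload.json") -> None:
    """Write a chat-completions request body with the PNG image embedded as base64."""
    with open(image_path, "rb") as f:
        image_base64 = base64.b64encode(f.read()).decode("ascii")
    payload = {
        "model": "/model/typhoon-ai/typhoon-ocr1.5-2b",
        "messages": [{
            "role": "user",
            "content": [
                # Abbreviated prompt, as in the cURL example; use the full prompt here.
                {"type": "text", "text": "Extract all text from the image..."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
            ],
        }],
        "max_tokens": 10000,
        "temperature": 0,
        "stream": True,
    }
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
```

After calling `write_payload("your_document.png")`, the inline `-d '{...}'` argument in the cURL command can be replaced with `--data-binary @payload.json`.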
Output Format
Typhoon OCR outputs structured markdown with special formatting:
Tables
Tables are rendered in HTML format:
<table>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>Data 1</td>
<td>Data 2</td>
</tr>
</table>
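Because tables arrive as HTML inside the markdown, they can be pulled out with a standard HTML parser. The sketch below uses Python's built-in `html.parser`; `TableExtractor` and `extract_tables` are hypothetical names, and the code assumes simple, non-nested tables like the one shown above.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a string as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(markdown: str) -> list:
    """Return all tables found in an OCR result as nested lists of strings."""
    parser = TableExtractor()
    parser.feed(markdown)
    parser.close()
    return parser.tables


sample = "<table><tr><th>Header 1</th><th>Header 2</th></tr><tr><td>Data 1</td><td>Data 2</td></tr></table>"
print(extract_tables(sample))
# [[['Header 1', 'Header 2'], ['Data 1', 'Data 2']]]
```

Header and data cells are not distinguished here; the first row of each table can be treated as the header when the source uses `<th>`.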
Equations
Mathematical equations use LaTeX syntax:
- Inline: $E = mc^2$
- Block: $$\int_0^\infty e^{-x^2} dx$$
Figures
Images and charts are wrapped with descriptions:
<figure>
Description of the image/chart in Thai
</figure>
Page Numbers
Page numbers are marked with custom tags:
<page_number>1</page_number>
Checkboxes
Form checkboxes use Unicode characters:
- Unchecked: ☐
- Checked: ☑
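The tags and checkbox characters described above make the output easy to post-process. The sketch below is one possible approach using regular expressions; `summarize_output` is a hypothetical helper, and it assumes page numbers are plain digits (other numbering styles, such as roman numerals, are ignored).

```python
import re


def summarize_output(markdown: str) -> dict:
    """Pull figure descriptions, page numbers, and checkbox counts out of an OCR result."""
    figures = [
        text.strip()
        for text in re.findall(r"<figure>(.*?)</figure>", markdown, flags=re.DOTALL)
    ]
    pages = [
        int(number)
        for number in re.findall(r"<page_number>\s*(\d+)\s*</page_number>", markdown)
    ]
    return {
        "figures": figures,
        "page_numbers": pages,
        "checked": markdown.count("☑"),
        "unchecked": markdown.count("☐"),
    }
```

For a result containing one figure, one checked box, two unchecked boxes, and `<page_number>3</page_number>`, the returned dictionary holds the figure text, `[3]`, and the counts 1 and 2.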
Best Practices
For Best Results
- Image Size: Keep the largest dimension at or below 1800px
- Image Quality: Higher resolution produces better results, up to the 1800px recommendation
- Lighting: Ensure even lighting without shadows
- Orientation: Documents should be upright
API Parameters
| Parameter | Recommended Value | Description |
|---|---|---|
| max_tokens | 10000 | Allow sufficient tokens for full extraction |
| temperature | 0 | Use deterministic output for consistent results |
| stream | true | Enable streaming for real-time output |
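With `stream` set to true, the result arrives as a sequence of chunks, so saving the full markdown means joining the pieces. The sketch below shows one way to do that. `collect_stream` and `fake_chunk` are hypothetical helpers, and the stand-in chunks only mimic the attribute layout of the OpenAI client's stream objects so the example runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for a streamed chunk (illustration only, not a real API object)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("# Title\n"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("Body")]
print(collect_stream(demo))
```

In real use, the `response` object returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `demo`.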
Pricing
Typhoon OCR uses GPU Instance pricing:
| Instance | On-Demand | Spot (Save 50%) | Storage |
|---|---|---|---|
| H100 | $4.32/hr | $2.16/hr | $1.00/GB/mo |
View current pricing at GPU Instance > Pricing.
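As a rough worked example of how the rates above combine, the sketch below estimates a monthly bill. It is only an approximation under stated assumptions: GPU time is assumed to be billed linearly per hour, storage is assumed to be billed for the whole month, and the rates are copied from the table and may change. `estimate_monthly_cost` is a hypothetical helper.

```python
ON_DEMAND_PER_HOUR = 4.32     # H100 on-demand rate from the pricing table
SPOT_PER_HOUR = 2.16          # H100 spot rate from the pricing table
STORAGE_PER_GB_MONTH = 1.00   # storage rate from the pricing table


def estimate_monthly_cost(gpu_hours: float, volume_gb: float, spot: bool = False) -> float:
    """Estimate one month's cost in USD: GPU hours plus a full month of storage."""
    rate = SPOT_PER_HOUR if spot else ON_DEMAND_PER_HOUR
    return round(gpu_hours * rate + volume_gb * STORAGE_PER_GB_MONTH, 2)


# 40 GPU hours with the minimum 100 GB volume
print(estimate_monthly_cost(40, 100))             # 272.8
print(estimate_monthly_cost(40, 100, spot=True))  # 186.4
```

At low usage the storage charge dominates: with the minimum 100 GB volume it is $100 per month regardless of how many hours the GPU runs.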
Next Steps
- LLM Deployment - Deploy other models
- vLLM Playground - Test your models
- AI Services Overview - Explore all AI capabilities