Structured Outputs

Structured Outputs constrain an LLM's generation so that each response conforms to a format you specify: a JSON Schema, a regex pattern, or a fixed set of choices. You can test structured outputs in the vLLM Playground or integrate them through the OpenAI-compatible API.

Overview

With structured outputs, you can:

Guarantee valid JSON that matches your schema
Constrain outputs to match regex patterns
Limit responses to specific choices
Avoid parsing errors caused by malformed or free-form output

Supported Models

The models below support structured outputs (listed as the "grammar" capability). Deploy one via GPU Instance > Create Instance > One-Click Deployment:

Model	Provider	Capabilities
GPT-OSS-120B	OpenAI	text, reasoning, tools, grammar
GPT-OSS-20B	OpenAI	text, reasoning, tools, grammar
Qwen3-VL-235B-A22B-Instruct	Alibaba	text, vision, tools, grammar
Qwen3-VL-30B-A3B-Instruct	Alibaba	text, vision, tools, grammar
Qwen3-VL-32B-Instruct	Alibaba	text, vision, tools, grammar
Qwen3-VL-8B-Instruct	Alibaba	text, vision, tools, grammar
Llama 3.3 70B Instruct	Meta	text, tools, grammar
Typhoon2.5-qwen3-30b-a3b	SCB10X	text, tools, grammar
GLM 4.7 Flash	ZAI	text, tools, grammar

Testing in vLLM Playground

Test structured outputs with pre-configured format presets in the Playground.

Accessing Structured Outputs

Deploy a model with grammar support
Navigate to GPU Instance > Instances
Click View on your vLLM instance
Select the Playground tab
Toggle Structured Output to ON

Output Format Types

Type	Description
JSON Schema	Define object structure with properties and types
Regex	Constrain output to match a regex pattern
Choice	Limit output to specific options

Available Presets

The Playground includes pre-configured format presets:

Preset	Type	Description
Person Info	JSON Schema	Extract name, age, occupation, email
Sentiment Analysis	JSON Schema	Analyze sentiment with confidence score
Product Review	JSON Schema	Extract product review details
Simple JSON	JSON Schema	Basic key-value structure
Email Pattern	Regex	Match email format
Yes/No Choice	Choice	Binary response constraint
Rating Choice	Choice	Rating scale constraint

Try It

Select a preset, then send a prompt that fits it, for example:

"Extract info: John Smith is a 32-year-old software engineer at john.smith@example.com"
"Analyze sentiment: This product is amazing, I love it!"
"Rate this: The service was okay but could be better"

API Usage

Integrate structured outputs via the OpenAI-compatible API.

JSON Schema Format

from openai import OpenAI

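client = OpenAI(
    # The OpenAI client requires some value here; this example uses a placeholder.
    api_key="not-needed",
    # Replace {instance_id} with the ID of your vLLM instance.
    base_url="https://proxy-instance.float16.cloud/{instance_id}/3900/v1"
)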

response = client.chat.completions.create(
    model="your-model-name",
    messages=[
        {"role": "user", "content": "Extract info: John is 30 years old and works as an engineer"}
    ],
    extra_body={
        "guided_json": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
                "occupation": {"type": "string"}
            },
            "required": ["name", "age", "occupation"]
        }
    }
)

print(response.choices[0].message.content)
# {"name": "John", "age": 30, "occupation": "engineer"}
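The content arrives as a string, so the application still has to parse it. The following is a minimal sketch of that step, assuming the response text looks like the expected output shown above. `parse_person` and `REQUIRED_KEYS` are hypothetical names introduced for illustration; they are not part of the OpenAI client or vLLM.

```python
import json

# Mirrors the "required" list in the schema above (illustrative constant).
REQUIRED_KEYS = ("name", "age", "occupation")

def parse_person(content: str) -> dict:
    """Parse the model's JSON text and confirm the required keys are present."""
    data = json.loads(content)  # raises json.JSONDecodeError on invalid JSON
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data

# Sample text copied from the expected output above; in practice, pass
# response.choices[0].message.content instead.
person = parse_person('{"name": "John", "age": 30, "occupation": "engineer"}')
print(person["name"], person["age"])
```

A response that is cut off by a token limit may still fail to parse, so wrapping the call in error handling is a reasonable precaution.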

Regex Pattern

# Reuses the client created in the JSON Schema example.
response = client.chat.completions.create(
    model="your-model-name",
    messages=[
        {"role": "user", "content": "Generate an email address for John Smith"}
    ],
    extra_body={
        "guided_regex": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    }
)

print(response.choices[0].message.content)
# john.smith@example.com
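If you want a client-side sanity check, the same pattern can be applied to the returned text. This is a sketch only: `matches_pattern` is a hypothetical helper, and it assumes Python's `re` module interprets the pattern the same way the server does, which may not hold for every regex feature.

```python
import re

# Same pattern as in the request above.
EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

def matches_pattern(text: str, pattern: str = EMAIL_PATTERN) -> bool:
    """Return True when the whole string (ignoring outer whitespace) matches."""
    return re.fullmatch(pattern, text.strip()) is not None

print(matches_pattern("john.smith@example.com"))  # True
print(matches_pattern("john.smith at example"))   # False
```

`re.fullmatch` is used rather than `re.search` so that extra text before or after the address counts as a mismatch.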

Choice Constraint

# Reuses the client created in the JSON Schema example.
response = client.chat.completions.create(
    model="your-model-name",
    messages=[
        {"role": "user", "content": "Is this review positive or negative: 'Great product!'"}
    ],
    extra_body={
        "guided_choice": ["positive", "negative", "neutral"]
    }
)

print(response.choices[0].message.content)
# positive
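One way to keep the choice list and the application code in sync is to define the options once as an enum. The sketch below assumes the response content is exactly one of the listed strings; `Sentiment` and `to_sentiment` are hypothetical names used for illustration.

```python
from enum import Enum

class Sentiment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

def to_sentiment(content: str) -> Sentiment:
    """Convert the model's text into an enum member; raises ValueError otherwise."""
    return Sentiment(content.strip())

# The list to pass as "guided_choice", derived from the enum.
choices = [member.value for member in Sentiment]

label = to_sentiment("positive")
print(choices, label)
```

Deriving `choices` from the enum means a new option only has to be added in one place.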

Schema Examples

Sentiment Analysis

schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"]
        },
        "confidence": {
            "type": "number"
        },
        "summary": {
            "type": "string"
        }
    },
    "required": ["sentiment", "confidence"]
}

response = client.chat.completions.create(
    model="your-model-name",
    messages=[
        {"role": "user", "content": "Analyze: This product exceeded my expectations!"}
    ],
    extra_body={"guided_json": schema}
)
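The schema above only says that "confidence" is a number; it does not bound the value. If the application expects a score between 0 and 1 (an assumption, not something the schema states), a check on the client side could look like the sketch below. `read_sentiment` and the sample string are hypothetical.

```python
import json

def read_sentiment(content: str) -> tuple:
    """Parse the response and reject confidence values outside 0..1."""
    data = json.loads(content)
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return data["sentiment"], confidence

# Hypothetical response text, used here instead of a live API call.
sample = '{"sentiment": "positive", "confidence": 0.93}'
result = read_sentiment(sample)
print(result)
```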

Entity Extraction

schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "type": {
                        "type": "string",
                        "enum": ["person", "organization", "location", "date"]
                    }
                },
                "required": ["text", "type"]
            }
        }
    },
    "required": ["entities"]
}
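Once a response matching this schema comes back, the entities are usually easier to work with when grouped by type. A minimal sketch, assuming the content follows the schema above; `group_entities` and the sample text are hypothetical.

```python
import json
from collections import defaultdict

def group_entities(content: str) -> dict:
    """Group entity texts by their "type" field, preserving order of appearance."""
    data = json.loads(content)
    grouped = defaultdict(list)
    for entity in data["entities"]:
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)

# Hypothetical response text shaped like the schema above.
sample = (
    '{"entities": ['
    '{"text": "Alice", "type": "person"}, '
    '{"text": "Paris", "type": "location"}, '
    '{"text": "Bob", "type": "person"}]}'
)
grouped = group_entities(sample)
print(grouped)
```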

Product Review

schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "rating": {"type": "integer"},
        "pros": {
            "type": "array",
            "items": {"type": "string"}
        },
        "cons": {
            "type": "array",
            "items": {"type": "string"}
        },
        "recommendation": {"type": "boolean"}
    },
    "required": ["product_name", "rating", "recommendation"]
}
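Because "pros" and "cons" are not in the required list, code that reads the response should not assume they are present. The sketch below maps a response onto a dataclass with empty-list defaults; `ProductReview`, `parse_review`, and the sample text are hypothetical and only illustrate one possible approach.

```python
import json
from dataclasses import dataclass, field

@dataclass
class ProductReview:
    product_name: str
    rating: int
    recommendation: bool
    pros: list = field(default_factory=list)  # optional in the schema
    cons: list = field(default_factory=list)  # optional in the schema

def parse_review(content: str) -> ProductReview:
    """Build a ProductReview from JSON text, defaulting the optional lists."""
    data = json.loads(content)
    return ProductReview(
        product_name=data["product_name"],
        rating=data["rating"],
        recommendation=data["recommendation"],
        pros=data.get("pros", []),
        cons=data.get("cons", []),
    )

# Hypothetical response text that omits the optional "cons" field.
sample = '{"product_name": "Desk Lamp", "rating": 4, "recommendation": true, "pros": ["bright"]}'
review = parse_review(sample)
print(review)
```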

Best Practices

Keep Schemas Simple

Start with a small schema and mark only the essential fields as required. Add more fields once the basic output is reliable:

# Good - Start simple
schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"}
    },
    "required": ["answer"]
}

Use Enums for Categorical Values

# Good - Use enums
schema = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["pending", "approved", "rejected"]
        }
    }
}

Handle Optional Fields

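Leave optional fields out of "required". If the model should also be able to return an explicit null, allow it in the field's type:

schema = {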
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "nickname": {"type": ["string", "null"]}
    },
    "required": ["name"]
}
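With a schema like this, "nickname" can be absent, null, or a string, so the reading code has to handle all three cases. A minimal sketch; `display_name` is a hypothetical helper and the sample strings are illustrative.

```python
import json

def display_name(content: str) -> str:
    """Prefer the nickname when one is given, otherwise fall back to the name."""
    data = json.loads(content)
    nickname = data.get("nickname")  # None when the key is missing or null
    return nickname if nickname else data["name"]

print(display_name('{"name": "Jonathan", "nickname": "Jon"}'))
print(display_name('{"name": "Jonathan", "nickname": null}'))
print(display_name('{"name": "Jonathan"}'))
```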

Comparison of Output Formats

Feature	JSON Schema	Regex	Choice
Structured data	Yes	No	No
Pattern matching	Limited	Yes	No
Fixed options	Via enum	No	Yes
Nested objects	Yes	No	No
Arrays	Yes	No	No
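On the request side, the three formats in the API examples above differ only in which key is placed in `extra_body`. The sketch below wraps that mapping in a small function; `build_extra_body` and the short format names ("json", "regex", "choice") are hypothetical conveniences, while the `guided_*` keys are the ones used in this document's examples.

```python
def build_extra_body(kind: str, spec) -> dict:
    """Return the extra_body dict for a given output format.

    kind: "json" (spec is a schema dict), "regex" (spec is a pattern string),
          or "choice" (spec is a list of allowed strings).
    """
    keys = {
        "json": "guided_json",
        "regex": "guided_regex",
        "choice": "guided_choice",
    }
    if kind not in keys:
        raise ValueError(f"unknown format: {kind!r}")
    return {keys[kind]: spec}

print(build_extra_body("choice", ["yes", "no"]))
print(build_extra_body("regex", r"\d{4}-\d{2}-\d{2}"))
```

The result can be passed as `extra_body=build_extra_body(...)` to `client.chat.completions.create`, in the same position as in the examples above.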

Pricing

Structured outputs use GPU Instance pricing:

Instance	On-Demand	Spot (Save 50%)	Storage
H100	$4.32/hr	$2.16/hr	$1.00/GB/mo

View current pricing at GPU Instance > Pricing.
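As a rough worked example, the rates in the table above can be turned into a monthly estimate. This sketch assumes compute is billed linearly per hour and storage per GB per month, with no other fees; actual billing rules may differ. `estimate_monthly_cost` is a hypothetical helper.

```python
# Rates copied from the H100 row of the pricing table above (USD).
ON_DEMAND_PER_HOUR = 4.32
SPOT_PER_HOUR = 2.16
STORAGE_PER_GB_MONTH = 1.00

def estimate_monthly_cost(hours: float, storage_gb: float, spot: bool = False) -> float:
    """Estimate one month's cost: compute hours plus storage, rounded to cents."""
    rate = SPOT_PER_HOUR if spot else ON_DEMAND_PER_HOUR
    return round(hours * rate + storage_gb * STORAGE_PER_GB_MONTH, 2)

# 100 instance hours and 50 GB of storage in a month.
print(estimate_monthly_cost(100, 50))             # 482.0
print(estimate_monthly_cost(100, 50, spot=True))  # 266.0
```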

Next Steps

vLLM Playground - Test structured outputs interactively
Tool Calling - Implement function calling
LLM Deployment - Deploy models with grammar support