High output tokens in a structured output query #2693
Replies: 3 comments
-
High output token usage with structured outputs can happen for a few reasons. Since you're using GPT-5, it may be generating internal reasoning tokens, which count toward your output token usage even though they never appear in the visible response. Also, you're passing a PDF file with `input_file`; the model may spend additional tokens processing and analyzing that document.

Try adding some debugging to see where the tokens are actually going, and consider capping the output with `max_tokens` (`max_output_tokens` in the Responses API). If you're still seeing ~4,000 output tokens for such a small structured response, that does seem excessive. Could you share the full request you're making?
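To make that debugging concrete, here is a minimal sketch of the kind of breakdown to look for. The field names (`output_tokens`, `output_tokens_details.reasoning_tokens`) follow the shape of the OpenAI Responses API usage object, but the helper and the sample numbers are hypothetical:

```python
def split_output_tokens(usage: dict) -> dict:
    """Separate reasoning tokens from visible output tokens.

    `usage` mirrors the shape of a Responses API usage object;
    the numbers used below are made up for illustration.
    """
    total = usage["output_tokens"]
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "visible": total - reasoning,
        "reasoning": reasoning,
        "reasoning_share": reasoning / total if total else 0.0,
    }

# Hypothetical usage matching the numbers in the question:
usage = {
    "input_tokens": 1000,
    "output_tokens": 4000,
    "output_tokens_details": {"reasoning_tokens": 3800},
}
print(split_output_tokens(usage))
```

If the reasoning share turns out to be large, the fix is to reduce reasoning effort or cap output, not to shrink the schema.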
-
High output tokens in structured output is a common issue! Here's why:

Causes:

```python
# Complex nested schema = more tokens
class Response(BaseModel):
    analysis: str      # Can be verbose
    items: list[Item]  # Each item adds tokens
    metadata: dict     # Unbounded

# Better: add constraints
class Response(BaseModel):
    summary: str = Field(max_length=200)
    items: list[Item] = Field(max_length=10)  # max_items in Pydantic v1
```

Add to the prompt: "Be concise. Limit analysis to 2-3 sentences."

Solutions:

```python
# 1. Use max_tokens as a hard limit
response = client.chat.completions.create(
    ...,
    max_tokens=500,
    response_format={"type": "json_object"},
)

# 2. Simplify the schema
class ConciseResponse(BaseModel):
    answer: str = Field(max_length=100)
    confidence: float

# 3. Two-stage: generate, then summarize
full = generate_full()
summary = summarize(full, max_tokens=200)
```

Debug:

```python
print(f"Output tokens: {response.usage.completion_tokens}")
```

We optimize token usage at RevolutionAI. What's your schema structure?
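As a sanity check on how many tokens the visible JSON itself should cost, here is a rough stdlib-only estimate. The ~4 characters per token ratio is a common heuristic for English-like text, not an exact count (a real count would need the model's tokenizer):

```python
import json

def estimate_json_tokens(obj, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for a serialized JSON payload.

    Uses the common ~4 characters/token heuristic; for exact counts,
    run the serialized string through the model's tokenizer instead.
    """
    text = json.dumps(obj, separators=(",", ":"))
    return round(len(text) / chars_per_token)

# 20 short fields of <= 3 words each, like the question describes:
payload = {f"field_{i}": "two words" for i in range(20)}
print(estimate_json_tokens(payload))  # roughly a hundred tokens, far below 4000
```

If the schema's actual payload estimates to a hundred-odd tokens, the missing thousands are almost certainly reasoning tokens rather than JSON verbosity.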
-
Hi! The reason you're seeing a high output token count (~4000) even though your schema has at most ~20 small fields is mostly related to how structured outputs are processed internally. When you request a structured output with a JSON Schema, the model still has to emit every key name, quote, brace, and delimiter of the JSON object, and with a reasoning model like GPT-5 it also generates internal reasoning tokens before producing the final answer; both are billed as output tokens.

A couple of points to keep in mind: reasoning tokens are invisible in the response body but show up in the usage breakdown, and attached files such as your PDF can increase how much reasoning the model does before answering.

If you need to minimize output token use: cap the response with max_output_tokens, lower the reasoning effort if the task allows it, and keep the schema as flat and constrained as possible.

Hope that helps explain why a seemingly small structured output request can still result in high output token usage!
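The token-minimizing settings above can be sketched as request parameters. This is a hypothetical request body: the parameter names (`max_output_tokens`, `reasoning.effort`, the `text.format` block) assume the OpenAI Responses API, and the schema shown is a placeholder, not your actual one:

```python
# Hypothetical token-frugal structured query; parameter names assume
# the OpenAI Responses API and may differ in other SDKs/versions.
request = {
    "model": "gpt-5",
    "input": "Extract the fields defined by the schema from the attached document.",
    "max_output_tokens": 500,        # hard cap on output (including reasoning) tokens
    "reasoning": {"effort": "low"},  # spend fewer tokens on internal reasoning
    "text": {
        "format": {
            "type": "json_schema",
            "name": "extraction",
            "schema": {
                "type": "object",
                "properties": {"title": {"type": "string"}},
                "required": ["title"],
                "additionalProperties": False,
            },
        }
    },
}
# response = client.responses.create(**request)
print(sorted(request))
```

Note that a hard cap can truncate the JSON mid-object if set too low, so leave headroom above your estimated payload size.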
-
Hello.
I've implemented a query to the GPT-5 API using a structured output (no more than 20 fields of fewer than 3 words each). When inspecting the token usage of my queries, I see about 1,000 input tokens and 4,000 output tokens per query. Why am I getting such high output token counts?