High output tokens in a structured output query #2693
Replies: 3 comments
-
High output token usage with structured outputs can happen for a few reasons. Since you're using GPT-5, it may be generating internal reasoning tokens, which count toward your output token usage even though they never appear in the visible response. Also, you're passing a PDF file with `input_file`; the model may spend additional tokens processing and analyzing that document.

Try adding some debugging to see where the tokens are actually going, and consider capping the output with `max_tokens` (`max_output_tokens` in the Responses API). If you're still seeing ~4,000 output tokens for such a small structured response, that does seem excessive. Could you share the full request you're making?
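To make that debugging concrete, here is a minimal sketch of the kind of breakdown to look for. The field names (`output_tokens`, `output_tokens_details.reasoning_tokens`) follow the shape of the OpenAI Responses API usage object, but the helper and the sample numbers are hypothetical:

```python
def split_output_tokens(usage: dict) -> dict:
    """Separate reasoning tokens from visible output tokens.

    `usage` mirrors the shape of a Responses API usage object;
    the numbers used below are made up for illustration.
    """
    total = usage["output_tokens"]
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "visible": total - reasoning,
        "reasoning": reasoning,
        "reasoning_share": reasoning / total if total else 0.0,
    }

# Hypothetical usage matching the numbers in the question:
usage = {
    "input_tokens": 1000,
    "output_tokens": 4000,
    "output_tokens_details": {"reasoning_tokens": 3800},
}
print(split_output_tokens(usage))
```

If the reasoning share turns out to be large, the fix is to reduce reasoning effort or cap output, not to shrink the schema.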
-
High output tokens in structured output is a common issue! Here's why:

Causes:

```python
# Complex nested schema = more tokens
class Response(BaseModel):
    analysis: str      # Can be verbose
    items: list[Item]  # Each item adds tokens
    metadata: dict     # Unbounded

# Better: add constraints
class Response(BaseModel):
    summary: str = Field(max_length=200)
    items: list[Item] = Field(max_length=10)  # max_items in Pydantic v1
```

Add to the prompt: "Be concise. Limit analysis to 2-3 sentences."

Solutions:

```python
# 1. Use max_tokens as a hard limit
response = client.chat.completions.create(
    ...,
    max_tokens=500,
    response_format={"type": "json_object"},
)

# 2. Simplify the schema
class ConciseResponse(BaseModel):
    answer: str = Field(max_length=100)
    confidence: float

# 3. Two-stage: generate, then summarize
full = generate_full()
summary = summarize(full, max_tokens=200)
```

Debug:

```python
print(f"Output tokens: {response.usage.completion_tokens}")
```

We optimize token usage at RevolutionAI. What's your schema structure?
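As a sanity check on how many tokens the visible JSON itself should cost, here is a rough stdlib-only estimate. The ~4 characters per token ratio is a common heuristic for English-like text, not an exact count (a real count would need the model's tokenizer):

```python
import json

def estimate_json_tokens(obj, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for a serialized JSON payload.

    Uses the common ~4 characters/token heuristic; for exact counts,
    run the serialized string through the model's tokenizer instead.
    """
    text = json.dumps(obj, separators=(",", ":"))
    return round(len(text) / chars_per_token)

# 20 short fields of <= 3 words each, like the question describes:
payload = {f"field_{i}": "two words" for i in range(20)}
print(estimate_json_tokens(payload))  # roughly a hundred tokens, far below 4000
```

If the schema's actual payload estimates to a hundred-odd tokens, the missing thousands are almost certainly reasoning tokens rather than JSON verbosity.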
-
Hi! The reason you're seeing a high output token count (~4000) even though your schema has at most ~20 small fields is mostly related to how structured outputs are processed internally. When you request a structured output with a JSON Schema, the model still has to emit every key name, quote, brace, and delimiter of the JSON object, and with a reasoning model like GPT-5 it also generates internal reasoning tokens before producing the final answer; both are billed as output tokens.

A couple of points to keep in mind: reasoning tokens are invisible in the response body but show up in the usage breakdown, and attached files such as your PDF can increase how much reasoning the model does before answering.

If you need to minimize output token use: cap the response with max_output_tokens, lower the reasoning effort if the task allows it, and keep the schema as flat and constrained as possible.

Hope that helps explain why a seemingly small structured output request can still result in high output token usage!
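The token-minimizing settings above can be sketched as request parameters. This is a hypothetical request body: the parameter names (`max_output_tokens`, `reasoning.effort`, the `text.format` block) assume the OpenAI Responses API, and the schema shown is a placeholder, not your actual one:

```python
# Hypothetical token-frugal structured query; parameter names assume
# the OpenAI Responses API and may differ in other SDKs/versions.
request = {
    "model": "gpt-5",
    "input": "Extract the fields defined by the schema from the attached document.",
    "max_output_tokens": 500,        # hard cap on output (including reasoning) tokens
    "reasoning": {"effort": "low"},  # spend fewer tokens on internal reasoning
    "text": {
        "format": {
            "type": "json_schema",
            "name": "extraction",
            "schema": {
                "type": "object",
                "properties": {"title": {"type": "string"}},
                "required": ["title"],
                "additionalProperties": False,
            },
        }
    },
}
# response = client.responses.create(**request)
print(sorted(request))
```

Note that a hard cap can truncate the JSON mid-object if set too low, so leave headroom above your estimated payload size.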
-
Hello.
I've implemented a query to the GPT-5 API using a structured output (no more than 20 fields of fewer than 3 words each). When inspecting the token usage of my queries, I see about 1,000 input tokens and 4,000 output tokens per query. Why am I getting such high output token counts?