7.5.4 Structured Output

Learning Objectives

Understand why structured output is very important for LLM applications
Learn how to design a simple but clear JSON output format
Understand field design, constraint instructions, and validation logic
Read a minimal closed loop from Prompt to JSON parsing
Distinguish the differences and relationship between “structured output” and “Function Calling”

Why Isn’t Natural Language Enough?

A Very Common Fragile Scenario

Suppose you want the model to identify user intent:

User input:

“I want to learn about the refund policy”

If the model returns:

“This user is probably asking about refunds; suggest routing to the refund module.”

A human can understand it. But it is hard for a program to use this text stably.

Because what the program really wants is:

{
  "intent": "refund_policy",
  "confidence": 0.92
}

What Is the Real Problem?

The problem is not that the model cannot answer, but that:

Natural-language output is too free-form, so programs have a hard time consuming it reliably.

So when the model’s output needs to be passed to:

the frontend
the backend
a workflow
a database

structured output almost becomes a must-have.

What Exactly Is Structured Output?

A Simple Definition

Structured output = making the model output results according to pre-agreed fields and format.

The most common formats include:

JSON
lists
tables
fixed-field objects

Why Is JSON the Most Common?

Because it satisfies all of these at the same time:

humans can read it
programs can parse it
the structure is clear

So in LLM applications, JSON is usually the first choice for structured output.

Terms you should understand before writing schemas

Term	Plain meaning	Practical use
JSON	A lightweight data format made of objects, arrays, strings, numbers, booleans, and null	It lets the model output something a program can parse with `json.loads()`
Schema	The expected shape of the output: field names, field types, allowed values, and required fields	It is the contract between the Prompt and the downstream program
Field	One named piece of data, such as `intent` or `confidence`	Stable field names let backend code read the result without guessing
Validation	Program checks that the output is parseable, complete, and typed correctly	It catches bad model output before it breaks the next workflow
Enum	A fixed set of allowed values, such as `refund_policy / certificate / other`	It prevents the model from inventing many similar labels

What Is the Most Core Design Point of Structured Output?

Keep Fields Few and Clear

A mistake beginners often make is:

designing 20 fields at the start
but each field has unstable meaning

A better principle is:

First use the fewest fields to express the most important result.

For example, for intent recognition:

{
  "intent": "refund_policy",
  "confidence": 0.92
}

is already enough.

Field Names Must Be Stable

If today it is called:

intent

tomorrow:

user_intent

and the day after:

task_type

then the program side will become more and more confused.

So one of the first principles of structured output is:

Field names must be stable.

A Minimal Runnable Example: From String JSON to Program Parsing

First Look at Minimal Parsing

import json

text = '{"intent": "refund_policy", "confidence": 0.92}'
data = json.loads(text)

print(data)
print("intent =", data["intent"])
print("confidence =", data["confidence"])

Expected output:

{'intent': 'refund_policy', 'confidence': 0.92}
intent = refund_policy
confidence = 0.92

This Code Is Simple, but Very Meaningful

It teaches you:

Structured output is not just “looking like JSON”; it must be truly parseable
After parsing, the program can stably retrieve fields

In other words, the value of structured output is not “better looking,” but:

The downstream program can actually use it.

A Smaller Example Closer to a Real Task: User Intent Recognition

Suppose You Ask the Model to Output This Structure

{
  "intent": "refund_policy",
  "needs_human": false,
  "confidence": 0.92
}

Simulated Model Output + Program Parsing

import json

mock_model_output = """
{
  "intent": "refund_policy",
  "needs_human": false,
  "confidence": 0.92
}
"""

data = json.loads(mock_model_output)

if data["intent"] == "refund_policy" and not data["needs_human"]:
    print("Enter the automatic refund policy processing flow")
else:
    print("Route to a human or another flow")

print(data)

Expected output:

Enter the automatic refund policy processing flow
{'intent': 'refund_policy', 'needs_human': False, 'confidence': 0.92}

This is already a typical use case of structured output in a real workflow.

How Should the Prompt Be Written So Structured Output Is More Stable?

Don’t Just Say “Please Output JSON”

A more stable way usually includes:

explicit field names
explicit field types
explicit instruction to output only JSON
explicit instruction not to add explanations

For example:

Please perform intent recognition based on the user input and strictly output JSON.

Field requirements:
- intent: string, possible values are refund_policy / certificate / other
- needs_human: boolean
- confidence: float, range 0 to 1

Do not output any extra explanation. Only output JSON.

Why Is This More Stable?

Because you are not just “stating a request,” but:

Defining an output contract for the model.

The clearer the contract, the more stable the result.

Why Do Structured Outputs Still Need Validation?

Because the Model Is Not a Compiler

Even if your prompt is written very well, the model may still:

miss fields
use the wrong type
output extra explanatory text
produce invalid JSON syntax

Structured output contract and validation loop

A Minimal Validation Example

import json

def validate_output(text):
    try:
        data = json.loads(text)
    except Exception:
        return False, "invalid_json"

    required = ["intent", "needs_human", "confidence"]
    for field in required:
        if field not in data:
            return False, f"missing_{field}"

    if not isinstance(data["intent"], str):
        return False, "intent_type_error"
    if not isinstance(data["needs_human"], bool):
        return False, "needs_human_type_error"
    if not isinstance(data["confidence"], (int, float)):
        return False, "confidence_type_error"

    return True, data

good = '{"intent":"refund_policy","needs_human":false,"confidence":0.92}'
bad = '{"intent":"refund_policy","confidence":"high"}'

print(validate_output(good))
print(validate_output(bad))

Expected output:

(True, {'intent': 'refund_policy', 'needs_human': False, 'confidence': 0.92})
(False, 'missing_needs_human')

Structured output validation result map

This step is especially important because it changes your system from:

“the model will probably output something like this”

to:

“the program clearly knows whether the output is valid”

What Is the Relationship Between Structured Output and Function Calling?

Similarity

They are both doing the same thing:

turning model output from free text into a format that programs can more easily receive.

Difference

Roughly speaking:

Structured output: broader, focused on “stable result format”
Function Calling: one step further, focused on “the output is a tool-calling intent”

For example:

Structured output: output a classification result JSON
Function Calling: output {name, arguments} to call a tool

So you can understand it like this:

Function Calling is a more execution-oriented form of structured output.

If Your Goal Is to Generate Fixed-Format Word / PPT, How Should the Schema Be Designed?

If your goal is to:

generate support triage reports
generate release review reports
generate documents with fixed sections

then the most important step in structured output is often not “telling the model to output JSON,” but first designing the schema clearly.

A minimal schema suitable for an incident-review report often looks like this:

{
  "title": "Password Reset Incident Review",
  "audience": "Support operations team",
  "objective": ["Identify root cause", "Define follow-up actions"],
  "sections": [
    {"type": "summary", "heading": "Incident Summary", "items": ["Users could not receive reset emails between 09:10 and 09:40"]},
    {"type": "evidence", "heading": "Evidence", "items": ["Email queue latency peaked at 14 minutes"]},
    {"type": "action", "heading": "Follow-up Actions", "items": ["Add queue-latency alert and publish status-page update template"]}
  ],
  "source_refs": [{"doc_id": "incident_042", "page_or_slide": 3}]
}

The most important thing for beginners to notice about this schema is:

More fields are not always better
Instead, the fields should be just enough to drive later template rendering and source tracing

The Most Common Pitfalls in Real Projects

Too Many Fields

The more fields you have, the easier it is for the model to make mistakes, and the more complex post-processing becomes.

Unstable Field Meaning

For example, if confidence sometimes means 0 to 1 and sometimes means a percentage, that design is very dangerous.

No Parsing or Validation

Many demos seem to work, but once connected to a program they break. The problem is usually here.

The Output Structure Is Detached from the Business Flow

If the JSON is complete but cannot directly drive the downstream flow, then structured output is not really serving the business.

Structured Output Acceptance Checklist

Structured output is not successful just because it “looks like JSON”; it must be stably consumable by the program. After designing a schema, you can use the checklist below to verify it.

Check Item	Passing Behavior	Common Problem
Parseable	`json.loads()` can parse it directly	Explanatory text appears before or after, JSON is not closed properly
Complete fields	All required fields are present	Missing fields, too many field-name variants
Correct types	Stable types such as string, boolean, number, array	`confidence` is sometimes a number and sometimes “high”
Controlled enum	Classification fields stay within allowed values	`intent` outputs many similar but inconsistent terms
Business usable	Output can directly drive the next process	JSON is complete, but the backend doesn’t know how to use it
Failure identifiable	The program can detect `invalid_json`, `missing_field`, `type_error`	All failures are only shown as “parse failed”

If this table is not passed, prioritize fixing the schema and validation logic, rather than repeatedly changing the Prompt wording.

Why Prompt Version Management Matters

When you start optimizing structured output, the Prompt itself should also have versions like code. Otherwise, it becomes hard to answer: which change improved the output, and which change introduced a new problem?

Field	Example	Purpose
`prompt_version`	`intent_schema_v2`	Marks the current Prompt version
`change_reason`	Add `needs_human` field	Explains why it was changed
`test_inputs`	20 fixed inputs	Compare stability with the same sample set
`pass_rate`	18/20	Record the structured output pass rate
`failure_cases`	2 missing-field cases	Keep evidence for the next optimization round

A simple record can look like this:

Version: intent_schema_v2
Change: Added the needs_human field, and required confidence to be a number from 0 to 1
Evaluation: 18 out of 20 test inputs passed parsing and validation
Failures: 2 outputs used confidence="high"
Conclusion: Keep the field, but emphasize the confidence type in the prompt

This habit will turn Prompt engineering from “let’s try it” into “iterate with records.”

How to Record Structured Output Failure Samples

It is recommended to record failure samples by type, rather than only saying “the model did not follow the format.”

Failure Type	Example	Fix Direction
`invalid_json`	Missing the right brace	Require outputting only JSON and add retry on parse failure
`missing_field`	Missing `needs_human`	Mark required fields in the field requirements
`type_error`	`confidence` is output as a string	Clarify the type and range
`enum_error`	`intent` outputs `refund` instead of `refund_policy`	Provide allowed values and forbid inventing categories
`extra_text`	Explanations are added before and after JSON	Explicitly forbid any extra explanation

The clearer the failure samples, the easier regression testing becomes later. In real projects, the stability of structured output is often not guaranteed by one perfect Prompt, but by schema, validation, failure logging, and regression samples working together.

Evidence to Keep

Keep this page’s proof of learning as a small evidence card:

Schema: required fields and allowed types
Parser: output is parsed, not trusted visually
Valid Case: one output accepted by validation
Invalid Case: missing field or wrong type rejected
Repair Rule: retry, fallback, or ask for clarification

Summary

The most important thing in this section is not memorizing JSON syntax, but understanding:

The essence of structured output is turning the model’s answer into an intermediate result that programs can consume reliably.

When you start connecting models into real systems, this is often more important than “making the answer prettier.”

Exercises

Design a JSON output format for a “course Q&A routing” task, and include at least intent, confidence, and needs_human.
Intentionally construct a JSON object with a missing field and see whether the validator can catch it.
Think about it: when should you use structured output, and when is plain natural language enough?
Explain in your own words: why is structured output a key step in the engineering transformation of Prompt engineering?

Solution approach and explanation

A reasonable JSON shape is {"intent": "billing|course_help|technical_issue|other", "confidence": 0.0, "needs_human": false, "reason": "short explanation"}.
If intent, confidence, or needs_human is required, a missing field should fail validation. That failure is the point: bad output should be caught before it reaches product logic.
Use structured output when another program must route, store, score, or trigger actions from the answer. Natural language is enough when the answer is only for human reading.
Structured output turns a prompt response into an interface contract. That is what lets prompt work become testable, automatable, and maintainable.