
7.5.4 Structured Output

  • Understand why structured output is very important for LLM applications
  • Learn how to design a simple but clear JSON output format
  • Understand field design, constraint instructions, and validation logic
  • Read a minimal closed loop from Prompt to JSON parsing
  • Distinguish the differences and relationship between “structured output” and “Function Calling”

Suppose you want the model to identify user intent:

User input:

“I want to learn about the refund policy”

If the model returns:

“This user is probably asking about refunds; suggest routing to the refund module.”

A human can understand this, but a program cannot reliably act on it.

What the program really wants is:

{
  "intent": "refund_policy",
  "confidence": 0.92
}

The problem is not that the model cannot answer, but that:

Natural-language output is too free-form, so programs have a hard time consuming it reliably.

So when the model’s output needs to be passed to:

  • the frontend
  • the backend
  • a workflow
  • a database

structured output almost becomes a must-have.


Structured output = making the model output results according to pre-agreed fields and format.

The most common formats include:

  • JSON
  • lists
  • tables
  • fixed-field objects

JSON is the most common choice because it satisfies all of these at once:

  • humans can read it
  • programs can parse it
  • the structure is clear

So in LLM applications, JSON is usually the first choice for structured output.

Terms you should understand before writing schemas

Term | Plain meaning | Practical use
JSON | A lightweight data format made of objects, arrays, strings, numbers, booleans, and null | It lets the model output something a program can parse with json.loads()
Schema | The expected shape of the output: field names, field types, allowed values, and required fields | It is the contract between the Prompt and the downstream program
Field | One named piece of data, such as intent or confidence | Stable field names let backend code read the result without guessing
Validation | Program checks that the output is parseable, complete, and typed correctly | It catches bad model output before it breaks the next workflow
Enum | A fixed set of allowed values, such as refund_policy / certificate / other | It prevents the model from inventing many similar labels
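
To make these terms concrete, here is a minimal sketch of how a schema might be written down as plain Python data. The names ALLOWED_INTENTS, INTENT_SCHEMA, and REQUIRED_FIELDS are illustrative assumptions, not part of any library; the allowed values are the ones used as the enum example in the table.

```python
# Hypothetical schema for the intent-recognition example (names are illustrative).
ALLOWED_INTENTS = {"refund_policy", "certificate", "other"}  # the enum

INTENT_SCHEMA = {
    "intent": str,        # field -> expected type; value must be in ALLOWED_INTENTS
    "confidence": float,  # expected range: 0 to 1
}

# Every field in this small schema is required.
REQUIRED_FIELDS = list(INTENT_SCHEMA)

print(REQUIRED_FIELDS)
# A label the model invented is not part of the enum:
print("refund" in ALLOWED_INTENTS)
```

Writing the schema as data rather than only as prose makes it possible to reuse the same definition in the prompt and in validation code.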

What Is the Core Design Principle of Structured Output?


A mistake beginners often make is:

  • designing 20 fields at the start
  • but each field has unstable meaning

A better principle is:

First use the fewest fields to express the most important result.

For example, for intent recognition:

{
  "intent": "refund_policy",
  "confidence": 0.92
}

is already enough.

Field naming matters just as much. If the same field is called:

  • intent today
  • user_intent tomorrow
  • task_type the day after

then the downstream code no longer knows which key to read.

So one of the first principles of structured output is:

Field names must be stable.
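
A short sketch of why renamed fields hurt: the downstream reader below assumes the agreed key intent, so a response that uses user_intent breaks it. The function read_intent and both sample strings are hypothetical, written only for this illustration.

```python
import json

def read_intent(raw: str) -> str:
    """Downstream code that assumes the agreed field name `intent`."""
    data = json.loads(raw)
    return data["intent"]

stable = '{"intent": "refund_policy", "confidence": 0.92}'
drifted = '{"user_intent": "refund_policy", "confidence": 0.92}'

print(read_intent(stable))
try:
    read_intent(drifted)
except KeyError as err:
    # The content is the same, but the renamed key breaks the reader.
    print("broken by renamed field:", err)
```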


A Minimal Runnable Example: From String JSON to Program Parsing

import json

# A JSON string, as a model might return it
text = '{"intent": "refund_policy", "confidence": 0.92}'

# Parse the string into a Python dict
data = json.loads(text)

print(data)
print("intent =", data["intent"])
print("confidence =", data["confidence"])

Expected output:

{'intent': 'refund_policy', 'confidence': 0.92}
intent = refund_policy
confidence = 0.92

This example shows two things:

  1. Structured output is not just “looking like JSON”; it must be truly parseable
  2. After parsing, the program can stably retrieve fields

In other words, the value of structured output is not “better looking,” but:

The downstream program can actually use it.
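
The difference between "looks like JSON" and "is parseable" can be checked directly. The sketch below runs json.loads() over a few hand-written strings; the sample strings and the helper is_parseable are assumptions made up for this illustration.

```python
import json

candidates = [
    '{"intent": "refund_policy", "confidence": 0.92}',   # valid JSON
    "{'intent': 'refund_policy', 'confidence': 0.92}",   # single quotes: a Python dict repr, not JSON
    '{"intent": "refund_policy", "confidence": 0.92,}',  # trailing comma
    'Here is the result: {"intent": "refund_policy"}',   # extra text before the JSON
]

def is_parseable(text: str) -> bool:
    """Return True only if json.loads accepts the whole string."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

results = [is_parseable(c) for c in candidates]
print(results)
```

Only the first string parses; the other three would look acceptable to a human reader but fail in a program.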


A Smaller Example Closer to a Real Task: User Intent Recognition


Suppose You Ask the Model to Output This Structure

{
  "intent": "refund_policy",
  "needs_human": false,
  "confidence": 0.92
}

The program can then branch on the parsed fields:

import json

# Simulated model output; in a real system this string comes from the model
mock_model_output = """
{
  "intent": "refund_policy",
  "needs_human": false,
  "confidence": 0.92
}
"""

data = json.loads(mock_model_output)

# Route based on the structured fields instead of guessing from free text
if data["intent"] == "refund_policy" and not data["needs_human"]:
    print("Enter the automatic refund policy processing flow")
else:
    print("Route to a human or another flow")

print(data)

Expected output:

Enter the automatic refund policy processing flow
{'intent': 'refund_policy', 'needs_human': False, 'confidence': 0.92}

This is already a typical use case of structured output in a real workflow.


How Should the Prompt Be Written So Structured Output Is More Stable?


A prompt that produces more stable output usually includes:

  • explicit field names
  • explicit field types
  • explicit instruction to output only JSON
  • explicit instruction not to add explanations

For example:

Please perform intent recognition based on the user input and strictly output JSON.
Field requirements:
- intent: string, possible values are refund_policy / certificate / other
- needs_human: boolean
- confidence: float, range 0 to 1
Do not output any extra explanation. Only output JSON.

This works because you are not just “stating a request,” but:

Defining an output contract for the model.

The clearer the contract, the more stable the result.
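
One way to keep the contract consistent is to generate the prompt from a single field list instead of retyping it. The following is a minimal sketch under that assumption; FIELD_SPECS, build_prompt, and the trailing "User input:" line are hypothetical additions, while the instruction wording is taken from the example prompt above.

```python
# Hypothetical single source of truth for the field requirements.
FIELD_SPECS = [
    ("intent", "string, possible values are refund_policy / certificate / other"),
    ("needs_human", "boolean"),
    ("confidence", "float, range 0 to 1"),
]

def build_prompt(user_input: str) -> str:
    """Assemble the instruction text from FIELD_SPECS plus the user input."""
    lines = [
        "Please perform intent recognition based on the user input and strictly output JSON.",
        "Field requirements:",
    ]
    lines += [f"- {name}: {spec}" for name, spec in FIELD_SPECS]
    lines.append("Do not output any extra explanation. Only output JSON.")
    lines.append(f"User input: {user_input}")  # assumed way of attaching the input
    return "\n".join(lines)

prompt = build_prompt("I want to learn about the refund policy")
print(prompt)
```

If a field is added or renamed, only FIELD_SPECS changes, which reduces the chance that the prompt and the validator drift apart.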


Why Do Structured Outputs Still Need Validation?


Even if your prompt is written very well, the model may still:

  • miss fields
  • use the wrong type
  • output extra explanatory text
  • produce invalid JSON syntax

[Figure: Structured output contract and validation loop]

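import json

def validate_output(text):
    """Return (True, data) if the output is valid, else (False, error_code)."""
    try:
        data = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return False, "invalid_json"

    # Valid JSON that is not an object (a list, a string, ...) is unusable here
    if not isinstance(data, dict):
        return False, "invalid_json"

    required = ["intent", "needs_human", "confidence"]
    for field in required:
        if field not in data:
            return False, f"missing_{field}"

    if not isinstance(data["intent"], str):
        return False, "intent_type_error"
    if not isinstance(data["needs_human"], bool):
        return False, "needs_human_type_error"
    # bool is a subclass of int in Python, so exclude it explicitly
    confidence = data["confidence"]
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return False, "confidence_type_error"

    return True, data

good = '{"intent":"refund_policy","needs_human":false,"confidence":0.92}'
bad = '{"intent":"refund_policy","confidence":"high"}'

print(validate_output(good))
print(validate_output(bad))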

Expected output:

(True, {'intent': 'refund_policy', 'needs_human': False, 'confidence': 0.92})
(False, 'missing_needs_human')

[Figure: Structured output validation result map]

This step is especially important because it changes your system from:

  • “the model will probably output something like this”

to:

  • “the program clearly knows whether the output is valid”
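
Once the program can tell valid from invalid output, it can also react to failures. The sketch below shows one possible validate-and-retry loop with a fallback. It is an illustration under stated assumptions: fake_model is a stand-in that replays canned replies instead of calling a real model, validate is a compact validator written for this example, and the fallback of routing to a human is one design choice among several.

```python
import json

def validate(text):
    """Compact stand-in validator: returns (ok, data_or_error_code)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False, "invalid_json"
    if not isinstance(data, dict):
        return False, "invalid_json"
    for field in ("intent", "needs_human", "confidence"):
        if field not in data:
            return False, f"missing_{field}"
    return True, data

# Hypothetical model: canned replies stand in for real API calls.
_replies = iter([
    'Sure! {"intent": "refund_policy"}',
    '{"intent": "refund_policy", "confidence": 0.92}',
    '{"intent": "refund_policy", "needs_human": false, "confidence": 0.92}',
])

def fake_model(prompt):
    return next(_replies)

def ask_with_retry(prompt, max_attempts=3):
    """Call the model until the output validates; collect error codes on the way."""
    errors = []
    for _ in range(max_attempts):
        ok, result = validate(fake_model(prompt))
        if ok:
            return result, errors
        errors.append(result)
    # Fallback when every attempt fails: hand the case to a human.
    return {"intent": "other", "needs_human": True, "confidence": 0.0}, errors

result, errors = ask_with_retry("classify: I want to learn about the refund policy")
print(result)
print(errors)
```

Keeping the list of error codes, rather than discarding failed attempts, gives you material for later failure analysis.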

What Is the Relationship Between Structured Output and Function Calling?


They are both doing the same thing:

turning model output from free text into a format that programs can more easily receive.

Roughly speaking:

  • Structured output: broader, focused on “stable result format”
  • Function Calling: one step further, focused on “the output is a tool-calling intent”

For example:

  • Structured output: output a classification result JSON
  • Function Calling: output {name, arguments} to call a tool

So you can understand it like this:

Function Calling is a more execution-oriented form of structured output.
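
The contrast can be sketched in code: a classification result is read by the program, while a {name, arguments} object is executed by it. Everything below is a simplified assumption for illustration. The tool get_refund_policy, the TOOLS registry, and dispatch are hypothetical, and real provider APIs wrap tool calls in their own message formats.

```python
import json

def get_refund_policy(order_type: str) -> str:
    """Hypothetical tool; a real one would query a policy service."""
    return f"Refund policy for {order_type} orders"

# Registry of functions the model is allowed to request.
TOOLS = {"get_refund_policy": get_refund_policy}

# Structured output: a result the program reads.
classification = json.loads('{"intent": "refund_policy", "confidence": 0.92}')

# Function-calling style output: a request the program executes.
tool_call = json.loads(
    '{"name": "get_refund_policy", "arguments": {"order_type": "course"}}'
)

def dispatch(call: dict) -> str:
    """Look up the requested tool by name and call it with the given arguments."""
    func = TOOLS.get(call["name"])
    if func is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return func(**call["arguments"])

print(classification["intent"])
print(dispatch(tool_call))
```

Both outputs are JSON; the difference lies in what the program does with them afterwards.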


If Your Goal Is to Generate Fixed-Format Word / PPT, How Should the Schema Be Designed?


If your goal is to:

  • generate support triage reports
  • generate release review reports
  • generate documents with fixed sections

then the most important step in structured output is often not “telling the model to output JSON,” but first designing the schema clearly.

A minimal schema suitable for an incident-review report often looks like this:

{
  "title": "Password Reset Incident Review",
  "audience": "Support operations team",
  "objective": ["Identify root cause", "Define follow-up actions"],
  "sections": [
    {"type": "summary", "heading": "Incident Summary", "items": ["Users could not receive reset emails between 09:10 and 09:40"]},
    {"type": "evidence", "heading": "Evidence", "items": ["Email queue latency peaked at 14 minutes"]},
    {"type": "action", "heading": "Follow-up Actions", "items": ["Add queue-latency alert and publish status-page update template"]}
  ],
  "source_refs": [{"doc_id": "incident_042", "page_or_slide": 3}]
}

The most important thing for beginners to notice about this schema is:

  • More fields are not always better
  • Instead, the fields should be just enough to drive later template rendering and source tracing
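
To see what "just enough to drive template rendering and source tracing" means, here is a minimal rendering sketch. It assumes a trimmed copy of the report schema and renders plain text as a stand-in; producing an actual Word or PPT file would need a document library, which is outside this sketch. The function render_text is hypothetical.

```python
# Trimmed copy of the incident-review schema, used only for this illustration.
report = {
    "title": "Password Reset Incident Review",
    "audience": "Support operations team",
    "sections": [
        {"type": "summary", "heading": "Incident Summary",
         "items": ["Users could not receive reset emails between 09:10 and 09:40"]},
        {"type": "action", "heading": "Follow-up Actions",
         "items": ["Add queue-latency alert and publish status-page update template"]},
    ],
    "source_refs": [{"doc_id": "incident_042", "page_or_slide": 3}],
}

def render_text(doc: dict) -> str:
    """Render the schema into a fixed plain-text layout."""
    lines = [doc["title"], f"Audience: {doc['audience']}", ""]
    for section in doc["sections"]:
        lines.append(section["heading"])
        lines += [f"  - {item}" for item in section["items"]]
        lines.append("")
    refs = ", ".join(
        f"{r['doc_id']} (page/slide {r['page_or_slide']})" for r in doc["source_refs"]
    )
    lines.append(f"Sources: {refs}")
    return "\n".join(lines)

rendered = render_text(report)
print(rendered)
```

Every field the renderer touches is used exactly once, which is a quick way to check that the schema has no fields the template does not need.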

The more fields you have, the easier it is for the model to make mistakes, and the more complex post-processing becomes.

Unstable field meanings are just as risky. For example, if confidence sometimes means a value from 0 to 1 and sometimes a percentage, downstream code cannot interpret it safely.

Many demos seem to work but break once they are connected to a program, and the cause is usually one of these problems.

The Output Structure Is Detached from the Business Flow


If the JSON is complete but cannot directly drive the downstream flow, then structured output is not really serving the business.


Structured output is not successful just because it “looks like JSON”; it must be stably consumable by the program. After designing a schema, you can use the checklist below to verify it.

Check Item | Passing Behavior | Common Problem
Parseable | json.loads() can parse it directly | Explanatory text appears before or after; JSON is not closed properly
Complete fields | All required fields are present | Missing fields, too many field-name variants
Correct types | Stable types such as string, boolean, number, array | confidence is sometimes a number and sometimes “high”
Controlled enum | Classification fields stay within allowed values | intent outputs many similar but inconsistent terms
Business usable | Output can directly drive the next process | JSON is complete, but the backend doesn’t know how to use it
Failure identifiable | The program can detect invalid_json, missing_field, type_error | All failures are only shown as “parse failed”

If the output fails any item in this checklist, prioritize fixing the schema and validation logic rather than repeatedly rewording the Prompt.
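
Several checklist items can be turned into one function that returns a specific failure code. This is a sketch rather than a complete validator: check_output and the sample strings are hypothetical, and the range_error code is an assumption added here to cover the "0 to 1 versus percentage" problem.

```python
import json

ALLOWED_INTENTS = {"refund_policy", "certificate", "other"}

def check_output(text: str) -> str:
    """Return "ok" or a specific failure code for one raw model output."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return "invalid_json"
    if not isinstance(data, dict):
        return "invalid_json"
    for field in ("intent", "needs_human", "confidence"):
        if field not in data:
            return "missing_field"
    conf = data["confidence"]
    if (not isinstance(data["intent"], str)
            or not isinstance(data["needs_human"], bool)
            or isinstance(conf, bool)  # bool is a subclass of int
            or not isinstance(conf, (int, float))):
        return "type_error"
    if data["intent"] not in ALLOWED_INTENTS:
        return "enum_error"
    if not 0 <= conf <= 1:
        return "range_error"  # assumed code, not from the checklist
    return "ok"

samples = {
    "valid": '{"intent": "refund_policy", "needs_human": false, "confidence": 0.92}',
    "extra text": 'Result: {"intent": "other", "needs_human": false, "confidence": 0.5}',
    "missing": '{"intent": "refund_policy", "confidence": 0.92}',
    "wrong type": '{"intent": "refund_policy", "needs_human": false, "confidence": "high"}',
    "bad enum": '{"intent": "refund", "needs_human": false, "confidence": 0.92}',
    "percentage": '{"intent": "refund_policy", "needs_human": false, "confidence": 92}',
}

results = {label: check_output(text) for label, text in samples.items()}
for label, code in results.items():
    print(f"{label}: {code}")
```

The "business usable" item cannot be checked this way; it still needs a review of whether the downstream flow actually consumes each field.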

Once you start optimizing structured output, the Prompt itself should be versioned like code. Otherwise, it becomes hard to answer: which change improved the output, and which change introduced a new problem?

Field | Example | Purpose
prompt_version | intent_schema_v2 | Marks the current Prompt version
change_reason | Add needs_human field | Explains why it was changed
test_inputs | 20 fixed inputs | Compare stability with the same sample set
pass_rate | 18/20 | Record the structured output pass rate
failure_cases | 2 missing-field cases | Keep evidence for the next optimization round

A simple record can look like this:

Version: intent_schema_v2
Change: Added the needs_human field, and required confidence to be a number from 0 to 1
Evaluation: 18 out of 20 test inputs passed parsing and validation
Failures: 2 outputs used confidence="high"
Conclusion: Keep the field, but emphasize the confidence type in the prompt

This habit will turn Prompt engineering from “let’s try it” into “iterate with records.”
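
A record like this can be produced by a small script instead of by hand. The sketch below assumes you have saved the raw outputs of one prompt version over a fixed sample set; the four strings in outputs are made-up placeholders, not real evaluation data, and passes is a hypothetical pass/fail check.

```python
import json

def passes(text: str) -> bool:
    """True if the output parses and all three fields have the expected types."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    conf = data.get("confidence")
    return (
        isinstance(data.get("intent"), str)
        and isinstance(data.get("needs_human"), bool)
        and isinstance(conf, (int, float))
        and not isinstance(conf, bool)
    )

# Placeholder outputs for illustration only; a real run would load saved model outputs.
outputs = [
    '{"intent": "refund_policy", "needs_human": false, "confidence": 0.92}',
    '{"intent": "other", "needs_human": true, "confidence": 0.40}',
    '{"intent": "refund_policy", "needs_human": false, "confidence": "high"}',
    '{"intent": "certificate", "needs_human": false, "confidence": 0.81}',
]

failures = [o for o in outputs if not passes(o)]
record = {
    "prompt_version": "intent_schema_v2",
    "change_reason": "Add needs_human field",
    "test_inputs": len(outputs),
    "pass_rate": f"{len(outputs) - len(failures)}/{len(outputs)}",
    "failure_cases": failures,
}
print(json.dumps(record, indent=2))
```

Running the same script against the same inputs after each prompt change makes the pass rates of different versions directly comparable.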

How to Record Structured Output Failure Samples


It is recommended to record failure samples by type, rather than only saying “the model did not follow the format.”

Failure Type | Example | Fix Direction
invalid_json | Missing the right brace | Require outputting only JSON and add retry on parse failure
missing_field | Missing needs_human | Mark required fields in the field requirements
type_error | confidence is output as a string | Clarify the type and range
enum_error | intent outputs refund instead of refund_policy | Provide allowed values and forbid inventing categories
extra_text | Explanations are added before and after JSON | Explicitly forbid any extra explanation

The clearer the failure samples, the easier regression testing becomes later. In real projects, the stability of structured output is often not guaranteed by one perfect Prompt, but by schema, validation, failure logging, and regression samples working together.
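
As an illustration of recording failures by type, the sketch below separates extra_text from invalid_json and tallies the results. The heuristic (try to parse the span between the first "{" and the last "}") is an assumption chosen for simplicity and would misjudge some inputs; classify_parse_failure and the sample strings are hypothetical.

```python
import json
from collections import Counter

def classify_parse_failure(text: str) -> str:
    """Return "ok", "extra_text", or "invalid_json" for one raw output."""
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError:
        pass
    # Heuristic: if the outermost {...} span parses, the JSON was fine
    # and the failure came from text around it.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            json.loads(text[start:end + 1])
            return "extra_text"
        except json.JSONDecodeError:
            pass
    return "invalid_json"

raw_outputs = [
    '{"intent": "refund_policy", "needs_human": false, "confidence": 0.92}',
    'Here is the JSON: {"intent": "other", "needs_human": true, "confidence": 0.4} Hope this helps!',
    '{"intent": "refund_policy", "needs_human": false, "confidence": 0.92',  # missing right brace
]

failure_log = Counter(classify_parse_failure(t) for t in raw_outputs)
print(failure_log)
```

A tally like this tells you which fix direction to try first, and the stored raw strings can double as regression samples.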


Keep this page’s proof of learning as a small evidence card:

Schema: required fields and allowed types
Parser: output is parsed, not trusted visually
Valid Case: one output accepted by validation
Invalid Case: missing field or wrong type rejected
Repair Rule: retry, fallback, or ask for clarification

The most important thing in this section is not memorizing JSON syntax, but understanding:

The essence of structured output is turning the model’s answer into an intermediate result that programs can consume reliably.

When you start connecting models into real systems, this is often more important than “making the answer prettier.”


  1. Design a JSON output format for a “course Q&A routing” task, and include at least intent, confidence, and needs_human.
  2. Intentionally construct a JSON object with a missing field and see whether the validator can catch it.
  3. Think about it: when should you use structured output, and when is plain natural language enough?
  4. Explain in your own words: why is structured output a key step in the engineering transformation of Prompt engineering?
Solution approach and explanation
  1. A reasonable JSON shape is {"intent": "billing|course_help|technical_issue|other", "confidence": 0.0, "needs_human": false, "reason": "short explanation"}.
  2. If intent, confidence, or needs_human is required, a missing field should fail validation. That failure is the point: bad output should be caught before it reaches product logic.
  3. Use structured output when another program must route, store, score, or trigger actions from the answer. Natural language is enough when the answer is only for human reading.
  4. Structured output turns a prompt response into an interface contract. That is what lets prompt work become testable, automatable, and maintainable.