Skip to content

9.3.2 Function Calling Explained

  • Understand the full engineering pipeline of Function Calling
  • Learn how to design more robust tool schemas
  • Understand parameter validation, failure handling, and error recovery
  • Read a small closed loop of multi-step tool calls
  • Distinguish what “the model decides” and what “the program executes”

Why dig deeper into Function Calling separately?

Section titled “Why dig deeper into Function Calling separately?”

The beginner version only solves “can it call?”

Section titled “The beginner version only solves “can it call?””

The simplest function calling system only requires:

  • The model chooses the right tool
  • The parameters are roughly correct

That is usually enough in the demo stage.

Once it goes live, harder problems appear immediately

Section titled “Once it goes live, harder problems appear immediately”

For example:

  • There are many tools, and the model often chooses the wrong one
  • Parameters are often missing fields
  • Some calls must be protected by permission checks
  • How do you recover after a tool failure?
  • How do you avoid infinite loops in multi-step calls?

So real Function Calling is not just a JSON structure, but an engineering mechanism.


The standard closed loop of Function Calling

Section titled “The standard closed loop of Function Calling”
flowchart LR
A["User input"] --> B["Model selects tool and outputs parameters"]
B --> C["Program validates whether the call is legal"]
C --> D["Execute the tool"]
D --> E["Get the result"]
E --> F["Continue response / continue next step"]
style A fill:#e3f2fd,stroke:#1565c0,color:#333
style B fill:#fff3e0,stroke:#e65100,color:#333
style C fill:#f3e5f5,stroke:#6a1b9a,color:#333
style D fill:#fffde7,stroke:#f9a825,color:#333
style E fill:#e8f5e9,stroke:#2e7d32,color:#333
style F fill:#ffebee,stroke:#c62828,color:#333
StepResponsible party
Decide whether to call a toolModel
Output the call structureModel
Validate whether the parameters are legalProgram
Execute the toolProgram
Continue to the next step based on the resultModel / workflow / Agent scheduler

This is a very important boundary:

The model is responsible for “decision-making,” and the program is responsible for ensuring safe and stable execution.

Function Calling schema validation and execution guardrail map


Why does schema design directly affect results?

Section titled “Why does schema design directly affect results?”
bad_schema = {
"name": "search",
"description": "Do some queries",
"parameters": {
"q": {"type": "string"}
}
}
print(bad_schema)

The problems with this schema are:

  • The tool name is too vague
  • The description is too empty
  • The parameter semantics are unclear

When the model sees a schema like this, it can easily get confused.

good_schema = {
"name": "search_course_policy",
"description": "Query policy-related course documents, such as refunds, certificates, and learning order",
"parameters": {
"keyword": {
"type": "string",
"description": "The topic keyword to search for, such as refund or certificate"
}
},
"required": ["keyword"]
}
print(good_schema)

A better schema usually has:

  • Clear tool names
  • Specific descriptions
  • Semantic parameter names
  • Explicit required fields

Parameter validation: you cannot treat the model as a perfectly reliable caller

Section titled “Parameter validation: you cannot treat the model as a perfectly reliable caller”
tool_call = {
"name": "search_course_policy",
"arguments": {}
}

If you execute it directly:

search_course_policy(**tool_call["arguments"])

the program will likely fail.

def validate_tool_call(call):
if "name" not in call:
return False, "missing_name"
if "arguments" not in call:
return False, "missing_arguments"
if call["name"] == "search_course_policy":
args = call["arguments"]
if "keyword" not in args:
return False, "missing_keyword"
if not isinstance(args["keyword"], str):
return False, "keyword_must_be_string"
return True, "ok"
print(validate_tool_call({"name": "search_course_policy", "arguments": {"keyword": "refund"}}))
print(validate_tool_call({"name": "search_course_policy", "arguments": {}}))

Expected output:

Terminal window
(True, 'ok')
(False, 'missing_keyword')

This step is not a “nice-to-have”; it is a basic line of defense for any production system.


import ast
import operator
OPS = {
ast.Add: operator.add,
ast.Sub: operator.sub,
ast.Mult: operator.mul,
ast.Div: operator.truediv,
}
def safe_calculate(expression):
def visit(node):
if isinstance(node, ast.Expression):
return visit(node.body)
if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
return node.value
if isinstance(node, ast.BinOp) and type(node.op) in OPS:
return OPS[type(node.op)](visit(node.left), visit(node.right))
if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
return -visit(node.operand)
raise ValueError("unsupported_expression")
return visit(ast.parse(expression, mode="eval"))
def search_course_policy(keyword):
docs = {
"refund": "You can apply for a refund within 7 days after purchase and if your learning progress is below 20%.",
"certificate": "You can obtain a certificate after completing all required items and passing the final assessment."
}
return docs.get(keyword, "No related policy found")
def calculate(expression):
return str(safe_calculate(expression))
def validate_tool_call(call):
if "name" not in call or "arguments" not in call:
return False, "invalid_call_structure"
if call["name"] == "search_course_policy":
args = call["arguments"]
if "keyword" not in args or not isinstance(args["keyword"], str):
return False, "invalid_policy_arguments"
if call["name"] == "calculate":
args = call["arguments"]
if "expression" not in args or not isinstance(args["expression"], str):
return False, "invalid_calculate_arguments"
return True, "ok"
def dispatch(call):
if call["name"] == "search_course_policy":
return search_course_policy(**call["arguments"])
if call["name"] == "calculate":
return calculate(**call["arguments"])
return "unknown_tool"

Simulate “the model decides the tool call”

Section titled “Simulate “the model decides the tool call””
def mock_model(user_query):
if "refund" in user_query:
return {
"name": "search_course_policy",
"arguments": {"keyword": "refund"}
}
if "certificate" in user_query:
return {
"name": "search_course_policy",
"arguments": {"keyword": "certificate"}
}
if "calculate" in user_query:
return {
"name": "calculate",
"arguments": {"expression": user_query.replace("calculate", "").strip()}
}
return None
queries = [
"What is the refund policy?",
"How do I get a certificate?",
"calculate 12 * (3 + 2)"
]
for q in queries:
print("User question:", q)
call = mock_model(q)
print("Model output:", call)
valid, msg = validate_tool_call(call)
print("Validation result:", valid, msg)
if valid:
result = dispatch(call)
print("Tool execution result:", result)
else:
print("Call rejected")
print("-" * 50)

Expected output:

Terminal window
User question: What is the refund policy?
Model output: {'name': 'search_course_policy', 'arguments': {'keyword': 'refund'}}
Validation result: True ok
Tool execution result: You can apply for a refund within 7 days after purchase and if your learning progress is below 20%.
--------------------------------------------------
User question: How do I get a certificate?
Model output: {'name': 'search_course_policy', 'arguments': {'keyword': 'certificate'}}
Validation result: True ok
Tool execution result: You can obtain a certificate after completing all required items and passing the final assessment.
--------------------------------------------------
User question: calculate 12 * (3 + 2)
Model output: {'name': 'calculate', 'arguments': {'expression': '12 * (3 + 2)'}}
Validation result: True ok
Tool execution result: 60
--------------------------------------------------

Function Calling closed-loop run result map

This example is already much closer to a real system than simply printing tool_call.


Where is the real challenge in multi-step calls?

Section titled “Where is the real challenge in multi-step calls?”

The challenge is not calling once more, but managing state

Section titled “The challenge is not calling once more, but managing state”

For example, a user asks:

“First check the refund policy, then check the certificate policy.”

At this point, the system may need to:

  1. Call search_course_policy
  2. Then call search_course_policy again with another keyword
  3. Finally merge the answer

The problem is:

  • How do you store intermediate results?
  • When should the next step stop?
  • How do you handle errors?
def multi_step_agent(query):
steps = []
if "refund" in query:
call_1 = {"name": "search_course_policy", "arguments": {"keyword": "refund"}}
steps.append(("tool_call", call_1))
result_1 = dispatch(call_1)
steps.append(("tool_result", result_1))
if "certificate" in query:
call_2 = {"name": "search_course_policy", "arguments": {"keyword": "certificate"}}
steps.append(("tool_call", call_2))
result_2 = dispatch(call_2)
steps.append(("tool_result", result_2))
return steps
for step in multi_step_agent("First check the refund policy, then check the certificate policy"):
print(step)

Expected output:

Terminal window
('tool_call', {'name': 'search_course_policy', 'arguments': {'keyword': 'refund'}})
('tool_result', 'You can apply for a refund within 7 days after purchase and if your learning progress is below 20%.')
('tool_call', {'name': 'search_course_policy', 'arguments': {'keyword': 'certificate'}})
('tool_result', 'You can obtain a certificate after completing all required items and passing the final assessment.')

This is why, once Function Calling goes deeper, it will eventually be combined with Agents.


Why are failure handling and recovery important?

Section titled “Why are failure handling and recovery important?”

Tool failure is the norm, not the exception

Section titled “Tool failure is the norm, not the exception”

In real systems, tool failures are very common:

  • Wrong parameters
  • API timeouts
  • Network issues
  • Empty data
def safe_dispatch(call):
try:
valid, msg = validate_tool_call(call)
if not valid:
return {"error": msg}
return {"result": dispatch(call)}
except Exception as e:
return {"error": str(e)}
print(safe_dispatch({"name": "calculate", "arguments": {"expression": "2 + 3"}}))
print(safe_dispatch({"name": "calculate", "arguments": {"wrong": "2 + 3"}}))

Expected output:

Terminal window
{'result': '5'}
{'error': 'invalid_calculate_arguments'}

A mature system usually does not crash just because one tool call fails.


What should you really pay attention to in the deeper end of Function Calling?

Section titled “What should you really pay attention to in the deeper end of Function Calling?”

It is not about “can it call,” but “can it call reliably”

Section titled “It is not about “can it call,” but “can it call reliably””

The truly important questions include:

  • Is the schema clear enough?
  • Is parameter validation strict enough?
  • Are tool permissions layered properly?
  • How do multi-step calls converge?
  • Can errors be replayed and diagnosed?

The tool layer is the reliability foundation of Agent engineering

Section titled “The tool layer is the reliability foundation of Agent engineering”

If the tool layer is unstable, everything above it will wobble:

  • Reasoning chains
  • Multi-step execution
  • Memory systems
  • Multi-Agent collaboration

So although Function Calling looks like “structured output,” in essence it is a key piece of infrastructure in Agent engineering.


A schema is not decorative documentation; it is the call boundary itself.

This is very dangerous.

Once something is called incorrectly, a parameter is wrong, or execution blows up, you will not know where the problem came from.


When designing tools for real, you can first review the schema with the table below. It helps reduce problems such as “the model chose the wrong tool, the parameters were wrong, or the program executed something dangerous.”

Check itemWhat good looks likeCommon problems
Tool nameClear verb + object, such as search_course_policyNames like search or process are too vague
DescriptionClearly states when to use it and when not toOnly says “query information”
ParametersEvery field has semantics, type, and constraintsq, data, input are too vague
Required fieldsrequired is explicitThe model omits critical parameters
Return valueThe program can distinguish success, failure, and empty resultsOnly returns a string, making status hard to judge
PermissionsDistinguish read-only, write, and high-risk toolsAll tool permissions are mixed together

The clearer the tool schema, the more likely the model is to make the right choice; the stricter the program validation, the less likely the system is to be dragged down by bad parameters.

Many people only design input parameters and ignore return values. But what the Agent does next depends heavily on whether it can understand the tool result.

A more robust return structure can look like this:

def tool_result(ok, data=None, error=None, retryable=False):
return {
"ok": ok,
"data": data,
"error": error,
"retryable": retryable,
}
print(tool_result(True, data={"text": "You can apply for a refund within 7 days after purchase"}))
print(tool_result(False, error="timeout", retryable=True))

Expected output:

Terminal window
{'ok': True, 'data': {'text': 'You can apply for a refund within 7 days after purchase'}, 'error': None, 'retryable': False}
{'ok': False, 'data': None, 'error': 'timeout', 'retryable': True}

This structure is more suitable for Agents than simply returning a string, because the system can decide whether to continue, retry, switch tools, or stop and explain to the user based on ok, error, and retryable.

Safety boundaries for multi-step tool calls

Section titled “Safety boundaries for multi-step tool calls”

The easiest problem to run into with multi-step calls is not “failing to continue,” but “not being able to stop” or “doing something it should not do.” So at minimum, you need these boundaries:

BoundaryPurpose
Max stepsPrevent infinite loops
Tool allowlistPrevent unauthorized capabilities from being called
Parameter validationPrevent bad or dangerous parameters from reaching the execution layer
Human confirmationHigh-risk write operations must be confirmed first
Error classificationDistinguish retryable from non-retryable errors
Trace loggingEnable replay of every step after an error

You can remember it with one sentence:

The model may suggest actions, but the program must control the boundaries.

This is also the key dividing line between Function Calling demos and real Agent engineering.


Keep this page’s proof of learning as a small evidence card:

Tool Contract
name, description, input schema, output schema
Permission
what the tool is allowed to read or change
Call Trace
arguments, result, error, retry or fallback
Failure Check
wrong tool, bad arguments, unsafe action, or missing observation
Safety Action
validate, confirm, sandbox, rate-limit, or rollback

The most important thing in this section is not learning to write {"name": ..., "arguments": ...}, but understanding that:

The real value of Function Calling lies in safely connecting the model’s decision-making ability to the execution ability of an engineering system.

Once you start paying attention to schema design, parameter validation, failure recovery, and multi-step state, you are truly entering tool-layer engineering.


  1. Add another tool get_weather(city) to the tool system in this section, and include the corresponding schema and validation.
  2. Deliberately construct a tool call with incorrect parameters and see whether the validator can block it.
  3. Extend multi_step_agent() so that it executes at most 3 steps to avoid infinite loops.
  4. Think about this: why is Function Calling more critical in an Agent system than in a normal chatbot?
Reference implementation and walkthrough
  1. Define get_weather(city) with a required string city, a predictable return shape such as {ok, data, error}, and validation before execution.
  2. The validator should block missing city, non-string values, unknown parameters, and malformed JSON before the tool runner sees the call.
  3. A 3-step limit should stop the loop with a clear trace such as stopped: max_steps_reached, not silently continue.
  4. Function Calling matters more in Agents because tool calls can change external state, spend money, leak data, or start loops. Schema and validation are the safety boundary between text and action.