
9 AI Agent and Intelligent Agent Systems

[Figure: Main visual of the AI Agent system]

Chapter 8 made the model answer from documents. Chapter 9 makes the system act toward a goal: plan a next step, call a tool, read the observation, adjust, stop safely, and leave a trace that people can review.

Do not start with multi-agent frameworks. Start with one small Agent that can show every step.

You have already built an LLM response loop and a RAG evidence loop. This chapter adds controlled action: the system decides the next step, calls allowed tools, reads observations, updates state, and stops with a replayable trace.

This is the final core application layer in the main route. After this chapter, Chapters 10-12 become product specializations, and Chapter 13 adds open-source model runtime ownership. Vision, NLP, multimodal workflows, and self-hosted LLMs can all plug into the same evidence, tool, trace, and safety habits.

[Figure: Agent execution loop]

An Agent is not “a chatbot with tools.” It is a controlled execution loop.

| Part | Plain meaning | What you must control |
| --- | --- | --- |
| Goal | What the Agent is trying to finish | scope, success criteria, stop condition |
| State | What is known right now | current inputs, previous observations, remaining steps |
| Plan | What to try next | step limit, fallback path, human takeover |
| Tool | External action such as search, file read, API call, code run | schema, validation, whitelist, risk level |
| Observation | What the tool returned | error handling, retry rule, trust boundary |
| Memory | What should persist across steps or runs | short-term state versus long-term preference |
| Trace | The replayable record of the run | goal, action, arguments, observation, cost, final result |
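The parts in the table can be wired together in a few lines. The sketch below is one assumed arrangement, not a required design: `AgentState`, `lookup`, `next_action`, and `run` are hypothetical names, and the planner is fixed rather than model-driven. It shows a goal, short-term state, a step limit, and an explicit stop reason.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    """Short-term state for a single run (hypothetical structure)."""
    goal: str
    max_steps: int = 3
    observations: list = field(default_factory=list)
    trace: list = field(default_factory=list)
    stop_reason: Optional[str] = None


def lookup(query: str) -> str:
    """Stand-in tool; a real tool would call a search index or an API."""
    return f"notes about {query}"


def next_action(state: AgentState) -> Optional[dict]:
    """Fixed planner for illustration: look something up once, then finish."""
    if not state.observations:
        return {"action": "lookup", "input": state.goal}
    return None  # nothing left to do, so the goal counts as reached


def run(state: AgentState) -> AgentState:
    """Plan, act, observe, and record until the goal is met or steps run out."""
    for step in range(1, state.max_steps + 1):
        action = next_action(state)
        if action is None:
            state.stop_reason = "goal_reached"
            return state
        observation = lookup(action["input"])
        state.observations.append(observation)
        state.trace.append({"step": step, **action, "observation": observation})
    state.stop_reason = "max_steps"
    return state
```

Calling `run(AgentState(goal="RAG evaluation"))` stops with `goal_reached` after one traced step. Setting `max_steps=1` makes the same run stop with `max_steps`, which is the step limit doing its job.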

Build a single traceable Agent before multi-agent systems. Follow the core single-Agent path first: 9.1 -> 9.2 -> 9.3 -> 9.4 -> 9.8 -> 9.10. Treat MCP, frameworks, multi-agent systems, and deployment operations as advanced chapters after the single-Agent loop is stable.

| Step | Read | Do | Evidence to keep |
| --- | --- | --- | --- |
| 9.1 | Agent basics and architecture | Explain goal, state, plan, tool, observation, memory | one architecture sketch |
| 9.2 | Reasoning and planning | Compare ReAct and Plan-and-Execute on one task | a step-by-step trace |
| 9.3 | Tool calling | Define one or two tools with parameters and errors | tools_schema.md |
| 9.4 | Memory | Separate current state from long-term memory | memory boundary notes |
| 9.8 | Evaluation and safety | Score outputs, block risky actions, and inspect traces | trace logs, safety block, eval cases |
| 9.10 | Stage project | Run 9.10.5 Hands-on: Build a Traceable Single-Agent Assistant | agent_traces.jsonl, safety boundary, eval cases |
| 9.5 | MCP and protocols | Understand MCP plus agent handoff contracts as ways to connect tools, data sources, and peer agents | one capability card and integration note |
| 9.6-9.7 | Frameworks and multi-agent | Study only after the single-Agent loop is stable | framework choice note |
| 9.9 | Deployment and operations | Add runtime, recovery, cost, and production readiness after the core project works | launch checklist and rollback note |
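Step 9.3 asks for one or two tools with parameters and errors. As a rough illustration of what that can mean in code, the sketch below checks a tool call against a small schema before anything runs. The schema layout and the `validate_call` helper are assumptions made for this example; the chapter does not prescribe a format for tools_schema.md.

```python
# Hypothetical in-code schema: tool name -> risk label and required parameters.
TOOL_SCHEMAS = {
    "search_docs": {"risk": "read_only", "required": {"query": str}},
    "make_todo": {"risk": "draft_only", "required": {"topic": str}},
}


def validate_call(action: str, tool_input: dict) -> list:
    """Return a list of problems; an empty list means the call may run."""
    schema = TOOL_SCHEMAS.get(action)
    if schema is None:
        return [f"tool '{action}' is not whitelisted"]

    errors = []
    for name, expected_type in schema["required"].items():
        if name not in tool_input:
            errors.append(f"missing parameter '{name}'")
        elif not isinstance(tool_input[name], expected_type):
            errors.append(f"parameter '{name}' must be {expected_type.__name__}")
    for name in tool_input:
        if name not in schema["required"]:
            errors.append(f"unexpected parameter '{name}'")
    return errors
```

Returning readable messages instead of raising lets the loop store the validation result as an observation, so a failed call still shows up in the trace.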

Required core

Study the single-Agent loop, tool schema, whitelist, max steps, state boundary, memory boundary, trace log, safety block, and evaluation cases. These are the minimum skills for an Agent that can be reviewed instead of merely demoed.
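The state boundary and memory boundary are easier to hold when they are separate objects. The sketch below is one possible shape, with hypothetical names (`RunState`, `LongTermMemory`, and the allowed keys are all invented for illustration): run state is thrown away after each run, while long-term memory accepts only a short list of approved keys.

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    """Short-term state: lives for one run and is then discarded."""
    goal: str
    observations: list = field(default_factory=list)


@dataclass
class LongTermMemory:
    """Persists across runs; only explicitly allowed keys may be written."""
    allowed_keys: frozenset = frozenset({"preferred_language", "report_format"})
    preferences: dict = field(default_factory=dict)

    def remember(self, key: str, value: str) -> bool:
        """Store a preference, or refuse if the key is outside the boundary."""
        if key not in self.allowed_keys:
            return False
        self.preferences[key] = value
        return True
```

With this split, `remember("preferred_language", "en")` succeeds, while `remember("last_observation", "...")` is refused because observations belong to the run, not to long-term memory.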

Optional extension

Return to MCP, framework comparison, multi-agent coordination, deployment operations, and cost optimization after the single-Agent loop is stable and the product needs integration or scale.

Depth challenge

Compare the same task as a workflow, RAG flow, function call, and Agent trace. Then justify the simplest safe design so Agent use stays intentional rather than fashionable.

Fast pass: finish one traceable Agent loop with a blocked unsafe tool. Keep agent_traces.jsonl plus a short trace explanation.

Standard pass: complete the core path 9.1 -> 9.2 -> 9.3 -> 9.4 -> 9.8 -> 9.10. Keep tools_schema.md, safety_boundary.md, evaluation cases, and one failure trace.

Deep pass: add MCP, framework comparison, deployment readiness, or multi-agent coordination only after the single-Agent loop works. Keep a design memo explaining why an Agent is safer or more useful than a workflow, RAG, or function calling for your task.
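Every pass above keeps a trace file. As a minimal sketch of how agent_traces.jsonl could be written and read back, assuming one JSON object per line and the hypothetical helpers `append_trace` and `replay_trace`:

```python
import json
from pathlib import Path


def append_trace(path: Path, record: dict) -> None:
    """Append one step as a single JSON line so partial runs are still kept."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def replay_trace(path: Path) -> list:
    """Read the file back and summarize each step for a reviewer."""
    summary = []
    with path.open(encoding="utf-8") as handle:
        for raw in handle:
            if not raw.strip():
                continue  # tolerate blank lines
            record = json.loads(raw)
            summary.append(
                f"step {record['step']}: {record['action']} -> {record['observation']}"
            )
    return summary
```

Appending per step, rather than writing the whole list at the end, means a run that crashes halfway still leaves the steps it completed on disk.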

A strong Chapter 9 output proves control. It should show what the Agent was allowed to do, what it was not allowed to do, how it stopped, and how a reviewer can replay the trace.

The offline script below has no LLM dependency. It teaches one engineering habit: every action must be replayable. Later, replace the fixed plan with a model-generated plan, but keep the trace format.

Create ch09_agent_trace.py and run it with Python 3.10 or later.

import json


def search_docs(tool_input: dict) -> str:
    # Stand-in for a real search tool; returns a fixed observation.
    return "Found notes about RAGOps, AgentOps, evaluation sets, and trace logs."


def make_todo(tool_input: dict) -> str:
    # Builds a short checklist from the requested topic.
    topic = tool_input["topic"]
    return f"1) Review {topic} notes; 2) add one eval case; 3) write failure notes."


# Whitelist: only tools listed here may run, each with a risk label.
TOOLS = {
    "search_docs": {"fn": search_docs, "risk": "read_only"},
    "make_todo": {"fn": make_todo, "risk": "draft_only"},
}

goal = "Prepare a short RAG review plan."

# Fixed plan for now; a model-generated plan can replace it later.
plan = [
    {
        "thought": "Find relevant course materials before making a plan.",
        "action": "search_docs",
        "input": {"query": "RAGOps AgentOps evaluation trace"},
    },
    {
        "thought": "Turn the materials into a small review checklist.",
        "action": "make_todo",
        "input": {"topic": "RAG evaluation"},
    },
]

trace = []
for step_number, step in enumerate(plan, start=1):
    tool = TOOLS.get(step["action"])
    if tool is None:
        # Unknown tool names are blocked, but the attempt is still traced.
        observation = "Blocked: tool is not whitelisted."
        risk = "blocked"
    else:
        observation = tool["fn"](step["input"])
        risk = tool["risk"]
    trace.append(
        {
            "step": step_number,
            "goal": goal,
            "thought": step["thought"],
            "action": step["action"],
            "input": step["input"],
            "risk": risk,
            "observation": observation,
        }
    )

# Print one JSON object per line (JSONL).
for item in trace:
    print(json.dumps(item, ensure_ascii=False))

Expected output starts like this:

{"step": 1, "goal": "Prepare a short RAG review plan.", "thought": "Find relevant course materials before making a plan.", "action": "search_docs", ...
{"step": 2, "goal": "Prepare a short RAG review plan.", "thought": "Turn the materials into a small review checklist.", "action": "make_todo", ...

Operation tip: in the plan, change the second step's action from make_todo to a non-whitelisted tool name such as send_email. The script should block it and still record the blocked step in the trace. This is the smallest version of a safety boundary.
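The tip above can also be tried in isolation. This reduced sketch keeps the same whitelist lookup as the script but wraps one step in a hypothetical `run_step` helper, then asks for send_email, which is not in `TOOLS`. The email address is a placeholder.

```python
def search_docs(tool_input: dict) -> str:
    return "Found notes."


# Only search_docs is whitelisted in this reduced example.
TOOLS = {"search_docs": {"fn": search_docs, "risk": "read_only"}}


def run_step(step: dict) -> dict:
    """Run one step if its tool is whitelisted; otherwise record a block."""
    tool = TOOLS.get(step["action"])
    if tool is None:
        return {**step, "risk": "blocked", "observation": "Blocked: tool is not whitelisted."}
    return {**step, "risk": tool["risk"], "observation": tool["fn"](step["input"])}


blocked = run_step({"action": "send_email", "input": {"to": "team@example.com"}})
print(blocked["risk"], "|", blocked["observation"])
# prints: blocked | Blocked: tool is not whitelisted.
```

The blocked step still comes back as a full record, so it can be appended to the trace like any other step.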

| Level | What you can prove |
| --- | --- |
| Minimum pass | You can run one trace and explain each goal, action, input, observation, and result. |
| Project-ready | You can define tool schemas, block non-whitelisted tools, set max steps, and save failed traces. |
| Deeper check | You can decide when a workflow is safer than an Agent, and where human approval belongs for risky actions. |

Choose Agent, Workflow, RAG, Or Function Calling


[Figure: Agent boundary map]

Agents are powerful, but they are not the default solution.

| Problem | Start with | Use an Agent when |
| --- | --- | --- |
| Steps are fixed and known | Workflow | the route must change after each observation |
| Answer needs private or fresh knowledge | RAG | retrieval is only one step inside a larger goal |
| One structured action is enough | Function Calling | multiple tool calls and state updates are required |
| Task is high risk | Workflow with human approval | the Agent can draft, but humans must confirm risky actions |
| Exploration needs planning, tools, memory, and recovery | Agent | you can log every step and stop safely |

Common mistakes to avoid:

  • Building multi-agent systems before a single Agent is stable.
  • Calling tools without a schema, validation, or useful error messages.
  • Missing stop conditions, which causes loops and cost spikes.
  • Letting high-risk tools run without human confirmation.
  • Showing only a successful demo while hiding failed traces.
  • Using memory as a dumping ground instead of separating current state, long-term preference, and task history.
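One of the mistakes above, running high-risk tools without human confirmation, can be guarded against with a small gate in front of the tool call. The sketch below is an assumption-heavy illustration: the risk labels in `RISK_NEEDS_APPROVAL`, the `guarded_call` helper, and the `approve` callback are all hypothetical, and a real approval step would involve a person rather than a function.

```python
from typing import Callable

# Hypothetical risk labels that must be confirmed by a human first.
RISK_NEEDS_APPROVAL = {"write", "send", "delete"}


def guarded_call(
    tool_name: str,
    risk: str,
    fn: Callable[[dict], str],
    tool_input: dict,
    approve: Callable[[str, dict], bool],
) -> dict:
    """Run low-risk tools directly; ask a reviewer before anything risky."""
    if risk in RISK_NEEDS_APPROVAL and not approve(tool_name, tool_input):
        return {
            "action": tool_name,
            "status": "rejected",
            "observation": "Human reviewer declined.",
        }
    return {"action": tool_name, "status": "executed", "observation": fn(tool_input)}
```

In this arrangement the Agent can still draft the risky call, but the tool function is never invoked unless `approve` returns True, and the rejection itself becomes a traceable record.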

Keep this page’s proof of learning as a small evidence card:

Core Route: 9.1 -> 9.2 -> 9.3 -> 9.4 -> 9.8 -> 9.10 first
Agent Loop: goal -> plan -> tool/action -> observation -> memory -> evaluation
Trace Rule: every action should leave input, output, decision, and error record
Protocol Rule: tool discovery, authorization, and agent handoff should have a capability card
Safety Rule: permissions, tool boundaries, sandbox guardrails, and rollback are part of design
Depth Split: MCP/frameworks/multi-agent/deployment after single-Agent loop is stable
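The Trace Rule on the card asks for input, output, decision, and error on every action. A minimal sketch of a wrapper that fills those four fields, assuming a hypothetical `traced_call` helper and a tool that takes a single dict:

```python
from typing import Callable


def traced_call(action: str, fn: Callable[[dict], str], tool_input: dict, decision: str) -> dict:
    """Run one tool call and always return a complete trace record."""
    record = {
        "action": action,
        "input": tool_input,
        "decision": decision,
        "output": None,
        "error": None,
    }
    try:
        record["output"] = fn(tool_input)
    except Exception as exc:
        # Record the failure instead of crashing, so the failed trace survives.
        record["error"] = f"{type(exc).__name__}: {exc}"
    return record
```

A tool that reads `tool_input["topic"]` from an empty input would otherwise stop the run with a KeyError; wrapped this way, the record carries the error text and the loop can decide whether to retry, fall back, or stop.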

Before leaving this chapter, you should be able to:

  • explain goal, state, plan, tool, observation, memory, trace, and guardrail;
  • run the trace script and block a non-whitelisted tool;
  • save agent_traces.jsonl, tools_schema.md, protocol_card.md, safety_boundary.md, and failure_cases.md;
  • judge whether a task needs workflow, RAG, function calling, or an Agent;
  • run the full Chapter 9 workshop and add one evaluation task plus one safety-block example.

For a printable checklist, use 9.0 Learning Checklist. For the guided project, start with 9.10.5 Hands-on: Build a Traceable Single-Agent Assistant.

Check reasoning and explanation
  1. A passing answer describes the agent loop: goal, plan, tool call, observation, memory or state update, and stop condition.
  2. The evidence should include a trace that another developer can inspect, not only the final answer.
  3. A good self-check names one safety or reliability control such as tool schemas, permission boundaries, retries, evaluation cases, or a human-review point.