9.2.4 ReAct Framework

[Figure: ReAct reasoning-action-observation loop diagram]

  • Understand the core ReAct loop: Thought -> Action -> Observation
  • Understand how it differs from pure CoT
  • Learn a minimal, runnable ReAct agent loop through an example
  • Understand what kinds of problems ReAct is best for, and when it becomes cumbersome

Because many answers are not in the model’s head

For example:

  • What is the weather in Beijing today?
  • What is the current status of a certain order?
  • What is the exact sum of these two numbers?

These questions depend on:

  • Real-time external information
  • Precise tool capabilities

If the model relies only on its own “guesses,” the result can be:

  • Hallucinations
  • Overconfidence
  • Calculation errors

The essence of ReAct: think while getting new information

Its typical loop is:

  1. Thought: What information am I missing now?
  2. Action: Which tool should I call?
  3. Observation: What did the tool return?
  4. Enter the next round of thinking

This lets the Agent do more than “make up an answer in its head”: each round grounds its reasoning in what the real environment actually returned.
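
The loop above can be sketched generically as follows. This is a minimal illustration rather than a definitive implementation: `react_loop`, `toy_decide`, and the `get_weather` tool with its canned reply are hypothetical names introduced here, and a scripted function stands in for the model.

```python
from typing import Callable, Optional


def react_loop(decide: Callable[[list], dict], tools: dict, max_steps: int = 5) -> Optional[str]:
    """Generic Thought -> Action -> Observation loop (illustrative skeleton)."""
    history = []  # (thought, action, observation) tuples seen so far
    for _ in range(max_steps):
        step = decide(history)          # Thought: what is missing, what to do next
        if step["action"] is None:      # nothing left to look up
            return step["answer"]
        observation = tools[step["action"]](**step["args"])  # Action -> Observation
        history.append((step["thought"], step["action"], observation))
    return None  # step budget exhausted without an answer


# Hypothetical stand-ins for a model and a weather tool.
def toy_decide(history):
    if not history:
        return {
            "thought": "I do not know today's weather.",
            "action": "get_weather",
            "args": {"city": "Beijing"},
        }
    last_observation = history[-1][2]
    return {"thought": "I have the observation.", "action": None,
            "answer": f"Beijing: {last_observation}"}


toy_tools = {"get_weather": lambda city: "sunny, 25 C"}  # canned reply, not real data
result = react_loop(toy_decide, toy_tools)
print(result)
```

The only state carried between rounds is `history`, which is what lets each new Thought build on earlier Observations.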

An analogy: like doing an investigation, not writing in isolation

Pure CoT is more like solving a problem on scratch paper. ReAct is more like doing an investigation:

  • First think about what to check
  • Gather evidence
  • Then continue judging based on the evidence

The fundamental difference between ReAct and CoT

For pure CoT, the core questions are:

  • How to break down the steps
  • How to maintain intermediate state

ReAct focuses on “reasoning + external interaction”

ReAct adds one more question:

  • When should it ask the outside world for information?

So ReAct is more like:

  • CoT + Tool Loop

Why is this especially important for Agents?

Because Agents do more than static Q&A. They often need to:

  • Query a knowledge base
  • Call a database
  • Perform calculations
  • Execute commands

All of these require the system to continuously connect with the external world during reasoning.


First run a real minimal ReAct closed loop

The following example simulates a small e-commerce assistant. The user asks:

  • What is the refund policy?
  • For an order amount of 299 + 15, how much will be refunded in the end?

The Agent needs to:

  1. Check the refund policy first
  2. Then call the calculator
  3. Finally combine the information into an answer

import ast
import operator

# Arithmetic operators the calculator is allowed to evaluate.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_calculate(expression):
    """Evaluate a plain arithmetic expression by walking its AST (no eval)."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -visit(node.operand)
        raise ValueError("unsupported_expression")

    return visit(ast.parse(expression, mode="eval"))


# Tool 1: look up a policy by topic.
def search_policy(topic):
    policies = {
        "refund": "Unshipped orders can be refunded directly. The amount will be returned to the original payment method, usually within 3 to 7 business days.",
    }
    return policies.get(topic, "No related policy found.")


# Tool 2: exact arithmetic.
def calculator(expression):
    return str(safe_calculate(expression))


def policy(state):
    """Scripted stand-in for the model: choose the next step from the trace so far."""
    trace = state["trace"]
    question = state["question"]  # unused by this scripted policy; kept to show the state shape

    if not any(item["action"] == "search_policy" for item in trace):
        return {
            "thought": "I need to confirm the refund policy first before answering the policy part.",
            "action": "search_policy",
            "args": {"topic": "refund"},
        }

    if not any(item["action"] == "calculator" for item in trace):
        return {
            "thought": "Now that I know the policy, I should calculate the refund amount 299 + 15.",
            "action": "calculator",
            "args": {"expression": "299 + 15"},
        }

    # Both observations are available: compose the final answer.
    policy_text = next(item["observation"] for item in trace if item["action"] == "search_policy")
    amount = next(item["observation"] for item in trace if item["action"] == "calculator")
    return {
        "thought": "I have enough information now, so I can provide the final answer.",
        "action": None,
        "answer": f"{policy_text} The estimated refund amount for this order is {amount} yuan.",
    }


TOOLS = {
    "search_policy": search_policy,
    "calculator": calculator,
}


def run_react(question, max_steps=5):
    state = {"question": question, "trace": []}
    for _ in range(max_steps):
        decision = policy(state)
        if decision["action"] is None:
            return state["trace"], decision["answer"]

        tool_name = decision["action"]
        observation = TOOLS[tool_name](**decision["args"])
        # Record the full step so the next round can build on it.
        state["trace"].append(
            {
                "thought": decision["thought"],
                "action": tool_name,
                "args": decision["args"],
                "observation": observation,
            }
        )
    return state["trace"], "Maximum steps reached, task not completed."


trace, answer = run_react("What is the refund policy? For an order amount of 299 + 15, how much will be refunded in the end?")

print("trace:")
for item in trace:
    print(item)

print("\nfinal answer:")
print(answer)

Expected output:

trace:
{'thought': 'I need to confirm the refund policy first before answering the policy part.', 'action': 'search_policy', 'args': {'topic': 'refund'}, 'observation': 'Unshipped orders can be refunded directly. The amount will be returned to the original payment method, usually within 3 to 7 business days.'}
{'thought': 'Now that I know the policy, I should calculate the refund amount 299 + 15.', 'action': 'calculator', 'args': {'expression': '299 + 15'}, 'observation': '314'}

final answer:
Unshipped orders can be refunded directly. The amount will be returned to the original payment method, usually within 3 to 7 business days. The estimated refund amount for this order is 314 yuan.

[Figure: ReAct refund tool trace result map]

We recommend reading the code in this order:

  1. Start with policy: understand how the agent decides the “next step” each round
  2. Then look at TOOLS: understand where external capabilities come from
  3. Finally look at run_react: understand how the full loop gradually accumulates the trace

The trace matters because ReAct does not answer in one shot; it progresses step by step.

With a trace, you can see:

  • What it thought
  • What it called
  • What it saw
  • Why it gave that final answer

This is crucial for debugging.
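
As a small debugging aid, a trace can be rendered step by step. The sketch below assumes each trace item is a dict with `thought`, `action`, `args`, and `observation` keys, as in the example on this page; `format_trace` is a hypothetical helper name.

```python
def format_trace(trace):
    """Render a ReAct trace as numbered, human-readable steps."""
    lines = []
    for i, item in enumerate(trace, start=1):
        lines.append(f"Step {i}")
        lines.append(f"  thought:     {item['thought']}")
        lines.append(f"  action:      {item['action']}({item['args']})")
        lines.append(f"  observation: {item['observation']}")
    return "\n".join(lines)


sample_trace = [
    {
        "thought": "I should calculate the refund amount.",
        "action": "calculator",
        "args": {"expression": "299 + 15"},
        "observation": "314",
    },
]
report = format_trace(sample_trace)
print(report)
```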

Why is ReAct often stronger than “calling a tool directly once”?

Because real problems are often not solved in a single step. The order of tool calls may depend on the result of the previous step.

In the example above:

  • First confirm the policy
  • Then calculate the amount
  • Then compose the answer

This is exactly the kind of structure ReAct is best at.


Tasks that require multiple rounds of observation

For example:

  • Search first, then calculate
  • Check first, then compare
  • Inspect the status first, then decide the next step

If every task is strictly:

  1. Check A
  2. Check B
  3. Output

Then a normal workflow may be enough.

ReAct is more suitable when:

  • The result of the current step affects the next choice
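
A minimal sketch of “inspect the status first, then decide the next step.” Everything here is hypothetical: the `check_status` and `fetch_logs` tools, the service names, and their canned return values only stand in for real systems.

```python
# Hypothetical tools; the canned values stand in for real monitoring systems.
def check_status(service):
    return {"api": "degraded", "db": "ok"}.get(service, "unknown")


def fetch_logs(service):
    return f"{service}: 37 timeouts in the last 5 minutes"


TOOLS = {"check_status": check_status, "fetch_logs": fetch_logs}


def next_step(service, trace):
    """Pick the next action from what has been observed so far."""
    if not trace:
        return ("check_status", {"service": service})
    last_action, last_observation = trace[-1]
    if last_action == "check_status" and last_observation == "degraded":
        return ("fetch_logs", {"service": service})  # only needed when unhealthy
    return None  # healthy, unknown, or logs already fetched: stop


def investigate(service, max_steps=5):
    trace = []
    for _ in range(max_steps):
        step = next_step(service, trace)
        if step is None:
            break
        action, args = step
        trace.append((action, TOOLS[action](**args)))
    return trace


api_trace = investigate("api")
db_trace = investigate("db")
print(api_trace)  # two steps: status check, then logs
print(db_trace)   # one step: status check only
```

The two runs call different tools even though the code path is the same, which is the property a fixed “Check A, Check B, Output” workflow does not have.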

ReAct also naturally records three things at every step:

  • thought
  • action
  • observation

This makes it a good fit for:

  • Debugging
  • Replay
  • Error analysis
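
Replay can be sketched as re-running each recorded action and comparing the fresh result with the stored observation. This assumes the tools are deterministic and free of side effects, which is often not true of real tools; `replay` and the `add` tool are hypothetical names.

```python
def replay(trace, tools):
    """Re-run each recorded action and report steps whose observation changed."""
    mismatches = []
    for index, item in enumerate(trace):
        fresh = tools[item["action"]](**item["args"])
        if fresh != item["observation"]:
            mismatches.append((index, item["observation"], fresh))
    return mismatches


def add(a, b):
    return a + b


recorded = [
    {"action": "add", "args": {"a": 299, "b": 15}, "observation": 314},
    {"action": "add", "args": {"a": 1, "b": 1}, "observation": 3},  # deliberately wrong record
]
diffs = replay(recorded, {"add": add})
print(diffs)  # (step index, recorded observation, fresh observation)
```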

What are the most common problems with ReAct?

If the agent keeps:

  • Thinking
  • Acting
  • Thinking again
  • Acting again

Then it can become:

  • Slow
  • Expensive
  • Prone to drifting off track

ReAct does not guarantee the right tool is chosen each round. It may:

  • Query the wrong knowledge source
  • Call the same tool repeatedly
  • Call a tool that is actually unnecessary
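
One way to catch repeated calls is to check the trace before executing an action. The sketch below assumes trace items carry `action` and `args` keys and that arguments are JSON-serializable; `is_repeat` is a hypothetical helper.

```python
import json


def is_repeat(trace, action, args):
    """True if the same tool was already called with the same arguments."""
    key = (action, json.dumps(args, sort_keys=True))
    seen = {
        (item["action"], json.dumps(item["args"], sort_keys=True))
        for item in trace
    }
    return key in seen


trace = [
    {"action": "search_policy", "args": {"topic": "refund"}, "observation": "..."},
]
print(is_repeat(trace, "search_policy", {"topic": "refund"}))
print(is_repeat(trace, "search_policy", {"topic": "shipping"}))
```

A loop could use such a check to skip the call and feed the earlier observation back, or to stop early. Which response is right depends on whether the tool's result can change over time.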

Even if the tool returns the correct information, the agent may:

  • Ignore key fields
  • Misread the result
  • Combine the information incorrectly in the end

This shows that the difficulty of ReAct is not only “whether there is a tool,” but also “whether the tool output can be understood.”


How can we make ReAct more stable in practice?

The clearer the tool description is, the less likely the agent is to call tools incorrectly.
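
A minimal sketch of what “clear tool descriptions” could look like in code, assuming a simple dict-based registry. The `TOOL_SPECS` layout, `validate_call`, and `describe_tools` are illustrative names, not the format of any particular framework.

```python
TOOL_SPECS = {
    "search_policy": {
        "description": "Look up a store policy by topic, e.g. 'refund'. Returns policy text.",
        "required_args": {"topic"},
    },
    "calculator": {
        "description": "Evaluate an arithmetic expression such as '299 + 15'. Returns a number as text.",
        "required_args": {"expression"},
    },
}


def validate_call(action, args):
    """Return an error string for a malformed call, or None if it looks valid."""
    spec = TOOL_SPECS.get(action)
    if spec is None:
        return f"unknown tool: {action}"
    missing = spec["required_args"] - set(args)
    if missing:
        return f"missing args for {action}: {sorted(missing)}"
    return None


def describe_tools():
    """Text that could be placed in a prompt so the model sees each tool's purpose."""
    return "\n".join(
        f"- {name}: {spec['description']}" for name, spec in TOOL_SPECS.items()
    )


print(describe_tools())
print(validate_call("calculator", {}))
```

Returning the validation error as an observation, instead of raising, gives the agent a chance to correct the call in the next round.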

One of the simplest ways to avoid useless loops is to:

  • Set max_steps explicitly

If the tool returns a messy block of natural language, the agent is more likely to misread it.

A more stable approach is usually to:

  • Return structured fields

For example:

  • {"refund_days": "3-7", "channel": "original_payment"}
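
With structured fields, the agent reads named values instead of parsing prose. The sketch below reuses the field names from the example above; `search_policy_structured` and `describe_refund` are hypothetical names.

```python
def search_policy_structured(topic):
    # Hypothetical structured variant of a policy lookup tool.
    policies = {
        "refund": {"refund_days": "3-7", "channel": "original_payment"},
    }
    return policies.get(topic)  # None when the topic is unknown


def describe_refund(observation):
    """Build a sentence from named fields instead of parsing free text."""
    if observation is None:
        return "No related policy found."
    low, high = observation["refund_days"].split("-")
    channel = observation["channel"].replace("_", " ")
    return f"Refunds go to the {channel} method within {low} to {high} business days."


print(describe_refund(search_policy_structured("refund")))
print(describe_refund(search_policy_structured("shipping")))
```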

Misconception 1: ReAct just means “can call tools”

That is not accurate enough. The key idea of ReAct is:

  • Reasoning and action alternate and progress together

Misconception 2: As long as there is a trace, it must be reliable

A trace makes the process inspectable, but it does not automatically make the result correct.
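
One simple check, offered only as a heuristic sketch: flag numbers in the final answer that appear in no observation. It assumes trace items carry an `observation` key; `unsupported_numbers` is a hypothetical helper, and it does not catch wrong wording or a misread field.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"


def unsupported_numbers(answer, trace):
    """Numbers in the answer that appear in no observation."""
    observed = set()
    for item in trace:
        observed.update(re.findall(NUMBER, str(item["observation"])))
    return [n for n in re.findall(NUMBER, answer) if n not in observed]


trace = [
    {"action": "calculator", "args": {"expression": "299 + 15"}, "observation": "314"},
]
print(unsupported_numbers("The refund is 314 yuan.", trace))
print(unsupported_numbers("The refund is 341 yuan.", trace))
```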

Misconception 3: All Agents should use ReAct

Not necessarily. If the process is highly fixed, an explicit workflow may be simpler and more stable.


Keep this page’s proof of learning as a small evidence card:

  • Task Goal: what the agent is trying to solve
  • Plan or Trace: reasoning steps, plan, ReAct trace, or execution graph
  • Observation: what changed after each action
  • Failure Check: hallucinated step, stale observation, loop, or unverified conclusion
  • Eval Action: compare against expected result and revise the plan

The most important thing in this lesson is not to treat ReAct as a buzzword, but to understand why it matters:

When a task requires thinking while also obtaining information from the external world, ReAct can organize “reasoning” and “acting” into a loop that gathers evidence step by step and gradually approaches the answer.

Once this understanding is clear, you will find it much easier to follow more complex Agent traces, tool strategies, and multi-step execution frameworks later on.


  1. Add another tool to the example, such as check_order_status, so the agent has one more step of judgment.
  2. Why is ReAct more suitable for tasks where the “next action depends on the previous observation”?
  3. Why is ReAct more likely to make mistakes if the tool output is messy?
  4. Think of a task that is better suited to a fixed workflow and not very suitable for ReAct.

Reference implementation and walkthrough

  1. check_order_status should add a new action choice and an observation that can change the next step.
  2. ReAct fits when each observation can change the plan: search result, tool error, missing field, permission result, or calculation output.
  3. Messy tool output makes the observation hard to interpret, so the next action may be based on the wrong signal.
  4. Password reset, invoice creation, or approval flows with strict required steps often fit fixed workflows better than open ReAct loops.
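
For exercise 1, one possible shape is sketched below. The order IDs, the `ORDERS` data, and the rule that a shipped order skips the direct refund path are assumptions made for illustration; the calculator is simplified to a two-argument `add_amounts` tool to keep the sketch short.

```python
# Hypothetical order data; a real tool would query an order system.
ORDERS = {"A100": "unshipped", "A200": "shipped"}


def check_order_status(order_id):
    return ORDERS.get(order_id, "not_found")


def add_amounts(a, b):
    return a + b


TOOLS = {"check_order_status": check_order_status, "add_amounts": add_amounts}


def policy(state):
    """Scripted policy: the status observation decides whether to calculate."""
    done = {item["action"]: item["observation"] for item in state["trace"]}
    if "check_order_status" not in done:
        return {"action": "check_order_status", "args": {"order_id": state["order_id"]}}
    status = done["check_order_status"]
    if status != "unshipped":
        # Assumption: only unshipped orders take the direct refund path.
        return {"action": None, "answer": f"Order is {status}; a direct refund does not apply."}
    if "add_amounts" not in done:
        return {"action": "add_amounts", "args": {"a": 299, "b": 15}}
    return {"action": None, "answer": f"Refund amount: {done['add_amounts']} yuan."}


def run(order_id, max_steps=5):
    state = {"order_id": order_id, "trace": []}
    for _ in range(max_steps):
        decision = policy(state)
        if decision["action"] is None:
            return state["trace"], decision["answer"]
        observation = TOOLS[decision["action"]](**decision["args"])
        state["trace"].append(
            {"action": decision["action"], "args": decision["args"], "observation": observation}
        )
    return state["trace"], "Maximum steps reached, task not completed."


trace_a, answer_a = run("A100")
trace_b, answer_b = run("A200")
print(len(trace_a), answer_a)
print(len(trace_b), answer_b)
```

The unshipped order takes two tool calls and the shipped one takes a single call, so the new observation changes the rest of the run.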