
9.1.4 Agent Capability Levels

Agent capability level ladder

Learning Objectives

After completing this section, you will be able to:

  • Describe the capability boundaries of different Agents using a layered approach
  • Distinguish between “can answer,” “can use tools,” and “can complete tasks in multiple steps”
  • Choose a more suitable system form based on task complexity
  • Use a small example to practice judging the capability level a task requires

Why do we need to grade Agents?

Because the word “Agent” is too easy to overstate

Some systems can only:

  • Call a single tool

Other systems can:

  • Plan multiple steps
  • Remember state
  • Coordinate multiple tools

If we call them all Agents, many concepts get mixed together.

The value of grading is to describe system capability more honestly

It helps you answer:

  • What exactly can this system do?
  • Is it a stable workflow or a flexible intelligent agent?
  • Which layer is the problem most likely in?

A practical capability grading framework

L0: Pure response type

Features:

  • Generates answers based on input
  • Generally does not call tools
  • Behaves much like a plain chat model

Examples:

  • General Q&A bot
  • Pure Prompt generator

L1: Single-tool execution type

Features:

  • Can choose one tool based on the question
  • Answers directly after that single call

Examples:

  • Weather lookup assistant
  • Calculator assistant
  • One-time retrieval Q&A
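A minimal sketch of the L1 pattern, assuming two hypothetical stub tools (`get_weather`, `calculate`) in place of real APIs and a keyword rule in place of a model choosing the tool. The point is the shape: pick exactly one tool, call it once, and answer from its result.

```python
from typing import Callable, Dict

# Hypothetical stub tools; a real system would call external services here.
def get_weather(city: str) -> str:
    return f"Sunny, 25°C in {city}"  # fake data for illustration only

def calculate(expression: str) -> str:
    # Only handles "a + b" to keep the sketch tiny; real calculators need a parser.
    left, right = expression.split("+")
    return str(float(left) + float(right))

TOOLS: Dict[str, Callable[[str], str]] = {
    "weather": get_weather,
    "calculate": calculate,
}

def choose_tool(question: str) -> str:
    # Assumption: a keyword rule stands in for the model's tool choice.
    return "weather" if "weather" in question.lower() else "calculate"

def l1_answer(question: str, argument: str) -> str:
    tool_name = choose_tool(question)
    result = TOOLS[tool_name](argument)  # exactly one tool call
    return f"[{tool_name}] {result}"     # respond directly, no further steps

print(l1_answer("What's the weather?", "Beijing"))
print(l1_answer("Add these numbers", "2 + 3"))
```

Note there is no loop: whatever the single tool returns is the final answer. That is what keeps L1 systems cheap and easy to debug.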

One level higher

L2: Multi-step tool coordination type

Features:

  • Performs two or more actions
  • Can decide the next step based on intermediate results

Examples:

  • First check the order, then check the refund policy, then give a conclusion
  • First search for information, then summarize it into a report
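The refund example above can be sketched as follows, assuming hypothetical in-memory data in place of a real order system and policy store. What makes it L2 rather than L1 is that each step depends on the previous result, and an intermediate result can end the flow early.

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical data standing in for an order system and a refund-policy store.
ORDERS: Dict[str, Dict] = {
    "A100": {"item": "headphones", "delivered": date(2024, 5, 1)},
    "A200": {"item": "laptop", "delivered": None},  # not delivered yet
}
REFUND_WINDOW_DAYS = {"headphones": 30, "laptop": 14}

def check_order(order_id: str) -> Optional[Dict]:
    return ORDERS.get(order_id)

def check_refund_policy(item: str) -> int:
    return REFUND_WINDOW_DAYS.get(item, 0)

def l2_refund_agent(order_id: str, today: date) -> str:
    order = check_order(order_id)                  # step 1
    if order is None:
        return "Order not found."                  # intermediate result ends the flow
    if order["delivered"] is None:
        return "Not delivered yet; refund check skipped."
    window = check_refund_policy(order["item"])    # step 2 uses step 1's output
    days = (today - order["delivered"]).days
    if days <= window:                             # step 3: conclude from both results
        return f"Eligible: {days} days since delivery, window is {window}."
    return f"Not eligible: {days} days since delivery, window is {window}."

print(l2_refund_agent("A100", date(2024, 5, 20)))
print(l2_refund_agent("A200", date(2024, 5, 20)))
```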

L3: Goal-driven type

Features:

  • Receives a higher-level goal
  • Organizes an execution flow on its own
  • May include state management and failure retry

Examples:

  • Automatic research assistant
  • Automatic data analysis assistant
  • Automatic code-fixing flow

Higher capability usually means higher risk

L4: Long-running / multi-Agent / high autonomy

Features:

  • Can run long task chains
  • May coordinate multiple tools and multiple sub-Agents
  • Has memory, planning, and reflection mechanisms

These systems sound the coolest, but they are also the hardest to engineer.

Higher capability does not mean better suited for your task

Because improved capability often comes with:

  • Higher cost
  • Harder debugging
  • More possible failure paths
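A back-of-the-envelope way to see the "more failure paths" point: if each step succeeds independently with probability p (an idealized assumption, since real step failures are often correlated), a chain of n steps succeeds with probability p^n.

```python
def chain_success_rate(p_step: float, n_steps: int) -> float:
    """Probability that all n independent steps succeed."""
    return p_step ** n_steps

for n in (1, 3, 10):
    print(n, round(chain_success_rate(0.95, n), 3))
```

With a 95% per-step success rate, a 3-step chain completes about 86% of the time and a 10-step chain only about 60% of the time.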

So the right mindset is usually not “the higher, the better,” but:

Use the smallest level that is just enough.


A quick capability level reference table

| Level | Core capability | Typical systems |
|---|---|---|
| L0 | Pure response | Chat Q&A |
| L1 | Single tool call | Weather / calculation / one-time retrieval |
| L2 | Multi-step execution | Check first and then calculate; search first and then write |
| L3 | Goal-driven | Research assistant, data analysis assistant |
| L4 | Long-running autonomy / multi-Agent | Complex automation team systems |
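The table can be turned into a "smallest level that is just enough" rule. This is a minimal sketch; the four boolean requirement flags are a hypothetical simplification of the table, not a standard taxonomy.

```python
from enum import IntEnum

class Level(IntEnum):
    L0 = 0  # pure response
    L1 = 1  # single tool call
    L2 = 2  # multi-step execution
    L3 = 3  # goal-driven
    L4 = 4  # long-running autonomy / multi-Agent

def smallest_sufficient_level(needs_tool: bool = False,
                              needs_multiple_steps: bool = False,
                              needs_self_planning: bool = False,
                              needs_long_running_or_multi_agent: bool = False) -> Level:
    # Check the most demanding need first, so the result is the lowest
    # level that still covers every stated requirement.
    if needs_long_running_or_multi_agent:
        return Level.L4
    if needs_self_planning:
        return Level.L3
    if needs_multiple_steps:
        return Level.L2
    if needs_tool:
        return Level.L1
    return Level.L0

print(smallest_sufficient_level().name)
print(smallest_sufficient_level(needs_tool=True).name)
print(smallest_sufficient_level(needs_tool=True, needs_multiple_steps=True).name)
```

Using `IntEnum` keeps levels comparable, so checks such as `level <= Level.L2` read naturally when enforcing a project's ceiling.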

A small exercise: assign levels to tasks

Runnable example

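```python
# Example tasks, ordered roughly from simplest to most demanding
tasks = [
    "Answer: What is RAG?",
    "Check Beijing weather",
    "First check the refund policy, then decide whether I qualify",
    "Automatically generate a weekly report based on sales data and send an email",
]

def recommend_level(task):
    """Recommend a capability level using simple keyword rules (a toy heuristic)."""
    task_lower = task.lower()
    # A "first ... then ..." sequence implies dependent steps -> L2
    if "first check" in task_lower and "then" in task_lower:
        return "L2"
    # An end-to-end goal (generate, then deliver) implies goal-driven execution -> L3
    if "automatically generate a weekly report" in task_lower or "send an email" in task_lower:
        return "L3"
    # A single lookup implies one tool call -> L1
    if "check" in task_lower:
        return "L1"
    # Otherwise the model can answer directly -> L0
    return "L0"

for task in tasks:
    print(task, "-> recommended capability level:", recommend_level(task))
```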

Expected output:

```text
Answer: What is RAG? -> recommended capability level: L0
Check Beijing weather -> recommended capability level: L1
First check the refund policy, then decide whether I qualify -> recommended capability level: L2
Automatically generate a weekly report based on sales data and send an email -> recommended capability level: L3
```

Keyword matching like this is far too crude for real use, but it builds a very practical habit:

First determine which capability level the task needs, then decide how the system should do it.


How do you upgrade from a lower level?

From L0 to L1

The key is to add:

  • Tool interfaces
  • Parameter generation
  • Filling tool results back into the response
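The three additions can be sketched in one small function, assuming a hypothetical `lookup_weather` stub and a regular expression standing in for model-generated parameters. Each numbered comment marks one of the three pieces listed above.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Tool:
    # 1. Tool interface: a name, a description a model could read, and a callable.
    name: str
    description: str
    func: Callable[[str], str]

def lookup_weather(city: str) -> str:
    # Hypothetical stub instead of a real weather API.
    return "light rain" if city == "Beijing" else "unknown"

WEATHER = Tool("get_weather", "Return current weather for a city", lookup_weather)

def generate_parameters(question: str) -> Optional[Dict[str, str]]:
    # 2. Parameter generation: a regex stands in for model-generated arguments.
    match = re.search(r"weather in ([A-Za-z ]+)", question)
    return {"city": match.group(1).strip()} if match else None

def answer(question: str) -> str:
    params = generate_parameters(question)
    if params is None:
        return "I could not find a city in your question."
    result = WEATHER.func(params["city"])
    # 3. Fill the tool result back into the response.
    return f"The weather in {params['city']}: {result}."

print(answer("What is the weather in Beijing?"))
```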

From L1 to L2

The key is to add:

  • Intermediate state
  • Multi-step execution
  • Dependencies between actions
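One way to express these three pieces, assuming hypothetical stub actions: a shared `state` dict holds intermediate results, and declared dependencies decide execution order. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Set

# Hypothetical actions; each reads earlier results from shared state.
def search(state: Dict) -> None:
    state["notes"] = ["fact A", "fact B"]  # stub search results

def summarize(state: Dict) -> None:
    state["summary"] = "; ".join(state["notes"])

def write_report(state: Dict) -> None:
    state["report"] = f"Report: {state['summary']}"

ACTIONS: Dict[str, Callable[[Dict], None]] = {
    "search": search, "summarize": summarize, "write_report": write_report,
}
# Dependencies between actions: each key lists what must run before it.
DEPENDS_ON: Dict[str, Set[str]] = {
    "search": set(),
    "summarize": {"search"},
    "write_report": {"summarize"},
}

def run_l2(actions: Dict[str, Callable[[Dict], None]],
           depends_on: Dict[str, Set[str]]) -> Dict:
    state: Dict = {}  # intermediate state shared across steps
    for name in TopologicalSorter(depends_on).static_order():
        actions[name](state)  # multi-step execution in dependency order
    return state

print(run_l2(ACTIONS, DEPENDS_ON)["report"])
```

Declaring dependencies explicitly, rather than hard-coding the call order, also makes a cycle or a missing prerequisite visible up front: `TopologicalSorter` raises `CycleError` on a cycle.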

From L2 to L3

The key is to add:

  • Task decomposition
  • Sub-goal management
  • Error recovery
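A minimal sketch of sub-goal management with error recovery, assuming hypothetical sub-goal handlers that signal failure by raising `RuntimeError`. The decomposition is hard-coded here; an L3 system would produce it itself. Each sub-goal gets a bounded number of attempts, and its final status is recorded rather than crashing the whole run.

```python
from typing import Callable, Dict, List

def run_with_recovery(subgoals: List[str],
                      handlers: Dict[str, Callable[[], str]],
                      max_attempts: int = 2) -> Dict[str, str]:
    """Run each sub-goal, retrying failures and recording the final status."""
    status: Dict[str, str] = {}
    for goal in subgoals:  # sub-goal management
        for attempt in range(1, max_attempts + 1):
            try:
                status[goal] = "ok: " + handlers[goal]()
                break
            except RuntimeError as err:  # error recovery: retry, then give up
                status[goal] = f"failed after {attempt} attempt(s): {err}"
    return status

# Hypothetical flaky sub-goal: fails on the first call, succeeds afterwards.
calls = {"fetch": 0}
def fetch() -> str:
    calls["fetch"] += 1
    if calls["fetch"] == 1:
        raise RuntimeError("timeout")
    return "data"

def analyze() -> str:
    raise RuntimeError("bad input")  # always fails, to show the give-up path

result = run_with_recovery(["fetch", "analyze"], {"fetch": fetch, "analyze": analyze})
print(result)
```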

The higher you go, the more it feels like building a “small operating system.”


How do you avoid “overstating capability” in engineering?

Set boundaries for the system first

For example:

  • How many steps can it execute at most?
  • How many tools can it call at most?
  • Which tasks must be confirmed by a human?
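These boundaries can be enforced in code before any model is involved. A minimal sketch, assuming a hypothetical `AgentLimits` guard and hypothetical sensitive tool names (`send_email`, `issue_refund`): the agent loop would ask the guard before every step and every tool call.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class AgentLimits:
    max_steps: int = 5
    max_tool_calls: int = 3
    # Hypothetical sensitive tools that always require human confirmation.
    needs_human_approval: Set[str] = field(
        default_factory=lambda: {"send_email", "issue_refund"})
    steps: int = 0
    tool_calls: int = 0

    def allow_step(self) -> bool:
        self.steps += 1
        return self.steps <= self.max_steps  # step budget

    def allow_tool(self, tool_name: str, human_approved: bool = False) -> bool:
        if tool_name in self.needs_human_approval and not human_approved:
            return False  # sensitive action blocked until a human confirms
        if self.tool_calls >= self.max_tool_calls:
            return False  # tool-call budget exhausted
        self.tool_calls += 1
        return True

limits = AgentLimits()
print(limits.allow_tool("search"))                           # True
print(limits.allow_tool("send_email"))                       # False
print(limits.allow_tool("send_email", human_approved=True))  # True
```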

Launch with the minimum necessary capability first

Many systems actually only need:

  • L1 or L2

If you jump straight to L4, the system often ends up:

  • Too complex
  • Too expensive
  • Too unstable

Common beginner misconceptions

Assuming tool use automatically makes an advanced Agent

Being able to call one tool is usually at most L1.

Assuming more steps automatically means a smarter system

More steps sometimes just means more error paths.

Piling up architecture without distinguishing task levels

This is one of the reasons many Agent projects are hard to ship.


Summary

The most important takeaway from this section is:

An Agent’s capability is not an on/off switch; it is a graded range of levels.

Once you learn to grade capability, it becomes easier to make safe architectural decisions, and you are less likely to be misled by the phrase “fully autonomous intelligent agent.”


Exercises

  1. Make a list of 5 tasks and decide whether each one is better suited for L0, L1, L2, or L3.
  2. Think about a real project of yours: why might it not need to go all the way to L3 / L4?
  3. If a system often calls the wrong tool, which capability layer is more likely to have the problem?