9.10.3 Project: Data Analysis Agent

Section focus

The real value of a Data Analysis Agent is not:

helping you calculate the average

But rather:

whether it can connect “read data -> analyze -> explain conclusions” into a reproducible chain

That is why this kind of project is especially good for showing multi-step tool coordination and intermediate states.

Learning objectives

Learn how to define the minimum project scope for a Data Analysis Agent
Learn how to connect data input, statistical computation, and explanatory output into a closed loop
Learn how to use a minimal example to demonstrate “reproducibility”
Learn how to package this topic into a strong one-page portfolio project

First, build a map

A Data Analysis Agent is easiest to understand as a pipeline: read data -> compute statistics -> form interpretation -> provide visualization suggestions.

So the questions this section really wants to answer are:

Why a Data Analysis Agent is not just “good at calling pandas”
Why a reproducible intermediate process is more important than a final one-line conclusion

How should we narrow the project topic?

It is recommended to start with:

reading a small table
calculating a few core statistics
generating an insight summary based on those statistics

Rather than starting with:

an automatic BI platform
a fully automated report factory
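
Reading a small table can be as modest as parsing a few CSV lines. The sketch below assumes a hypothetical inline CSV string with `category` and `amount` columns and uses only the standard library `csv` module. The names `RAW_CSV` and `load_sales` are illustrative assumptions, not part of the project.

```python
import csv
import io

# Hypothetical inline CSV; in a real project this would be a small file on disk.
RAW_CSV = """category,amount
course,299
course,199
book,59
"""


def load_sales(text):
    """Parse CSV text into a list of dicts with integer amounts."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"category": row["category"], "amount": int(row["amount"])}
        for row in reader
    ]


rows = load_sales(RAW_CSV)
print(len(rows), rows[0])
```

Keeping the loader this small makes it easy to show the exact input next to every later result.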

A more beginner-friendly overall analogy

You can think of a Data Analysis Agent as:

an analysis assistant that first computes, then explains, and can also suggest how to visualize the data

What sets it apart from a regular calculator is not:

calculating faster

But rather:

organizing numbers into conclusions with explanatory power

First run a minimal data analysis loop

This example will:

Read a small sales table
Compute total sales and category averages
Give a simple analytical conclusion

```python
# A small in-memory sales table: one dict per order.
sales = [
    {"category": "course", "amount": 299},
    {"category": "course", "amount": 199},
    {"category": "book", "amount": 59},
    {"category": "book", "amount": 79},
    {"category": "service", "amount": 499},
]


def summarize_sales(rows):
    # Total sales across all rows.
    total = sum(row["amount"] for row in rows)

    # Group the amounts by category.
    grouped = {}
    for row in rows:
        grouped.setdefault(row["category"], []).append(row["amount"])

    # Average order value per category, rounded to two decimals.
    per_category_avg = {
        category: round(sum(values) / len(values), 2)
        for category, values in grouped.items()
    }

    # The category with the highest average drives the insight.
    top_category = max(per_category_avg, key=per_category_avg.get)

    return {
        "total_amount": total,
        "per_category_avg": per_category_avg,
        "insight": f"{top_category} has the highest average order value.",
    }


result = summarize_sales(sales)
print(result)
```

Expected output:

```
{'total_amount': 1135, 'per_category_avg': {'course': 249.0, 'book': 69.0, 'service': 499.0}, 'insight': 'service has the highest average order value.'}
```

Figure: Data Analysis Agent sales trace result map

Reading the result

Use the image to read the printed dictionary as evidence: raw rows feed grouped averages, the highest average becomes the insight, and the chart rule turns the same fields into a presentation suggestion.

Why is this already very project-like?

Because it does more than “computation”; it also covers:

input data
intermediate statistics
output conclusions

This is already the smallest data analysis workflow.

Why is `insight` especially important?

Because users usually do not want to look at raw numbers; they want:

conclusions with explanatory power

This is exactly the difference between a Data Analysis Agent and a regular calculator.

A beginner-friendly project checklist to remember first

Step	What should you confirm first
Input data	Are the field meanings clear
Intermediate statistics	Are the calculation rules consistent
insight	Do the conclusion and the numbers match
Chart suggestions	Does the chart type fit the data shape

This table is especially useful for beginners because it compresses “Data Analysis Agent” back into a workflow that can be checked step by step.

Figure: Data Analysis Agent reproducible workflow diagram

Reading guide

Read this diagram using a notebook mindset: load data, profile schema, compute statistics, generate insight, suggest chart, write report. Every conclusion should be traceable back to the intermediate computation results.

What should a portfolio-level Data Analysis Agent show?

What does the input data look like?

It is best to make clear:

fields
sample size
missing value status
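
A minimal sketch of such an input profile follows. It assumes rows are plain dicts and treats a field as missing when its key is absent or its value is `None`. The function name `profile_rows` and the sample rows are hypothetical.

```python
def profile_rows(rows, fields):
    """Report sample size and per-field missing counts (None or absent key)."""
    missing = {
        field: sum(1 for row in rows if row.get(field) is None)
        for field in fields
    }
    return {"fields": list(fields), "sample_size": len(rows), "missing": missing}


# Hypothetical sample: one amount is None and one is absent entirely.
sample = [
    {"category": "course", "amount": 299},
    {"category": "book", "amount": None},
    {"category": "service"},
]

profile = profile_rows(sample, ["category", "amount"])
print(profile)
```

Printing this profile before any statistics gives readers a fixed reference point for everything computed afterwards.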

Intermediate computation results

For example:

summary statistics
grouped results
trend judgments
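
Grouped results already appear in the sales example earlier in this section. The sketch below covers the other two items: summary statistics via the standard library `statistics` module, and a deliberately naive trend rule. The data, the `judge_trend` name, and the half-versus-half rule are all assumptions made for illustration, not a recommended statistical test.

```python
from statistics import mean, median

# Hypothetical daily totals, already ordered by date.
daily_amounts = [100, 120, 90, 150, 170]

summary = {
    "count": len(daily_amounts),
    "mean": mean(daily_amounts),
    "median": median(daily_amounts),
    "min": min(daily_amounts),
    "max": max(daily_amounts),
}


def judge_trend(values):
    """Naive rule: compare the mean of the last half with the mean of the first half."""
    if len(values) < 2:
        return "flat"  # not enough points to compare
    half = len(values) // 2
    first, second = mean(values[:half]), mean(values[-half:])
    if second > first:
        return "up"
    if second < first:
        return "down"
    return "flat"


trend = judge_trend(daily_amounts)
print(summary)
print(trend)
```

Whatever rule you choose, showing it next to the numbers it was applied to is what makes the judgment checkable.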

Final explanation

For example:

which product category performed best
which time period fluctuated the most

Chart suggestions

Even if you do not generate charts directly, you can still output:

whether a bar chart or a line chart should be used

This makes the project feel closer to a real analysis assistant.

Add a minimal “chart recommender”

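```python
def suggest_chart(columns):
    # Time series first: a date column plus a numeric amount suggests a line chart.
    if "date" in columns and "amount" in columns:
        return "line_chart"
    # Categorical comparison: category plus amount suggests a bar chart.
    if "category" in columns and "amount" in columns:
        return "bar_chart"
    # Fall back to a plain table when no rule applies.
    return "table"


print(suggest_chart(["category", "amount"]))
print(suggest_chart(["date", "amount"]))
```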

Expected output:

```
bar_chart
line_chart
```

What value does this small module provide?

It shows that the project is not just “doing arithmetic”, but is gradually moving toward:

analysis
explanation
visualization suggestions

Let’s look at a minimal “analysis trace” example

Continue in the same file or Python session, because this block reuses `sales` and `result`.

```python
# Collect the input size, intermediate statistics, and conclusion in one record.
trace = {
    "input_rows": len(sales),
    "total_amount": result["total_amount"],
    "per_category_avg": result["per_category_avg"],
    "insight": result["insight"],
}

print(trace)
```

Expected output:

```
{'input_rows': 5, 'total_amount': 1135, 'per_category_avg': {'course': 249.0, 'book': 69.0, 'service': 499.0}, 'insight': 'service has the highest average order value.'}
```

This example is especially suitable for beginners because it shows where the real value of a Data Analysis Agent project often lies: in whether the process can be verified.

The most common pitfalls

Misunderstanding the fields

This is a typical fatal problem for Data Analysis Agents. If the field meanings are misunderstood, the entire workflow may be led astray.
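
One way to reduce this risk is to write the field meanings down and check rows against them before any analysis runs. The sketch below is an assumption-heavy illustration: the `SCHEMA` contents, the field descriptions, and the `validate_rows` helper are all hypothetical.

```python
# Hypothetical schema: field name -> (expected type, documented meaning).
SCHEMA = {
    "category": (str, "product line of the order"),
    "amount": (int, "order value in whole currency units"),
}


def validate_rows(rows, schema):
    """Return a list of readable problems; an empty list means the rows pass."""
    problems = []
    for index, row in enumerate(rows):
        for field, (expected_type, _meaning) in schema.items():
            if field not in row:
                problems.append(f"row {index}: missing field '{field}'")
            elif not isinstance(row[field], expected_type):
                problems.append(
                    f"row {index}: '{field}' should be {expected_type.__name__}, "
                    f"got {type(row[field]).__name__}"
                )
    return problems


rows = [
    {"category": "course", "amount": 299},
    {"category": "book", "amount": "59"},  # amount arrived as text
    {"amount": 79},  # category is absent
]

problems = validate_rows(rows, SCHEMA)
for problem in problems:
    print(problem)
```

A type check cannot catch every misreading (for example, cents versus whole units), so the documented meaning in the schema is there for human reviewers as much as for the code.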

Only showing the conclusion, not the intermediate process

This makes the project feel like a black box and makes it hard to build trust.

Only handling the happy path

If you do not show:

missing values
outliers
inconsistent calculation rules

the project will feel unrealistic.
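
A minimal sketch of handling the first two cases is shown below. It separates rows with a missing `amount` and flags suspected outliers before any statistics are computed. The `split_rows` name and the outlier rule (an amount more than `outlier_factor` times the median) are simple assumptions for demonstration, not a statistical recommendation.

```python
from statistics import median


def split_rows(rows, outlier_factor=5):
    """Split rows into usable, missing-amount, and suspected-outlier groups."""
    present = [row for row in rows if row.get("amount") is not None]
    missing = [row for row in rows if row.get("amount") is None]
    if not present:
        return {"usable": [], "missing": missing, "outliers": []}

    # Assumed rule: anything above outlier_factor * median is set aside for review.
    threshold = outlier_factor * median([row["amount"] for row in present])
    usable = [row for row in present if row["amount"] <= threshold]
    outliers = [row for row in present if row["amount"] > threshold]
    return {"usable": usable, "missing": missing, "outliers": outliers}


# Hypothetical rows with one missing amount and one implausibly large amount.
rows = [
    {"category": "course", "amount": 299},
    {"category": "course", "amount": 199},
    {"category": "book", "amount": None},
    {"category": "book", "amount": 59},
    {"category": "service", "amount": 99999},
]

report = split_rows(rows)
counts = {name: len(group) for name, group in report.items()}
print(counts)
```

Reporting the set-aside rows instead of silently dropping them keeps the error cases visible in the final page.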

How do you polish it into a portfolio-level page?

Suggested structure

Raw data example
Intermediate statistics table
Insight summary
Chart suggestions
Error cases

One highlight worth adding

Show:

raw data
intermediate computations
final conclusions

as a single trace. This will be much stronger than pasting only a result.
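
One possible way to present such a trace is to serialize it as stable JSON so that two runs can be compared with a plain text diff. The sketch below uses hard-coded, hypothetical stand-ins for the three layers. The key names `raw_data`, `intermediate`, and `conclusion` are assumptions.

```python
import json

# Hypothetical stand-ins for the three layers of the trace.
raw_data = [
    {"category": "book", "amount": 59},
    {"category": "book", "amount": 79},
    {"category": "course", "amount": 299},
]
intermediate = {"per_category_avg": {"book": 69.0, "course": 299.0}}
conclusion = "course has the highest average order value."

trace = {
    "raw_data": raw_data,
    "intermediate": intermediate,
    "conclusion": conclusion,
}

# sort_keys and a fixed indent keep the text identical across runs
# whenever the underlying values are identical.
serialized = json.dumps(trace, indent=2, sort_keys=True)
print(serialized)
```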

A beginner-friendly evaluation table to remember first

Dimension	First question to ask
Correctness	Are the numbers calculated correctly
Reproducibility	Can the intermediate process be traced back
Interpretability	Do the conclusions match the statistics
Presentation	Do the chart suggestions naturally fit the conclusions

This table is especially useful for beginners because it breaks “Is the Agent project good?” into a few more concrete judgments.
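
The interpretability row can be partly automated. The sketch below assumes the insight sentence always begins with the category name, as in the sales example of this section, and checks that name against the computed averages. The `insight_matches_stats` helper is hypothetical.

```python
def insight_matches_stats(result):
    """Check that the category named first in the insight has the top average."""
    averages = result["per_category_avg"]
    expected_top = max(averages, key=averages.get)
    # Assumes the insight sentence starts with the category name.
    return result["insight"].startswith(expected_top + " ")


good = {
    "per_category_avg": {"course": 249.0, "service": 499.0},
    "insight": "service has the highest average order value.",
}
# Same statistics, but the conclusion names the wrong category.
bad = dict(good, insight="course has the highest average order value.")

print(insight_matches_stats(good))
print(insight_matches_stats(bad))
```

A check like this only works while the insight follows a fixed template. Free-form text generated by a model would need a different verification strategy.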

Summary

The most important thing in this section is to build a portfolio-level judgment:

The real highlight of a Data Analysis Agent is not whether it can call pandas, but whether it can organize input data, intermediate computations, and final insights into a reproducible analysis loop.

As long as that loop is clear, this project is very well suited to showing your understanding of multi-tool Agents.

If you turn this into a portfolio project, what is most worth showing?

What is usually most worth showing is not:

a single analytical conclusion

But rather:

A raw data example
Intermediate statistical results
How the insight was generated
Why the chart suggestion was made that way

This makes it easier for others to see:

that you understand the analysis loop
not just that the Agent said something

Suggested version roadmap

Version	Goal	Delivery focus
Basic version	Run the minimal loop	Handles input, processing, and output, with one set of examples preserved
Standard version	Form a presentable project	Add configuration, logs, error handling, README, and screenshots
Challenge version	Approach portfolio quality	Add evaluation, comparison experiments, failure sample analysis, and next-step roadmap

It is recommended to finish the basic version first; do not chase a large, all-in-one solution from the start. For each version upgrade, write into the README what new capability was added, how it was verified, and what problems still remain.

Exercises

Add a date field to the example data and expand the project into a simple time-trend analysis.
Think about why “reproducibility” is especially important for a Data Analysis Agent.
If the conclusions do not match the numbers, which layer is most likely to be the problem?
If you were presenting this as a portfolio project, which part would you design to be the most eye-catching?
