Skip to main content

3.3.1 Pandas Roadmap: From Raw Table to Analysis Table

Pandas is the table workstation of this course. Use it when a plain CSV, Excel file, log table, or SQL query result must become clean enough for charts, machine learning, RAG evaluation, or reports.

Look at the Workflow First

Pandas data processing roadmap

Keep this one-line flow in mind:

read -> inspect -> select -> clean -> transform -> group -> merge -> export

Do not memorize every API first. Ask: what table do I have, what table do I need, and which step changes one into the other?

Run a Tiny Table Once

Create pandas_first_loop.py and run it after installing pandas.

import pandas as pd

orders = pd.DataFrame(
[
{"date": "2026-05-01", "category": "book", "amount": 120},
{"date": "2026-05-02", "category": "tool", "amount": 80},
{"date": "2026-05-03", "category": "book", "amount": None},
{"date": "2026-06-01", "category": "book", "amount": 150},
]
)

clean = (
orders.dropna(subset=["amount"])
.assign(month=lambda df: pd.to_datetime(df["date"]).dt.to_period("M").astype(str))
)
summary = clean.groupby(["month", "category"], as_index=False)["amount"].sum()

print(summary)

Expected output shape:

     month category  amount
0 2026-05 book 120.0
1 2026-05 tool 80.0
2 2026-06 book 150.0

You just did the core Pandas loop: create/read data, clean missing values, add a derived column, group, and summarize.

Learn in This Order

OrderReadWhat to practice
13.3.2 Core Data StructuresSeries, DataFrame, Index
23.3.3 Data Reading and WritingCSV, Excel, JSON, export
33.3.4 Selection and Filteringloc, iloc, conditions
43.3.5 Data Cleaningmissing values, duplicates, types
53.3.6 Data Transformationnew columns, mapping, string/date handling
63.3.7 Grouping and Aggregationgroupby, metrics, category/month summaries
73.3.8 Data Mergingjoin multiple tables safely
83.3.9 Time Seriesdate index, resampling, time windows

Pass Check

You pass this subchapter when you can turn one raw table into one clean summary table, explain each column change, and save the result for visualization or machine learning.