3.3.1 Pandas Roadmap: From Raw Table to Analysis Table

Pandas is the table workstation of this course. Use it when a plain CSV, Excel file, log table, or SQL query result must become clean enough for charts, machine learning, RAG evaluation, or reports.

Look at the Workflow First

Pandas data processing roadmap

Keep this one-line flow in mind:

read -> inspect -> select -> clean -> transform -> group -> merge -> export

Do not memorize every API first. Ask: what table do I have, what table do I need, and which step changes one into the other?

Run a Tiny Table Once

Create pandas_first_loop.py and run it after installing pandas.

import pandas as pd

orders = pd.DataFrame(
    [
        {"date": "2026-05-01", "category": "book", "amount": 120},
        {"date": "2026-05-02", "category": "tool", "amount": 80},
        {"date": "2026-05-03", "category": "book", "amount": None},
        {"date": "2026-06-01", "category": "book", "amount": 150},
    ]
)

clean = (
    orders.dropna(subset=["amount"])
    .assign(month=lambda df: pd.to_datetime(df["date"]).dt.to_period("M").astype(str))
)
summary = clean.groupby(["month", "category"], as_index=False)["amount"].sum()

print(summary)

Expected output shape:

     month category  amount
2026-05     book   120.0
2026-05     tool    80.0
2026-06     book   150.0

You just did the core Pandas loop: create/read data, clean missing values, add a derived column, group, and summarize.

Learn in This Order

Order	Read	What to practice
1	3.3.2 Core Data Structures	`Series`, `DataFrame`, `Index`
2	3.3.3 Data Reading and Writing	CSV, Excel, JSON, export
3	3.3.4 Selection and Filtering	`loc`, `iloc`, conditions
4	3.3.5 Data Cleaning	missing values, duplicates, types
5	3.3.6 Data Transformation	new columns, mapping, string/date handling
6	3.3.7 Grouping and Aggregation	`groupby`, metrics, category/month summaries
7	3.3.8 Data Merging	join multiple tables safely
8	3.3.9 Time Series	date index, resampling, time windows

Pass Check

You pass this subchapter when you can turn one raw table into one clean summary table, explain each column change, and save the result for visualization or machine learning.

Look at the Workflow First​

Run a Tiny Table Once​

Learn in This Order​

Pass Check​

Look at the Workflow First

Run a Tiny Table Once

Learn in This Order

Pass Check