Skip to content

3.3.1 Pandas Roadmap: From Raw Table to Analysis Table

Pandas is the table workstation of this course. Use it when a plain CSV, Excel file, log table, or SQL query result must become clean enough for charts, machine learning, RAG evaluation, or reports.

Pandas data processing roadmap

Keep this one-line flow in mind:

readinspectselectcleantransformgroupmergeexport

Do not memorize every API first. Ask: what table do I have, what table do I need, and which step changes one into the other?

Create pandas_first_loop.py and run it after installing pandas.

import pandas as pd
orders = pd.DataFrame(
[
{"date": "2026-05-01", "category": "book", "amount": 120},
{"date": "2026-05-02", "category": "tool", "amount": 80},
{"date": "2026-05-03", "category": "book", "amount": None},
{"date": "2026-06-01", "category": "book", "amount": 150},
]
)
clean = (
orders.dropna(subset=["amount"])
.assign(month=lambda df: pd.to_datetime(df["date"]).dt.to_period("M").astype(str))
)
summary = clean.groupby(["month", "category"], as_index=False)["amount"].sum()
print(summary)

Expected output shape:

Terminal window
month category amount
0 2026-05 book 120.0
1 2026-05 tool 80.0
2 2026-06 book 150.0

You just did the core Pandas loop: create/read data, clean missing values, add a derived column, group, and summarize.

OrderReadWhat to practice
13.3.2 Core Data StructuresSeries, DataFrame, Index
23.3.3 Data Reading and WritingCSV, Excel, JSON, export
33.3.4 Selection and Filteringloc, iloc, conditions
43.3.5 Data Cleaningmissing values, duplicates, types
53.3.6 Data Transformationnew columns, mapping, string/date handling
63.3.7 Grouping and Aggregationgroupby, metrics, category/month summaries
73.3.8 Data Mergingjoin multiple tables safely
83.3.9 Time Seriesdate index, resampling, time windows

Keep this page’s proof of learning as a small evidence card:

Dataframe State
columns, dtypes, row count, missing values, and sample rows
Operation
read/write, select/filter, clean, transform, groupby, merge, or time-series step
Output
resulting table, saved file, aggregation, join result, or time index view
Failure Check
dtype mismatch, missing data, duplicated keys, chained assignment, or wrong time frequency
Expected Output
before/after table sample with the transformation reason

You pass this subchapter when you can turn one raw table into one clean summary table, explain each column change, and save the result for visualization or machine learning.

Check reasoning and explanation
  1. A passing answer starts from the question, identifies the table/DataFrame or query needed, and keeps the cleaning step reproducible.
  2. The evidence should include a small output sample, a plot or SQL result when relevant, and one sentence interpreting what changed.
  3. A good self-check names one data-quality risk such as missing values, duplicate rows, wrong joins, misleading aggregation, or an unreadable chart.