
3.4.3 Seaborn Statistical Visualization

Seaborn Statistical Plot Selection Guide

  • Understand the relationship between Seaborn and Matplotlib
  • Master distribution plots, relational plots, and categorical plots
  • Learn to draw heatmaps and correlation matrices
  • Use FacetGrid for faceting plots

For beginners, the best way to understand Seaborn is not by memorizing its function list, but by first seeing the 4 problem types it handles best:

flowchart LR
A["Distribution"] --> B["histplot / kdeplot"]
C["Relationship"] --> D["scatterplot / lineplot / pairplot"]
E["Categorical comparison"] --> F["boxplot / violinplot / barplot / countplot"]
G["Matrix relationships"] --> H["heatmap"]

So what this section really aims to solve is:

  • When you are doing EDA, which plot should you use first?
  • Why is Seaborn more suitable than plain Matplotlib for quick exploration?

If you think of Matplotlib as brushes and paint, then Seaborn is a brush set + palette + templates.

flowchart LR
A["Matplotlib<br/>Basic plotting tools"] -->|"Seaborn wraps it"| B["Seaborn<br/>Beautiful statistical charts"]
B -->|"Under the hood, still based on"| A
C["Pandas<br/>DataFrame"] -->|"Pass directly"| B
style A fill:#e3f2fd,stroke:#1565c0,color:#333
style B fill:#e8f5e9,stroke:#2e7d32,color:#333
style C fill:#fff3e0,stroke:#e65100,color:#333

| Comparison | Matplotlib | Seaborn |
| --- | --- | --- |
| Positioning | Low-level plotting library | High-level statistical plotting library |
| Amount of code | More, requires manual setup | Less, ready to use out of the box |
| Default aesthetics | Average | Very polished |
| Data format | Arrays, lists | Directly uses DataFrame |
| Statistical features | Must compute manually | Automatically computes means, confidence intervals, etc. |
| Customization | Extremely strong | Moderate (can be extended with Matplotlib) |

One-line summary: Seaborn lets you create a beautiful statistical plot in 1 line of code that might take 10 lines with Matplotlib.
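
To make the comparison concrete, here is a minimal sketch that draws the same "mean per group with error bars" chart both ways. The DataFrame `df` and its `group` / `value` columns are synthetic stand-ins invented for this illustration, and the Matplotlib side uses a normal-approximation interval (1.96 × standard error), which is close to but not identical with the bootstrap interval Seaborn computes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): three groups with different true means
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 40),
    "value": rng.normal(loc=np.repeat([10.0, 12.0, 15.0], 40), scale=2.0),
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: aggregate by hand, then draw bars and error bars yourself
stats = df.groupby("group")["value"].agg(["mean", "sem"])
ax_mpl.bar(stats.index, stats["mean"], yerr=1.96 * stats["sem"], capsize=4)
ax_mpl.set_title("Matplotlib: manual mean + error bars")

# Seaborn: the aggregation and the interval are computed inside the call
sns.barplot(data=df, x="group", y="value", ax=ax_sns)
ax_sns.set_title("Seaborn: one call")

fig.tight_layout()
```

The two panels should look very similar; the difference is where the statistics are computed, not what is shown.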

You can think of Seaborn as a data visualization tool where the table is already set for you.

Matplotlib is like setting everything up from scratch, pots and pans included, while Seaborn is like having the tableware and default style already prepared. This makes it easier to focus on what statistical phenomenon the chart is meant to show.

# Install
# python -m pip install --upgrade seaborn
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Seaborn's built-in example datasets
tips = sns.load_dataset("tips") # restaurant tip data
iris = sns.load_dataset("iris") # iris flower data
titanic = sns.load_dataset("titanic") # Titanic data
# Set global style
sns.set_theme(style="whitegrid") # white background + grid, clean and nice
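
The built-in style names are easier to choose between when seen side by side. The sketch below is one way to do that, assuming a simple sine curve as placeholder data; it uses `sns.axes_style()` as a context manager so each subplot gets a different style without changing the global theme.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

styles = ["whitegrid", "darkgrid", "white", "dark", "ticks"]
x = np.linspace(0, 2 * np.pi, 100)  # placeholder data for the preview

fig = plt.figure(figsize=(15, 3))
for i, name in enumerate(styles, start=1):
    # axes_style() only affects axes created inside the with-block
    with sns.axes_style(name):
        ax = fig.add_subplot(1, len(styles), i)
    ax.plot(x, np.sin(x))
    ax.set_title(name)
fig.tight_layout()

# The style is just a dict of Matplotlib rc parameters, e.g. whether the grid is on
grid_on = {name: sns.axes_style(name)["axes.grid"] for name in styles}
print(grid_on)
```

Because the context manager is temporary, the global theme set with `sns.set_theme(...)` stays in effect for every plot drawn afterwards.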

| Style | Description | Best use case |
| --- | --- | --- |
| "whitegrid" | White background + grid | Numerical comparison (recommended default) |
| "darkgrid" | Gray background + grid | Highlight data points |
| "white" | Plain white background | Papers, reports |
| "dark" | Gray background | Artistic style |
| "ticks" | White background + tick marks | Clean and professional |

Distribution Plots: What Does the Data Look Like?


Distribution plots help answer: Where are the values concentrated? How spread out are they? Is the distribution skewed?
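
The same three questions can also be checked numerically, which is a useful cross-check on what the plot appears to show. This is a small sketch using pandas only; the `amount` column is synthetic (a log-normal sample chosen because it is right-skewed), not one of the datasets used in this section.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical right-skewed column, standing in for something like a bill amount
values = pd.Series(rng.lognormal(mean=3.0, sigma=0.5, size=500), name="amount")

summary = {
    "mean": values.mean(),      # pulled toward the long tail
    "median": values.median(),  # where the values are concentrated
    "iqr": values.quantile(0.75) - values.quantile(0.25),  # how spread out they are
    "skew": values.skew(),      # > 0 suggests a longer right tail
}
for key, val in summary.items():
    print(f"{key:>6}: {val:.2f}")
```

A mean noticeably above the median together with a positive skew is the numeric counterpart of a histogram with a long right tail.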

The Most Reliable Default Order for Your First EDA


A safer workflow is usually:

  1. Start with distribution plots: first see where the data is concentrated and whether it is skewed.
  2. Then look at relational plots: check whether there is a clear relationship between variables.
  3. Then look at categorical plots: compare differences across groups.
  4. Finally, check heatmaps: quickly scan the overall correlation structure.

This order is especially good for beginners because it gives the exploration process a clear main thread.

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Basic histogram
sns.histplot(data=tips, x="total_bill", ax=axes[0])
axes[0].set_title("Basic Histogram")
# Add density curve
sns.histplot(data=tips, x="total_bill", kde=True, ax=axes[1])
axes[1].set_title("Histogram + Density Curve")
# Color by category
sns.histplot(data=tips, x="total_bill", hue="time", kde=True, ax=axes[2])
axes[2].set_title("Grouped by Meal Time")
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# One-dimensional density
sns.kdeplot(data=tips, x="total_bill", hue="sex", fill=True, ax=axes[0])
axes[0].set_title("Distribution of Total Bill Density")
# Two-dimensional density (contours)
sns.kdeplot(data=tips, x="total_bill", y="tip", fill=True, cmap="Blues", ax=axes[1])
axes[1].set_title("Joint Density of Total Bill vs Tip")
plt.tight_layout()
plt.show()
fig, ax = plt.subplots(figsize=(8, 4))
sns.kdeplot(data=tips, x="total_bill", fill=True, ax=ax)
sns.rugplot(data=tips, x="total_bill", ax=ax, alpha=0.5)
ax.set_title("Density Curve + Rug Plot (Each line represents one data point)")
plt.show()

Relational Plots: What Is the Relationship Between Variables?

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Basic scatter plot, use color to distinguish categories
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time", ax=axes[0])
axes[0].set_title("Total Bill vs Tip")
# Use size and color to show information at the same time
sns.scatterplot(data=tips, x="total_bill", y="tip",
hue="day", size="size", sizes=(20, 200), ax=axes[1])
axes[1].set_title("Multidimensional Scatter Plot")
plt.tight_layout()
plt.show()

lineplot: Line Plot (with Confidence Interval)

# Simulated experimental data: each x has multiple y values
rng = np.random.default_rng(seed=42)
data = pd.DataFrame({
"step": np.tile(np.arange(1, 51), 10),
"accuracy": np.tile(np.linspace(0.5, 0.95, 50), 10) + rng.normal(0, 0.03, 500),
"model": np.repeat(["Model A", "Model B"], 250)
})
fig, ax = plt.subplots(figsize=(10, 5))
sns.lineplot(data=data, x="step", y="accuracy", hue="model", ax=ax)
ax.set_title("Training Accuracy Over Time (Shaded area = 95% confidence interval)")
plt.show()
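
To see where the line and the shaded band come from, the sketch below computes them by hand: the mean of `accuracy` at each `step`, plus a percentile bootstrap interval for that mean. It is an approximation of the idea, not Seaborn's internal code; `bootstrap_ci` is a hypothetical helper, and the data mimics the simulated runs above without the `model` column.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
runs, steps = 10, 50
data = pd.DataFrame({
    "step": np.tile(np.arange(1, steps + 1), runs),
    "accuracy": np.tile(np.linspace(0.5, 0.95, steps), runs)
                + rng.normal(0, 0.03, runs * steps),
})

def bootstrap_ci(values, n_boot=1000, level=95, rng=rng):
    """Percentile bootstrap interval for the mean of a 1-D array (hypothetical helper)."""
    values = np.asarray(values)
    # Resample with replacement: one row of indices per bootstrap replicate
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(boot_means, [tail, 100 - tail])

rows = []
for step, grp in data.groupby("step"):
    lo, hi = bootstrap_ci(grp["accuracy"].to_numpy())
    rows.append({"step": step, "mean": grp["accuracy"].mean(), "lo": lo, "hi": hi})
band = pd.DataFrame(rows).set_index("step")
print(band.head())
```

Plotting `band["mean"]` as a line and filling between `band["lo"]` and `band["hi"]` would give a chart similar to the one `lineplot` draws.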

pairplot: A Quick Look at Pairwise Relationships

# Relationships among all variables in the iris dataset
sns.pairplot(iris, hue="species", diag_kind="kde", corner=True)
plt.suptitle("Feature Relationships in the Iris Dataset", y=1.02)
plt.show()

pairplot shows the pairwise relationships among all numeric columns with a single call, which makes it a useful first pass in the data exploration stage.


Categorical Plots: How Do Different Groups Compare?


Categorical plots are one of Seaborn’s strengths. They help you compare distributions and statistics across categories.

A Handy Plot Selection Guide for Beginners


| What you want to know most | Safer first choice |
| --- | --- |
| What does the distribution of this column look like? | histplot |
| Is there a relationship between two variables? | scatterplot |
| Do the distributions of two or more groups differ a lot? | boxplot / violinplot |
| Which category has more samples? | countplot |
| Are multiple numerical variables correlated? | heatmap |

This table can save beginners a lot of detours because you do not have to be overwhelmed by function names at the start.

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Basic box plot
sns.boxplot(data=tips, x="day", y="total_bill", ax=axes[0])
axes[0].set_title("Total Bill Distribution by Day")
# Use color to distinguish subgroups
sns.boxplot(data=tips, x="day", y="total_bill", hue="sex", ax=axes[1])
axes[1].set_title("Total Bill Distribution by Day (by Sex)")
plt.tight_layout()
plt.show()
fig, ax = plt.subplots(figsize=(10, 5))
sns.violinplot(data=tips, x="day", y="total_bill", hue="sex",
               hue_order=["Female", "Male"],  # first level is drawn on the left half
               split=True, inner="quart", ax=ax)
ax.set_title("Total Bill Distribution by Day (Violin Plot, female left, male right)")
plt.show()

A violin plot = box plot + density distribution, so it shows more of the distribution shape than a box plot.
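
A quick way to see what the density half adds is to compare two samples whose box-plot summaries are unremarkable but whose shapes differ. The sketch below is illustrative only: `box_stats` is a hypothetical helper using the common 1.5 × IQR fence convention, and both samples are synthetic.

```python
import numpy as np
import pandas as pd

def box_stats(values):
    """Quartiles and 1.5*IQR outlier count, the usual box-plot convention."""
    s = pd.Series(values)
    q1, med, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {"q1": q1, "median": med, "q3": q3,
            "n_outliers": int(((s < lower_fence) | (s > upper_fence)).sum())}

rng = np.random.default_rng(0)
unimodal = rng.normal(0.0, 1.0, 2000)  # one peak at 0
bimodal = np.concatenate([rng.normal(-1.5, 0.4, 1000),
                          rng.normal(1.5, 0.4, 1000)])  # two peaks, almost nothing at 0

print(box_stats(unimodal))
print(box_stats(bimodal))

# Share of points close to the median: this is what a density curve makes visible
near_uni = np.mean(np.abs(unimodal) < 0.5)
near_bi = np.mean(np.abs(bimodal) < 0.5)
print(f"share within 0.5 of zero: unimodal={near_uni:.2f}, bimodal={near_bi:.2f}")
```

Both samples have a median near zero and would each be drawn as a single box, but only the violin's density outline would show that the second sample has a gap exactly where its median sits.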

fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=tips, x="day", y="total_bill", hue="sex",
            errorbar=("ci", 95), ax=ax)  # 95% confidence interval; replaces the ci=95 argument deprecated in seaborn 0.12
ax.set_title("Average Total Bill by Day (error bars = 95% confidence interval)")
plt.show()
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Simple counts (each row of tips is one bill, so this counts bills, not people)
sns.countplot(data=tips, x="day", order=["Thur", "Fri", "Sat", "Sun"], ax=axes[0])
axes[0].set_title("Number of Bills by Day")
# Grouped counts
sns.countplot(data=titanic, x="class", hue="survived", ax=axes[1])
axes[1].set_title("Survival by Class")
plt.tight_layout()
plt.show()

A heatmap uses color intensity to represent values and is most commonly used for correlation matrices.

# Compute correlations for numeric columns
# Select the numeric columns in tips
numeric_cols = tips.select_dtypes(include="number")
corr = numeric_cols.corr()
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="RdBu_r",
center=0, vmin=-1, vmax=1,
square=True, linewidths=0.5, ax=ax)
ax.set_title("Correlation Matrix of the Tips Dataset")
plt.tight_layout()
plt.show()

Key parameters:

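| Parameter | Purpose | Common value |
| --- | --- | --- |
| annot | Show values | True |
| fmt | Number format | ".2f" for two decimals |
| cmap | Color map | "RdBu_r", reversed red-blue (red = positive, blue = negative) |
| center | Center value for colors | 0 (correlation coefficient) |
| square | Square cells | True |
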
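
A correlation matrix is symmetric, so half of the cells repeat the other half. One common refinement, sketched here under the assumption of a small synthetic DataFrame with columns `a` to `d`, is to pass a boolean `mask` to `sns.heatmap` so that only the lower triangle is drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): b follows a, c moves against a, d is noise
rng = np.random.default_rng(0)
n = 300
a = rng.normal(size=n)
df = pd.DataFrame({
    "a": a,
    "b": 0.8 * a + rng.normal(scale=0.5, size=n),
    "c": -0.6 * a + rng.normal(scale=0.8, size=n),
    "d": rng.normal(size=n),
})
corr = df.corr()

# Cells where the mask is True are hidden: the diagonal and the upper triangle
mask = np.triu(np.ones(corr.shape, dtype=bool))

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="RdBu_r",
            center=0, vmin=-1, vmax=1, square=True, linewidths=0.5, ax=ax)
ax.set_title("Lower triangle of a correlation matrix")
fig.tight_layout()
```

Fixing `vmin=-1`, `vmax=1`, and `center=0` keeps the colors comparable between heatmaps, so the same shade always means the same correlation.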
# Pivot-table heatmap (for example, average total bill by day and time)
pivot = tips.pivot_table(values="total_bill", index="day", columns="time", aggfunc="mean")
fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(pivot, annot=True, fmt=".1f", cmap="YlOrRd",
linewidths=1, ax=ax)
ax.set_title("Average Total Bill by Day and Time")
plt.show()

Use FacetGrid when you want to split a chart into multiple subplots based on a variable.

# Facet by meal time (columns) and sex (rows), colored by smoker status,
# to show the relationship between total bill and tip
g = sns.FacetGrid(tips, col="time", row="sex", hue="smoker",
                  height=4, aspect=1.2)
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")
g.add_legend()
g.figure.suptitle("Total Bill vs Tip Faceted by Time and Sex", y=1.02)  # g.figure is the preferred name for g.fig
plt.show()
# Histogram faceted by day
g = sns.FacetGrid(tips, col="day", col_wrap=2, height=3)
g.map_dataframe(sns.histplot, x="total_bill", kde=True)
g.set_titles("Day: {col_name}")
g.figure.suptitle("Total Bill Distribution by Day", y=1.02)  # g.figure is the preferred name for g.fig
plt.show()

| Parameter | Purpose |
| --- | --- |
| col | Split into columns by this variable |
| row | Split into rows by this variable |
| hue | Use this variable for color |
| col_wrap | Maximum number of columns per row (wrap automatically) |
| height | Height of each subplot, in inches |
| aspect | Width-to-height ratio of each subplot (width = height × aspect) |
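
The layout parameters interact, so it can help to inspect the resulting grid directly. This sketch assumes a synthetic DataFrame with a five-level `panel` column and checks how `col_wrap`, `height`, and `aspect` shape the figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): five panels of 30 points each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "panel": np.repeat(["p1", "p2", "p3", "p4", "p5"], 30),
    "x": rng.normal(size=150),
})
df["y"] = df["x"] + rng.normal(scale=0.5, size=150)

g = sns.FacetGrid(df, col="panel", col_wrap=3, height=2.5, aspect=1.2)
g.map_dataframe(sns.scatterplot, x="x", y="y")

# 5 panels wrapped at 3 columns -> 2 rows; the last cell of the grid stays empty
n_panels = len(g.axes.flat)
width_in, height_in = g.figure.get_size_inches()
print(n_panels, round(width_in, 2), round(height_in, 2))
```

Note that `height` and `aspect` describe each facet, not the whole figure, so adding more columns makes the overall figure wider.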

Why is faceting especially good for exploration?


Because it helps you turn:

  • “The overall picture looks fine”

into:

  • What is actually different across categories and groups?

This is very important for beginners, because many data issues do not show up in the overall distribution. Instead, they become obvious once you split the data into groups.
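
A small numeric sketch of that effect: in the synthetic data below (two invented groups, `A` and `B`), the overall correlation between `x` and `y` is strongly positive, yet inside each group the relationship is negative. The numbers are constructed for illustration and do not come from any dataset in this section.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
parts = []
for name, center in [("A", 0.0), ("B", 5.0)]:
    x = rng.normal(center, 1.0, n)
    # Within each group, y falls as x rises
    y = center - 0.5 * (x - center) + rng.normal(0, 0.3, n)
    parts.append(pd.DataFrame({"group": name, "x": x, "y": y}))
df = pd.concat(parts, ignore_index=True)

overall = df["x"].corr(df["y"])
by_group = df.groupby("group")[["x", "y"]].apply(lambda g: g["x"].corr(g["y"]))

print(f"overall r = {overall:.2f}")
print(by_group.round(2))
```

A single scatter plot of this frame would suggest an upward trend; faceting it with `col="group"` would show a downward slope in each panel, which is the kind of difference the overall view hides.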


mindmap
  root(("Seaborn<br/>Plot Types"))
    Distribution plots
      histplot Histogram
      kdeplot Density curve
      rugplot Rug plot
    Relational plots
      scatterplot Scatter plot
      lineplot Line plot
      pairplot Matrix of variables
    Categorical plots
      boxplot Box plot
      violinplot Violin plot
      barplot Mean bar plot
      countplot Count plot
    Matrix plots
      heatmap Heatmap
    Multi-panel
      FacetGrid Faceting
      pairplot Pairwise

Keep this page’s proof of learning as a small evidence card:

  • Question: what comparison, distribution, trend, or relationship the chart answers
  • Chart Choice: line, bar, scatter, histogram, box, heatmap, or interactive dashboard
  • Artifact: saved chart image/html plus the data slice used
  • Failure Check: misleading scale, overloaded chart, wrong aggregation, or missing labels
  • Expected Output: chart artifact with one sentence explaining the insight

| Need | Function | Description |
| --- | --- | --- |
| Check distribution | histplot / kdeplot | Histogram / density curve |
| Relationship between two variables | scatterplot / lineplot | Scatter plot / line plot |
| Relationships among all variables | pairplot | One-line matrix plot |
| Compare categories | boxplot / violinplot / barplot | Distribution / mean |
| Count categories | countplot | Bar chart |
| Numerical matrix | heatmap | Heatmap |
| Faceted display | FacetGrid | Multiple subplots |

Core advantage: You can draw beautiful charts with statistical information in one line of code, and pass a DataFrame directly.

What You Should Take Away from This Section

  • The most important value of Seaborn is not that it is more flashy, but that it is better for quick statistical exploration
  • For your first EDA, start with distribution, then relationships, then categorical comparisons — this is usually the safest path
  • When choosing a plot, it is more important to ask “What statistical phenomenon do I want to see?” than to memorize function names first

# Load the tips dataset
# 1. Use histplot to draw the distribution of tip, colored by time
# 2. Use kdeplot to draw the density curve of total_bill, grouped by sex
# Load the titanic dataset
# 1. Use boxplot to compare the age distribution across classes
# 2. Use countplot to show the number of survivors in each class
# Load the iris dataset
# 1. Compute the correlation matrix for numeric columns
# 2. Visualize it with heatmap and add value annotations
# 3. Use pairplot to inspect relationships among all variables
# Use the tips dataset
# Use FacetGrid to facet by day and draw a scatter plot of total_bill and tip
# Use color to distinguish sex
Reference implementation and walkthrough
  • For distribution charts, use histplot or kdeplot and describe skew, outliers, and the approximate center. A beautiful curve without interpretation is incomplete.
  • For categorical comparisons, box plots and count plots answer different questions: spread versus frequency. Pick the one that matches the question before styling it.
  • For heatmaps and pair plots, explain one or two relationships that matter and one limitation. Correlation or visual clustering is a lead for investigation, not proof by itself.