10.4.2 Semantic Segmentation

Learning Objectives
Section titled “Learning Objectives”- Understand the difference between semantic segmentation and classification/detection
- Understand why segmentation masks are more fine-grained
- Use a runnable example to understand pixel-level labels and IoU
- Build a basic intuition for the semantic segmentation task
First, Build a Map
Section titled “First, Build a Map”If you have just finished learning classification and detection, you can think of this section as:
- Classification gives one answer for the whole image
- Detection gives one box for each object
- Segmentation starts giving a category to each pixel
So the real new core ideas in this section are:
- The output granularity becomes the finest
- Evaluation also becomes more fine-grained
- Whether the model “understands boundaries” becomes very important
The most beginner-friendly order for understanding semantic segmentation is:
flowchart LR A["Image Classification"] --> B["Give one label for the whole image"] A2["Object Detection"] --> B2["Give each object a box"] A3["Semantic Segmentation"] --> B3["Give each pixel a label"]The most important thing in this section is not to memorize network names first, but to understand:
- Why segmentation is more fine-grained than detection
- Why the mask is the key representation
A Better Overall Analogy for Beginners
Section titled “A Better Overall Analogy for Beginners”If we compare classification, detection, and segmentation together, it can be understood like this:
- Classification is like answering: “What is this photo mainly about?”
- Detection is like answering: “What objects are in the image, and roughly where are they?”
- Segmentation is like answering: “If we color the whole image, every pixel must decide who it belongs to.”
This makes it much easier to see why segmentation is harder:
- It does not give just one answer
- It does not just draw a few boxes
- It is responsible for every region in the entire image
What Exactly Does Semantic Segmentation Do?
Section titled “What Exactly Does Semantic Segmentation Do?”Its goal is:
- Assign a class to every pixel in an image
For example:
- sky
- road
- person
- car
What Should You Remember First When Learning This Section?
Section titled “What Should You Remember First When Learning This Section?”The most important thing to remember first is:
- Classification gives a whole-image label
- Detection gives boxes
- Segmentation gives regions
And this “region” is not just a rough hint — every pixel belongs to a class.
Why Is It More Fine-Grained Than Detection?
Section titled “Why Is It More Fine-Grained Than Detection?”Because detection only gives boxes, while segmentation gives the region boundaries much more precisely.
Why Is the “Pixel-Level” Idea the Most Important Part to Grasp First?
Section titled “Why Is the “Pixel-Level” Idea the Most Important Part to Grasp First?”Because starting from this section, vision tasks are no longer just about:
- finding objects
- drawing boxes
They start answering:
- What does this region actually belong to?
- Is this boundary drawn accurately?
So you can think of semantic segmentation simply as:
Turning the whole image into a “map where every pixel has a semantic label.”
First, Run a Minimal Segmentation Mask Example
Section titled “First, Run a Minimal Segmentation Mask Example”pred_mask = [ [0, 0, 1], [0, 1, 1], [0, 0, 1],]
gt_mask = [ [0, 0, 1], [0, 1, 1], [0, 1, 1],]
def iou_for_class(pred, gt, target_class): inter = 0 union = 0 for pred_row, gt_row in zip(pred, gt): for p, g in zip(pred_row, gt_row): if p == target_class and g == target_class: inter += 1 if p == target_class or g == target_class: union += 1 return inter / union if union else 0.0
print("IoU for class 1:", round(iou_for_class(pred_mask, gt_mask, 1), 4))Expected output:
IoU for class 1: 0.8Here, class 1 is the foreground region. One foreground pixel is missed, so the predicted mask still looks close, but the IoU already drops to 0.8.
What Is the Most Important Intuition in This Example?
Section titled “What Is the Most Important Intuition in This Example?”Segmentation evaluation is not about whether the whole image is correct, but about:
- how well the regions overlap
Why Does IoU Matter So Much in Segmentation?
Section titled “Why Does IoU Matter So Much in Segmentation?”Because in segmentation tasks:
- Getting the class right is not enough
- The regions must overlap reasonably well to count as truly correct
That is why you often see:
- per-class IoU
- mIoU
instead of only a single overall accuracy value.
This is why IoU is also very important in segmentation.
When a Beginner Learns Segmentation for the First Time, What Three Things Should They Remember Most?
Section titled “When a Beginner Learns Segmentation for the First Time, What Three Things Should They Remember Most?”- A mask is a pixel-level label map
- Segmentation evaluation cares more about region overlap than classification does
- Small classes and boundary regions are often the hardest
Why Do Segmentation Projects So Often Have the Problem of “Looks Almost the Same, But the Metrics Are Very Different”?
Section titled “Why Do Segmentation Projects So Often Have the Problem of “Looks Almost the Same, But the Metrics Are Very Different”?”Because as long as the boundary region is slightly wrong, or a small class is missed, the IoU can drop very noticeably.
This is also why segmentation tasks are especially good for helping beginners build this awareness:
In visual results, “looks similar” is not enough; the region boundary itself is part of the result.
What Is Most Likely to Be Underestimated When Learning Segmentation for the First Time?
Section titled “What Is Most Likely to Be Underestimated When Learning Segmentation for the First Time?”The most commonly underestimated parts are:
- boundaries
- small classes
- class imbalance
Because these areas often take up only a tiny portion in visualizations, but they can have a huge impact on IoU and real-world performance.
Look at Another Minimal “Class Imbalance” Example
Section titled “Look at Another Minimal “Class Imbalance” Example”mask = [ [0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0],]
def class_counts(mask): counts = {} for row in mask: for value in row: counts[value] = counts.get(value, 0) + 1 return counts
print(class_counts(mask))Expected output:
{0: 14, 1: 2}There are only 2 foreground pixels but 14 background pixels. This is why a model can get high overall pixel accuracy while still failing the small class that you actually care about.
Although this example is very small, it can help beginners immediately understand a real-world issue:
- Background pixels are often in the majority
- The truly important small classes may take up only a tiny part
So in segmentation projects, if you only focus on overall pixel accuracy, it is easy to be misled by the fact that “the background was all predicted correctly.”

The Most Common Pitfalls
Section titled “The Most Common Pitfalls”Inaccurate Boundaries
Section titled “Inaccurate Boundaries”Segmentation models can easily make mistakes at object edges.
Extremely Imbalanced Classes
Section titled “Extremely Imbalanced Classes”Backgrounds are often too dominant, and small object classes can easily be ignored.
Only Looking at Overall Pixel Accuracy
Section titled “Only Looking at Overall Pixel Accuracy”High pixel accuracy does not mean small classes are actually segmented well.
Only Looking at Color Visualizations, Without Failure Analysis
Section titled “Only Looking at Color Visualizations, Without Failure Analysis”Many beginners make a few nice-looking mask images and stop there when they first do segmentation. But what really drives project progress is usually dividing failure cases into:
- boundary errors
- missed small classes
- class confusion
- ambiguous annotations
Only then can you know what to improve next:
- the data
- the loss function
- the sampling strategy
- or the model architecture
The Right Expectations When Studying This Section
Section titled “The Right Expectations When Studying This Section”The most important thing in this section is not to master complex segmentation models today, but to truly understand:
- Segmentation is more fine-grained than detection
- Segmentation results usually depend more on region-level metrics
- Small classes and boundary errors can greatly affect real performance
The Best Order When Doing a Semantic Segmentation Project for the First Time
Section titled “The Best Order When Doing a Semantic Segmentation Project for the First Time”A better approach is:
- Narrow down the class definitions first
- Then unify the mask annotation standard
- Start with a minimal baseline
- Then look at per-class IoU and failure cases
This will make it much easier to understand the project clearly than chasing a complex network from the start.
If You Turn This into a Project, What Is Most Worth Showing?
Section titled “If You Turn This into a Project, What Is Most Worth Showing?”- Original image
- Predicted mask
- GT mask
- per-class IoU
- Boundary comparisons in failure cases
This will look much more like a real project than just pasting a single “colorful mask” image.
Evidence to Keep
Section titled “Evidence to Keep”Keep this page’s proof of learning as a small evidence card:
- Input Image
- original image and target mask or class map
- Prediction
- predicted mask, overlay visualization, and boundary examples
- Metric
- IoU, Dice, per-class score, and boundary failure notes
- Failure Check
- annotation quality, thin boundary, small region, or class confusion
- Expected Output
- mask overlay plus segmentation metric summary
Summary
Section titled “Summary”The most important thing in this section is to build this judgment:
Semantic segmentation is pixel-level classification, so it is more fine-grained than classification and detection, and it depends more on region-level evaluation.
What Should You Take Away from This Section?
Section titled “What Should You Take Away from This Section?”- The core object of segmentation is the mask, not the whole-image label or the box
- IoU is not a side metric in segmentation; it is a core evaluation perspective
- Boundaries and class imbalance are common challenges in segmentation tasks
If we compress it into one sentence, it is:
Semantic segmentation builds a pixel-level semantic map, so what it really cares about is not only “whether it was recognized,” but also “whether the region boundaries were drawn correctly.”
Exercises
Section titled “Exercises”- Modify a set of
pred_maskvalues yourself and observe how IoU changes. - Why does high pixel accuracy not necessarily mean the segmentation model is truly good?
- What is the biggest difference between semantic segmentation and object detection?
- If the classes are very imbalanced, what problem would you worry about most?
Reference implementation and walkthrough
- Changing
pred_maskchanges the intersection and union. Even small boundary errors can noticeably lower IoU, especially for small regions. - High pixel accuracy can be misleading because background pixels may dominate the image. A model can look accurate while failing rare or small classes.
- Semantic segmentation gives a class label to every pixel, while object detection gives boxes. Semantic segmentation usually does not separate two instances of the same class.
- With severe class imbalance, worry most about the model predicting only background or large classes. Use class-wise IoU/Dice and consider weighted loss or sampling.