
10.4.1 Segmentation Roadmap: Pixel-Level Regions

Segmentation is finer-grained than detection. Instead of a bounding box, it outputs a mask that marks which pixels belong to each class or instance.
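To make the box-versus-mask difference concrete, here is a minimal sketch that derives the tightest box from a mask and compares their areas. The 4x4 mask and the helper name `mask_to_box` are hypothetical and chosen only for illustration.

```python
# Hypothetical 4x4 binary mask: 1 marks pixels that belong to the object.
mask = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


def mask_to_box(mask):
    """Return the tightest inclusive (x_min, y_min, x_max, y_max) box, or None if the mask is empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v == 1]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


box = mask_to_box(mask)
x_min, y_min, x_max, y_max = box
box_area = (x_max - x_min + 1) * (y_max - y_min + 1)
mask_area = sum(sum(row) for row in mask)

print("box:", box)                      # box: (1, 1, 2, 2)
print("box_area:", box_area)            # box_area: 4
print("mask_area:", mask_area)          # mask_area: 3
```

The box covers four pixels but only three belong to the object. The mask carries that extra information; the box cannot.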

[Figure: Image segmentation chapter learning order diagram]

[Figure: Semantic segmentation mask example]

[Figure: Semantic segmentation IoU and boundary map]

The main object in this chapter is the mask. The most common failures involve poor boundary quality, missed tiny objects, occlusion, or class confusion.
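Tiny objects are easy to miss without the headline number noticing. The sketch below, using a hypothetical 5x5 image with a single-pixel object, contrasts pixel accuracy with IoU when the prediction misses the object entirely. The helper names and the convention of returning 1.0 for two empty masks are assumptions made for this example.

```python
def pixel_accuracy(truth, pred):
    """Fraction of pixels where prediction and ground truth agree."""
    total = correct = 0
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            total += 1
            correct += int(t == p)
    return correct / total


def binary_iou(truth, pred):
    """Intersection over union of the foreground (value 1) pixels."""
    inter = union = 0
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            inter += int(t == 1 and p == 1)
            union += int(t == 1 or p == 1)
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


# Ground truth: one foreground pixel in the center. Prediction: all background.
truth = [[0] * 5 for _ in range(5)]
truth[2][2] = 1
pred = [[0] * 5 for _ in range(5)]

print("pixel_accuracy:", pixel_accuracy(truth, pred))  # pixel_accuracy: 0.96
print("mask_iou:", binary_iou(truth, pred))            # mask_iou: 0.0
```

Accuracy looks high because background dominates, while IoU shows that the object was missed completely.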

This script compares two tiny binary masks by computing their intersection over union (IoU).

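# Ground-truth mask: 1 = object pixel, 0 = background.
truth = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
# Predicted mask on the same 3x3 grid.
pred = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
intersection = 0  # pixels that are 1 in both masks
union = 0         # pixels that are 1 in at least one mask
for y in range(3):
    for x in range(3):
        # A bool adds as 0 or 1, so each comparison counts one pixel.
        intersection += truth[y][x] == 1 and pred[y][x] == 1
        union += truth[y][x] == 1 or pred[y][x] == 1
print("mask_iou:", round(intersection / union, 3))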

Expected output:

mask_iou: 0.5

Segmentation reports should show masks, metrics, and boundary errors, not only a colored overlay.
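One way to surface errors that an overlay hides is a per-pixel error map. The sketch below is an illustration rather than a standard report format: the symbols and variable names are assumptions, and the two 3x3 masks are small example inputs.

```python
truth = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
pred = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]

# Keys are (truth, pred) pairs. The symbols are an arbitrary choice for this sketch.
SYMBOLS = {
    (1, 1): "#",  # correct foreground
    (0, 0): ".",  # correct background
    (0, 1): "+",  # false positive: predicted but not in the truth
    (1, 0): "-",  # false negative: in the truth but missed
}

rows = [
    "".join(SYMBOLS[(t, p)] for t, p in zip(t_row, p_row))
    for t_row, p_row in zip(truth, pred)
]
false_pos = sum(row.count("+") for row in rows)
false_neg = sum(row.count("-") for row in rows)

print("\n".join(rows))
print("false_positives:", false_pos)  # false_positives: 1
print("false_negatives:", false_neg)  # false_negatives: 1
```

Here the map prints `#-.`, `#+.`, and `...`, which shows where the prediction leaked and where it fell short, not just how much.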

| Step | Read | Practice Output |
| --- | --- | --- |
| 1 | Semantic segmentation | Predict one class for every pixel |
| 2 | Instance segmentation | Separate different objects of the same class |
| 3 | Segmentation practice | Compare masks, IoU/Dice, boundary errors, and failed samples |
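The gap between steps 1 and 2 can be seen on a toy mask. A semantic mask gives both blobs below the same class value, while an instance map gives each blob its own ID. The sketch uses 4-connected component labeling as a stand-in; this is an assumption for illustration only, since real instance segmentation models predict instances directly and connected components would merge objects that touch. The function name `label_instances` is hypothetical.

```python
from collections import deque


def label_instances(mask):
    """Assign a distinct positive ID to each 4-connected blob of 1-pixels."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or labels[y][x] != 0:
                continue  # background, or already assigned to a blob
            next_id += 1
            labels[y][x] = next_id
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of this blob
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels, next_id


# Semantic mask: two separate objects share the same class value 1.
semantic = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
instances, count = label_instances(semantic)
for row in instances:
    print(row)
print("instances:", count)  # instances: 2
```

The printed rows are `[1, 1, 0, 0]`, `[1, 0, 0, 2]`, and `[0, 0, 2, 2]`: same class, two objects.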

Keep this page’s proof of learning as a small evidence card:

Input Image: original image and target mask or class map
Prediction: predicted mask, overlay visualization, and boundary examples
Metric: IoU, Dice, per-class score, and boundary failure notes
Failure Check: annotation quality, thin boundary, small region, or class confusion
Expected Output: mask overlay plus segmentation metric summary
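The Metric row of the card can be filled in with a few lines of code. The following is a minimal sketch, assuming class maps stored as nested lists of integer class IDs. The 3x3 maps, the function name `per_class_scores`, and the choice to skip classes absent from both maps are all assumptions made for this example.

```python
def per_class_scores(truth, pred, num_classes):
    """Return {class_id: {"iou": ..., "dice": ...}} for classes present in either map."""
    scores = {}
    for c in range(num_classes):
        inter = t_count = p_count = 0
        for t_row, p_row in zip(truth, pred):
            for t, p in zip(t_row, p_row):
                t_count += int(t == c)
                p_count += int(p == c)
                inter += int(t == c and p == c)
        union = t_count + p_count - inter
        if union == 0:
            continue  # assumption: skip classes absent from both maps
        scores[c] = {
            "iou": inter / union,
            "dice": 2 * inter / (t_count + p_count),
        }
    return scores


# Hypothetical class maps with three classes: 0, 1, and 2.
truth = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
pred = [
    [0, 1, 1],
    [0, 2, 1],
    [2, 0, 1],
]

scores = per_class_scores(truth, pred, num_classes=3)
for c, s in scores.items():
    print(f"class {c}: iou={s['iou']:.3f} dice={s['dice']:.3f}")
mean_iou = sum(s["iou"] for s in scores.values()) / len(scores)
print("mean_iou:", round(mean_iou, 3))  # mean_iou: 0.639
```

Per-class scores make class confusion visible: here class 0 has the lowest IoU (0.500) because one of its pixels was predicted as class 1 and one class 2 pixel was predicted as class 0.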

You pass this chapter when you can create or inspect a mask, compute a simple overlap metric, and explain one boundary or class-confusion failure.

Check reasoning and explanation
  1. A passing answer maps the task to the right visual output: class label, bounding box, mask, OCR text, embedding, or video event.
  2. The evidence should include a rendered visual artifact and one metric or qualitative error note.
  3. A good self-check names one visual failure mode such as class confusion, missed objects, bad masks, lighting shift, domain shift, or weak annotation quality.