10.4.3 Instance Segmentation
Learning objectives
Section titled “Learning objectives”- Understand the difference between instance segmentation and semantic segmentation
- Understand why “category” and “instance” are two different levels
- Build intuition for instance masks through runnable examples
- Understand why instance segmentation is closer to real-world visual scenes
First, build a map
Section titled “First, build a map”For beginners, the best way to understand instance segmentation is not “it’s just one more segmentation task,” but to first see clearly:
flowchart LR A["Semantic segmentation"] --> B["Only separates categories"] C["Instance segmentation"] --> D["Also separates individual objects of the same category"] D --> E["Better for counting, tracking, and interaction scenarios"]So what this section really wants to solve is:
- Why “the category is correct” is still not enough
- Why “separating same-category individuals” significantly increases the difficulty
What does instance segmentation add beyond semantic segmentation?
Section titled “What does instance segmentation add beyond semantic segmentation?”Semantic segmentation:
- Only separates categories
Instance segmentation:
- Category + individual distinction
In other words, two “person” objects in the image should not be merged into one whole.
Three things beginners should distinguish first
Section titled “Three things beginners should distinguish first”When learning instance segmentation for the first time, the most important things to remember are:
- Semantic segmentation answers “what category is this”
- Instance segmentation also answers “which individual is this”
- So the latter is naturally closer to real multi-object scenarios
A comparison table that is easier for beginners
Section titled “A comparison table that is easier for beginners”When many beginners first learn this topic, it is easy to mix up classification, detection, semantic segmentation, and instance segmentation. The safest way is to place them in the same table and compare them:
| Task | What it outputs | Core question |
|---|---|---|
| Classification | One category for the whole image | What is this image overall |
| Detection | Category + box | Where is the object |
| Semantic segmentation | Category mask | Which pixels belong to which category |
| Instance segmentation | Category mask + individual distinction | How do we separate same-category objects one by one |
This table is especially valuable because it helps you immediately see:
- Instance segmentation is not just “more detailed semantic segmentation”
- It actually handles pixel-level understanding and instance-level distinction at the same time
Start with a minimal instance mask example
Section titled “Start with a minimal instance mask example”instance_map = [ [0, 1, 1, 0], [0, 1, 1, 2], [0, 0, 0, 2],]
def pixels_of_instance(instance_map, target_id): pixels = [] for r, row in enumerate(instance_map): for c, value in enumerate(row): if value == target_id: pixels.append((r, c)) return pixels
print("instance 1:", pixels_of_instance(instance_map, 1))print("instance 2:", pixels_of_instance(instance_map, 2))Expected output:
instance 1: [(0, 1), (0, 2), (1, 1), (1, 2)]instance 2: [(1, 3), (2, 3)]The two instance IDs are not class labels; they are individual object IDs. Both could belong to the same class, but the system still needs to keep them separate.
What is the most important thing in this example?
Section titled “What is the most important thing in this example?”It shows that instance segmentation does not just output a category ID, it also distinguishes:
- the 1st instance
- the 2nd instance
This is very important in counting, tracking, and interaction scenarios.
Why is instance segmentation especially suitable for security and autonomous driving?
Section titled “Why is instance segmentation especially suitable for security and autonomous driving?”Because in these scenarios, the system often does not only care about:
- whether there is a person in the scene
It cares more about:
- how many people there are
- which targets are very close to each other
- whether these individuals can be tracked afterward
In other words, instance segmentation is naturally closer to a visual representation for downstream decision-making.
Look at another minimal “count + area” example
Section titled “Look at another minimal “count + area” example”One reason instance segmentation is so valuable in real systems is that it can not only tell you “how many targets there are,” but also make it easy to compute:
- how large each target is
- which target is closer to the boundary
- which targets overlap with each other
The example below gives you a minimal way to feel this “instance-level statistics”:
instance_map = [ [0, 1, 1, 0, 2], [0, 1, 1, 0, 2], [0, 0, 0, 0, 2],]
def instance_area(instance_map, target_id): area = 0 for row in instance_map: for value in row: if value == target_id: area += 1 return area
for target_id in [1, 2]: print(target_id, "area =", instance_area(instance_map, target_id))Expected output:
1 area = 42 area = 3Once each object has its own mask, you can count objects and compute area. That is one reason instance segmentation is valuable in inspection, tracking, and planning systems.
The most important thing to grasp here is not the code itself, but that:
- once individuals are separated
- many statistics and decisions naturally become available afterward

The most common pitfalls
Section titled “The most common pitfalls”Adjacent same-category instances easily stick together
Section titled “Adjacent same-category instances easily stick together”This is a very common error in instance segmentation.
Small instances are harder
Section titled “Small instances are harder”The smaller and denser the objects are, the harder they are to separate.
Evaluation is more complex than semantic segmentation
Section titled “Evaluation is more complex than semantic segmentation”Because now you not only need to check mask quality, you also need to check whether the instances are correctly separated.
The safest default workflow for your first instance segmentation project
Section titled “The safest default workflow for your first instance segmentation project”When you first bring instance segmentation into a project, it is better to move forward in this order:
- First confirm that the task really needs same-category objects to be separated
- First inspect a small number of samples manually to see whether instance boundaries are clear
- First build a visualization baseline to see whether instances are being merged
- Then check whether mask quality and instance separation both hold
- Finally consider more complex models and finer-grained evaluation
This is safer than chasing a complex network from the start, because the biggest problems in instance segmentation are often not:
- the model is not complex enough
but rather:
- the annotation boundaries themselves are unclear
- the task requirements are not defined clearly
The right expectation when learning this section for the first time
Section titled “The right expectation when learning this section for the first time”The most important thing in this section is not to master a complex instance segmentation network today, but to clearly understand:
- Why semantic segmentation and instance segmentation are not the same thing
- Why neighboring same-category objects become the real difficulty
- Why, once this task is done well, it becomes especially valuable for counting, interaction, and tracking
Evidence to Keep
Section titled “Evidence to Keep”Keep this page’s proof of learning as a small evidence card:
- Input Image
- original image and target mask or class map
- Prediction
- predicted mask, overlay visualization, and boundary examples
- Metric
- IoU, Dice, per-class score, and boundary failure notes
- Failure Check
- annotation quality, thin boundary, small region, or class confusion
- Expected Output
- mask overlay plus segmentation metric summary
Summary
Section titled “Summary”The most important idea in this section is to build one judgment:
Instance segmentation solves one more layer of the problem than semantic segmentation: how to distinguish among objects of the same category, so it is closer to real multi-object visual scenarios.
What you should take away from this section
Section titled “What you should take away from this section”- Instance segmentation adds “individual separation” on top of semantic segmentation
- The difficulty is often not the category, but the boundary between neighboring same-category objects
- If downstream tasks need counting, tracking, or interaction, instance segmentation is often especially valuable
Exercises
Section titled “Exercises”- Create a larger
instance_mapby yourself and mark 3 instances. - Why is instance segmentation harder than semantic segmentation?
- If two adjacent objects are always merged into one instance, what would you suspect first?
- Think about it: why is instance segmentation especially valuable in autonomous driving or security scenarios?
Solution approach and explanation
- A correct
instance_mapusually uses0for background and a different integer id for each object instance. Three objects should have three distinct ids. - Instance segmentation is harder because the model must classify pixels and separate different objects of the same class.
- If adjacent objects are always merged, first suspect boundary annotation, image resolution, post-processing, or whether the model has enough separation cues.
- Instance segmentation is valuable when the system must count, track, or act on individual objects, such as pedestrians, vehicles, packages, or people in security scenes.