
6.3.1 CNN Roadmap: Turn Images Into Feature Maps

CNNs learn local visual patterns. Instead of reading an image as one flat row of numbers, they scan small regions and build feature maps.

Look at the Image Flow First

[Figure: CNN chapter relationship diagram]

[Figure: CNN receptive field growth map]
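The receptive-field growth shown in the second diagram can be checked directly: stack two 3×3 convolutions and ask which input pixels influence a single output pixel. A minimal sketch (the 9×9 single-channel input is an arbitrary choice for illustration):

```python
import torch

image = torch.randn(1, 1, 9, 9, requires_grad=True)
conv1 = torch.nn.Conv2d(1, 1, kernel_size=3)
conv2 = torch.nn.Conv2d(1, 1, kernel_size=3)

out = conv2(conv1(image))      # two 3x3 convs without padding: 9 -> 7 -> 5
out[0, 0, 2, 2].backward()     # backprop from the center output pixel

touched = (image.grad[0, 0] != 0)
print(touched.sum().item())    # 25: a 5x5 input patch fed that one output pixel
```

Each extra 3×3 layer widens the patch an output pixel "sees" by 2 in each direction, which is why deep stacks of small kernels can still capture large patterns.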

Concept | First meaning
channel | color or a learned feature dimension
kernel | small sliding filter
feature map | output after filters scan the image
pooling / stride | shrink the spatial size
transfer learning | reuse a pretrained vision backbone
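The "pooling / stride" row is easy to see in shapes: both halve height and width on a 32×32 map. A minimal sketch with a random feature map:

```python
import torch

x = torch.randn(1, 8, 32, 32)  # [batch, channels, height, width]

pool = torch.nn.MaxPool2d(kernel_size=2)                        # 2x2 windows
strided = torch.nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1)

print("pooled:", tuple(pool(x).shape))     # (1, 8, 16, 16)
print("strided:", tuple(strided(x).shape)) # (1, 8, 16, 16)
```

Pooling has no learned weights; a strided convolution shrinks the map while still learning its filter, which is why some architectures prefer it.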

Run One Convolution

Create cnn_first_loop.py and run it after installing torch.

import torch

# Fake RGB image batch: [batch, channels, height, width]
image = torch.randn(1, 3, 32, 32)

# 3x3 filters over 3 input channels produce 8 feature channels;
# padding=1 keeps height and width unchanged
conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
features = conv(image)

print("input_shape:", tuple(image.shape))
print("feature_shape:", tuple(features.shape))

Expected output:

input_shape: (1, 3, 32, 32)
feature_shape: (1, 8, 32, 32)

Read the shape as [batch, channels, height, width]. The convolution turned 3 input channels into 8 learned feature channels, while padding=1 compensated for the 3×3 kernel and kept the 32×32 spatial size.
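The spatial size follows the standard formula out = (in + 2·padding − kernel) / stride + 1, which you can verify by dropping the padding. A minimal sketch:

```python
import torch

image = torch.randn(1, 3, 32, 32)

# No padding: out = (32 - 3) + 1 = 30, so the map shrinks
no_pad = torch.nn.Conv2d(3, 8, kernel_size=3, padding=0)
print(tuple(no_pad(image).shape))  # (1, 8, 30, 30)

# padding=1: out = (32 + 2*1 - 3) + 1 = 32, so the size is preserved
padded = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
print(tuple(padded(image).shape))  # (1, 8, 32, 32)
```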

Learn in This Order

Order | Read | What to practice
1 | 6.3.2 Convolution Basics | kernel, stride, padding, channel
2 | 6.3.3 CNN Structure | conv block, pooling, classifier head
3 | 6.3.4 Classic Architectures | LeNet, AlexNet, VGG, ResNet intuition
4 | 6.3.5 Transfer Learning | frozen backbone, fine-tuning
5 | 6.3.6 Image Classification Practice | dataset, training, prediction examples
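The "frozen backbone, fine-tuning" idea from step 4 can be sketched before you reach 6.3.5. Here a tiny Sequential stands in for a pretrained backbone (a real one would come from a library such as torchvision); freezing means turning off gradients for its weights and training only a new head:

```python
import torch

# Toy stand-in for a pretrained backbone (illustration only, not a real
# pretrained model): conv features pooled down to an 8-dim vector
backbone = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
)

for p in backbone.parameters():
    p.requires_grad = False   # freeze: backbone weights will not update

head = torch.nn.Linear(8, 10) # new classifier head, trained from scratch

trainable = [p for p in list(backbone.parameters()) + list(head.parameters())
             if p.requires_grad]
print("trainable tensors:", len(trainable))  # 2: only the head's weight and bias
```

With only the small head left to train, far fewer examples are needed, which is the core reason pretrained backbones help on small datasets.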

Pass Check

You pass this roadmap when you can explain what changed between the input image shape and the feature map shape, and why pretrained CNN backbones help on small datasets.