
6.3.1 CNN Roadmap: Turn Images Into Feature Maps

CNNs learn local visual patterns. Instead of reading an image as one flat row of numbers, they slide small filters over local regions and collect the responses into feature maps.

[Figure: CNN chapter relationship diagram]

[Figure: CNN receptive field growth map]

| Concept | First meaning |
| --- | --- |
| channel | color or learned feature dimension |
| kernel | small sliding filter |
| feature map | output after filters scan the image |
| pooling / stride | shrink spatial size |
| transfer learning | reuse a pretrained vision backbone |
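To make the "kernel" and "feature map" entries concrete, here is a minimal sketch in plain Python with no libraries. The `slide_kernel` helper, the 4x4 image, and the 2x2 kernel are all hypothetical, chosen only for illustration. It assumes a square kernel, stride 1, and no padding, and it computes the cross-correlation that deep learning libraries call convolution.

```python
def slide_kernel(image, kernel):
    """Slide a square kernel over a 2D image and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1   # no padding, stride 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the k x k patch by the kernel element-wise, then sum.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k)
                for dj in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical single-channel image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical kernel that responds to a left-to-right brightness increase.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = slide_kernel(image, kernel)
for row in feature_map:
    print(row)
# [0, 2, 0]
# [0, 2, 0]
# [0, 2, 0]
```

The feature map is nonzero only in the middle column, where the kernel sits on the dark-to-bright edge. A trained CNN learns its kernel values instead of having them written by hand.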

Create cnn_first_loop.py with the code below, then run it with python cnn_first_loop.py once torch is installed.

import torch
image = torch.randn(1, 3, 32, 32)
conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
features = conv(image)
print("input_shape:", tuple(image.shape))
print("feature_shape:", tuple(features.shape))

Expected output:

input_shape: (1, 3, 32, 32)
feature_shape: (1, 8, 32, 32)

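Read the shape as [batch, channels, height, width]. The convolution changed 3 input channels into 8 learned feature channels. Height and width stayed at 32 because a 3x3 kernel with padding=1 and the default stride of 1 preserves spatial size.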
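The shape change can also be predicted without running a model. The sketch below uses the standard output-size formula for a convolution or pooling layer, assuming dilation 1 and floor rounding. The helper name `conv_output_size` is made up for this example.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output height or width of a conv/pool layer (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Same settings as the Conv2d example: 3x3 kernel, padding 1, stride 1.
print(conv_output_size(32, kernel_size=3, stride=1, padding=1))  # 32

# Without padding, the borders are lost.
print(conv_output_size(32, kernel_size=3, stride=1, padding=0))  # 30

# Stride 2 roughly halves the spatial size.
print(conv_output_size(32, kernel_size=3, stride=2, padding=1))  # 16

# A 2x2 pooling window with stride 2 also halves it.
print(conv_output_size(32, kernel_size=2, stride=2, padding=0))  # 16
```

Apply the function once for height and once for width. The channel count is set separately by out_channels.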

| Order | Read | What to practice |
| --- | --- | --- |
| 1 | 6.3.2 Convolution Basics | kernel, stride, padding, channel |
| 2 | 6.3.3 CNN Structure | conv block, pooling, classifier head |
| 3 | 6.3.4 Classic Architectures | LeNet, AlexNet, VGG, ResNet intuition |
| 4 | 6.3.5 Transfer Learning | frozen backbone, fine-tuning |
| 5 | 6.3.6 Image Classification Practice | dataset, training, prediction examples |

Keep one CNN shape note:

- Input: [batch, channels, height, width]
- Conv Output: out_channels becomes new feature maps
- Spatial Change: stride/padding/pooling change height and width
- Classifier Bridge: conv features eventually become class logits
- Transfer Choice: freeze first, fine-tune only if validation improves
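The shape note can be checked in one pass by tracing a tensor from input to logits. This is a minimal sketch, not a recommended architecture. The batch size of 4, the 10 classes, and the single conv/pool/linear layers are assumptions made only for illustration.

```python
import torch
from torch import nn

x = torch.randn(4, 3, 32, 32)                      # Input: [batch, channels, height, width]

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)   # Conv Output: 8 new feature maps
pool = nn.MaxPool2d(kernel_size=2)                 # Spatial Change: halves height and width
head = nn.Linear(8 * 16 * 16, 10)                  # Classifier Bridge: features -> 10 logits

features = conv(x)
pooled = pool(features)
flat = torch.flatten(pooled, start_dim=1)          # keep the batch dim, flatten the rest
logits = head(flat)

print("features:", tuple(features.shape))  # (4, 8, 32, 32)
print("pooled:  ", tuple(pooled.shape))    # (4, 8, 16, 16)
print("flat:    ", tuple(flat.shape))      # (4, 2048)
print("logits:  ", tuple(logits.shape))    # (4, 10)
```

If the Linear layer's input size does not equal channels * height * width after pooling, PyTorch raises a shape mismatch error. That is a common first bug when building a classifier head.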

You pass this roadmap when you can explain what changed between input image shape and feature map shape, and why pretrained CNN backbones are useful for small datasets.
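The "freeze first" idea can be sketched as follows. The backbone here is a tiny stand-in with random weights, not a real pretrained model. In practice a pretrained network would be loaded in its place. The 5-class head and the layer sizes are also assumptions for illustration.

```python
import torch
from torch import nn

# Stand-in for a pretrained backbone (assumption: real weights would be loaded here).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # -> [batch, 8, 1, 1]
    nn.Flatten(),              # -> [batch, 8]
)

# Freeze the backbone so its weights receive no gradient updates.
for param in backbone.parameters():
    param.requires_grad = False

# New task-specific head for a hypothetical 5-class problem.
head = nn.Linear(8, 5)
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print("trainable params:", trainable)  # 45
print("frozen params:", frozen)        # 224

logits = model(torch.randn(2, 3, 32, 32))
print("logits_shape:", tuple(logits.shape))  # (2, 5)
```

With few trainable parameters, a small dataset has less room to cause overfitting. That is one reason frozen pretrained backbones tend to help when data is limited.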

Check reasoning and explanation
  1. A passing answer connects tensors, model layers, loss, backward(), and optimizer updates into one training loop.
  2. The evidence should include a runnable mini experiment, tensor-shape checks, and a loss or validation curve you can explain.
  3. A good self-check names one failure mode such as shape mismatch, no loss decrease, overfitting, data leakage, or using Attention/Transformer words without explaining the data flow.
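As one possible shape for the mini experiment, the sketch below connects tensors, layers, loss, backward(), and optimizer updates in a single loop. It trains on one fixed batch of random images with random labels, so it only shows the mechanics. The model, learning rate, and step count are illustrative assumptions, and nothing here measures real accuracy.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical data: 8 random "images" and random labels for 10 classes.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for step in range(20):
    logits = model(images)          # forward pass: tensors through layers
    loss = loss_fn(logits, labels)  # compare logits with the targets
    optimizer.zero_grad()           # clear gradients from the previous step
    loss.backward()                 # compute gradients
    optimizer.step()                # update the weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

On a fixed batch the loss usually falls as the model memorizes it. If it does not, check the learning rate, the label dtype, and the logits shape before changing the architecture.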