6.3.1 CNN Roadmap: Turn Images Into Feature Maps
CNNs learn local visual patterns. Instead of reading an image as one flat row of numbers, they scan small regions and build feature maps.
Look at the Image Flow First

| Concept | First meaning |
|---|---|
| channel | color or learned feature dimension |
| kernel | small sliding filter |
| feature map | output after filters scan the image |
| pooling / stride | shrink spatial size |
| transfer learning | reuse a pretrained vision backbone |
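
To make "small sliding filter" concrete, here is a minimal pure-Python sketch of one kernel scanning one channel. The `slide_kernel` helper, the toy image, and the hand-picked `vertical_edge` kernel are illustrative assumptions. A real CNN learns its kernel values during training rather than having them written by hand.

```python
# A hand-written 2D "convolution" (cross-correlation, which is what deep
# learning frameworks compute) on one channel, with no padding and stride 1.

def slide_kernel(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map."""
    k = len(kernel)                      # kernel is k x k
    out_h = len(image) - k + 1           # no padding, stride 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Multiply the kernel with the patch under it and sum the result.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical hand-picked kernel that responds to left-to-right brightening.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = slide_kernel(image, vertical_edge)
print(feature_map)  # [[3, 3], [3, 3]]
```

The 4x4 input shrinks to a 2x2 feature map because a 3x3 kernel without padding fits in only two positions along each axis.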
Run One Convolution
Create `cnn_first_loop.py` and run it after installing torch.

```python
import torch

image = torch.randn(1, 3, 32, 32)
conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
features = conv(image)

print("input_shape:", tuple(image.shape))
print("feature_shape:", tuple(features.shape))
```

Expected output:

```
input_shape: (1, 3, 32, 32)
feature_shape: (1, 8, 32, 32)
```

Read the shape as [batch, channels, height, width]. The convolution changed 3 input channels into 8 learned feature channels. Height and width stay at 32 because padding=1 offsets the border that a 3x3 kernel would otherwise trim.
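
The shapes in the script above can be predicted without running torch. The sketch below uses the standard output-size rule for dilation 1, `floor((size + 2 * padding - kernel_size) / stride) + 1`. The helper names `conv_output_size` and `conv2d_param_count` are made up for illustration, and the variations on the layer are hypothetical.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters in a square-kernel 2D convolution."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# The layer from the script above: 32x32 input, 3x3 kernel, padding 1.
same = conv_output_size(32, kernel_size=3, padding=1)

# Hypothetical variations, not taken from the script:
no_pad = conv_output_size(32, kernel_size=3)                        # drop the padding
strided = conv_output_size(32, kernel_size=3, stride=2, padding=1)  # stride 2
pooled = conv_output_size(32, kernel_size=2, stride=2)              # 2x2 pooling

print(same, no_pad, strided, pooled)  # 32 30 16 16
print(conv2d_param_count(3, 8, 3))    # 224
```

The parameter count depends only on the channel counts and kernel size, not on the image size, which is one reason convolution layers stay small compared with fully connected layers on flattened images.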
Learn in This Order

| Order | Read | What to practice |
|---|---|---|
| 1 | 6.3.2 Convolution Basics | kernel, stride, padding, channel |
| 2 | 6.3.3 CNN Structure | conv block, pooling, classifier head |
| 3 | 6.3.4 Classic Architectures | LeNet, AlexNet, VGG, ResNet intuition |
| 4 | 6.3.5 Transfer Learning | frozen backbone, fine-tuning |
| 5 | 6.3.6 Image Classification Practice | dataset, training, prediction examples |
Evidence to Keep
Keep one CNN shape note:
- Input: [batch, channels, height, width]
- Conv Output: out_channels sets the number of new feature maps
- Spatial Change: stride, padding, and pooling change height and width
- Classifier Bridge: conv features are eventually flattened and mapped to class logits
- Transfer Choice: freeze the backbone first, fine-tune only if validation improves
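
The shape note can be checked end to end with a small torch sketch. Everything here is an assumption for illustration: the `backbone` is a randomly initialized stand-in (a real transfer-learning setup would load pretrained weights), and `num_classes = 10` is arbitrary.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained backbone. In real transfer learning
# these weights would come from a pretrained model, not random initialization.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),             # [B, 3, 32, 32] -> [B, 8, 32, 32]
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                           # -> [B, 8, 16, 16]
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # -> [B, 16, 8, 8]
    nn.ReLU(),
)

# Classifier bridge: flatten the feature maps, then map them to class logits.
num_classes = 10  # assumed for illustration
head = nn.Sequential(nn.Flatten(), nn.Linear(16 * 8 * 8, num_classes))

# Transfer choice: freeze the backbone so only the head receives gradients.
for param in backbone.parameters():
    param.requires_grad = False

images = torch.randn(4, 3, 32, 32)
features = backbone(images)
logits = head(features)

trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in backbone.parameters() if not p.requires_grad)

print("features:", tuple(features.shape))  # (4, 16, 8, 8)
print("logits:", tuple(logits.shape))      # (4, 10)
print("trainable:", trainable, "frozen:", frozen)  # trainable: 10250 frozen: 1392
```

To fine-tune later, set `requires_grad = True` on some or all backbone parameters and pass them to the optimizer, typically with a smaller learning rate.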
Pass Check
You pass this roadmap when you can explain what changed between the input image shape and the feature map shape, and why pretrained CNN backbones are useful for small datasets.
Check reasoning and explanation
- A passing answer connects tensors, model layers, loss, backward(), and optimizer updates into one training loop.
- The evidence should include a runnable mini experiment, tensor-shape checks, and a loss or validation curve you can explain.
- A good self-check names one failure mode such as shape mismatch, no loss decrease, overfitting, data leakage, or using Attention/Transformer words without explaining the data flow.
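
As a rough sketch of what such a mini experiment might look like, the loop below connects tensors, layers, loss, `backward()`, and optimizer updates. The data is synthetic with random labels and the model is a made-up minimal CNN, so it demonstrates the mechanics only and says nothing about real accuracy.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 16 random "images" with random labels from 3 classes.
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 3, (16,))

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # [B, 8, 32, 32] -> [B, 8, 1, 1]
    nn.Flatten(),              # -> [B, 8]
    nn.Linear(8, 3),           # -> [B, 3] class logits
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(5):
    logits = model(images)            # forward pass through the layers
    assert logits.shape == (16, 3)    # tensor-shape check
    loss = loss_fn(logits, labels)    # compare logits with the targets
    optimizer.zero_grad()             # clear gradients from the previous step
    loss.backward()                   # compute gradients for every parameter
    optimizer.step()                  # update the weights
    losses.append(loss.item())

print([round(value, 4) for value in losses])
```

Recording `losses` per step is the raw material for the loss curve mentioned above. On a real dataset you would also track a validation metric to catch overfitting.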