
10.1.1 Vision Basics Roadmap: Pixels, Channels, Processing

Computer vision starts with understanding the input. Before you tackle classification, detection, or segmentation, you need to know what an image looks like as numbers.

See the Image Pipeline First

[Figure: Vision basics chapter learning flow]

[Figure: Pixel RGB grid diagram]

[Figure: Image array shape and channel map]

The first mental model is simple: an image is an array of shape height × width × channels. Many later bugs trace back to a wrong shape, channel order, coordinate convention, or color space.
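To make the coordinate part of that mental model concrete, here is a minimal sketch in plain Python. The helper name `pixel_at` and the position-encoding pixel values are assumptions made for illustration; the point is only that the row index (y) comes before the column index (x) when you index the array.

```python
# Build a 2 x 3 image where each pixel stores its own position as [row, column, 0].
# The values are illustrative only; a real image would hold color intensities.
height, width = 2, 3
image = [[[y, x, 0] for x in range(width)] for y in range(height)]


def pixel_at(image, x, y):
    """Hypothetical helper: look up a pixel by (x, y) = (column, row)."""
    return image[y][x]  # the row index comes first in the array


print("shape:", (len(image), len(image[0]), len(image[0][0])))
print("pixel at x=2, y=1:", pixel_at(image, x=2, y=1))
```

Swapping the two indices (`image[x][y]`) fails loudly here only because the image is not square. On a square image the same mistake silently returns the wrong pixel.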

Run a Tiny Image Shape Check

This toy image has 2 rows, 3 columns, and 3 channels (RGB) per pixel.

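# 2 rows x 3 columns; each pixel is [R, G, B] with values 0-255
image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[255, 255, 255], [0, 0, 0], [128, 128, 128]],
]

height = len(image)           # number of rows
width = len(image[0])         # number of columns in the first row
channels = len(image[0][0])   # values per pixel
top_left_pixel = image[0][0]  # indexed as image[row][column]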

print("shape:", (height, width, channels))
print("top_left_pixel:", top_left_pixel)

Expected output:

shape: (2, 3, 3)
top_left_pixel: [255, 0, 0]

If your code reads a real image with the wrong shape or channel order, every later model result becomes harder to trust.
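A common instance of the channel-order problem: OpenCV's `cv2.imread` returns pixels in BGR order, while most other libraries expect RGB. The sketch below uses nested lists rather than a real image file, and `swap_channel_order` is a hypothetical helper name, but it shows what the mix-up looks like and how reversing each pixel fixes it.

```python
def swap_channel_order(image):
    """Hypothetical helper: reverse each pixel's channels (BGR <-> RGB)."""
    return [[pixel[::-1] for pixel in row] for row in image]


# One row holding red, green, and blue pixels, stored in BGR order.
bgr_image = [
    [[0, 0, 255], [0, 255, 0], [255, 0, 0]],
]

rgb_image = swap_channel_order(bgr_image)

print("top-left as stored (BGR):", bgr_image[0][0])
print("top-left after swap (RGB):", rgb_image[0][0])
```

Read as RGB without the swap, the first pixel `[0, 0, 255]` would display as blue even though it was stored as red.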

Learn in This Order

| Step | Read | Practice Output |
| --- | --- | --- |
| 1 | Image representation | Explain pixel, channel, height, width, RGB/BGR |
| 2 | OpenCV basics | Load, view, crop, resize, and save an image |
| 3 | Basic processing | Try grayscale, threshold, blur, edge, and simple filters |
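As a preview of step 3, grayscale conversion and thresholding can be sketched on the toy image without any library. This is an illustration under stated assumptions, not the chapter's reference implementation: the weights 0.299, 0.587, 0.114 are the commonly used BT.601 luma weights, the cutoff of 127 is arbitrary, and both function names are hypothetical.

```python
def to_grayscale(image):
    """Collapse each [R, G, B] pixel into one intensity (assumed BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


def threshold(gray, cutoff=127):
    """Map intensities above the cutoff to 255 and everything else to 0."""
    return [[255 if value > cutoff else 0 for value in row] for row in gray]


image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[255, 255, 255], [0, 0, 0], [128, 128, 128]],
]

gray = to_grayscale(image)
binary = threshold(gray)

print("gray:", gray)
print("binary:", binary)
```

Note that the shape changes from (2, 3, 3) to (2, 3): a grayscale image has no channel axis, which is another place where shape checks catch mistakes early.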

Pass Check

You pass this chapter when you can inspect an image shape, crop a region by coordinates, explain channel order, and save one processed result for your README.
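For the cropping part of the check, a minimal plain-Python sketch follows. The function name `crop` and its `(x, y, w, h)` parameter order are assumptions chosen for illustration; with a NumPy array the same operation is the slice `image[y:y + h, x:x + w]`.

```python
def crop(image, x, y, w, h):
    """Hypothetical helper: return the w x h region whose top-left corner is (x, y)."""
    # Slice rows first (y axis), then columns (x axis) inside each kept row.
    return [row[x:x + w] for row in image[y:y + h]]


image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[255, 255, 255], [0, 0, 0], [128, 128, 128]],
]

region = crop(image, x=1, y=0, w=2, h=2)

print("region shape:", (len(region), len(region[0]), len(region[0][0])))
print("region top-left pixel:", region[0][0])
```

The crop drops the first column, so the region's top-left pixel is the green pixel that sat at x=1, y=0 in the original.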