Skip to content

10.1.1 Vision Basics Roadmap: Pixels, Channels, Processing

Computer vision starts with input intuition. Before classification, detection, or segmentation, you need to know what an image looks like as numbers.

Vision basics chapter learning flow

Pixel RGB grid diagram

Image array shape and channel map

The first mental model is simple: image = height × width × channels. Most later bugs come from shape, channel order, coordinates, or color-space confusion.

This toy image has 2 rows, 3 columns, and RGB values.

image = [
[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
[[255, 255, 255], [0, 0, 0], [128, 128, 128]],
]
height = len(image)
width = len(image[0])
channels = len(image[0][0])
top_left_pixel = image[0][0]
print("shape:", (height, width, channels))
print("top_left_pixel:", top_left_pixel)

Expected output:

Terminal window
shape: (2, 3, 3)
top_left_pixel: [255, 0, 0]

If your code reads a real image with the wrong shape or channel order, every later model result becomes harder to trust.

StepReadPractice Output
1Image representationExplain pixel, channel, height, width, RGB/BGR
2OpenCV basicsLoad, view, crop, resize, and save an image
3Basic processingTry grayscale, threshold, blur, edge, and simple filters

Keep this page’s proof of learning as a small evidence card:

Input Image
source image or synthetic image used in the run
Array Shape
width, height, channels, dtype, and coordinate convention
Processed Output
grayscale, crop, edge, threshold, or saved intermediate image
Failure Check
channel order, resize distortion, coordinate mistake, or over-processing
Expected Output
before/after image plus the printed shape or pixel values

You pass this chapter when you can inspect an image shape, crop a region by coordinates, explain channel order, and save one processed result for your README.

Check reasoning and explanation
  1. A passing answer maps the task to the right visual output: class label, bounding box, mask, OCR text, embedding, or video event.
  2. The evidence should include a rendered visual artifact and one metric or qualitative error note.
  3. A good self-check names one visual failure mode such as class confusion, missed objects, bad masks, lighting shift, domain shift, or weak annotation quality.