
3.2.4 Array Operations

[Figure: NumPy broadcasting and vectorization diagram]

Learning Objectives

  • Understand the concept and advantages of vectorized operations
  • Master element-wise operations and universal functions (ufuncs)
  • Understand the rules of Broadcasting
  • Use aggregation functions confidently for statistical calculations

Vectorized Operations: Say Goodbye to Loops

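Vectorized operations are a core idea in NumPy: you operate on an entire array at once instead of writing explicit Python loops. The looping still happens, but inside NumPy's compiled code rather than in the Python interpreter. That is why vectorized code is usually both shorter and much faster.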

Pure Python vs NumPy

import numpy as np

# Pure Python: compute one by one
prices = [100, 200, 300, 400, 500]
discounted = []
for p in prices:
    discounted.append(p * 0.8)
print(discounted) # [80.0, 160.0, 240.0, 320.0, 400.0]

# NumPy: one line does it all
prices = np.array([100, 200, 300, 400, 500])
discounted = prices * 0.8
print(discounted) # [ 80. 160. 240. 320. 400.]
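You can check the speed claim yourself with a rough timing sketch like the one below, which uses the standard library's time.perf_counter. The array size of one million is an arbitrary assumption, and actual timings depend on your hardware and NumPy version, so no numbers are shown. Only the equality of the two results is guaranteed.

```python
import time
import numpy as np

# Arbitrary size for the comparison; larger arrays make the gap easier to see
n = 1_000_000
prices_list = list(range(n))
prices_arr = np.arange(n)

# Pure Python loop, same style as the example above
start = time.perf_counter()
loop_result = []
for p in prices_list:
    loop_result.append(p * 0.8)
loop_time = time.perf_counter() - start

# Vectorized NumPy expression
start = time.perf_counter()
vec_result = prices_arr * 0.8
vec_time = time.perf_counter() - start

# Both approaches produce the same numbers; only the speed differs
print(np.allclose(loop_result, vec_result))  # True
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```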

Element-wise Operations

Arithmetic operations on NumPy arrays are performed element by element:

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print(a + b) # [11 22 33 44] add corresponding elements
print(a - b) # [ -9 -18 -27 -36]
print(a * b) # [ 10 40 90 160] multiply corresponding elements (not matrix multiplication!)
print(a / b) # [0.1 0.1 0.1 0.1]
print(a ** 2) # [ 1 4 9 16] square
print(b % 3) # [1 2 0 1] remainder
print(b // 3) # [ 3 6 10 13] integer division
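Because `*` is element-wise, matrix multiplication needs a separate operator. The sketch below contrasts the two using the `@` operator and `np.dot`, both standard parts of NumPy.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Element-wise product: one result per position
print(a * b)         # [ 10  40  90 160]
# Dot product: multiply element-wise, then add everything up
print(a @ b)         # 300
print(np.dot(a, b))  # 300

# The same distinction for 2D arrays
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)  # element-wise, values [[5, 12], [21, 32]]
print(A @ B)  # matrix product, values [[19, 22], [43, 50]]
```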

Operations with Scalars

When you combine an array with a single number (a scalar), NumPy applies the operation between that scalar and every element:

arr = np.array([10, 20, 30, 40])

print(arr + 5) # [15 25 35 45]
print(arr * 2) # [20 40 60 80]
print(arr / 10) # [1. 2. 3. 4.]
print(1 / arr) # [0.1 0.05 0.033 0.025]

Comparison Operations

Comparison operators also work element by element and return an array of Booleans (True or False):

arr = np.array([15, 23, 8, 42, 31])

print(arr > 20) # [False True False True True]
print(arr == 23) # [False True False False False]
print(arr != 8) # [ True True False True True]
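Boolean arrays are useful beyond display, because True counts as 1 and False as 0. The sketch below counts matches with them; `np.any` and `np.all` are extra functions not used elsewhere in this section.

```python
import numpy as np

arr = np.array([15, 23, 8, 42, 31])
mask = arr > 20          # [False  True False  True  True]

# Summing or averaging a Boolean array counts the True values
print(np.sum(mask))      # 3   -> how many elements exceed 20
print(np.mean(mask))     # 0.6 -> fraction of elements exceeding 20

# any/all ask whether at least one / every element satisfies the condition
print(np.any(arr > 40))  # True  (42 > 40)
print(np.all(arr > 5))   # True  (the smallest value is 8)
```

The same counting trick is handy for computing rates, such as the passing rate in Exercise 3.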

Universal Functions (ufuncs)

NumPy provides many universal functions that apply mathematical operations to each element in an array:

Common Mathematical Functions

arr = np.array([1, 4, 9, 16, 25])

# Square root
print(np.sqrt(arr)) # [1. 2. 3. 4. 5.]

# Absolute value
neg = np.array([-3, -1, 0, 2, 5])
print(np.abs(neg)) # [3 1 0 2 5]

# Power
print(np.power(arr, 0.5)) # same as sqrt

# Exponential and logarithms
print(np.exp([0, 1, 2])) # [1. 2.718 7.389] e raised to a power
print(np.log([1, np.e, 10])) # [0. 1. 2.303] natural logarithm
print(np.log10([1, 10, 100])) # [0. 1. 2.] base 10
print(np.log2([1, 2, 8, 64])) # [0. 1. 3. 6.] base 2
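Mathematical ufuncs do not raise an exception for inputs outside their domain. Instead they return special floating-point values and emit a RuntimeWarning. The sketch below uses `np.errstate` only to silence those warnings for the demonstration.

```python
import numpy as np

x = np.array([1.0, 0.0, -1.0])

# log(0) gives -inf; log and sqrt of a negative number give nan (not a number).
# errstate temporarily suppresses the RuntimeWarnings NumPy would print.
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(x)
    roots = np.sqrt(x)

print(logs)   # values: 0.0, -inf, nan
print(roots)  # values: 1.0, 0.0, nan
```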

Trigonometric Functions

# Create angles from 0 to 2π
angles = np.linspace(0, 2 * np.pi, 5) # [0, π/2, π, 3π/2, 2π]

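# Approximate values: the "zeros" actually print as tiny numbers such as 1.2e-16,
# because π cannot be stored exactly as a floating-point number
print(np.sin(angles)) # ≈ [ 0. 1. 0. -1. 0.] ← sine
print(np.cos(angles)) # ≈ [ 1. 0. -1. 0. 1.] ← cosine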
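Because of these tiny rounding errors, comparing floating-point results with `==` is unreliable. A short sketch using `np.allclose`, which compares within a small tolerance, and `np.deg2rad` for angles given in degrees:

```python
import numpy as np

angles = np.linspace(0, 2 * np.pi, 5)
sines = np.sin(angles)

# Exact comparison fails because sin(π) is about 1.2e-16, not 0
print(np.array_equal(sines, [0, 1, 0, -1, 0]))  # False
# allclose tolerates differences this small
print(np.allclose(sines, [0, 1, 0, -1, 0]))     # True

# Angles in degrees must be converted to radians first
print(np.allclose(np.sin(np.deg2rad([0, 90, 180])), [0, 1, 0]))  # True
```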

Rounding Functions

arr = np.array([1.2, 2.5, 3.7, -1.3, -2.8])

print(np.floor(arr)) # [ 1. 2. 3. -2. -3.] round down
print(np.ceil(arr)) # [ 2. 3. 4. -1. -2.] round up
print(np.round(arr)) # [ 1. 2. 4. -1. -3.] round to nearest (an exact half goes to the even neighbor: 2.5 → 2)
print(np.trunc(arr)) # [ 1. 2. 3. -1. -2.] truncate decimals
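The half-to-even behavior of `np.round` surprises many beginners, so it is worth seeing on its own. The sketch below also shows the `decimals` argument for keeping digits after the decimal point.

```python
import numpy as np

# Exact halves round to the nearest even number ("banker's rounding")
print(np.round([0.5, 1.5, 2.5, 3.5]))             # [0. 2. 2. 4.]

# decimals sets how many digits after the decimal point are kept
print(np.round(np.array([3.14159, 2.71828]), 2))  # [3.14 2.72]
```

Python's built-in `round()` follows the same half-to-even rule, for example `round(2.5)` gives `2`.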

Operations Between Two Arrays

a = np.array([3, 5, 7, 9])
b = np.array([1, 4, 2, 8])

print(np.maximum(a, b)) # [3 5 7 9] take the larger value at each position
print(np.minimum(a, b)) # [1 4 2 8] take the smaller value at each position
print(np.where(a > b, a, b)) # same as maximum, but more flexible
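`np.maximum` and `np.minimum` also accept a scalar as one argument, which makes them useful for limiting values to a range. The sketch below also shows `np.clip`, a separate NumPy function not mentioned above that performs both steps in one call.

```python
import numpy as np

a = np.array([3, 5, 7, 9])

# Raise values below 4 up to 4, then lower values above 8 down to 8
low_high = np.minimum(np.maximum(a, 4), 8)
print(low_high)          # [4 5 7 8]

# np.clip(array, lower, upper) does the same thing in one call
print(np.clip(a, 4, 8))  # [4 5 7 8]
```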

Broadcasting

When you operate on arrays with different shapes, NumPy automatically "broadcasts" the smaller array across the larger one so their shapes become compatible. It does this without actually copying the smaller array's data in memory.

The Simplest Example

arr = np.array([1, 2, 3])

# scalar + array → scalar is broadcast to [10, 10, 10]
print(arr + 10) # [11 12 13]

This is broadcasting in action: NumPy treats 10 as if it were [10, 10, 10] (without actually creating that array), then adds element by element.

2D Array + 1D Array

matrix = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])

row = np.array([10, 20, 30])

# row is broadcast to every row
result = matrix + row
print(result)
# [[11 22 33]
# [14 25 36]
# [17 28 39]]

You can think about broadcasting like this:

matrix:         row (before broadcasting):   row (after broadcasting):
[[1, 2, 3],     [10, 20, 30]             →   [[10, 20, 30],
 [4, 5, 6],                                   [10, 20, 30],
 [7, 8, 9]]                                   [10, 20, 30]]
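To see the "after broadcasting" picture in real code, `np.broadcast_to` returns the broadcast version of an array as a read-only view. The sketch below uses it only for illustration, since ordinary arithmetic broadcasts automatically.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = np.array([10, 20, 30])

# What row looks like once stretched to matrix's shape (a read-only view, no copy)
expanded = np.broadcast_to(row, matrix.shape)
print(expanded)
# [[10 20 30]
#  [10 20 30]
#  [10 20 30]]

# Adding the explicit view gives exactly the same result as adding row directly
print(np.array_equal(matrix + expanded, matrix + row))  # True
```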

Column Vector + Row Vector

col = np.array([[1], [2], [3]])  # shape: (3, 1) column vector
row = np.array([10, 20, 30])      # shape: (3,) 1D array, behaves like a row

# both are broadcast
result = col + row
print(result)
# [[11 21 31]
# [12 22 32]
# [13 23 33]]
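A column-plus-row combination builds a whole table from two 1D inputs. In the sketch below, a 5×5 multiplication table, `np.newaxis` inserts a new axis of length 1 to turn a 1D array into a column. `reshape(-1, 1)` would work equally well.

```python
import numpy as np

nums = np.arange(1, 6)      # [1 2 3 4 5], shape (5,)
col = nums[:, np.newaxis]   # shape (5, 1): the same numbers as a column

# (5, 1) * (5,) broadcasts to (5, 5); entry [i, j] equals (i + 1) * (j + 1)
table = col * nums
print(table.shape)  # (5, 5)
print(table[2, 3])  # 12, i.e. 3 * 4
```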

Broadcasting Rules

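Simple rule to remember: compare the two shapes dimension by dimension, starting from the last (trailing) one. Each pair must either be equal, or one of them must be 1. If one array has fewer dimensions, NumPy first pads its shape with 1s on the left, so (4,) is treated as (1, 4). In the result, every dimension of size 1 is stretched to match the other array.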

# ✅ Can broadcast
# (3, 4) + (4,) → (3, 4) the last dimension is 4 in both
# (3, 4) + (1, 4) → (3, 4) first dimension 3 and 1 → broadcast to 3
# (3, 1) + (1, 4) → (3, 4) both dimensions are broadcast

# ❌ Cannot broadcast
# (3, 4) + (3,) → error! last dimension 4 ≠ 3, and neither is 1
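The rule is mechanical enough to write out by hand. `broadcast_shape` below is a hypothetical helper written only to illustrate the rule. It is cross-checked against NumPy's own `np.broadcast_shapes` (available since NumPy 1.20), which raises `ValueError` for incompatible shapes.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule by hand.

    Returns the broadcast result shape, or None if the shapes are incompatible.
    """
    # Pad the shorter shape with 1s on the left so both have the same length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    # After padding, comparing position by position is the same as
    # comparing the original shapes from the last dimension backwards
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)  # equal, or b's size-1 dimension is stretched
        elif da == 1:
            result.append(db)  # a's size-1 dimension is stretched
        else:
            return None        # neither equal nor 1: cannot broadcast
    return tuple(result)


pairs = [((3, 4), (4,)), ((3, 4), (1, 4)), ((3, 1), (1, 4)), ((3, 4), (3,))]
for sa, sb in pairs:
    # Cross-check against NumPy's implementation of the same rule
    try:
        expected = np.broadcast_shapes(sa, sb)
    except ValueError:
        expected = None
    print(sa, "+", sb, "->", broadcast_shape(sa, sb), "| NumPy:", expected)
```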

Real-World Uses of Broadcasting

# Center the data: subtract each column's mean from that column
data = np.array([
[85, 170, 60],
[92, 180, 75],
[78, 165, 55],
[90, 175, 70]
]) # 4 students: score, height, weight

# Compute the mean of each column
col_mean = data.mean(axis=0) # [86.25 172.5 65. ] shape: (3,)

# Broadcasting: (4, 3) - (3,) → (4, 3)
centered = data - col_mean
print(centered)
# [[-1.25 -2.5 -5. ]
# [ 5.75 7.5 10. ]
# [-8.25 -7.5 -10. ]
# [ 3.75 2.5 5. ]]
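Full standardization (z-scores) goes one step further and also divides each column by its standard deviation. Both the mean and the standard deviation have shape (3,), so both steps broadcast across the rows. A minimal sketch on the same data:

```python
import numpy as np

data = np.array([
    [85, 170, 60],
    [92, 180, 75],
    [78, 165, 55],
    [90, 175, 70],
])  # 4 students: score, height, weight

# z-score: (value - column mean) / column standard deviation
# (4, 3) - (3,) and then (4, 3) / (3,), both broadcast row by row
z = (data - data.mean(axis=0)) / data.std(axis=0)

print(np.allclose(z.mean(axis=0), 0))  # True: every column now has mean 0
print(np.allclose(z.std(axis=0), 1))   # True: and standard deviation 1
```

This puts score, height, and weight on a common scale, which centering alone does not do.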

Aggregation Functions

Aggregation functions "summarize" a group of data into one value or a small set of values:

Common Aggregation Functions

arr = np.array([4, 7, 2, 9, 1, 5, 8, 3, 6])

print(np.sum(arr)) # 45 total
print(np.mean(arr)) # 5.0 mean
print(np.median(arr)) # 5.0 median
print(np.std(arr)) # 2.58 standard deviation (population version, ddof=0 by default)
print(np.var(arr)) # 6.67 variance (population version, ddof=0 by default)
print(np.min(arr)) # 1 minimum
print(np.max(arr)) # 9 maximum
print(np.argmin(arr)) # 4 index of minimum
print(np.argmax(arr)) # 3 index of maximum
print(np.cumsum(arr)) # [ 4 11 13 22 23 28 36 39 45] cumulative sum
print(np.cumprod(arr[:5])) # [ 4 28 56 504 504] cumulative product
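By default, `np.var` and `np.std` divide by n (the population formula). Many statistics courses and tools use the sample formula, which divides by n − 1. The `ddof` ("delta degrees of freedom") parameter switches between them, as this sketch shows:

```python
import numpy as np

arr = np.array([4, 7, 2, 9, 1, 5, 8, 3, 6])
# Sum of squared deviations from the mean (5.0) is 60

# ddof=0 (default): divide by n = 9
print(np.var(arr))          # 6.666666666666667
# ddof=1: divide by n - 1 = 8 (sample variance)
print(np.var(arr, ddof=1))  # 7.5
print(np.std(arr, ddof=1))  # about 2.739, the square root of 7.5
```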

Aggregation Along an Axis

For multi-dimensional arrays, the axis parameter controls which direction to aggregate along:

matrix = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])

# No axis specified: aggregate all elements
print(np.sum(matrix)) # 45

# axis=0: collapse the rows (move down each column), giving one result per column
print(np.sum(matrix, axis=0)) # [12 15 18]

# axis=1: collapse the columns (move across each row), giving one result per row
print(np.sum(matrix, axis=1)) # [ 6 15 24]

A helpful way to understand axis: the axis you pass is the one that gets collapsed, so axis=0 removes the rows and axis=1 removes the columns:

Original (3, 3):
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]

axis=0 (remove rows → result shape=(3,)):
[1+4+7, 2+5+8, 3+6+9] = [12, 15, 18]

axis=1 (remove columns → result shape=(3,)):
[1+2+3, 4+5+6, 7+8+9] = [6, 15, 24]
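Aggregating along an axis pairs naturally with broadcasting. Passing `keepdims=True` keeps the collapsed axis as length 1, so the result broadcasts back against the original array. The sketch below uses this to turn each row into shares of its row total.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Without keepdims, the summed axis disappears
print(np.sum(matrix, axis=1).shape)  # (3,)

# With keepdims=True it stays as a length-1 axis
row_totals = np.sum(matrix, axis=1, keepdims=True)
print(row_totals.shape)              # (3, 1)

# (3, 3) / (3, 1) broadcasts, so each row is divided by its own total
shares = matrix / row_totals
print(shares.sum(axis=1))            # each row sums to 1 (up to rounding)
```

Without `keepdims=True`, `matrix / np.sum(matrix, axis=1)` would still run for this square matrix, because (3, 3) / (3,) also broadcasts. However, it would divide each column by a row total, which gives a silently wrong result.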

Practice: Grade Analysis

# Scores for 5 students in 3 subjects
scores = np.array([
[85, 92, 78], # Student 1: Chinese, Math, English
[90, 88, 95], # Student 2
[72, 65, 80], # Student 3
[95, 98, 92], # Student 4
[60, 55, 70] # Student 5
])

subjects = ["Chinese", "Math", "English"]

# Total score for each student
total = np.sum(scores, axis=1)
print("Total score for each student:", total) # [255 273 217 285 185]

# Average score for each student
avg_per_student = np.mean(scores, axis=1)
print("Average score for each student:", avg_per_student)

# Average score for each subject
avg_per_subject = np.mean(scores, axis=0)
for sub, avg in zip(subjects, avg_per_subject):
    print(f" {sub} average score: {avg:.1f}")  # Chinese 80.4, Math 79.6, English 83.0

# Who got the highest score, and in which subject
max_idx = np.unravel_index(np.argmax(scores), scores.shape)
print(f"Highest score: {scores[max_idx]} (Student {max_idx[0]+1}'s {subjects[max_idx[1]]})")

# Which student has the highest total score
best_student = np.argmax(total)
print(f"Highest total score: Student {best_student + 1}, total {total[best_student]}")
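Comparisons and axis-wise aggregation combine well for grade questions. In the sketch below, a comparison turns the score table into a Boolean table, and summing along an axis counts per student or per subject. The thresholds 70 and 85 are arbitrary choices for illustration.

```python
import numpy as np

scores = np.array([
    [85, 92, 78],  # Student 1: Chinese, Math, English
    [90, 88, 95],  # Student 2
    [72, 65, 80],  # Student 3
    [95, 98, 92],  # Student 4
    [60, 55, 70],  # Student 5
])

# (5, 3) Boolean table; summing along axis=1 counts per student
below_70 = np.sum(scores < 70, axis=1)
print(below_70)                      # [0 0 1 0 2]

# Summing along axis=0 counts per subject (Chinese, Math, English)
print(np.sum(scores >= 85, axis=0))  # [3 3 2]
```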

np.where: Conditional Selection

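np.where(condition, x, y) is NumPy's element-wise version of Python's conditional expression (x if condition else y): where the condition is True it takes the value from x, otherwise from y.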

arr = np.array([85, 42, 91, 67, 55, 78])

# Mark passing scores as "PASS" and failing scores as "FAIL"
result = np.where(arr >= 60, "PASS", "FAIL")
print(result) # ['PASS' 'FAIL' 'PASS' 'PASS' 'FAIL' 'PASS']

# Raise failing scores to 60
adjusted = np.where(arr >= 60, arr, 60)
print(adjusted) # [85 60 91 67 60 78]
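For more than two outcomes, you can nest `np.where` calls. The sketch below uses an arbitrary three-level grading scale (90+, 60 to 89, below 60) chosen only for illustration.

```python
import numpy as np

arr = np.array([85, 42, 91, 67, 55, 78])

# The inner np.where decides between "B" and "C";
# the outer one overrides with "A" wherever the score is 90 or more
grades = np.where(arr >= 90, "A", np.where(arr >= 60, "B", "C"))
print(grades)  # ['B' 'C' 'A' 'B' 'C' 'B']
```

When there are many categories, NumPy's `np.select(condlist, choicelist, default)` is often easier to read than deeply nested `np.where` calls.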

Summary

Category              | Content                                              | Example
Vectorized operations | Operate on the whole array at once, no loops needed  | arr * 2, a + b
Universal functions   | Element-wise mathematical functions                  | np.sqrt(), np.exp(), np.log()
Broadcasting          | Automatically expand arrays with different shapes    | (3, 4) + (4,) → (3, 4)
Aggregation functions | Statistical summaries                                | np.sum(), np.mean(), np.std()
axis parameter        | Controls the aggregation direction                   | axis=0: one result per column, axis=1: one result per row
np.where              | Conditional selection                                | np.where(arr > 0, arr, 0)

Hands-On Exercises

Exercise 1: Vectorized Calculation

# Convert Fahrenheit to Celsius
# Formula: C = (F - 32) × 5/9
import numpy as np

fahrenheit = np.array([32, 68, 100, 212, 72, 98.6])

# Complete the conversion in one line using vectorized operations
celsius = (fahrenheit - 32) * 5 / 9
print(np.round(celsius, 1))  # ≈ [0, 20, 37.8, 100, 22.2, 37]

Exercise 2: Broadcasting Practice

# Original prices of 3 products
import numpy as np

prices = np.array([100, 200, 300])

# 3 discount rates (column vector)
discounts = np.array([[0.9], [0.8], [0.7]])

# Use broadcasting to calculate the price of each product under each discount (3×3 matrix)
final_prices = discounts * prices
# Expected result:
# [[ 90. 180. 270.]
# [ 80. 160. 240.]
# [ 70. 140. 210.]]

Exercise 3: Grade Statistics

# Generate random scores for 50 students (integers from 40 to 100 inclusive)
import numpy as np

rng = np.random.default_rng(seed=42)
scores = rng.integers(40, 101, size=50)

# 1. Compute the mean, median, and standard deviation
# 2. Find the highest score, lowest score, and their positions
# 3. Count how many students fall into each range: failing (<60), passing (60-69), average (70-79), good (80-89), excellent (90+)
# 4. Calculate the passing rate
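Here is one possible solution sketch for Exercise 3; other approaches work as well. It combines Boolean arrays with `&`, which means element-wise "and". Each comparison needs its own parentheses because `&` binds more tightly than `>=` and `<`. The actual numbers depend on the random generator, so no output is shown.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
scores = rng.integers(40, 101, size=50)

# 1. Basic statistics
print("mean:", np.mean(scores))
print("median:", np.median(scores))
print("std:", np.std(scores))

# 2. Highest / lowest score and the index of their first occurrence
print("max:", np.max(scores), "at index", np.argmax(scores))
print("min:", np.min(scores), "at index", np.argmin(scores))

# 3. Count each range by summing Boolean arrays
counts = {
    "failing (<60)": np.sum(scores < 60),
    "passing (60-69)": np.sum((scores >= 60) & (scores < 70)),
    "average (70-79)": np.sum((scores >= 70) & (scores < 80)),
    "good (80-89)": np.sum((scores >= 80) & (scores < 90)),
    "excellent (90+)": np.sum(scores >= 90),
}
for label, count in counts.items():
    print(f"{label}: {count}")

# 4. Passing rate: the mean of a Boolean array is the fraction of True values
pass_rate = np.mean(scores >= 60)
print(f"passing rate: {pass_rate:.1%}")
```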