6.7.2 Hyperparameter Tuning Strategy

Section Overview

Hyperparameter tuning is experiment design. Change one important thing, keep a log, compare validation evidence, then decide the next move.

Learning Objectives

  • Tune in a stable order instead of changing everything at once.
  • Run a small learning-rate sweep in PyTorch.
  • Read validation loss, validation accuracy, and training stability together.
  • Record experiment evidence in a reusable table.
  • Decide when to tune learning rate, batch size, regularization, or early stopping.

Use the Route First

[Figure: deep learning tuning and diagnosis route]

The practical order:

make training run -> tune learning rate -> check validation -> control overfitting -> refine locally

Do not start by tuning every knob. A useful tuning run should answer one question.

| Question | Parameter to try first | What to watch |
| --- | --- | --- |
| Does the model learn at all? | learning rate | train loss trend |
| Is training unstable? | learning rate, gradient clipping, batch size | spikes or divergence |
| Is validation worse than training? | weight decay, dropout, augmentation, early stopping | generalization gap |
| Is training too slow? | batch size, model size, precision | time and memory |
| Is deployment too heavy? | architecture, pruning, quantization | latency and size |
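
Most of the knobs in this table live in one or two lines of PyTorch. The following is a minimal sketch of where each one goes; the layer sizes, dropout rate, weight decay value, and batch size are arbitrary placeholders, not recommendations.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Capacity and dropout are set in the model definition.
model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Dropout(p=0.1),  # regularization knob
    nn.Linear(8, 2),
)

# Learning rate and weight decay are set on the optimizer.
opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

# Batch size is set on the DataLoader.
data = TensorDataset(torch.randn(240, 2), torch.randint(0, 2, (240,)))
loader = DataLoader(data, batch_size=64, shuffle=True)
xb, yb = next(iter(loader))  # one minibatch of 64 examples

Gradient clipping and early stopping are loop-level changes; both appear in sketches later in this section.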

Lab: Run a Learning-Rate Sweep

This toy classification task is small enough to run in seconds, yet it exercises the full sweep workflow.

Create lr_sweep.py:

import torch
from torch import nn

torch.manual_seed(11)

# Toy dataset: 240 points in 2D, labeled by a fixed linear rule.
X = torch.randn(240, 2)
y = ((X[:, 0] * 0.8 + X[:, 1] * -0.5) > 0).long()

train_x, val_x = X[:180], X[180:]
train_y, val_y = y[:180], y[180:]


def run(lr):
    # Re-seed so every learning rate starts from identical initial weights.
    torch.manual_seed(123)
    model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    # Fixed budget: 40 full-batch gradient steps.
    for _ in range(40):
        logits = model(train_x)
        loss = loss_fn(logits, train_y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        train_loss = loss_fn(model(train_x), train_y).item()
        val_logits = model(val_x)
        val_loss = loss_fn(val_logits, val_y).item()
        val_acc = (val_logits.argmax(dim=1) == val_y).float().mean().item()

    return train_loss, val_loss, val_acc


results = []
for lr in [1e-3, 1e-2, 1e-1, 1.0, 10.0]:
    train_loss, val_loss, val_acc = run(lr)
    results.append((lr, train_loss, val_loss, val_acc))

print("lr_sweep")
for lr, train_loss, val_loss, val_acc in results:
    print(
        f"lr={lr:g} "
        f"train_loss={train_loss:.3f} "
        f"val_loss={val_loss:.3f} "
        f"val_acc={val_acc:.3f}"
    )

# Select by validation loss (index 2), not training loss.
best = min(results, key=lambda row: row[2])
print("best_lr:", best[0])

Run it:

python lr_sweep.py

Expected output (exact numbers can vary slightly across PyTorch versions):

lr_sweep
lr=0.001 train_loss=0.763 val_loss=0.733 val_acc=0.450
lr=0.01 train_loss=0.675 val_loss=0.663 val_acc=0.533
lr=0.1 train_loss=0.340 val_loss=0.373 val_acc=0.967
lr=1 train_loss=0.053 val_loss=0.072 val_acc=0.983
lr=10 train_loss=0.280 val_loss=0.291 val_acc=0.883
best_lr: 1.0

Read it carefully:

  • 0.001 and 0.01 are too slow for this budget.
  • 0.1 and 1.0 learn well.
  • 10.0 is worse even though it still trains, so “larger” is not automatically better.
  • The best choice is based on validation loss here, not training loss.

What to Tune Next

[Figure: hyperparameter tuning search diagram]

After a reasonable learning rate, tune in this order:

  1. Batch size: adjust memory use, speed, and gradient noise.
  2. Epochs and early stopping: stop when validation stops improving (see the sketch below).
  3. Weight decay and dropout: reduce overfitting.
  4. Architecture size: change capacity only after the loop is stable.
  5. Optimizer details: tune betas, scheduler, warmup, or momentum when needed.

The rule:

global search first, local refinement later
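
Item 2 above is mechanical once validation loss is checked at a regular interval. Below is a minimal patience-based sketch on the lab's toy task; it reuses train_x, train_y, val_x, and val_y from lr_sweep.py, and the patience of 5 checks and the 1e-4 improvement threshold are arbitrary choices, not recommendations.

import torch
from torch import nn

# Reuses train_x, train_y, val_x, val_y defined in lr_sweep.py.
torch.manual_seed(123)
model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))
opt = torch.optim.SGD(model.parameters(), lr=1.0)
loss_fn = nn.CrossEntropyLoss()

best_val = float("inf")
bad_checks = 0
patience = 5  # stop after 5 consecutive non-improving checks

for step in range(200):
    loss = loss_fn(model(train_x), train_y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    with torch.no_grad():
        val_loss = loss_fn(model(val_x), val_y).item()
    if val_loss < best_val - 1e-4:  # count only meaningful improvements
        best_val = val_loss
        bad_checks = 0
    else:
        bad_checks += 1
        if bad_checks >= patience:
            print(f"stopped at step {step}, best_val={best_val:.3f}")
            break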

A Minimal Experiment Log

Use a log even for small projects.

experiment_id:
code_version:
data_version:
seed:
lr:
batch_size:
optimizer:
weight_decay:
dropout:
epochs:
best_val_metric:
train_time:
decision:
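
One way to make this log machine-readable is a CSV file with one row per run. Below is a minimal standard-library sketch; the file name experiments.csv and the helper log_run are hypothetical, and the field names simply mirror the template above.

import csv
from pathlib import Path

LOG = Path("experiments.csv")
FIELDS = ["experiment_id", "code_version", "data_version", "seed",
          "lr", "batch_size", "optimizer", "weight_decay", "dropout",
          "epochs", "best_val_metric", "train_time", "decision"]


def log_run(row):
    """Append one experiment row, writing the header on first use."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)  # missing fields are written as empty strings


log_run({"experiment_id": "lr_sweep_004", "seed": 123, "lr": 1.0,
         "best_val_metric": 0.072, "decision": "fix lr, compare batch sizes"})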

Example decision text:

lr=1.0 gives the best validation loss in the quick sweep.
Next: keep lr=1.0 fixed and compare batch_size=32 vs 64.
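
Acting on that decision is a small change to the lab script: fix the learning rate, wrap the training tensors in a DataLoader, and sweep the batch size instead. Below is a minimal sketch reusing the tensors from lr_sweep.py; the 20-epoch budget is an arbitrary choice.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def run_batch_size(batch_size, epochs=20):
    # Reuses train_x, train_y, val_x, val_y from lr_sweep.py.
    torch.manual_seed(123)
    model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))
    opt = torch.optim.SGD(model.parameters(), lr=1.0)  # fixed from the sweep
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(TensorDataset(train_x, train_y),
                        batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            loss = loss_fn(model(xb), yb)
            opt.zero_grad()
            loss.backward()
            opt.step()
    with torch.no_grad():
        return loss_fn(model(val_x), val_y).item()


for bs in [32, 64]:
    print(f"batch_size={bs} val_loss={run_batch_size(bs):.3f}")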

Diagnosis Patterns

| Pattern | Likely cause | Next experiment |
| --- | --- | --- |
| train loss does not move | LR too low, model too small, bad labels | raise LR, inspect data, try larger model |
| train loss explodes | LR too high, gradients unstable | lower LR, add clipping |
| train good, validation poor | overfitting or leakage | add regularization, check split |
| validation improves then worsens | overfitting after best epoch | early stopping |
| results change a lot by seed | unstable training or small data | run 3 seeds and report mean/std |
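
For the "train loss explodes" row, gradient clipping is a one-line addition between backward() and step(). Below is a clipped variant of run() from lr_sweep.py; max_norm=1.0 is a common but arbitrary starting value.

import torch
from torch import nn


def run_clipped(lr, max_norm=1.0):
    # Same as run() in lr_sweep.py except for the marked line.
    torch.manual_seed(123)
    model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(40):
        loss = loss_fn(model(train_x), train_y)
        opt.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)  # new line
        opt.step()
    with torch.no_grad():
        return loss_fn(model(val_x), val_y).item()


print(f"lr=10 clipped: val_loss={run_clipped(10.0):.3f}")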

Common Mistakes

| Mistake | Fix |
| --- | --- |
| changing LR, batch size, optimizer, and model together | change one main variable per experiment |
| choosing by training metric | use validation metric for model selection |
| ignoring runtime | track time and memory, not only accuracy |
| trusting one lucky seed | repeat important runs with multiple seeds |
| tuning before data is clean | inspect labels, leakage, and preprocessing first |
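
The "trusting one lucky seed" row is cheap to fix on the lab script. The sketch below assumes run() from lr_sweep.py has been modified to take the seed as a second parameter instead of hard-coding manual_seed(123).

import statistics

# Assumes run() is changed to:  def run(lr, seed): torch.manual_seed(seed); ...
vals = [run(1.0, seed)[1] for seed in (0, 1, 2)]  # index 1 = val_loss
print(f"val_loss mean={statistics.mean(vals):.3f} "
      f"std={statistics.stdev(vals):.3f}")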

Exercises

  1. Add lr=0.3 and lr=3.0 to the sweep. Which is closer to the best region?
  2. Change the training budget from 40 steps to 10 steps. Does the best LR change?
  3. Add a seed column by running each LR with two seeds.
  4. Write a decision line for the next experiment after the LR sweep.
  5. Explain why tuning is easier when each experiment answers one question.

Key Takeaways

  • Tuning is controlled experiment design, not guessing.
  • Learning rate is usually the first knob to test.
  • Validation evidence should drive decisions.
  • Logs make experiments reproducible and interpretable.
  • Tune broad settings first, then refine locally.