
6.7.1 Training Tips Roadmap: Diagnose Before Changing Everything

Training tips are useful only when they follow from a diagnosis. Do not change the optimizer, learning rate, model size, and data at the same time.

Figure: Deep learning training tips chapter relationship diagram

Figure: Training diagnosis dashboard map

Symptom | First check
--- | ---
training loss stays high | model too small, learning rate too low, bad data
training loss good, validation loss bad | overfitting, data leakage, weak augmentation
unstable loss | learning rate too high, bad batch, exploding gradients
training too slow | batch size, device, model size
model too heavy to deploy | compression: quantization, pruning
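The table can be turned into a rough first-pass check. The sketch below uses a hypothetical `diagnose` helper, not a standard API. It reads per-epoch training and validation losses and returns the symptom from the matching table row. The thresholds are illustrative assumptions, so rescale them to your task's loss values. The speed and deployment rows need timing and model-size measurements rather than loss curves, so the sketch does not cover them.

```python
import math

# Illustrative thresholds (assumptions, not standard values); adjust them
# to your task's loss scale before trusting the labels.
HIGH_LOSS = 0.5   # final training loss above this counts as "high"
GAP = 0.1         # val - train gap above this suggests overfitting or leakage
JUMP = 1.5        # one-epoch training-loss jump ratio that counts as unstable


def diagnose(train_loss, val_loss):
    """Map per-epoch loss histories to one (symptom, first check) row."""
    # Unstable: NaN/inf anywhere, or training loss jumps sharply between epochs.
    unstable = not all(math.isfinite(x) for x in train_loss) or any(
        cur > JUMP * prev for prev, cur in zip(train_loss, train_loss[1:])
    )
    if unstable:
        return ("unstable loss", "learning rate too high, bad batch, exploding gradients")
    if train_loss[-1] > HIGH_LOSS:
        return ("training loss high", "model too small, learning rate too low, bad data")
    if val_loss[-1] - train_loss[-1] > GAP:
        return ("training good, validation bad", "overfitting, leakage, weak augmentation")
    # Loss curves look fine; speed and deployment size must be checked separately.
    return ("no obvious symptom", "check speed and deployment size separately")


print(diagnose([1.2, 0.9, 0.8, 0.75], [1.25, 0.95, 0.85, 0.8])[0])  # training loss high
print(diagnose([0.9, 0.5, 0.3, 0.2], [0.9, 0.6, 0.55, 0.6])[0])     # training good, validation bad
print(diagnose([0.9, 0.6, 3.0, 0.7], [0.9, 0.7, 2.8, 0.8])[0])      # unstable loss
```

The instability check runs first on purpose. A spiking or NaN loss makes the "high loss" and "gap" readings meaningless, so it should be fixed before anything else.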

Create training_tips_first_loop.py.

val_loss = [0.62, 0.51, 0.48, 0.49, 0.53]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print("best_epoch:", best_epoch)
print("best_val_loss:", val_loss[best_epoch - 1])
print("action: stop or reduce learning rate if validation keeps worsening")

Expected output:

best_epoch: 3
best_val_loss: 0.48
action: stop or reduce learning rate if validation keeps worsening
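The `action` line above can be made concrete with a patience rule. Count the epochs since validation loss last improved, and act once that count reaches a limit. This is a minimal sketch: `plateau_action` is a hypothetical helper, and `patience=2` is an assumed default. If you train with PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau` implements a similar rule with its own `patience` argument. Its counting convention may differ by one from this sketch, so read its documentation before relying on exact epoch counts.

```python
def plateau_action(val_loss, patience=2, min_delta=0.0):
    """Return the next action given the validation-loss history so far."""
    best = float("inf")
    bad_epochs = 0  # epochs since the last real improvement
    for loss in val_loss:
        if loss < best - min_delta:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
    if bad_epochs >= patience:
        return "stop or reduce learning rate"
    return "continue"


val_loss = [0.62, 0.51, 0.48, 0.49, 0.53]
print(plateau_action(val_loss))              # stop or reduce learning rate
print(plateau_action(val_loss, patience=3))  # continue
```

With `patience=2`, the two worse epochs after the best one (0.49 and 0.53) trigger the action. A larger patience tolerates more noise in the validation curve but reacts later.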

Figure: Training tips first loss output result map

Before adding tricks, read the curve. A simple log often tells you what to try next.

After this mini-chapter, keep one diagnosis decision record:

Visible symptom: what did the curve or output show?
First check: data, shape, gradient, or validation split
One change: which single setting changed?
Before/after: metric or artifact comparison
Decision: keep, tune, rollback, or investigate
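One way to keep the record honest is to store it as data rather than loose notes. The sketch below assumes a `DiagnosisRecord` dataclass of our own design, not a library type. It also assumes a numeric, lower-is-better metric such as validation loss for the before/after fields. The example values are illustrative, not results from a real run.

```python
from dataclasses import dataclass

DECISIONS = {"keep", "tune", "rollback", "investigate"}


@dataclass
class DiagnosisRecord:
    """One diagnosis decision record (hypothetical structure)."""
    visible_symptom: str  # what the curve or output showed
    first_check: str      # data, shape, gradient, or validation split
    one_change: str       # the single setting that changed
    before: float         # metric before the change (lower is better, assumed)
    after: float          # metric after the change
    decision: str         # keep, tune, rollback, or investigate

    def __post_init__(self):
        # Reject free-form decisions so every record ends in one of four outcomes.
        if self.decision not in DECISIONS:
            raise ValueError(f"decision must be one of {sorted(DECISIONS)}")


record = DiagnosisRecord(
    visible_symptom="validation loss rises after epoch 3",
    first_check="validation split",
    one_change="learning rate 1e-3 -> 3e-4",
    before=0.48,  # illustrative value
    after=0.45,   # illustrative value
    decision="keep",
)
print(record.decision, record.after < record.before)  # keep True
```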

The point is to make training changes reversible. If you change five things and the run improves, you still do not know which change helped.
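Reversibility is easier to enforce when each run's settings live in a config dictionary. The hypothetical `changed_settings` helper below compares two configs and lists the keys that differ. The config keys and values are assumptions for illustration.

```python
def changed_settings(base, new):
    """Return the sorted keys whose values differ between two run configs."""
    keys = base.keys() | new.keys()  # include keys present in only one config
    return sorted(k for k in keys if base.get(k) != new.get(k))


base = {"optimizer": "adam", "lr": 1e-3, "batch_size": 32, "model": "small"}
new = {"optimizer": "adam", "lr": 3e-4, "batch_size": 32, "model": "small"}

diff = changed_settings(base, new)
assert len(diff) == 1, f"changed {diff}; change one setting per run"
print(diff)  # ['lr']
```

If the assertion fails, split the experiment into separate runs, or at least record every changed key in the decision record.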

Order | Read | What to practice
--- | --- | ---
1 | 6.7.2 Hyperparameter Tuning | learning rate, batch size, optimizer
2 | 6.7.3 Training Diagnosis | loss curves, overfitting, instability
3 | 6.7.4 Model Compression | smaller, faster, deployable models

You pass this roadmap when you can look at a training/validation curve and choose one next action with a reason.

Check reasoning and explanation
  1. A passing answer connects tensors, model layers, loss, backward(), and optimizer updates into one training loop.
  2. The evidence should include a runnable mini experiment, tensor-shape checks, and a loss or validation curve you can explain.
  3. A good self-check names one failure mode such as shape mismatch, no loss decrease, overfitting, data leakage, or using Attention/Transformer words without explaining the data flow.
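The checklist asks for one loop that connects the model, the loss, the gradient step, and the optimizer update, together with a shape check and a loss curve. Below is a dependency-free sketch under assumed toy data (`y = 2x + 1`) and an assumed learning rate of 0.05. It fits a one-parameter-pair linear model with plain gradient descent. The comments mark which lines correspond to `loss.backward()` and `optimizer.step()` in PyTorch. In real PyTorch code you would also call `optimizer.zero_grad()` each iteration, because gradients accumulate in `.grad`.

```python
# Toy data (assumed for illustration): y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
assert len(xs) == len(ys), "shape check: one target per input"

w, b = 0.0, 0.0  # model parameters of y_hat = w * x + b
lr = 0.05        # learning rate
history = []     # loss curve, one value per epoch

for _ in range(200):
    n = len(xs)
    preds = [w * x + b for x in xs]               # forward pass
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / n         # mean squared error
    history.append(loss)
    # Gradients of MSE w.r.t. w and b (what loss.backward() would compute).
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    # Plain gradient-descent update (what optimizer.step() does for SGD without momentum).
    w -= lr * grad_w
    b -= lr * grad_b

print(f"first loss {history[0]:.3f}, last loss {history[-1]:.6f}")
print(f"w={w:.3f}, b={b:.3f}")
```

Plotting `history` gives the loss curve to explain. If you raise `lr` far enough, for example to 0.2 on this data, the same loop diverges instead, which is a quick way to reproduce the "unstable loss" symptom.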