
6.7.1 Training Tips Roadmap: Diagnose Before Changing Everything

Training tips are useful only when they answer a diagnosis. Do not change the optimizer, learning rate, model size, and data all at once; change one variable per run so you can tell what caused the effect.

Look at the Diagnosis Flow First

(Figure: deep learning training tips chapter relationship diagram.)

(Figure: training diagnosis dashboard map.)

| Symptom | First check |
| --- | --- |
| training loss high | model too small, learning rate too low, bad data |
| training good, validation bad | overfitting, leakage, weak augmentation |
| unstable loss | learning rate too high, bad batch, exploding gradients |
| too slow | batch size, device, model size |
| too heavy to deploy | compression, quantization, pruning |
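The table above can be sketched as a tiny lookup. This is illustrative only; the symptom strings and the `triage` helper are made up for this example, not part of any library.

```python
# Map each symptom from the table to its first checks (illustrative names).
FIRST_CHECKS = {
    "training loss high": ["model too small", "learning rate too low", "bad data"],
    "training good, validation bad": ["overfitting", "leakage", "weak augmentation"],
    "unstable loss": ["learning rate too high", "bad batch", "exploding gradients"],
    "too slow": ["batch size", "device", "model size"],
    "too heavy to deploy": ["compression", "quantization", "pruning"],
}

def triage(symptom: str) -> list[str]:
    """Return the first things to check for a given symptom."""
    return FIRST_CHECKS.get(symptom, ["unknown symptom: gather a loss log first"])

print(triage("unstable loss"))
# → ['learning rate too high', 'bad batch', 'exploding gradients']
```

The point is the discipline, not the code: name the symptom first, then pick exactly one check to act on.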

Read a Tiny Loss Log

Create training_tips_first_loop.py.

# Per-epoch validation loss from a short run.
val_loss = [0.62, 0.51, 0.48, 0.49, 0.53]
# Epoch (1-indexed) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print("best_epoch:", best_epoch)
print("best_val_loss:", val_loss[best_epoch - 1])
print("action: stop or reduce learning rate if validation keeps worsening")

Expected output:

best_epoch: 3
best_val_loss: 0.48
action: stop or reduce learning rate if validation keeps worsening

Before adding tricks, read the curve. A simple log often tells you what to try next.
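One way to turn "validation keeps worsening" into a concrete rule is a patience counter, as used in early stopping. This is a hedged sketch; the `should_stop` helper and the patience value of 2 are illustrative choices, not a standard API.

```python
# Early-stopping sketch: stop when the best epoch is more than
# `patience` epochs behind the most recent one.
def should_stop(val_loss: list[float], patience: int = 2) -> bool:
    """True when validation loss has not improved for `patience` epochs."""
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return len(val_loss) - 1 - best_epoch >= patience

print(should_stop([0.62, 0.51, 0.48, 0.49, 0.53]))
# → True  (best at epoch 3, two worse epochs follow)
```

With the log above, the rule fires exactly at the point the expected output suggested stopping; a smaller patience stops sooner but risks quitting on noise.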

Learn in This Order

| Order | Read | What to practice |
| --- | --- | --- |
| 1 | 6.7.2 Hyperparameter Tuning | learning rate, batch size, optimizer |
| 2 | 6.7.3 Training Diagnosis | loss curves, overfitting, instability |
| 3 | 6.7.4 Model Compression | smaller, faster, deployable models |

Pass Check

You pass this roadmap when you can look at a training/validation curve and choose one next action with a reason.