
5.4.1 Evaluation Roadmap: Trust the Score Before Tuning

Model evaluation answers one question: is the model actually good, or did the score only look good by accident?

Look at the Evaluation Map First

[Figure: Model Evaluation Learning Map — chapter flow for model evaluation]

Topic              First question
metrics            what score matches the task?
cross-validation   is the score stable across splits?
bias-variance      is the model too simple or too flexible?
tuning             which parameter change is actually better?
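
To make the metrics row concrete, here is a minimal sketch, assuming scikit-learn is installed and using small made-up binary labels (not data from this chapter), that computes four of the classification scores covered later in 5.4.2.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical true and predicted labels for a small binary task
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))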

Run One Cross-Validation Check

Create evaluation_first_loop.py and run it after installing scikit-learn.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset and define a shallow decision tree
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=2, random_state=42)

# 5-fold cross-validation: five train/test splits, five accuracy scores
scores = cross_val_score(model, X, y, cv=5)

print("fold_scores:", [float(round(score, 3)) for score in scores])
print("mean_accuracy:", round(scores.mean(), 3))

Expected output:

fold_scores: [0.933, 0.967, 0.9, 0.867, 1.0]
mean_accuracy: 0.933

One score is a snapshot. Several folds tell you whether the result is stable enough to trust.
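
One way to see that spread directly is to compare a few single train/test split scores against the fold spread. The sketch below continues evaluation_first_loop.py; the seeds and split size are arbitrary choices for illustration, not part of the original script.

from sklearn.model_selection import train_test_split

# A single split score changes with the random seed; several folds make that visible
for seed in (0, 1, 2):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    single = model.fit(X_train, y_train).score(X_test, y_test)
    print("seed", seed, "single_split_accuracy:", round(single, 3))

# Spread of the five cross-validation fold scores from above
print("cv_spread:", round(scores.std(), 3))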

Learn in This Order

Order   Read                           What to practice
1       5.4.2 Evaluation Metrics       accuracy, precision, recall, F1, R2, RMSE
2       5.4.3 Cross-Validation         stable estimates, data split risk
3       5.4.4 Bias and Variance        underfitting, overfitting, learning curves
4       5.4.5 Hyperparameter Tuning    grid search, comparison records
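
Row 4 ends at hyperparameter tuning. As a preview, here is a minimal sketch, assuming the same iris data and decision tree as above, that uses GridSearchCV to compare a few max_depth values with the same 5-fold setup before committing to any of them.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compare a few candidate depths with the same 5-fold cross-validation
param_grid = {"max_depth": [1, 2, 3, 4]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)

print("best_params:", search.best_params_)
print("best_mean_accuracy:", round(search.best_score_, 3))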

Pass Check

You pass this roadmap when you can choose a metric for the task, explain one score stability check, and avoid tuning before the evaluation method is trustworthy.