5.2.1 Supervised Learning Roadmap: Learn From Labeled Examples

Supervised learning answers one question: when examples already have labels, how do we learn a model that predicts labels for new examples?

Look at the Model Choice Map First

[Figure: Supervised Learning Roadmap — Supervised Learning Chapter Flow]

| Model family | First use |
| --- | --- |
| linear regression | predict a continuous number |
| logistic regression | classify with a simple probability model |
| decision tree | split data with readable rules |
| ensemble models | combine many models for stronger tabular baselines |
| SVM | learn a stable boundary with margin intuition |
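Before the regression baseline below, it can help to see the classification side of the table once. This is a minimal sketch, assuming scikit-learn is installed and using its built-in breast cancer dataset (an assumption; the chapter's own example uses the diabetes dataset):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_iter raised because the default (100) may not converge on this data
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

print("task: classification")
print("accuracy:", round(clf.score(X_test, y_test), 3))
```

The shape is identical to the regression loop: split, fit, score. Only the label type and the metric change.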

Run One Regression Baseline

Create supervised_first_loop.py and run it after installing scikit-learn.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

print("task: regression")
print("r2:", round(r2_score(y_test, predictions), 3))
print("first_prediction:", round(predictions[0], 1))
```

Expected output:

```
task: regression
r2: 0.485
first_prediction: 137.9
```

The score is not perfect, and that is the point. A baseline gives you a number to beat: it tells you exactly how much later models or feature work must improve.
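Using the baseline this way means fitting a second model on the same split and comparing scores. A minimal sketch, assuming a random forest as the challenger (one of the ensemble models from the table above); on this small dataset the forest does not necessarily win, which is itself a useful lesson:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

baseline = LinearRegression().fit(X_train, y_train)
challenger = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# Same split, same metric, so the two numbers are directly comparable
for name, model in [("linear", baseline), ("forest", challenger)]:
    print(name, "r2:", round(r2_score(y_test, model.predict(X_test)), 3))
```

Keeping the split and metric fixed is what makes the comparison fair; changing either invalidates the baseline number.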

Learn in This Order

| Order | Read | What to compare |
| --- | --- | --- |
| 1 | 5.2.2 Linear Regression | simple numeric prediction |
| 2 | 5.2.3 Logistic Regression | classification probability |
| 3 | 5.2.4 Decision Trees | rules, nonlinearity, overfitting |
| 4 | 5.2.5 Ensemble Learning | bagging, boosting, stronger tabular models |
| 5 | 5.2.6 Support Vector Machines | margin, boundary, classic classifier intuition |

Pass Check

You pass this roadmap when you can decide whether a labeled task is regression or classification, run one baseline, and explain one reason the model may fail.
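The regression-or-classification decision usually comes down to the label column itself. A rough sketch of that reasoning in code; `guess_task` is a hypothetical helper, not a scikit-learn API, and its threshold is an arbitrary assumption:

```python
import numpy as np

def guess_task(y, max_classes=20):
    # Hypothetical heuristic: strings/bools or a handful of integer-like
    # values suggest classification; many continuous values suggest regression.
    y = np.asarray(y)
    if y.dtype.kind in "bOSU":  # bools, objects, strings -> class labels
        return "classification"
    distinct = np.unique(y)
    if distinct.size <= max_classes and np.allclose(distinct, distinct.round()):
        return "classification"
    return "regression"

print(guess_task([0, 1, 1, 0]))          # -> classification
print(guess_task([137.9, 85.2, 210.4]))  # -> regression
```

Treat the output as a starting guess, not a verdict: integer-coded targets such as house prices in whole dollars would fool it, which is exactly the kind of failure reason the pass check asks you to explain.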