
5.2.1 Supervised Learning Roadmap: Learn From Labeled Examples

Supervised learning answers one question: when examples already have labels, how do we learn a model that predicts labels for new examples?

[Figure: Supervised Learning Roadmap]

[Figure: Supervised Learning Chapter Flow]

| Model family | First use |
| --- | --- |
| linear regression | predict a continuous number |
| logistic regression | classify with a simple probability model |
| decision tree | split data with readable rules |
| ensemble models | combine many models for stronger tabular baselines |
| SVM | learn a stable boundary with margin intuition |
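
As a rough orientation, the classification families in the table can be tried side by side on one labeled dataset. The sketch below is only an illustration: the dataset (`load_breast_cancer`), the hyperparameters, and the choice of a random forest to stand in for "ensemble models" are all assumptions made for this example, not settings taken from the later chapters. Linear regression is left out because it predicts a continuous number rather than a class.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumed example dataset: a binary classification task bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# One candidate per model family; hyperparameters are illustrative defaults.
candidates = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "ensemble (random forest)": RandomForestClassifier(
        n_estimators=100, random_state=42
    ),
    # SVMs are sensitive to feature scale, so scaling is part of the pipeline.
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy={scores[name]:.3f}")
```

Accuracy on a single split is a coarse comparison; the point is only that every family follows the same fit, predict, score loop.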

Create supervised_first_loop.py and run it after installing scikit-learn.

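from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
# Load features X and a continuous target y (a disease progression score).
X, y = load_diabetes(return_X_y=True)
# Hold out a test set (25% by default); a fixed seed keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Fit on the training split only, then predict the unseen test rows.
model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)
print("task: regression")
# R² compares the model against always predicting the mean (1.0 is perfect).
print("r2:", round(r2_score(y_test, predictions), 3))
print("first_prediction:", round(predictions[0], 1))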

Expected output:

task: regression
r2: 0.485
first_prediction: 137.9

An R² of 0.485 means the model explains a little under half of the variance in the test targets. The score is not perfect, and that is useful: a baseline tells you where later models or feature work must improve.
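
To see what the baseline score is worth, it can help to compare it against a model that ignores the features entirely. The sketch below is a minimal illustration, assuming the same dataset and split as the script above; `DummyRegressor` with `strategy="mean"` always predicts the training mean, and adding mean absolute error as a second metric is a choice made for this example.

```python
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Floor: always predict the mean of the training targets.
dummy = DummyRegressor(strategy="mean").fit(X_train, y_train)
# Baseline: the linear model from the first loop.
linear = LinearRegression().fit(X_train, y_train)

dummy_pred = dummy.predict(X_test)
linear_pred = linear.predict(X_test)

dummy_r2 = r2_score(y_test, dummy_pred)
linear_r2 = r2_score(y_test, linear_pred)
dummy_mae = mean_absolute_error(y_test, dummy_pred)
linear_mae = mean_absolute_error(y_test, linear_pred)

print(f"dummy  r2={dummy_r2:.3f}  mae={dummy_mae:.1f}")
print(f"linear r2={linear_r2:.3f}  mae={linear_mae:.1f}")
```

A constant prediction cannot score above zero R² on the test set, so the distance between the two rows is what the features and the linear model actually contribute.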

| Order | Read | What to compare |
| --- | --- | --- |
| 1 | 5.2.2 Linear Regression | simple numeric prediction |
| 2 | 5.2.3 Logistic Regression | classification probability |
| 3 | 5.2.4 Decision Trees | rules, nonlinearity, overfitting |
| 4 | 5.2.5 Ensemble Learning | bagging, boosting, stronger tabular models |
| 5 | 5.2.6 Support Vector Machines | margin, boundary, classic classifier intuition |

Keep this page’s proof of learning as a small evidence card:

Task: regression or classification problem with target definition
Model: linear/logistic/tree/ensemble/SVM configuration and train/test split
Metric: regression error, accuracy/F1, threshold curve, or confusion matrix
Failure Check: overfitting, underfitting, feature scaling, threshold choice, or class imbalance
Expected Output: model result plus error samples or residual review
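
One possible way to fill in the card for the regression loop on this page is sketched below. The dictionary layout, the key names, and the choice to list the three largest residuals are assumptions made for illustration; the card itself does not prescribe a format.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Residual review: find the test rows with the largest absolute error.
residuals = y_test - predictions
worst_idx = np.argsort(-np.abs(residuals))[:3]
worst = [
    {
        "actual": float(y_test[i]),
        "predicted": round(float(predictions[i]), 1),
        "residual": round(float(residuals[i]), 1),
    }
    for i in worst_idx
]

# Hypothetical card layout: one entry per field of the evidence card.
card = {
    "Task": "regression: predict the diabetes progression score",
    "Model": "LinearRegression, default 75/25 split, random_state=42",
    "Metric": {
        "r2": round(float(r2_score(y_test, predictions)), 3),
        "mae": round(float(mean_absolute_error(y_test, predictions)), 1),
    },
    # A large train/test gap would hint at overfitting.
    "Failure Check": {
        "train_r2": round(float(model.score(X_train, y_train)), 3),
        "test_r2": round(float(model.score(X_test, y_test)), 3),
    },
    "Expected Output": worst,
}

for field, value in card.items():
    print(f"{field}: {value}")
```

For a classification task the same card would swap the metric for accuracy/F1 or a confusion matrix and list misclassified samples instead of residuals.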

You pass this roadmap when you can decide whether a labeled task is regression or classification, run one baseline, and explain one reason the model may fail.

Check reasoning and explanation
  1. If the label is a continuous value, start with regression. If it is a class, start with classification.
  2. The baseline can be a simple linear/logistic model or a dummy rule. Its purpose is to define the score a more complex model must beat.
  3. Common failure reasons include weak features, target leakage, class imbalance, poor scaling, overfitting, and choosing a metric that does not match the real goal.
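
One of the failure reasons listed above, overfitting, can be checked by comparing the training score with the test score. The sketch below is an illustration only: the diabetes dataset, the `DecisionTreeRegressor`, and the `max_depth=3` setting are assumptions chosen to make the gap visible, not a recommended configuration.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An unrestricted tree can memorize the training rows.
deep = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
# A depth-limited tree is forced to learn coarser rules.
shallow = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X_train, y_train)

# score() returns R² for regressors; the gap is train R² minus test R².
gaps = {}
for name, tree in [("deep", deep), ("shallow", shallow)]:
    train_r2 = tree.score(X_train, y_train)
    test_r2 = tree.score(X_test, y_test)
    gaps[name] = train_r2 - test_r2
    print(f"{name}: train_r2={train_r2:.3f} test_r2={test_r2:.3f} gap={gaps[name]:.3f}")
```

A near-perfect training score paired with a much lower test score is the usual signature of overfitting; a smaller gap does not by itself mean the model is good, only that it generalizes more consistently.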