5.1.1 Machine Learning Basics Roadmap: Task, Data, Model, Score

Machine learning starts when you stop hand-writing every rule and let a model learn patterns from data. The first habit is not algorithm memorization. It is a small project loop.

Machine Learning Basics Learning Map

Machine Learning Basics Chapter Flow

Keep this loop:

define task -> split data -> fit model -> predict -> score -> decide next step

Word | First meaning
feature | input column used by the model
label / target | answer the model learns to predict
train set | data used to learn
test set | data kept aside to check generalization
baseline | a simple first model used for comparison

Create ml_first_loop.py and run it after installing scikit-learn.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features (X) and labels (y) for the iris flower dataset.
X, y = load_iris(return_X_y=True)

# Split first: hold out 20% as a test set, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit only on the training set.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict and score only on the held-out test set.
predictions = model.predict(X_test)
print("task: classification")
print("test_accuracy:", round(model.score(X_test, y_test), 3))
print("prediction_count:", len(predictions))

Expected output:

task: classification
test_accuracy: 0.967
prediction_count: 30

This is the smallest useful machine learning loop: split first, train only on the training set, evaluate on the test set.
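A score means more next to a reference point. The sketch below reuses the same split and compares the logistic regression against scikit-learn's DummyClassifier, which always predicts the most frequent training class. DummyClassifier is not part of the loop above; it is brought in here only as one possible comparison floor, and the variable names dummy_accuracy and model_accuracy are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Reference floor (an assumption of this sketch): ignore the features
# and always predict the most frequent class seen in training.
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)
dummy_accuracy = dummy.score(X_test, y_test)

# The same model as in ml_first_loop.py, trained on the same split.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
model_accuracy = model.score(X_test, y_test)

print("dummy_accuracy:", round(dummy_accuracy, 3))
print("model_accuracy:", round(model_accuracy, 3))
```

Because the stratified test set holds 10 samples of each of the three classes, the constant prediction is right about one time in three. A model that does not clearly beat that floor has not learned anything useful from the features.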

Order | Read | What to practice
1 | 5.1.2 What Is Machine Learning? | task types, features, labels
2 | 5.1.3 Scikit-learn Introduction | fit, predict, score
3 | 5.1.4 How Math Flows Into ML | vectors, probability, loss, optimization
4 | 5.1.5 Machine Learning History | why major algorithms appeared
5 | 5.1.6 sklearn and Matplotlib Workshop | run, plot, explain a baseline

Keep this page’s proof of learning as a small evidence card:

ML Problem: supervised, unsupervised, evaluation, or feature-engineering task
Baseline: simplest sklearn/modeling loop and fixed train/test split
Output: prediction, metric, chart, or model decision note
Failure Check: data leakage, unclear target, weak baseline, or metric mismatch
Expected Output: minimal ML loop with metric and one failure observation

You pass this roadmap when you can name the task type, identify X and y, explain why train/test split matters, and keep one baseline score as evidence.
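One quick way to practice identifying X and y is to print their shapes and names before fitting anything. This is a minimal sketch using the same iris dataset; it loads the full dataset object instead of return_X_y=True so the feature and class names are available.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# X: one row per sample, one column per feature.
print("X shape:", X.shape)
# y: one label per sample.
print("y shape:", y.shape)
print("features:", iris.feature_names)
print("classes:", list(iris.target_names))
```

The label takes one of three class values, which is what makes this a classification task.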

Check reasoning and explanation
  1. X is the feature matrix: rows are samples and columns are inputs the model can use. y is the label or target the model learns to predict.
  2. Train/test split matters because the test set simulates new data. If the model sees test information during training, the score is no longer evidence of generalization.
  3. A passing baseline record should include the task type, split settings, model, metric, and one sentence about what could still fail.
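A baseline record like the one described in point 3 can be kept as a small dictionary next to the script. The sketch below is one possible shape, not a required format: the field names (task, split, model, metric, test_score, could_still_fail) are hypothetical and chosen only for illustration.

```python
import json

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Keep split settings in one place so the record matches what actually ran.
split_settings = {"test_size": 0.2, "random_state": 42, "stratify": True}

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=split_settings["test_size"],
    random_state=split_settings["random_state"],
    stratify=y,
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Hypothetical record layout: task type, split, model, metric, one risk note.
record = {
    "task": "classification",
    "split": split_settings,
    "model": "LogisticRegression(max_iter=200)",
    "metric": "accuracy",
    "test_score": round(float(model.score(X_test, y_test)), 3),
    "could_still_fail": "The test set has only 30 samples, "
    "so the score may shift with a different split.",
}

print(json.dumps(record, indent=2))
```

Writing the risk sentence at the same time as the score keeps the number from being read as stronger evidence than it is.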