5.1.1 Machine Learning Basics Roadmap: Task, Data, Model, Score
Machine learning starts when you stop hand-writing every rule and let a model learn patterns from data. The first habit is not algorithm memorization. It is a small project loop.
Look at the Map First

Keep this loop:
define task → split data → fit model → predict → score → decide next step
| Word | First meaning |
|---|---|
| feature | input column used by the model |
| label / target | answer the model learns to predict |
| train set | data used to learn |
| test set | data kept aside to check generalization |
| baseline | a simple first model used for comparison |
Run the Smallest sklearn Loop
Create `ml_first_loop.py` and run it after installing scikit-learn.
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print("task: classification")
print("test_accuracy:", round(model.score(X_test, y_test), 3))
print("prediction_count:", len(predictions))
```

Expected output:

```
task: classification
test_accuracy: 0.967
prediction_count: 30
```

This is the smallest useful machine learning loop: split first, train only on the training set, evaluate on the test set.
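A test accuracy is easier to judge next to a baseline. The sketch below is one possible comparison, not part of the original script: it assumes scikit-learn's `DummyClassifier` as the baseline and reuses the same split settings. The variable names `baseline` and `baseline_accuracy` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Baseline (assumed choice): always predict the most frequent training label.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_accuracy = baseline.score(X_test, y_test)

# Same model as the loop above, trained on the same training rows.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
model_accuracy = model.score(X_test, y_test)

print("baseline_accuracy:", round(baseline_accuracy, 3))
print("model_accuracy:", round(model_accuracy, 3))
```

Iris has three equally sized classes and the split is stratified, so a most-frequent guess is right about one time in three. A model score is only meaningful if it clearly beats that number.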
Learn in This Order
| Order | Read | What to practice |
|---|---|---|
| 1 | 5.1.2 What Is Machine Learning? | task types, features, labels |
| 2 | 5.1.3 Scikit-learn Introduction | fit, predict, score |
| 3 | 5.1.4 How Math Flows Into ML | vectors, probability, loss, optimization |
| 4 | 5.1.5 Machine Learning History | why major algorithms appeared |
| 5 | 5.1.6 sklearn and Matplotlib Workshop | run, plot, explain a baseline |
Evidence to Keep
Keep this page’s proof of learning as a small evidence card:
- ML Problem: supervised, unsupervised, evaluation, or feature-engineering task
- Baseline: simplest sklearn/modeling loop and fixed train/test split
- Output: prediction, metric, chart, or model decision note
- Failure Check: data leakage, unclear target, weak baseline, or metric mismatch
- Expected Output: minimal ML loop with metric and one failure observation
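One way to keep such a card is to build it in code next to the run that produced the score. This is a minimal sketch under assumptions: the dictionary keys (`ml_problem`, `baseline`, `output`, `failure_check`) and the failure note are hypothetical examples, not a required format.

```python
import json

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# Hypothetical card layout: the key names are illustrative only.
evidence_card = {
    "ml_problem": "supervised classification",
    "baseline": {
        "model": "LogisticRegression(max_iter=200)",
        "split": {"test_size": 0.2, "random_state": 42, "stratify": True},
    },
    "output": {
        "metric": "accuracy",
        "test_score": round(float(model.score(X_test, y_test)), 3),
    },
    # Example observation; replace with what you actually notice in your run.
    "failure_check": "metric mismatch: accuracy alone hides which classes get confused",
}

print(json.dumps(evidence_card, indent=2))
```

Recording the split settings alongside the score makes the number reproducible: anyone rerunning the card's configuration should see the same split.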
Pass Check
You pass this roadmap when you can name the task type, identify `X` and `y`, explain why train/test split matters, and keep one baseline score as evidence.
Check reasoning and explanation
- `X` is the feature matrix: rows are samples and columns are inputs the model can use.
- `y` is the label or target the model learns to predict.
- Train/test split matters because the test set simulates new data. If the model sees test information during training, the score is no longer evidence of generalization.
- A passing baseline record should include the task type, split settings, model, metric, and one sentence about what could still fail.
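The point about test information leaking into training can be illustrated with a preprocessing step. The sketch below is an assumption-laden example rather than part of this roadmap's script: it adds a `StandardScaler` (not used elsewhere on this page) inside a scikit-learn pipeline so that the scaling statistics are learned from the training rows only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# fit() runs the scaler and the classifier on training rows only,
# so no test-set statistics reach the model.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
pipeline.fit(X_train, y_train)

# The scaler's learned means match the training set, not the full dataset.
scaler = pipeline.named_steps["standardscaler"]
fitted_on_train_only = np.allclose(scaler.mean_, X_train.mean(axis=0))
test_accuracy = pipeline.score(X_test, y_test)

print("scaler fitted on train only:", fitted_on_train_only)
print("test_accuracy:", round(test_accuracy, 3))
```

Fitting the scaler on all of `X` before splitting would be a small example of leakage: the test rows would influence the statistics the model trains with.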