
5.2.5 Ensemble Learning: Forest, Boosting, Stacking

[Figure: Ensemble learning, bagging vs boosting comparison]

Ensemble learning combines several models so one model's weakness is less likely to dominate the final prediction. For tabular data, this is often the strongest classic ML family.

Look at the Two Main Paths

[Figure: Ensemble learning family comic]

Do not memorize every model name first. Separate the two main ideas, bagging and boosting, then treat stacking as a way to combine finished models:

| Path | Visual idea | Typical model | Main benefit | Main risk |
| --- | --- | --- | --- | --- |
| Bagging | many models train in parallel and vote | Random Forest | stable, less variance | can become large and less interpretable |
| Boosting | each new model focuses on previous errors | GBDT, XGBoost, LightGBM, CatBoost | strong accuracy | easier to overfit without controls |
| Stacking | base model predictions feed a meta-model | StackingClassifier | combines different model families | leakage if built without cross-validation |

Run the Comparison Lab

Create ch05_ensemble_lab.py.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hold out 25% of the rows as a test set, keeping the class balance.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data,
    data.target,
    test_size=0.25,
    random_state=42,
    stratify=data.target,
)

# Baseline tree, one bagging model, one boosting model.
models = {
    "single_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "random_forest": RandomForestClassifier(
        n_estimators=200,
        max_depth=6,
        random_state=42,
    ),
    "gradient_boost": GradientBoostingClassifier(
        n_estimators=120,
        learning_rate=0.05,
        max_depth=2,
        random_state=42,
    ),
}

# Stacking: cv=5 trains the meta-model on out-of-fold base predictions.
# StackingClassifier clones its estimators, so reusing the objects above is safe.
models["stacking_cv"] = StackingClassifier(
    estimators=[
        ("rf", models["random_forest"]),
        ("gb", models["gradient_boost"]),
        ("lr", make_pipeline(
            StandardScaler(),
            LogisticRegression(max_iter=2000, random_state=42),
        )),
    ],
    final_estimator=LogisticRegression(max_iter=2000, random_state=42),
    cv=5,
)

# Fit every model on the same split and report test-set scores.
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name:<15} accuracy={accuracy_score(y_test, pred):.3f} f1={f1_score(y_test, pred):.3f}")

# Show the three features the Random Forest relied on most.
rf = models["random_forest"]
importances = rf.feature_importances_
top = importances.argsort()[-3:][::-1]
print("top_rf_features=")
for idx in top:
    print(f"- {data.feature_names[idx]}: {importances[idx]:.3f}")

Run:

python ch05_ensemble_lab.py

Expected output:

single_tree     accuracy=0.944 f1=0.956
random_forest   accuracy=0.958 f1=0.967
gradient_boost  accuracy=0.944 f1=0.956
stacking_cv     accuracy=0.986 f1=0.989
top_rf_features=
- worst perimeter: 0.146
- worst area: 0.140
- worst concave points: 0.109

[Figure: Ensemble comparison lab result map]

Small score changes across sklearn versions are acceptable. Keep the comparison table and top features as project evidence.

Read the Result

The single tree is the baseline. Random Forest usually improves stability by averaging many different trees.

Boosting is not automatically better on every small dataset: in this run it only ties the single tree. It needs careful control of tree depth, learning rate, number of trees, and validation performance.

Stacking can win here because it combines different model families, but it must use cross-validation. Training the meta-model on predictions made on the same rows used to fit the base models leaks information.
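The holdout set in this lab is small (a quarter of the dataset's 569 rows), so one or two predictions can change the ranking. As a rough cross-check, the sketch below scores the three non-stacking models with 5-fold cross-validation and reports the spread. The fold count, the shuffle seed, and the reduced tree counts are assumptions chosen to keep the run short, not part of the lab.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()

# Same model families as the lab; tree counts reduced (assumption) for speed.
candidates = {
    "single_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, max_depth=6, random_state=42),
    "gradient_boost": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.05, max_depth=2, random_state=42
    ),
}

# Stratified folds keep the class balance in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, data.data, data.target, cv=cv, scoring="f1")
    cv_scores[name] = scores
    print(f"{name:<15} f1 mean={scores.mean():.3f} std={scores.std():.3f}")
```

If the standard deviations overlap the gaps between the means, treat the models as roughly tied rather than ranking them on one split.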

Bagging: Random Forest

[Figure: Ensemble learning voting and forest diagram]

Random Forest trains many decision trees on randomized views of the data and averages/votes their predictions.
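To make "randomized views" concrete, here is a minimal hand-made bagging sketch. It is not how RandomForestClassifier is implemented internally: it uses a plain majority vote for simplicity, whereas sklearn's forest averages the trees' predicted class probabilities. The tree count of 25 and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42, stratify=data.target
)

rng = np.random.default_rng(42)
n_trees = 25  # illustrative; an odd count avoids tied votes
votes = np.zeros((n_trees, len(X_test)), dtype=int)

for i in range(n_trees):
    # Randomized view 1: a bootstrap sample (rows drawn with replacement).
    rows = rng.integers(0, len(X_train), size=len(X_train))
    # Randomized view 2: each split considers only a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[rows], y_train[rows])
    votes[i] = tree.predict(X_test)

# Labels are 0/1, so the column mean is the share of trees voting for class 1.
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)
single_acc = (votes[0] == y_test).mean()
bagged_acc = (bagged_pred == y_test).mean()
print(f"one bootstrap tree accuracy={single_acc:.3f}")
print(f"hand-made bagging accuracy={bagged_acc:.3f}")
```

Individual trees disagree on some rows because each saw different data; the vote cancels part of that disagreement, which is where the variance reduction comes from.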

Good first settings:

| Parameter | What it controls | Beginner default |
| --- | --- | --- |
| n_estimators | number of trees | 100 to 300 |
| max_depth | tree depth | start small, then increase |
| min_samples_leaf | minimum samples in a leaf | increase if overfitting |
| random_state | reproducibility | always set it while learning |

Boosting: GBDT and Toolkits

[Figure: GBDT residual correction comic]

Boosting builds models in sequence:

first small tree -> find errors -> next small tree focuses on errors -> repeat
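The loop above can be sketched by hand for the simplest case, regression with squared error, where "errors" are just residuals. This is an illustration on synthetic data, not the lab's classifier: GradientBoostingClassifier applies the same idea to a classification loss. The data, the 50 rounds, and the learning rate of 0.1 are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (assumption, for illustration only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # round 0: predict the mean everywhere
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residual = y - prediction  # what the current ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)  # small tree
    tree.fit(X, residual)      # the next tree focuses on those errors
    prediction += learning_rate * tree.predict(X)  # small corrective step
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE: start={mse_history[0]:.4f} end={mse_history[-1]:.4f}")
```

Training error keeps falling as rounds are added, which is exactly why boosting needs a validation set: the training curve alone never signals when to stop.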

In sklearn, start with GradientBoostingClassifier or HistGradientBoostingClassifier. In real tabular projects, XGBoost, LightGBM, and CatBoost are common external libraries, but do not add them before the sklearn baseline is clear.

[Figure: Boosting toolkit model choice comic]

First tuning order for boosting:

| Step | Change | Why |
| --- | --- | --- |
| 1 | learning_rate and n_estimators | controls step size and training length |
| 2 | max_depth / leaf settings | controls model complexity |
| 3 | validation or early stopping | stops overfitting |
| 4 | feature preprocessing | improves signal quality |
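Step 3 can be tried directly in sklearn: GradientBoostingClassifier accepts validation_fraction and n_iter_no_change, which hold out part of the training rows and stop adding trees once the validation score stalls. The sketch below reuses the lab's split; the upper bound of 500 trees and the patience of 10 rounds are assumptions to tune, not recommended values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42, stratify=data.target
)

gb = GradientBoostingClassifier(
    n_estimators=500,         # upper bound; early stopping may use fewer
    learning_rate=0.05,
    max_depth=2,
    validation_fraction=0.2,  # carved out of the training rows only
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=42,
)
gb.fit(X_train, y_train)

# n_estimators_ is the number of trees actually kept after early stopping.
print(f"trees fitted: {gb.n_estimators_} of {gb.n_estimators}")
print(f"test accuracy={gb.score(X_test, y_test):.3f}")
```

The test set stays untouched during fitting, so the reported accuracy is still an honest holdout score.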

Stacking Safely

[Figure: Stacking leakage-safe workflow comic]

Stacking is trustworthy only if the meta-model is trained on out-of-fold predictions:

train base models in CV folds -> collect out-of-fold predictions -> train meta-model -> evaluate on holdout test

Use sklearn's StackingClassifier(cv=5) instead of manually reusing predictions from the training rows.
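For intuition only, the workflow above can be spelled out by hand with cross_val_predict, which returns a prediction for every training row from a model that did not train on that row. This is a simplified sketch of the idea, not a replacement for StackingClassifier; the two base models and their reduced tree counts are assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42, stratify=data.target
)

base_models = [
    RandomForestClassifier(n_estimators=50, max_depth=6, random_state=42),
    GradientBoostingClassifier(
        n_estimators=50, learning_rate=0.05, max_depth=2, random_state=42
    ),
]

# Out-of-fold features: one column of class-1 probabilities per base model.
oof = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# The meta-model only ever sees predictions made on unseen rows.
meta = LogisticRegression().fit(oof, y_train)

# For the holdout test set, refit each base model on all training rows.
test_features = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])
print(f"manual stacking accuracy={meta.score(test_features, y_test):.3f}")
```

The leaky variant would build the meta-model's features with m.fit(X_train, y_train).predict_proba(X_train): the base models have memorized those rows, so the meta-model learns to trust them more than it should.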

Choosing a Model

| Situation | Start with |
| --- | --- |
| need a strong, stable baseline | Random Forest |
| tabular data with many nonlinear patterns | Gradient Boosting / XGBoost / LightGBM |
| categorical-heavy tabular data | CatBoost, after baseline |
| several model families perform differently | Stacking with cross-validation |
| need easiest explanation | shallow tree or Random Forest feature importance |

Common Failures

| Symptom | First check | Usual fix |
| --- | --- | --- |
| ensemble barely beats one tree | features are weak or split is unstable | add features, use cross-validation |
| train score high, test score low | overfitting | lower depth, increase leaf size, add validation |
| boosting gets worse as trees increase | too many rounds | reduce learning rate or use early stopping |
| stacking looks unrealistically perfect | leakage | use out-of-fold predictions or StackingClassifier(cv=...) |
| feature importance overread | correlated features | validate with permutation importance or ablation |
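The last row mentions permutation importance. A minimal sketch, assuming the lab's split and Random Forest settings: sklearn.inspection.permutation_importance shuffles one feature at a time on held-out data and measures how much the score drops. The choice of 10 repeats is an assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42, stratify=data.target
)

rf = RandomForestClassifier(n_estimators=200, max_depth=6, random_state=42)
rf.fit(X_train, y_train)

# Shuffle each feature 10 times on the test set and record the accuracy drop.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)

top = result.importances_mean.argsort()[-3:][::-1]
print("top_permutation_features=")
for idx in top:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"- {data.feature_names[idx]}: {mean:.3f} +/- {std:.3f}")
```

When features are strongly correlated, shuffling one of them may barely hurt the score because the model can fall back on its neighbours, so low permutation importance does not prove a feature is useless. Compare this list with feature_importances_ before drawing conclusions.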

Practice

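  1. Change Random Forest max_depth from 6 to 3, then to None, and compare the scores.
  2. Change Gradient Boosting learning_rate from 0.05 to 0.2 and note how the scores move.
  3. Imagine the stacking model without cv=5, so the meta-model trains on predictions for rows the base models have already seen. Explain why that leaks.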
  4. Save a model comparison table and one paragraph explaining which model you would ship first.
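One possible way to save the comparison table from item 4 is a small CSV written with the standard library. The filename is hypothetical, and the scores below are copied from the expected output earlier in this section; replace them with the numbers from your own run.

```python
import csv
from pathlib import Path

# Scores copied from the expected output in this section; use your own values.
results = [
    {"model": "single_tree", "accuracy": 0.944, "f1": 0.956},
    {"model": "random_forest", "accuracy": 0.958, "f1": 0.967},
    {"model": "gradient_boost", "accuracy": 0.944, "f1": 0.956},
    {"model": "stacking_cv", "accuracy": 0.986, "f1": 0.989},
]

# Highest F1 first; Python's sort is stable, so ties keep their original order.
results.sort(key=lambda row: row["f1"], reverse=True)

out_path = Path("ch05_model_comparison.csv")  # hypothetical filename
with out_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["model", "accuracy", "f1"])
    writer.writeheader()
    writer.writerows(results)

print(out_path.read_text())
```

The table records only scores. The paragraph you write alongside it should also weigh training cost, model size, and how easily each model can be explained.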

Pass Check

You are ready to continue when you can explain:

  • the difference between Bagging and Boosting;
  • why Random Forest is usually safer than one tree;
  • why Boosting needs validation control;
  • why Stacking must use cross-validation;
  • why the best leaderboard score is not always the best production choice.