
5.2.5 Ensemble Learning: Forest, Boosting, Stacking


Ensemble learning combines several models so one model’s weakness is less likely to dominate the final prediction. For tabular data, this is often the strongest classic ML family.
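A quick calculation shows why combining models helps. It rests on one idealized assumption: each model is right with the same probability p, and the models make independent errors. Real ensemble members are correlated, so real gains are smaller. This is why Random Forest randomizes rows and features, which decorrelates its trees. The function name majority_vote_accuracy is hypothetical and only for illustration.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models independent classifiers,
    each correct with probability p, is correct (n_models assumed odd).
    Hypothetical helper for illustration; assumes fully independent errors."""
    need = n_models // 2 + 1  # smallest strict majority
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(need, n_models + 1)
    )


for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

With p = 0.7, a single model is right 70% of the time. A majority of five independent models is right about 83.7% of the time.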


Before memorizing model names, separate the three main paths:

  • Bagging, such as Random Forest: many models train independently (often in parallel) on resampled data, then vote or average. Use it when you want stability and lower variance. The trade-off is a larger model that is harder to explain.
  • Boosting, such as GBDT, XGBoost, LightGBM, and CatBoost: each new model focuses on previous errors. Use it when tabular accuracy matters. Control depth, learning rate, and early stopping to avoid overfitting.
  • Stacking, such as StackingClassifier: base model predictions feed a meta-model. Use it when different model families have complementary strengths. Build it with cross-validation to avoid leakage.

Create ch05_ensemble_lab.py.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
# Stratified 75/25 split keeps the class ratio the same in train and test
X_train, X_test, y_train, y_test = train_test_split(
    data.data,
    data.target,
    test_size=0.25,
    random_state=42,
    stratify=data.target,
)

models = {
    # Baseline: one shallow tree
    "single_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    # Bagging: 200 randomized trees combined
    "random_forest": RandomForestClassifier(
        n_estimators=200,
        max_depth=6,
        random_state=42,
    ),
    # Boosting: shallow trees added one at a time with a small learning rate
    "gradient_boost": GradientBoostingClassifier(
        n_estimators=120,
        learning_rate=0.05,
        max_depth=2,
        random_state=42,
    ),
}

# Stacking: StackingClassifier clones the base models and trains the meta-model
# on 5-fold out-of-fold predictions, so it never sees in-sample base outputs
models["stacking_cv"] = StackingClassifier(
    estimators=[
        ("rf", models["random_forest"]),
        ("gb", models["gradient_boost"]),
        ("lr", make_pipeline(
            StandardScaler(),
            LogisticRegression(max_iter=2000, random_state=42),
        )),
    ],
    final_estimator=LogisticRegression(max_iter=2000, random_state=42),
    cv=5,
)

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name:<15} accuracy={accuracy_score(y_test, pred):.3f} f1={f1_score(y_test, pred):.3f}")

# Impurity-based importances from the fitted forest: top 3, largest first
rf = models["random_forest"]
importances = rf.feature_importances_
top = importances.argsort()[-3:][::-1]
print("top_rf_features=")
for idx in top:
    print(f"- {data.feature_names[idx]}: {importances[idx]:.3f}")
```

Run:

```bash
python ch05_ensemble_lab.py
```

Expected output:

```text
single_tree     accuracy=0.944 f1=0.956
random_forest   accuracy=0.958 f1=0.967
gradient_boost  accuracy=0.944 f1=0.956
stacking_cv     accuracy=0.986 f1=0.989
top_rf_features=
- worst perimeter: 0.146
- worst area: 0.140
- worst concave points: 0.109
```


Small score changes across sklearn versions are acceptable. Keep the comparison table and top features as project evidence.
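The lab only prints its results. One way to keep the comparison table as project evidence is to write it to a CSV file. The sketch below does this with two of the lab's models to keep it short. The output file name ch05_model_comparison.csv is a hypothetical choice.

```python
import csv
from pathlib import Path

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42, stratify=data.target
)
models = {
    "single_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=200, max_depth=6, random_state=42),
}

# One row per model, rounded like the lab's printout
rows = []
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    rows.append({
        "model": name,
        "accuracy": round(accuracy_score(y_test, pred), 3),
        "f1": round(f1_score(y_test, pred), 3),
    })

out_path = Path("ch05_model_comparison.csv")  # hypothetical file name
with out_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["model", "accuracy", "f1"])
    writer.writeheader()
    writer.writerows(rows)
print(out_path.read_text())
```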

The single tree is the baseline. Random Forest usually improves stability by averaging many different trees.
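To check the stability claim, look at how much each model's score moves across cross-validation folds instead of relying on one test split. This sketch reuses the lab's tree and forest settings. The shuffled 5-fold split is an assumption, and the exact numbers will depend on your sklearn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    "single_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=200, max_depth=6, random_state=42),
}
cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_scores[name] = scores
    # A smaller std across folds means the model is less sensitive to the split
    print(f"{name:<15} mean={scores.mean():.3f} std={scores.std():.3f}")
```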

Boosting is not automatically better on every small dataset. In the expected output above, it did not beat the single tree. It needs careful control of tree depth, learning rate, and number of trees, plus monitoring of validation performance.
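To see how the number of trees interacts with the learning rate, use GradientBoostingClassifier.staged_predict. It yields the predictions after each added tree, so you can trace a validation curve. The settings below (300 trees, learning_rate=0.2, max_depth=3) are deliberately aggressive and chosen only for illustration. Here the holdout also serves as the validation set. In a real project, pick the round count on validation data and keep the test set for the final check.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Illustrative, deliberately aggressive settings
gb = GradientBoostingClassifier(n_estimators=300, learning_rate=0.2, max_depth=3, random_state=42)
gb.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees
val_curve = [accuracy_score(y_val, pred) for pred in gb.staged_predict(X_val)]
best_rounds = max(range(len(val_curve)), key=val_curve.__getitem__) + 1
print(f"best validation accuracy {max(val_curve):.3f} after {best_rounds} trees")
print(f"accuracy after all {len(val_curve)} trees: {val_curve[-1]:.3f}")
```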

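Stacking scores highest here, likely because it combines different model families. With only 143 test rows, though, its lead over Random Forest amounts to about four predictions, so confirm it with cross-validation before trusting it. Stacking must also build its meta-model with cross-validation. Training the meta-model on predictions made on the same rows used to fit the base models leaks information.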
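The leak is easy to see by comparing two sets of predictions from one base model. The first set comes from its own training rows. The second is out-of-fold predictions from cross_val_predict, which is what a meta-model should be trained on. The unrestricted RandomForestClassifier below is an illustrative choice. Its in-sample predictions are typically near perfect, so a meta-model trained on them would trust it far more than it deserves.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)

rf = RandomForestClassifier(n_estimators=200, random_state=42)  # fully grown trees

# Leaky: predictions on the same rows the forest was trained on
in_sample = rf.fit(X_train, y_train).predict(X_train)
# Safer: each row is predicted by a clone that never saw it during training
out_of_fold = cross_val_predict(rf, X_train, y_train, cv=5)

leaky_acc = accuracy_score(y_train, in_sample)
oof_acc = accuracy_score(y_train, out_of_fold)
print(f"in-sample accuracy   {leaky_acc:.3f}")
print(f"out-of-fold accuracy {oof_acc:.3f}")
```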


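Random Forest trains many decision trees, each on a bootstrap sample of the rows (drawn with replacement) and with a random subset of features considered at each split, then combines their predictions. In sklearn, RandomForestClassifier averages the trees' predicted class probabilities rather than counting hard votes.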
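You can sketch the mechanism by hand in three steps: bootstrap the rows, restrict the features considered per split, and average the probabilities. This is a simplified illustration, not sklearn's implementation. The 50-tree count is arbitrary, and max_features="sqrt" is assumed to mirror RandomForestClassifier's default for classification.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

rng = np.random.default_rng(42)
trees = []
for i in range(50):
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt": each split looks at a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Average class probabilities across trees, then take the most likely class.
# Assumes every bootstrap sample contains both classes (practically certain here).
avg_proba = np.mean([t.predict_proba(X_test) for t in trees], axis=0)
bagged_pred = trees[0].classes_[avg_proba.argmax(axis=1)]
print(f"hand-rolled forest accuracy={accuracy_score(y_test, bagged_pred):.3f}")
```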

Good first settings:

| Parameter | What it controls | Beginner default |
|---|---|---|
| n_estimators | number of trees | 100 to 300 |
| max_depth | maximum tree depth | start small, then increase |
| min_samples_leaf | minimum samples in a leaf | increase if overfitting |
| random_state | reproducibility | always set it while learning |
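Once the defaults work, a small grid search over the settings in the table is a reasonable next step. The grid values below are assumptions for illustration. The search runs on the training split only, so the test set stays untouched for the final comparison.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Illustrative grid around the table's starting points
param_grid = {
    "max_depth": [3, 6, None],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=200, random_state=42),
    param_grid,
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)  # tune on training rows only
print("best params:", search.best_params_)
print(f"best cv f1={search.best_score_:.3f}")
```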


Boosting builds models in sequence:

first small tree → find its errors → next small tree focuses on those errors → repeat
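For regression with squared error, the loop can be written out directly, because "the errors" are literally the residuals y - prediction. This teaching sketch uses synthetic sine-wave data, which is an assumption and not the breast cancer dataset. sklearn's classifiers follow the same idea but fit each tree to gradients of the log-loss instead of raw residuals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
mse_history = []
for _ in range(100):
    residuals = y - prediction  # errors the ensemble still makes
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # small step toward fixing them
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"train MSE after 1 tree: {mse_history[0]:.4f}, after 100 trees: {mse_history[-1]:.4f}")
```

Each tree only takes a step of size learning_rate, so the training error drops gradually instead of all at once. This is why the learning rate and the number of trees must be tuned together.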

In sklearn, start with GradientBoostingClassifier or HistGradientBoostingClassifier. In real tabular projects, XGBoost, LightGBM, and CatBoost are common external libraries, but do not add them before the sklearn baseline is clear.


First tuning order for boosting:

| Step | Change | Why |
|---|---|---|
| 1 | learning_rate and n_estimators | controls step size and training length |
| 2 | max_depth / leaf settings | controls model complexity |
| 3 | validation or early stopping | stops training before it overfits |
| 4 | feature preprocessing | improves signal quality |
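GradientBoostingClassifier's built-in early stopping covers steps 1 to 3. Set a generous n_estimators, then add validation_fraction and n_iter_no_change. The model holds out part of the training data and stops adding trees when the validation score stops improving. The specific values below are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

gb = GradientBoostingClassifier(
    learning_rate=0.05,       # step 1: small step size
    n_estimators=1000,        # step 1: generous upper bound on training length
    max_depth=2,              # step 2: keep each tree simple
    validation_fraction=0.2,  # step 3: held out from X_train internally
    n_iter_no_change=20,      # step 3: stop after 20 rounds without improvement
    random_state=42,
)
gb.fit(X_train, y_train)
print(f"trees kept: {gb.n_estimators_} of 1000")
print(f"test accuracy={gb.score(X_test, y_test):.3f}")
```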


Stacking is powerful only if the meta-model sees out-of-fold predictions:

train base models in CV folds → collect out-of-fold predictions → train meta-model → evaluate on holdout test

Use sklearn’s StackingClassifier(cv=5) instead of manually reusing predictions from the training rows.
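The four steps can be written out with cross_val_predict, which is roughly what StackingClassifier(cv=...) automates. This manual version is a sketch for understanding, not a replacement. It uses only the positive-class probability from each base model and skips the lab's logistic-regression base model for brevity. The shuffled StratifiedKFold is an assumption.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

base_models = [
    RandomForestClassifier(n_estimators=200, max_depth=6, random_state=42),
    GradientBoostingClassifier(n_estimators=120, learning_rate=0.05, max_depth=2, random_state=42),
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Steps 1-2: out-of-fold probability of class 1 from each base model
oof_features = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
    for m in base_models
])
# Step 3: the meta-model learns how much to trust each base model
meta = LogisticRegression().fit(oof_features, y_train)

# Step 4: refit base models on all training rows, then score the holdout once
test_features = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])
print(f"manual stacking accuracy={accuracy_score(y_test, meta.predict(test_features)):.3f}")
```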

| Situation | Start with |
|---|---|
| need a strong, stable baseline | Random Forest |
| tabular data with many nonlinear patterns | Gradient Boosting / XGBoost / LightGBM |
| categorical-heavy tabular data | CatBoost, after the baseline |
| several model families perform differently | Stacking with cross-validation |
| need the easiest explanation | shallow tree or Random Forest feature importance |

Keep this page’s proof of learning as a small evidence card:

  • Task: regression or classification problem with target definition
  • Model: linear/logistic/tree/ensemble/SVM configuration and train/test split
  • Metric: regression error, accuracy/F1, threshold curve, or confusion matrix
  • Failure check: overfitting, underfitting, feature scaling, threshold choice, or class imbalance
  • Expected output: model result plus error samples or residual review

| Symptom | First check | Usual fix |
|---|---|---|
| ensemble barely beats one tree | features are weak or the split is unstable | add features, use cross-validation |
| train score high, test score low | overfitting | lower depth, increase leaf size, add validation |
| boosting gets worse as trees increase | too many rounds | reduce learning rate or use early stopping |
| stacking looks unrealistically perfect | leakage | use out-of-fold predictions or StackingClassifier(cv=...) |
| feature importance over-interpreted | correlated features | validate with permutation importance or ablation |
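
For the last row, sklearn's permutation_importance measures how much a held-out score drops when one feature's values are shuffled. The split and forest settings mirror the lab, and n_repeats=10 is an illustrative choice. Correlated features still need care. Suppose worst perimeter, worst area, and worst radius carry similar information: shuffling one of them may barely hurt the score, so a low value does not prove a feature is useless.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42, stratify=data.target
)
rf = RandomForestClassifier(n_estimators=200, max_depth=6, random_state=42).fit(X_train, y_train)

# Shuffle one feature at a time on held-out rows and measure the accuracy drop
result = permutation_importance(
    rf, X_test, y_test, n_repeats=10, random_state=42, scoring="accuracy"
)
top = result.importances_mean.argsort()[::-1][:3]
for idx in top:
    print(
        f"- {data.feature_names[idx]}: "
        f"{result.importances_mean[idx]:.3f} +/- {result.importances_std[idx]:.3f}"
    )
```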

  1. Change Random Forest max_depth from 6 to 3 and to None.
  2. Change Gradient Boosting learning_rate from 0.05 to 0.2.
  3. Explain why stacking would leak if the meta-model were trained without cross-validation. (Simply deleting cv=5 does not show this, because StackingClassifier falls back to 5-fold cross-validation by default.)
  4. Save a model comparison table and one paragraph explaining which model you would ship first.
Reference implementation and walkthrough
  1. max_depth=3 limits each tree and can reduce overfitting. None allows deeper trees, often improving training score while risking worse validation behavior.
  2. A higher boosting learning rate learns faster but can overshoot or overfit. Check validation score, not just training score.
  3. Stacking leaks when the meta-model learns from base-model predictions on rows those base models already trained on. Cross-validation creates out-of-fold predictions that are closer to real deployment.
  4. A shipping decision should name the model, validation metric, complexity, failure risk, and monitoring plan. The best first model is often the simplest one that meets the target metric reliably.
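
Exercises 1 and 2 can run in one script that prints train and test accuracy side by side, which makes the gap between them visible. This sketch reuses the lab's split and settings. Reading the test set repeatedly like this is fine for a learning exercise. In a real project, the comparison should come from a validation split or cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

results = []
# Exercise 1: Random Forest depth
for depth in (3, 6, None):
    model = RandomForestClassifier(n_estimators=200, max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    results.append((f"rf max_depth={depth}", model.score(X_train, y_train), model.score(X_test, y_test)))
# Exercise 2: boosting learning rate
for lr in (0.05, 0.2):
    model = GradientBoostingClassifier(n_estimators=120, learning_rate=lr, max_depth=2, random_state=42)
    model.fit(X_train, y_train)
    results.append((f"gb learning_rate={lr}", model.score(X_train, y_train), model.score(X_test, y_test)))

# A large train-test gap is the overfitting signal to look for
print(f"{'setting':<24} train  test")
for name, train_acc, test_acc in results:
    print(f"{name:<24} {train_acc:.3f}  {test_acc:.3f}")
```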

You are ready to continue when you can explain:

  • the difference between Bagging and Boosting;
  • why Random Forest is usually safer than one tree;
  • why Boosting needs validation control;
  • why Stacking must use cross-validation;
  • why the best leaderboard score is not always the best production choice.