Ensemble Learning
Advanced techniques (Voting, Stacking, Blending) to combine multiple models for superior performance.
Theory & Concept
We've covered Bagging (Random Forest) and Boosting (XGBoost). But what if you combine a Random Forest, an SVM, and a Neural Network?
Ensemble Learning relies on the idea that models make different kinds of errors. By combining them, you smooth out the biases/variances specific to each algorithm.
Types
- Voting Classifier:
  - Hard Voting: Majority rule (e.g., 2 models say "Cat", 1 says "Dog" -> "Cat").
  - Soft Voting: Average the probabilities (e.g., (0.9 + 0.8 + 0.4)/3 = 0.7 -> "Cat"). Usually better, because it uses each model's confidence.
- Stacking (Stacked Generalization):
  - Train "Base Models" (Level 0).
  - Use their predictions as input features for a "Meta Model" (Level 1).
  - The Meta Model learns how to combine them best.
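The hard vs. soft voting arithmetic above can be sketched directly. This is a minimal illustration using the probabilities from the example; the 0.5 decision threshold is an assumption:

```python
import numpy as np

# Per-model probability of "Cat" from three classifiers (example above)
probs = np.array([0.9, 0.8, 0.4])

# Hard voting: each model casts one discrete vote (threshold 0.5), majority wins
hard_votes = (probs >= 0.5).astype(int)          # -> [1, 1, 0]
hard_prediction = int(hard_votes.sum() > len(hard_votes) / 2)  # 2 of 3 -> "Cat"

# Soft voting: average the raw probabilities, then threshold
soft_score = probs.mean()                        # -> 0.7
soft_prediction = int(soft_score >= 0.5)         # -> "Cat"

print(hard_prediction, round(float(soft_score), 2), soft_prediction)
```

Note how the third model's low confidence (0.4) is averaged in under soft voting rather than being flattened to a single "Dog" vote.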
Quick Readiness Check
Is this method a fit for your use case?
Best For
Kaggle Competitions. Squeezing out the last 0.1% of performance. Production systems where accuracy > latency.
Prerequisites
Ideally, base models should be DIVERSE (correlated errors defeat the purpose).
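One quick way to gauge diversity is to check how often two candidate base models are wrong on the same test points. A minimal sketch on synthetic data; the model choices and dataset are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
svc = SVC(random_state=0).fit(X_tr, y_tr)

# Boolean error vectors: True where each model misclassifies a test point
err_rf = rf.predict(X_te) != y_te
err_svc = svc.predict(X_te) != y_te

# Fraction of points where BOTH models fail at once; if this is close to the
# individual error rates, the models are making the same mistakes
both_wrong = np.mean(err_rf & err_svc)
print(f"RF err: {err_rf.mean():.2f}, SVC err: {err_svc.mean():.2f}, overlap: {both_wrong:.2f}")
```

If the overlap is near the smaller of the two error rates, the ensemble has little room to correct either model.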
Strengths
Almost always beats well-tuned single models. Adds robustness.
Weaknesses
Complexity nightmare. Slow to train and predict (must run all base models). Hard to debug.
Pro Tip
'Bagging vs Boosting vs Stacking?' Bagging = Parallel (variance reduction). Boosting = Sequential (bias reduction). Stacking = Hierarchical (learning to combine).
Code Snippet
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Example data (any train/test split works here)
X_train, X_test, y_train, y_test = train_test_split(*load_iris(return_X_y=True), random_state=42)
# Define Base Models
estimators = [
    ('rf', RandomForestClassifier(n_estimators=100)),
    ('svc', SVC(probability=True))  # probability=True is required for soft voting
]
# 1. Voting Classifier (Soft)
voting = VotingClassifier(estimators=estimators, voting='soft')
voting.fit(X_train, y_train)
# 2. Stacking Classifier
# Uses Logistic Regression to combine the RF and SVM predictions
stacking = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    passthrough=False  # If True, feeds original X to final_estimator too
)
stacking.fit(X_train, y_train)
Parameter Tuning Cheat Sheet
| Parameter | Options / Range | Effect & Best Practice |
|---|---|---|
| voting | hard, soft | Soft (using probabilities) almost always outperforms hard voting because it captures confidence. |
| final_estimator | LogisticRegression, XGBoost | The 'Meta Learner'. Keep it simple! Logistic Regression (or Linear Regression for regression tasks) is usually best to avoid overfitting. |
| passthrough | True, False | If True, the Meta Learner sees [Base_Preds + Original_Features]. Often leads to overfitting. |
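The passthrough trade-off can be checked empirically with cross-validation. A minimal sketch, assuming a small benchmark dataset (load_breast_cancer); the exact scores will vary by data and hyperparameters:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
estimators = [('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
              ('svc', SVC(probability=True, random_state=0))]

scores = {}
for passthrough in (False, True):
    clf = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        passthrough=passthrough,  # True -> meta-learner also sees raw features
    )
    scores[passthrough] = cross_val_score(clf, X, y, cv=3).mean()

print(scores)  # compare mean CV accuracy with and without passthrough
```

On small datasets, don't be surprised if passthrough=True scores the same or worse: the extra features give the meta-learner more ways to overfit.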