Random Forest
An ensemble learning method that operates by constructing a multitude of decision trees. Mitigates the overfitting problem of individual trees.
Theory & Concept
Random Forest is an ensemble method that combines multiple Decision Trees to create a more robust and accurate model. It relies on the "Wisdom of Crowds"—while individual trees might be noisy or prone to overfitting, their average is often very stable.
It uses a technique called Bagging (Bootstrap Aggregating):
- Bootstrapping: Train each tree on a random sample of the data (with replacement).
- Feature Randomness: At each split, consider only a random subset of features (not all).
- Aggregating: Average the predictions (Regression) or take the majority vote (Classification).
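The three steps above can be sketched by hand with plain decision trees (a minimal illustration only — `RandomForestClassifier` does all of this internally, and the dataset here is synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
rng = np.random.default_rng(42)

trees = []
for _ in range(25):
    # 1. Bootstrapping: sample rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # 2. Feature randomness: each split considers a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# 3. Aggregating: majority vote across the 25 trees
votes = np.stack([t.predict(X) for t in trees])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("Ensemble training accuracy:", (ensemble_pred == y).mean())
```

Each tree sees a different bootstrap sample and different feature subsets at its splits, so their errors differ — and the vote cancels much of the noise.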
Why "Random"?
The randomness comes from two places:
- Random sampling of data rows (Bagging).
- Random sampling of features at each split.
This de-correlates the trees, ensuring they don't all make the same mistakes.
Mathematical Intuition
If we have n independent trees, each with variance σ², the variance of their average is:

Var(average of n trees) = σ² / n

By increasing n (the number of trees), we reduce the variance of the final model without increasing the bias. In practice the trees are not fully independent — which is exactly why feature randomness matters: it pushes them closer to independence. This is why Random Forest is harder to overfit than a single Decision Tree.
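The claim that averaging n independent estimates divides the variance by n is easy to verify numerically. This is a small simulation with Gaussian noise standing in for individual trees, not Random Forest itself:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0   # variance of a single "tree"
n = 100        # number of trees

# 10,000 ensembles, each averaging n independent estimates of the true value 0
estimates = rng.normal(0.0, np.sqrt(sigma2), size=(10_000, n))
ensemble = estimates.mean(axis=1)

print("single-tree variance:", estimates.var())  # ≈ 4.0
print("ensemble variance:   ", ensemble.var())   # ≈ 4.0 / 100 = 0.04
```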
Quick Readiness Check
Is this method a fit for your use case?
Best For
'Set it and forget it' models. Often the best out-of-the-box algorithm for tabular data.
Prerequisites
No scaling needed. Works well with raw data.
Strengths
Accurate, robust to outliers, handles missing values, provides Feature Importance.
Weaknesses
Slow to predict (must run 100+ trees). Not interpretable (Black Box). Large model size.
Pro Tip
Key difference vs. Decision Tree: Variance Reduction. Key difference vs. GBM: Parallel Training (RF is parallel, GBM is sequential).
Code Snippet
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import pandas as pd

# Assumes X_train, X_test, y_train, y_test and feature_names are already defined.

# 1. Initialize
# n_estimators=100 is standard. n_jobs=-1 uses all CPU cores.
rf = RandomForestClassifier(n_estimators=100,
                            max_depth=10,
                            max_features='sqrt',
                            n_jobs=-1,
                            random_state=42)

# 2. Train
rf.fit(X_train, y_train)

# 3. Predict
preds = rf.predict(X_test)
print(classification_report(y_test, preds))

# 4. Feature Importance (critical for analysis)
importances = pd.Series(rf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(5))
```

Parameter Tuning Cheat Sheet
| Parameter | Options / Range | Effect & Best Practice |
|---|---|---|
| n_estimators | 100 - 1000 | Number of trees. More is usually better (stable), but slower. Diminishing returns beyond a few hundred trees. |
| max_features | sqrt, log2, None | Critical. sqrt is standard for classification. Controls diversity of trees. |
| max_depth | None, int | Limits tree depth. Unlike single trees, RF is robust to full depth, but limiting it saves memory/time. |
| bootstrap | True (default), False | Keep True for Bagging effect. |
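Because bootstrap=True leaves roughly 37% of the rows out of each tree's sample, scikit-learn can score every tree on its own "out-of-bag" rows — a free validation estimate that is handy when comparing the settings above, without a separate hold-out split. A sketch on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# oob_score=True evaluates each tree on the ~37% of rows it never saw.
for max_features in ["sqrt", "log2", None]:
    rf = RandomForestClassifier(n_estimators=200,
                                max_features=max_features,
                                oob_score=True,
                                n_jobs=-1,
                                random_state=42)
    rf.fit(X, y)
    print(f"max_features={max_features}: OOB accuracy = {rf.oob_score_:.3f}")
```

The OOB score typically tracks cross-validation accuracy closely, which is why it is a popular quick check when tuning n_estimators and max_features.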