Random Forest
An ensemble learning method that operates by constructing a multitude of decision trees. Mitigates the overfitting problem of individual trees.
Theory & Concept
Random Forest is an ensemble method that combines multiple Decision Trees to create a more robust and accurate model. It relies on the "Wisdom of Crowds"—while individual trees might be noisy or prone to overfitting, their average is often very stable.
It uses a technique called Bagging (Bootstrap Aggregating):
- Bootstrapping: Train each tree on a random sample of the data (with replacement).
- Feature Randomness: At each split, consider only a random subset of features (not all).
- Aggregating: Average the predictions (Regression) or take the majority vote (Classification).
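The three steps above can be sketched by hand with plain decision trees (a minimal illustration only — `RandomForestClassifier` does all of this internally, and the dataset here is synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
rng = np.random.default_rng(42)

trees = []
for _ in range(25):
    # 1. Bootstrapping: sample rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # 2. Feature randomness: each split considers a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# 3. Aggregating: majority vote across the 25 trees
votes = np.stack([t.predict(X) for t in trees])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("Ensemble training accuracy:", (ensemble_pred == y).mean())
```

Each tree sees a different bootstrap sample and different feature subsets at its splits, so their errors differ — and the vote cancels much of the noise.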
Why "Random"?
The randomness comes from two places:
- Random sampling of data rows (Bagging).
- Random sampling of features at each split.
This de-correlates the trees, ensuring they don't all make the same mistakes.
Mathematical Intuition
If we have n independent trees, each with variance σ², the variance of their average is:

Var(average of n trees) = σ² / n

By increasing n (the number of trees), we reduce the variance of the final model without increasing the bias. In practice the trees are not fully independent — which is exactly why feature randomness matters: it pushes them closer to independence. This is why Random Forest is harder to overfit than a single Decision Tree.
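The claim that averaging n independent estimates divides the variance by n is easy to verify numerically. This is a small simulation with Gaussian noise standing in for individual trees, not Random Forest itself:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0   # variance of a single "tree"
n = 100        # number of trees

# 10,000 ensembles, each averaging n independent estimates of the true value 0
estimates = rng.normal(0.0, np.sqrt(sigma2), size=(10_000, n))
ensemble = estimates.mean(axis=1)

print("single-tree variance:", estimates.var())  # ≈ 4.0
print("ensemble variance:   ", ensemble.var())   # ≈ 4.0 / 100 = 0.04
```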
Quick Readiness Check
Is this method a fit for your use case?
Best For
'Set it and forget it' models. Often the best out-of-the-box algorithm for tabular data.
Prerequisites
No scaling needed. Works well with raw data.
Strengths
Accurate, robust to outliers, handles missing values, provides Feature Importance.
Weaknesses
Slow to predict (must run 100+ trees). Not interpretable (Black Box). Large model size.
Pro Tip
Key difference vs. Decision Tree: Variance Reduction. Key difference vs. GBM: Parallel Training (RF is parallel, GBM is sequential).
Code Snippet
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import pandas as pd

# Assumes X_train, X_test, y_train, y_test and feature_names are already defined.

# 1. Initialize
# n_estimators=100 is standard. n_jobs=-1 uses all CPU cores.
rf = RandomForestClassifier(n_estimators=100,
                            max_depth=10,
                            max_features='sqrt',
                            n_jobs=-1,
                            random_state=42)

# 2. Train
rf.fit(X_train, y_train)

# 3. Predict
preds = rf.predict(X_test)
print(classification_report(y_test, preds))

# 4. Feature Importance (critical for analysis)
importances = pd.Series(rf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(5))
```

Parameter Tuning Cheat Sheet
| Parameter | Options / Range | Effect & Best Practice |
|---|---|---|
| n_estimators | 100 - 1000 | Number of trees. More is usually better (stable), but slower. Diminishing returns beyond a few hundred trees. |
| max_features | sqrt, log2, None | Critical. sqrt is standard for classification. Controls diversity of trees. |
| max_depth | None, int | Limits tree depth. Unlike single trees, RF is robust to full depth, but limiting it saves memory/time. |
| bootstrap | True (default), False | Keep True for Bagging effect. |
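Because bootstrap=True leaves roughly 37% of the rows out of each tree's sample, scikit-learn can score every tree on its own "out-of-bag" rows — a free validation estimate that is handy when comparing the settings above, without a separate hold-out split. A sketch on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# oob_score=True evaluates each tree on the ~37% of rows it never saw.
for max_features in ["sqrt", "log2", None]:
    rf = RandomForestClassifier(n_estimators=200,
                                max_features=max_features,
                                oob_score=True,
                                n_jobs=-1,
                                random_state=42)
    rf.fit(X, y)
    print(f"max_features={max_features}: OOB accuracy = {rf.oob_score_:.3f}")
```

The OOB score typically tracks cross-validation accuracy closely, which is why it is a popular quick check when tuning n_estimators and max_features.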