Supervised Learning

Naive Bayes

A probabilistic classifier based on Bayes' Theorem with the 'naive' assumption of feature independence. Fast, simple, and surprisingly effective for text.


Theory & Concept

Naive Bayes calculates the probability of a data point belonging to a class given its features: P(Class | Features).

It is called "Naive" because it assumes that all features are independent of each other given the class.

  • Example: In spam filtering, it assumes the presence of the word "buy" is unrelated to the word "viagra", which is obviously false.
  • Reality: Despite this wrong assumption, it works remarkably well, especially for text classification (Spam, Sentiment Analysis).

Bayes' Theorem

P(y|X) = P(X|y) · P(y) / P(X)
  • Posterior P(y|X): Probability of class y given features X.
  • Likelihood P(X|y): Probability of features X appearing in class y.
  • Prior P(y): Probability of class y occurring generally.
  • Evidence P(X): Probability of features X occurring generally; a normalizing constant that is the same for every class, so it can be ignored when comparing classes.

Types of Naive Bayes

  1. Multinomial NB: For discrete counts (e.g., word counts in text).
  2. Bernoulli NB: For binary/boolean features (e.g., word present/absent).
  3. Gaussian NB: For continuous features (assumes normal distribution).
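As a sketch of the third variant, Gaussian NB fits a normal distribution per feature per class. The toy height/weight data below is an assumption for illustration (requires scikit-learn):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy continuous features: [height_cm, weight_kg] (made-up values)
X = np.array([[180, 80], [175, 75], [160, 55], [155, 50]])
y = np.array(["tall", "tall", "short", "short"])

clf = GaussianNB()  # fits a per-class mean and variance for each feature
clf.fit(X, y)
print(clf.predict([[170, 70]]))  # classified by which class's Gaussians fit best
```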

Quick Readiness Check

Is this method a fit for your use case?

Best For

Text Classification (Spam, News categorization), Sentiment Analysis, Real-time systems.

Prerequisites

Text must be tokenized/vectorized (Bag of Words or TF-IDF). Continuous data may need binning.

Strengths

Blazing fast (training is linear, O(N)). Needs very little training data. Handles high-dimensional data well.

Weaknesses

The 'Zero Frequency' problem: if a word never appears in the training data for a class, its likelihood is 0 (solved by Laplace Smoothing). The independence assumption is often violated in practice.

Pro Tip

'What is the Zero Frequency problem?' When a feature value isn't in the training set, its likelihood is 0, which wipes out the whole product of probabilities. Fix: Add 1 to all counts (Laplace Smoothing).
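The fix can be sketched by hand. The counts below are made-up; alpha=1 mirrors MultinomialNB's default:

```python
# Hand-rolled sketch of Laplace smoothing for one word's likelihood (assumed counts)
count_word_in_spam = 0      # word never seen in the spam class
total_words_in_spam = 100
vocab_size = 50
alpha = 1.0                 # Laplace smoothing constant

# Without smoothing: 0/100 = 0, which zeroes out the whole product
unsmoothed = count_word_in_spam / total_words_in_spam

# With smoothing: add alpha to every word's count
smoothed = (count_word_in_spam + alpha) / (total_words_in_spam + alpha * vocab_size)

print(unsmoothed)  # 0.0
print(smoothed)    # 1/150 ≈ 0.0067
```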


Code Snippet

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
 
# Text Data (Spam vs Ham)
X = ["offer is secret", "click secret link", "secret sports link", "play sports"]
y = ["spam", "spam", "ham", "ham"]
 
# 1. Pipeline: CountVectorizer converts text to counts, then MultinomialNB trains on them
# alpha=1.0 is Laplace Smoothing
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
 
# 2. Train
model.fit(X, y)
 
# 3. Predict
print(model.predict(["secret sports game"]))
print(model.predict_proba(["secret sports game"]))

Parameter Tuning Cheat Sheet

| Parameter | Options / Range | Effect & Best Practice |
| --- | --- | --- |
| alpha | 0.0 – 1.0+ | Smoothing parameter (Laplace/Lidstone). 1.0 is standard; 0.0 = no smoothing (risk of zero probabilities). |
| fit_prior | True (default), False | Whether to learn class prior probabilities or assume a uniform prior. Usually keep True. |
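In practice, alpha is usually tuned with cross-validation rather than set by hand. A minimal sketch using the article's toy data (the step name "multinomialnb" is auto-generated by make_pipeline):

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

X = ["offer is secret", "click secret link", "secret sports link", "play sports"]
y = ["spam", "spam", "ham", "ham"]

pipe = make_pipeline(CountVectorizer(), MultinomialNB())
# Search over several smoothing strengths; cv=2 because the toy set is tiny
grid = GridSearchCV(pipe, {"multinomialnb__alpha": [0.01, 0.1, 0.5, 1.0]}, cv=2)
grid.fit(X, y)
print(grid.best_params_)
```

On real data, use a larger corpus and more folds; with only four documents the "best" alpha here is not meaningful.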