Naive Bayes
A probabilistic classifier based on Bayes' Theorem with the 'naive' assumption of feature independence. Fast, simple, and surprisingly effective for text.
Theory & Concept
Naive Bayes calculates the probability of a data point belonging to a class given its features: P(Class | Features).
It is called "Naive" because it assumes that all features are independent of each other given the class.
- Example: In spam filtering, it assumes the presence of the word "buy" is unrelated to the word "viagra", which is obviously false.
- Reality: Despite this wrong assumption, it works remarkably well, especially for text classification (Spam, Sentiment Analysis).
Bayes' Theorem
P(C | X) = P(X | C) · P(C) / P(X)
- Posterior P(C | X): Probability of class C given features X.
- Likelihood P(X | C): Probability of features X appearing in class C.
- Prior P(C): Probability of class C occurring generally.
- Evidence P(X): Probability of the features overall; acts as a normalizing constant.
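The theorem above can be worked through by hand. The numbers below are assumed toy values for a one-word spam filter, not taken from any real dataset:

```python
# Toy spam-filter numbers (assumed for illustration only)
p_spam = 0.4             # Prior P(spam)
p_word_given_spam = 0.8  # Likelihood P("offer" | spam)
p_word_given_ham = 0.1   # Likelihood P("offer" | ham)
p_ham = 1 - p_spam

# Evidence P("offer") via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham

# Posterior P(spam | "offer") by Bayes' Theorem
posterior = p_word_given_spam * p_spam / p_word
print(round(posterior, 3))  # prints 0.842
```

Even with a modest prior of 0.4, one strongly spam-associated word pushes the posterior above 0.84.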
Types of Naive Bayes
- Multinomial NB: For discrete counts (e.g., word counts in text).
- Bernoulli NB: For binary/boolean features (e.g., word present/absent).
- Gaussian NB: For continuous features (assumes normal distribution).
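A quick sketch of which variant fits which feature type, using made-up toy arrays (the counts, binary flags, and heights below are assumptions for illustration):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Assumed toy data: 4 documents x 3 vocabulary words
counts = np.array([[2, 1, 0], [3, 0, 1], [0, 2, 2], [0, 1, 3]])  # word counts
binary = (counts > 0).astype(int)                                # word present/absent
heights = np.array([[1.8], [1.7], [1.6], [1.5]])                 # continuous feature
y = [0, 0, 1, 1]

MultinomialNB().fit(counts, y)   # discrete counts
BernoulliNB().fit(binary, y)     # binary/boolean features
GaussianNB().fit(heights, y)     # continuous, assumes a normal distribution per class
```

Note that Bernoulli NB also models the *absence* of a word, which Multinomial NB ignores.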
Quick Readiness Check
Is this method a fit for your use case?
Best For
Text Classification (Spam, News categorization), Sentiment Analysis, Real-time systems.
Prerequisites
Text must be tokenized/vectorized (Bag of Words or TF-IDF). Continuous data may need binning.
Strengths
Blazing fast (linear, O(N)). Needs very little training data. Handles high-dimensional data well.
Weaknesses
The 'Zero Frequency' problem: if a word is unseen in training, its likelihood is 0 (fixed by Laplace Smoothing). The independence assumption is often violated in practice.
Pro Tip
'What is the Zero Frequency problem?' When a feature value isn't seen in the training set, its likelihood is 0, wiping out the whole product of probabilities. Fix: add 1 to all counts (Laplace Smoothing).
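The fix can be sketched with toy counts. The words and numbers here are assumed purely for illustration:

```python
# Word counts observed in the "spam" class (assumed toy numbers)
spam_counts = {"offer": 3, "secret": 2, "sports": 0}  # "sports" never seen in spam
vocab_size = len(spam_counts)
total = sum(spam_counts.values())

# Unsmoothed likelihood: P("sports" | spam) = 0 zeroes out any product it joins
unsmoothed = spam_counts["sports"] / total

# Laplace smoothing: add 1 to every count (and vocab size to the denominator)
smoothed = (spam_counts["sports"] + 1) / (total + vocab_size)

print(unsmoothed, round(smoothed, 3))  # prints 0.0 0.125
```

The smoothed estimate stays small but nonzero, so one unseen word can no longer veto an entire classification.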
Code Snippet
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
# Text Data (Spam vs Ham)
X = ["offer is secret", "click secret link", "secret sports link", "play sports"]
y = ["spam", "spam", "ham", "ham"]
# 1. Pipeline: Vectorize -> Convert text to counts -> Train NB
# alpha=1.0 is Laplace Smoothing
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
# 2. Train
model.fit(X, y)
# 3. Predict
print(model.predict(["secret sports game"]))
print(model.predict_proba(["secret sports game"]))
Parameter Tuning Cheat Sheet
| Parameter | Options / Range | Effect & Best Practice |
|---|---|---|
| alpha | 0.0 - 1.0+ | Smoothing parameter (Laplace/Lidstone). 1.0 is standard. 0.0 = No smoothing (risk of zero prob). |
| fit_prior | True (default), False | Whether to learn class prior probabilities or assume uniform (50/50). Keep True usually. |
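As a sketch of tuning both parameters together, the snippet below reuses the toy data from the code above with a grid search. The step-prefixed parameter names (multinomialnb__alpha) follow sklearn's pipeline naming convention; cv=2 is an assumption forced by the tiny dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

X = ["offer is secret", "click secret link", "secret sports link", "play sports"]
y = ["spam", "spam", "ham", "ham"]

pipe = make_pipeline(CountVectorizer(), MultinomialNB())

# Pipeline parameters are addressed as "<step name>__<parameter>"
grid = GridSearchCV(
    pipe,
    {
        "multinomialnb__alpha": [0.1, 0.5, 1.0],
        "multinomialnb__fit_prior": [True, False],
    },
    cv=2,  # only 2 folds: the toy set has just 2 samples per class
)
grid.fit(X, y)
print(grid.best_params_)
```

On real data you would use more folds (cv=5 is a common default) and a wider alpha range.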