Supervised Learning

Support Vector Machines (SVM)

Finds the hyperplane that best separates classes with the maximum margin. Effective in high-dimensional spaces.


Theory & Concept

SVM is a powerful classifier that works by finding the optimal hyperplane (decision boundary) that maximizes the margin between the two classes.

[Infographic: maximum margin hyperplane, support vectors, the kernel trick, and linear vs. RBF kernel separation.]
  • Support Vectors: The data points closest to the hyperplane. These are the "difficult" points that define the decision boundary.
  • Kernel Trick: SVMs can efficiently perform non-linear classification by implicitly mapping inputs into high-dimensional feature spaces.
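Both ideas are visible directly on a fitted model. Below is a minimal sketch using scikit-learn's `SVC`, whose fitted attributes `support_vectors_` and `n_support_` expose the support vectors; the blob dataset is a toy assumption for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy dataset: two Gaussian blobs (assumption, for illustration only)
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# The fitted model stores the "difficult" points that define the boundary
print(clf.support_vectors_.shape)  # (n_support_vectors, 2)
print(clf.n_support_)              # support vector count per class
```

Typically only a small fraction of the 100 points become support vectors; the remaining points could be removed without moving the decision boundary.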

The Margin

SVM tries to make the "street" separating the classes as wide as possible. A wider margin implies better generalization to unseen data.
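For a linear SVM the width of that street is 2/‖w‖, so it can be read off a fitted model. A minimal sketch, assuming a hand-made separable toy dataset and using a large C to approximate a hard margin:

```python
import numpy as np
from sklearn.svm import SVC

# Two small, clearly separable point clouds (toy data, for illustration)
X = np.array([[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)  # very large C ~ hard margin
clf.fit(X, y)

w = clf.coef_[0]
margin_width = 2 / np.linalg.norm(w)  # width of the "street"
print(round(margin_width, 3))         # ~ 4.95 for this toy data
```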


Mathematical Intuition

For a linear SVM, we want to find a weight vector w and bias b defining the hyperplane:

w · x + b = 0

We minimize ‖w‖² subject to the constraint that every point lies on the correct side of the margin, yᵢ(w · xᵢ + b) ≥ 1. Since the margin width is 2/‖w‖, minimizing ‖w‖² is equivalent to maximizing the margin.

With the Kernel Trick, we replace the dot product xᵢ · xⱼ with a kernel function K(xᵢ, xⱼ). The most common is the RBF (Radial Basis Function) kernel:

K(x, y) = exp(−γ ‖x − y‖²)

This measures similarity between points: nearby points produce values near 1 and strongly influence each other, while distant points produce values near 0.
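The formula is easy to verify by hand against scikit-learn's `rbf_kernel` helper; the specific points and γ below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

gamma = 1.0
x = np.array([[0.0, 0.0]])
y_near = np.array([[0.1, 0.0]])   # close to x
y_far = np.array([[5.0, 5.0]])    # far from x

# Manual computation of K(x, y) = exp(-gamma * ||x - y||^2)
k_near = np.exp(-gamma * np.sum((x - y_near) ** 2))
k_far = np.exp(-gamma * np.sum((x - y_far) ** 2))

print(k_near)  # near 1: strong mutual influence
print(k_far)   # near 0: almost no influence

# sklearn's rbf_kernel computes the same quantity
k_sklearn = rbf_kernel(x, y_near, gamma=gamma)[0, 0]
```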


Quick Readiness Check

Is this method a fit for your use case?

Best For

High-dimensional data (e.g., text, gene expression) where features outnumber samples. Small to medium datasets.

Prerequisites

Scaling is mandatory. SVM is distance-based; unscaled features distort the margin.

Strengths

Robust to overfitting in high dimensions. Effective non-linear classification with kernels.

Weaknesses

Not scalable: training is O(N²) to O(N³) in the number of samples. No native probability estimates (requires Platt scaling).

Pro Tip

Key parameter trade-off: bias vs. variance via C. High C = strict margin (overfitting risk). Low C = soft margin (underfitting risk).
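One way to see this trade-off is to count support vectors as C varies: a low C widens the soft margin, leaving more points inside it, and every such point becomes a support vector. A minimal sketch on an assumed noisy toy dataset:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Noisy toy data (flip_y adds label noise); an assumption for illustration
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)

counts = {}
for C in (0.01, 1.0, 100.0):
    model = make_pipeline(StandardScaler(), SVC(C=C, kernel='rbf'))
    model.fit(X, y)
    # Low C -> wider margin -> more points inside it -> more support vectors
    counts[C] = int(model.named_steps['svc'].n_support_.sum())

print(counts)
```

Expect the support vector count to shrink as C grows and the margin tightens.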


Code Snippet

from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
 
# 1. Pipeline (Scaling + SVM)
# Always wrap SVM in a pipeline with a scaler
clf = make_pipeline(StandardScaler(),
                    SVC(C=1.0, kernel='rbf', gamma='scale', probability=True))
 
# 2. Train (assumes X_train, y_train, X_test are already defined)
clf.fit(X_train, y_train)
 
# 3. Predict
preds = clf.predict(X_test)
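Because `probability=True` was set, the fitted pipeline also exposes `predict_proba`, which returns Platt-scaled probabilities. A runnable sketch, with a synthetic dataset assumed in place of your own `X_train`/`X_test`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption, for illustration)
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(),
                    SVC(C=1.0, kernel='rbf', gamma='scale', probability=True))
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # Platt-scaled class probabilities
print(proba.shape)                 # one row per test sample, one column per class
```

Note that Platt scaling runs an internal cross-validation at fit time, so `probability=True` makes training noticeably slower; leave it off if you only need hard labels.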

Parameter Tuning Cheat Sheet

  • C — range 0.1 to 1000. Regularization strength. High C: strict margin (risk of overfitting). Low C: soft margin (smoother boundary).
  • kernel — linear, rbf, poly. Start with rbf (the default). Use linear for text classification or very high dimensions.
  • gamma — scale, or 0.001 to 10. Defines the influence of a single point. High gamma: only close points matter (complex islands). Low gamma: far points matter (smooth boundary).
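Since C and gamma interact, they are usually tuned jointly. A sketch using `GridSearchCV` over the ranges above; the dataset and grid values are assumptions for illustration, and the `svc__` prefix is how pipeline steps are addressed in scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # example dataset (assumption)

pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

# Pipeline parameters are addressed via the step name prefix 'svc__'
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': ['scale', 0.001, 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Keeping the scaler inside the pipeline matters here: it is refit on each training fold, so no information from the validation fold leaks into the scaling statistics.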