
K-Nearest Neighbors (KNN)

A simple, instance-based learning algorithm. Classifies data based on the majority vote of its neighbors.


Theory & Concept

KNN is a "lazy learner"—it doesn't actually learn a model during training. Instead, it memorizes the training data.

K-Nearest Neighbors infographic showing how distance and neighbor voting work.

When you ask it to predict a new point:

  1. It calculates the distance from the new point to all stored training points.
  2. It finds the K closest points ("neighbors").
  3. Classification: It takes a majority vote of the neighbors' classes.
  4. Regression: It takes the average of the neighbors' values.
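The four steps above can be sketched from scratch in a few lines. This is a toy illustration, not a production implementation; the function name `knn_predict` and the sample points are ours:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by a majority vote among its k nearest training points."""
    # Step 1: distance from the new point to every stored training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: indices of the k closest points
    nearest = np.argsort(dists)[:k]
    # Step 3: majority vote over the neighbors' classes
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.4]), k=3))  # -> 0
```

For regression (step 4), you would replace the vote with `np.mean(y_train[nearest])`.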

Mathematical Intuition

The core is the Distance Metric. The most common is Euclidean Distance:

d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}

For categorical variables, Hamming distance is used.
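To make the two metrics concrete, here is a minimal sketch (the helper names `euclidean` and `hamming` are ours, not from any library):

```python
import numpy as np

def euclidean(p, q):
    # square root of the sum of squared coordinate differences
    return float(np.sqrt(np.sum((np.asarray(q) - np.asarray(p)) ** 2)))

def hamming(p, q):
    # number of positions where the categorical values differ
    return sum(a != b for a, b in zip(p, q))

print(euclidean([0, 0], [3, 4]))  # -> 5.0 (the classic 3-4-5 triangle)
print(hamming(["red", "S", "cotton"], ["red", "M", "wool"]))  # -> 2
```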

The prediction for classification is:

y = \text{mode}(\{y_i : x_i \in N_k(x)\})

Where N_k(x) is the set of the K nearest neighbors.


Quick Readiness Check

Is this method a fit for your use case?

Best For

Small datasets, baselines, and systems where 'similarity' is intuitive (e.g., simple recommender systems).

Prerequisites

Feature scaling is mandatory. Otherwise, distances are dominated by features with the largest numeric ranges.
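A quick sketch of why scaling matters, using made-up age/income features (the numbers are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features: age (tens) vs. income (tens of thousands)
X = np.array([[25.0, 50_000.0],
              [30.0, 90_000.0],
              [60.0, 52_000.0]])

# Raw distances: income dominates, so the 60-year-old looks "nearest" to the 25-year-old
d_raw_01 = np.linalg.norm(X[0] - X[1])   # ~40000
d_raw_02 = np.linalg.norm(X[0] - X[2])   # ~2000

# After standardization, both features contribute comparably
Xs = StandardScaler().fit_transform(X)
d_scaled_01 = np.linalg.norm(Xs[0] - Xs[1])
d_scaled_02 = np.linalg.norm(Xs[0] - Xs[2])
print(d_raw_02 < d_raw_01, d_scaled_01 < d_scaled_02)  # -> True True
```

The nearest neighbor flips once income stops drowning out age, which is why the pipeline in the code snippet below puts a StandardScaler in front of the classifier.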

Strengths

Extremely simple. Zero training time. Makes no assumptions about data distribution.

Weaknesses

Slow Inference: O(N) per prediction, since every query must scan all stored training data. Also suffers from the Curse of Dimensionality: in high dimensions, all points become nearly equidistant, making the "nearest" neighbors less meaningful.
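The O(N) scan can be mitigated with tree-based indexes. scikit-learn's `KNeighborsClassifier` exposes this through its `algorithm` parameter; a small sketch with synthetic data (the dataset is illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
y = (X[:, 0] > 0).astype(int)

# Brute force scans all N points per query; a KD-tree prunes most of them.
# (Tree indexes help in low dimensions but degrade back toward O(N) as dimensions grow.)
brute = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)
tree = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)

query = X[:100]
agree = (brute.predict(query) == tree.predict(query)).all()
print(agree)  # -> True: identical answers, different search cost
```

Both methods are exact, so predictions match; only the lookup strategy differs.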

Pro Tip

If asked "What happens if K is too small or too large?": a small K (e.g., K=1) gives high variance (overfitting); a large K gives high bias (underfitting).
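The bias-variance trade-off is easy to demonstrate by comparing train and test accuracy at two extremes of K (synthetic data; the specific scores depend on the random seed):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for k in (1, 25):
    m = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    scores[k] = (m.score(X_tr, y_tr), m.score(X_te, y_te))
    print(k, scores[k])

# K=1 memorizes the training set (train accuracy 1.0) but generalizes worse;
# a larger K smooths the decision boundary at the cost of more bias.
```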


Code Snippet

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
 
# 1. Pipeline
model = make_pipeline(StandardScaler(), 
                      KNeighborsClassifier(n_neighbors=5))
 
# 2. Train (Just stores data)
model.fit(X_train, y_train)
 
# 3. Predict (Calculates distances)
preds = model.predict(X_test)

Parameter Tuning Cheat Sheet

| Parameter | Options / Range | Effect & Best Practice |
| --- | --- | --- |
| n_neighbors (K) | 1-50 | Start with sqrt(N) or the default of 5. Always pick an odd number for binary classification to avoid ties. |
| weights | uniform, distance | uniform: all neighbors count equally. distance: closer neighbors have more say (usually better). |
| metric | euclidean, manhattan | euclidean is standard. manhattan (taxicab) is often better for high-dimensional data. |
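All three parameters in the cheat sheet can be tuned together with cross-validation. A sketch using `GridSearchCV` on synthetic data (dataset and grid values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier())])

# Search over the three parameters from the cheat sheet.
# Pipeline step parameters are addressed with the "knn__" prefix.
grid = GridSearchCV(pipe, param_grid={
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
    "knn__metric": ["euclidean", "manhattan"],
}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

Keeping the scaler inside the pipeline ensures each cross-validation fold is scaled using only its own training split, avoiding leakage.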