Scaling Vector Search with Approximate Nearest Neighbors

Published on Sun Nov 30 2025

In the previous post, we covered the basics of vector embeddings. We saw how we can turn words and documents into lists of numbers (vectors) and find similar items by calculating the distance between them.

But we left off with a bottleneck: Scale.

If you have 100 documents, comparing your query vector to all 100 of them is instant. But what if you have 100 million documents? Or a billion?

Calculating the distance between your query and every single document in your database is called brute-force search, or exact k-Nearest Neighbors (k-NN). It gives you the perfect answer, but like any brute-force algorithm, it becomes painfully slow as the dataset grows.
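To make this concrete, here is a minimal brute-force search in plain Python. The function names `euclidean` and `brute_force_knn` are our own, and a real system would use an optimized library instead of Python lists; this is just a sketch of the idea:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_knn(query, documents, k=1):
    # Score EVERY document against the query, then keep the k closest.
    # This is O(n) distance computations per query.
    return sorted(documents, key=lambda doc: euclidean(query, doc))[:k]

docs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.1]]
print(brute_force_knn([1.0, 1.0], docs, k=2))  # → [[1.0, 1.0], [0.9, 1.1]]
```

That `sorted` call touches every document, which is exactly the cost we want to avoid at scale.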

We need a shortcut. This is where Approximate Nearest Neighbors comes in.

Accuracy vs Speed Trade-off

The core idea of ANN is to trade accuracy for speed.

Instead of guaranteeing the absolute closest vector, we settle for vectors that are very likely to be close. In search, this is usually a fine trade-off. If you search for "best running shoes", you don't need the mathematically perfect match #1. You just need highly relevant results.
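The standard way to quantify "very likely to be close" is recall@k: the fraction of the true k nearest neighbors that the approximate search actually returned. A minimal sketch (the function name is our own):

```python
def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true top-k neighbors the approximate search found.
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# The ANN returned docs 1, 2, 5; the exact top-3 was 1, 2, 4.
print(recall_at_k([1, 2, 5], [1, 2, 4]))  # → 0.666... (2 of 3 found)
```

In practice, ANN indexes expose tuning knobs that let you dial recall up (slower) or down (faster) to fit your use case.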

So, how do we skip checking millions of vectors? We use indexes.

Clustering / IVF

One of the simplest ways to speed up search is to group similar vectors together. This technique is formally known as Inverted File Index (IVF), but it relies on a concept you might already know: Clustering.

Here is how it works:

  1. Training: We look at all our document vectors and group them into clusters based on similarity. We calculate a center point (centroid) for each cluster.
  2. Indexing: We assign every document to its nearest centroid.
  3. Searching: When a query comes in, we first compare it to the centroids (which is fast because there are few of them). We find the closest centroid, and then we only search the vectors inside that cluster.
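The three steps above can be sketched in pure Python. This is a toy, not a production index: the names `build_ivf` and `ivf_search` are our own, the k-means loop is deliberately simple, and real libraries also probe several nearby clusters rather than just one:

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, candidates):
    # Index of the candidate closest to `point`.
    return min(range(len(candidates)), key=lambda i: euclidean(point, candidates[i]))

def build_ivf(vectors, n_clusters=2, iters=10, seed=0):
    # 1. Training: refine randomly chosen centroids with a few k-means steps.
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_clusters)
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for v in vectors:
            buckets[nearest(v, centroids)].append(v)
        centroids = [[sum(col) / len(b) for col in zip(*b)] if b else c
                     for b, c in zip(buckets, centroids)]
    # 2. Indexing: assign every vector to its final nearest centroid.
    buckets = [[] for _ in centroids]
    for v in vectors:
        buckets[nearest(v, centroids)].append(v)
    return centroids, buckets

def ivf_search(query, centroids, buckets):
    # 3. Searching: route via the (few) centroids, then scan only one bucket.
    return min(buckets[nearest(query, centroids)],
               key=lambda v: euclidean(query, v))

docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, buckets = build_ivf(docs, n_clusters=2)
print(ivf_search([10.1, 10.1], centroids, buckets))  # → [10.0, 10.0]
```

Note the search only computed distances to 2 centroids plus the 3 vectors in one bucket, instead of all 6 documents.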

[Animation: a query vector is compared to the cluster centroids first, then searched only within the winning cluster.]

In the animation above:

  1. We have our data points (documents) grouped into three colored clusters.
  2. A new Query point appears.
  3. Instead of checking the distance to every point, we first check which centroid is closest.
  4. We determine the Red cluster is the winner.
  5. We then only search within the Red cluster to find the nearest neighbor.

We just skipped checking two-thirds of the data! In a real system with thousands of clusters, we might skip over 99% of it, making the search roughly 100x faster.

HNSW

While IVF is great, the most popular algorithm right now is HNSW (Hierarchical Navigable Small World). The name is a bit intimidating, but the concept is simple.

Imagine you are driving from a small town in Haryana to a small town in Kerala.

  1. You don't start by looking at a map of every small street in India.
  2. You get on a Highway (High layer) to get to the general region quickly.
  3. You switch to Main Roads (Middle layer) to get closer to the city.
  4. Finally, you use Local Streets (Bottom layer) to find the specific house.

HNSW builds a similar multi-layered graph for vectors.

  • The top layer has very few connections, allowing you to make massive jumps across the vector space.
  • As you get closer to your destination (the query), you drop down to lower layers with more connections for fine-grained navigation.
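The navigation itself is a greedy walk: at each node, jump to whichever neighbor is closest to the query, and when no neighbor improves, drop down a layer. Here is a toy sketch of that descent (the names `greedy_search` and `hnsw_like_search` are ours; real HNSW also keeps a candidate list and builds the graph probabilistically):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, vectors, start, query):
    # Repeatedly move to the neighbor closest to the query;
    # stop when no neighbor improves on the current node.
    current = start
    while True:
        best = min(graph[current], default=current,
                   key=lambda n: euclidean(vectors[n], query))
        if euclidean(vectors[best], query) >= euclidean(vectors[current], query):
            return current
        current = best

def hnsw_like_search(layers, vectors, query):
    # Descend layer by layer (top = sparse highways, bottom = local streets);
    # each layer's result seeds the search on the next, denser layer.
    entry = next(iter(layers[0]))
    for graph in layers:
        entry = greedy_search(graph, vectors, entry, query)
    return entry

vectors = {0: [0.0, 0.0], 1: [5.0, 0.0], 2: [10.0, 0.0],
           3: [2.0, 0.0], 4: [7.0, 0.0]}
top = {0: [2], 2: [0]}                                       # highways
bottom = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}   # local streets
print(hnsw_like_search([top, bottom], vectors, [6.5, 0.0]))  # → 4
```

The top layer gets us to the right neighborhood in one big jump; the bottom layer only has to walk a step or two to find the nearest node.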

[Animation: a query descending the HNSW layers, from coarse jumps at the top to fine-grained steps at the bottom.]

This allows us to traverse a dataset of billions of vectors in a roughly logarithmic number of steps, which is why HNSW is so fast in practice.

Conclusion

This post concludes our three-part series on building a search engine from scratch.

  1. The Crawler: We started by building a simple crawler to collect data and an inverted index to find documents containing specific keywords.
  2. Vector Embeddings: We realized that keywords aren't enough. We explored how vector embeddings can capture the meaning of text, allowing us to find relevant results even without exact keyword matches.
  3. ANN Search: Finally, we tackled the problem of scale. We learned how algorithms like IVF and HNSW allow us to search through millions of vectors in milliseconds, making semantic search practical for real-world applications.

The Modern Search Stack

Today, the best search engines don't choose between keywords and vectors. Instead, they use both. This is often called Hybrid Search.

  • Keyword Search (BM25) is still unbeatable for exact matches (like searching for a specific error code or product ID).
  • Vector Search is superior for understanding intent and answering questions.
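One simple way to blend the two signals is a weighted sum of per-document scores. This sketch assumes both score sets are already normalized to [0, 1]; the name `hybrid_score` is ours, and production systems often use rank-based fusion (e.g. Reciprocal Rank Fusion) instead of raw score blending:

```python
def hybrid_score(keyword_scores, vector_scores, alpha=0.5):
    # Weighted blend of two score dicts (doc_id -> score in [0, 1]).
    # alpha = 1.0 trusts keywords only; alpha = 0.0 trusts vectors only.
    docs = set(keyword_scores) | set(vector_scores)
    return {d: alpha * keyword_scores.get(d, 0.0)
               + (1 - alpha) * vector_scores.get(d, 0.0)
            for d in docs}

ranked = hybrid_score({"doc_a": 1.0, "doc_b": 0.5}, {"doc_b": 1.0})
print(ranked)  # doc_b (0.75) now outranks doc_a (0.5)
```

Here `doc_a` wins on keywords alone, but `doc_b` scores well on both signals, so the blend promotes it to the top.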

By combining these two, and often adding a final "Reranking" step with a powerful LLM, we can build search experiences that feel almost magical. This is the foundation of RAG (Retrieval Augmented Generation), where we use these search techniques to give AI models the right context to answer your questions accurately.

I hope this series gave you a peek under the hood of the technologies we use every day. Thanks for reading!