Scaling Vector Search with Approximate Nearest Neighbors
Searching through millions of vectors one by one is too slow. In this post, we explore Approximate Nearest Neighbor (ANN) algorithms that allow us to find 'good enough' results instantly.
Looking for my non-tech writing? Check out my Medium blog.
Searching through millions of vectors one by one is too slow. In this post, we explore Approximate Nearest Neighbor (ANN) algorithms that allow us to find 'good enough' results instantly.
Keyword search has its limits. This post dives into vector embeddings, the core of modern semantic search. I'll explore how lists of numbers can capture complex meanings and relationships, allowing search engines to understand intent and context, not just keywords.
To truly understand how search works, you have to build it. This post demystifies search engine architecture by rebuilding it from the ground up in Python, exploring the fundamentals of web crawlers, inverted indexes, and simple ranking algorithms.
Lucene is a tool for text analysis. It uses an analyzer to split the text into tokens and process them. An analyzer has three parts: char filters, tokenizer, and token filters. They can do different things with the text, such as remove HTML tags, find synonyms, or make ngrams.
Cache stampede occurs when multiple requests regenerate the same cache item simultaneously, overloading the database. Learn its causes and solutions like query optimization, locking, pre-populating the cache, and probabilistic early expiration.
Here is my personal journey of learning React.js and Node.js as a backend developer. I have outlined in this post everything I used to master these technologies in a short time.
The Abstract Factory Pattern provides an abstraction over factories, allowing a Super Factory to decide which factory to use. It simplifies object creation while adhering to creational design principles.
Design patterns provide reusable solutions to common software design problems, improving code quality and developer communication. This post highlights their advantages and categorizes them into Creational, Structural, and Behavioral patterns, with examples like Singleton, Factory Method, and Observer.
Java 12 introduces exciting updates like cleaner switch expressions, the experimental Shenandoah Garbage Collector for reduced pause times, and improvements to G1 GC for better memory management. Explore these features and how they enhance Java's performance and usability.
Handling time zones can be tricky. Store time in UTC, save time zones (not offsets), keep your timezone database updated, and avoid UTC for future or date-only values. Plan ahead to avoid headaches!