MergeAlgorithms
Merge algorithms in-class exercise for cs276
What This Repository Is About
This is a set of programming exercises for learning how search engines work behind the scenes. The core problem: when you search for multiple words at once, a search engine needs to quickly figure out which documents contain all (or some) of those words. This project teaches you the algorithms that make that fast.
The exercises are designed for a university computer science course, but they reflect real problems that engineers face. Instead of searching the entire internet like Google does, you're learning to build a small search system—something like the file search on your Mac or Windows computer. The README explains that small systems can't afford to keep all their data in memory the way Google can. Instead, they need clever algorithms that read through sorted lists of document IDs (called "postings lists") just once, from beginning to end, without jumping around. That single-pass approach is what makes the whole system fast and memory-efficient.
The exercises progress from simple to complex. You start by writing code to find documents that contain two words (an "AND" operation), then move to finding documents that contain at least one of two words (an "OR" operation). After that, you level up to something harder: finding documents where specific words appear close to each other, like searching for "machine learning" as a phrase. The final challenges add more advanced features like negation ("NOT") that real search engines support.
The key insight throughout is that sorting matters. If your lists are sorted by document ID, you can compare two lists by walking through them side-by-side, much like merging two sorted stacks of cards. This beats the naive approach of comparing every document in one list against every document in another. The README even calls out a bad solution and asks you to figure out why it's inefficient—that's the whole point of the exercise.
If you're interested in how information retrieval works, how databases organize data, or just want to understand the algorithms that power search, these exercises walk you through the real trade-offs engineers make.