snap

C++ ★ 0 updated 12y ago ⑂ fork

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data

SNAP is a tool that quickly aligns short DNA sequences (reads) to a reference genome. When scientists sequence DNA from biological samples, such as tumor tissue or a microbial community, they get millions of short fragments. SNAP's job is to work out where in a known reference genome each fragment came from. It is fast enough to handle the scale of modern DNA sequencing without taking weeks to run.

The program works by building a searchable index of the reference genome upfront, then using a hash-based matching scheme to rapidly compare each DNA fragment against that index. This approach is particularly efficient with modern DNA reads that are 100 bases or longer: the extra length gives the tool more information to work with, making matches faster and more accurate. The index only needs to be built once per reference genome and can then be reused across many different sequencing experiments.
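To make the index-then-lookup idea concrete, here is a minimal sketch of a hash-based seed index in C++. It is an illustration only and not SNAP's actual implementation: the names `SeedIndex`, `buildIndex`, and `bestCandidate` are hypothetical, the seed length `k` is an arbitrary parameter, and a real aligner would go on to verify each candidate location with an edit-distance check rather than stopping at vote counting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type: maps each length-k substring (seed) of the reference
// to every position where it starts. Not SNAP's real data structure.
using SeedIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

// Built once per reference genome, then reused for every read.
SeedIndex buildIndex(const std::string& reference, std::size_t k) {
    assert(k > 0);
    SeedIndex index;
    for (std::size_t pos = 0; pos + k <= reference.size(); ++pos) {
        index[reference.substr(pos, k)].push_back(pos);
    }
    return index;
}

// Looks up every seed of the read. Each hit votes for the reference
// position where the whole read would start. Returns the start with the
// most votes (smallest start on ties), or -1 if no seed is found.
long long bestCandidate(const SeedIndex& index, const std::string& read,
                        std::size_t k) {
    std::unordered_map<std::size_t, int> votes;
    for (std::size_t off = 0; off + k <= read.size(); ++off) {
        auto it = index.find(read.substr(off, k));
        if (it == index.end()) continue;  // seed absent from the reference
        for (std::size_t refPos : it->second) {
            if (refPos >= off) ++votes[refPos - off];
        }
    }
    long long best = -1;
    int bestVotes = 0;
    for (const auto& entry : votes) {
        const long long start = static_cast<long long>(entry.first);
        if (entry.second > bestVotes ||
            (entry.second == bestVotes && start < best)) {
            best = start;
            bestVotes = entry.second;
        }
    }
    return best;
}
```

In this sketch a longer read contributes more seeds, so the true location collects more votes than chance matches do, which is one way to picture why longer reads help. A read with a single mismatched base still finds its location because the unaffected seeds continue to hit the index.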

Scientists running genomics labs, bioinformatics teams, and research institutions would use this tool as part of their DNA analysis pipeline. For example, a cancer researcher might use SNAP to align tumor sequencing data to the human genome, or a microbiologist might use it to identify which bacteria are present in an environmental sample by matching their DNA reads to known species. It suits any workflow that needs to process millions of DNA sequences quickly.

The project is built in C++ for speed and runs on Windows, Linux, and Mac OS X, so it's accessible to labs regardless of their computing platform. The code is straightforward to compile, requiring only standard development tools and a common compression library. The team provides documentation and a user manual to help new users get started, plus online resources for more detailed guidance.
