
toydb

Rust ★ 7.3k updated 7d ago

Distributed SQL database in Rust, written as an educational project

toyDB is a distributed SQL database built from scratch in Rust as a learning project, showing how Raft consensus, snapshot-isolation transactions, and a SQL query engine fit together in clean, readable code.

Rust · SQL · Raft · setup: moderate · complexity 4/5

toyDB is a distributed SQL database built from scratch in Rust as an educational project. The author originally wrote it in 2020 to learn how databases work internally, then rewrote it after spending years building production databases at CockroachDB and Neon. The goal is to show how the core concepts behind a distributed SQL database fit together, with the emphasis on readable, correct code rather than speed or scalability.

The database runs as a cluster of nodes that coordinate using the Raft consensus protocol, which keeps every node's copy of the data in agreement and lets the cluster keep working as long as a majority of nodes remain available. Transactions use snapshot isolation: each transaction sees a consistent view of the data as it existed when the transaction started, so reads do not have to wait for other transactions running concurrently. Two storage backends are included: one that persists data to disk and one that keeps everything in memory for testing.
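A common way to implement snapshot isolation is multi-version concurrency control (MVCC): writes add new versions instead of overwriting old ones, and each transaction only reads versions that were committed before it began. The sketch below illustrates that visibility rule under simplifying assumptions. `MvccStore`, `Txn`, and their methods are hypothetical names, not toyDB's API, and the sketch deliberately omits write-conflict detection, rollback, and persistence.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical in-memory MVCC store. Every write is tagged with the version
/// (transaction ID) that made it, so older versions are never overwritten.
#[derive(Default)]
struct MvccStore {
    /// (key, version) -> value, ordered so all versions of a key are adjacent.
    data: BTreeMap<(String, u64), String>,
    next_version: u64,
    /// Versions of transactions that have begun but not yet committed.
    active: HashSet<u64>,
}

/// A transaction's snapshot: its own version plus the transactions that were
/// still active (uncommitted) when it began.
struct Txn {
    version: u64,
    invisible: HashSet<u64>,
}

impl MvccStore {
    fn begin(&mut self) -> Txn {
        self.next_version += 1;
        let version = self.next_version;
        let invisible = self.active.clone();
        self.active.insert(version);
        Txn { version, invisible }
    }

    fn commit(&mut self, txn: &Txn) {
        self.active.remove(&txn.version);
    }

    fn set(&mut self, txn: &Txn, key: &str, value: &str) {
        self.data.insert((key.to_string(), txn.version), value.to_string());
    }

    /// Visible: written by this transaction itself, or by an older transaction
    /// that had already committed when this one began.
    fn is_visible(txn: &Txn, version: u64) -> bool {
        version == txn.version || (version < txn.version && !txn.invisible.contains(&version))
    }

    /// Returns the newest version of `key` visible to `txn`'s snapshot.
    fn get(&self, txn: &Txn, key: &str) -> Option<String> {
        let range = (key.to_string(), 0)..=(key.to_string(), txn.version);
        self.data
            .range(range)
            .rev()
            .find(|((_, version), _)| Self::is_visible(txn, *version))
            .map(|(_, value)| value.clone())
    }
}

fn main() {
    let mut store = MvccStore::default();

    let t1 = store.begin();
    store.set(&t1, "a", "1");
    store.commit(&t1);

    let t2 = store.begin(); // t1 is already committed
    let t3 = store.begin(); // t2 is still active here
    store.set(&t2, "a", "2");
    store.commit(&t2);

    // t3 began while t2 was active, so its snapshot still shows t1's value.
    assert_eq!(store.get(&t3, "a"), Some("1".to_string()));

    // A transaction that begins after t2 committed sees the new value.
    let t4 = store.begin();
    assert_eq!(store.get(&t4, "a"), Some("2".to_string()));
}
```

Because old versions are kept around, reading "as of" an earlier version uses the same visibility idea, which is what makes historical reads possible in an MVCC design.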

On top of the storage layer sits a SQL engine that supports common SQL features such as joins, aggregates, and transactions. A query planner builds an execution plan for each query and applies optimizations to it. The database also supports time-travel queries, which read data as it existed at a specific point in the past.
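Joins are among the SQL features listed above. For equality joins, a planner can choose a hash join: build a hash table over one input keyed by the join column, then probe it with each row of the other input. The sketch below shows that technique on in-memory integer rows; `Row`, `hash_join`, and the example tables are hypothetical and are not meant to reflect toyDB's actual executor.

```rust
use std::collections::HashMap;

/// A row is a list of column values; plain integers keep the sketch short.
type Row = Vec<i64>;

/// Inner equi-join of `left` and `right` on `left[lcol] == right[rcol]`.
fn hash_join(left: &[Row], lcol: usize, right: &[Row], rcol: usize) -> Vec<Row> {
    // Build phase: group the right input's rows by their join-column value.
    let mut table: HashMap<i64, Vec<&Row>> = HashMap::new();
    for row in right {
        table.entry(row[rcol]).or_default().push(row);
    }

    // Probe phase: look up each left row's join value in the hash table.
    let mut output = Vec::new();
    for lrow in left {
        if let Some(matches) = table.get(&lrow[lcol]) {
            for rrow in matches {
                // Each output row is the left columns followed by the right columns.
                let mut joined = lrow.clone();
                joined.extend_from_slice(rrow);
                output.push(joined);
            }
        }
    }
    output
}

fn main() {
    // Hypothetical tables: movies(id, genre_id) and genres(id, rating).
    let movies = vec![vec![1, 10], vec![2, 20], vec![3, 10]];
    let genres = vec![vec![10, 5], vec![30, 7]];

    // Join movies.genre_id (column 1) with genres.id (column 0).
    let joined = hash_join(&movies, 1, &genres, 0);
    assert_eq!(joined, vec![vec![1, 10, 10, 5], vec![3, 10, 10, 5]]);
}
```

Building the table on the smaller input keeps memory use down, which is one of the choices a planner weighs when picking a join strategy.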

Starting a local five-node cluster takes a single shell script, and a command-line client can then connect to any node and run SQL statements. The repository includes an architecture guide that walks through the codebase concept by concept, a SQL reference, and worked examples. Tests use golden scripts: each script records a sequence of commands together with their expected output, and a test fails if a later run produces different output.
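The golden-script testing idea can be sketched in a few lines. The script format below (a `> ` prefix marks a command, and the lines after it are its expected output) is a hypothetical simplification, as are `run_golden` and the toy key-value commands; toyDB's real test scripts use their own format and tooling.

```rust
use std::collections::HashMap;

/// Runs a hypothetical golden script through `execute` and returns a
/// description of every command whose actual output differs from the
/// recorded expected output. An empty result means the test passes.
fn run_golden<F>(script: &str, mut execute: F) -> Vec<String>
where
    F: FnMut(&str) -> String,
{
    // Parse the script into (command, expected output lines) pairs.
    let mut cases: Vec<(String, Vec<String>)> = Vec::new();
    for line in script.lines() {
        if let Some(cmd) = line.strip_prefix("> ") {
            cases.push((cmd.to_string(), Vec::new()));
        } else if let Some((_, expected)) = cases.last_mut() {
            expected.push(line.to_string());
        }
    }

    // Replay each command and compare against the recorded output.
    let mut failures = Vec::new();
    for (cmd, expected) in cases {
        let actual = execute(&cmd);
        let expected = expected.join("\n");
        if actual.trim_end() != expected.trim_end() {
            failures.push(format!("{cmd}: expected {expected:?}, got {actual:?}"));
        }
    }
    failures
}

fn main() {
    let script = "\
> set a 1
ok
> get a
1
> get b
NULL";

    // A toy key-value "system under test" with two commands.
    let mut kv: HashMap<String, String> = HashMap::new();
    let failures = run_golden(script, |cmd| {
        let parts: Vec<&str> = cmd.split_whitespace().collect();
        match parts.as_slice() {
            ["set", k, v] => {
                kv.insert(k.to_string(), v.to_string());
                "ok".to_string()
            }
            ["get", k] => kv.get(*k).cloned().unwrap_or_else(|| "NULL".to_string()),
            _ => format!("error: unknown command {cmd}"),
        }
    });
    assert!(failures.is_empty(), "{failures:?}");
}
```

The appeal of this style is that adding a test means writing commands and pasting in the output you expect, with no per-test assertion code.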

Performance is not a goal. Write throughput in particular is low because of how writes are synced to disk. The project is explicit about this trade-off: the complexity needed for production-grade performance would make the code harder to learn from, which would defeat its purpose.
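The text does not spell out the disk-syncing details, so the sketch below simply assumes the slow path is one fsync per write and contrasts it with batching several writes under a single fsync (often called group commit). The functions and file names are hypothetical; the code only illustrates the general trade-off and is not taken from toyDB.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::time::Instant;

/// Assumed slow path: append each record and fsync after every one, so each
/// write is durable before the next begins but pays a full disk flush.
fn append_sync_each(path: &Path, records: &[&[u8]]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    for record in records {
        file.write_all(record)?;
        file.sync_all()?; // block until the OS reports the data is on disk
    }
    Ok(())
}

/// Batched alternative: append all records and fsync once. Far fewer flushes,
/// but no record in the batch is durable until the single sync completes.
fn append_batched(path: &Path, records: &[&[u8]]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    for record in records {
        file.write_all(record)?;
    }
    file.sync_all()
}

fn main() -> io::Result<()> {
    let records: Vec<Vec<u8>> = (0..100)
        .map(|i| format!("set k{i}={i}\n").into_bytes())
        .collect();
    let refs: Vec<&[u8]> = records.iter().map(|r| r.as_slice()).collect();

    let dir = std::env::temp_dir();
    let (slow, fast) = (dir.join("sketch_sync_each.log"), dir.join("sketch_batched.log"));
    for path in [&slow, &fast] {
        let _ = fs::remove_file(path); // start from empty files
    }

    let start = Instant::now();
    append_sync_each(&slow, &refs)?;
    println!("fsync per write:     {:?}", start.elapsed());

    let start = Instant::now();
    append_batched(&fast, &refs)?;
    println!("one fsync per batch: {:?}", start.elapsed());

    // Both strategies leave identical bytes on disk; only durability timing differs.
    assert_eq!(fs::read(&slow)?, fs::read(&fast)?);
    Ok(())
}
```

Timings vary widely between disks and filesystems, so the sketch prints them instead of asserting on them. Batching is the kind of optimization that adds bookkeeping (deciding when to flush, tracking which writes are durable), which fits the project's stated choice to keep the code simple.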

Where it fits