citus
Distributed PostgreSQL as an extension
Citus is a PostgreSQL extension that spreads your database across multiple machines so it can handle far more data and traffic than a single server allows, while staying fully compatible with standard PostgreSQL tools and queries.
Citus is an extension for PostgreSQL, one of the most widely used relational databases. PostgreSQL normally stores and queries data on a single computer, and Citus extends it to spread that work across multiple computers (called a cluster). This makes it possible to handle much larger amounts of data and higher traffic than a single database server could manage on its own.
The extension adds the concept of distributed tables. Instead of storing all the rows of a table on one machine, Citus splits them into smaller chunks called shards and places those shards across the nodes in the cluster. When a query arrives, Citus routes it to the right shards and runs parts of it in parallel across multiple machines, then combines the results. The database still looks and behaves like ordinary PostgreSQL to any application connecting to it.
Citus is particularly suited to three types of workloads: multi-tenant applications (where data for many separate customers lives in one database), analytical queries that scan large volumes of data, and real-time data ingestion such as time-series or IoT scenarios. It also includes a columnar storage option that compresses data and speeds up queries that read only some columns of a table.
You can run Citus on a single machine using Docker for testing, install it locally as a package on Ubuntu, Debian, or Red Hat systems, or use it through Azure Cosmos DB for PostgreSQL, Microsoft's hosted version. Adding worker nodes to an existing cluster and rebalancing data across them is done through SQL commands.
The project is fully open source and follows PostgreSQL's extension model, meaning it works with standard PostgreSQL tools and ships updates alongside new PostgreSQL releases.
Where it fits
- Scale a multi-tenant SaaS database to handle thousands of customers without switching away from PostgreSQL.
- Run analytics queries on large datasets by distributing computation across multiple machines in parallel.
- Handle high-speed time-series or IoT data ingestion that a single PostgreSQL server cannot keep up with.
- Use columnar storage to compress wide tables and speed up queries that only read a few columns.