doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
Apache Doris is an open-source analytical database that runs complex queries across billions of rows in under a second, supporting real-time dashboards, BI tools, and federated queries across data lakes.
Apache Doris is an open-source analytical database built for asking complex questions across very large amounts of data and getting answers back quickly. The README describes it as easy to use, high performance, and real-time, with a goal of returning query results in under a second even when the underlying data is huge. It supports both high-concurrency point queries (many small lookups at once) and high-throughput complex analysis (a few heavy queries that scan large volumes of data).
Doris is built on an MPP (massively parallel processing) architecture, meaning a query is split up and run across many machines at the same time. It uses two kinds of processes: Frontend (FE) nodes, which handle user requests, parse and plan queries, and manage metadata; and Backend (BE) nodes, which store the data and execute queries. Data is partitioned into shards, and each shard is copied across multiple BE nodes for reliability. Multiple FE nodes can be deployed for disaster recovery, organized in Master, Follower, and Observer roles. Doris speaks the MySQL protocol and supports standard SQL, so you can connect with familiar clients and BI tools.
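The split-and-merge idea behind an MPP query can be sketched in a few lines of Python. This is a toy illustration only, not Doris code: the shard count, the `shard_for`, `partition`, `partial_sum`, and `run_query` names, and the sample rows are all hypothetical, and threads stand in for separate BE machines. It shows rows being hash-partitioned into shards, each shard being aggregated independently, and a coordinating step merging the partial results, roughly the roles described for BE and FE nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard using a stable hash (same key, same shard)."""
    return crc32(key.encode("utf-8")) % num_shards


def partition(rows, num_shards: int = NUM_SHARDS):
    """Distribute (user, amount) rows into shards by hashing the user."""
    shards = [[] for _ in range(num_shards)]
    for user, amount in rows:
        shards[shard_for(user, num_shards)].append((user, amount))
    return shards


def partial_sum(shard):
    """'Backend' step: aggregate only the rows held in this shard."""
    totals = Counter()
    for user, amount in shard:
        totals[user] += amount
    return totals


def run_query(rows, num_shards: int = NUM_SHARDS):
    """'Frontend' step: fan the work out to all shards, then merge."""
    shards = partition(rows, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(partial_sum, shards))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values per key
    return dict(merged)


# Toy equivalent of: SELECT user, SUM(amount) FROM events GROUP BY user
rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]
print(run_query(rows))
```

Because the rows are partitioned on the grouping key, each shard can produce its totals without seeing any other shard's data, and the merge step is a cheap combination of small partial results. A real engine also handles replication, failures, and query planning, none of which this sketch attempts.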
You would reach for Apache Doris when you need a unified analytics platform: one system for real-time dashboards, ad-hoc BI queries, user behavior and A/B test analysis, log and event analysis, and queries over data sitting in data lakes such as Apache Hive, Apache Iceberg, or Apache Hudi. It also supports federated queries that join data across multiple sources, which the project pitches as a way to eliminate data silos.
Doris is an Apache Software Foundation project released under the Apache License 2.0, with Java as the primary language.
Where it fits
- Build a real-time analytics dashboard that queries billions of rows and returns results in under a second.
- Run ad-hoc BI queries across your data warehouse without a separate ETL step.
- Analyze user behavior and A/B test results by joining event streams as they arrive.
- Query data sitting in Apache Hive or Iceberg data lakes without moving it into Doris first.