gitmyhub

databend

Rust ★ 9.3k updated 4h ago

Data Agent Ready Warehouse: One for Analytics, Search, AI, Python Sandbox. Rebuilt from scratch. Unified architecture on your S3.

An open-source cloud data warehouse built in Rust that queries large datasets on S3, Azure, or GCS with SQL. It also offers AI functions through sandboxed Python scripts, vector search, and Git-like data branching.

Rust · Python · SQL · Docker · setup: moderate · complexity: 4/5

Databend is an open-source data warehouse built in Rust, designed to store and analyze large amounts of data in cloud object storage such as Amazon S3, Azure Blob Storage, or Google Cloud Storage. A data warehouse is a database system built for analytical queries: it is optimized for reading and summarizing large datasets rather than for fast lookups of individual records. Databend handles that kind of workload and also provides vector search and full-text search in the same engine, so you do not need separate systems for those tasks.

One distinctive feature is what the README calls an "agent-ready" architecture. You can write Python functions inside the database using sandbox UDFs (user-defined functions). Those functions run in isolated containers, and you call them from regular SQL queries. The example in the README defines a function that could call an AI model and then runs it over a table of data with a single SQL statement. This lets you combine data processing and AI logic without moving data to a separate application.
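The general pattern of calling a Python function from SQL can be sketched with Python's standard-library `sqlite3` module. This is an analogy only, not Databend's sandbox UDF syntax: SQLite runs the function in-process rather than in an isolated container, and the `reviews` table and the `classify_sentiment` function (a keyword-matching stand-in for a real AI model call) are hypothetical names chosen for illustration.

```python
import sqlite3


def classify_sentiment(text: str) -> str:
    """Hypothetical stand-in for a call to an AI model."""
    lowered = text.lower()
    if "great" in lowered or "love" in lowered:
        return "positive"
    if "bad" in lowered or "broken" in lowered:
        return "negative"
    return "neutral"


def run_demo() -> list[tuple[int, str]]:
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL statements can call it by name.
    conn.create_function("classify_sentiment", 1, classify_sentiment)

    conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO reviews (id, body) VALUES (?, ?)",
        [
            (1, "Love the new dashboard"),
            (2, "Export is broken"),
            (3, "It works"),
        ],
    )

    # One SQL statement applies the Python function to every row.
    rows = conn.execute(
        "SELECT id, classify_sentiment(body) FROM reviews ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

The point of the pattern is that the data stays where it is: the query engine passes each value to the function and collects the results, so no separate application has to export and re-import the table.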

Databend also supports data branching, described as version control for data. You can create a snapshot of production data and let processes run against that snapshot without affecting the live data, much like creating a branch in a code repository.
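The isolation that branching provides can be illustrated with a toy in-memory model. This is a conceptual sketch under simplifying assumptions and says nothing about how Databend implements branches internally: the `Branch` class and its methods are hypothetical, and creating a branch here copies the rows, whereas a real system would typically share storage with the parent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Branch:
    """A base snapshot plus the changes made privately on this branch."""

    base: dict[int, str]  # snapshot taken at branch time; never mutated
    overrides: dict[int, str] = field(default_factory=dict)
    deleted: set[int] = field(default_factory=set)

    def get(self, key: int) -> Optional[str]:
        if key in self.deleted:
            return None
        if key in self.overrides:
            return self.overrides[key]
        return self.base.get(key)

    def put(self, key: int, value: str) -> None:
        self.deleted.discard(key)
        self.overrides[key] = value

    def delete(self, key: int) -> None:
        self.overrides.pop(key, None)
        self.deleted.add(key)

    def rows(self) -> dict[int, str]:
        """Return the rows visible on this branch."""
        merged = {k: v for k, v in self.base.items() if k not in self.deleted}
        merged.update(self.overrides)
        return merged

    def branch(self) -> "Branch":
        """Create a child branch from the rows currently visible here."""
        return Branch(base=self.rows())


production = Branch(base={1: "alice", 2: "bob"})

# An experiment works on its own branch of the production data.
experiment = production.branch()
experiment.put(2, "bob (anonymized)")
experiment.delete(1)
```

After these changes, `production.rows()` still contains both original rows, while `experiment.rows()` contains only the modified row 2. Writes on the branch land in its own `overrides` and `deleted` sets, which is why the live data is unaffected.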

Getting started is quick: there is a Python package you can install with pip for local development, a Docker image for running the full system locally, and a hosted cloud service. The cloud version is described as production-ready in about sixty seconds.

The project is dual-licensed under the Apache License 2.0 and the Elastic License 2.0. An enterprise edition with additional support options is available from the company behind the project.

Where it fits