gitmyhub

dbt-core

Python ★ 13k updated 17h ago

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Command-line tool for data analysts that turns SQL SELECT statements into clean, tested tables in a data warehouse, automatically working out the order in which the transformations need to run.

Python, SQL, Jinja2 | setup: moderate | complexity: 3/5

dbt (data build tool) is a command-line tool that helps data analysts transform raw data in a warehouse into clean, structured tables ready for analysis. Instead of writing complex scripts or building custom pipelines, analysts write plain SQL SELECT statements, and dbt takes care of turning those statements into actual tables or views in the database.
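To make "turning a SELECT into a table or view" concrete, here is a toy sketch in Python that uses the standard-library sqlite3 module as a stand-in warehouse. It is not how dbt is implemented: the function materialize and the tables raw_orders and paid_orders are hypothetical, and the sketch only shows the general idea of wrapping a bare SELECT in CREATE TABLE ... AS or CREATE VIEW ... AS.

```python
import sqlite3


def materialize(conn: sqlite3.Connection, name: str, select_sql: str, kind: str = "view") -> None:
    """Hypothetical helper: wrap a bare SELECT in DDL to build a table or view."""
    if kind not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {kind}")
    # Drop the previous version so the object is rebuilt from scratch on each run.
    conn.execute(f"DROP {kind.upper()} IF EXISTS {name}")
    # The analyst only wrote the SELECT; the surrounding DDL is generated here.
    conn.execute(f"CREATE {kind.upper()} {name} AS {select_sql}")


# Stand-in "raw" data loaded into the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 10.0, "paid"), (2, 5.5, "refunded"), (3, 7.25, "paid")],
)

# The "model": a plain SELECT statement, materialized as a table.
materialize(conn, "paid_orders", "SELECT id, amount FROM raw_orders WHERE status = 'paid'", kind="table")
print(conn.execute("SELECT COUNT(*) FROM paid_orders").fetchone()[0])  # 2
```

Dropping and recreating the object on every run keeps the sketch simple and idempotent; real warehouses and dbt's own materializations handle this in more careful, adapter-specific ways.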

The central concept is a "model": a SQL file built around a SELECT statement that reads from raw tables or from other models. Models refer to each other with the ref() function, and dbt uses those references to build a dependency graph that fixes the run order: if model B depends on model A, dbt runs A first. dbt can also render this graph as a lineage diagram, which helps teams see how data flows through their project.
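The ordering idea can be illustrated with a small Python sketch: scan each model's SQL for ref('...') calls, treat them as edges, and topologically sort the result with the standard-library graphlib module. This is a simplification, not dbt's parser: the regex only matches the plain {{ ref('name') }} form, and the model names below are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('name') }} or {{ ref("name") }}; real Jinja parsing is far more general.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def build_order(models: dict[str, str]) -> list[str]:
    """Return model names so that every model comes after the models it refs."""
    # Map each model to the set of models it depends on (its predecessors).
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    # static_order() yields predecessors first and raises CycleError on a cycle.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical project: three models, each reading from the ones before it.
models = {
    "customer_revenue": (
        "select customer_id, sum(amount) as revenue "
        "from {{ ref('orders_enriched') }} join {{ ref('stg_orders') }} using (order_id) "
        "group by 1"
    ),
    "orders_enriched": "select * from {{ ref('stg_orders') }} where amount > 0",
    "stg_orders": "select * from raw.orders",
}

print(build_order(models))  # ['stg_orders', 'orders_enriched', 'customer_revenue']
```

Because each step in this example has exactly one model whose dependencies are satisfied, the order is deterministic; with independent branches, a topological sort may return any valid interleaving.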

dbt also includes a testing layer so teams can verify that their data meets expectations, for example that a column contains no nulls or that every value in a field is unique. Running tests after each build helps catch data quality problems early.
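dbt's documentation describes a data test as a query that selects failing rows, so a test passes when it returns nothing. The Python sketch below imitates that idea for the two checks mentioned above, again using sqlite3 as a stand-in warehouse. The helper names and the customers table are hypothetical, and the generated SQL is only roughly what dbt's built-in not_null and unique tests compile to.

```python
import sqlite3


def not_null_test(table: str, column: str) -> str:
    # Failing rows: any row where the column is NULL.
    return f"SELECT * FROM {table} WHERE {column} IS NULL"


def unique_test(table: str, column: str) -> str:
    # Failing rows: any non-NULL value that appears more than once.
    # NULLs are skipped here; the not_null check covers them separately.
    return (
        f"SELECT {column}, COUNT(*) AS n FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1"
    )


def run_test(conn: sqlite3.Connection, sql: str) -> int:
    """Return the number of failing rows; zero means the test passes."""
    return len(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (2, None)],
)

results = {
    "not_null(email)": run_test(conn, not_null_test("customers", "email")),
    "unique(id)": run_test(conn, unique_test("customers", "id")),
    "not_null(id)": run_test(conn, not_null_test("customers", "id")),
}
for name, failures in results.items():
    print(name, "PASS" if failures == 0 else f"FAIL ({failures})")
# not_null(email) FAIL (1)
# unique(id) FAIL (1)
# not_null(id) PASS
```

Expressing each check as "select the bad rows" keeps tests composable: any new rule is just another query, and the pass condition is always the same empty result.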

The open-source version (dbt Core) runs locally or in CI pipelines. A hosted option (dbt Cloud) adds collaboration features, scheduling, and a web interface. Both use the same model syntax, so it is straightforward to move between them.

The README is brief and links to external documentation for full usage details. An active community exists on Slack and the dbt Community Discourse forum.

Where it fits