
data-engineering-zoomcamp

Jupyter Notebook · ★ 43k · updated 11d ago

Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

Free nine-week course teaching data pipeline fundamentals: Docker, Terraform, workflow orchestration, BigQuery, dbt, Spark, and Kafka for aspiring data engineers.

Python · SQL · Docker · Terraform · Kestra · Google BigQuery · Apache Spark · Apache Kafka · setup: hard · complexity 3/5

Data Engineering Zoomcamp is a free nine-week online course that teaches the fundamentals of building data pipelines from scratch. Data engineering is the discipline of designing and building the systems that collect, move, transform, and store data so that it can be used for analysis and machine learning. The course addresses the gap that many aspiring data professionals face: they know how to write SQL or Python but do not have hands-on experience with the production infrastructure tools that real data jobs require.
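As a rough illustration of the collect, transform, and store stages mentioned above, here is a toy sketch that uses only the Python standard library. It is not taken from the course materials: the sample data, the `trips` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are all hypothetical, and a production pipeline would replace each stage with the tools the course covers.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a collected source file.
RAW_CSV = """trip_id,distance_km,fare
1,2.5,7.50
2,,4.00
3,10.0,23.25
"""


def extract(text):
    """Collect: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing distance and cast column types."""
    cleaned = []
    for row in rows:
        if not row["distance_km"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["trip_id"]), float(row["distance_km"]), float(row["fare"]))
        )
    return cleaned


def load(rows, conn):
    """Store: write cleaned rows into a SQLite table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips "
        "(trip_id INTEGER PRIMARY KEY, distance_km REAL, fare REAL)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run all three stages, then return (row count, total fare) for analysis."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*), SUM(fare) FROM trips").fetchone()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection))
```

Keeping each stage as a separate function mirrors how larger pipelines are usually split, so that a scheduler can retry or monitor one step without rerunning the others.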

The course is structured as seven modules followed by a final project. The first module covers containerization with Docker and infrastructure provisioning with Terraform, tools for packaging software and for managing cloud resources consistently. Module two teaches workflow orchestration, the practice of scheduling and monitoring data pipelines, using Kestra. Later modules cover data warehousing in Google BigQuery; analytics engineering with dbt, a tool for transforming data inside a warehouse using SQL; batch processing with Apache Spark for large-scale distributed computation; and streaming with Apache Kafka for real-time event processing. Each module includes homework assignments, and the course ends with a capstone project in which students build a complete end-to-end pipeline.

The course suits learners who have basic Python and SQL knowledge and want practical experience with the tools used in industry data engineering roles. It runs in cohorts starting each January, but all materials, including Jupyter Notebooks, lecture videos, and homework, are freely available for self-paced study. The repository consists primarily of Jupyter Notebooks alongside code and configuration files.

Where it fits