gitmyhub

data-engineer-roadmap

★ 13k updated 4y ago

Roadmap to becoming a data engineer in 2021

A visual diagram mapping all the tools, cloud platforms, and concepts you would encounter across a data engineering career, meant as a landscape reference rather than a step-by-step curriculum.

Markdown · setup: easy · complexity 1/5

data-engineer-roadmap is a visual reference guide showing the tools, technologies, and concepts a person is likely to encounter when working as a data engineer. Data engineering is the discipline of building and maintaining the pipelines that move, store, clean, and prepare data so that analysts, dashboards, and machine learning systems can use it. The roadmap attempts to map out that entire field in a single diagram, presented as a large image hosted in the repository.

The roadmap covers the modern data engineering landscape as of 2021, grouping topics across areas such as cloud platforms, data pipeline tools, storage formats, orchestration systems, query engines, and programming languages. A text version of the diagram is included in the repository for users who cannot view the image. There is also a separate extras diagram covering additional tools that are useful to know but not strictly required for most roles.

The README includes a note for beginners: a working data engineer would typically master only a subset of these tools over several years, shaped by the company they work for and the kinds of problems they encounter. The diagram is intended as a map of the overall landscape, not a checklist to complete before getting started.

The README itself is sparse. Its main content is the roadmap image, which the README links to but does not describe in text. The project was created by datastack.tv, a learning platform that produces screencast tutorials for data engineers. Community suggestions and pull requests are welcome.

Where it fits