
machine-learning-systems-design

HTML · ★ 10k · updated 3y ago

A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems", which is `dmls-book`

A free booklet on designing machine learning systems end-to-end, from project setup through data pipelines, model training, and production deployment, plus 27 open ML interview questions with community answers.

HTML · magicbook · PDF · setup: easy · complexity: 1/5

This repository contains a short booklet written in 2019 on how to design machine learning systems, covering the process from initial project setup through data handling and model training to deployment and maintenance of a working system. The author, Chip Huyen, describes it as an early attempt to document this topic, and notes that her later O'Reilly book, Designing Machine Learning Systems (2022), is a more thorough and current treatment of the same subject.

The booklet follows four stages: setting up the project, building the data pipeline, selecting and training a model, and serving the model in production. Each section links to external resources for deeper reading and includes case studies from machine learning engineers at large tech companies. At the end there are 27 open-ended interview questions on machine learning systems design, with community-contributed answers available in this same repository.

This is not a traditional software tool. It is a document built with magicbook, a package that converts text source files into HTML and PDF output. The repository includes the source content files and build instructions for anyone who wants to contribute edits or additions to the text. Contributing can mean fixing errors, adding resources, or editing the questions and answers.

The README is explicit that this is not the repository for the 2022 O'Reilly book, which has its own separate GitHub repository. Anyone looking for the more current and comprehensive material should consult that book's repository instead. This one remains publicly available as the original draft, along with the community answers it has accumulated.

Where it fits