flyte

Go ★ 7.1k updated 2h ago

Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows.

Flyte lets you build machine learning pipelines by decorating Python functions, then automatically runs, retries, and distributes those steps across cloud compute.

Python · Go · FastAPI · setup: hard · complexity 4/5

Flyte is a Python-based tool for building and running machine learning workflows. If you have a series of steps that need to happen in order, like preparing data, training a model, and then serving predictions, Flyte lets you define those steps as regular Python functions and then coordinates running them, potentially across many machines at the same time.

The core idea is that you apply a decorator to your Python functions, which tells Flyte to treat them as tasks. Flyte then handles scheduling, retrying failed steps, tracking what ran and when, and distributing work across available compute. This is useful when individual steps take a long time, require a lot of memory or GPU access, or need to run in parallel.
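To make the decorator idea concrete, here is a minimal sketch in plain Python using only the standard library. It is not Flyte's API: the `task` decorator, its `retries` and `delay_seconds` parameters, and the `history` attribute are hypothetical names invented for illustration. It shows how a decorator can wrap an ordinary function so that failures are retried and each attempt is recorded.

```python
import functools
import time


def task(retries=0, delay_seconds=0.0):
    """Hypothetical stand-in for an orchestrator's task decorator (not Flyte's API)."""

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            attempts = retries + 1
            for attempt in range(1, attempts + 1):
                started = time.time()
                try:
                    result = fn(*args, **kwargs)
                except Exception:
                    # Record the failed attempt, then retry or give up.
                    wrapper.history.append((fn.__name__, attempt, "failed", started))
                    if attempt == attempts:
                        raise
                    time.sleep(delay_seconds)
                else:
                    wrapper.history.append((fn.__name__, attempt, "succeeded", started))
                    return result

        # One record per attempt: (task name, attempt number, status, start time).
        wrapper.history = []
        return wrapper

    return decorate


_calls = {"n": 0}


@task(retries=2)
def flaky_prepare(rows):
    # Simulates a step that fails twice before succeeding.
    _calls["n"] += 1
    if _calls["n"] < 3:
        raise RuntimeError("transient failure")
    return [r * 2 for r in rows]


@task()
def train(rows):
    # Placeholder "training" step: the mean of the prepared rows.
    return sum(rows) / len(rows)


def pipeline(rows):
    # Steps run in order; each one is an ordinary decorated function.
    return train(flaky_prepare(rows))


if __name__ == "__main__":
    print(pipeline([1, 2, 3]))
    print(flaky_prepare.history)
```

A real orchestrator would additionally persist this history and schedule each task on separate compute. The sketch only illustrates the wrapping pattern that makes that possible without changing the function bodies.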

Flyte 2, the current version shown in this repository, is designed to run locally for development and connect to cloud infrastructure for production workloads. It includes a command-line tool for running scripts and a text-based interface for monitoring what is happening. Model serving is also supported, with an example showing how to expose a prediction endpoint using FastAPI alongside Flyte.

The open-source backend for Flyte 2 is listed as coming soon at the time of this README. An enterprise-ready hosted version is available through Union.ai, the company that maintains the project. The older Flyte 1 is still maintained on a separate branch.

Flyte is a graduated project under the Linux Foundation's AI and Data program. It is licensed under Apache 2.0 and has a community Slack workspace and GitHub Discussions for support.

Where it fits

Build a multi-step ML pipeline that trains a model and serves predictions with automatic retry on failure.
Run data preprocessing and model training in parallel across multiple machines without managing the compute yourself.
Monitor and debug ML workflow execution history using the built-in text-based monitoring interface.
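The parallel use case above can be sketched locally with the standard library. This is an illustration under stated assumptions, not Flyte code: `preprocess`, `train`, and `run_pipeline` are hypothetical names, and a thread pool on one machine stands in for the multi-machine distribution that Flyte would manage.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(shard):
    # Hypothetical preprocessing step: scale one shard by its maximum value.
    top = max(shard)
    return [value / top for value in shard]


def train(shards):
    # Hypothetical "training" step: average over all preprocessed values.
    values = [v for shard in shards for v in shard]
    return sum(values) / len(values)


def run_pipeline(raw_shards, max_workers=4):
    # Fan out: preprocess every shard concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        prepared = list(pool.map(preprocess, raw_shards))
    # Fan in: training starts only after every shard is ready.
    return train(prepared)


if __name__ == "__main__":
    print(run_pipeline([[1, 2], [2, 4]]))
```

The fan-out and fan-in shape is the part that carries over: independent steps run side by side, and the dependent step waits for all of them.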
