oneflow

C++ ★ 9.4k updated 6mo ago

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

OneFlow is a deep learning framework that trains AI models across multiple GPUs or machines automatically, with a PyTorch-compatible API and a Global Tensor abstraction that hides distributed complexity.

C++ · Python · CUDA · Docker · pip · setup: hard · complexity 4/5

OneFlow is a deep learning framework built for training AI models at scale. It is designed to feel familiar to anyone who has used PyTorch, but adds built-in support for running training jobs across many machines or GPUs at once. The framework is developed by OneFlow Inc in collaboration with Zhejiang Lab, and is released under the Apache 2.0 license.

The main feature that sets OneFlow apart from other deep learning frameworks is Global Tensor, an abstraction that lets a developer write model code as if all the data fit on one device while the framework spreads the work across multiple GPUs or machines automatically. There is also a Graph Compiler that can speed up a model or prepare a trained model for deployment, similar to how some other frameworks offer a static computation mode alongside their normal dynamic one.
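To make the Global Tensor idea concrete, here is a minimal conceptual sketch in plain Python. It is not OneFlow's API: `ToyGlobalTensor`, `split_evenly`, and the two simulated devices are hypothetical names invented for illustration. It only shows the general pattern of one logical tensor backed by per-device shards, where calling code never deals with the shards directly.

```python
# Conceptual sketch only: plain Python, NOT OneFlow's API.
# ToyGlobalTensor and split_evenly are hypothetical names for illustration.
from dataclasses import dataclass
from typing import Callable, List


def split_evenly(values: List[float], num_devices: int) -> List[List[float]]:
    """Split a list into num_devices contiguous shards of near-equal size."""
    base, extra = divmod(len(values), num_devices)
    shards, start = [], 0
    for rank in range(num_devices):
        # The first `extra` ranks take one additional element.
        size = base + (1 if rank < extra else 0)
        shards.append(values[start:start + size])
        start += size
    return shards


@dataclass
class ToyGlobalTensor:
    """One logical 1-D tensor whose data lives in per-device shards."""
    shards: List[List[float]]

    @classmethod
    def from_list(cls, values: List[float], num_devices: int) -> "ToyGlobalTensor":
        return cls(split_evenly(list(values), num_devices))

    def map(self, fn: Callable[[float], float]) -> "ToyGlobalTensor":
        # Each simulated device transforms only its own shard.
        return ToyGlobalTensor([[fn(v) for v in shard] for shard in self.shards])

    def sum(self) -> float:
        # Per-device partial sums, followed by a cross-device reduction.
        return sum(sum(shard) for shard in self.shards)

    def to_list(self) -> List[float]:
        # Reassemble the logical tensor in its original order.
        return [v for shard in self.shards for v in shard]


# The calling code reads like single-device code; sharding stays hidden.
x = ToyGlobalTensor.from_list([1.0, 2.0, 3.0, 4.0, 5.0], num_devices=2)
y = x.map(lambda v: v * 2.0)
total = y.sum()
```

The last three lines are the point of the sketch: the caller works with `x` and `y` as whole tensors, while the split across devices is an internal detail. In a real framework the shards would live on separate GPUs or machines and the reduction would be a collective communication step rather than a Python loop.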

The simplest way to install OneFlow is a single pip install command, with separate packages for CPU-only use and for systems with NVIDIA GPU hardware. Docker images are also available with everything pre-configured. Building from source is possible but requires a Linux system, specific CUDA versions for GPU support, and a handful of system libraries.

The README links to two companion libraries: LiBai, for training large Transformer-style models such as BERT and GPT in parallel, and FlowVision, for computer vision tasks. Both are separate repositories maintained by the same team. Documentation, an API reference, and a quickstart guide are available at docs.oneflow.org.

Where it fits

Train large AI models across multiple GPUs or machines using familiar PyTorch-style code with no manual distribution logic.
Speed up a model, or prepare a trained model for production deployment, using the built-in Graph Compiler.
Train large Transformer models like BERT or GPT in parallel using the companion LiBai library.
Run computer vision experiments at scale using the FlowVision companion library.
