aistore

AIStore: scalable storage for AI applications

AIStore, often called AIS, is a distributed storage system designed specifically for the large datasets that machine learning and AI workloads require. Where general-purpose storage systems treat all data the same, AIS is built to deliver data to training jobs quickly and at high volume, with performance that stays consistent as the cluster grows.

The system can pull data from cloud storage providers such as Amazon S3, Google Cloud Storage, and Azure, as well as from on-premises storage. It works either alongside those cloud systems or as a standalone cluster. Unlike a simple cache, AIS treats remote data as a first-class part of the system rather than as a temporary copy. You can run it on a single Linux laptop for testing or scale it to a cluster of hundreds of servers for production use. For production deployments on Kubernetes, the project provides an operator, Helm charts, and Ansible playbooks.

For working with data, AIS provides more than thirty batch operations including copying buckets, transforming data on read, downloading large files in chunks, and running distributed sort jobs. It supports reading and writing standard archive formats like TAR and ZIP, which is useful when training data is organized as many small files packed together. There is a command-line tool for managing clusters, monitoring jobs, and running performance reports.
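To illustrate why archive support matters for datasets made of many small files, here is a minimal sketch that packs a few in-memory samples into a single TAR shard and lists its members again. It uses only Go's standard `archive/tar` package, not the AIS API; the helper names `packShard` and `listShard` and the sample file names are hypothetical and chosen for illustration.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"sort"
)

// packShard (hypothetical helper) writes the given in-memory files into one
// TAR archive. Names are sorted so the shard layout is deterministic.
func packShard(files map[string][]byte) ([]byte, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range names {
		data := files[name]
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     name,
			Mode:     0o644,
			Size:     int64(len(data)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	// Close flushes the trailing padding that marks the end of the archive.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listShard (hypothetical helper) returns the member names of a TAR archive
// in the order they are stored.
func listShard(shard []byte) ([]string, error) {
	tr := tar.NewReader(bytes.NewReader(shard))
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	// Illustrative samples only; real shards would hold training data.
	shard, err := packShard(map[string][]byte{
		"sample-0001.txt": []byte("cat"),
		"sample-0001.jpg": []byte("fake image bytes"),
	})
	if err != nil {
		panic(err)
	}
	names, err := listShard(shard)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(names), "members:", names)
}
```

Packing samples this way turns thousands of tiny objects into a few large sequential reads, which is the access pattern that archive-aware storage is meant to serve well.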

Developers can connect to AIS using its own API, a Python library, a Go library, or the standard Amazon S3 API, so existing S3 clients can work without code changes. There is also a PyTorch integration with ready-made dataset classes and data loaders for use in training pipelines.

The project is licensed under MIT and was created by NVIDIA.