gpt-neox

Python ★ 7.4k updated 10d ago

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

GPT-NeoX is a Python toolkit for training billion-parameter AI language models from scratch on GPU clusters. It is meant for organizations building new models at research scale, not for chatting with existing ones.

Python · PyTorch · DeepSpeed · Megatron-LM · CUDA · Slurm · MPI · setup: hard · complexity 5/5

GPT-NeoX is a Python library built by EleutherAI for training very large language models from scratch on clusters of GPUs. A language model is the kind of AI system that powers tools like ChatGPT: it is trained to generate and understand text. Training one from scratch requires enormous amounts of compute and careful coordination across many machines running in parallel. GPT-NeoX is designed for that process, not for running or chatting with an existing model. The README states that if you are not trying to train a model with billions of parameters from scratch, this is probably the wrong library, and it recommends the Hugging Face transformers library for general inference instead.
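Why one GPU is not enough can be seen with back-of-the-envelope arithmetic. The sketch below assumes two common approximations: a decoder-only transformer has about 12 * num_layers * hidden_size^2 weights in its layers (embeddings ignored), and mixed-precision Adam training keeps about 16 bytes of state per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance). The layer count, width, and 80 GB GPU size are hypothetical illustration values, not GPT-NeoX settings, and activation memory is ignored entirely.

```python
import math

# Assumption: mixed-precision Adam keeps ~16 bytes of state per parameter:
# 2 (fp16 weights) + 2 (fp16 grads) + 4 + 4 + 4 (fp32 master weights, momentum, variance).
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def approx_layer_params(num_layers: int, hidden_size: int) -> int:
    """Rough parameter count for the transformer blocks only (no embeddings).

    Each block has ~4*d^2 attention weights (Q, K, V, output) and ~8*d^2 MLP
    weights (two d x 4d matrices), so ~12*d^2 per layer; biases and norms ignored.
    """
    return 12 * num_layers * hidden_size ** 2


def min_gpus_for_model_states(num_params: int, gpu_memory_gb: float) -> int:
    """Lower bound on GPUs needed just to hold weights, gradients and optimizer state."""
    total_bytes = num_params * BYTES_PER_PARAM
    return math.ceil(total_bytes / (gpu_memory_gb * 1e9))


# Hypothetical model shape in the ~20B-parameter range.
params = approx_layer_params(num_layers=44, hidden_size=6144)
print(f"{params / 1e9:.1f}B parameters")
print(f"{params * BYTES_PER_PARAM / 1e9:.0f} GB of model state")
print(min_gpus_for_model_states(params, gpu_memory_gb=80), "x 80 GB GPUs, before activations")
```

Run as-is, it reports about 19.9B parameters and 319 GB of model state, so even this optimistic lower bound needs 4 such GPUs. Activations, throughput, and total training time push real jobs onto much larger clusters, and spreading this state across GPUs is the job of tools like DeepSpeed's ZeRO partitioning.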

The library builds on two other systems, NVIDIA Megatron-LM and Microsoft DeepSpeed, both of which handle splitting a model across many GPUs and coordinating the training process. GPT-NeoX adds its own optimizations on top of them, along with support for a wider range of hardware configurations and for cluster management tools such as Slurm and MPI. It has been run at scale on cloud providers such as AWS and CoreWeave, and on government supercomputers including systems at Oak Ridge National Laboratory and the LUMI system in Finland.
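One way Megatron-style libraries split a model is tensor (intra-layer) parallelism: a layer's weight matrix is cut into column blocks, each GPU multiplies the full input by its own block, and the partial outputs are gathered. The pure-Python sketch below simulates that on hypothetical "devices" (plain lists) and checks that the gathered result equals the unsplit layer. It illustrates the concept only; it does not use GPT-NeoX, Megatron-LM, or DeepSpeed APIs, and the matrices are made-up values.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split weight matrix w (in_features x out_features) into equal column blocks."""
    out_features = len(w[0])
    assert out_features % num_shards == 0, "out_features must divide evenly"
    size = out_features // num_shards
    return [[row[s * size:(s + 1) * size] for row in w] for s in range(num_shards)]


def column_parallel_linear(x: Matrix, w: Matrix, num_shards: int) -> Matrix:
    """Each simulated device multiplies the full input by its own column shard;
    concatenating the partial outputs along the feature axis stands in for an all-gather."""
    shards = split_columns(w, num_shards)
    partial_outputs = [matmul(x, shard) for shard in shards]  # one per "device"
    return [sum((p[i] for p in partial_outputs), []) for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]                       # batch of 2, in_features = 2
w = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.0, 0.0, 3.0]]  # in_features = 2, out_features = 4
assert column_parallel_linear(x, w, num_shards=2) == matmul(x, w)
print(column_parallel_linear(x, w, num_shards=2))
```

Megatron-LM pairs a column-split layer like this with a row-split layer in the MLP, so the forward pass needs only one all-reduce per block instead of communicating after every matrix multiply.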

The project was used to train several published open-source models, including GPT-NeoX-20B and the Pythia suite. It ships with predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 and 2. More recent additions include Mixture-of-Experts support, AMD GPU support, and preference learning methods for fine-tuning.
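Mixture-of-Experts layers replace a single feed-forward block with several "expert" blocks plus a learned router that sends each token to only a few of them, so the parameter count grows without every token paying for every expert. The sketch below shows one common variant, top-k routing with softmax gate weights renormalized over the selected experts. The toy scalar experts, router scores, and k = 2 are hypothetical, and GPT-NeoX's actual MoE implementation may route, balance load, and enforce capacity differently.

```python
import math
from typing import Callable, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_scores: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax their scores so the gates sum to 1."""
    top = sorted(range(len(router_scores)), key=lambda i: router_scores[i], reverse=True)[:k]
    gates = softmax([router_scores[i] for i in top])
    return list(zip(top, gates))


def moe_forward(token: float, router_scores: List[float],
                experts: List[Callable[[float], float]], k: int = 2) -> float:
    """Gate-weighted sum of the outputs of only the selected experts."""
    return sum(gate * experts[i](token) for i, gate in top_k_route(router_scores, k))


# Hypothetical toy experts acting on a scalar "token".
experts = [lambda t: t + 1.0, lambda t: 2.0 * t, lambda t: -t, lambda t: t * t]
scores = [0.1, 2.0, -1.0, 1.5]  # made-up router logits for this token
print(top_k_route(scores, k=2))            # experts 1 and 3 are selected
print(moe_forward(3.0, scores, experts, k=2))
```

Only experts 1 and 3 run for this token; experts 0 and 2 are skipped entirely, which is where the compute savings come from.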

This is primarily a research and engineering tool for organizations with access to large GPU clusters. It is maintained by EleutherAI, a nonprofit AI research organization.


Where it fits

Train a large language model from scratch on a GPU cluster using predefined configs for Pythia, LLaMA, or Falcon
Fine-tune an existing model using preference learning methods on cloud infrastructure like AWS or CoreWeave
Run a distributed training job on a supercomputer with Slurm integration and MPI coordination
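Distributed jobs like these usually combine data, tensor, and pipeline parallelism, and the three degrees must multiply to the total number of GPU processes the scheduler launches. The sketch below checks that constraint for a hypothetical Slurm allocation and maps a global rank to its (data, pipeline, tensor) coordinates. The node counts, the axis ordering, and the helper names are assumptions for illustration; GPT-NeoX and DeepSpeed build their own process topology from their configuration.

```python
from typing import Tuple


def data_parallel_size(num_nodes: int, gpus_per_node: int,
                       tensor_parallel: int, pipeline_parallel: int) -> int:
    """World size must factor as data x pipeline x tensor parallel degrees."""
    world_size = num_nodes * gpus_per_node
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(f"world size {world_size} is not divisible by tp*pp={model_parallel}")
    return world_size // model_parallel


def rank_coordinates(rank: int, tensor_parallel: int,
                     pipeline_parallel: int) -> Tuple[int, int, int]:
    """Map a global rank to (data, pipeline, tensor) indices.

    Assumed ordering: tensor-parallel ranks are adjacent (fastest-varying), so a
    tensor-parallel group can stay inside one node; then pipeline, then data.
    """
    tp_rank = rank % tensor_parallel
    pp_rank = (rank // tensor_parallel) % pipeline_parallel
    dp_rank = rank // (tensor_parallel * pipeline_parallel)
    return dp_rank, pp_rank, tp_rank


# Hypothetical allocation: 4 nodes x 8 GPUs, tensor parallel 8 (one node), pipeline parallel 2.
dp = data_parallel_size(num_nodes=4, gpus_per_node=8, tensor_parallel=8, pipeline_parallel=2)
print("data parallel degree:", dp)
for rank in (0, 7, 8, 31):
    print(rank, rank_coordinates(rank, tensor_parallel=8, pipeline_parallel=2))
```

In a real Slurm job the node count would come from the allocation (for example the `SLURM_NNODES` environment variable) rather than being hard-coded; keeping tensor-parallel groups within a node is a common choice because that axis communicates most often.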
