gitmyhub

Megatron-LM

Python ★ 17k updated 15h ago

Ongoing research training transformer models at scale

NVIDIA's Python library for training very large AI language models, from 2 billion to hundreds of billions of parameters, across thousands of GPUs simultaneously, using advanced parallelism built for research and production scale.

Python · PyTorch · CUDA · setup: hard · complexity 5/5

Megatron-LM is a GPU-optimized Python library from NVIDIA for training very large transformer models — the class of AI architectures that powers modern large language models. It is designed for research teams and ML engineers who need to train models ranging from 2 billion to hundreds of billions of parameters across thousands of GPUs simultaneously.

The repository contains two main components. Megatron-LM is the higher-level reference implementation with pre-configured training scripts, useful for learning or experimentation. Megatron Core is the lower-level, composable library that framework developers can use to build custom training pipelines.

The core technical challenge it solves is distributing model training efficiently across many GPUs. It does this by combining several parallelism strategies: tensor parallelism (splitting individual operations across GPUs), pipeline parallelism (splitting model layers across GPUs), and data parallelism (running the same model on different data batches in parallel). It also supports mixed-precision training, which uses lower-precision number formats such as FP8 and BF16 to speed up computation. According to the reported benchmarks, it achieves up to 47% Model FLOP Utilization (MFU, the fraction of the hardware's theoretical peak throughput spent on useful model computation) on H100 GPU clusters, in tests that scale up to a 462-billion-parameter model on 6,144 GPUs.
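To make the arithmetic behind these strategies concrete, here is a minimal sketch in plain Python. It does not use Megatron-LM's API. It assumes only the three parallelism dimensions named above are in use, so the GPU count is the product of the tensor-, pipeline-, and data-parallel sizes. The names `ParallelLayout`, `plan_layout`, and `model_flop_utilization` are hypothetical. The tensor and pipeline sizes (8 and 16) and the throughput figures are illustrative round numbers, not the configuration or measurements behind the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical container for a three-way parallel layout."""
    tensor_parallel: int    # GPUs that share each layer's individual operations
    pipeline_parallel: int  # consecutive stages the stack of layers is split into
    data_parallel: int      # full model replicas, each fed a different data batch


def plan_layout(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> ParallelLayout:
    """Derive the data-parallel size from the GPU count and the model-parallel sizes.

    Assumes world_size == tensor_parallel * pipeline_parallel * data_parallel.
    """
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be divided into replicas of {gpus_per_replica} GPUs"
        )
    return ParallelLayout(tensor_parallel, pipeline_parallel, world_size // gpus_per_replica)


def model_flop_utilization(achieved_tflops: float, peak_tflops: float) -> float:
    """MFU: the share of the hardware's peak throughput spent on model computation."""
    return achieved_tflops / peak_tflops


# Illustrative values only: 6,144 GPUs with assumed tensor and pipeline sizes.
layout = plan_layout(world_size=6144, tensor_parallel=8, pipeline_parallel=16)
print(layout.data_parallel)  # 48 model replicas of 128 GPUs each

# Illustrative round numbers, not measured or vendor-specified figures.
mfu = model_flop_utilization(achieved_tflops=470.0, peak_tflops=1000.0)
print(f"{mfu:.0%}")  # 47%
```

Each replica here spans 128 GPUs, so tensor and pipeline parallelism determine how large a model fits, while data parallelism determines how many batches are processed at once.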

You would use Megatron-LM if you are training or fine-tuning large language models at research or production scale and need tooling designed to work across large GPU clusters.

Where it fits