YaLM-100B

Python ★ 3.8k updated 2y ago

Pretrained language model with 100B parameters

YaLM 100B is a large text-generating neural network built by Yandex and released for public use by developers and researchers. It works similarly to other GPT-style models: given some text as input, it predicts and outputs the next words. The model can handle both English and Russian, reflecting the bilingual makeup of the data it was trained on.

Training this model was a significant undertaking. Yandex trained it on a cluster of 800 A100 graphics cards for about 65 days, processing roughly 1.7 terabytes of text drawn from web pages, books, news, social media, and Wikipedia. About a quarter of that data came from an English dataset called The Pile; the rest was Russian text, filtered and deduplicated to remove junk, repetitive content, and low-quality pages.
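To put those training figures in perspective, the short calculation below turns them into GPU-days, GPU-hours, and an approximate language split. It uses only the numbers quoted above. The exact 25% English share is an assumption standing in for "about a quarter", so treat the results as rough estimates rather than official statistics.

```python
# Figures quoted in the summary above.
NUM_GPUS = 800
TRAINING_DAYS = 65
CORPUS_TB = 1.7
ENGLISH_SHARE = 0.25  # assumption: "about a quarter" taken as exactly 25%

# Total compute expressed as GPU time.
gpu_days = NUM_GPUS * TRAINING_DAYS
gpu_hours = gpu_days * 24

# Approximate split of the corpus by language.
english_tb = CORPUS_TB * ENGLISH_SHARE
russian_tb = CORPUS_TB - english_tb

print(f"GPU-days:     {gpu_days:,}")
print(f"GPU-hours:    {gpu_hours:,}")
print(f"English text: ~{english_tb:.3f} TB")
print(f"Russian text: ~{russian_tb:.3f} TB")
```

In other words, the run consumed on the order of 52,000 GPU-days, or roughly 1.25 million GPU-hours.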

Using the model requires serious hardware. The weights alone take up 200 gigabytes of disk space, and running inference requires multiple graphics cards totaling around 200 gigabytes of GPU memory. The repository includes shell scripts to download the weights, pull a pre-built Docker container, and start generating text without having to configure the environment from scratch.
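The 200 gigabyte figure can be sanity-checked with simple arithmetic. The sketch below assumes the weights are stored as 16-bit values (2 bytes per parameter), which is an assumption made here for illustration. The GPU counts it prints are lower bounds only: they ignore activation memory and any constraints on how the model can be split across devices. The function names are hypothetical helpers, not part of the repository.

```python
import math

PARAMS = 100e9        # 100B parameters, from the summary above
BYTES_PER_PARAM = 2   # assumption: 16-bit (half-precision) weights


def weights_gb(params: float = PARAMS, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def min_gpus(gpu_memory_gb: float, total_gb: float) -> int:
    """Smallest number of GPUs whose combined memory holds total_gb.

    This is a lower bound: it ignores activations and parallelism constraints.
    """
    return math.ceil(total_gb / gpu_memory_gb)


total = weights_gb()
print(f"Estimated weight size: {total:.0f} GB")
for memory_gb in (32, 40, 80):
    print(f"{memory_gb} GB cards: at least {min_gpus(memory_gb, total)}")
```

In practice more cards than this minimum are likely to be needed, since inference also has to hold intermediate state.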

Once set up, you can interact with the model in several ways: type prompts directly from the command line for immediate responses, feed it a file of inputs for conditional generation using sampling or greedy decoding, or let it generate text freely without any prompt at all. Each mode corresponds to a ready-made example script in the repository.
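The difference between greedy decoding and sampling can be illustrated on a toy example. The sketch below is not the repository's code: the three-word vocabulary, the scores, and the function names are all made up for illustration. Greedy decoding always takes the highest-scoring token, while sampling draws a token at random in proportion to its probability.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always choose the index with the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=1.0, rng=random):
    """Sampling: draw an index at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


vocab = ["cat", "dog", "fish"]  # toy vocabulary (hypothetical)
logits = [2.0, 1.0, 0.1]        # made-up next-token scores

print("greedy: ", vocab[greedy_pick(logits)])

rng = random.Random(0)  # seeded so repeated runs give the same draws
print("sampled:", [vocab[sample_pick(logits, rng=rng)] for _ in range(5)])
```

Greedy decoding gives the same continuation every time for a given prompt, whereas sampling trades that repeatability for more varied text.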

The code here is not the original training code. It is a lightly modified version of an example from the DeepSpeed project, adapted just enough to load and run Yandex's trained weights. The model weights and a companion vocabulary file are hosted on Hugging Face and can be downloaded via the included script or by cloning the Hugging Face repository directly. The model is released under the Apache 2.0 license, which allows both research and commercial use.
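Before starting a download of this size, it can help to confirm that the target disk has room for the roughly 200 gigabytes of weights mentioned earlier. The sketch below uses only the Python standard library. The helper names and the default path are hypothetical, and the check is separate from the repository's own download script.

```python
import shutil

REQUIRED_GB = 200  # weight size quoted earlier in this summary


def free_gb(path: str = ".") -> float:
    """Free space on the filesystem containing path, in gigabytes (1e9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """True if the filesystem at path has at least required_gb free."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"Free space: {free_gb():.0f} GB")
    print(f"Enough room for the weights: {has_room()}")
```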
