gemma_pytorch

Python ★ 5.7k updated 1y ago

The official PyTorch implementation of Google's Gemma models

This repository contains Google's official PyTorch code for running Gemma, a family of AI language models that Google built using the same research behind its larger Gemini models. Gemma models are released with open weights, meaning anyone can download and run them without paying for API access.

Gemma comes in several sizes. The smallest version has 1 billion parameters, while the largest has 27 billion. Smaller models run faster and need less hardware; larger ones tend to give better answers. Some variants handle only text, while others are multimodal, meaning they can accept both text and images as input. There are also instruction-tuned variants that are set up to follow conversational prompts out of the box.

To run a model, download a checkpoint from Kaggle or Hugging Face, then run the inference scripts provided in this repo. The setup uses Docker to manage dependencies: you build an image that packages everything into an isolated environment, then run inference inside a container. Inference can run on a CPU, a GPU, or Google's TPU hardware. An int8 quantized option is available to reduce memory usage on smaller machines.
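To give a feel for why the int8 option matters, here is a rough back-of-the-envelope sketch in Python. It is not code from the repo: the function name `weight_memory_gib` and the dtype table are hypothetical, and it assumes weight memory is simply parameter count times bytes per parameter. Real usage will be higher, since activations, caches, and any quantization scale factors are ignored.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Hypothetical helper for illustration only; not part of gemma_pytorch.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Smallest and largest sizes mentioned in the summary above.
    for name, params in [("1B", 1_000_000_000), ("27B", 27_000_000_000)]:
        for dtype in BYTES_PER_PARAM:
            gib = weight_memory_gib(params, dtype)
            print(f"{name} {dtype}: about {gib:.1f} GiB for weights")
```

Under these assumptions, int8 weights take half the space of bfloat16 weights and a quarter of float32 weights, which is the difference between a model fitting on a smaller machine or not.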

The repo supports both standard PyTorch and a variant called PyTorch/XLA, which is designed to run efficiently on TPUs. Separate Docker files and run scripts are provided for each hardware target, so you pick the one that matches your setup.

If you want to try Gemma without installing anything, Google provides a free Colab notebook linked in the README. Although this is Google's official reference implementation, the repository is not an officially supported Google product.
