cog
Containers for machine learning
Cog is an open-source tool that automatically packages machine learning models into Docker containers, handles GPU and library configuration, and generates an HTTP API so models can be deployed anywhere.
Cog comes from Replicate and is aimed at machine learning researchers and engineers who need to ship a trained model. The core problem it solves is that getting a trained model running on a server takes a lot of setup unrelated to the model itself: you have to write a Dockerfile, match compatible versions of GPU libraries such as CUDA with frameworks such as PyTorch or TensorFlow, and write a web server to accept requests. Cog handles all of that for you.
Instead of writing a Dockerfile by hand, you describe your environment in a short YAML file, cog.yaml. You tell Cog whether you need a GPU, which system packages are required, which Python version to use, and which file contains your prediction logic. Cog reads this file and builds a Docker image from it, automatically choosing a base image and library versions that are compatible with each other.
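As a rough illustration of the settings listed above, a configuration file might look like the fragment below. The key names follow Cog's cog.yaml format, but the package names, the Python version, and the predict.py:Predictor reference are placeholder values chosen for this sketch, not recommendations.

```yaml
build:
  # Ask Cog for a GPU-capable base image with matching CUDA libraries.
  gpu: true
  # System packages installed into the image (illustrative names).
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"
  # Python version for the image (illustrative).
  python_version: "3.12"
# File and class that hold the prediction logic (hypothetical names).
predict: "predict.py:Predictor"
```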
To define how your model runs, you write a small Python class with two methods: setup, which loads the model into memory once at startup, and predict, which processes each prediction request. Cog reads the input and output types you declare on predict and automatically generates an HTTP API for your model, so other systems can call it by sending a JSON request to a URL. The HTTP server is built on a fast Rust-based framework.
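The two-method class described above can be sketched as follows. This is a minimal illustration, assuming Cog's Python package provides BasePredictor and Input and that the methods are named setup and predict. The "model" here is a hypothetical stand-in (a toy scoring rule), and the try/except shim is not part of Cog; it only lets the sketch run on a machine where Cog is not installed.

```python
try:
    from cog import BasePredictor, Input
except ImportError:
    # Stand-ins (not part of Cog) so this sketch runs without Cog installed.
    class BasePredictor:
        pass

    def Input(default=None, **kwargs):
        return default


class Predictor(BasePredictor):
    def setup(self) -> None:
        """Run once at startup: load the model into memory."""
        # Hypothetical "model": a real predictor would load weights here.
        self.labels = {True: "positive", False: "negative"}

    def predict(
        self,
        text: str = Input(description="Text to classify"),
        threshold: float = Input(description="Decision threshold", default=0.5),
    ) -> str:
        """Run for each request: turn typed inputs into a typed output."""
        # Toy scoring rule standing in for real inference.
        score = min(len(text) / 20.0, 1.0)
        return self.labels[score >= threshold]
```

The type annotations and Input declarations on predict are what Cog would inspect to derive the request and response schema, which is why the sketch keeps every parameter explicitly typed.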
Once built, the Docker image can run on any machine that supports Docker, including your own servers, cloud providers, or the Replicate platform. Before deploying, you can test predictions locally with a single command, cog predict.
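Once a built image is running, the generated API can be called from any HTTP client. The sketch below uses only the Python standard library. It assumes the container is reachable at localhost on port 5000 and that predictions are served at POST /predictions with a JSON body of the form {"input": {...}}. The port, the helper names build_request and predict, and the input field names are illustrative and depend on how you start the container and what your predictor declares.

```python
import json
import urllib.request

# Assumed address of a locally running container; adjust to your port mapping.
PREDICTIONS_URL = "http://localhost:5000/predictions"


def build_request(inputs: dict, url: str = PREDICTIONS_URL) -> urllib.request.Request:
    """Wrap the predictor's inputs in the {"input": ...} envelope."""
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(inputs: dict, url: str = PREDICTIONS_URL) -> dict:
    """Send one prediction request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(inputs, url)) as response:
        return json.load(response)
```

With a container running, a call such as predict({"text": "hello"}) would return the decoded response body; the field names must match the parameters your own predict method declares.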
Cog runs on macOS, Linux, and Windows 11 via WSL. You can install it with Homebrew on macOS, with a shell script, or by downloading a binary directly from the GitHub releases page. The project was created by former engineers from Docker and Spotify, and the repository includes a guide for contributors.
Where it fits
- Package a trained machine learning model into a Docker container without writing a Dockerfile or web server code.
- Build a deployable image for a PyTorch or TensorFlow model with a single command, then run it on Replicate's platform or any cloud provider.
- Expose a model as an HTTP API automatically by defining input and output types in a small Python prediction class.
- Test model predictions locally with a single CLI command before deploying anywhere.