gitmyhub

clip-as-service

Python ★ 13k updated 2y ago

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

A Python client-server tool that converts images and text into numerical embeddings using the CLIP model, letting you compare images to text captions and rank which description best matches a photo.

Python · PyTorch · ONNX Runtime · TensorRT · gRPC · setup: hard · complexity 3/5

CLIP-as-service is a Python tool that turns images and text into numerical vectors (called embeddings) and compares them to each other. It is built around CLIP, a model developed by OpenAI and trained to match images with natural-language descriptions of them. The practical result is that you can give it an image and several text captions, and it will rank the captions by how well they describe the image.
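To make "comparing embeddings" concrete, here is a minimal sketch of ranking captions against an image by cosine similarity, the standard way CLIP embeddings are compared. It assumes the vectors have already been produced by the model; the three-dimensional vectors and captions are made up for illustration, and none of this is the project's own client API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_vec, caption_vecs):
    """Return (caption, score) pairs sorted from best to worst match."""
    scored = [(caption, cosine_similarity(image_vec, vec))
              for caption, vec in caption_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional vectors; real CLIP embeddings have hundreds of dimensions.
image = [0.9, 0.1, 0.2]
captions = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.9, 0.3],
}

for caption, score in rank_captions(image, captions):
    print(f"{score:.3f}  {caption}")
```

Cosine similarity ignores vector length and only measures direction, which is why it works for comparing an image vector with text vectors produced by two different encoders.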

The system is split into a server component and a client component, each installed as a separate Python package. You start the server on a machine with access to a GPU, and the client connects to it to send images or text and receive embeddings or rankings back. The server supports three runtimes: standard PyTorch, ONNX Runtime for more efficient inference, and TensorRT, NVIDIA's inference engine, for the highest throughput.
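The server/client split can be pictured with a toy, standard-library-only sketch: one process owns the "model" and answers requests, and a client sends it text over the network and gets vectors back. Everything here is hypothetical. The /encode route, the JSON payload shape, and fake_embed (which counts characters instead of running CLIP) are stand-ins, and the real project uses its own gRPC, HTTP, or WebSocket protocol rather than this one.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_embed(text):
    """Hypothetical stand-in for the model: 4 simple character statistics."""
    return [len(text), text.count(" "),
            sum(c.isupper() for c in text), sum(c.isdigit() for c in text)]


class EncodeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/encode":  # hypothetical route, not the real server's API
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        texts = json.loads(self.rfile.read(length))["texts"]
        body = json.dumps({"embeddings": [fake_embed(t) for t in texts]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_server(port=0):
    """Run the server in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), EncodeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def encode(server_url, texts):
    """Client side: POST texts to the server and return the embeddings it computed."""
    payload = json.dumps({"texts": texts}).encode()
    request = urllib.request.Request(server_url + "/encode", data=payload,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embeddings"]


if __name__ == "__main__":
    server = start_server()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(encode(url, ["A photo of 3 berries", "hello"]))
    server.shutdown()
    server.server_close()
```

Keeping the model behind a server means the weights are loaded onto the GPU once, while any number of lightweight clients can share them.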

Requests can be sent over gRPC, HTTP, or WebSocket, with optional TLS encryption. The client supports async (non-blocking) requests, which the README describes as designed for large amounts of data or long-running tasks. The server can also scale horizontally and run multiple CLIP model replicas on a single GPU with automatic load balancing.
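The async client and the replica feature fit together: a large input is split into batches, the batches are sent concurrently, and they are spread across replicas. The sketch below simulates that with asyncio. Round-robin assignment is a simple policy swapped in for illustration (the README only says load balancing is automatic), and fake_replica, encode_all, and the batch size of 3 are hypothetical.

```python
import asyncio
import itertools


async def fake_replica(replica_id, batch):
    """Hypothetical model replica: waits briefly, then 'embeds' each item as its length."""
    await asyncio.sleep(0.01)
    return [(replica_id, len(item)) for item in batch]


def batched(items, size):
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def encode_all(items, num_replicas=2, batch_size=3):
    """Send all batches concurrently, assigning replicas round-robin."""
    replicas = itertools.cycle(range(num_replicas))
    tasks = [fake_replica(next(replicas), batch) for batch in batched(items, batch_size)]
    results = await asyncio.gather(*tasks)  # gather returns results in task order
    return [row for batch_result in results for row in batch_result]


if __name__ == "__main__":
    texts = ["berry", "three berries", "four berries", "a dog", "a car"]
    for replica_id, value in asyncio.run(encode_all(texts)):
        print(replica_id, value)
```

Because asyncio.gather preserves task order, the flattened results line up with the original inputs even though the batches may finish in any order.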

The README demonstrates a few use cases: generating embeddings for images and sentences, and visual reasoning, where a question about an image is posed as a set of competing text descriptions. For example, you can send an image of berries with the captions "this is a photo of three berries" and "this is a photo of four berries," and the model returns a confidence score for each.
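One common way to turn per-caption similarities into confidence scores that sum to 1 is a softmax, as in CLIP's zero-shot classification. This sketch assumes a temperature scale of 100 and uses made-up similarity values for the two berry captions; it illustrates the idea and is not the service's exact scoring code.

```python
import math


def softmax(scores, scale=100.0):
    """Turn raw similarity scores into probabilities that sum to 1.

    `scale` acts as an inverse temperature; 100.0 is an assumed value.
    Subtracting the maximum first keeps math.exp from overflowing.
    """
    scaled = [s * scale for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical cosine similarities between one berry image and two captions.
captions = ["this is a photo of three berries", "this is a photo of four berries"]
similarities = [0.31, 0.29]

for caption, p in zip(captions, softmax(similarities)):
    print(f"{p:.3f}  {caption}")
```

Because the scores are scaled before the softmax, a small gap in raw similarity (0.31 versus 0.29 here) becomes a clear preference (about 0.88 versus 0.12).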

Both packages are installed with pip, as clip-server and clip-client. The server can also be hosted on Google Colab using its free GPU resources.

Where it fits