chatgpt-retrieval-plugin
The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.
A self-hosted FastAPI service that gives ChatGPT or any AI assistant access to your own documents by storing and searching them with vector embeddings. You control where the data lives.
The ChatGPT Retrieval Plugin is a backend service that lets ChatGPT and similar applications search through your personal or work documents by asking questions in plain language. Instead of relying only on what an AI model already knows, the plugin gives the model access to your own files and returns the most relevant snippets when you ask about them. It is a standalone retrieval backend you can plug into Custom GPTs, the function-calling and Assistants APIs, or the older (now deprecated) ChatGPT plugins model.
The core idea is semantic search. When you upload documents, the plugin breaks each one into smaller chunks of text, turns each chunk into a numerical embedding using an OpenAI embedding model, and stores those embeddings in a vector database. When you ask a question, the question is also turned into an embedding, and the plugin finds the chunks whose embeddings are closest to it in the vector space, which serves as a proxy for closeness in meaning. The matching text is returned so the model can ground its answer in your actual content. You get more granular control than with the built-in file-upload features in ChatGPT. For example, you can tune chunk length and pick the embedding model.
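The chunk, embed, store, and query flow described above can be sketched in a few lines of Python. This is a toy illustration, not the plugin's code: `chunk_text`, `toy_embed`, and `ToyVectorStore` are hypothetical names, and the bag-of-words vector stands in for a real OpenAI embedding, so it only captures word overlap rather than meaning.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words (hypothetical helper)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (illustrative only)."""

    def __init__(self) -> None:
        self.items: list[tuple[str, str, Counter]] = []

    def upsert(self, doc_id: str, text: str, max_words: int = 40) -> None:
        # Chunk the document, embed each chunk, and keep all three together.
        for chunk in chunk_text(text, max_words):
            self.items.append((doc_id, chunk, toy_embed(chunk)))

    def query(self, question: str, top_k: int = 3) -> list[tuple[float, str, str]]:
        # Embed the question the same way, then rank chunks by similarity.
        q = toy_embed(question)
        scored = [(cosine(q, emb), doc_id, chunk) for doc_id, chunk, emb in self.items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
store.upsert("hr", "Employees accrue vacation days every month")
store.upsert("it", "Reset your password from the account settings page")
results = store.query("how do I reset my password", top_k=1)
```

In a real deployment the embedding call and the nearest-neighbor search are handled by the embedding model and the vector database, but the shape of the pipeline is the same.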
You would use this if you want a self-hosted retrieval layer behind a Custom GPT or your own assistant, and you care about choosing where the data lives. The repository is structured as a FastAPI server (in the server directory), a datastore layer that handles the different vector database providers, services for chunking, metadata extraction, and PII detection, scripts for processing and uploading documents, example configurations, and a local server setup for testing on your own machine. The plugin manifest and OpenAPI schema live under the .well-known directory.
The backend is written in Python (the quickstart targets Python 3.10) and is installed with Poetry. It needs a bearer token and your OpenAI API key, plus environment variables for the vector database you choose. It supports many vector database providers, including Pinecone, Weaviate, Zilliz, Milvus, Qdrant, Redis, Chroma, Postgres, Supabase, AnalyticDB, LlamaIndex, Elasticsearch, MongoDB Atlas, Azure Cognitive Search, and Azure CosmosDB Mongo vCore. It can also be configured for Azure OpenAI deployments.
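To make the bearer-token requirement concrete, here is a minimal sketch of how a client might build an authenticated search request against a running instance. The `/query` path, the `queries` / `query` / `top_k` payload fields, and the `BEARER_TOKEN` environment variable name are assumptions for illustration; check them against the OpenAPI schema and setup instructions in the repository before relying on them. `build_query_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request


def build_query_request(base_url: str, question: str, top_k: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST request for a semantic search query."""
    # Assumed variable name; the server is expected to check the same token.
    token = os.environ.get("BEARER_TOKEN", "")
    # Assumed payload shape: a list of queries, each with its own top_k.
    payload = {"queries": [{"query": question, "top_k": top_k}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With a server running locally, the request could be sent with `urllib.request.urlopen(build_query_request("http://localhost:8000", "What is our refund policy?"))`, where the host and port are again assumptions about your setup.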
Where it fits
- Build a Custom GPT that can answer questions based on your company's internal documentation.
- Add semantic document search to a customer support bot so it can retrieve relevant help articles.
- Create a personal assistant that searches through your own notes and files with natural language queries.
- Replace a heavy search setup with a lightweight self-hosted retrieval backend for a helpdesk or CRM.