document-copilot

Python · ★ 54 · updated 13d ago

An AI chatbot that lets you ask plain-English questions about a collection of uploaded PDFs and get sourced answers, built with FastAPI, React, Supabase, and OpenAI, demonstrated on SEC financial filings.

Tags: Python, FastAPI, React, TypeScript, Supabase, PostgreSQL, pgvector, OpenAI
Setup: moderate
Complexity: 3/5

Document Copilot is an AI chatbot that lets users ask questions about a collection of documents in plain English and receive answers with source citations. The use case described in the README is a fictional investment research firm whose analysts spend significant time reading SEC financial filings (10-Ks and 10-Qs) before producing any original analysis. The chatbot is meant to take over that reading work and surface relevant information on demand.

The project uses a Python backend built with FastAPI, a React frontend, and a Supabase-hosted PostgreSQL database that stores users, chats, uploaded documents, and document chunks. When a document is ingested, it is split into chunks, each chunk is converted into a numerical vector (an embedding) using OpenAI's API, and the vectors are stored in PostgreSQL using the pgvector vector search extension. When a user asks a question, the system finds the most relevant chunks through a combination of vector similarity and standard text search, then passes them to an OpenAI language model to compose an answer.
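The two mechanical steps in that flow, splitting a document into chunks and merging the vector and text search results, can be illustrated with a small sketch. This is not taken from the repository: the function names, the character-based chunk size and overlap, and the use of reciprocal rank fusion to combine the two result lists are all assumptions chosen for illustration. The summary above does not say how the project actually chunks text or merges rankings.

```python
from collections import defaultdict


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly.

    Hypothetical helper: the size and overlap defaults are illustrative only.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into a single ranking.

    Each list contributes 1 / (k + rank) for every id it contains, so ids
    that rank well in both the vector search and the text search rise to
    the top. This fusion method is an assumption, not the project's
    documented approach.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score, highest first; break ties by id for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

In a setup like this, the two input rankings would come from a pgvector similarity query and a PostgreSQL full-text query, and the top few fused chunk ids would be looked up and passed to the language model as context.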

A helper script is included to download a small set of real SEC filings from EDGAR, the public US financial disclosure database. By default it fetches recent 10-K filings for five large US companies and saves them locally for use as sample data during development.

The frontend is built with Vite, React, and TypeScript. Authentication is handled through Supabase's email-based auth system. The application is designed to be hosted on Railway.

Setting up the project requires Python 3.12 or later, Node.js, and active accounts with Supabase and OpenAI. Setup guides for the backend, frontend, and database are included in the repository's docs folder.

Where it fits

Build an internal tool that lets analysts query a library of SEC filings and receive cited answers without reading each document manually.
Create a document chatbot for any PDF collection: upload files, ask questions, and get answers with references to the source sections.
Set up a hybrid search system that combines vector similarity and standard text search for more accurate retrieval across large document sets.
