gitmyhub

all-in-rag

Python ★ 8.8k updated 16d ago

🔍 Large Model Application Development in Practice, Part 1: A Full-Stack Guide to RAG Technology. Read online: https://datawhalechina.github.io/all-in-rag/

A ten-chapter Chinese-language tutorial that teaches Python developers to build, end to end, AI question-answering systems that search their own documents using retrieval-augmented generation (RAG).

Python · Docker · setup: moderate · complexity: 3/5

All-in-RAG is a structured Chinese-language tutorial series from Datawhale that teaches developers how to build RAG applications. RAG stands for Retrieval-Augmented Generation, a technique where an AI system looks up relevant information from a knowledge base before generating an answer. This approach lets you build question-answering systems that draw on your own documents rather than relying solely on what a language model learned during training.
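The retrieve-then-generate idea described above can be illustrated with a toy sketch. This is not code from the tutorial: the knowledge base, the function names (`retrieve`, `build_prompt`), and the word-count similarity are all assumptions chosen so the example runs with the standard library alone. A real system would use an embedding model for retrieval and send the final prompt to a language model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_docs):
    """Combine retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base, invented for illustration only.
KNOWLEDGE_BASE = [
    "The office is closed on public holidays.",
    "Refunds are processed within 14 days of a return request.",
    "Support is available by email from Monday to Friday.",
]

question = "How long do refunds take?"
top_docs = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, top_docs)
print(prompt)  # In a real pipeline this prompt would be sent to a language model.
```

The point of the sketch is the ordering: the lookup happens first, and the model only sees the question together with whatever the lookup returned.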

The tutorial is organized into ten chapters covering the full pipeline from start to finish. Early chapters explain the core concepts and walk through a minimal working example in four steps. Later chapters cover data loading and preparation, splitting documents into chunks, turning text into vector representations that can be searched by meaning, storing those vectors in a database, and combining different search strategies to improve result quality. The series also covers converting natural language questions into database queries (Text2SQL), evaluating how well a RAG system performs, and connecting retrieved results to a language model to produce formatted answers.
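Two of the steps listed above, splitting documents into chunks and combining different search strategies, can be sketched briefly. The summary does not say which chunking or fusion methods the tutorial uses, so both choices here are assumptions: fixed-size character chunks with overlap, and reciprocal rank fusion (RRF) to merge a keyword ranking with a vector ranking. The function names, chunk identifiers, and the two input rankings are hypothetical.

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids; each list adds 1 / (k + rank) per id."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Chunking: neighbouring chunks share `overlap` characters.
chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(chunks)

# Fusion: hypothetical results from a keyword search and a vector search.
keyword_ranking = ["c2", "c1", "c3"]
vector_ranking = ["c1", "c3", "c2"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. Rank-based fusion avoids having to compare raw scores from two search methods that use different scales.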

Toward the end, two complete hands-on projects apply all of these pieces together, including an optional extension that uses a knowledge graph to improve retrieval. An additional section lets community members contribute specialized topics.

The intended audience is Python developers with basic programming skills who want to understand and build production-grade RAG systems. Basic familiarity with Docker and Linux commands is listed as a prerequisite. The course is written primarily in Chinese with an English README available, and the full content can be read online through the project documentation site.

Where it fits