QASystemOnMedicalKG
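A tutorial and implementation of a disease-centered medical knowledge graph and a QA system built on it. Topics: knowledge graph construction, automatic question answering, KG-based question answering. The project builds a moderately sized medical-domain knowledge graph centered on diseases and uses it to provide automatic question-answering and analysis services.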
A Python project that builds a Chinese medical knowledge graph of 44,000 entities and 300,000 relationships in Neo4j, then uses it to answer natural-language health questions without a large AI model.
This project builds a medical knowledge graph focused on diseases, then uses that graph to power a question-and-answer chatbot. The entire pipeline, from collecting raw data to answering user questions, is built from scratch in Python. The source data comes from Chinese medical websites, and the project is documented primarily in Chinese.
The knowledge graph contains roughly 44,000 medical entities organized into seven categories: diseases, symptoms, drugs, foods, diagnostic checks, hospital departments, and commercially available drug products. These entities are connected by about 300,000 relationships, capturing things like which drugs are commonly prescribed for a given disease, which foods are recommended or should be avoided, which tests are needed for diagnosis, which diseases frequently appear together, and which department a disease belongs to. All of this is stored in a Neo4j graph database, which is a type of database designed specifically for representing connections between things.
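To make the graph structure concrete, here is a minimal sketch of how one entity-relationship-entity fact could be represented as a triple and turned into a Cypher `MERGE` statement. The node labels (`Disease`, `Symptom`), the relationship name (`HAS_SYMPTOM`), the `name` property, and the helpers `Triple`, `quote`, and `triple_to_cypher` are illustrative assumptions, not the project's actual schema or code. The sample row is made up for the example and is not taken from the dataset.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge in the graph: (start node) -[relation]-> (end node)."""
    start_label: str
    start_name: str
    relation: str
    end_label: str
    end_name: str


def quote(value: str) -> str:
    """Escape a string for use as a single-quoted Cypher literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def triple_to_cypher(t: Triple) -> str:
    """Build a statement that creates both nodes and the edge if absent."""
    return (
        f"MERGE (a:{t.start_label} {{name: {quote(t.start_name)}}}) "
        f"MERGE (b:{t.end_label} {{name: {quote(t.end_name)}}}) "
        f"MERGE (a)-[:{t.relation}]->(b)"
    )


if __name__ == "__main__":
    # Hypothetical sample fact, not drawn from the project's data.
    sample = Triple("Disease", "common cold", "HAS_SYMPTOM", "Symptom", "cough")
    print(triple_to_cypher(sample))
```

Using `MERGE` rather than `CREATE` keeps a repeated import from duplicating nodes or edges. Building query text by string formatting is only for illustration; when sending statements to a real database, passing names as query parameters is the safer choice.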
The question-answering system sits on top of the graph. When a user types a question in natural language, the system classifies what type of question it is, then translates it into a graph database query to look up the answer. The supported question types cover most practical medical lookup needs: what are the symptoms of a disease, what might cause a given symptom, what should a patient eat or avoid, what drugs treat a condition, what tests diagnose a condition, how long does treatment typically take, and what is the cure rate.
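The classify-then-query flow described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's implementation: the keyword lists, the question-type names, the Cypher templates, the node labels and relationship names, and the tiny `KNOWN_DISEASES` vocabulary are all hypothetical, and the example uses English keywords where the real system works on Chinese text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical trigger words for each question type.
QUESTION_KEYWORDS = {
    "disease_symptom": ["symptom"],
    "disease_drug": ["drug", "medicine", "medication"],
    "disease_check": ["test", "check"],
}

# Hypothetical Cypher templates; $name is filled in as a query parameter.
QUERY_TEMPLATES = {
    "disease_symptom": (
        "MATCH (d:Disease)-[:HAS_SYMPTOM]->(s:Symptom) "
        "WHERE d.name = $name RETURN s.name"
    ),
    "disease_drug": (
        "MATCH (d:Disease)-[:COMMON_DRUG]->(m:Drug) "
        "WHERE d.name = $name RETURN m.name"
    ),
    "disease_check": (
        "MATCH (d:Disease)-[:NEED_CHECK]->(c:Check) "
        "WHERE d.name = $name RETURN c.name"
    ),
}

# Hypothetical entity vocabulary; a real graph would supply thousands of names.
KNOWN_DISEASES = ["common cold", "asthma"]


def find_disease(question: str) -> Optional[str]:
    """Return the first known disease name mentioned in the question."""
    text = question.lower()
    for disease in KNOWN_DISEASES:
        if disease in text:
            return disease
    return None


def classify(question: str) -> Optional[str]:
    """Return the first question type whose keywords appear in the text."""
    text = question.lower()
    for qtype, keywords in QUESTION_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return qtype
    return None


def build_query(question: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map a question to (Cypher text, parameters), or None if unsupported."""
    qtype = classify(question)
    disease = find_disease(question)
    if qtype is None or disease is None:
        return None
    return QUERY_TEMPLATES[qtype], {"name": disease}


if __name__ == "__main__":
    print(build_query("What are the symptoms of asthma?"))
```

Because each question type maps to a fixed query template, the answers come directly from the graph rather than from generated text, which is why no large language model is needed. Questions that match no keyword or mention no known entity simply return `None`.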
To run the project, start a local Neo4j database, install the Python dependencies, and run the graph-building script to load the data; the import takes several hours because of the data volume. After that, a chat script starts the question-answering interface.
This is a tutorial and research project, not a production medical service. It demonstrates how knowledge graphs can be combined with natural-language question classification to answer structured queries without requiring a large language model.
Where it fits
- Build a question-answering system that looks up which drugs treat a disease or which symptoms might indicate a condition.
- Study how to connect a Neo4j graph database to a natural-language question classifier for structured medical queries.
- Use the knowledge graph as a dataset for Chinese medical NLP research or as a foundation for a medical chatbot.