gitmyhub

clinical-noshow-prediction-decision-system

Jupyter Notebook ★ 20 updated 17d ago

Predictive clinical AI infrastructure analyzing 110,000+ appointments to mitigate missed medical visits. Features XGBoost, LightGBM, and SHAP explainability for care-team interventions.

A machine learning system that predicts which patients will miss their medical appointments, giving clinic staff early warning to send reminders or adjust schedules. Uses LightGBM and XGBoost models with SHAP explanations so staff can understand why each patient was flagged as high-risk.

Python · Jupyter Notebook · LightGBM · XGBoost · SHAP · Kaggle dataset · setup: moderate · complexity 3/5

This project builds a machine learning system to predict which patients are likely to miss their medical appointments before the appointment day arrives. It works from a dataset of over 110,000 appointment records and is designed to give clinic staff early warning so they can take action, such as sending reminder texts, calling high-risk patients, or adjusting the day's schedule to account for expected absences.
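As a rough illustration of how risk scores could turn into the actions described above, here is a minimal sketch. The function names, the thresholds (0.6 and 0.3), and the appointment IDs are all hypothetical and are not taken from the repository. The expected-absence estimate also assumes the scores are well-calibrated probabilities, which the project summary does not confirm.

```python
def triage(risk_scores, call_threshold=0.6, sms_threshold=0.3):
    """Group appointment IDs by intervention, using hypothetical cutoffs."""
    plan = {"call": [], "sms": [], "none": []}
    for appointment_id, score in risk_scores.items():
        if score >= call_threshold:
            plan["call"].append(appointment_id)   # highest risk: phone call
        elif score >= sms_threshold:
            plan["sms"].append(appointment_id)    # moderate risk: reminder text
        else:
            plan["none"].append(appointment_id)   # low risk: no extra outreach
    return plan


def expected_no_shows(risk_scores):
    """Sum of per-appointment probabilities, assuming calibrated scores."""
    return sum(risk_scores.values())


# Made-up scores for three appointments on one day.
scores = {"A1": 0.72, "A2": 0.35, "A3": 0.10}
plan = triage(scores)
absences = expected_no_shows(scores)
```

In practice the cutoffs would depend on how much outreach capacity the clinic has, so they are left as parameters rather than fixed values.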

The core of the project trains two models, LightGBM and XGBoost, both gradient-boosted decision tree libraries widely used for tabular classification problems like this one. The models take in information about each appointment, including how far in advance it was booked, whether the patient received an SMS reminder, and health markers such as hypertension or diabetes, and output a risk score for the patient skipping the visit.
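The inputs listed above could be assembled into a feature row along these lines. This is only a sketch: the record keys and the build_features helper are hypothetical names chosen for illustration, and the repository's actual column names and preprocessing may differ.

```python
from datetime import date


def build_features(record):
    """Turn one raw appointment record into model inputs (hypothetical schema)."""
    # Lead time: days between booking the appointment and the visit itself.
    lead_days = (record["appointment_day"] - record["scheduled_day"]).days
    return {
        "lead_days": max(lead_days, 0),  # guard against bad dates in the data
        "sms_received": int(record["sms_received"]),
        "hypertension": int(record["hypertension"]),
        "diabetes": int(record["diabetes"]),
    }


# A made-up record, not a row from the actual dataset.
example_record = {
    "scheduled_day": date(2016, 4, 29),
    "appointment_day": date(2016, 5, 13),
    "sms_received": True,
    "hypertension": False,
    "diabetes": False,
}
features = build_features(example_record)
```

Rows shaped like this would then be passed to a classifier, which returns a probability-like score per appointment.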

The project also includes SHAP explanations. SHAP (SHapley Additive exPlanations) shows not just whether the model flagged a patient as high-risk, but how much each input factor pushed that individual appointment's prediction up or down. This matters in a clinical context because staff and administrators generally need to understand the reasoning behind a prediction, not just act on an opaque number.
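To make the idea of per-appointment attributions concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function. It does not use the shap library or a trained model: toy_risk, its coefficients, and the feature names are hypothetical, and "missing" features are filled in from a single baseline appointment, which is a simplification of how SHAP handles background data for tree models.

```python
from itertools import combinations
from math import factorial


def toy_risk(features):
    """Hypothetical hand-written risk score, standing in for a trained model."""
    score = 0.10
    score += 0.02 * min(features["lead_days"], 20)
    if features["sms_received"] == 0 and features["lead_days"] > 7:
        score += 0.15  # long wait with no reminder
    score += 0.05 * features["hypertension"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(instance)
    n = len(names)
    contributions = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = {f: instance[f] if f in subset else baseline[f]
                           for f in names}
                with_feature = dict(without)
                with_feature[name] = instance[name]
                # Marginal effect of adding this feature to the coalition.
                total += weight * (model(with_feature) - model(without))
        contributions[name] = total
    return contributions


patient = {"lead_days": 14, "sms_received": 0, "hypertension": 1}
reference = {"lead_days": 0, "sms_received": 1, "hypertension": 0}
phi = shapley_values(toy_risk, patient, reference)
```

The contributions add up to the difference between the patient's score and the reference score, which is the additivity property that lets staff read each value as that factor's share of the risk.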

The repository includes an exploratory Jupyter notebook for analysis and experimentation, along with separate Python scripts for the data processing, model training, and explanation steps. The dataset itself is not bundled in the repo; the README points to a public Kaggle download.

This appears to be a portfolio and consulting showcase project from a healthcare data scientist rather than a production-ready system. The code is organized cleanly and includes standard installation steps, but some sections of the README read as promotional material directed at potential clinic clients.

Where it fits