gitmyhub

medical-data

★ 6.0k updated 2y ago

A curated reference list of publicly available medical datasets for machine learning research, covering medical imaging, electronic health records, and physiological signals, with access instructions for each.

setup: easy · complexity 1/5

This repository is a curated list of medical datasets available for machine learning research. It does not contain data itself; it collects links, descriptions, and access instructions for dozens of publicly available medical collections, with notes on whether a dataset requires registration before downloading.

The list is organized by data type. The medical imaging section covers datasets for cardiac MRI scans, brain MRI, CT scans, retinal photographs, skin lesion images, lung CT images, X-rays, and more. Several of these are well-known research benchmarks: OASIS covers brain MRI for Alzheimer's studies across hundreds of subjects, LIDC is a spiral CT lung image collection built to support cancer detection algorithms, and the ISIC archive contains over 23,000 classified skin lesion images including malignant and benign examples.

Other sections cover electronic health records and clinical data. The MIMIC-III database, which contains de-identified records from tens of thousands of intensive care unit patients, is one of the most widely used datasets in medical machine learning research and appears in the list with notes on how to apply for access. There are also entries for physiological signal data, including electrocardiogram (ECG) recordings and other time-series measurements from wearable sensors and hospital monitors.

The repository is intended as a reference document. Each entry typically includes a brief description of what the dataset contains, a link to the dataset or its homepage, and in some cases a citation for the original research paper that introduced it. The list notes that many datasets, especially those containing patient data, require researchers to apply and agree to usage terms before access is granted.
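To make the entry layout concrete, here is a minimal sketch of how one might model a list entry in code. It assumes a description, a link, an optional citation, and a registration flag. The `DatasetEntry` class, its field names, the `open_access` helper, and the example.org links are hypothetical illustrations, not part of the repository, which is a plain reference document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical record for one list entry; field names are illustrative only."""
    name: str
    description: str
    link: str
    citation: Optional[str] = None        # only some entries cite the original paper
    requires_registration: bool = False   # e.g. patient data behind a usage agreement


def open_access(entries: List[DatasetEntry]) -> List[DatasetEntry]:
    """Return the entries that can be downloaded without applying for access."""
    return [e for e in entries if not e.requires_registration]


if __name__ == "__main__":
    # Placeholder links; real entries point to each dataset's own homepage.
    entries = [
        DatasetEntry("MIMIC-III", "De-identified ICU records",
                     "https://example.org/mimic-iii", requires_registration=True),
        DatasetEntry("example-open-set", "Hypothetical open dataset",
                     "https://example.org/open"),
    ]
    print([e.name for e in open_access(entries)])  # prints ['example-open-set']
```

Filtering on a single boolean mirrors the list's own per-entry note on registration; a real tool would still need to read each dataset's usage terms.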

The README explicitly asks readers to respect the usage restrictions of each listed dataset. The full README is longer than the portion summarized here.

Where it fits