Code and data for the validation of Predictive AI in Medical Practice.
This repository contains the source code (R and Python) and dataset used in the case study for the paper "Validation studies of predictive AI for use in medical practice: overview and guidance for performance measures", published in The Lancet Digital Health.
This work was developed under the guidance of the STRATOS (STRengthening Analytical Thinking for Observational Studies) consortium.
The paper reviews 32 performance measures for clinical risk prediction models (broadly referred to as Predictive AI) intended for use in medical practice. It provides critical guidance and recommendations on selecting the most appropriate measures for validation studies.
The measures discussed in the paper are illustrated using a real-world case study:
- Context: External validation of a model to estimate the risk of malignancy in patients with an ovarian tumor selected for surgery.
- Source: Landolfo et al., Br J Cancer 2024.
Please note that the code for this study was originally produced in R (PerfMeasuresOverview.R). The Python implementations (PerfMeasuresPython.py and PerfMeasures.ipynb) are replications of the original R code provided for accessibility and convenience.
This repository provides the code to calculate the performance measures and generate the plots shown in the paper. It also includes the anonymized prediction data.
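As a rough illustration of what calculating performance measures from risk estimates and binary outcomes involves, the sketch below computes three commonly used measures in plain Python: the AUROC (discrimination), the Brier score (overall performance), and net benefit at a decision threshold (clinical utility). It is not taken from PerfMeasuresPython.py. The function names, the toy data, and the choice to treat `risk >= threshold` as a positive classification are assumptions made for this example.

```python
def auroc(risks, outcomes):
    """Probability that a random event has a higher risk than a random non-event.

    Ties between an event and a non-event count as 0.5.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("AUROC needs at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def brier_score(risks, outcomes):
    """Mean squared difference between risk estimate and observed outcome."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(risks)


def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating everyone whose risk is at or above the threshold."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Made-up toy data, NOT the case study data (1 = malignant, 0 = benign).
toy_risks = [0.10, 0.40, 0.35, 0.80]
toy_outcomes = [0, 0, 1, 1]

toy_auroc = auroc(toy_risks, toy_outcomes)  # 3 of 4 event/non-event pairs are concordant
toy_brier = brier_score(toy_risks, toy_outcomes)
toy_nb = net_benefit(toy_risks, toy_outcomes, threshold=0.30)

print(f"AUROC: {toy_auroc:.3f}")
print(f"Brier score: {toy_brier:.3f}")
print(f"Net benefit at 0.30: {toy_nb:.3f}")
```

The pairwise AUROC loop is quadratic in the sample size and is written for readability only. For the full analysis, use the scripts provided in this repository.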
| File | Description |
|---|---|
| PerfMeasuresOverview.R | R script containing code to calculate measures and create plots. |
| PerfMeasuresPython.py | Python script equivalent for calculating measures. |
| PerfMeasures.ipynb | Jupyter Notebook version of the Python analysis. |
| data_case_study.txt | Dataset containing risk estimates and outcomes. |
The file data_case_study.txt contains the specific data used for the case study. To ensure patient privacy and anonymity, no clinical variables or personal identifiers are included.
The dataset consists of:
- Risk Estimates: The predicted probability of malignancy for each patient.
- Outcomes: Binary classification (1 = malignant, 0 = benign).
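The exact layout of data_case_study.txt is not described here, so the following is only a sketch of how such a two-column file could be read and sanity-checked in Python. It assumes whitespace- or comma-separated fields, an optional header line, and the risk estimate in the first column followed by the outcome. The function name `read_predictions` and its parameters are hypothetical, and the column positions should be checked against the actual file.

```python
import io


def read_predictions(handle, risk_col=0, outcome_col=1):
    """Read risk estimates and binary outcomes from an open text file.

    ASSUMPTIONS: fields are separated by whitespace or commas, and
    non-numeric lines before the first data row are headers.
    """
    risks, outcomes = [], []
    for line_no, line in enumerate(handle, start=1):
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        try:
            risk = float(fields[risk_col])
            outcome = float(fields[outcome_col])
        except (ValueError, IndexError):
            if not risks:
                continue  # treat as a header line
            raise ValueError(f"line {line_no}: cannot parse {line!r}")
        if not 0.0 <= risk <= 1.0:
            raise ValueError(f"line {line_no}: risk {risk} is outside [0, 1]")
        if outcome not in (0.0, 1.0):
            raise ValueError(f"line {line_no}: outcome {outcome} is not 0 or 1")
        risks.append(risk)
        outcomes.append(int(outcome))
    return risks, outcomes


# Made-up sample in the assumed layout, standing in for the real file.
sample = io.StringIO("risk outcome\n0.12 0\n0.87 1\n0.40 0\n")
risks, outcomes = read_predictions(sample)
prevalence = sum(outcomes) / len(outcomes)  # share of malignant cases

print(len(risks), "rows read")
print(f"Prevalence of malignancy: {prevalence:.3f}")
```

With the real file, the same function could be called as `read_predictions(open("data_case_study.txt"))`, adjusting `risk_col` and `outcome_col` if the columns are ordered differently.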
You can reproduce the analysis using either R or Python.
R: Run the PerfMeasuresOverview.R script in RStudio or your preferred R environment. Ensure the data_case_study.txt file is in your working directory.
Python: Run the analysis using the provided script or notebook:
- Script: Run PerfMeasuresPython.py.
- Notebook: Open PerfMeasures.ipynb in Jupyter Notebook or JupyterLab.
If you use this code or methodology in your research, please cite the main paper:
Van Calster B, et al. Validation studies of predictive AI for use in medical practice: overview and guidance for performance measures. The Lancet Digital Health. 2025. https://doi.org/10.1016/S2589-7500(25)00098-6