This repository provides CT-ScanGaze, the first publicly available dataset of expert radiologist gaze during CT analysis, and CT-Searcher, a transformer-based model for 3D scanpath prediction on medical CT volumes. Our work addresses a critical gap in understanding how radiologists visually examine 3D medical images during diagnostic procedures.
🎉 This work has been accepted as a highlight paper at ICCV 2025!
```bash
# Clone the repository
git clone https://github.com/UARK-AICV/CTScanGaze
cd CTScanGaze

# Create conda environment
conda create -n ctsearcher python=3.9
conda activate ctsearcher

# Install dependencies
pip install uv
uv pip install -r requirements.txt
```

CT-ScanGaze is the first publicly available eye gaze dataset focused on CT scan analysis. The dataset is available on Hugging Face.
Each data sample contains the following fields:
```python
{
    "name": str,     # CT scan identifier
    "subject": str,  # Radiologist ID
    "task": str,     # Task description
    "X": list,       # X coordinates of fixations
    "Y": list,       # Y coordinates of fixations
    "Z": list,       # Z coordinates (slice numbers)
    "T": list,       # Fixation durations in seconds
    "length": int,   # Scanpath length
    "split": str,    # Data split ("train" or "test")
    "report": str,   # Report for this CT
}
```

Note that any other fields in the JSON are placeholders and can be safely ignored. Many reports will look like duplicates because multiple CTs come from the same CT reading session for the same patient.
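As a minimal sketch of consuming a record, the snippet below builds a fabricated sample that follows this schema (all values are made up for illustration; real records come from the dataset JSON on Hugging Face) and iterates its fixations:

```python
# Hypothetical record following the schema above; the identifier, subject,
# and all values are fabricated for illustration only.
sample = {
    "name": "ct_0001",
    "subject": "rad_01",
    "task": "chest CT reading",
    "X": [120.5, 131.0, 98.2],
    "Y": [200.1, 210.4, 180.0],
    "Z": [45, 45, 46],
    "T": [0.32, 0.21, 0.40],
    "length": 3,
    "split": "train",
    "report": "No acute findings.",
}

# Each fixation is one (x, y, slice, duration) tuple.
fixations = list(zip(sample["X"], sample["Y"], sample["Z"], sample["T"]))
assert len(fixations) == sample["length"]

total_dwell = sum(sample["T"])  # total fixation time in seconds
print(f"{sample['name']}: {len(fixations)} fixations, {total_dwell:.2f} s")
# -> ct_0001: 3 fixations, 0.93 s
```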
Additionally, we provide zip files containing all CT scans that match the identifiers, along with corresponding radiological reports for each CT scan.
Before training, you need to extract Swin UNETR features from your CT volumes. We provide a two-step process:
Extract features from CT volumes using a pre-trained Swin UNETR model:
```bash
# Place your CT volumes in one_sample/cts/*.nii.gz
uv run feature_extraction/swin_unet_extract_feature.py
```

This script will:
- Download the pre-trained Swin UNETR model (MONAI BTCV weights)
- Extract features using a sliding window (96×96×96 patches)
- Save patch-based features to `one_sample/features/*.pt`
Merge overlapping patch features into complete volumes:
```bash
python feature_extraction/merge_features.py \
    --features_dir one_sample/features \
    --output_dir one_sample/features_merged
```

This creates the final feature volumes:
- `{name}.pt`: Decoder features (768 channels, H/32×W/32×D/32)
- `{name}_hidden_states_out_4.pt`: Encoder features (768 channels, H/32×W/32×D/32)
For more details, see feature_extraction/README.md.
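To make the documented shapes concrete, here is a hedged sketch of loading a merged feature volume. It fabricates a tensor with the stated layout (768 channels, spatial dims downsampled 32×) in place of a real `{name}.pt` file, and round-trips it through a buffer to mimic `torch.save`/`torch.load` on disk; the volume size is a hypothetical example:

```python
import io
import torch

# Hypothetical CT volume size; real volumes will differ.
H = W = D = 96
feat = torch.randn(768, H // 32, W // 32, D // 32)  # stand-in for {name}.pt

# Round-trip through a buffer to mimic saving/loading the feature file.
buf = io.BytesIO()
torch.save(feat, buf)
buf.seek(0)
loaded = torch.load(buf)
assert loaded.shape == (768, 3, 3, 3)

# Mean-pooling the spatial axes yields one 768-d descriptor per scan.
pooled = loaded.mean(dim=(1, 2, 3))
print(tuple(pooled.shape))  # (768,)
```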
After extracting features, you can train the CT-Searcher model:
```bash
# Single or multi-GPU
bash bash/train.sh

# Or directly
python src/train_lightning.py \
    --log_root runs/experiment \
    --epoch 40 \
    --batch 2 \
    --img_dir /path/to/data \
    --feat_dir /path/to/features_merged
```

On a Slurm cluster:

```bash
sbatch bash/train_slurm.sh
```

Lightning auto-detects Slurm and configures multi-node DDP. Adjust `--nodes` and `--gres=gpu:X` in the script as needed.
Features:
- Auto multi-GPU/multi-node training
- Mixed precision (16-bit)
- Smart checkpointing
- TensorBoard logging
- Slurm auto-detection
To resume training from an existing run:

```bash
python src/train_lightning.py \
    --resume_dir runs/experiment_name \
    --batch 2 \
    --epoch 40
```

The trainer will automatically load the last checkpoint from the specified directory.
Evaluation is performed automatically during training (every epoch). To evaluate a saved checkpoint:
```bash
python src/train_lightning.py \
    --resume_dir runs/CTScanGaze_CTSearcher \
    --img_dir /path/to/test/ct/images \
    --feat_dir /path/to/test/features \
    --fix_dir /path/to/test/gaze/data
```

The Lightning trainer handles validation automatically with comprehensive metrics.
We use comprehensive 3D-adapted metrics for scanpath evaluation:
Scanpath-based Metrics:
- ScanMatch (SM): Spatial and temporal similarity with duration consideration
- MultiMatch (MM): Five-dimensional assessment (shape, direction, length, position, duration)
- String Edit Distance (SED): Sequence-based comparison using Levenshtein distance
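To illustrate the SED idea (not the repo's exact implementation), the sketch below quantizes 3D fixations into a coarse voxel grid, one symbol per fixation, and compares two scanpaths with a standard Levenshtein dynamic program; the bin count and grid scheme are assumptions for illustration:

```python
def scanpath_to_symbols(xs, ys, zs, shape, bins=4):
    """Quantize 3D fixations into a bins^3 grid; hypothetical discretization."""
    symbols = []
    for x, y, z in zip(xs, ys, zs):
        i = min(int(x / shape[0] * bins), bins - 1)
        j = min(int(y / shape[1] * bins), bins - 1)
        k = min(int(z / shape[2] * bins), bins - 1)
        symbols.append(i * bins * bins + j * bins + k)
    return symbols

def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # delete, insert, or substitute (free when symbols match)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

print(edit_distance("kitten", "sitting"))  # 3
```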
Spatial-based Metrics:
- Correlation Coefficient (CC): Linear correlation between predicted and ground truth heatmaps
- Normalized Scanpath Saliency (NSS): Normalized saliency at fixation locations
- Kullback-Leibler Divergence (KLDiv): Distribution similarity measure
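The spatial metrics follow their textbook definitions; below is a minimal NumPy sketch of those definitions on toy 3D volumes (this is illustrative code, not the repository's evaluation implementation, and the toy heatmap and fixation mask are fabricated):

```python
import numpy as np

def cc(pred, gt):
    """Pearson correlation coefficient between two heatmaps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def nss(pred, fixations):
    """Mean z-scored saliency at binary fixation voxels."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixations > 0].mean())

def kldiv(pred, gt, eps=1e-8):
    """KL divergence KL(gt || pred) after normalizing both to distributions."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(g / (p + eps) + eps)).sum())

rng = np.random.default_rng(0)
heat = rng.random((16, 16, 16))           # toy 3D saliency volume
fix = (heat > heat.mean()).astype(float)  # toy fixation mask

print(round(cc(heat, heat), 4))  # 1.0: identical maps correlate perfectly
print(nss(heat, fix) > 0)        # True: fixations land on high-saliency voxels
```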
The current codebase works as long as the paths and extracted features are prepared, but substantial refactoring is still needed.
- Extracted features of CTs (see Feature Extraction)
- Clean and refactor codebase
- Synthetic dataset
- Improve code comments and structure
If you find our work useful, please cite our paper:
```bibtex
@article{pham2025ct,
  title={CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling},
  author={Pham, Trong-Thang and Awasthi, Akash and Khan, Saba and Marti, Esteban Duran and Nguyen, Tien-Phat and Vo, Khoa and Tran, Minh and Nguyen, Ngoc Son and Van, Cuong Tran and Ikebe, Yuki and others},
  journal={arXiv preprint arXiv:2507.12591},
  year={2025}
}
```

This material is based upon work supported by the National Science Foundation (NSF) under Award No. OIA-1946391, NSF 2223793 EFRI BRAID, and National Institutes of Health (NIH) 1R01CA277739-01.
This project is licensed under the Creative Commons Attribution Non Commercial Share Alike 4.0 International License. See the LICENSE file for details.
Primary Contact: Trong Thang Pham (tp030@uark.edu)
For questions, feedback, or collaboration opportunities, feel free to reach out! I would love to hear from you if you have any thoughts or suggestions about this work.
Note: While we don't actively seek contributions to the codebase, we greatly appreciate and welcome feedback, discussions, and suggestions for improvements.
Star this repository if you find it useful!

