This repository is the official implementation for the paper:
CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation
Authors: Jiong Wu, Yang Xing, Boxiao Yu, Wei Shao, and Kuang Gong
Abstract: Most publicly available medical segmentation datasets are only partially labeled, with annotations provided for a subset of anatomical structures. When multiple datasets are combined for training, this incomplete annotation poses challenges, as it limits the model's ability to learn shared anatomical representations among datasets. Furthermore, vision-only frameworks often fail to capture complex anatomical relationships and task-specific distinctions, leading to reduced segmentation accuracy and poor generalizability to unseen datasets. In this study, we proposed a novel CLIP-DINO Prompt-Driven Segmentation Network (CDPDNet), which combined a self-supervised vision transformer with CLIP-based text embedding and introduced task-specific text prompts to tackle these challenges. Specifically, the framework was constructed upon a convolutional neural network (CNN) and incorporated DINOv2 to extract both fine-grained and global visual features, which were then fused using a multi-head cross-attention module to overcome the limited long-range modeling capability of CNNs. In addition, CLIP-derived text embeddings were projected into the visual space to help model complex relationships among organs and tumors. To further address the partial label challenge and enhance inter-task discriminative capability, a Text-based Task Prompt Generation (TTPG) module that generated task-specific prompts was designed to guide the segmentation. Extensive experiments on multiple medical imaging datasets demonstrated that CDPDNet consistently outperformed existing state-of-the-art segmentation methods.
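For intuition only, here is a minimal sketch of the fusion idea described above: CNN feature tokens attend to DINOv2 patch tokens via multi-head cross-attention. All dimensions and module names are illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative sketch: fuse fine-grained CNN features (queries) with
    global DINOv2 tokens (keys/values) via multi-head cross-attention."""
    def __init__(self, cnn_dim=256, dino_dim=768, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(dino_dim, cnn_dim)  # align DINOv2 tokens to CNN channels
        self.attn = nn.MultiheadAttention(cnn_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(cnn_dim)

    def forward(self, cnn_tokens, dino_tokens):
        # cnn_tokens: (B, N, cnn_dim) flattened CNN feature map
        # dino_tokens: (B, M, dino_dim) DINOv2 patch tokens
        kv = self.proj(dino_tokens)
        fused, _ = self.attn(cnn_tokens, kv, kv)
        return self.norm(cnn_tokens + fused)  # residual connection

fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 196, 256), torch.randn(2, 256, 768))
print(out.shape)  # torch.Size([2, 196, 256])
```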
**Installation**

```bash
# Clone the repository
git clone https://github.com/wujiong-hub/CDPDNet.git
cd CDPDNet
# Create and activate a new conda environment
conda create -n cdpdnet python=3.9
conda activate cdpdnet
# Install PyTorch (please modify according to your server's CUDA version)
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# Install MONAI and additional dependencies
pip install 'monai[all]'
pip install -r requirements.txt
```
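After installation, an optional sanity check confirms that the core packages import and that a GPU is visible:

```python
import torch
import monai

# Quick environment check: package versions and CUDA visibility.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("monai:", monai.__version__)
```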
**Datasets**

- 01 Multi-Atlas Labeling Beyond the Cranial Vault - Workshop and Challenge (BTCV)
- 02 Pancreas-CT TCIA (the labels we used for Datasets 01 and 02 are available here)
- 03 Combined Healthy Abdominal Organ Segmentation (CHAOS)
- 04 Liver Tumor Segmentation Challenge (LiTS)
- 05 Kidney and Kidney Tumor Segmentation (KiTS)
- 07 WORD: A large scale dataset, benchmark and clinical applicable study for abdominal organ segmentation from CT image
- 08 AbdomenCT-1K
- 09 Multi-Modality Abdominal Multi-Organ Segmentation Challenge (AMOS)
- 10 Medical Segmentation Decathlon (Liver, Lung, Pancreas, HepaticVessel, Spleen, Colon)
- 11 CT volumes with multiple organ segmentations (CT-ORG)
- 12 AbdomenCT 12organ (FLARE22)
The post_label files can be downloaded via link.

- Download and organize the dataset

  Arrange the downloaded data according to the structure specified in `dataset/dataset_list/datasets.txt`.

- Configure preprocessing parameters

  Open `label_transfer.py` and modify the following lines:

  ```python
  ORGAN_DATASET_DIR = '/your/path/to/dataset'
  NUM_WORKER = 4  # Adjust based on your CPU
  ```

  For the above 11 datasets, you can directly download the post_label files and arrange them in the corresponding folders. Then run:

  ```bash
  python -W ignore label_transfer.py
  ```
Please refer to the repo CLIP-Driven-Universal-Model for more details about the label transfer process.
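As a toy illustration of what the label transfer does (remapping each dataset's local label ids onto the universal template in the table below), here is a sketch with a hypothetical LiTS-style mapping; the real mappings live in `label_transfer.py`:

```python
import numpy as np

# Hypothetical example: LiTS labels liver as 1 and liver tumor as 2 locally;
# the universal template (table below) uses 6 and 27.
LOCAL_TO_UNIVERSAL = {1: 6, 2: 27}

def transfer_labels(local_seg: np.ndarray) -> np.ndarray:
    universal = np.zeros_like(local_seg)
    for local_id, universal_id in LOCAL_TO_UNIVERSAL.items():
        universal[local_seg == local_id] = universal_id
    return universal

print(transfer_labels(np.array([0, 1, 2, 1])))  # -> [ 0  6 27  6]
```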
**Dataset organs/tumors and corresponding label index**
| Index | Organ | Index | Organ | Index | Organ | Index | Organ |
|---|---|---|---|---|---|---|---|
| 1 | Spleen | 9 | Postcava | 17 | Left Lung | 25 | Celiac Trunk |
| 2 | Right Kidney | 10 | Portal Vein & SV | 18 | Colon | 26 | Kidney Tumor |
| 3 | Left Kidney | 11 | Pancreas | 19 | Intestine | 27 | Liver Tumor |
| 4 | Gall Bladder | 12 | Right Adrenal Gland | 20 | Rectum | 28 | Pancreas Tumor |
| 5 | Esophagus | 13 | Left Adrenal Gland | 21 | Bladder | 29 | Hepatic Vessel Tumor |
| 6 | Liver | 14 | Duodenum | 22 | Prostate | 30 | Lung Tumor |
| 7 | Stomach | 15 | Hepatic Vessel | 23 | Left Head of Femur | 31 | Colon Tumor |
| 8 | Aorta | 16 | Right Lung | 24 | Right Head of Femur | 32 | Kidney Cyst |
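For convenience in post-processing or visualization scripts, the same mapping transcribed from the table above as a Python dict:

```python
# Label indices transcribed from the table above (universal 32-class template).
ORGAN_LABELS = {
    1: "Spleen", 2: "Right Kidney", 3: "Left Kidney", 4: "Gall Bladder",
    5: "Esophagus", 6: "Liver", 7: "Stomach", 8: "Aorta",
    9: "Postcava", 10: "Portal Vein & SV", 11: "Pancreas", 12: "Right Adrenal Gland",
    13: "Left Adrenal Gland", 14: "Duodenum", 15: "Hepatic Vessel", 16: "Right Lung",
    17: "Left Lung", 18: "Colon", 19: "Intestine", 20: "Rectum",
    21: "Bladder", 22: "Prostate", 23: "Left Head of Femur", 24: "Right Head of Femur",
    25: "Celiac Trunk", 26: "Kidney Tumor", 27: "Liver Tumor", 28: "Pancreas Tumor",
    29: "Hepatic Vessel Tumor", 30: "Lung Tumor", 31: "Colon Tumor", 32: "Kidney Cyst",
}
```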
**Training**

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 train.py --data_root_path DATA_DIR --dist True --uniform_sample
```
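Note that `python -m torch.distributed.launch` is deprecated in recent PyTorch releases. A roughly equivalent `torchrun` invocation is sketched below, with the caveat that `train.py` may expect the old launcher's `--local_rank` argument rather than the `LOCAL_RANK` environment variable that `torchrun` sets:

```bash
# Sketch only: torchrun replacement for the deprecated launcher (see caveat above)
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=1238 \
    train.py --data_root_path DATA_DIR --dist True --uniform_sample
```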
**Inference**

- Run inference directly using our trained model:

```bash
cd pretrained_weights/
wget https://huggingface.co/jwu2009/CDPDNet/resolve/main/cdpdnet.pth
cd ../
```
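Optionally, you can verify that the downloaded checkpoint deserializes before running inference (assuming it is a standard PyTorch checkpoint file; the exact key layout depends on how it was saved):

```python
import torch

# Load on CPU just to confirm the file is a readable PyTorch checkpoint.
ckpt = torch.load("pretrained_weights/cdpdnet.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys())[:5])
```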
```bash
CUDA_VISIBLE_DEVICES=0 python test.py --data_root_path DATA_DIR --resume pretrained_weights/cdpdnet.pth --store_result
```

- Run inference using your own trained model:
```bash
CUDA_VISIBLE_DEVICES=0 python test.py --data_root_path DATA_DIR --resume CHECKPOINT_PATH --store_result
```

**Acknowledgements**

We appreciate the effort of the following repositories in providing open-source code to the community:
**Citation**

If you find this repository useful, please consider citing this paper:
```bibtex
@misc{wu2025cdpdnetintegratingtextguidance,
title={CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation},
author={Jiong Wu and Yang Xing and Boxiao Yu and Wei Shao and Kuang Gong},
year={2025},
eprint={2505.18958},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.18958},
}
```
