This repository is the official implementation for the paper:
CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation
Authors: Jiong Wu, Yang Xing, Boxiao Yu, Wei Shao, and Kuang Gong
Abstract: Most publicly available medical segmentation datasets are only partially labeled, with annotations provided for a subset of anatomical structures. When multiple datasets are combined for training, this incomplete annotation poses challenges, as it limits the model's ability to learn shared anatomical representations among datasets. Furthermore, vision-only frameworks often fail to capture complex anatomical relationships and task-specific distinctions, leading to reduced segmentation accuracy and poor generalizability to unseen datasets. In this study, we proposed a novel CLIP-DINO Prompt-Driven Segmentation Network (CDPDNet), which combined a self-supervised vision transformer with CLIP-based text embedding and introduced task-specific text prompts to tackle these challenges. Specifically, the framework was constructed upon a convolutional neural network (CNN) and incorporated DINOv2 to extract both fine-grained and global visual features, which were then fused using a multi-head cross-attention module to overcome the limited long-range modeling capability of CNNs. In addition, CLIP-derived text embeddings were projected into the visual space to help model complex relationships among organs and tumors. To further address the partial label challenge and enhance inter-task discriminative capability, a Text-based Task Prompt Generation (TTPG) module that generated task-specific prompts was designed to guide the segmentation. Extensive experiments on multiple medical imaging datasets demonstrated that CDPDNet consistently outperformed existing state-of-the-art segmentation methods.
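For intuition only, here is a minimal sketch of the fusion idea described above: CNN feature tokens attend to DINOv2 patch tokens via multi-head cross-attention. All dimensions and module names are illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative sketch: fuse fine-grained CNN features (queries) with
    global DINOv2 tokens (keys/values) via multi-head cross-attention."""
    def __init__(self, cnn_dim=256, dino_dim=768, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(dino_dim, cnn_dim)  # align DINOv2 tokens to CNN channels
        self.attn = nn.MultiheadAttention(cnn_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(cnn_dim)

    def forward(self, cnn_tokens, dino_tokens):
        # cnn_tokens: (B, N, cnn_dim) flattened CNN feature map
        # dino_tokens: (B, M, dino_dim) DINOv2 patch tokens
        kv = self.proj(dino_tokens)
        fused, _ = self.attn(cnn_tokens, kv, kv)
        return self.norm(cnn_tokens + fused)  # residual connection

fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 196, 256), torch.randn(2, 256, 768))
print(out.shape)  # torch.Size([2, 196, 256])
```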
**Installation**

```bash
# Clone the repository
git clone https://github.com/wujiong-hub/CDPDNet.git
cd CDPDNet
# Create and activate a new conda environment
conda create -n cdpdnet python=3.9
conda activate cdpdnet
# Install PyTorch (please modify according to your server's CUDA version)
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# Install MONAI and additional dependencies
pip install 'monai[all]'
pip install -r requirements.txt
```
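After installation, an optional sanity check confirms that the core packages import and that a GPU is visible:

```python
import torch
import monai

# Quick environment check: package versions and CUDA visibility.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("monai:", monai.__version__)
```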
**Datasets**

- 01 Multi-Atlas Labeling Beyond the Cranial Vault - Workshop and Challenge (BTCV)
- 02 Pancreas-CT TCIA (the labels we used for Datasets 01 and 02 are available here)
- 03 Combined Healthy Abdominal Organ Segmentation (CHAOS)
- 04 Liver Tumor Segmentation Challenge (LiTS)
- 05 Kidney and Kidney Tumor Segmentation (KiTS)
- 07 WORD: A large scale dataset, benchmark and clinical applicable study for abdominal organ segmentation from CT image
- 08 AbdomenCT-1K
- 09 Multi-Modality Abdominal Multi-Organ Segmentation Challenge (AMOS)
- 10 Medical Segmentation Decathlon (Liver, Lung, Pancreas, HepaticVessel, Spleen, Colon)
- 11 CT volumes with multiple organ segmentations (CT-ORG)
- 12 AbdomenCT 12organ (FLARE22)
The post_label files can be downloaded via link.

- Download and organize the dataset

  Arrange the downloaded data according to the structure specified in `dataset/dataset_list/datasets.txt`.

- Configure preprocessing parameters

  Open `label_transfer.py` and modify the following lines:

  ```python
  ORGAN_DATASET_DIR = '/your/path/to/dataset'
  NUM_WORKER = 4  # Adjust based on your CPU
  ```

  For the above 11 datasets, you can directly download the post_label files and arrange them in the corresponding folders. Then run:

  ```bash
  python -W ignore label_transfer.py
  ```
Please refer to the repo CLIP-Driven-Universal-Model for more details about the label transfer process.
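As a toy illustration of what the label transfer does (remapping each dataset's local label ids onto the universal template in the table below), here is a sketch with a hypothetical LiTS-style mapping; the real mappings live in `label_transfer.py`:

```python
import numpy as np

# Hypothetical example: LiTS labels liver as 1 and liver tumor as 2 locally;
# the universal template (table below) uses 6 and 27.
LOCAL_TO_UNIVERSAL = {1: 6, 2: 27}

def transfer_labels(local_seg: np.ndarray) -> np.ndarray:
    universal = np.zeros_like(local_seg)
    for local_id, universal_id in LOCAL_TO_UNIVERSAL.items():
        universal[local_seg == local_id] = universal_id
    return universal

print(transfer_labels(np.array([0, 1, 2, 1])))  # -> [ 0  6 27  6]
```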
**Dataset organs/tumors and corresponding label index**
| Index | Organ | Index | Organ | Index | Organ | Index | Organ |
|---|---|---|---|---|---|---|---|
| 1 | Spleen | 9 | Postcava | 17 | Left Lung | 25 | Celiac Trunk |
| 2 | Right Kidney | 10 | Portal Vein & SV | 18 | Colon | 26 | Kidney Tumor |
| 3 | Left Kidney | 11 | Pancreas | 19 | Intestine | 27 | Liver Tumor |
| 4 | Gall Bladder | 12 | Right Adrenal Gland | 20 | Rectum | 28 | Pancreas Tumor |
| 5 | Esophagus | 13 | Left Adrenal Gland | 21 | Bladder | 29 | Hepatic Vessel Tumor |
| 6 | Liver | 14 | Duodenum | 22 | Prostate | 30 | Lung Tumor |
| 7 | Stomach | 15 | Hepatic Vessel | 23 | Left Head of Femur | 31 | Colon Tumor |
| 8 | Aorta | 16 | Right Lung | 24 | Right Head of Femur | 32 | Kidney Cyst |
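For convenience in post-processing or visualization scripts, the same mapping transcribed from the table above as a Python dict:

```python
# Label indices transcribed from the table above (universal 32-class template).
ORGAN_LABELS = {
    1: "Spleen", 2: "Right Kidney", 3: "Left Kidney", 4: "Gall Bladder",
    5: "Esophagus", 6: "Liver", 7: "Stomach", 8: "Aorta",
    9: "Postcava", 10: "Portal Vein & SV", 11: "Pancreas", 12: "Right Adrenal Gland",
    13: "Left Adrenal Gland", 14: "Duodenum", 15: "Hepatic Vessel", 16: "Right Lung",
    17: "Left Lung", 18: "Colon", 19: "Intestine", 20: "Rectum",
    21: "Bladder", 22: "Prostate", 23: "Left Head of Femur", 24: "Right Head of Femur",
    25: "Celiac Trunk", 26: "Kidney Tumor", 27: "Liver Tumor", 28: "Pancreas Tumor",
    29: "Hepatic Vessel Tumor", 30: "Lung Tumor", 31: "Colon Tumor", 32: "Kidney Cyst",
}
```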
**Training**

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 train.py --data_root_path DATA_DIR --dist True --uniform_sample
```
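Note that `python -m torch.distributed.launch` is deprecated in recent PyTorch releases. A roughly equivalent `torchrun` invocation is sketched below, with the caveat that `train.py` may expect the old launcher's `--local_rank` argument rather than the `LOCAL_RANK` environment variable that `torchrun` sets:

```bash
# Sketch only: torchrun replacement for the deprecated launcher (see caveat above)
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=1238 \
    train.py --data_root_path DATA_DIR --dist True --uniform_sample
```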
**Inference**

- Run inference directly using our trained model:

```bash
cd pretrained_weights/
wget https://huggingface.co/jwu2009/CDPDNet/resolve/main/cdpdnet.pth
cd ../
```
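Optionally, you can verify that the downloaded checkpoint deserializes before running inference (assuming it is a standard PyTorch checkpoint file; the exact key layout depends on how it was saved):

```python
import torch

# Load on CPU just to confirm the file is a readable PyTorch checkpoint.
ckpt = torch.load("pretrained_weights/cdpdnet.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys())[:5])
```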
```bash
CUDA_VISIBLE_DEVICES=0 python test.py --data_root_path DATA_DIR --resume pretrained_weights/cdpdnet.pth --store_result
```

- Run inference using your own trained model:
```bash
CUDA_VISIBLE_DEVICES=0 python test.py --data_root_path DATA_DIR --resume CHECKPOINT_PATH --store_result
```

**Acknowledgements**

We appreciate the effort of the following repositories in providing open-source code to the community:
**Citation**

If you find this repository useful, please consider citing this paper:
```bibtex
@misc{wu2025cdpdnetintegratingtextguidance,
title={CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation},
author={Jiong Wu and Yang Xing and Boxiao Yu and Wei Shao and Kuang Gong},
year={2025},
eprint={2505.18958},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.18958},
}
```
