
CDPDNet

This repository is the official implementation for the paper:
CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation
Authors: Jiong Wu, Yang Xing, Boxiao Yu, Wei Shao, and Kuang Gong

Abstract: Most publicly available medical segmentation datasets are only partially labeled, with annotations provided for a subset of anatomical structures. When multiple datasets are combined for training, this incomplete annotation poses challenges, as it limits the model's ability to learn shared anatomical representations among datasets. Furthermore, vision-only frameworks often fail to capture complex anatomical relationships and task-specific distinctions, leading to reduced segmentation accuracy and poor generalizability to unseen datasets. In this study, we proposed a novel CLIP-DINO Prompt-Driven Segmentation Network (CDPDNet), which combined a self-supervised vision transformer with CLIP-based text embedding and introduced task-specific text prompts to tackle these challenges. Specifically, the framework was constructed upon a convolutional neural network (CNN) and incorporated DINOv2 to extract both fine-grained and global visual features, which were then fused using a multi-head cross-attention module to overcome the limited long-range modeling capability of CNNs. In addition, CLIP-derived text embeddings were projected into the visual space to help model complex relationships among organs and tumors. To further address the partial label challenge and enhance inter-task discriminative capability, a Text-based Task Prompt Generation (TTPG) module that generated task-specific prompts was designed to guide the segmentation. Extensive experiments on multiple medical imaging datasets demonstrated that CDPDNet consistently outperformed existing state-of-the-art segmentation methods.
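As a rough illustration of the fusion described above, the following minimal PyTorch sketch shows fine-grained CNN features (queries) attending to global DINOv2 tokens (keys/values) through multi-head cross-attention. All shapes, dimensions, and module choices here are assumptions for illustration, not the paper's implementation:

# Minimal sketch of CNN/DINOv2 feature fusion via multi-head cross-attention.
# Illustration only -- shapes, dimensions, and module names are assumptions,
# not the architecture from the paper.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, cnn_dim=256, vit_dim=768, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(vit_dim, cnn_dim)   # align DINOv2 tokens to CNN channels
        self.attn = nn.MultiheadAttention(cnn_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(cnn_dim)

    def forward(self, cnn_feat, vit_tokens):
        # cnn_feat:   (B, C, H, W) fine-grained CNN feature map (queries)
        # vit_tokens: (B, N, vit_dim) global DINOv2 patch tokens (keys/values)
        b, c, h, w = cnn_feat.shape
        q = cnn_feat.flatten(2).transpose(1, 2)    # (B, H*W, C)
        kv = self.proj(vit_tokens)                 # (B, N, C)
        fused, _ = self.attn(q, kv, kv)            # CNN queries attend to ViT tokens
        fused = self.norm(q + fused)               # residual connection
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Toy usage with random tensors
fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 256, 16, 16), torch.randn(2, 197, 768))
print(out.shape)  # torch.Size([2, 256, 16, 16])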


Overview

Figure 1: Overview of the proposed CDPDNet framework.


Environment

# Clone the repository
git clone https://github.com/wujiong-hub/CDPDNet.git

# Create and activate a new conda environment
conda create -n cdpdnet python=3.9
conda activate cdpdnet

# Install PyTorch (please modify according to your server's CUDA version)
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121

# Install MONAI and additional dependencies
pip install 'monai[all]'
pip install -r requirements.txt
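
# Optionally, sanity-check the installation (versions and GPU visibility) before proceeding
python -c "import torch, monai; print(torch.__version__, torch.cuda.is_available(), monai.__version__)"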

Datasets

The post_label files can be downloaded via this link.

Datasets preprocessing

  1. Download and organize the dataset
    Arrange the downloaded data according to the structure specified in dataset/dataset_list/datasets.txt.

  2. Configure preprocessing parameters
    Open label_transfer.py and modify the following lines:

    ORGAN_DATASET_DIR = '/your/path/to/dataset'
    NUM_WORKER = 4  # adjust based on your CPU

    For the 11 datasets listed in dataset/dataset_list/datasets.txt, you can directly download the post_label files and arrange them in the corresponding folders. Then run the preprocessing script (a toy sketch of the underlying label remapping follows the table below):

    python -W ignore label_transfer.py

    Please refer to the repo CLIP-Driven-Universal-Model for more details about the label transfer process.

  3. Dataset organs/tumors and corresponding label indices

    Index  Organ            Index  Organ                Index  Organ                Index  Organ
    1      Spleen           9      Postcava             17     Left Lung            25     Celiac Trunk
    2      Right Kidney     10     Portal Vein & SV     18     Colon                26     Kidney Tumor
    3      Left Kidney      11     Pancreas             19     Intestine            27     Liver Tumor
    4      Gall Bladder     12     Right Adrenal Gland  20     Rectum               28     Pancreas Tumor
    5      Esophagus        13     Left Adrenal Gland   21     Bladder              29     Hepatic Vessel Tumor
    6      Liver            14     Duodenum             22     Prostate             30     Lung Tumor
    7      Stomach          15     Hepatic Vessel       23     Left Head of Femur   31     Colon Tumor
    8      Aorta            16     Right Lung           24     Right Head of Femur  32     Kidney Cyst
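
As referenced in step 2, the sketch below illustrates the idea behind the label transfer: remapping one dataset's local label values into the universal index space of the table above. The mapping dict is a made-up example for a hypothetical dataset; the actual per-dataset mappings are defined in label_transfer.py.

# Toy sketch of the label-transfer idea: remap a dataset's local label
# values into the universal 32-class index space from the table above.
# LOCAL_TO_UNIVERSAL below is a made-up example, not a real mapping.
import numpy as np

# hypothetical dataset: 1=liver, 2=spleen in its own annotation scheme
LOCAL_TO_UNIVERSAL = {1: 6, 2: 1}   # liver -> 6, spleen -> 1 (see table)

def to_universal(label_map: np.ndarray) -> np.ndarray:
    out = np.zeros_like(label_map)
    for local_idx, universal_idx in LOCAL_TO_UNIVERSAL.items():
        out[label_map == local_idx] = universal_idx
    return out

seg = np.array([[0, 1], [2, 1]])
print(to_universal(seg))   # [[0 6] [1 6]]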

Training

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 train.py --data_root_path DATA_DIR --dist True --uniform_sample
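
Note that torch.distributed.launch is deprecated in the PyTorch 2.x series; if the command above warns or fails with torch 2.4.0, the torchrun equivalent below should behave the same, assuming train.py reads the local rank from the environment (an untested assumption):

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=1238 train.py --data_root_path DATA_DIR --dist True --uniform_sample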

Testing

  1. Run inference directly with our trained model:
cd pretrained_weights/
wget https://huggingface.co/jwu2009/CDPDNet/resolve/main/cdpdnet.pth
cd ../
CUDA_VISIBLE_DEVICES=0 python test.py --data_root_path DATA_DIR --resume pretrained_weights/cdpdnet.pth --store_result 
  2. Run inference using your own trained model:
CUDA_VISIBLE_DEVICES=0 python test.py --data_root_path DATA_DIR --resume CHECKPOINT_PATH --store_result 
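
If wget is unavailable, the same checkpoint can also be fetched with the huggingface_hub package; the repo id and filename below are taken from the download URL above:

# Download the released checkpoint from the Hugging Face Hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="jwu2009/CDPDNet",
                       filename="cdpdnet.pth",
                       local_dir="pretrained_weights")
print(path)  # e.g. pretrained_weights/cdpdnet.pth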

Acknowledgement

We appreciate the repositories that provide open-source code to the community, in particular CLIP-Driven-Universal-Model.

Citation

If you find this repository useful, please consider citing this paper:

@misc{wu2025cdpdnetintegratingtextguidance,
      title={CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation}, 
      author={Jiong Wu and Yang Xing and Boxiao Yu and Wei Shao and Kuang Gong},
      year={2025},
      eprint={2505.18958},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.18958}, 
}
