
Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering


Accurate diagnosis and prognosis assisted by pathology images are essential for cancer treatment selection and planning. Despite the recent trend of adopting deep-learning approaches to analyze complex pathology images, they often fall short because they overlook the domain expert's understanding of tissue structure and cell composition. In this work, we focus on the challenging Open-ended Pathology VQA (PathVQA-Open) task and propose a novel framework named Path-RAG, which leverages HistoCartography to retrieve relevant domain knowledge from pathology images and significantly improves performance on PathVQA-Open. Acknowledging the complexity of pathology image analysis, Path-RAG adopts a human-centered AI approach, using HistoCartography to select the relevant patches from pathology images. Our experiments suggest that domain guidance can significantly boost the accuracy of LLaVA-Med from 38% to 47%, with a notable gain of 28% for H&E-stained pathology images in the PathVQA-Open dataset. For longer-form question-answer pairs, our model consistently achieves significant improvements of 32.5% on ARCH-Open PubMed and 30.6% on ARCH-Open Books for H&E images.


Awais Naeem*, Tianhao Li*, Huang-Ru Liao*, Jiawei Xu*, Aby Mammen Mathew*, Zehao Zhu*, Zhen Tan**, Ajay Jaiswal*, Raffi Salibian***, Ziniu Hu***, Tianlong Chen****, Ying Ding*

*University of Texas at Austin, USA
**Arizona State University, USA
***University of California, Los Angeles, USA
****Massachusetts Institute of Technology, USA


Path-RAG Implementation

1. Clone this repository and navigate to the path-rag folder

git clone https://github.com/embedded-robotics/path-rag.git
cd path-rag

2. Install packages: create a conda environment

conda create -n path-rag python=3.10 -y
conda activate path-rag
pip install --upgrade pip # enable PEP 660 support for LLaVA-Med

3. Download the PathVQA dataset from the following link. The commands in later steps assume it is extracted to a pvqa folder, with test images under pvqa/images/test

PathVQA Dataset

4. Clone the HistoCartography tool, set up the model checkpoints in histocartography/checkpoints, and install its dependencies

git clone https://github.com/BiomedSciAI/histocartography

5. Clone the LLaVA-Med repository and install the dependencies

git clone https://github.com/microsoft/LLaVA-Med

6. Download the LLaMA-7B model and weights from HuggingFace

python llama_7B_model_weights.py # LLaMA-7B model/weights are stored in $HF_HOME (by default, $HF_HOME = ~/.cache/huggingface)
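
For reference, a minimal sketch of this download step using huggingface_hub (assuming the huggyllama/llama-7b checkpoint, which matches the snapshot path referenced in step 8; the repository's own script may differ):

from huggingface_hub import snapshot_download

# Download LLaMA-7B into $HF_HOME (default: ~/.cache/huggingface).
# The repo_id is an assumption based on the snapshot path used in step 8.
local_path = snapshot_download(repo_id="huggyllama/llama-7b")
print(f"LLaMA-7B weights stored at: {local_path}")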

7. Download the LLaVA-Med delta weights llava_med_in_text_60k_ckpt2_delta and pvqa-9epoch_delta from https://github.com/microsoft/LLaVA-Med#model-download and put them inside a folder named model_delta_weights

8. Apply the LLaVA-Med delta weights to the base LLaMA-7B model to produce the final LLaVA-Med weights

cd LLaVA-Med

LLaVA-Med pre-trained on general biomedicine data

python -m llava.model.apply_delta \
    --base ~/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16 \
    --target ../final_models/llava_med \
    --delta ../model_delta_weights/llava_med_in_text_60k_ckpt2_delta

LLaVA-Med fine-tuned on PathVQA

python -m llava.model.apply_delta \
    --base ~/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16 \
    --target ../final_models/llava_med_pvqa \
    --delta ../model_delta_weights/pvqa-9epoch_delta
cd ..

9. Generate the top patches for open-ended PathVQA images using HistoCartography

python generate_histo_patches.py
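
For intuition, here is a minimal sketch of knowledge-guided patch selection, assuming HistoCartography's NucleiExtractor and a simple nuclei-count score; the actual scoring in generate_histo_patches.py may differ:

import numpy as np
from PIL import Image
from histocartography.preprocessing import NucleiExtractor

def top_patches(image_path, patch_size=224, k=3):
    image = np.array(Image.open(image_path).convert("RGB"))
    # Detect nuclei: returns an instance map and per-nucleus centroids
    # (centroid axis order assumed to be (y, x) here; verify against the library).
    instance_map, centroids = NucleiExtractor().process(image)

    # Score each non-overlapping patch by how many nuclei fall inside it.
    counts = {}
    for y, x in centroids:
        key = (int(y) // patch_size, int(x) // patch_size)
        counts[key] = counts.get(key, 0) + 1

    # Return the (row, col) indices of the k densest patches.
    return sorted(counts, key=counts.get, reverse=True)[:k]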

10. Generate the query files to be passed to LLaVA-Med, for both the full images and the patches

python generate_llava_med_query.py
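
LLaVA-Med's model_vqa.py reads questions from a JSONL file; below is a minimal sketch of writing one query entry. The question_id/image/text field names follow the LLaVA question-file convention and are an assumption about what generate_llava_med_query.py emits:

import json

# Illustrative query entries; the real script generates one line per
# image (and per retrieved patch) for every PathVQA-Open question.
# The image file name below is hypothetical.
queries = [
    {"question_id": 0,
     "image": "test_0001.jpg",
     "text": "What does this image show?"},
]

with open("files/query/image_direct.jsonl", "w") as f:
    for q in queries:
        f.write(json.dumps(q) + "\n")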

11. Generate answers for all the query files using both the raw model (final_models/llava_med) and the fine-tuned model (final_models/llava_med_pvqa)

cd LLaVA-Med

Raw Model

python llava/eval/model_vqa.py --model-name ../final_models/llava_med \
    --question-file ../files/query/image_direct.jsonl \
    --image-folder ../pvqa/images/test \
    --answers-file ../files/answer/raw/answer_image_direct.jsonl
python llava/eval/model_vqa.py --model-name ../final_models/llava_med \
    --question-file ../files/query/patch_direct.jsonl \
    --image-folder ../pvqa/images/test \
    --answers-file ../files/answer/raw/answer_patch_direct.jsonl
python llava/eval/model_vqa.py --model-name ../final_models/llava_med \
    --question-file ../files/query/image_description.jsonl \
    --image-folder ../pvqa/images/test \
    --answers-file ../files/answer/raw/answer_image_description.jsonl
python llava/eval/model_vqa.py --model-name ../final_models/llava_med \
    --question-file ../files/query/patch_description.jsonl \
    --image-folder ../pvqa/images/test \
    --answers-file ../files/answer/raw/answer_patch_description.jsonl

Fine-Tuned Model

python llava/eval/model_vqa.py --model-name ../final_models/llava_med_pvqa \
    --question-file ../files/query/image_direct.jsonl \
    --image-folder ../pvqa/images/test \
    --answers-file ../files/answer/fine-tuned/answer_image_direct.jsonl
python llava/eval/model_vqa.py --model-name ../final_models/llava_med_pvqa \
    --question-file ../files/query/patch_direct.jsonl \
    --image-folder ../pvqa/images/test \
    --answers-file ../files/answer/fine-tuned/answer_patch_direct.jsonl
python llava/eval/model_vqa.py --model-name ../final_models/llava_med_pvqa \
    --question-file ../files/query/image_description.jsonl \
    --image-folder ../pvqa/images/test \
    --answers-file ../files/answer/fine-tuned/answer_image_description.jsonl
python llava/eval/model_vqa.py --model-name ../final_models/llava_med_pvqa \
    --question-file ../files/query/patch_description.jsonl \
    --image-folder ../pvqa/images/test \
    --answers-file ../files/answer/fine-tuned/answer_patch_description.jsonl

12. Evaluate the results for the different use cases using recall_calculation.py

(i) Path-RAG w/o GPT: Combine the answers for the image + all patches to form the final predicted answer
(ii) Path-RAG (description): Combine the descriptions of the image + all patches, then use GPT-4 for reasoning to get the final predicted answer (see the Supplementary Section for prompts)
(iii) Path-RAG (answer): Combine the answers for the image + all patches, then use GPT-4 for reasoning to get the final predicted answer (see the Supplementary Section for prompts)
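
A minimal sketch of token-level recall, assuming the metric scores the fraction of ground-truth answer tokens that appear in the prediction (the exact computation in recall_calculation.py may differ):

# Token-level recall (an assumption about the metric in recall_calculation.py):
# the fraction of ground-truth answer tokens that also appear in the prediction.
def recall(prediction: str, ground_truth: str) -> float:
    pred_tokens = set(prediction.lower().split())
    gt_tokens = ground_truth.lower().split()
    if not gt_tokens:
        return 0.0
    hits = sum(1 for tok in gt_tokens if tok in pred_tokens)
    return hits / len(gt_tokens)

# Example: recall("fascicles of spindle cells", "spindle cells") == 1.0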

ARCH-Open Dataset

  1. Download the books_set and pubmed_set of the ARCH dataset from https://warwick.ac.uk/fac/cross_fac/tia/data/arch and store both folders in a folder named arch. Both books_set and pubmed_set contain a captions.json that maps each caption to a UUID, where the UUID is the file name in the images folder and the caption describes the image.

  2. Using captions.json and the images folder under arch/books_set, run the notebook ARCH-OPEN/books_data/synthetic_data_textbook.ipynb (specifying your OpenAI credentials) to generate the question-answer pairs for the books set

  3. Using captions.json and the images folder under arch/pubmed_set, run the notebook ARCH-OPEN/pubmed_data/synthetic_data_pubmed.ipynb (specifying your OpenAI credentials) to generate the question-answer pairs for the PubMed set; a minimal sketch of this generation step appears after the example record below

  4. Run the notebook ARCH-OPEN/synthetic_data_compilation.ipynb to compile the PubMed and books question-answer pairs into JSON files, namely ARCH-OPEN/pubmed_qa_pairs.json and ARCH-OPEN/textbook_qa_pairs.json. These files are already provided in the repository and can be used directly

  5. The pubmed_qa_pairs.json and textbook_qa_pairs.json files contain five question-answer pairs for each caption-UUID pair (the UUID refers to the image name in arch/pubmed_set/images or arch/books_set/images) in the following format (identical for both pubmed_set and books_set):

  {
    "figure_id": "00",
    "letter": "A",
    "caption": " A, Spindle cell variant of embryonal rhabdomyosarcoma is characterized by fascicles of eosinophilic spindle cells (B), some of which can show prominent paranuclear vacuolisation, as seen in leiomyosarcoma.",
    "uuid": "890e2e79-ab0a-4a2e-9d62-b0b6b3d43884",
    "Question_1": "What could be the general shape of cells in a spindle cell variant of embryonal rhabdomyosarcoma as seen in the image?",
    "Answer_1": "The cells often present with a spindle-like elongated shape.",
    "Question_2": "What type of structures could be visible in the image indicating the presence of spindle cells?",
    "Answer_2": "Fascicles, or bundles, of cells could be visible in the image, indicating the presence of spindle cells.",
    "Question_3": "Where in the cell would we likely find paranuclear vacuolisation in the image?",
    "Answer_3": "Paranuclear vacuolisation is usually seen around the nucleus area of the cell.",
    "Question_4": "What color might the spindle cells appear in the image?",
    "Answer_4": "The spindle cells may appear eosinophilic, or pinkish-red, under the microscope due to staining.",
    "Question_5": "What visual feature might differentiate spindle cells from leiomyosarcoma cells in the image?",
    "Answer_5": "Spindle cells might show prominent paranuclear vacuolisation, a feature that can differentiate them from leiomyosarcoma cells."
  }
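
A minimal sketch of the generation step used in the notebooks above (the prompt wording and model name are illustrative assumptions; the notebooks contain the exact prompts):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_qa_pairs(caption: str, n: int = 5) -> str:
    # Illustrative prompt; the notebooks' exact prompt and model may differ.
    prompt = (
        f"Given this pathology figure caption:\n{caption}\n"
        f"Write {n} open-ended question-answer pairs about what the image shows."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content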

Acknowledgement

We would like to acknowledge the following funding supports: NIH OT2OD032581, NIH OTA-21-008, NIH 1OT2OD032742-01, NSF 2333703, NSF 2303038.
