RaDialog-LLaVA is the improved version of the original RaDialog model, which can be found on GitHub and arXiv. It follows the same concepts, including the same image encoder, CheXbert classifier, prompt construction and language model. However, we followed the LLaVA methodology for image-text alignment, leading to improved conversational assistance and making the model easier to use. The main differences are the following:
- image projection: instead of the BLIP-inspired alignment module, we follow the LLaVA approach and use a simple MLP projection to map the image features to the language model's input size, leading to more image tokens (a sketch of such a projector follows this list).
- the image encoder is fine-tuned during LLM training
- the model was trained on an updated version of the RaDialog-Instruct dataset with three additional instruct tasks: impression generation, view classification and Rad-ReStruct QA
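As an illustration of this design, a minimal LLaVA-style MLP projector is sketched below; the two-layer GELU design and the dimensions are assumptions for clarity, not the exact configuration of RaDialog-LLaVA:

```python
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """LLaVA-style projector: maps vision features to the LLM embedding size.

    The hidden sizes below are illustrative placeholders, not the exact
    values used in RaDialog-LLaVA.
    """
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim)
        # returns one LLM input token per image patch: (batch, num_patches, llm_dim)
        return self.proj(image_features)

# Example: 196 patch features become 196 image tokens for the LLM
tokens = MLPProjector()(torch.randn(2, 196, 1024))
```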
✨ News ✨
- 29 May 2024: RaDialog-LLaVA is now available on Hugging Face
Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems.
To test RaDialog and use it for inference, follow the instructions in our Hugging Face repository here.
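For rough orientation only, loading the checkpoint could look like the sketch below; the repository id is a placeholder and the processor/generation calls are assumptions, so please follow the loading code in the Hugging Face repository itself:

```python
# Rough sketch only: the repository id, prompt and processor calls below are
# assumptions -- the Hugging Face repo contains the authoritative loading code.
import torch
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

MODEL_ID = "<radialog-llava-hf-repo>"  # hypothetical placeholder

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda")

image = Image.open("example_cxr.jpg")
inputs = processor(text="Write a radiology report for this chest X-ray.",
                   images=image, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=300)
print(processor.decode(output[0], skip_special_tokens=True))
```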
For more detailed instructions on how to train and evaluate RaDialog, please refer to the instructions below.
- Clone this repository and move to the RaDialog_LLaVA directory with
cd RaDialog_LLaVA
- Install the RaDialog environment with
conda create --name radialog python=3.10
- Activate the environment with
conda activate radialog
- Install PyTorch and the remaining requirements with
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
- Install Java and set JAVA_HOME and PATH in local_config.py (we used jre1.8.0)
- Install flash-attention following https://github.com/Dao-AILab/flash-attention
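As a quick optional sanity check (our own suggestion, not part of the official setup), you can verify from inside the activated radialog environment that PyTorch sees the GPU and that flash-attention was built correctly:

```python
# Quick environment sanity check (run inside the activated radialog env).
import torch

print("torch:", torch.__version__)            # expected: 2.0.1
print("CUDA available:", torch.cuda.is_available())

try:
    import flash_attn                          # installed from Dao-AILab/flash-attention
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn is not installed correctly")
```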
- Install the CheXbert environment with
conda create --name chexbert python=3.7
- Activate the environment with
conda activate chexbert
- Move to the chexbert directory with
cd chexbert
- Install the requirements with
pip install -r requirements.txt
- Set the absolute path to the chexbert env and folder in RaDialog_LLaVA/local_config.py
- Download chexbert.pth from here and place it in RaDialog_LLaVA/chexbert/src/checkpoint/
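local_config.py ships with the repository; the sketch below only illustrates the kind of entries it is expected to hold (Java paths, the chexbert environment and folder, and the MIMIC-CXR root set further below). The variable names are placeholders, so use the names already present in the file:

```python
# Sketch of local_config.py -- variable names are illustrative placeholders;
# keep the names already defined in the file shipped with the repository.
JAVA_HOME = "/usr/lib/jvm/jre1.8.0"                           # Java installation (we used jre1.8.0)
JAVA_PATH = f"{JAVA_HOME}/bin"                                # added to PATH for the metrics tooling
CHEXBERT_ENV_PATH = "/home/<user>/miniconda3/envs/chexbert"   # absolute path to the chexbert env
CHEXBERT_PATH = "/home/<user>/RaDialog_LLaVA/chexbert"        # absolute path to the chexbert folder
MIMIC_CXR_PATH = "/path/to/physionet.org/files/"              # set later in the data section
```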
- Download the instruct dataset from PhysioNet
- Unzip it and place it in RaDialog_LLaVA/data/
- in train.sh set the path to the instruct dataset (e.g. --data_path home/RaDialog_LLaVA/data/mimic_cxr_instruct_stratified.json)
- Download the MIMIC-CXR-JPG dataset from here
- The dataset should be saved in .../physionet.org/files/mimic-cxr-jpg
- Go to physionet.org/files/mimic-cxr-jpg/files/ and unzip mimic-cxr-2.0.0-split.csv.gz
- from here, download mimic-cxr-reports.zip
- unzip it and place the folder in the same directory as the MIMIC-CXR-JPG dataset (e.g. physionet.org/files/)
- in local_config.py set the path to the MIMIC-CXR dataset (e.g. .../physionet.org/files/)
- in model/lavis/defaults_report.yaml set the path to the MIMIC-CXR-JPG dataset (e.g. .../physionet.org/files/mimic-cxr-jpg/2.0.0 )
- go to the mimic-cxr folder in the code with
cd mimic-cxr
- run
python create_section_files.py
to prepare the report data
- go back to the RaDialog_LLaVA directory with
cd ..
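As a sanity check on the data layout (not an official step), something like the following can verify that the split file and the extracted reports are where the steps above put them; the paths are examples matching the ones used above and may need adjusting to your layout:

```python
# Sanity check for the MIMIC-CXR layout described above (paths are examples).
from pathlib import Path
import pandas as pd

base = Path("/path/to/physionet.org/files")          # same root as in local_config.py
jpg_dir = base / "mimic-cxr-jpg" / "2.0.0"
split_csv = jpg_dir / "mimic-cxr-2.0.0-split.csv"    # unzipped from the .gz; adjust if your layout differs
reports_dir = base / "mimic-cxr-reports"             # unzipped mimic-cxr-reports.zip

assert split_csv.exists(), "split csv not found -- did you unzip mimic-cxr-2.0.0-split.csv.gz?"
assert reports_dir.exists(), "report folder not found -- did you unzip mimic-cxr-reports.zip?"

print(pd.read_csv(split_csv)["split"].value_counts())  # train / validate / test counts
```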
- RaDialog-INS:
run
python -m test --prompt img_matching_examples_ig2_noexamples_IMG_findings --split "test" --vision_tower biovil
- RaDialog-INS (downstream tasks):
add --do_corr, --do_cp_bin_qa, --do_cp_all_qa, --do_view_class or --do_impression, respectively, to the command above
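If you want to run all downstream evaluations in one go, a small wrapper along these lines works (our own convenience sketch, assuming each flag is simply appended to the base command):

```python
# Convenience sketch: run the downstream-task evaluations one after another.
import subprocess

base_cmd = [
    "python", "-m", "test",
    "--prompt", "img_matching_examples_ig2_noexamples_IMG_findings",
    "--split", "test",
    "--vision_tower", "biovil",
]

for task_flag in ["--do_corr", "--do_cp_bin_qa", "--do_cp_all_qa",
                  "--do_view_class", "--do_impression"]:
    print("Running", task_flag)
    subprocess.run(base_cmd + [task_flag], check=True)
```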
- run
python -m findings_classifier.chexpert_train --train --run_name "train_chexbert"
- in chexpert_train.py set ckpt_path (line 152) to the path of the model you just trained
- then run
python -m findings_classifier.chexpert_train --run_name "save_preds"
to save the predictions of the trained model
- move to the LLaVA directory with
cd LLaVA
- in train.sh set PYTHONPATH to the path of the RaDialog_LLaVA directory
- run ./train.sh to start training the model
- we used checkpoint-21000
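train.sh contains the full training configuration. For readers unfamiliar with the parameter-efficient fine-tuning used to adapt the LLM, a generic LoRA setup with the peft library looks roughly like the following; the base model name and hyperparameters are illustrative assumptions, not the values from train.sh:

```python
# Generic LoRA example with the peft library -- hyperparameters and base model
# are illustrative assumptions; check train.sh for the actual configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")  # assumed base LLM
lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the LoRA adapters are trainable
```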
- Install the RaDialog demo environment with
conda create --name radialog_demo python=3.10
- Activate the environment with
conda activate radialog_demo
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements_demo.txt
- run
python demo.py
to start the demo
- connect to the demo with a browser at
http://127.0.0.1:7861
and start chatting with RaDialog
When using our model (original and LLaVA version) or dataset, please cite:
@article{pellegrini2023radialog,
title={RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance},
author={Pellegrini, Chantal and {\"O}zsoy, Ege and Busam, Benjamin and Navab, Nassir and Keicher, Matthias},
journal={arXiv preprint arXiv:2311.18681},
year={2023}
}