
Robustness of Clinical and non-Clinical Instruction Finetuned Models

Repository for the paper "Open (Clinical) LLMs are Sensitive to Instruction Phrasings", to appear in the BioNLP 2024 workshop at ACL 2024.

Create the environment

conda env create -f _env/env.yml
conda activate robust
pip install -r requirements_torch.txt 
pip install -r requirements_base.txt
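
Before running anything heavy, a quick sanity check like the following should confirm the environment sees a GPU (torch is installed via requirements_torch.txt; the exact version depends on your setup):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"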

Downloading the data

Download the data from https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ and place it under the following path (relative to the repository's root):

./datasets/n2c2_raw/
./datasets/n2c2_raw/2006
...
./datasets/n2c2_raw/2018
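
The intermediate years are elided above; which subfolders you need depends on the n2c2 challenges you download. As a minimal sketch, the layout can be created like this, with one folder per challenge year:

mkdir -p ./datasets/n2c2_raw/2006   # earliest challenge in the layout above
mkdir -p ./datasets/n2c2_raw/2018   # latest challenge in the layout above
# ...repeat for each other n2c2 year you obtained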

Preprocessing the data

To preprocess the data, make sure you have created the folders above, then run all the notebooks under preprocessing_notebooks.

For MIMIC-III mortality prediction in particular, we follow the preprocessing script at https://github.com/MLforHealth/HurtfulWords/blob/master/scripts/make_targets.py
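
To execute all of the preprocessing notebooks non-interactively, one option (assuming jupyter is available in the environment and the notebooks require no manual input) is:

for nb in preprocessing_notebooks/*.ipynb; do
    jupyter nbconvert --to notebook --execute --inplace "$nb"
done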

Running inference

Having preprocessed the data, you can run inference as follows:

python inference_auto.py --annotator="Annotator_9" --model="clinical-camel-7b" --root_dir="./results" # clf tasks
python inference_auto.py --annotator="Annotator_9" --model="clinical-camel-7b" --root_dir="./results"  # ie tasks

For ease of use, we provide a SLURM submit file that runs the entire study by queueing up to N simultaneous jobs. A non-SLURM script can easily be derived from it if needed.
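
As a rough illustration only (our submit file is the authoritative version; the job name, resource request, and concurrency limit below are placeholders to adapt to your cluster), an equivalent SLURM array script might look like this, with one array index per model string from the list further down:

#!/bin/bash
#SBATCH --job-name=clin-robust   # placeholder name
#SBATCH --gres=gpu:1             # one GPU per job; see the hardware note below
#SBATCH --array=0-6%2            # one task per model, at most 2 running at once

MODELS=("Llama-2-7b-chat-hf" "Llama-2-13b-chat" "mistral-7b" "asclepius" \
        "clinical-camel-7b" "alpaca-7b" "medalpaca-7b")
python inference_auto.py --annotator="Annotator_9" --model="${MODELS[$SLURM_ARRAY_TASK_ID]}" --root_dir="./results"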

Be aware that, under the provided configuration (bfloat16, batch size 1), the pipeline requires at least one A6000, A100 80 GB, or H100 GPU. We cannot guarantee reproducibility of the results when using quantization to fit lower-capacity GPUs, although quantized results should be fairly similar. Moreover, the pipeline has only been tested with batch size 1 and is unlikely to work with larger batch sizes due to how we run evaluation in the binary classification case.

The following models are supported; reference them using the strings below. The actual model versions are those described in the paper (e.g., we use Mistral 7B Instruct v0.2, not the base version):

"Llama-2-7b-chat-hf",
"Llama-2-13b-chat",
"mistral-7b",
"asclepius",
"clinical-camel-7b",
"alpaca-7b",
"medalpaca-7b",

Postprocessing and results

To post-process the outputs, run the following commands from the root of the repository:

mkdir postprocessing/processed_csvs
python ./postprocessing/eval_script_extract.py

After that, you can run the notebooks to produce the relevant figures.

For ease of reproducibility, we include pre-computed results as CSV files under postprocessing/processed_csvs.
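
To take a quick look at these files before running the notebooks (the exact file names will vary), something like the following works:

ls postprocessing/processed_csvs/
head -n 3 postprocessing/processed_csvs/*.csv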

After running each notebook, the relevant figures will be stored under postprocessing/plots/.

Contact

For questions, contact the authors, Monica Munnangi or Alberto Ceballos Arroyo, via the email addresses listed in the paper, or open a GitHub issue.
