
Using a UMLS-based retriever to enhance MedQA-USMLE performance #14

manusikka opened this issue Mar 25, 2023 · 1 comment

@manusikka

We intended to supplement this MedQA-USMLE evaluation with a lookup against UMLS.
UMLS is a large collection of biomedical vocabularies that can be queried through REST APIs. We expected
accuracy to rise above the baseline once retrieved medical-term descriptions were added to the prompt during evaluation.

The technique for adding additional context is:
a) for each answer choice, do a lookup against UMLS and retrieve additional context based on that answer choice, then concatenate the new context with each question/answer pair (a minimal sketch of the lookup is shown below).
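
A minimal sketch of such a lookup, assuming the UMLS REST search and definitions endpoints and a UTS API key (illustrative only, not necessarily the exact retriever we used):

import requests

UMLS_API_KEY = "YOUR_UTS_API_KEY"  # assumption: a UTS account / API key is available
BASE = "https://uts-ws.nlm.nih.gov/rest"

def umls_definition(term):
    # Search UMLS for the answer-choice text and take the top-ranked concept (CUI).
    search = requests.get(
        f"{BASE}/search/current",
        params={"string": term, "apiKey": UMLS_API_KEY},
    ).json()
    results = search["result"]["results"]
    if not results or results[0]["ui"] == "NONE":
        return ""
    cui = results[0]["ui"]
    # Fetch the definitions attached to that concept and return the first one.
    defs = requests.get(
        f"{BASE}/content/current/CUI/{cui}/definitions",
        params={"apiKey": UMLS_API_KEY},
    ).json()
    entries = defs.get("result", [])
    return entries[0]["value"] if entries else ""

print(umls_definition("Renal artery stenosis"))  # e.g. a definition like searcher1 below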

For example, the prompt is the original question and the actual answer is c3: Common iliac artery aneurysm.
The UMLS retriever returns searcher1 for c1, searcher2 for c2, searcher3 for c3, and searcher4 for c4:

prompt: A 68-year-old male comes to the physician for evaluation of right flank pain. He has a history of diabetes and peripheral artery disease. His blood pressure is 160/90 mm Hg. Physical examination shows abdominal tenderness and right flank tenderness. An ultrasound shows dilation of the right ureter and renal pelvis. Which of the following is the most likely underlying cause of this patient's condition? c1: Renal artery stenosis c2: Benign prostatic hyperplasia c3: Common iliac artery aneurysm c4: Urethral stricture
searcher1: Narrowing of a main artery in the kidney. searcher2: Obstructive nephropathy which has developed in a patient with evidence of bladder outflow obstruction caused by benign prostatic hypertrophy. searcher3: An artery arising from the bifurcation of the abdominal aorta which then bifurcates forming the internal and external iliac arteries. searcher4: Narrowing of the urethra associated with inflammation or scar tissue. [HPO:probinson]
predicted_label: 0 actual_label: 2

Our hypothesis is that the model will have better accuracy when the answers are supplemented with the definitions from searcher1, searcher2, etc.

Here is what the supplemented test data with the searcher results looks like:
{"id": "test-00006", "sent1": "A 68-year-old male comes to the physician for evaluation of right flank pain. He has a history of diabetes and peripheral artery disease. His blood pressure is 160/90 mm Hg. Physical examination shows abdominal tenderness and right flank tenderness. An ultrasound shows dilation of the right ureter and renal pelvis. Which of the following is the most likely underlying cause of this patient's condition?", "sent2": "", "ending0": "Narrowing of a main artery in the kidney. Renal artery stenosis", "ending1": "Obstructive nephropathy which has developed in a patient with evidence of bladder outflow obstruction caused by benign prostatic hypertrophy. Benign prostatic hyperplasia", "ending2": "An artery arising from the bifurcation of the abdominal aorta which then bifurcates forming the internal and external iliac arteries. Common iliac artery aneurysm", "ending3": "Narrowing of the urethra associated with inflammation or scar tissue. [HPO:probinson] Urethral stricture", "label": 2}

However, accuracy actually drops slightly when using the retriever.
Do we need to change any command-line parameters because the answers are longer? Any thoughts on why we are not seeing an improvement in results would be welcome.
Here is the command we used:
deepspeed --num_gpus 1 --num_nodes 1 run_multiple_choice.py \
  --tokenizer_name stanford-crfm/pubmed_gpt_tokenizer \
  --model_name_or_path "/content/drive/MyDrive/Colab Notebooks/SavedModel300" \
  --train_file $datadir/train300.json \
  --validation_file $datadir/dev300.json \
  --test_file $datadir/test300newRetr.json \
  --do_predict \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 2 \
  --learning_rate 2e-06 \
  --warmup_ratio 0.5 \
  --num_train_epochs 20 \
  --max_seq_length 560 \
  --logging_steps 100 \
  --save_strategy no \
  --evaluation_strategy no \
  --output_dir medqa-finetune-demo \
  --overwrite_output_dir \
  --fp16 \
  --seed 1 \
  --run_name medqa-finetune-demo \
  --deepspeed deepspeed_config.json
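
As a sanity check on the max_seq_length question: if question + definition + answer choice tokenizes to more than 560 tokens, the appended text is simply truncated. A rough length check (assuming run_multiple_choice.py scores each question/choice pair concatenated, as the HuggingFace multiple-choice example scripts do):

import json
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("stanford-crfm/pubmed_gpt_tokenizer")

longest = 0
with open("test300newRetr.json") as f:
    for line in f:
        rec = json.loads(line)
        for i in range(4):
            # Approximate length of one scored sequence: question text + augmented choice.
            n = len(tok(rec["sent1"] + " " + rec[f"ending{i}"])["input_ids"])
            longest = max(longest, n)

print(f"longest question+choice sequence: {longest} tokens (current --max_seq_length is 560)")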

@J38
Contributor

J38 commented Mar 27, 2023

I think you need to supplement all of the training data and fine-tune a model on that.
