MedHelm: Implement medcalc bench scenario, metrics and specs #3207
This PR adds support for MedCalc-Bench benchmarking.
As much as possible, I kept the implementation faithful to the original. For example, data is downloaded directly from Hugging Face, and the evaluation metric is essentially a copy of the one in the original benchmark repo. In some cases, however, changes were necessary, as discussed in the following paragraphs.
The original benchmark repo expects the model to answer in JSON format. Since we do not necessarily want to evaluate models' ability to output JSON, the prompts were adapted so that the answer is given in natural language. It would be great if someone could review the prompts and make sure they align with other HELM scenarios.
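For reference, here is a minimal sketch of what parsing a natural-language answer could look like. The function names and the flat 5% tolerance are illustrative assumptions, not this PR's actual code; the real metric follows the original repo's per-output-type rules:

```python
import re
from typing import Optional


def extract_final_answer(model_output: str) -> Optional[float]:
    """Pull the last number out of a free-text completion, so the model
    is not required to emit JSON (hypothetical helper)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", model_output.replace(",", ""))
    return float(matches[-1]) if matches else None


def within_tolerance(predicted: float, gold: float, rel_tol: float = 0.05) -> bool:
    """Accept answers inside a relative band around the ground truth.

    The original benchmark applies different rules per calculator type
    (exact match for integers/dates, a tolerance band for decimals);
    this single band is a simplification for illustration.
    """
    lower, upper = sorted((gold * (1 - rel_tol), gold * (1 + rel_tol)))
    return lower <= predicted <= upper
```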
Originally, the one-shot examples come from a pre-defined JSON file, and a specific example is used for each question type (defined by the `Calculator Id` field in the dataset). I couldn't figure out how to pass information at runtime from the `Scenario` to the `RunSpec`; namely, I did not find a way to extract the `Calculator Id` from the `Instance` being used and apply it when building the inference prompt. Is this supported by the current implementation of HELM? A possible workaround is sketched in the first snippet below.

The original implementation also has special truncation logic for the one-shot method: it truncates the patient note and the step-by-step explanation of the output depending on the model. This is currently not implemented, but already during local tests (with gpt2) I ran into input-length issues. Any suggestions on how to solve this? A rough truncation sketch follows the first snippet.
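One possible workaround: since the `RunSpec` is fixed before any instance exists, the calculator-specific one-shot example could be baked into each instance's input text inside the `Scenario` itself, so nothing needs to flow from the `Instance` to the adapter at runtime. The file name, JSON layout, and prompt wording below are assumptions for illustration, not this PR's code; `Input`, `Instance`, and `TEST_SPLIT` are the standard HELM scenario primitives:

```python
import json

from helm.benchmark.scenarios.scenario import Input, Instance, TEST_SPLIT

# Assumed layout: one-shot examples keyed by calculator id, as in the
# original repo's pre-defined JSON (file name is an assumption).
with open("one_shot_examples.json") as f:
    ONE_SHOT_EXAMPLES = json.load(f)


def build_instance(row: dict) -> Instance:
    """Build a HELM Instance whose input already contains the
    calculator-specific one-shot example, sidestepping the need to
    pass `Calculator Id` from the Scenario to the RunSpec."""
    example = ONE_SHOT_EXAMPLES[str(row["Calculator Id"])]
    prompt = (
        f"Example patient note: {example['Patient Note']}\n"
        f"Example question: {example['Question']}\n"
        f"Example answer: {example['Response']}\n\n"
        f"Patient note: {row['Patient Note']}\n"
        f"Question: {row['Question']}"
    )
    return Instance(input=Input(text=prompt), references=[], split=TEST_SPLIT)
```

The trade-off is that the one-shot example becomes part of the instance text rather than being handled by the adapter's in-context-learning machinery, which may or may not be acceptable for HELM.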
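On the truncation question, a minimal sketch of the original repo's idea: token-budget the longest free-text fields (patient note, step-by-step explanation) with the model's own tokenizer so the full prompt fits the context window. The 512-token budget is an arbitrary placeholder, and HELM's window services may already cover part of this:

```python
from transformers import AutoTokenizer


def truncate_to_budget(text: str, tokenizer, max_tokens: int) -> str:
    """Keep only the first max_tokens tokens of a free-text field."""
    ids = tokenizer.encode(text)
    if len(ids) <= max_tokens:
        return text
    return tokenizer.decode(ids[:max_tokens])


tokenizer = AutoTokenizer.from_pretrained("gpt2")
patient_note = "..."  # long note from the dataset
# Reserve roughly half of gpt2's 1024-token window for the note, leaving
# room for the one-shot example, the question, and the generated answer.
patient_note = truncate_to_budget(patient_note, tokenizer, max_tokens=512)
```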