
MedHelm: Implement medcalc bench scenario, metrics and specs #3207

Open
wants to merge 3 commits into base: main
Conversation

@sashimono-san commented Dec 11, 2024

This PR adds support for the MedCalc-Bench benchmark.

As much as possible, I kept the implementation faithful to the original. For example, the data is downloaded directly from Hugging Face, and the evaluation metric is essentially a copy of the one in the original benchmark repo. In some cases, however, changes were necessary, as discussed in the following paragraphs.

The original benchmark repo expects the model to answer in JSON format. Since we do not necessarily want to evaluate models' ability to output JSON, the prompts were adapted so the answer is given in natural language. It would be great if someone could review the prompts and make sure they align with other HELM scenarios.
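Once the model answers in natural language instead of JSON, the metric has to pull the final value out of free text. A minimal sketch of that kind of parsing, assuming the answer is the last number in the response (the function name and regex here are illustrative, not the PR's actual code):

```python
import re
from typing import Optional

def extract_final_answer(text: str) -> Optional[str]:
    """Return the last number mentioned in a free-text model response.

    Matches integers, decimals, and simple fractions like "1/2". A real
    metric would also need to handle units, dates, and ranges depending
    on the calculator type.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?(?:/\d+)?", text)
    return matches[-1] if matches else None
```

For example, `extract_final_answer("The MELD score is 17 points.")` yields `"17"`, while a response with no number yields `None`.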

Originally, the one-shot examples come from a pre-defined JSON file, and a specific example is used for each question type (defined by the Calculator Id field in the dataset). I couldn't figure out how to pass information at runtime from the Scenario to the RunSpec; specifically, I did not find a way to extract the Calculator Id from the Instance being evaluated and use it when building the inference prompt. Is this supported by the current implementation of HELM?
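To make the question concrete, the lookup itself is straightforward once the Calculator Id is available at prompt-construction time; the hard part is plumbing it through HELM. A sketch of the lookup, with an illustrative (not actual) example structure keyed by Calculator Id:

```python
from typing import Dict

# Hypothetical structure mirroring the original benchmark's pre-defined
# one-shot JSON: a mapping from Calculator Id to a worked example.
# The ids and contents below are placeholders, not real dataset entries.
ONE_SHOT_EXAMPLES: Dict[str, dict] = {
    "2": {
        "patient_note": "A 45-year-old man presents with ...",
        "explanation": "Step 1: identify the relevant values ...",
        "answer": "22.5",
    },
}

def build_one_shot_prefix(calculator_id: str) -> str:
    """Return the one-shot prefix for a question type, or an empty
    string when no example exists for that Calculator Id."""
    example = ONE_SHOT_EXAMPLES.get(calculator_id)
    if example is None:
        return ""
    return (
        f"Patient note: {example['patient_note']}\n"
        f"Explanation: {example['explanation']}\n"
        f"Answer: {example['answer']}\n\n"
    )
```

The open question is where in HELM this function could be called with the per-instance Calculator Id, since the RunSpec is built before individual instances are seen.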

The original implementation has special truncation logic for the one-shot method: it truncates the patient note and the step-by-step explanation of the output depending on the model. This is not implemented yet, but already during local tests (with gpt2) I ran into input-length issues. Any suggestions on how to solve this?
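For reference, a crude sketch of the kind of truncation the original repo applies in the one-shot setting: shorten the patient note first, then the explanation, until both fit a token budget. Whitespace splitting stands in for the model's real tokenizer here; a faithful port would use the target model's tokenizer and its context limit.

```python
from typing import Tuple

def truncate_for_context(patient_note: str, explanation: str,
                         budget: int) -> Tuple[str, str]:
    """Trim the one-shot example to fit a token budget.

    Tokens are dropped from the end of the patient note first; the
    step-by-step explanation is only trimmed if the note alone cannot
    absorb the overflow. This mirrors the spirit, not the letter, of
    the original benchmark's per-model truncation.
    """
    note_tokens = patient_note.split()
    expl_tokens = explanation.split()
    # Drop note tokens until the pair fits the budget.
    while note_tokens and len(note_tokens) + len(expl_tokens) > budget:
        note_tokens.pop()
    # If the note is exhausted and we still overflow, trim the explanation.
    while expl_tokens and len(expl_tokens) > budget:
        expl_tokens.pop()
    return " ".join(note_tokens), " ".join(expl_tokens)
```

For small-context models like gpt2 this kind of budget would be derived from the model's maximum sequence length minus the space reserved for the question and the expected answer.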

@sashimono-san sashimono-san reopened this Dec 17, 2024
@sashimono-san sashimono-san changed the title feat: implement medcalc bench scenario, metrics and specs Implement medcalc bench scenario, metrics and specs Dec 17, 2024
@sashimono-san sashimono-san force-pushed the feat/medcalc_bench_scenario branch from 98d3f83 to 792fb4f Compare December 17, 2024 15:56
@sashimono-san sashimono-san changed the title Implement medcalc bench scenario, metrics and specs MedHelm: Implement medcalc bench scenario, metrics and specs Dec 24, 2024