Distilling PubMedGPT #3

Open
ChantalMP opened this issue Dec 23, 2022 · 3 comments
@ChantalMP

Thank you very much for this great work and for publishing the model!
Do you have any plans to train / publish a distilled version of your model, as its current size requires a lot of resources?

@J38

J38 commented Dec 24, 2022

We are very committed to helping people use the model, and I think part of this project is figuring out how to make a large-scale model like this useful to a wider research community.

A simple solution would be for us to release one of the smaller models we trained on the way to the 2.7B. This would come at the cost of reduced task performance.

There are two aspects to this problem: handling fine-tuning and handling inference.

For fine-tuning, one possible way forward could be for us to fine-tune several biomedical task models (e.g. QA, summarization) ... and then make those fine-tuned models available to researchers. You could imagine making a general biomedical QA model; if a user puts their custom QA task into the proper format, they could get reasonable results. I can't make any promises, but another possible direction is for users to give us their task data (if it is not private) and we fine-tune models for them, to make the model more accessible. I am checking whether that is feasible for cases where it would only take us 30 minutes to an hour.
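(To make "the proper format" concrete: a minimal sketch of packing a custom QA example into a single prompt string for a causal LM. The template and field names below are purely illustrative and are not a format defined by this project.)

```python
# Hypothetical prompt template for casting a custom QA task into a single
# causal-LM string. The template and field names are illustrative only --
# they are not a format specified by the PubMedGPT project.
from typing import Optional


def format_qa_example(question: str, context: str, answer: Optional[str] = None) -> str:
    """Pack one QA example into a training/inference string for a causal LM."""
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    # During fine-tuning the gold answer is appended; at inference time the
    # model is asked to continue the prompt after "Answer:".
    return f"{prompt} {answer}" if answer is not None else prompt


# Example with a made-up biomedical QA pair:
print(format_qa_example(
    question="Which gene is most commonly mutated in cystic fibrosis?",
    context="Cystic fibrosis is caused by mutations in the CFTR gene ...",
    answer="CFTR",
))
```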

For inference, I think we could explore the kinds of things Tim Dettmers is working on, for instance making an 8-bit version of the model for inference. This would greatly reduce the resources needed to run it.
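(For reference, a minimal sketch of what 8-bit loading could look like with the transformers + bitsandbytes integration. The Hub id is a placeholder assumption, and this is just one possible recipe, not an official one from this project.)

```python
# Sketch: 8-bit inference via the transformers + bitsandbytes integration
# (the approach from Tim Dettmers' LLM.int8() work). Requires `accelerate`
# and `bitsandbytes` to be installed and a CUDA GPU.
# NOTE: the model id below is an assumption -- substitute the actual checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stanford-crfm/pubmedgpt"  # placeholder Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place the weights
    load_in_8bit=True,   # quantize the linear layers to int8 at load time
)

inputs = tokenizer("Photosynthesis is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```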

Please feel free to let us know what projects you are working on, and we can see what we can do to help make the model useful for you!

@ChantalMP

Hi,
I wanted to use the model as a decoder for medical VQA, where I would need to fine-tune it to also take the image information into account. Fine-tuning only a few layers is a possibility, but this might harm performance, and it is still very slow for me because of the model size. This is just one example of an application where it would be beneficial to have a model small enough for fine-tuning.
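(For context, a minimal sketch of the layer-freezing option, assuming the checkpoint exposes a GPT-2-style module layout under Hugging Face transformers; the module names and the Hub id are assumptions.)

```python
# Sketch: freeze everything except the last few transformer blocks (and the
# LM head) so that only a small fraction of the 2.7B parameters is updated.
# Assumes a GPT-2-style architecture as exposed by Hugging Face transformers;
# the module names (`transformer.h`, `lm_head`) and the Hub id are assumptions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("stanford-crfm/pubmedgpt")  # placeholder id

for param in model.parameters():
    param.requires_grad = False  # freeze the whole network first

num_trainable_blocks = 2
for block in model.transformer.h[-num_trainable_blocks:]:
    for param in block.parameters():
        param.requires_grad = True  # unfreeze only the last blocks

for param in model.lm_head.parameters():
    param.requires_grad = True  # keep the output head trainable as well

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```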

I was thinking about distillation as a potential way of reducing size while keeping the performance as high as possible.

@J38

J38 commented Jan 5, 2023

Okay, I understand. We're open-minded about looking into that but may not have the time to get it working.

At the moment, this is the best resource I know for trying a distillation experiment: https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation
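(For readers finding this later: the heart of that recipe is plain logit distillation. A minimal, generic sketch follows; the temperature and loss weights are illustrative defaults, not values from this thread.)

```python
# Sketch: generic knowledge-distillation objective for causal LMs, in the
# spirit of the linked Hugging Face distillation example. The temperature and
# loss weights below are illustrative defaults, not values from this thread.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha_ce=0.5, alpha_lm=0.5):
    """Combine a soft-target KL term against the teacher with the usual LM loss."""
    vocab = student_logits.size(-1)
    s = student_logits.view(-1, vocab)
    t = teacher_logits.view(-1, vocab)

    # Soft targets: KL divergence between temperature-scaled distributions.
    kd = F.kl_div(
        F.log_softmax(s / temperature, dim=-1),
        F.softmax(t / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: standard next-token cross-entropy on the gold labels.
    lm = F.cross_entropy(s, labels.view(-1), ignore_index=-100)
    return alpha_ce * kd + alpha_lm * lm


# Inside a training step the teacher runs without gradients:
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# student_logits = student(input_ids).logits
# loss = distillation_loss(student_logits, teacher_logits, labels)
```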

Is there anything better you know of?
