Distilling PubMedGPT #3
Thank you very much for this great work and for publishing the model!

Do you have any plans to train or publish a distilled version of your model? The current size requires a lot of resources.

Comments
We are very committed to helping people use the model, and part of this project is figuring out how to make a large-scale model like this useful to a wider research community. A simple option would be for us to release one of the smaller models we trained on the way to the 2.7B model, though this would come at the cost of reduced task performance.

There are two aspects to this problem: handling the fine-tuning and handling the inference.

For fine-tuning, one possible way forward is for us to fine-tune several biomedical task models (e.g. QA, summarization) ... and then make those fine-tuned models available to researchers. You could imagine a general biomedical QA model: if a user puts their custom QA task into the proper format, they could get reasonable results. I can't make any promises, but another possible direction is for users to give us their task data (if it is not private) and we fine-tune models for them, which would make the model more accessible. I am asking whether that is feasible for cases where it would only take us 30m-1h.

For inference, I think we could explore the kinds of things Tim Dettmers is working on, for instance making an 8-bit version of the model for inference time. This would greatly reduce the resources needed to run inference.

Please feel free to let us know what projects you are working on, and we can see what we can do to help make the model useful for you!
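For anyone who wants to try the 8-bit route described above, a minimal sketch with Hugging Face Transformers and bitsandbytes looks roughly like the following. The checkpoint name is a placeholder (not a confirmed hub id), and it assumes `bitsandbytes` and `accelerate` are installed alongside a CUDA GPU:

```python
# Minimal sketch of 8-bit inference via transformers + bitsandbytes.
# "path/to/pubmedgpt-checkpoint" is a placeholder; point it at the real weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/pubmedgpt-checkpoint"  # placeholder, not a confirmed hub id

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    load_in_8bit=True,   # quantize weights to int8 with bitsandbytes
    device_map="auto",   # spread layers across available GPUs (needs accelerate)
)

prompt = "Photosynthesis is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```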
Hi, I was thinking about distillation as a potential way of reducing size while keeping the performance as high as possible.
Okay, I understand. We're open-minded about looking into that but may not have the time to get it working. At the moment, this is the best resource I know of for trying a distillation experiment: https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation

Is there anything better you know of?
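For context on what such a recipe boils down to, the core of a logit-distillation step is a temperature-scaled KL term between the student's and the teacher's next-token distributions, mixed with the usual cross-entropy on the hard labels. The sketch below is only an illustration in PyTorch (not code from that repository or from this project); `distillation_loss`, `temperature`, and `alpha` are illustrative names:

```python
# Illustrative logit-distillation loss: soft targets from the teacher + hard-label CE.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """student_logits, teacher_logits: (batch, seq, vocab); labels: (batch, seq)."""
    vocab = student_logits.size(-1)
    # Soft-target term: KL between temperature-softened distributions,
    # scaled by T^2 so gradient magnitude stays comparable across temperatures.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1).view(-1, vocab)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1).view(-1, vocab)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    # Hard-label term: ordinary next-token cross-entropy (-100 marks ignored positions).
    ce = F.cross_entropy(student_logits.view(-1, vocab), labels.view(-1), ignore_index=-100)
    return alpha * kd + (1.0 - alpha) * ce
```

The linked Hugging Face example applies the same idea with additional terms (a masked-LM loss and a cosine loss on hidden states), but the soft-target KL above is the essential piece.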