Generation is suspiciously slow for long sequences #23
Ok, I'll do some experiments too and get back to you. Just to double-check: you are giving the same prompt to GPT-2 and BioMedLM, running generate, and those numbers are the ratio between the 2 models? Just this week I have been spending a lot of time working on BioMedLM's generative abilities for downstream tasks ... I actually feel it is most useful for scenarios like reading a PubMed abstract and printing out a list of relations derived from the abstract, for instance ...
BioMedLM out of the box should just literally be running the same code as GPT-2, since it is just a GPT-2 model with different weights and a different tokenizer ... it has a smaller vocabulary than GPT-2 ... we could also compare to GPT-Neo 2.7B ...
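(As a quick sanity check of the "same code path" point, one can compare the two configs directly. A minimal sketch, assuming the Hugging Face hub IDs `stanford-crfm/BioMedLM` and `gpt2-xl` and that `transformers` is installed:)

```python
from transformers import AutoConfig

# Hub IDs are assumptions; substitute whatever checkpoints you actually load.
for model_id in ("stanford-crfm/BioMedLM", "gpt2-xl"):
    config = AutoConfig.from_pretrained(model_id)
    print(model_id, "->", config.model_type,
          "| vocab:", config.vocab_size,
          "| layers:", config.n_layer,
          "| hidden:", config.n_embd)
```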
And what exactly are the inputs --> outputs? Are BioMedLM and GPT-2 XL producing text of similar length, or is there a difference in average output length? I don't think setting ...
Yes, to both
I laughed when I read this, because I'm doing exactly this. I just wanted to provide a minimal example.
This is what I expected—and why I'm confused about the difference in speed.
For my minimal example, they are producing lengths within 2 tokens of each other, so I don't think sequence length accounts for it (also, my code prints out the number of generated tokens). I'm guessing this is a special-tokens difference.
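One quick way to check the special-tokens hypothesis is to compare the two tokenizers directly. A minimal sketch, assuming the hub IDs `stanford-crfm/BioMedLM` and `gpt2-xl`:

```python
from transformers import AutoTokenizer

# Hub IDs are assumptions; substitute the checkpoints actually being benchmarked.
for name, model_id in [("BioMedLM", "stanford-crfm/BioMedLM"), ("GPT-2 XL", "gpt2-xl")]:
    tok = AutoTokenizer.from_pretrained(model_id)
    print(name)
    print("  vocab size:", tok.vocab_size)
    print("  special tokens:", tok.special_tokens_map)
    print("  eos_token_id:", tok.eos_token_id, "| pad_token_id:", tok.pad_token_id)
```

If the eos handling differs (e.g. one model never emits its end-of-text token), one model may keep decoding to the maximum length while the other stops early.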
I am trying to use BioMedLM for generation, but I find that it is very slow at generation for long sequences. Training occurs at a normal speed. I wrote a minimal program (below) to reproduce this, comparing BioMedLM against GPT-2 XL (1.5B parameters) and Flan-T5 XL (3B parameters). I varied the maximum generation length and estimated the ratio of the durations of the two decoder models (BioMedLM divided by GPT-2):
1024 tokens: 5.9
512 tokens: 3.2
256 tokens: 1.9
128 tokens: 1.3
64 tokens: 1.01
Anecdotally, the generation speed is similar to that of Flan UL2, a 20B parameter model.
I'd like to fix this, but I don't know whether the issue is in the BioMedLM code, my software/environment versions/settings, or my hardware (an A100 80GB).
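The original program is not reproduced here; as a reference point, a comparable benchmark over the two decoder models might look like the following sketch (the prompt, the hub IDs `stanford-crfm/BioMedLM` and `gpt2-xl`, and the generation settings are assumptions, not the reporter's actual script):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub IDs, prompt, and dtype are assumptions for illustration only.
MODEL_IDS = {"BioMedLM": "stanford-crfm/BioMedLM", "GPT-2 XL": "gpt2-xl"}
PROMPT = "Recent studies of type 2 diabetes suggest that"

# Load each model once, in fp16 on a single GPU, so timing covers only generate().
loaded = {}
for name, model_id in MODEL_IDS.items():
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
    model.eval()
    loaded[name] = (tokenizer, model)

for max_new_tokens in (64, 128, 256, 512, 1024):
    durations = {}
    for name, (tokenizer, model) in loaded.items():
        inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")
        torch.cuda.synchronize()
        start = time.time()
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        torch.cuda.synchronize()
        durations[name] = time.time() - start
        n_new = output.shape[1] - inputs["input_ids"].shape[1]
        print(f"{name}: {n_new} new tokens in {durations[name]:.1f}s "
              f"(max_new_tokens={max_new_tokens})")
    print(f"ratio BioMedLM / GPT-2 XL: {durations['BioMedLM'] / durations['GPT-2 XL']:.2f}")
```

Printing the number of newly generated tokens alongside each timing keeps the comparison honest if one model happens to stop early at its end-of-text token.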