passthrough arg from api to model.forward #838
Here's a (rather long) video of it working: _00005.mp4 (Again, if I drop off, feel free to close it or rewrite it or whatever. No need to ask me for permission.)
I should have time for one or two rounds of improvements though
Lmk what I should do with this
b370efb to 3673672
just rebased
Left a few comments, mostly just nits for now.
This PR doesn't seem to be complete yet, so I can't make a decisive approval. Can you commit an example usage script for this? I'm still struggling to see how it can be used in its current state 😅
return passthrough

from aphrodite.common.pooling_params import PoolingParams # noqa: E402
from aphrodite.common.sampling_params import SamplingParams # noqa: E402
Why are these imported at the bottom?
prevent circular import
aphrodite/endpoints/llm.py
Outdated
@@ -92,6 +92,8 @@ class LLM:
serving, use the :class:`~aphrodite.AsyncAphrodite` class instead.
"""

# TODO:Luke do we need `enable_passthrough_param` stuff in llm.py?
You can leave it for now, since passthrough seems to be a REST-API first-class.
@@ -340,6 +341,11 @@ def forward(
hidden_states, _ = self.norm(hidden_states, residual)
return hidden_states

_printed = set()
def print_once(key, *args, **kwargs):
We have a log_once() function in aphrodite.common.logger. You can use that instead of this.
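For readers unfamiliar with the pattern being discussed, here is a minimal sketch of a "log once" helper built only on the standard library. It is not the log_once() from aphrodite.common.logger, whose signature is not shown in this thread; the logger name and the single-string signature below are assumptions made for illustration.

```python
import logging
from functools import lru_cache

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("passthrough_demo")
logger.setLevel(logging.INFO)


@lru_cache(maxsize=None)
def log_once(message: str) -> None:
    """Log `message` at INFO level the first time it is seen.

    lru_cache remembers each distinct message, so repeated calls with the
    same string return the cached result and never reach logger.info again.
    """
    logger.info(message)
```

Compared with a hand-rolled module-level set such as _printed, the cache decorator keeps the bookkeeping out of the function body, at the cost of requiring hashable arguments.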
Setting aside for a moment ways the PR could be improved, I'm also struggling to find a motivating use-case for this as it is now. Do you have a specific example of something this would allow you to do, that would be much harder without it?
I am adding steering vectors to llama 405b. So I can just put
If you need Steering/Control Vectors in particular, I've been working on this PR to enable support for it: #604 I've been neglecting it for a while but I'll get back to it as soon as possible.
Oh sweet! But yeah I wanted something a bit more flexible. Quickly make changes in the
I also was adding custom parameters to the samplers. It's nice to not have to change code in 20 places to try something out. But this is your repo not mine! Lmk
If you think you can make the changes as unintrusive as possible, I'm not against it. What do you think @50h100a? In case we do merge the PR, I'd appreciate it if you helped with the maintenance related to passthrough when needed. Since we don't have many maintainers, we try and add only the features we think we can comfortably maintain.
Would be happy to try and patch it up when issues arise. I must admit I am still a bit confused about how the scheduling, queuing, inter-node comms, etc still work. Do you have a doc anywhere explaining like model_runner.py and scheduler.py?
Everything that takes a My latest pattern is to have like I won't mind at all if you close it -- your project! I am not attached to this
Oh by the way, the reason I am using aphrodite for my experiments is that it runs twice as fast as the leanest meanest llama inference code I could come up with! I realize that quick experiments are not your main target use case. But thanks a lot for all the cuda kernels
There's unfortunately no explicit documentation for the internal sequence processing code, I'll get to it as soon as I can. For now, you'd have to rely on the docstrings and comments, which I think should be descriptive enough for most cases.
This PR can be radically simplified by using kwargs unpacking in the I suspect the things upstream can be simplified as well, and certainly generalized into an anonymous dict, but I'm on less familiar ground there.
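The kwargs-unpacking idea suggested above might look roughly like the sketch below. It uses plain Python stand-ins rather than aphrodite classes: ToyModel, execute_model, and the "scale" key are all hypothetical names invented for illustration, and only the execute_model_kwargs name echoes the PR description.

```python
from typing import Any, Dict, List, Optional


class ToyModel:
    """Stand-in for a model; not an aphrodite class."""

    def forward(
        self,
        input_ids: List[int],
        passthrough: Optional[Dict[str, Any]] = None,
        **_unused: Any,  # tolerate extra kwargs this toy model ignores
    ) -> List[int]:
        # Read an experimental knob from the anonymous dict, with a default.
        scale = (passthrough or {}).get("scale", 1)
        return [token * scale for token in input_ids]


def execute_model(
    model: ToyModel,
    input_ids: List[int],
    passthrough: Optional[Dict[str, Any]] = None,
) -> List[int]:
    """Collect forward() arguments in one dict and unpack it at the call site."""
    execute_model_kwargs: Dict[str, Any] = {"input_ids": input_ids}
    if passthrough is not None:
        # Only add the key when the caller actually supplied something.
        execute_model_kwargs["passthrough"] = passthrough
    return model.forward(**execute_model_kwargs)
```

With this shape, a new experimental parameter only needs to be read inside forward(); the call site does not change.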
Yeah I would've done that if I knew how. I don't really understand the code so I was just mimicking the style of the other stuff passed explicitly. If you did the kwargs thing then the diff might improve my understanding
I am confused also about why most stuff gets explicitly tensorified/detensorified and the code still ran when I just didn't do that at all. Maybe it relates to torch.compile or XPUs or something?
Please feel welcome to make any and all changes without consulting me!
Addresses #836
About this PR (and my limited understanding of how aphrodite works)
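passthrough will come in from the API and get to either SamplingParams or PoolingParams.
model_executable(**execute_model_kwargs)
.passthrough onto ModelInputForXPU, the open vino thing, ModelInputForGPU, ModelInputForNeuron, and ModelInputForTPU.
.passthrough if it gets SamplingParams or PoolingParams or LLMInputs or EncoderDecoderLLMInputs.
There are a couple little todos that might need done, I can't tell.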
if it gets SamplingParams or PoolingParams or LLMInputs or EncoderDecoderLLMInputs.There are a couple little todos that might need done, i cant tell