
passthrough arg from api to model.forward #838

Closed · wanted to merge 3 commits

Conversation

qpwo (Contributor) commented Nov 26, 2024

Please feel welcome to make any and all changes without consulting me!

Addresses #836

About this PR (and my limited understanding of how aphrodite works)

  • A passthrough will come in from the API and get to either SamplingParams or PoolingParams.
  • The goal is to pass it around and get it all the way to model_executable(**execute_model_kwargs).
  • So we need to get passthrough onto ModelInputForXPU, the OpenVINO model input, ModelInputForGPU, ModelInputForNeuron, and ModelInputForTPU.
  • I only implemented distributed stuff for GPU I think.
  • A function has access to the passthrough if it gets SamplingParams or PoolingParams or LLMInputs or EncoderDecoderLLMInputs.
    • I tried to keep redundant passing of passthrough to a minimum
  • I think `passthrough` travels between GPUs and nodes/machines along ModelInputForGPUBuilder.InterDataForSeqGroup?
  • Not sure this is how you're supposed to do all this.
  • Seems to work for me.

There are a couple of small TODOs that might still need doing; I can't tell.
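To make the flow above concrete, here is a minimal toy sketch of the intended data path: a passthrough dict rides along on the sampling params, gets copied onto the per-step model input, and is finally splatted into the model call. All class and function names here are illustrative stand-ins, not aphrodite's actual API.

```python
# Toy sketch of the passthrough data path (illustrative names, not aphrodite code).
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class SamplingParams:
    temperature: float = 1.0
    # Opaque, user-supplied extras that the engine itself never interprets.
    passthrough: Optional[Dict[str, Any]] = None

@dataclass
class ModelInputForGPU:
    input_ids: List[int] = field(default_factory=list)
    passthrough: Optional[Dict[str, Any]] = None

def prepare_model_input(params: SamplingParams, input_ids: List[int]) -> ModelInputForGPU:
    # The input builder copies passthrough onto the per-step model input.
    return ModelInputForGPU(input_ids=input_ids, passthrough=params.passthrough)

def model_forward(input_ids, passthrough=None):
    # The model can branch on arbitrary user-supplied keys.
    if passthrough and passthrough.get("save_activation"):
        return {"ids": input_ids, "saved": True}
    return {"ids": input_ids, "saved": False}

params = SamplingParams(passthrough={"save_activation": True})
model_input = prepare_model_input(params, [1, 2, 3])
execute_model_kwargs = {
    "input_ids": model_input.input_ids,
    "passthrough": model_input.passthrough,
}
result = model_forward(**execute_model_kwargs)
print(result["saved"])  # → True
```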

qpwo (Contributor, Author) commented Nov 26, 2024

Here's a (rather long) video of it working

_00005.mp4

(Again, if I drop off, feel free to close it or rewrite it or whatever. No need to ask me for permission.)

qpwo (Contributor, Author) commented Nov 26, 2024

I should have time for one or two rounds of improvements, though.

AlpinDale self-requested a review on November 26, 2024
qpwo (Contributor, Author) commented Nov 26, 2024

Lmk what I should do with this

qpwo (Contributor, Author) commented Nov 26, 2024

just rebased

AlpinDale (Member) left a comment

Left a few comments, mostly just nits for now.

This PR doesn't seem to be complete yet, so I can't make a decisive approval. Can you commit an example usage script for this? I'm still struggling to see how it can be used in its current state 😅

return passthrough

from aphrodite.common.pooling_params import PoolingParams # noqa: E402
from aphrodite.common.sampling_params import SamplingParams # noqa: E402
AlpinDale (Member):

Why are these imported at the bottom?

qpwo (Contributor, Author):

To prevent a circular import.

@@ -92,6 +92,8 @@ class LLM:
serving, use the :class:`~aphrodite.AsyncAphrodite` class instead.
"""

# TODO:Luke do we need `enable_passthrough_param` stuff in llm.py?
AlpinDale (Member):

You can leave it for now, since passthrough seems to be first-class in the REST API.

@@ -340,6 +341,11 @@ def forward(
hidden_states, _ = self.norm(hidden_states, residual)
return hidden_states

_printed = set()
def print_once(key, *args, **kwargs):
AlpinDale (Member):

We have a log_once() function in aphrodite.common.logger. You can use that instead of this.

50h100a (Collaborator) commented Nov 27, 2024

Setting aside for a moment ways the PR could be improved, I'm also struggling to find a motivating use-case for this as it is now.

Do you have a specific example of something this would allow you to do, that would be much harder without it?

qpwo (Contributor, Author) commented Nov 27, 2024

Do you have a specific example of something this would allow you to do, that would be much harder without it?

I am adding steering vectors to Llama 405B, so I can just put save_activation=True or steer_with=1234 in passthrough. If the PR is not a good fit, then I'll close it and just use my fork.

AlpinDale (Member) commented Nov 27, 2024

If you need Steering/Control Vectors in particular, I've been working on this PR to enable support for it: #604

I've been neglecting it for a while but I'll get back to it as soon as possible.

qpwo (Contributor, Author) commented Nov 27, 2024

Oh sweet! But yeah, I wanted something a bit more flexible: the ability to quickly make changes in the <model>.py code itself. Hmm. So should I finish this PR or not?

qpwo (Contributor, Author) commented Nov 27, 2024

I was also adding custom parameters to the samplers. It's nice not to have to change code in 20 places to try something out. But this is your repo, not mine! Lmk

AlpinDale (Member):

If you think you can make the changes as unintrusive as possible, I'm not against it. What do you think @50h100a ?

In case we do merge the PR, I'd appreciate it if you helped with the maintenance related to passthrough when needed. Since we don't have many maintainers, we try and add only the features we think we can comfortably maintain.

qpwo (Contributor, Author) commented Nov 27, 2024

Would be happy to try and patch it up when issues arise. I must admit I am still a bit confused about how the scheduling, queuing, inter-node comms, etc. work. Do you have a doc anywhere explaining, like, model_runner.py and scheduler.py?

qpwo (Contributor, Author) commented Nov 30, 2024

Everything that takes a passthrough has passthrough: Optional[Passthrough] = None, so I think this is a relatively safe PR to merge as-is, despite the big diff. (Biggest thing is I haven't tested non-cuda arches.) I have been using it the last few days and I've found it pretty handy.

My latest pattern is to have something like passthrough.action: Save | Steer; then in layer.forward I can do if action.type == 'steer': ..., which is a bit more ergonomic than using hooks or wrapper classes for everything. But it's really only practical for running little experiments.
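That pattern might look something like the toy sketch below. The `Save` and `Steer` dataclasses and the layer are illustrative stand-ins, not actual aphrodite or fork code.

```python
# Toy sketch of dispatching on a passthrough "action" inside a layer's forward.
from dataclasses import dataclass

@dataclass
class Save:
    type: str = "save"

@dataclass
class Steer:
    type: str = "steer"
    scale: float = 1.0

class ToyLayer:
    def __init__(self):
        self.saved_activations = []

    def forward(self, hidden, passthrough=None):
        action = passthrough.get("action") if passthrough else None
        if action is not None and action.type == "save":
            # Record the activations for later inspection.
            self.saved_activations.append(list(hidden))
        elif action is not None and action.type == "steer":
            # Stand-in for adding a steering vector to the hidden state.
            hidden = [h + action.scale for h in hidden]
        return hidden

layer = ToyLayer()
out = layer.forward([1.0, 2.0], passthrough={"action": Steer(scale=0.5)})
print(out)  # → [1.5, 2.5]
```

The appeal is that one opaque dict threads through the engine unchanged, so trying a new experiment only touches the layer code and the request, not every intermediate plumbing layer.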

I won't mind at all if you close it; it's your project! I am not attached to this.

qpwo (Contributor, Author) commented Nov 30, 2024

Oh, by the way, the reason I am using aphrodite for my experiments is that it runs twice as fast as the leanest, meanest llama inference code I could come up with! I realize that quick experiments are not your main target use case. But thanks a lot for all the CUDA kernels.

AlpinDale (Member) commented Dec 1, 2024

Would be happy to try and patch it up when issues arise. I must admit I am still a bit confused about how the scheduling, queuing, inter-node comms, etc still work. Do you have a doc anywhere explaining like model_runner.py and scheduler.py?

There's unfortunately no explicit documentation for the internal sequence processing code, I'll get to it as soon as I can. For now, you'd have to rely on the docstrings and comments, which I think should be descriptive enough for most cases.

50h100a (Collaborator) commented Dec 4, 2024

This PR can be radically simplified by using kwargs unpacking in the model_executable calls in the model runners - no need to make changes to every single model implementation. Just use **passthrough, treating it as a possibly-empty dict of extra args.
As it's a debug feature, failing ungracefully when unsupported kwargs are passed to a model seems acceptable.

I suspect the things upstream can be simplified as well, and certainly generalized into an anonymous dict, but I'm on less familiar ground there.
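The suggested simplification could look roughly like this. The function names are illustrative; the point is only the `**`-unpacking at the call site, with a possibly-empty dict standing in for passthrough.

```python
# Sketch of 50h100a's suggestion: splat passthrough into the model call so
# only models that care about a given kwarg need to declare it.
# Names here are illustrative, not aphrodite's actual model-runner API.
def run_model(model_executable, input_ids, passthrough=None):
    extra_kwargs = passthrough or {}
    # Unsupported keys raise TypeError inside the model; per the comment
    # above, failing ungracefully is acceptable for a debug feature.
    return model_executable(input_ids, **extra_kwargs)

def toy_model(input_ids, steer_with=None):
    # A model that opts in to exactly one passthrough kwarg.
    if steer_with is not None:
        return [i + steer_with for i in input_ids]
    return input_ids

print(run_model(toy_model, [1, 2], passthrough={"steer_with": 10}))  # → [11, 12]
print(run_model(toy_model, [1, 2]))  # → [1, 2]
```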

qpwo (Contributor, Author) commented Dec 5, 2024

Yeah, I would've done that if I'd known how. I don't really understand the code, so I was just mimicking the style of the other stuff that's passed explicitly. If you did the kwargs thing, the diff might improve my understanding.

qpwo (Contributor, Author) commented Dec 5, 2024

I'm also confused about why most stuff gets explicitly tensorified/detensorified, yet the code still ran when I just didn't do that at all. Maybe it relates to torch.compile or XPUs or something?

@qpwo qpwo closed this Dec 14, 2024