
VLLM Sampled Tokens #311

Open — wants to merge 2 commits into base: 0.4
Conversation

AdamBelfki3 (Collaborator)
  • Fixed the narrowing of the logits module output so that it accurately reflects the batch_groups.
with vllm_gpt2.trace(temperature=0.0, top_p=1.0, max_tokens=3) as tracer:
    with tracer.invoke(
        [
            "Madison Square Garden is located in the city of", 
            "The Eiffel Tower is located in the city of",
        ]
    ):

        logits_1 = nnsight.list().save()

        for ii in range(3):
            logits_1.append(vllm_gpt2.logits.output)
            vllm_gpt2.logits.next()

    with tracer.invoke("Rome is the capital city of"):
        logits_2 = nnsight.list().save()

        for ii in range(5):
            logits_2.append(vllm_gpt2.logits.output)
            vllm_gpt2.logits.next()

assert all(logit.shape[0] == 2 for logit in logits_1)
assert all(logit.shape[0] == 1 for logit in logits_2)

Prior to this fix, the logits output inside each invoker contained the logits for all prompts passed in the entire trace, rather than only those belonging to that invoker's batch group.
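The batch-group narrowing can be illustrated with a minimal sketch. This is a plain-Python stand-in, not the actual nnsight/vLLM internals; the (start, size) layout of batch_groups here is an assumption for illustration:

```python
# Minimal stand-in for batch-group narrowing (NOT actual nnsight internals).
# A trace batches all prompts from all invokers; each invoker should only
# see the logits rows belonging to its own prompts.

# Full batch: 3 prompts total -> one "logits row" per prompt.
full_logits = [
    [0.1, 0.9],  # invoker 1, prompt 1
    [0.8, 0.2],  # invoker 1, prompt 2
    [0.5, 0.5],  # invoker 2, prompt 1
]

# Hypothetical batch_groups: (start index, number of prompts) per invoker.
batch_groups = [(0, 2), (2, 1)]

def narrow(logits, group):
    """Return only the rows belonging to one invoker's batch group."""
    start, size = group
    return logits[start:start + size]

logits_1 = narrow(full_logits, batch_groups[0])
logits_2 = narrow(full_logits, batch_groups[1])

assert len(logits_1) == 2  # two prompts in the first invoker
assert len(logits_2) == 1  # one prompt in the second invoker
```

This mirrors the assertions in the example above: each invoker's logits have a leading dimension equal to the number of prompts it was invoked with.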

  • Added traceability for sampled tokens. vLLM provides functionality to configure how each sequence samples its next token. Here's an example of how you can trace that operation with the nnsight VLLM wrapper.
with vllm_gpt2.trace("Madison Square Garden is located in the city of", temperature=0.8, top_p=0.95, max_tokens=3) as tracer:
    samples = nnsight.list().save()
    for ii in range(3):
        samples.append(vllm_gpt2.samples.output)
        vllm_gpt2.samples.next()

print(samples)
>>> [tensor([16940]), tensor([319]), tensor([262])]
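For context, the temperature and top-p (nucleus) sampling being traced above can be sketched in plain Python. This is a simplified stand-in for the sampling step, not vLLM's implementation:

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_p=0.95, rng=None):
    """Simplified temperature + top-p (nucleus) sampling over raw logits."""
    rng = rng or random.Random(0)
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the kept tokens and draw a sample.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

token = sample_next_token([2.0, 1.0, 0.1, -1.0])
assert 0 <= token < 4
```

With temperature=0.0 and top_p=1.0 (as in the first example), sampling degenerates to greedy argmax, which is why the logits there fully determine the next token.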
