
[Question]: Could you provide more examples about other attention usage, e.g., dilated1, streaming, snapkv #76

Open · gaow0007 opened this issue Sep 18, 2024 · 1 comment
Labels: question (Further information is requested)

gaow0007 commented Sep 18, 2024
Describe the issue

Hi authors,
Thanks for open-sourcing such excellent work. I am working on KV-cache compression and plan to develop new algorithms on top of your repo. Could you provide more example code for the other attention methods, for ease of comparison?

Best.

@gaow0007 gaow0007 added the question Further information is requested label Sep 18, 2024
@iofu728 iofu728 self-assigned this Sep 19, 2024
iofu728 (Contributor) commented Sep 19, 2024

Hi @gaow0007, I apologize for not providing complete documentation. You can refer to the `attn_type` parameter in https://github.com/microsoft/MInference/blob/hjiang/support_vllm_tp/experiments/infinite_bench/args.py#L67.

For example, `attn_type=dilated1`, `attn_type=dilated2`, and `attn_type=streaming` correspond to "StreamingLLM w/ dilated", "StreamingLLM w/ strided", and "StreamingLLM" in the paper, respectively.
If you need to use SnapKV, set `use_snapkv=True` and `attn_type=minference_with_dense`.
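For reference, the mapping above can be sketched as a small lookup table. This is an illustrative snippet only (the dictionary and `describe_config` helper are mine, not part of MInference); it just encodes the `attn_type` / `use_snapkv` combinations described in this comment:

```python
# Hypothetical mapping from attn_type values to the method names used in
# the paper, based on the explanation above (not an official MInference API).
ATTN_TYPE_TO_PAPER_NAME = {
    "dilated1": "StreamingLLM w/ dilated",
    "dilated2": "StreamingLLM w/ strided",
    "streaming": "StreamingLLM",
}

def describe_config(attn_type: str, use_snapkv: bool = False) -> str:
    """Return a human-readable name for an attention configuration."""
    if use_snapkv:
        # SnapKV is enabled via use_snapkv=True together with
        # attn_type=minference_with_dense.
        return f"SnapKV (attn_type={attn_type})"
    return ATTN_TYPE_TO_PAPER_NAME.get(attn_type, attn_type)
```

For example, `describe_config("dilated2")` yields `"StreamingLLM w/ strided"`, while `describe_config("minference_with_dense", use_snapkv=True)` names the SnapKV setup.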

I hope this information is helpful to you.
