
[Question]: Could you provide more examples about other attention usage, e.g., dilated1, streaming, snapkv #76

Open · gaow0007 opened this issue Sep 18, 2024 · 1 comment
Labels: question (Further information is requested)

gaow0007 commented Sep 18, 2024
Describe the issue

Hi authors,
Thanks for open-sourcing such excellent work. I am working on KV-cache compression and plan to develop new algorithms on top of your repo. Could you provide more example code for the other attention methods, for ease of comparison?

Best.

@gaow0007 gaow0007 added the question Further information is requested label Sep 18, 2024
@iofu728 iofu728 self-assigned this Sep 19, 2024
iofu728 (Contributor) commented Sep 19, 2024

Hi @gaow0007, I apologize for not providing complete documentation. You can refer to the `attn_type` parameter in https://github.com/microsoft/MInference/blob/hjiang/support_vllm_tp/experiments/infinite_bench/args.py#L67.

For example, `attn_type=dilated1`, `attn_type=dilated2`, and `attn_type=streaming` correspond to "StreamingLLM w/ dilated", "StreamingLLM w/ strided", and "StreamingLLM" in the paper, respectively.
If you need to use SnapKV, set `use_snapkv=True` and `attn_type=minference_with_dense`.
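For reference, the mapping above can be sketched as a small lookup table. This is an illustrative snippet only (the dictionary and `describe_config` helper are mine, not part of MInference); it just encodes the `attn_type` / `use_snapkv` combinations described in this comment:

```python
# Hypothetical mapping from attn_type values to the method names used in
# the paper, based on the explanation above (not an official MInference API).
ATTN_TYPE_TO_PAPER_NAME = {
    "dilated1": "StreamingLLM w/ dilated",
    "dilated2": "StreamingLLM w/ strided",
    "streaming": "StreamingLLM",
}

def describe_config(attn_type: str, use_snapkv: bool = False) -> str:
    """Return a human-readable name for an attention configuration."""
    if use_snapkv:
        # SnapKV is enabled via use_snapkv=True together with
        # attn_type=minference_with_dense.
        return f"SnapKV (attn_type={attn_type})"
    return ATTN_TYPE_TO_PAPER_NAME.get(attn_type, attn_type)
```

For example, `describe_config("dilated2")` yields `"StreamingLLM w/ strided"`, while `describe_config("minference_with_dense", use_snapkv=True)` names the SnapKV setup.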

I hope this information is helpful to you.
