[Question]: analysis of attention scores (too sparse) #82

Open

wiluen opened this issue Oct 19, 2024 · 2 comments
Labels: question (Further information is requested)

wiluen commented Oct 19, 2024

Describe the issue

I want to ask a general question. When analyzing attention scores, I find that mine are quite sparse and their values are also very low, so I cannot extract any useful signal, such as which kinds of tokens receive more attention. Given that a model has n layers and m attention heads, how can I gain some valuable insights?
My task is to extract important information from the input I provide.
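
A minimal sketch of one way to start, assuming a Hugging Face causal LM (`gpt2` here is only a placeholder for your model): dump the per-layer, per-head attention maps, score each head's concentration via entropy, and rank which tokens receive the most attention.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder; substitute your own model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_attentions=True).eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions: tuple of n_layers tensors, each (batch, n_heads, seq, seq)
attn = torch.stack(out.attentions).squeeze(1)   # (n_layers, n_heads, seq, seq)

# Row entropy per head: low entropy = concentrated ("sparse but focused"),
# high entropy = diffuse. This separates focused heads from heads whose
# weights are just uniformly tiny.
eps = 1e-9
entropy = -(attn * (attn + eps).log()).sum(-1).mean(-1)  # (n_layers, n_heads)
print("per-(layer, head) attention entropy:\n", entropy)

# Rank tokens by total attention received, averaged over layers and heads.
received = attn.mean(dim=(0, 1)).sum(dim=0)     # (seq,)
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
for t, s in sorted(zip(tokens, received.tolist()), key=lambda x: -x[1])[:5]:
    print(f"{t:>12s}  {s:.3f}")
```

Note that under causal masking the earliest tokens (especially the first one) tend to accumulate attention, the so-called attention-sink effect, which can make everything else look diffuse and low-valued even when the head is doing useful work.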

wiluen added the question label on Oct 19, 2024

wiluen commented Oct 19, 2024

Do different attention heads / different layers matter?

iofu728 self-assigned this Oct 21, 2024

iofu728 commented Oct 21, 2024

Hi @wiluen, thanks for your question.

If I understand correctly, you're asking how to determine which parts of the attention weights are more important to preserve, especially in highly sparse scenarios.

  1. In MInference, we don’t perform fine-grained per-head adjustments; most heads use the same kernel sparsity rate. However, for certain heads we replace block sparsity with a higher-budget vertical-slash (VS) pattern, as we found that allocating more resources to these heads can significantly improve performance (see the illustrative sketch after the list below).

  2. There are several related works exploring this direction, including:

    • KV cache compression: PyramidKV, RetrievalAttention
    • Sparse Attention: RetrievalHead, DuoAttention, RazorAttention
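
Purely as an illustration (this is not MInference's actual kernel code), a vertical-slash style mask keeps a few global key columns ("verticals") plus the most recent diagonals ("slashes"); giving an important head a higher budget simply means more of each:

```python
import torch

def vs_mask(seq_len: int, n_vertical: int, n_slash: int) -> torch.Tensor:
    """Boolean causal mask: a few global key columns + recent diagonal bands."""
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    mask[:, :n_vertical] = True                   # "vertical": global keys
    for off in range(n_slash):                    # "slash": last n_slash diagonals
        idx = torch.arange(off, seq_len)
        mask[idx, idx - off] = True
    return torch.tril(mask)                       # enforce causality

# A higher budget just means more verticals/slashes for that head.
small = vs_mask(16, n_vertical=1, n_slash=2)
large = vs_mask(16, n_vertical=4, n_slash=8)
print(small.float().mean().item(), large.float().mean().item())  # kept density
```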

You can also measure head importance end-to-end: ablate the small attention weights (or whole heads) in different heads and check how much downstream quality degrades, as in the sketch below.
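
A minimal sketch of such an end-to-end ablation, again assuming a GPT-2-style Hugging Face model (the `head_mask` forward argument is supported by many, though not all, architectures): zero out one head at a time and record the perplexity increase.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
inputs = tok("The quick brown fox jumps over the lazy dog.",
             return_tensors="pt")

def perplexity(head_mask=None):
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"], head_mask=head_mask)
    return torch.exp(out.loss).item()

n_layers, n_heads = model.config.n_layer, model.config.n_head
baseline = perplexity()
delta = torch.zeros(n_layers, n_heads)

# head_mask has shape (n_layers, n_heads); 1 keeps a head, 0 ablates it.
for l in range(n_layers):
    for h in range(n_heads):
        mask = torch.ones(n_layers, n_heads)
        mask[l, h] = 0.0
        delta[l, h] = perplexity(mask) - baseline

# Heads whose ablation hurts most are the ones worth a higher sparsity budget.
for idx in torch.topk(delta.flatten(), k=5).indices:
    l, h = divmod(idx.item(), n_heads)
    print(f"layer {l:2d}, head {h:2d}: ppl +{delta[l, h]:.3f}")
```

In practice you would run this over a representative evaluation set rather than a single sentence, since head importance can vary strongly with the task.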

I hope this helps!
