[Question]: analysis of attention scores (too sparse) #82

Open

wiluen opened this issue Oct 19, 2024 · 2 comments
Labels: question (Further information is requested)

wiluen commented Oct 19, 2024

Describe the issue

I want to ask a general question. When analyzing attention scores, I find that mine are quite sparse and their values are also very low, so I cannot extract any useful signal, such as which kinds of tokens receive more attention. Given that a model has n layers and m attention heads, how can I gain some valuable insights?
My task is to extract important information from the input I provide.
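
A minimal sketch of one way to start, assuming a Hugging Face causal LM (`gpt2` here is only a placeholder for your model): dump the per-layer, per-head attention maps, score each head's concentration via entropy, and rank which tokens receive the most attention.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder; substitute your own model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_attentions=True).eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions: tuple of n_layers tensors, each (batch, n_heads, seq, seq)
attn = torch.stack(out.attentions).squeeze(1)   # (n_layers, n_heads, seq, seq)

# Row entropy per head: low entropy = concentrated ("sparse but focused"),
# high entropy = diffuse. This separates focused heads from heads whose
# weights are just uniformly tiny.
eps = 1e-9
entropy = -(attn * (attn + eps).log()).sum(-1).mean(-1)  # (n_layers, n_heads)
print("per-(layer, head) attention entropy:\n", entropy)

# Rank tokens by total attention received, averaged over layers and heads.
received = attn.mean(dim=(0, 1)).sum(dim=0)     # (seq,)
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
for t, s in sorted(zip(tokens, received.tolist()), key=lambda x: -x[1])[:5]:
    print(f"{t:>12s}  {s:.3f}")
```

Note that under causal masking the earliest tokens (especially the first one) tend to accumulate attention, the so-called attention-sink effect, which can make everything else look diffuse and low-valued even when the head is doing useful work.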

wiluen added the question label on Oct 19, 2024

wiluen commented Oct 19, 2024

Do different attention heads / different layers matter?

iofu728 self-assigned this Oct 21, 2024

iofu728 commented Oct 21, 2024

Hi @wiluen, thanks for your question.

If I understand correctly, you're asking how to determine which parts of the attention weights are more important to preserve, especially in highly sparse scenarios.

  1. In MInference, we don’t perform fine-grained per-head adjustments; most heads use the same kernel sparsity rate. However, for certain heads we replace block sparsity with a higher-budget vertical-slash (VS) pattern, as we found that allocating more resources to these heads can significantly improve performance (see the illustrative sketch after the list below).

  2. There are several related works exploring this direction, including:

    • KV cache compression: PyramidKV, RetrievalAttention
    • Sparse Attention: RetrievalHead, DuoAttention, RazorAttention
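
Purely as an illustration (this is not MInference's actual kernel code), a vertical-slash style mask keeps a few global key columns ("verticals") plus the most recent diagonals ("slashes"); giving an important head a higher budget simply means more of each:

```python
import torch

def vs_mask(seq_len: int, n_vertical: int, n_slash: int) -> torch.Tensor:
    """Boolean causal mask: a few global key columns + recent diagonal bands."""
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    mask[:, :n_vertical] = True                   # "vertical": global keys
    for off in range(n_slash):                    # "slash": last n_slash diagonals
        idx = torch.arange(off, seq_len)
        mask[idx, idx - off] = True
    return torch.tril(mask)                       # enforce causality

# A higher budget just means more verticals/slashes for that head.
small = vs_mask(16, n_vertical=1, n_slash=2)
large = vs_mask(16, n_vertical=4, n_slash=8)
print(small.float().mean().item(), large.float().mean().item())  # kept density
```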

You can also measure head importance end-to-end: ablate the small attention weights (or whole heads) in different heads and check how much downstream quality degrades, as in the sketch below.
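
A minimal sketch of such an end-to-end ablation, again assuming a GPT-2-style Hugging Face model (the `head_mask` forward argument is supported by many, though not all, architectures): zero out one head at a time and record the perplexity increase.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
inputs = tok("The quick brown fox jumps over the lazy dog.",
             return_tensors="pt")

def perplexity(head_mask=None):
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"], head_mask=head_mask)
    return torch.exp(out.loss).item()

n_layers, n_heads = model.config.n_layer, model.config.n_head
baseline = perplexity()
delta = torch.zeros(n_layers, n_heads)

# head_mask has shape (n_layers, n_heads); 1 keeps a head, 0 ablates it.
for l in range(n_layers):
    for h in range(n_heads):
        mask = torch.ones(n_layers, n_heads)
        mask[l, h] = 0.0
        delta[l, h] = perplexity(mask) - baseline

# Heads whose ablation hurts most are the ones worth a higher sparsity budget.
for idx in torch.topk(delta.flatten(), k=5).indices:
    l, h = divmod(idx.item(), n_heads)
    print(f"layer {l:2d}, head {h:2d}: ppl +{delta[l, h]:.3f}")
```

In practice you would run this over a representative evaluation set rather than a single sentence, since head importance can vary strongly with the task.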

I hope this helps!
