
[Question]: Question about the settings of vertical_size and slash_size in vertical_and_slash pattern #47

Open · ALUKErnel opened this issue Jul 17, 2024 · 4 comments
Labels: question (Further information is requested)

@ALUKErnel

Describe the issue

Thanks for the great work!

Based on my own implementation, I have some questions about the settings of vertical_size and slash_size. It seems that larger vertical_size and slash_size do not guarantee better performance (i.e., PPL in my experiments). Intuitively, as the vertical and slash sizes increase, more weights in the attention matrix are retained (along with the corresponding KV cache), so performance should improve. However, my experimental results sometimes contradict this. There also seems to be a trade-off between v_size and s_size; in my experiments, s_size has a larger impact on performance.

In your empirical experiments exploring the settings of v_size and s_size (e.g., (30, 800), (500, 700), (1000, 6096), ...), does performance improve as v_size and s_size increase, or is there some other specific pattern?
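
For concreteness, here is a toy sketch of my understanding of the pattern (my own illustration, not the MInference kernel): vertical_size keeps the key columns with the highest total attention, and slash_size keeps the strongest diagonals (fixed query-key offsets):

```python
import torch

def vertical_and_slash_mask(attn, vertical_size, slash_size):
    """Toy vertical_and_slash: keep the top-`vertical_size` columns and the
    top-`slash_size` diagonals of one head's (q_len, k_len) attention map
    (assumed float), zeroing everything else."""
    q_len, k_len = attn.shape
    keep = torch.zeros_like(attn, dtype=torch.bool)

    # Vertical lines: key positions with the highest total attention.
    top_cols = attn.sum(dim=0).topk(min(vertical_size, k_len)).indices
    keep[:, top_cols] = True

    # Slash lines: diagonals, i.e. fixed (key_pos - query_pos) offsets.
    offsets = torch.arange(k_len)[None, :] - torch.arange(q_len)[:, None]
    diag_mass = torch.zeros(q_len + k_len - 1).scatter_add_(
        0, (offsets + q_len - 1).flatten(), attn.flatten())
    top_diags = diag_mass.topk(min(slash_size, q_len + k_len - 1)).indices - (q_len - 1)
    for d in top_diags.tolist():
        keep |= offsets == d

    return attn * keep

# e.g., scoring only the last 64 queries against a 4k context (toy data):
attn = torch.softmax(torch.randn(64, 4096), dim=-1)
sparse = vertical_and_slash_mask(attn, vertical_size=500, slash_size=700)
```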

Looking forward to your reply!

@ALUKErnel (Author)

Supplementary details:
I conducted the experiments on Llama-2-7B with a sequence length of 4k and last_q 64 (inference). The metric is PPL on PG19. The experiments aim to explore only the impact of vertical_size and slash_size on performance (without considering efficiency for now).
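
For reference, a minimal sketch of this kind of evaluation (illustrative HuggingFace-style code, not my exact script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical reproduction of the setup above: Llama-2-7B, 4k context, PG19 text.
name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16).cuda()
tokenizer = AutoTokenizer.from_pretrained(name)

@torch.no_grad()
def ppl(text, seq_len=4096):
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :seq_len].cuda()
    # With labels == inputs, HF returns the mean next-token NLL; PPL = exp(NLL).
    # (The repo's run_ppl.py restricts scoring further, e.g. via last_q;
    # this sketch scores all positions.)
    return torch.exp(model(ids, labels=ids).loss).item()
```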

@iofu728 iofu728 self-assigned this Jul 18, 2024
iofu728 (Contributor) commented Jul 18, 2024

Hi @ALUKErnel, thanks for your great question.

  1. Generally speaking, different heads have varying sensitivity to vertical size and slash size. Some heads, such as one with the config (3500, 100), require a large vertical size rather than a large slash size (see the sketch after this list).
  2. PPL in long-context scenarios is not an effective indicator: it is dominated by the local window and almost exclusively reflects local-window modeling. This is why the StreamingLLM method shows such good results in PPL tests. For downstream tasks, I would recommend KV retrieval or Needle In A Haystack (though simple, it can reflect capability across different context windows and depths).
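
As a rough illustration of point 1 (a toy proxy, not MInference's actual offline pattern search), you can rank candidate (v_size, s_size) pairs per head by how much attention mass they retain:

```python
import torch

def recovered_mass(attn, v_size, s_size):
    # Toy proxy: attention mass in the top-`v_size` columns plus the
    # top-`s_size` diagonals (overlap is double-counted, so this is only
    # a rough ranking signal, not exact coverage).
    q_len, k_len = attn.shape
    col_mass = attn.sum(dim=0).topk(min(v_size, k_len)).values.sum()
    offsets = torch.arange(k_len)[None, :] - torch.arange(q_len)[:, None] + q_len - 1
    diag_mass = torch.zeros(q_len + k_len - 1).scatter_add_(
        0, offsets.flatten(), attn.flatten())
    slash_mass = diag_mass.topk(min(s_size, q_len + k_len - 1)).values.sum()
    return ((col_mass + slash_mass) / attn.sum()).item()

# Per head, pick the config that retains the most mass:
search_space = [(30, 800), (500, 700), (1000, 4096), (3500, 100)]
attn = torch.softmax(torch.randn(64, 4096), dim=-1)  # one head, toy data
best = max(search_space, key=lambda vs: recovered_mass(attn, *vs))
```

A head whose mass concentrates in a few columns will prefer configs like (3500, 100), while a locality-dominated head will prefer large slash sizes; this is why a single global (v_size, s_size) setting does not improve monotonically.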

@ALUKErnel (Author)

Thanks for your response! :) I am also wondering whether the y-axis in Figure 5 represents the log of perplexity (i.e., e^{8-10}) or the actual perplexity values (i.e., 8-10)?
[Screenshot of Figure 5 attached]

iofu728 (Contributor) commented Jul 19, 2024

Hi @ALUKErnel, the PPL results are after exponentiation, i.e., the y-axis shows actual perplexity values. You can refer to this code: https://github.com/microsoft/MInference/blob/main/experiments/ppl/run_ppl.py#L138.
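
In other words, perplexity here is exp of the mean per-token negative log-likelihood, so values around 8-10 are already perplexities, not log values. A toy illustration:

```python
import torch

nll = torch.tensor([2.1, 2.3, 2.2])  # toy per-token negative log-likelihoods
ppl = torch.exp(nll.mean())          # ~9.0: an actual perplexity, not a log value
```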
