Describe the issue

Thanks for the great work!

Based on my own implementation, I have some questions about the settings of vertical_size and slash_size. Larger values of vertical_size and slash_size do not reliably give better performance (i.e., PPL in my experiments). Intuitively, as the vertical and slash sizes increase, more weights of the attention matrix are preserved (along with the corresponding KV cache), so performance should improve; however, my experimental results sometimes contradict this. There also seems to be a trade-off between v_size and s_size; in my experiments, s_size has the larger impact on performance.

In your empirical experiments exploring different (v_size, s_size) settings (e.g., (30, 800), (500, 700), (1000, 6096), ...), does performance improve as v_size and s_size increase, or is there some other pattern?
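For concreteness, here is a minimal sketch of the vertical-slash mask construction I have in mind (a dense boolean-mask reconstruction for illustration only, not the optimized sparse kernel; attn, last_q, and the scoring heuristic are my assumptions):

```python
import torch

def vertical_slash_mask(attn: torch.Tensor, v_size: int, s_size: int,
                        last_q: int = 64) -> torch.Tensor:
    """attn: (seq_len, seq_len) post-softmax attention map for one head."""
    seq_len = attn.size(0)
    # Score columns (vertical lines) using only the last `last_q` queries.
    v_score = attn[-last_q:].sum(dim=0)                          # (seq_len,)
    # Score diagonals (slash lines) the same way; offset d = 0 is the main
    # diagonal, d > 0 are the diagonals below it.
    s_score = torch.stack(
        [attn.diagonal(-d)[-last_q:].sum() for d in range(seq_len)]
    )
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    mask[:, v_score.topk(min(v_size, seq_len)).indices] = True   # verticals
    for d in s_score.topk(min(s_size, seq_len)).indices.tolist():
        mask.diagonal(-d).fill_(True)                            # slashes
    return mask & torch.ones_like(mask).tril()                   # keep causal
```

Under a fixed total budget, shifting capacity between v_size and s_size changes which structure of the attention map is preserved, which is why I would expect some trade-off rather than monotonic gains.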
Looking forward to your reply!
Supplementary details: I ran the experiments on llama2-7B with a sequence length of 4K and last_q = 64 at inference time. The metric is PPL on PG19. These experiments only explore the impact of vertical_size and slash_size on performance, without considering efficiency for now.
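As a rough sketch of this evaluation setup (illustrative only; assumes HuggingFace transformers, and the model name and loop structure are assumptions rather than my exact script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # llama2-7B, as above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda").eval()

seq_len, last_q = 4096, 64                # score only the last 64 positions

@torch.no_grad()
def window_nll(input_ids: torch.Tensor) -> torch.Tensor:
    """Per-token NLL of the last `last_q` tokens of one (1, seq_len) window."""
    logits = model(input_ids.to(model.device)).logits
    # Shift so that logits[t] predicts input_ids[t + 1].
    logits, targets = logits[:, :-1], input_ids[:, 1:].to(model.device)
    return torch.nn.functional.cross_entropy(
        logits[:, -last_q:].transpose(1, 2),   # (1, vocab, last_q)
        targets[:, -last_q:],                  # (1, last_q)
        reduction="none",
    ).flatten()

# PPL = exp(mean NLL over all evaluated 4K windows of the PG19 stream).
```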
Generally speaking, different heads have varying sensitivity to vertical size and slash size. Some heads, such as the one with the config (3500, 100), require a larger vertical size rather than a larger slash size.
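For illustration, a per-head budget table might look like the sketch below; apart from the (3500, 100) head mentioned above, the numbers are hypothetical placeholders, not measured configs:

```python
# Per-head (vertical_size, slash_size) budgets under a shared total budget.
head_budgets = {
    ("layer_0", "head_5"):  (3500, 100),   # vertical-dominated head (as above)
    ("layer_0", "head_12"): (100, 3500),   # slash-dominated (local) head
    ("layer_1", "head_3"):  (1000, 2600),  # mixed head
}
```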
Second, PPL is not an effective indicator in long-context scenarios: it is dominated by, and almost exclusively reflects, the local window. This is why the StreamingLLM method scores so well on PPL tests. For downstream tasks, I would recommend KV retrieval or Needle In A Haystack (simple as it is, it reflects capability across different context lengths and depths).
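A minimal Needle-In-A-Haystack probe can be sketched as follows (my own illustration, not the official harness; the needle text and filler are arbitrary):

```python
def build_haystack(filler: str, needle: str, depth: float, n_chars: int) -> str:
    """Insert `needle` at relative `depth` (0 = start, 1 = end) of the context."""
    haystack = (filler * (n_chars // len(filler) + 1))[:n_chars]
    pos = int(depth * n_chars)
    return haystack[:pos] + " " + needle + " " + haystack[pos:]

needle = "The secret passphrase is 'blue-giraffe-42'."
prompt = (
    build_haystack("The grass is green. ", needle, depth=0.5, n_chars=12_000)
    + "\n\nQuestion: What is the secret passphrase?\nAnswer:"
)
# Scoring: sweep (context length, depth) pairs and check whether the
# generation contains "blue-giraffe-42" at each point of the grid.
```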
Thanks for your response! :) I am also wondering whether the y-axis in Figure 5 shows the log of perplexity (so actual PPL would be e^8 to e^10) or the raw perplexity values (8 to 10)?