
Eager/SDPA attention has lower results compared to Flash Attention in SimCSE stage #144

Open
ThonyPan opened this issue Sep 9, 2024 · 2 comments

Comments


ThonyPan commented Sep 9, 2024

Hi @vaibhavad,

I tried to reproduce the SimCSE stage of the framework. When using flash attention, the results are as good as reported. However, when training with eager or SDPA attention, the results drop significantly. What might be the reason?
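For context, switching between these implementations is typically done through the `attn_implementation` argument when loading the model with Hugging Face transformers; a minimal sketch (the model name below is just a placeholder, not my exact config):

```python
import torch
from transformers import AutoModelForCausalLM

# Choose the attention backend: "flash_attention_2", "sdpa", or "eager".
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # placeholder base model, not from this issue
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
)
```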

In your code, you raise a warning that says "LLM2Vec models were trained with flash attention enabled. For optimal performance, please install the flash_attn package with `pip install flash-attn --no-build-isolation`." Does that mean that if I also train the MNTP stage with eager/SDPA attention, the performance would be on par with flash attention?

Thank you!

@vaibhavad
Collaborator

Hi @ThonyPan,

Unfortunately, we have not run experiments comparing different attention implementations, so I cannot comment on the performance differences. We chose flash attention because it is the fastest, and latency is crucial for both training and inference.

@TianBaoGe

Hi @ThonyPan,

Did you manage to reproduce the MNTP+SimCSE results? I have successfully reproduced the Sheared-LLaMA-1.3B SimCSE results, but my MNTP+SimCSE results are consistently lower than those reported in the paper. Could you share your training details?
