Hi @vaibhavad,

I tried to reproduce the SimCSE stage of the framework. With flash attention, the results are as good as reported. However, when training with eager or SDPA attention, the results drop significantly. What might be the potential reason?

In your code, the error message says: "LLM2Vec models were trained with flash attention enabled. For optimal performance, please install the flash_attn package with pip install flash-attn --no-build-isolation." Does that mean that if I also train the MNTP stage with eager/SDPA attention, the performance would be on par with flash attention?
Thank you!
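For reference, a minimal sketch of how the attention implementation can be switched when loading a model with Hugging Face Transformers (assuming a recent transformers version with the attn_implementation argument; the checkpoint name is only illustrative, not necessarily the one used in the experiments):

```python
import torch
from transformers import AutoModelForCausalLM

# Minimal sketch: selecting the attention implementation at load time.
# "flash_attention_2" requires the flash-attn package and fp16/bf16 weights;
# "sdpa" and "eager" work without it.
model = AutoModelForCausalLM.from_pretrained(
    "princeton-nlp/Sheared-LLaMA-1.3B",       # illustrative checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # alternatives: "sdpa", "eager"
)
```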
Unfortunately we have not run experiments comparing different attention implementations, so I cannot say anything about performance differences. We chose flash attention as it is the fastest, and latency is crucial for both training and inference.
Did you manage to reproduce the MNTP+SimCSE results? I have successfully reproduced the Sheared-LLaMA-1.3B SimCSE results, but my MNTP+SimCSE results are consistently lower than those reported in the paper. Could you share your training details?