Enable flash attention by default #690

rsuderman · 2024-12-12T23:04:04Z

No description provided.

aviator19941 · 2024-12-13T00:03:30Z

@rsuderman Can you also add the block_seq_stride flag, defaulting to 32, to export_paged_llm_v1.py, as in https://github.com/rsuderman/sharktank/tree/configure_blocking?

EDIT: Never mind, it seems Archana already made that change in #692.

Enable flash attention by default and make block size 32

0469c25

rsuderman requested a review from aviator19941 December 12, 2024 23:04

reverted some unneeded changes

9bc64ee

rsuderman changed the title ~~Enable flash attention by default and make block size 32~~ Enable flash attention by default Dec 12, 2024

aviator19941 approved these changes Dec 12, 2024

Merge branch 'main' into flash_attention_enable

5eb365a

rsuderman merged commit 77ca02f into nod-ai:main Dec 13, 2024
8 checks passed

IanNod pushed a commit to IanNod/SHARK-Platform that referenced this pull request Dec 17, 2024

Enable flash attention by default (nod-ai#690)

860b4ef
