Wave Dec 2024 Release #250

Open · 5 of 31 tasks

harsh-nod opened this issue Nov 4, 2024 · 0 comments

harsh-nod (Contributor) commented Nov 4, 2024

This issue lists all feature requests and improvements slated for the Dec 2024 TKW release.

Flash Attention performance is the highest priority.

  • FP8 Functionality & FP16 Performance Improvement
  • IGEMM optimizations & kernel check-in to iree-kernel-benchmark
  • Broadcasting dynamic offset for paged attention
  • Gather optimization
  • Kernel Caching
  • Hoist Q
  • Packed Shuffles
  • Schedule Discovery
  • Adjusting k-width to maximize reads from shared memory and align layouts between 2 matmuls
  • Scheduling
  • Packed Shuffles
  • Implement FP8 Attention Kernel
  • Scaling of Q has to happen after Q @ K
  • Linear offset has to be added (linear offset = 1.0 / max representable number in fp format)
  • Causal mask (addition of triangular matrix 0s and -infinity)
  • Dynamic dimensions for sequence length
  • Paged Attention using vector.gathers
  • Extend Attention (split-k vs warp reduction)
  • Prefill Attention
  • Decode Attention (M = 1, with dynamic)
  • Update Paper
  • Unaligned shapes for GEMMs
  • Debugger support (add breakpoints and inspect stack on GPU)
  • Profiling support
  • Ensure that mappings modify the index sequence
  • IGEMM Performance Results
  • GEMM Non-temporal loads
  • GEMM + SiLU fusion kernel
  • MoE Kernel
  • Buffer loads to load K directly to shared memory
  • Buffer loads for masking
  • Understand scheduling + multi-buffering in tensile to be able to implement it in wave
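Two of the FP8-attention items above have simple numerics worth pinning down: the causal mask is an additive matrix with 0s on and below the diagonal and -infinity above it, and the linear offset is 1.0 divided by the format's maximum representable value (448 for e4m3, 57344 for e5m2, per the OCP FP8 spec). A minimal NumPy sketch of both, purely illustrative and not the Wave kernel code:

```python
import numpy as np

def causal_mask(rows, cols):
    """Additive causal mask: 0 on/below the diagonal, -inf above it.
    Added to the attention scores before the softmax."""
    mask = np.zeros((rows, cols), dtype=np.float32)
    mask[np.triu_indices(rows, k=1, m=cols)] = -np.inf
    return mask

# Linear offset = 1.0 / max representable number in the FP8 format.
FP8_E4M3_MAX = 448.0   # OCP FP8 e4m3 max finite value
linear_offset = 1.0 / FP8_E4M3_MAX

print(causal_mask(3, 3))
print(linear_offset)
```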

================================================

Week 1 (Nov 8th)

  • Scheduling

Week 2 (Nov 15)

Ivan

  • Adding support for using tensors from the kernel in mapping for reads and writes

Harsh

  • Create a FA page table dataset for Ivan to test his PR on
  • Create a harness for SGLang grok/llama where we can test baseline perf, add our kernels, and compare perf (with Sai)
  • Write a decode attention kernel
  • Unaligned sequence length & unaligned head dim

Stan

  • Adjusting k-width to maximize reads from shared memory and align layouts between the two matmuls
  • Scheduling a meeting with Giuseppe to show the kernel and help him iterate
  • Meeting on the 15th with the quantization team to show the FP8 kernel

Unassigned

  • Getting kernels with hipBLASLt where we can turn knobs and relate knobs to output kernels
  • Packed Shuffles
  • Dynamic & aligned attention fp16 (M & K2 not specified)
Week 3 (Nov 22)

  • Identifying which knobs represent multi-buffering and investigating a strategy for multi-buffering
  • Assembly generation
  • Which knobs map to which instructions
  • Harsh: Differences between Tensile and Tensile Lite?
  • Both: How does PGR2 fit into the big picture?
  • Wave performance numbers
  • Documentation on scheduling in Tensile

Ivan

  • PR for dynamic symbol in read/write
  • Add conv kernel to iree-kernel-benchmark
  • Contiguous IGEMM loads PR

Stan

  • Land F8 Attention
  • F8 FA performance
  • Detect the ROCm device instead of manually specifying gfx942
  • Support for scalar constants
  • Kernel caching
  • Packed Shuffles
  • I32 MFMA intrinsics

Harsh

  • Land the dynamic GEMM PR
  • Dynamic attention PR
  • Flash Decoding, but without PagedAttention (highest priority)
  • Hoist Q
  • Backward attention
Week 4 (Nov 29)

  • Performance evaluation across 5 sizes with the best tile sizes and other parameters enabled
  • Insights on whether we need LLVM inline assembly and, if so, how we could integrate it with the rest of Wave
  • Start drafting an implementation strategy for mimicking multi-buffering

Ivan

  • Performance optimizations of IGEMM

Stan

  • Ramp up on scheduling and the importance of set_prio
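The multi-buffering items above (Weeks 3 and 4, and the Tensile study) come down to rotating between buffers so the load of tile i+1 is issued before the compute on tile i consumes its buffer. A structural Python sketch of the double-buffered loop; the overlap is real only with async copies on the GPU, and `load`/`compute` here are hypothetical stand-ins:

```python
def double_buffered(tiles, load, compute, acc):
    """Rotate two buffers: prefetch tile i+1 while tile i is consumed.
    On a GPU the load would be an async shared-memory copy; here it runs
    sequentially, so only the buffer-rotation structure is shown."""
    if not tiles:
        return acc
    buf = [load(tiles[0]), None]    # prologue: fill the first buffer
    for i in range(len(tiles)):
        if i + 1 < len(tiles):
            buf[(i + 1) % 2] = load(tiles[i + 1])  # prefetch the next tile
        acc = compute(acc, buf[i % 2])             # consume the current tile
    return acc

# Toy usage: "load" doubles the tile, "compute" accumulates a sum.
result = double_buffered([1, 2, 3], lambda t: 2 * t, lambda a, x: a + x, 0)
print(result)  # 12
```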
harsh-nod changed the title from "Dec 2024 Release" to "Wave Dec 2024 Release" on Nov 4, 2024.