Add prefill/decode from seq lens in BaseCausalLMModel #383

sogartar · 2024-10-30T20:59:30Z

We do not have a clearly defined interface for LMs. Decode and prefill have different signatures when exporting to IREE: one of them takes an attention mask, the other sequence lengths.

This change adds to BaseCausalLMModel default implementations for the new prefill_from_seq_lens and decode_from_seq_lens methods.
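To illustrate the idea of a default `*_from_seq_lens` implementation, here is a minimal sketch that builds the mask from sequence lengths and delegates to the mask-based method. It is not the sharktank code: `SketchCausalLM`, `CountingLM`, `mask_from_seq_lens`, the keyword names, and the mask convention (`True` marks a padded position) are all assumptions, and plain Python lists stand in for tensors so the example runs without any dependencies.

```python
from typing import List

Mask = List[List[bool]]


def mask_from_seq_lens(seq_lens: List[int], max_len: int) -> Mask:
    """Build a per-row padding mask; True marks a position to ignore (assumed convention)."""
    return [[pos >= n for pos in range(max_len)] for n in seq_lens]


class SketchCausalLM:
    """Hypothetical stand-in for BaseCausalLMModel, not the real class."""

    def prefill(self, tokens: List[List[int]], *, attention_mask: Mask):
        # Mask-based variant; concrete models override this.
        raise NotImplementedError

    def prefill_from_seq_lens(self, tokens: List[List[int]], *, seq_lens: List[int]):
        # Default implementation: derive the mask, then delegate.
        max_len = len(tokens[0])
        mask = mask_from_seq_lens(seq_lens, max_len)
        return self.prefill(tokens, attention_mask=mask)


class CountingLM(SketchCausalLM):
    """Toy model that returns the number of unmasked tokens per row."""

    def prefill(self, tokens: List[List[int]], *, attention_mask: Mask):
        return [sum(1 for masked in row if not masked) for row in attention_mask]
```

With this shape, a caller that only knows the sequence lengths never has to construct a mask itself, which is what would let an export wrapper stay thin.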

The export script export_paged_llm_v1 does too much in its exported functions. It computes the attention mask, then shards its arguments and unshards its result. This change lets it be a thinner wrapper around the new functions.

Make paged_llm_v1.TorchGenerator use the new interface methods.

We do not have a clearly defined interface for LMs. Decode and prefill have different signatures when exporting to IREE. This adds a new ABC, CausalLMModelABC, that makes a distinction between the two variants. The BaseCausalLMModel provides a default implementation for the new prefill_from_seq_lens and decode_from_seq_lens methods. The export script export_paged_llm_v1 does too much in its exported functions. It computes the attention mask, then shards its arguments and unshards its result. This change lets it be a thinner wrapper around the new interface functions. Make paged_llm_v1.TorchGenerator use the new interface methods.

rsuderman · 2024-11-06T00:48:59Z

sharktank/sharktank/models/llama/llama.py

@@ -27,7 +27,7 @@
 ################################################################################


-class PagedLlamaModelV1(BaseCausalLMModel):
+class PagedLlamaModelV1(BaseCausalLMModel, CausalLMModelABC):


Double inheritance is almost always a bad idea, especially considering these are both LLM interfaces. Can we have an alternative option? Why can't we expand this functionality in the existing BaseCausalLMModel?

I removed CausalLMModelABC. I added the prefill and decode unimplemented methods to BaseCausalLMModel.
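As a rough illustration of the resolution described above, the sketch below keeps a single base class whose `prefill` and `decode` are unimplemented by default, so a model inherits from one class only. The class names `SketchBaseCausalLM` and `SketchPagedLlama`, the signatures, and the toy return value are hypothetical and do not reflect the actual sharktank definitions.

```python
class SketchBaseCausalLM:
    """Hypothetical single base class; interface methods are unimplemented by default."""

    def prefill(self, tokens):
        raise NotImplementedError(
            f"{type(self).__name__} does not implement prefill"
        )

    def decode(self, tokens):
        raise NotImplementedError(
            f"{type(self).__name__} does not implement decode"
        )


class SketchPagedLlama(SketchBaseCausalLM):
    """Toy model with one base class, overriding only what it supports."""

    def prefill(self, tokens):
        # Placeholder behavior for the sketch: report how many tokens were seen.
        return len(tokens)
```

Calling a method the subclass did not override raises `NotImplementedError`, so the interface stays visible on the base class without a second parent.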

sogartar requested a review from rsuderman October 30, 2024 20:59

This was referenced Oct 30, 2024

Introduce CausalLMModel intefrace and add IREE numerics test for Llama 3.1 8B FP16 TP8 #375

Closed

Add class that implements BaseCausalLMModel but is backed by an IREE module #393

Open

Add IREE numerics test for Llama 3.1 8B FP16 TP8 #394

Open

rsuderman requested changes Nov 6, 2024


Remove CausalLMModelABC

d0a9466

sogartar changed the title ~~Introduce CausalLMModelABC interface~~ Add prefill/decode from seq lens in BaseCausalLMModel Nov 7, 2024

sogartar requested a review from rsuderman November 7, 2024 13:15
