Merging DiVA to Levanter Main #779

Helw150 · 2024-10-30T04:32:10Z

Cleaned up version of my code for the Distilled Voice Assistant models that I trained using a fork of Levanter!

@dlwh The main thing I want to check with you is which design pattern you think makes sense for initializing the model weights from multiple other pretrained models. What I've done here is much cleaner than what I did originally for the paper, but it still feels a bit messy.

Testing Procedure for the correctness of this training code:
I trained a new DiVA model with this updated code and Llama 3.2 1B using the config in diva_flash.yaml.

Training Log is here: https://wandb.ai/i18nlp/levanter/runs/jnxp463y?nw=nwuserheld
Resulting model is on HF in PyTorch form here: https://huggingface.co/WillHeld/DiVA-llama-3.2-1b
A demo confirming that the result is ~reasonable is here for now: https://b3f161194b514a990f.gradio.live/

dlwh · 2024-10-30T20:44:22Z

I am currently annoyed by how we initialize models and this seems fine enough (cf. #780), so I don't have a super strong feeling on it right now. You could look at how we do LoRA if you want, but that's a bit of a different case.

dlwh

Nice! Few minor comments. I don't understand everything but overall seems good to me!

dlwh · 2024-10-30T20:36:00Z

src/levanter/models/diva.py

+        lev_model: DivaModel = super().load_pretrained(
+            DivaModel, ref, config, axis_mapping, resize_vocab_to_match_tokenizer, dtype
+        )  # type: ignore[assignment]
+        llm: Union[LlamaLMHeadModel | MistralLMHeadModel | GemmaLMHeadModel] = HFCheckpointConverter(


Do you need this Union around your 3.10 union?

dlwh · 2024-10-30T20:36:45Z

src/levanter/models/diva.py

+    elif "gemma" in model_id:
+        config = GemmaConfig.from_hf_config(hf_config)
+    elif "mistral" in model_id:
+        config = MistralConfig.from_hf_config(hf_config)


should you raise a better error if it's none of these?
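One way the suggestion above could look, as a sketch only: the function name, the string return values, and the `llama` branch are assumptions for illustration (the hunk shows only the `gemma` and `mistral` branches), and the real code would return config objects rather than strings.

```python
def pick_config_family(model_id: str) -> str:
    """Return the decoder family for a model id, or fail loudly.

    Hypothetical stand-in for the if/elif chain in the diff.
    """
    known_families = ("llama", "gemma", "mistral")
    for family in known_families:
        # Same substring test as the diff: `"gemma" in model_id`, etc.
        if family in model_id:
            return family
    # Without this branch the caller would silently get an unset config.
    raise ValueError(
        f"Unsupported reference decoder {model_id!r}; "
        f"expected a model id containing one of {known_families}"
    )
```

Raising at dispatch time means a typo in the model id fails with a message naming the supported families, instead of failing later with an unrelated error.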

dlwh · 2024-10-30T20:37:34Z

src/levanter/models/diva.py

+    return config
+
+
+def get_prefix(tokenizer_ref):


doccomment maybe?

dlwh · 2024-10-30T20:37:45Z

src/levanter/models/diva.py

+    return prefix_tok, suffix_tok
+
+
+@LmConfig.register_subclass("diva")


Helw150 · 2024-11-05

Yes, good catch.

dlwh · 2024-10-30T20:38:08Z

src/levanter/models/diva.py

+    init_from_submodel: bool = True
+
+    # Connector Config
+    pre_audio_prompt = property(lambda self: get_prefix(self.reference_decoder)[0])


cached_property or who cares?

Helw150 · 2024-11-05

Definitely cached property.
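A small sketch of the difference being discussed, using `functools.cached_property` on a frozen dataclass. The class and the stub helper are hypothetical; the real `get_prefix` loads a tokenizer, and whether `cached_property` interacts cleanly with the actual config class in the PR is not verified here.

```python
from dataclasses import dataclass
from functools import cached_property

CALLS = []  # records each time the (stubbed) expensive helper runs


def get_prefix_stub(tokenizer_ref):
    """Hypothetical stand-in for get_prefix; the real one loads a tokenizer."""
    CALLS.append(tokenizer_ref)
    return f"<prefix:{tokenizer_ref}>", f"<suffix:{tokenizer_ref}>"


@dataclass(frozen=True)
class ConnectorConfigSketch:
    reference_decoder: str = "some/decoder"

    @cached_property
    def pre_audio_prompt(self):
        # Computed on first access, then stored in the instance __dict__.
        return get_prefix_stub(self.reference_decoder)[0]


cfg = ConnectorConfigSketch()
first = cfg.pre_audio_prompt
second = cfg.pre_audio_prompt  # served from the cache; the stub is not called again
```

With a plain `property(lambda ...)`, each access would call the helper again, which matters if the helper loads a tokenizer every time.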

dlwh · 2024-10-30T20:38:47Z

src/levanter/models/diva.py

+    )
+
+
+class DivaModel(eqx.Module, ModelWithHfSerializationMixin[DivaConfig]):


maybe link to paper or something?

dlwh · 2024-10-30T20:39:54Z

src/levanter/models/diva.py

+
+        # Convert to Virtual LLM Tokens
+        virt_whisper_tokens = self.connector.transformer(
+            (self.query_tokens + self.query_position_embeds).broadcast_axis(OtherAxes),


hrm i wouldn't think this should be necessary

Helw150 · 2024-11-05

I'll double check. I've somewhat forgotten why I added these explicit broadcasts.

dlwh · 2024-10-30T20:40:41Z

src/levanter/models/diva.py

+            text[
+                {
+                    "batch": hax.arange(Batch),
+                    "position": (hax.sum(text_tokens == pad_token_id, "position") * -1) - 1,


you just broke my brain
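For anyone else puzzling over the index expression above: as far as the hunk shows, it counts the pad tokens in each row and uses `-(count) - 1` as a negative index, which lands on the last non-pad position. The pure-Python sketch below mirrors that arithmetic on plain lists; it assumes right padding and a pad id that never appears inside the real content, and it is not the haliax code from the PR.

```python
PAD = 0  # hypothetical pad token id


def last_non_pad(rows, pad_token_id=PAD):
    """Pick the last non-pad element of each right-padded row."""
    picked = []
    for row in rows:
        n_pad = sum(1 for tok in row if tok == pad_token_id)
        # Same arithmetic as the diff: (count * -1) - 1
        index = (n_pad * -1) - 1
        picked.append(row[index])
    return picked


batch = [
    [11, 12, 13, PAD, PAD],  # 2 pads -> index -3 -> 13
    [21, 22, 23, 24, 25],    # 0 pads -> index -1 -> 25
    [31, PAD, PAD, PAD, PAD],  # 4 pads -> index -5 -> 31
]
result = last_non_pad(batch)
```

The `"batch": hax.arange(Batch)` entry in the original pairs each row with its own index, so one position is selected per example.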

dlwh · 2024-10-30T20:42:01Z

src/levanter/models/diva.py

+        kl_proxy_loss = hax.dot(diff_distill, diff_distill, axis="embed") ** 0.5
+
+        # Compute Contrastive Loss on Input
+        # Correct for Normal Autoregressive Loss Mask


do you want to check that attn mask is causal or eh?
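If such a check were wanted, one generic way to express it is to assert that no query position attends to a later key position. The sketch below works on an explicit boolean matrix indexed as `mask[query][key]`; that layout and the function name are assumptions, and it does not use Levanter's own mask type.

```python
def attends_only_to_past(mask):
    """Return True if no query position can see a strictly later key position."""
    n = len(mask)
    return not any(
        mask[q][k]
        for q in range(n)
        for k in range(q + 1, n)  # keys strictly in the future of query q
    )


causal = [
    [True, False, False],
    [True, True, False],
    [True, True, True],
]
full = [[True] * 3 for _ in range(3)]

causal_ok = attends_only_to_past(causal)
full_ok = attends_only_to_past(full)
```

This deliberately allows extra masking (for example padding) on top of causality, since it only rejects attention to future positions.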

Helw150 · 2024-11-05T19:25:58Z

Still investigating a few things here. I used this code to reproduce the original DiVA model with Llama 3 8B, but I'm hitting some weirdness with Llama 3.1 8B, where the resulting model produces a lot of repetitions.

Hypotheses:

Some mismatch in RoPE still?
Some issue where the 3.1 model needs multiple tokens of distillation?

dlwh · 2024-11-06T04:27:55Z

hrm happy to pair if that would be helpful. We can definitely investigate the rope thing. It's a constant pain

dlwh · 2024-11-06T04:32:55Z

it looks like rope is exactly the same for llama3 and 3.1, so it's probably not that, unless you haven't merged main in a month or two. I did fix a bug in #740.

Helw150 requested a review from dlwh October 30, 2024 04:32

dlwh approved these changes Oct 30, 2024


Helw150 force-pushed the will/diva-merge branch from 2e6ca68 to 6c9f6f0 on November 20, 2024 02:22

Helw150 added 7 commits November 21, 2024 13:24

1272412 Merging DiVA to Levanter Main
e373697 Pre-Commit Fixes
033b7a2 Need to Merge These Whisper Changes Too
cd80f6f Somehow lost this in cherry-picks
188e1a0 More Pre-Commit Fixes, I need to make this actually run pre-commit
e26412d Fix Token Shuffling
69f29b4 Pull Master

Helw150 force-pushed the will/diva-merge branch from 6c9f6f0 to 69f29b4 on November 21, 2024 18:59
