Commit 7ea9fba

conceptual image
vaibhavad committed Apr 9, 2024
1 parent 4fe5e74 commit 7ea9fba
Showing 2 changed files with 5 additions and 3 deletions.
docs/_pages/tutorial.md (8 changes: 5 additions & 3 deletions)

````diff
@@ -11,11 +11,13 @@ In this tutorial, we will transform LlaMA models into text encoders, however, tr
 
 ## 1) Enabling Bidirectional Attention
 
-TODO:add a conceptual figure here
+A decoder-only causal LLM consists of multiple decoder layers, each of which has a self-attention mechanism.
 
-<!-- mention which transformer version is used for this -->
+<p align="center">
+<img src="" width="75%" alt="Llama Conceptual overview"/>
+</p>
 
-A decoder-only causal LLM consists of multiple decoder layers, each of which has a self-attention mechanism. We start bottoms-up by first modifying the attention mechanism to be bidirectional.
+We start bottoms-up by first modifying the attention mechanism to be bidirectional.
 
 HuggingFace implements three attention mechanisms for Llama and Mistral models - Eager, SDPA, and Flash Attention. Here, we only modify the flash attention implementation. In order to be able to use the bidirectional attention, we need to create new LLaMA flash attention class:
 ```python
````
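The body of the new attention class is collapsed in this view. As a rough sketch of the idea described in the hunk above, and not necessarily the exact code from the tutorial, a bidirectional variant can subclass transformers' `LlamaFlashAttention2` and disable its causal flag (the class name here is illustrative):

```python
# Sketch only: assumes a transformers 4.x release (circa early 2024) in which
# LlamaFlashAttention2 is defined in modeling_llama and honors `is_causal`.
from transformers.models.llama.modeling_llama import LlamaFlashAttention2


class ModifiedLlamaFlashAttention2(LlamaFlashAttention2):
    """Flash attention without the causal mask, so tokens attend both ways."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The stock implementation sets is_causal = True; the flash-attention
        # forward pass consults this flag when deciding whether to apply the
        # causal mask, so clearing it yields bidirectional attention.
        self.is_causal = False
```

In those releases, one way to make the model pick this class up is to overwrite the `"flash_attention_2"` entry of `LLAMA_ATTENTION_CLASSES` in `modeling_llama` before loading the model; the tutorial's full code shows the approach it actually takes.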
Binary file added docs/assets/images/LLM2Vec-tutorial.png
