Commit 7ea9fba

conceptual image
vaibhavad committed Apr 9, 2024
1 parent 4fe5e74 commit 7ea9fba
Showing 2 changed files with 5 additions and 3 deletions.
docs/_pages/tutorial.md (8 changes: 5 additions & 3 deletions)

````diff
@@ -11,11 +11,13 @@ In this tutorial, we will transform LlaMA models into text encoders, however, tr
 
 ## 1) Enabling Bidirectional Attention
 
-TODO:add a conceptual figure here
+A decoder-only causal LLM consists of multiple decoder layers, each of which has a self-attention mechanism.
 
-<!-- mention which transformer version is used for this -->
+<p align="center">
+<img src="" width="75%" alt="Llama Conceptual overview"/>
+</p>
 
-A decoder-only causal LLM consists of multiple decoder layers, each of which has a self-attention mechanism. We start bottoms-up by first modifying the attention mechanism to be bidirectional.
+We start bottoms-up by first modifying the attention mechanism to be bidirectional.
 
 HuggingFace implements three attention mechanisms for Llama and Mistral models - Eager, SDPA, and Flash Attention. Here, we only modify the flash attention implementation. In order to be able to use the bidirectional attention, we need to create new LLaMA flash attention class:
 ```python
````
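The body of the new attention class is collapsed in this view. As a rough sketch of the idea described in the hunk above, and not necessarily the exact code from the tutorial, a bidirectional variant can subclass transformers' `LlamaFlashAttention2` and disable its causal flag (the class name here is illustrative):

```python
# Sketch only: assumes a transformers 4.x release (circa early 2024) in which
# LlamaFlashAttention2 is defined in modeling_llama and honors `is_causal`.
from transformers.models.llama.modeling_llama import LlamaFlashAttention2


class ModifiedLlamaFlashAttention2(LlamaFlashAttention2):
    """Flash attention without the causal mask, so tokens attend both ways."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The stock implementation sets is_causal = True; the flash-attention
        # forward pass consults this flag when deciding whether to apply the
        # causal mask, so clearing it yields bidirectional attention.
        self.is_causal = False
```

In those releases, one way to make the model pick this class up is to overwrite the `"flash_attention_2"` entry of `LLAMA_ATTENTION_CLASSES` in `modeling_llama` before loading the model; the tutorial's full code shows the approach it actually takes.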
Binary file added docs/assets/images/LLM2Vec-tutorial.png
