Commit

drawing attention
mlelarge committed May 30, 2023
1 parent dbe398e commit 53c69ba
Showing 3 changed files with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion modules/12-attention.md
@@ -99,7 +99,7 @@ where $W_1\in \mathbb{R}^{d\times h}$, $b_1\in \mathbb{R}^h$, $W_2\in \mathbb{R}
Each of these layers is applied to each of the inputs given to the transformer block, as depicted below:
- ![](/modules/extras/attention/dessin.jpg)
+ ![](/modules/extras/attention/transformer_block_nocode.png)
Note that this block is equivariant: if we permute the inputs, then the outputs are permuted with the same permutation. As a result, the order of the inputs is irrelevant to the transformer block; in particular, this order cannot be used to carry information about a token's position in the sequence.
The important notion of positional encoding allows us to take order into account: a deterministic, unique encoding for each time step is added to the input tokens.
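
Below is a minimal, self-contained sketch of these two points, assumed PyTorch code rather than the course's own implementation; the class name `PositionalEncoding` and all dimensions are illustrative. It implements the classic sinusoidal encoding of Vaswani et al. and numerically checks that a stock `nn.MultiheadAttention` layer is permutation-equivariant until that encoding is added.

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (Vaswani et al., 2017); illustrative, not the course code."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)           # (max_len, 1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / d_model))                       # (d_model/2,)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)  # deterministic and fixed: never trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the encoding of each time step to its token
        return x + self.pe[: x.size(1)]


# Check the equivariance claim on a stock self-attention layer (no positional encoding):
# permuting the inputs simply permutes the outputs.
torch.manual_seed(0)
d_model, seq_len = 16, 5
attn = nn.MultiheadAttention(d_model, num_heads=2, batch_first=True)
x = torch.randn(1, seq_len, d_model)
perm = torch.randperm(seq_len)

out, _ = attn(x, x, x)
out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))    # True: equivariant

# With the positional encoding added first, the permutation symmetry is broken.
pe = PositionalEncoding(d_model, max_len=seq_len)
y, y_perm = pe(x), pe(x[:, perm])
out2, _ = attn(y, y, y)
out2_perm, _ = attn(y_perm, y_perm, y_perm)
print(torch.allclose(out2[:, perm], out2_perm, atol=1e-5))  # False (in general): order now matters
```

The buffer `pe` is computed once from the position index alone, which is what makes the encoding deterministic and unique per time step.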
Binary file removed modules/extras/attention/dessin.jpg
Binary file not shown.
