Commit

drawing attention
mlelarge committed May 30, 2023
1 parent dbe398e commit 53c69ba
Showing 3 changed files with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion modules/12-attention.md
@@ -99,7 +99,7 @@ where $W_1\in \mathbb{R}^{d\times h}$, $b_1\in \mathbb{R}^h$, $W_2\in \mathbb{R}
Each of these layers is applied to each of the inputs given to the transformer block, as depicted below:
- ![](/modules/extras/attention/dessin.jpg)
+ ![](/modules/extras/attention/transformer_block_nocode.png)
Note that this block is equivariant: if we permute the inputs, then the outputs are permuted with the same permutation. As a result, the order of the inputs is irrelevant to the transformer block; in particular, this order cannot be used to carry information about a token's position in the sequence.
The important notion of positional encoding allows us to take order into account: a deterministic, unique encoding for each time step is added to the input tokens.
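
Below is a minimal, self-contained sketch of these two points, assumed PyTorch code rather than the course's own implementation; the class name `PositionalEncoding` and all dimensions are illustrative. It implements the classic sinusoidal encoding of Vaswani et al. and numerically checks that a stock `nn.MultiheadAttention` layer is permutation-equivariant until that encoding is added.

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (Vaswani et al., 2017); illustrative, not the course code."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)           # (max_len, 1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / d_model))                       # (d_model/2,)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)  # deterministic and fixed: never trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the encoding of each time step to its token
        return x + self.pe[: x.size(1)]


# Check the equivariance claim on a stock self-attention layer (no positional encoding):
# permuting the inputs simply permutes the outputs.
torch.manual_seed(0)
d_model, seq_len = 16, 5
attn = nn.MultiheadAttention(d_model, num_heads=2, batch_first=True)
x = torch.randn(1, seq_len, d_model)
perm = torch.randperm(seq_len)

out, _ = attn(x, x, x)
out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))    # True: equivariant

# With the positional encoding added first, the permutation symmetry is broken.
pe = PositionalEncoding(d_model, max_len=seq_len)
y, y_perm = pe(x), pe(x[:, perm])
out2, _ = attn(y, y, y)
out2_perm, _ = attn(y_perm, y_perm, y_perm)
print(torch.allclose(out2[:, perm], out2_perm, atol=1e-5))  # False (in general): order now matters
```

The buffer `pe` is computed once from the position index alone, which is what makes the encoding deterministic and unique per time step.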
Binary file removed modules/extras/attention/dessin.jpg
Binary file not shown.
