Hello,

The background is that, due to a limitation of the computation platform I am using, where the softmax operator is very time-consuming, I am trying to replace the regular attention modules with softmax-free attention modules.
I have one question about the structure of SOFT. The core of the softmax-free attention module runs like this:
def forward(self, X, H, W):
    Q = self.split_heads(self.W_q(X))        # queries projected from X
    V = self.split_heads(self.W_v(X))        # values projected from the same X
    attn_out = self.attn(Q, V, H, W)         # softmax-free attention core
    attn_out = self.combine_heads(attn_out)
    out = self.ff(attn_out)
    return out
Since both Q and V are generated from X, does that mean this attention module is essentially a self-attention module, rather than a cross-attention module where Q, K, and V come from different domains? If that is the case, do you have any suggestions on how to replace a regular cross-attention module with softmax-free attention? A rough sketch of the kind of substitution I have in mind is given below. Thanks.
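To make the question concrete, here is a minimal, hypothetical sketch of the wiring I am imagining. The class name SoftmaxFreeCrossAttention, the attn_core argument, and the head-reshaping helpers are my own placeholders and are not taken from the SOFT code; attn_core stands in for the existing softmax-free attention core, and the only change is that V is projected from a second input Y instead of X.

import torch
import torch.nn as nn

class SoftmaxFreeCrossAttention(nn.Module):
    # Hypothetical sketch (not from the SOFT repo): queries come from X,
    # values come from a separate context sequence Y, and `attn_core` is
    # assumed to be an existing softmax-free attention callable with the
    # same signature attn_core(Q, V, H, W) as in the snippet above.
    def __init__(self, dim, num_heads, attn_core):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.W_q = nn.Linear(dim, dim)   # query projection (target domain X)
        self.W_v = nn.Linear(dim, dim)   # value projection (source domain Y)
        self.attn = attn_core            # assumed softmax-free attention core
        self.ff = nn.Linear(dim, dim)

    def split_heads(self, t):
        # (B, N, C) -> (B, num_heads, N, C // num_heads)
        B, N, _ = t.shape
        return t.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)

    def combine_heads(self, t):
        # (B, num_heads, N, C // num_heads) -> (B, N, C)
        B, h, N, d = t.shape
        return t.transpose(1, 2).reshape(B, N, h * d)

    def forward(self, X, Y, H, W):
        Q = self.split_heads(self.W_q(X))   # queries from X
        V = self.split_heads(self.W_v(Y))   # values from the other domain Y
        attn_out = self.attn(Q, V, H, W)    # softmax-free attention core
        attn_out = self.combine_heads(attn_out)
        return self.ff(attn_out)

My uncertainty is whether the softmax-free kernel is still well-defined when Q and V come from different sequences, possibly of different lengths, since in the snippet above there is no separate K projection.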
Best,
Chenxi