Hello,

The background is that, due to a limitation of the computation platform I am using, where the softmax operator is very time-consuming, I am trying to replace the regular attention modules with softmax-free attention modules.
I have one question about the structure of SOFT. The core of the softmax-free attention module runs like this:
def forward(self, X, H, W):
    Q = self.split_heads(self.W_q(X))        # queries projected from X
    V = self.split_heads(self.W_v(X))        # values projected from the same X
    attn_out = self.attn(Q, V, H, W)         # softmax-free attention core
    attn_out = self.combine_heads(attn_out)
    out = self.ff(attn_out)
    return out
Since both Q and V are generated from X, does that mean this attention module is essentially a self-attention module, rather than a cross-attention module where Q, K, and V come from different domains? If that is the case, do you have any suggestions on how to replace a regular cross-attention module with softmax-free attention? A rough sketch of the kind of substitution I have in mind is given below. Thanks.
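To make the question concrete, here is a minimal, hypothetical sketch of the wiring I am imagining. The class name SoftmaxFreeCrossAttention, the attn_core argument, and the head-reshaping helpers are my own placeholders and are not taken from the SOFT code; attn_core stands in for the existing softmax-free attention core, and the only change is that V is projected from a second input Y instead of X.

import torch
import torch.nn as nn

class SoftmaxFreeCrossAttention(nn.Module):
    # Hypothetical sketch (not from the SOFT repo): queries come from X,
    # values come from a separate context sequence Y, and `attn_core` is
    # assumed to be an existing softmax-free attention callable with the
    # same signature attn_core(Q, V, H, W) as in the snippet above.
    def __init__(self, dim, num_heads, attn_core):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.W_q = nn.Linear(dim, dim)   # query projection (target domain X)
        self.W_v = nn.Linear(dim, dim)   # value projection (source domain Y)
        self.attn = attn_core            # assumed softmax-free attention core
        self.ff = nn.Linear(dim, dim)

    def split_heads(self, t):
        # (B, N, C) -> (B, num_heads, N, C // num_heads)
        B, N, _ = t.shape
        return t.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)

    def combine_heads(self, t):
        # (B, num_heads, N, C // num_heads) -> (B, N, C)
        B, h, N, d = t.shape
        return t.transpose(1, 2).reshape(B, N, h * d)

    def forward(self, X, Y, H, W):
        Q = self.split_heads(self.W_q(X))   # queries from X
        V = self.split_heads(self.W_v(Y))   # values from the other domain Y
        attn_out = self.attn(Q, V, H, W)    # softmax-free attention core
        attn_out = self.combine_heads(attn_out)
        return self.ff(attn_out)

My uncertainty is whether the softmax-free kernel is still well-defined when Q and V come from different sequences, possibly of different lengths, since in the snippet above there is no separate K projection.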
Best,
Chenxi