As I understand it, the attention layer here uses a single set of projection weights (W_Q, W_K, W_V) shared across all channels. Shouldn't each channel use its own independent W_Q, W_K, W_V for the calculation, or should there at least be an option for that?
```python
class _MultiheadAttention(nn.Module):
    def __init__(self, d_model, n_heads, d_k=None, d_v=None, res_attention=False,
                 attn_dropout=0., proj_dropout=0., qkv_bias=True, lsa=False):
        """Multi Head Attention Layer
        Input shape:
            Q:    [batch_size (bs) x max_q_len x d_model]
            K, V: [batch_size (bs) x q_len x d_model]
            mask: [q_len x q_len]
        """
        # ...
        self.W_Q = nn.Linear(d_model, d_k * n_heads, bias=qkv_bias)
        self.W_K = nn.Linear(d_model, d_k * n_heads, bias=qkv_bias)
        self.W_V = nn.Linear(d_model, d_v * n_heads, bias=qkv_bias)

        # Scaled Dot-Product Attention (multiple heads)
        # ...
```
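For reference, here is a minimal sketch of what a per-channel option could look like. It assumes the input is laid out as `[bs x n_channels x q_len x d_model]` before attention; the class name `ChannelwiseQKV` and that layout are my own illustration, not part of this repository.

```python
import torch
import torch.nn as nn

class ChannelwiseQKV(nn.Module):
    """Hypothetical per-channel Q/K/V projection (sketch only, not from this repo).

    Instead of one nn.Linear shared by all channels, each of the n_channels
    series gets its own projection matrix.
    """
    def __init__(self, n_channels, d_model, d_k, d_v, n_heads, qkv_bias=True):
        super().__init__()
        self.n_heads, self.d_k, self.d_v = n_heads, d_k, d_v
        # One weight tensor per channel: [n_channels x d_model x (d_k * n_heads)]
        self.W_Q = nn.Parameter(torch.randn(n_channels, d_model, d_k * n_heads) * d_model ** -0.5)
        self.W_K = nn.Parameter(torch.randn(n_channels, d_model, d_k * n_heads) * d_model ** -0.5)
        self.W_V = nn.Parameter(torch.randn(n_channels, d_model, d_v * n_heads) * d_model ** -0.5)
        self.b_Q = nn.Parameter(torch.zeros(n_channels, d_k * n_heads)) if qkv_bias else None
        self.b_K = nn.Parameter(torch.zeros(n_channels, d_k * n_heads)) if qkv_bias else None
        self.b_V = nn.Parameter(torch.zeros(n_channels, d_v * n_heads)) if qkv_bias else None

    def forward(self, x):
        # x: [bs x n_channels x q_len x d_model], one token sequence per channel
        q = torch.einsum('bcld,cde->bcle', x, self.W_Q)
        k = torch.einsum('bcld,cde->bcle', x, self.W_K)
        v = torch.einsum('bcld,cde->bcle', x, self.W_V)
        if self.b_Q is not None:
            q = q + self.b_Q[None, :, None, :]
            k = k + self.b_K[None, :, None, :]
            v = v + self.b_V[None, :, None, :]
        # q, k, v can then be reshaped into heads and fed to scaled dot-product attention
        return q, k, v

# Example: batch 8, 7 channels, 42 patches, d_model=128, 16 heads of dim 8
qkv = ChannelwiseQKV(n_channels=7, d_model=128, d_k=8, d_v=8, n_heads=16)
q, k, v = qkv(torch.randn(8, 7, 42, 128))  # each: [8, 7, 42, 128]
```

The obvious trade-off is that the projection parameter count grows linearly with the number of channels, whereas the shared W_Q/W_K/W_V keeps the attention module independent of how many channels the dataset has.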