My output `out.shape` is torch.Size([192, 512]) and the layer is defined with channels self.out_att(128, 512). How can I use torch.matmul here? Can you write the PyTorch code for me? The code above fails with: mat1 and mat2 shapes cannot be multiplied (192x512 and 128x512). Please write code that runs successfully. How should these lines be modified: out = torch.matmul(alpha, x).squeeze(1)  # self.out_att = self.out_att.transpose(1, 0)  and  out = F.elu(self.out_att(out))?
Time: 2024-03-03 15:53:39  Views: 21
To compute a matrix product with torch.matmul, the two matrices must be compatible: the number of columns of the first matrix must equal the number of rows of the second. From your description, out has shape (192, 512) and self.out_att has shape (128, 512), so self.out_att must be transposed so that its row count becomes 512. (This treats self.out_att as a raw weight tensor; if it is actually an nn.Linear module, the cleaner fix is to define it as nn.Linear(512, 128) so its in_features matches the input.) The modified code is:
```
out = torch.matmul(out, self.out_att.t())  # note: transpose self.out_att
out = F.elu(out)
```
Here self.out_att.t() transposes self.out_att to shape (512, 128), so the product (192, 512) @ (512, 128) is well defined and yields (192, 128). The earlier squeeze(1) on torch.matmul(alpha, x) simply removes the singleton dimension left over from the attention weighting.
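As a minimal self-contained sketch of the shape fix (using random tensors in place of your real data, and treating out_att as a raw nn.Parameter rather than an nn.Linear module, which is an assumption):

```
import torch
import torch.nn.functional as F

out = torch.randn(192, 512)                           # attention output, shape (192, 512)
out_att = torch.nn.Parameter(torch.randn(128, 512))   # weight tensor, shape (128, 512)

# (192, 512) @ (512, 128) -> (192, 128): transposing makes the inner dims match
result = F.elu(torch.matmul(out, out_att.t()))
print(result.shape)  # torch.Size([192, 128])
```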
Related question
I have an input with a.shape of (192, 512), but my fully connected linear layer is (128, 512). How do I feed this input through F.elu? How should the PyTorch code be written so it no longer raises "mat1 and mat2 shapes cannot be multiplied (192x512 and 128x512)"? Also, how should out = F.elu(self.out_att(out)) be modified?
Since the input has shape (192, 512) while the linear layer's weight matrix has shape (128, 512), the two cannot be multiplied directly; either the input must be projected to a matching dimension or the weight matrix redefined.
You can use a torch.nn.Linear module to project the input from 512 features down to 128, i.e. from (192, 512) to (192, 128), and then feed that through the subsequent layers.
The code is as follows:
```
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.linear = nn.Linear(512, 128)   # project 512 features down to 128
        self.out_att = nn.Linear(128, 1)

    def forward(self, x):
        x = self.linear(x)        # dimensionality reduction: (192, 512) -> (192, 128)
        out = F.elu(x)
        out = self.out_att(out)   # (192, 1)
        out = out.squeeze(-1)     # (192,)
        return out
```
Next, regarding `out = F.elu(self.out_att(out))`: since `self.out_att` is a fully connected linear layer whose output has shape (192, 1), that singleton dimension should be squeezed away to get (192,). The modified code:
```
out = self.out_att(out)
out = out.squeeze(-1)
out = F.elu(out)
```
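Putting the pieces together, here is a quick end-to-end shape check (a sketch with random input, restating the MyModel class from above so the snippet is self-contained):

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(512, 128)   # project 512 features down to 128
        self.out_att = nn.Linear(128, 1)

    def forward(self, x):
        out = F.elu(self.linear(x))         # (192, 128)
        out = self.out_att(out)             # (192, 1)
        return F.elu(out.squeeze(-1))       # (192,)

model = MyModel()
x = torch.randn(192, 512)
print(model(x).shape)  # torch.Size([192])
```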
Rewrite the following class using TensorFlow's layers.Layer module:

```
class SelfAttention(nn.Module):
    def __init__(self, in_c, out_c, fm_sz, pos_bias=False):
        super(SelfAttention, self).__init__()
        self.w_q = nn.Conv2d(in_channels=in_c, out_channels=out_c, kernel_size=1)
        self.w_k = nn.Conv2d(in_channels=in_c, out_channels=out_c, kernel_size=1)
        self.w_v = nn.Conv2d(in_channels=in_c, out_channels=out_c, kernel_size=1)
        self.pos_code = self.__getPosCode(fm_sz, out_c)
        self.softmax = nn.Softmax(dim=2)
        self.pos_bias = pos_bias

    def __getPosCode(self, fm_sz, out_c):
        x = []
        for i in range(fm_sz):
            x.append([np.sin, np.cos][i % 2](1 / (10000 ** (i // 2 / fm_sz))))
        x = torch.from_numpy(np.array([x])).float()
        return torch.cat([(x + x.t()).unsqueeze(0) for i in range(out_c)])

    def forward(self, x):
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        pos_code = torch.cat([self.pos_code.unsqueeze(0) for i in range(x.shape[0])]).to(x.device)
        if self.pos_bias:
            att_map = torch.matmul(q, k.permute(0, 1, 3, 2)) + pos_code
        else:
            att_map = torch.matmul(q, k.permute(0, 1, 3, 2)) + torch.matmul(q, pos_code.permute(0, 1, 3, 2))
        am_shape = att_map.shape
        att_map = self.softmax(att_map.view(am_shape[0], am_shape[1], am_shape[2] * am_shape[3])).view(am_shape)
        return att_map * v
```
Here is the code rewritten with TensorFlow's tf.keras.layers.Layer:
```
import numpy as np
import tensorflow as tf

class SelfAttention(tf.keras.layers.Layer):
    def __init__(self, in_c, out_c, fm_sz, pos_bias=False):
        super(SelfAttention, self).__init__()
        self.w_q = tf.keras.layers.Conv2D(filters=out_c, kernel_size=1)
        self.w_k = tf.keras.layers.Conv2D(filters=out_c, kernel_size=1)
        self.w_v = tf.keras.layers.Conv2D(filters=out_c, kernel_size=1)
        self.pos_code = self.__getPosCode(fm_sz, out_c)
        self.softmax = tf.keras.layers.Softmax(axis=2)
        self.pos_bias = pos_bias

    def __getPosCode(self, fm_sz, out_c):
        x = []
        for i in range(fm_sz):
            x.append([np.sin, np.cos][i % 2](1 / (10000 ** (i // 2 / fm_sz))))
        x = np.array([x], dtype=np.float32)   # shape (1, fm_sz)
        # (x + x^T) broadcasts to (fm_sz, fm_sz); stack out_c copies
        return tf.concat([tf.expand_dims(x + tf.transpose(x), 0) for i in range(out_c)], axis=0)

    def call(self, x):
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        pos_code = tf.concat([self.pos_code[None, ...] for i in range(x.shape[0])], axis=0)
        if self.pos_bias:
            att_map = tf.matmul(q, tf.transpose(k, perm=[0, 1, 3, 2])) + pos_code
        else:
            att_map = tf.matmul(q, tf.transpose(k, perm=[0, 1, 3, 2])) + tf.matmul(q, tf.transpose(pos_code, perm=[0, 1, 3, 2]))
        am_shape = tf.shape(att_map)
        att_map = self.softmax(tf.reshape(att_map, [am_shape[0], am_shape[1], am_shape[2] * am_shape[3]]))
        att_map = tf.reshape(att_map, am_shape)
        return att_map * v
```
Note that TensorFlow has no nn.Module concept; tf.keras.layers.Layer serves as the base class instead, so the class must inherit from tf.keras.layers.Layer and some method names change (forward becomes call). TensorFlow also has no unsqueeze, so tf.expand_dims (or None-indexing) is used to add singleton dimensions. Additionally, TensorFlow uses the NHWC data layout by default, and tf.transpose takes an explicit perm argument listing the desired axis order, to make sure the correct dimensions are permuted.
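The perm argument plays the same role as PyTorch's permute: it lists the new order of the axes. A small NumPy sketch of the (0, 1, 3, 2) permutation used above (NumPy stands in for TensorFlow here since the axis-ordering semantics are identical):

```
import numpy as np

# a batch of attention maps with shape (batch, channels, H, W)
a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# swap the last two axes, just like tf.transpose(k, perm=[0, 1, 3, 2])
# or k.permute(0, 1, 3, 2) in PyTorch
b = np.transpose(a, (0, 1, 3, 2))
print(b.shape)  # (2, 3, 5, 4)
```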