Tackling overfitting in deep learning: the principle of dropout from a bagging perspective

In deep learning, the article "Dropout理解1" ("Understanding Dropout, Part 1") examines the dropout technique and its relationship to bagging, focusing on its role in preventing overfitting. Dropout injects randomness into neural-network training: with a given probability, neurons are temporarily "switched off" (dropped), so every mini-batch effectively trains a slightly different sub-network. The idea is rooted in bagging (bootstrap aggregating), which builds several models, each trained on a different subset of the data, and aggregates their predictions.

1. The connection between dropout and bagging:
- Dropout can be viewed as a special bagging-like strategy: both exploit model diversity to reduce generalization error. Bagging trains several independent classifiers and averages their outputs, whereas dropout randomly suppresses neurons during training, producing a dynamic collection of sub-networks.
- In bagging the classifiers are fully independent and each has its own training set; in dropout the sub-models share parameters, but the network structure changes on every forward pass.
- Dropout performs the ensembling implicitly: unlike bagging, it never trains each member model explicitly; instead, every iteration updates one randomly sampled sub-network.

2. How dropout works:
- Dropout prevents overfitting because randomly discarding neurons keeps the model from relying too heavily on any single feature, which improves generalization. It is akin to randomly hiding a subset of the features during training, forcing the model to learn more robust representations (a minimal code sketch follows at the end of this section).
- The paper "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" [1] demonstrated its effectiveness, showing that for large neural networks, which face both long training times and overfitting, dropout offers a simple and effective remedy.

3. Dropout, model complexity, and sparsity:
- By introducing sparsity, dropout keeps the effective complexity of the network low during training, which helps reduce the risk of overfitting. This resembles regularization, but dropout acts dynamically rather than imposing a hard constraint at the parameter level.

In short, dropout is a powerful deep-learning tool: by simulating model ensembling, strengthening generalization, and introducing sparsity into the network, it effectively counters overfitting. Understanding the ensemble ("combination") view behind it helps us apply dropout more effectively when tuning deep-learning models.
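To make the mechanism above concrete, here is a minimal PyTorch sketch of (inverted) dropout; the module name, layer sizes, and drop probability are illustrative choices, not taken from the original article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDropoutNet(nn.Module):
    """Illustrative MLP showing where dropout sits in a network."""
    def __init__(self, in_dim=20, hidden=64, out_dim=2, p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.drop = nn.Dropout(p)   # each training forward pass samples a different sub-network
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = F.relu(self.fc1(x))
        h = self.drop(h)            # train: random mask + rescale by 1/(1-p); eval: identity
        return self.fc2(h)

def inverted_dropout(h, p=0.5, training=True):
    """The same idea written out by hand."""
    if not training or p == 0.0:
        return h                                 # test time: keep all units, no extra scaling
    mask = (torch.rand_like(h) > p).float()      # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)                  # rescale so the expected activation is unchanged

x = torch.randn(4, 20)
net = TinyDropoutNet()
net.train()
y_stochastic = net(x)     # a randomly thinned sub-network, as in the bagging analogy
net.eval()
y_deterministic = net(x)  # full network: weight-scaling approximation of the ensemble average
```

Repeated calls in train() mode give different outputs, which is exactly the "dynamic collection of sub-networks" the bagging analogy refers to; eval() mode approximates averaging over that collection with a single deterministic pass.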

```python
# Assumed imports: the snippet relies on PyTorch, NumPy, and a VMD implementation
# such as the one in the vmdpy package (pip install vmdpy).
import numpy as np
import torch
import torch.nn as nn
from vmdpy import VMD


class ResidualBlock(nn.Module):
    """Dilated 1-D convolutional residual block with a sigmoid attention gate."""

    def __init__(self, in_channels, out_channels, dilation):
        super(ResidualBlock, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, out_channels, kernel_size=3,
                      padding=dilation, dilation=dilation),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(),
            nn.Conv1d(out_channels, out_channels, kernel_size=3,
                      padding=dilation, dilation=dilation),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(),
        )
        self.attention = nn.Sequential(
            nn.Conv1d(out_channels, out_channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # 1x1 convolution to match channel counts on the skip connection.
        self.downsample = (nn.Conv1d(in_channels, out_channels, kernel_size=1)
                           if in_channels != out_channels else None)

    def forward(self, x):
        residual = x
        out = self.conv(x)
        attention = self.attention(out)
        out = out * attention
        if self.downsample is not None:
            residual = self.downsample(residual)
        out += residual
        return out


class VMD_TCN(nn.Module):
    """TCN-style model that feeds VMD intrinsic mode functions (IMFs) and the raw
    signal through shared convolutional layers, then fuses two branch outputs."""

    def __init__(self, input_size, output_size, n_k=1, num_channels=16, dropout=0.2):
        super(VMD_TCN, self).__init__()
        self.input_size = input_size
        self.nk = n_k
        if isinstance(num_channels, int):
            num_channels = [num_channels * (2 ** i) for i in range(4)]
        self.layers = nn.ModuleList()
        self.layers.append(nn.utils.weight_norm(
            nn.Conv1d(input_size, num_channels[0], kernel_size=1)))
        for i in range(len(num_channels)):
            dilation_size = 2 ** i
            in_channels = num_channels[i - 1] if i > 0 else num_channels[0]
            out_channels = num_channels[i]
            self.layers.append(ResidualBlock(in_channels, out_channels, dilation_size))
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.fc = nn.Linear(num_channels[-1], output_size)
        # Feature-fusion gate.
        self.w = nn.Sequential(
            nn.Conv1d(num_channels[-1], num_channels[-1], kernel_size=1),
            nn.Sigmoid(),
        )
        # self.fc1 = nn.Linear(output_size * (n_k + 1), output_size)  # fuse all branches
        self.fc1 = nn.Linear(output_size * 2, output_size)  # fuse only two of the branches
        self.dropout = nn.Dropout(dropout)
        # self.weight_fc = nn.Linear(num_channels[-1] * (n_k + 1), n_k + 1)
        # (confidence weights for a soft-voting style weighted average of the branches)

    def vmd(self, x):
        """Decompose the batch with VMD and keep the original input as the last branch."""
        x_imfs = []
        # flatten() is required, otherwise the last (smaller) batch raises a size-mismatch error.
        signal = np.array(x).flatten()
        u, u_hat, omega = VMD(signal, alpha=512, tau=0, K=self.nk, DC=0, init=1, tol=1e-7)
        for i in range(u.shape[0]):
            imf = torch.tensor(u[i], dtype=torch.float32)
            imf = imf.reshape(-1, 1, self.input_size)
            x_imfs.append(imf)
        x_imfs.append(x)
        return x_imfs

    def forward(self, x):
        x_imfs = self.vmd(x)
        total_out = []
        # for data in x_imfs:                 # all branches
        for data in [x_imfs[0], x_imfs[-1]]:  # only the first IMF and the raw input
            out = data.transpose(1, 2)
            for layer in self.layers:
                out = layer(out)
            out = self.pool(out)    # e.g. torch.Size([96, 56, 1]) in the original setup
            w = self.w(out)
            out = w * out
            out = out.view(out.size(0), -1)
            out = self.dropout(out)
            out = self.fc(out)
            total_out.append(out)
        # Alternative: a weighted sum w1*total_out[0] + w2*total_out[1] instead of concatenation.
        total_out = torch.cat(total_out, dim=1)
        total_out = self.dropout(total_out)
        output = self.fc1(total_out)
        return output
```
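A minimal smoke test of the model above, assuming VMD comes from the vmdpy package and the input is shaped (batch, 1, input_size); the concrete sizes below are illustrative and not from the original snippet.

```python
import torch

# Hypothetical shapes: a batch of 8 windows, each of length 64, decomposed into 3 IMFs.
model = VMD_TCN(input_size=64, output_size=1, n_k=3)
model.eval()                      # dropout and batch norm switch to inference behaviour

x = torch.randn(8, 1, 64)         # (batch, channels=1, sequence length = input_size)
with torch.no_grad():
    y = model(x)                  # VMD decomposition + two TCN branches + fusion layer
print(y.shape)                    # expected: torch.Size([8, 1])
```

Each IMF returned by VMD has the same length as the flattened batch, so it can be reshaped back to (batch, 1, input_size) and pushed through the same layers as the raw input; the two branch outputs are then concatenated and fused by fc1.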
