We define the stimulation magnitude of the $m$th presynaptic neuron in the $(l+1)$th layer and the $n$th postsynaptic neuron in the $l$th layer as $\partial L/\partial u_m^{l+1}$ and $x_n^l$, respectively. The connections activated based on Hebbian theory would have a strong correlation between presynaptic and postsynaptic cells, thus a large value of $(\partial L/\partial u_m^{l+1})\,x_n^l$. This is also the magnitude of the gradient of $L$ with respect to $w$ ($w$ is the weight that connects $u_m^{l+1}$ and $x_n^l$): $|\partial L/\partial w| = (\partial L/\partial u_m^{l+1})\,x_n^l$ (1). Please explain what this formula means.
Time: 2023-06-25 19:03:05 Views: 134
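This formula gives the magnitude of the gradient of the loss with respect to a single connection weight $w$, as computed by backpropagation. Under plain gradient descent, the size of the update to $w$ is this magnitude scaled by the learning rate. Here $\partial L/\partial u_m^{l+1}$ is the gradient of the loss function $L$ with respect to the input signal (pre-activation) $u_m^{l+1}$ of the $m$th neuron in layer $l+1$, and $x_n^l$ is the output signal of the $n$th neuron in layer $l$. In other words, $x_n^l$ is the input that is multiplied by the weight $w$ connecting the $n$th neuron of layer $l$ to the $m$th neuron of layer $l+1$.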
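The expression $(\partial L/\partial u_m^{l+1})\,x_n^l$ is a product of two scalars, not a dot product of vectors. It follows from the chain rule: since $u_m^{l+1}$ is a weighted sum of the layer-$l$ outputs, $\partial u_m^{l+1}/\partial w = x_n^l$, and therefore $\partial L/\partial w = (\partial L/\partial u_m^{l+1})\,x_n^l$. Because the left-hand side of Eq. (1) is an absolute value, the right-hand side should also be read as an absolute value. The larger this value is, the more sensitive $L$ is to the weight $w$, and the larger the gradient step applied to $w$. Evaluating the formula therefore gives the gradient information for $w$ that is used to optimize the network's parameters so that the model fits the training data better, which typically improves accuracy.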
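To make Eq. (1) concrete, here is a minimal sketch in plain Python. It assumes a single linear layer $u = Wx$ with no bias and a squared-error loss; both are illustrative choices that the quoted passage does not specify. The names `forward`, `loss`, `weight_gradients`, and `numeric_gradient` are hypothetical helpers. The sketch computes each weight gradient as $(\partial L/\partial u_m)\,x_n$ and provides a finite-difference estimate to check it against.

```python
# Sketch only: a bias-free linear layer u = W x with a squared-error loss.
# The loss and all helper names are assumptions made for illustration.

def forward(W, x):
    # u[m] = sum over n of W[m][n] * x[n]
    return [sum(w_mn * x_n for w_mn, x_n in zip(row, x)) for row in W]

def loss(W, x, target):
    u = forward(W, x)
    return 0.5 * sum((u_m - t_m) ** 2 for u_m, t_m in zip(u, target))

def weight_gradients(W, x, target):
    # For the squared-error loss, dL/du[m] = u[m] - target[m].
    u = forward(W, x)
    delta = [u_m - t_m for u_m, t_m in zip(u, target)]
    # Eq. (1) without the absolute value: dL/dW[m][n] = dL/du[m] * x[n]
    return [[d_m * x_n for x_n in x] for d_m in delta]

def numeric_gradient(W, x, target, m, n, eps=1e-6):
    # Central finite difference on one weight, used to check the formula.
    W_plus = [row[:] for row in W]
    W_minus = [row[:] for row in W]
    W_plus[m][n] += eps
    W_minus[m][n] -= eps
    return (loss(W_plus, x, target) - loss(W_minus, x, target)) / (2 * eps)

x = [1.0, 0.0, 2.0]                          # outputs of layer l
W = [[0.5, -1.0, 0.25], [0.0, 0.3, -0.7]]    # weights into layer l+1
target = [0.0, 1.0]
grads = weight_gradients(W, x, target)
```

Note that every weight attached to the input `x[1] = 0.0` gets a zero gradient no matter how large $\partial L/\partial u_m$ is: the product is large only when both factors are large.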
Please explain this passage: The connection growth algorithm greedily activates useful, but currently 'dormant,' connections. We incorporate it in the following learning policy: Policy 1: Add a connection $w$ iff it can quickly reduce the value of loss function $L$. The DNN seed contains only a small fraction of active connections to propagate gradients. To locate the 'dormant' connections that can reduce $L$ effectively, we evaluate $\partial L/\partial w$ for all the 'dormant' connections $w$ (computed either using the whole training set or a large batch). Policy 1 activates 'dormant' connections iff they are the most efficient at reducing $L$. This can also assist with avoiding local minima and achieving higher accuracy [28]. To illustrate this policy, we plot the connections grown from the input to the first layer of LeNet-300-100 [7] (for the MNIST dataset) in Fig. 3. The image center has a much higher grown density than the margins, consistent with the fact that the MNIST digits are centered. From a neuroscience perspective, our connection growth algorithm coincides with the Hebbian theory: "Neurons that fire together wire together [29]." We define the stimulation magnitude of the $m$th presynaptic neuron in the $(l+1)$th layer and the $n$th postsynaptic neuron in the $l$th layer as $\partial L/\partial u_m^{l+1}$ and $x_n^l$, respectively. The connections activated based on Hebbian theory would have a strong correlation between presynaptic and postsynaptic cells, thus a large value of $(\partial L/\partial u_m^{l+1})\,x_n^l$. This is also the magnitude of the gradient of $L$ with respect to $w$ ($w$ is the weight that connects $u_m^{l+1}$ and $x_n^l$): $|\partial L/\partial w| = (\partial L/\partial u_m^{l+1})\,x_n^l$ (1). Thus, this is mathematically equivalent to Policy 1.
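One way to read Policy 1 in the passage above is as a ranking step: evaluate the gradient for every dormant connection, then activate those with the largest magnitude. The plain-Python sketch below illustrates that reading. The function name `grow_connections`, the boolean `active` mask, and the fixed budget `num_to_add` are assumptions made for illustration; the passage only says that the "most efficient" dormant connections are activated and does not state the selection threshold.

```python
# Sketch only: activate the dormant connections with the largest |dL/dw|.
# The top-k selection rule is an assumption, not taken from the passage.

def grow_connections(grad, active, num_to_add):
    # Collect (|gradient|, m, n) for every connection that is not yet active.
    dormant = [
        (abs(grad[m][n]), m, n)
        for m in range(len(grad))
        for n in range(len(grad[m]))
        if not active[m][n]
    ]
    # Rank dormant connections by gradient magnitude, largest first.
    dormant.sort(key=lambda item: item[0], reverse=True)
    grown = [(m, n) for _, m, n in dormant[:num_to_add]]
    # Mark the selected connections as active (modifies the mask in place).
    for m, n in grown:
        active[m][n] = True
    return grown

# grad[m][n] stands for dL/dw of the weight from neuron n (layer l)
# to neuron m (layer l+1), e.g. averaged over a large batch.
grad = [[0.9, -0.1, 0.0], [-2.4, 0.05, -4.8]]
active = [[True, False, False], [False, False, False]]
grown = grow_connections(grad, active, num_to_add=2)
```

With these made-up numbers, the two connections selected are the ones whose gradients have the largest absolute values among the dormant entries, while the already active connection at `(0, 0)` is never a candidate. A connection whose input $x_n^l$ is always zero has a zero gradient and is ranked last, which matches the passage's observation that few connections grow from the image margins on MNIST.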