Stochastic Weight Update in the Backpropagation Algorithm on Feed-Forward Neural Networks
Juraj Koščak, Rudolf Jakša, Member, IEEE, and Peter Sinčák, Member, IEEE
Juraj Koščak, Rudolf Jakša, and Peter Sinčák are with the Department of Cybernetics and Artificial Intelligence, Technical University of Košice, Letná 9, Košice, Slovakia (email: jurajkoscak@gmail.com, jaksa@neuron.tuke.sk, peter.sincak@tuke.sk).
Abstract—We examine stochastic weight update in the backpropagation algorithm on feed-forward neural networks. It was introduced by Salvetti and Wilamowski in 1994 in order to improve the probability and the speed of convergence. However, this update method has another quality as well: its implementation is simple for an arbitrary network topology. In the stochastic weight update scenario, a constant number of weights is randomly selected and updated. This is in contrast to the classical ordered update, where all weights are always updated. We describe the exact implementation and present example results on toy-task data with a feed-forward neural network topology. Stochastic weight update is suitable to replace the classical ordered update without any penalty on implementation complexity and, with good chance, without penalty on the quality of convergence.
I. INTRODUCTION
The Stochastic Weight Update was introduced by Salvetti and Wilamowski in 1994 [1] in order to improve the probability of convergence and the speed of convergence of the backpropagation algorithm. Besides the Stochastic Weight Update, they examined two other stochastic methods: random pattern selection and randomized learning rate. On the XOR problem they demonstrated a significant improvement in learning speed and probability of convergence for each of these methods, especially for the randomized learning rate.
The backpropagation algorithm (Werbos, 1974; Rumelhart, McClelland, 1986) is one of the most used learning algorithms today. The plain vanilla implementation and the momentum implementation (Silva, Almeida, 1990) [2] are the predominant implementations on feed-forward topologies. The multilayer perceptron (MLP) is the most used feed-forward topology; it is a layered architecture with fully interconnected layers.
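As a brief reminder of standard notation (not taken from this paper), the momentum rule adds a fraction of the previous weight change to the plain gradient step:

\Delta w_{ij}(t) = -\eta \, \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(t-1),

where \eta is the learning rate and \alpha the momentum coefficient; \alpha = 0 recovers the plain vanilla rule.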
The Backpropagation Through Time (BPTT) [3] is the most used backpropagation variant for recurrent topologies. The BPTT was originally described, and is well suited, for recurrent topologies with full connectivity. Although the BPTT implementation is not much more complex compared to feed-forward vanilla backpropagation, feed-forward time-delay networks are preferably used for time-related problems. Time-delay networks are of the MLP design, so the layered topology is the most used topology with the backpropagation algorithm.
Sparse topologies have recently become popular with recurrent echo state networks (ESN) (Jaeger, 2001) [4]. However, these are usually not based on the backpropagation algorithm. ESN networks are based on a so-called reservoir of randomly interconnected neurons, so they have a fully random sparse topology.
Another area where sparse topologies are employed is neuro-fuzzy architectures; a typical example is the backpropagation-based ANFIS (Jang, 1993) [5]. ANFIS-like neuro-fuzzy topologies are characterized by a layered, but not fully interconnected, structure. In contrast to the randomly connected ESN architectures, the topologies of ANFIS-like structures are deterministic.
The last popular group of backpropagation-related topologies are the neocognitron (Fukushima, 1980) [6] and LeNet (LeCun et al., 1989) [7]. These so-called convolutional neural networks use deep layered topologies with five or more layers. The difference from the classical MLP network is, however, weight reuse, where several links in the network share the same weight.
In modular neural networks, several basic topologies are connected into a bigger meta-topology. Sometimes it might be useful to process this meta-network as a single instance with a complex topology. This happens, for instance, with so-called actor-critic architectures for reinforcement learning, where two or more networks are back-propagated by the backpropagation algorithm in a single run. Besides a couple of actor-critic topologies, the most famous modular architectures are mixtures of experts (Jacobs, Jordan, Nowlan, & Hinton, 1991) [8] and NARA (Takagi, 1992) [9], which are general-use neural networks.
Pruning methods and cascade correlation (Fahlman, 1991) [10] style algorithms for incremental building of the topology can lead to pseudo-random sparse topologies too.
Although layered topologies are the most frequent among backpropagation-trained neural networks, many interesting structured or semi-random topologies are used too. Signal propagation through structured topologies might be non-trivial for more demanding backpropagation-group algorithms like the BPTT with its time-unrolled signal propagation. Stochastic Weight Update may simplify this “propagation” task.
II. MOTIVATION
The implementation of the classical ordered weight update in the backpropagation algorithm relies on the order of weight computation in the network. In classical MLP networks this is a simple layer-by-layer computation from the input to the output layer. A less common, but still convenient, implementation is for a fully interconnected network, like in the common BPTT case. The necessity to update weights from every neuron to every neuron can actually give us a compact implementation: we do not need to care about any exceptions.
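As an illustration only (the function names, the flat weight array, and the parameter n_updates below are our assumptions for the sketch, not code from [1]), the following Python fragment contrasts the classical ordered update, which walks all weights in a fixed order, with the stochastic update, which touches only a constant number of randomly chosen weights per step:

import random

def ordered_update(weights, gradients, learning_rate):
    # Classical ordered update: every weight is updated, in a fixed order.
    for i in range(len(weights)):
        weights[i] -= learning_rate * gradients[i]

def stochastic_update(weights, gradients, learning_rate, n_updates):
    # Stochastic weight update: a constant number of weights is selected
    # at random and only those are updated in this step.
    for i in random.sample(range(len(weights)), n_updates):
        weights[i] -= learning_rate * gradients[i]

Because both loops address weights by index only, the same code works for an arbitrary topology (layered, sparse, or modular) once its weights are gathered into one list, which is the implementation-simplicity argument made above.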