Convergence of batch gradient learning algorithm with smoothing $L_{1/2}$ regularization for Sigma–Pi–Sigma neural networks ☆
Yan Liu a,d, Zhengxue Li b,*, Dakun Yang c, Kh.Sh. Mohamed b, Jing Wang d, Wei Wu b
a School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China
b School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China
c School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China
d School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116024, China
Article info
Article history:
Received 9 May 2014
Received in revised form 29 July 2014
Accepted 15 September 2014
Communicated by M.-J. Er
Available online 30 September 2014
Keywords:
Sigma–Pi–Sigma neural networks
Batch gradient learning algorithm
Convergence
Smoothing $L_{1/2}$ regularization
Abstract
Sigma–Pi–Sigma neural networks are known to provide more powerful mapping capability than traditional feed-forward neural networks. The $L_{1/2}$ regularizer is very useful and efficient, and can be taken as a representative of the $L_q$ $(0 < q < 1)$ regularizers. However, the nonsmoothness of $L_{1/2}$ regularization may lead to an oscillation phenomenon. The aim of this paper is to develop a novel batch gradient method with smoothing $L_{1/2}$ regularization for Sigma–Pi–Sigma neural networks. Compared with the conventional gradient learning algorithm, this method produces sparser weights and a simpler structure, and it improves the learning efficiency. A comprehensive study of the weak and strong convergence results for this algorithm is also presented, indicating that the gradient of the error function goes to zero and the weight sequence goes to a fixed value, respectively.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Sigma–Pi–Sigma neural networks (SPSNNs) are considered efficient high-order neural networks that can learn to implement the static mappings that multilayer neural networks and radial basis function networks usually do [1], since the output of an SPSNN takes the sum-of-product-of-sums form. A self-organizing
map of Sigma–Pi units was provided in [2]. The applicability of
networks built on Sigma–Pi units with Elman topology was
explored in [3]. A recurrent Sigma–Pi neural network was selected
as the network architecture providing strong dynamical properties
for the modelling of some non-linear time series [4]. The function
approximation capacity, convergence behavior and generalization
ability of sparselized Sigma–Pi networks were analyzed and
compared with those of first-order networks [5,6]. The ridge
polynomial neural network is a special type of higher-order neural
networks using a number of product units as its basic building
blocks, which not only provides a more efficient and regular
architecture, but also maintains the fast learning property and
powerful nonlinear mapping capability while avoiding the combi-
natorial increase in the number of required weights [7]. A binary
product-unit neural network was proposed in [8] in order to realize Boolean functions more efficiently.
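To make the sum-of-product-of-sums structure mentioned above concrete, the following is a minimal NumPy sketch of an SPSNN forward pass. It is only illustrative: the weight arrangement, the logistic output activation, the per-unit bias and the name `spsnn_output` are assumptions for this example, not the exact architecture defined later in the paper.

```python
import numpy as np

def spsnn_output(x, W, sigma=lambda t: 1.0 / (1.0 + np.exp(-t))):
    """Illustrative Sigma-Pi-Sigma forward pass (hypothetical indexing).

    x : input vector of length n.
    W : weights of shape (K, J, n + 1); each of the K Pi units multiplies
        the outputs of J first-layer Sigma units, each Sigma unit having
        its own weight vector plus a bias.
    Returns sigma( sum_k prod_j w_{kj} . [x, 1] ), i.e. a sum of
    products of weighted sums fed through an output unit.
    """
    x1 = np.append(x, 1.0)            # augment the input with a bias entry
    sums = W @ x1                     # first Sigma layer: shape (K, J)
    prods = np.prod(sums, axis=1)     # Pi layer: product within each group
    return sigma(np.sum(prods))       # final Sigma (output) unit
```

For instance, `spsnn_output(np.zeros(3), np.random.randn(4, 2, 4))` returns a single scalar network output for a 3-dimensional input.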
Much attention in neural computing has been paid recently to improving the structure of networks. The number of neurons is a crucial factor in the dynamic capability of feedforward networks. There are two common approaches to determining the appropriate
size for a network. The first is to start from a minimal network and
to increase the number of units, and the other is to start from a
maximum network and to prune it (e.g. [9–11]).
A penalty term is widely used for weight elimination when pruning feedforward neural networks; its purpose is to discourage the use of unnecessary connections. One of the simplest penalties added to the standard cost function is a term proportional to the $L_2$ norm of the weight vectors [12–15], which is used to discourage the weights from taking large values:
$$E = \tilde{E} + \lambda \| w \|_2^2, \qquad (1)$$
where $\tilde{E}$ is a standard cost function, $\lambda \| w \|_2^2$ is a penalty term, and $\lambda > 0$ is a scalar that determines the influence of the penalty term; $\| \cdot \|_2$ stands for the $L_2$ norm. This strategy is called $L_2$ regularization.
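As a rough illustration (not the algorithm analyzed in this paper), the sketch below shows how the penalty in Eq. (1) can be attached to a standard batch error term and how it enters a plain batch gradient step. The squared-error stand-in for $\tilde{E}$, the learning rate `eta`, and the function names are assumptions made for the example.

```python
import numpy as np

def l2_regularized_cost(residuals, w, lam):
    """Eq. (1)-style cost: a standard error term plus lam * ||w||_2^2.

    residuals : batch of (network output - target) values, standing in
                for the standard cost E~ via 0.5 * sum of squares.
    """
    E_tilde = 0.5 * np.sum(np.asarray(residuals) ** 2)
    return E_tilde + lam * np.dot(w, w)

def l2_batch_gradient_step(w, grad_E_tilde, lam, eta):
    """One batch gradient step on Eq. (1); the penalty contributes 2 * lam * w."""
    return w - eta * (grad_E_tilde + 2.0 * lam * w)
```

The smoothing $L_{1/2}$ method studied in this paper keeps this batch-update structure but replaces the $\| w \|_2^2$ penalty with a smoothed $L_{1/2}$ term, as developed in the following sections.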
Moreover, $L_q$ regularization [16] is widely used in parameter estimation and has recently been used as a feasible approach
☆ This work is supported by the National Natural Science Foundation of China (Nos. 61473059 and 61403056), the Fundamental Research Funds for the Central Universities of China, the Foundation of Liaoning Educational Committee (No. L2014218) and the Youth Foundation of Dalian Polytechnic University (QNJJ201308).
* Corresponding author.
E-mail address: lizx@dlut.edu.cn (Z. Li).