Online gradient method with smoothing ℓ0 regularization for feedforward neural networks
Huisheng Zhang⁎, Yanli Tang
Department of Mathematics, Dalian Maritime University, Dalian 116026, China
ARTICLE INFO
Communicated by Marcello Sanguineti
Keywords:
Online learning
Gradient training algorithm
Smoothing ℓ0 regularization
Feedforward neural networks
Convergence
Sparsity
ABSTRACT
ℓp regularization has been a popular pruning method for neural networks. The parameter p has usually been set as 0 < p ≤ 2 in the literature, and practical training algorithms with ℓ0 regularization are lacking due to the NP-hard nature of the ℓ0 regularization problem; however, ℓ0 regularization tends to produce the sparsest solution, corresponding to the most parsimonious network structure, which is desirable in view of the generalization ability. To this end, this paper considers an online gradient training algorithm with smoothing ℓ0 regularization (OGTSL0) for feedforward neural networks, where the ℓ0 regularizer is approximated by a series of smoothing functions. The underlying principle for the sparsity of OGTSL0 is provided, and the convergence of the algorithm is also theoretically analyzed. Simulation examples support the theoretical analysis and illustrate the superiority of the proposed algorithm.
1. Introduction
Multilayer feedforward neural networks (FNNs) have been widely used in various fields [1,2]. The training of FNNs can be reduced to solving nonlinear least squares problems, to which numerous traditional numerical methods, such as the gradient descent method, Newton's method [3], the conjugate gradient method [4], extended Kalman filtering [5], the Levenberg-Marquardt method [6], etc., can be applied. Among these training methods, the backpropagation algorithm, which is derived from the gradient descent rule, has become one of the most popular training strategies for its simplicity and ease of implementation [7]. Gradient-based learning can be implemented in two practical ways: batch learning and online learning [8]. The batch learning approach accumulates the weight corrections over all training samples before actually performing the update, whereas the online learning approach updates the network weights immediately after each training sample is fed. Thus, the batch gradient training method corresponds to the standard gradient descent algorithm, while the online gradient training method directly makes use of the instantaneous approximate gradient information and is particularly effective when dealing with big or redundant data [9]. Besides gradient-based learning, extreme learning machine has also been proposed and investigated as another effective learning strategy, in both batch mode [10] and online mode [11].
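To make the distinction concrete, the following minimal sketch (not taken from the paper) contrasts one batch update, which accumulates the gradient over all samples before updating, with one online pass, which updates after every sample; the linear least-squares model, the learning rate eta, and the data X, y are illustrative assumptions only.

```python
# Minimal sketch (illustrative only): batch vs. online gradient updates
# on a simple linear least-squares problem.
import numpy as np

def batch_gradient_step(w, X, y, eta):
    """Accumulate the gradient over ALL samples, then update once."""
    grad = np.zeros_like(w)
    for x_i, y_i in zip(X, y):
        grad += (x_i @ w - y_i) * x_i   # gradient of 0.5*(x_i.w - y_i)^2
    return w - eta * grad               # single update per epoch

def online_gradient_epoch(w, X, y, eta):
    """Update immediately after each training sample is fed."""
    for x_i, y_i in zip(X, y):
        grad_i = (x_i @ w - y_i) * x_i  # instantaneous (approximate) gradient
        w = w - eta * grad_i            # one update per sample
    return w

# Usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=100)
w = np.zeros(5)
for _ in range(50):
    w = online_gradient_epoch(w, X, y, eta=0.01)
```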
An appropriate network size is crucial to learning effectiveness in real applications. A network that is too small cannot learn the data sufficiently, whereas one that is too large easily leads to the well-known overfitting problem and poor generalization. Although there have been many related works in the literature, it is still hard to give an accurate formula for the optimal network size [12]. Two practical approaches are used instead: one is the constructive method, which starts with a minimal network and adds new nodes until the training results are acceptable [13]; the other is the pruning method, which starts with an oversized network and then removes the unimportant nodes or weights [14].
ℓp regularization learning is such a popular pruning method, aiming at optimizing the network structure and weights simultaneously [15–20]. By adding an ℓp regularization term to the common error function Ẽ(w), the modified error function takes the form

$$E(\mathbf{w}) = \tilde{E}(\mathbf{w}) + \lambda \|\mathbf{w}\|_p^p, \qquad (1)$$

where λ is the regularization coefficient to balance the tradeoff between the training accuracy and the network complexity, and ‖·‖p is the usual ℓp
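For illustration only (this code does not appear in the paper), a short NumPy sketch of the penalized error in Eq. (1) and its gradient is given below; base_error, base_grad, lam, and p are hypothetical placeholders for the common error function Ẽ, its gradient, the regularization coefficient λ, and the exponent p.

```python
# Illustrative sketch of Eq. (1): E(w) = E_tilde(w) + lam * ||w||_p^p.
import numpy as np

def regularized_error(w, base_error, lam, p):
    """Common error plus the l_p penalty lam * sum_i |w_i|^p."""
    return base_error(w) + lam * np.sum(np.abs(w) ** p)

def regularized_gradient(w, base_grad, lam, p):
    """Gradient of Eq. (1) where the penalty is differentiable (w_i != 0)."""
    penalty_grad = lam * p * np.sign(w) * np.abs(w) ** (p - 1)
    return base_grad(w) + penalty_grad

# Example: p = 2 recovers weight decay, whose penalty gradient is 2*lam*w.
w = np.array([0.5, -1.0, 2.0])
print(regularized_error(w, lambda v: 0.5 * np.sum(v**2), lam=0.1, p=2))
print(regularized_gradient(w, lambda v: v, lam=0.1, p=2))
```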
norm. The well-known "weight decay" technique corresponds exactly to ℓ2 regularization, and it has been shown to be effective in controlling the magnitude of the network weights and improving the generalization performance of the trained networks [15,21–25]. However, "weight decay" does not actually prune the network, in the sense that ℓ2 regularization produces almost no sparse solutions. According to regularization theory, an ℓp regularization method produces sparse solutions only when 0 ≤ p ≤ 1, and the smaller the p, the sparser the solution [26–30]. Peter M. Williams and Ishikawa proposed to use ℓ1
http://dx.doi.org/10.1016/j.neucom.2016.10.057
Received 15 October 2015; Received in revised form 17 October 2016; Accepted 30 October 2016; Available online 3 November 2016
⁎ Corresponding author. E-mail address: zhhuisheng@dlmu.edu.cn (H. Zhang).
Neurocomputing 224 (2017) 1–8
0925-2312/© 2016 Elsevier B.V. All rights reserved.