IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 5, NO. 6, NOVEMBER 1994

Brief Papers
Training Feedforward Networks with the Marquardt Algorithm
Martin T. Hagan and Mohammad B. Menhaj
Abstract- The Marquardt algorithm for nonlinear least squares is presented and is incorporated into the backpropagation algorithm for training feedforward neural networks. The algorithm is tested on several function approximation problems, and is compared with a conjugate gradient algorithm and a variable learning rate algorithm. It is found that the Marquardt algorithm is much more efficient than either of the other techniques when the network contains no more than a few hundred weights.
I. INTRODUCTION
Fig. 1. Three-layer feedforward network.
Since the backpropagation learning algorithm [1] was first popularized, there has been considerable research on methods to accelerate the convergence of the algorithm.
This research falls roughly into two categories. The first category involves the development of ad hoc techniques (e.g., [2]-[5]). These techniques include such ideas as varying the learning rate, using momentum, and rescaling variables. Another category of research has focused on standard numerical optimization techniques (e.g., [6]-[9]).
The most popular approaches from the second category have used conjugate gradient or quasi-Newton (secant) methods. The quasi-Newton methods are considered to be more efficient, but their storage and computational requirements grow as the square of the size of the network. There have been some limited memory quasi-Newton (one-step secant) algorithms that speed up convergence while limiting memory requirements [8]-[10]. If exact line searches are used, the one-step secant methods produce conjugate directions.
Another area of numerical optimization that has been applied to neural networks is nonlinear least squares [11]-[13].
The more general optimization methods were designed to
work effectively on all sufficiently smooth objective functions.
However, when the form of the objective function is known, it is often possible to design more efficient algorithms. One particular form of objective function that is of interest for neural networks is a sum of squares of other nonlinear functions.
The minimization of objective functions of this type is called
nonlinear least squares.
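For concreteness (using our own notation, not an expression drawn verbatim from this section), an objective of this sum-of-squares form can be written as

$$ F(\mathbf{x}) \;=\; \sum_{i=1}^{N} e_i^2(\mathbf{x}), $$

where $\mathbf{x}$ collects the adjustable parameters (here, the network weights and biases) and each residual $e_i$ is itself a nonlinear function of $\mathbf{x}$; it is this structure that Gauss-Newton and Marquardt methods exploit.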
Most of the applications of nonlinear least squares to neural
networks have concentrated on sequential implementations,
where the weights are updated after each presentation of an
input/output pair. This technique is useful when on-line adaptation is needed, but it requires that several approximations be made to the standard algorithms. The standard algorithms are performed in batch mode, where the weights are only updated after a complete sweep through the training set.
This paper presents the application of a nonlinear least
squares algorithm to the batch training of multilayer perceptrons. For very large networks the memory requirements of
the algorithm make it impractical for most current machines
(as is the case for the quasi-Newton methods). However, for
networks with a few hundred weights the algorithm is very
efficient when compared with conjugate gradient techniques.
Section II briefly presents the basic backpropagation algorithm.
The main purpose of this section is to introduce notation
and concepts which are needed to describe the Marquardt
algorithm. The Marquardt algorithm is then presented in
Section III. In Section IV the Marquardt algorithm is compared
with the conjugate gradient algorithm and with a variable
learning rate variation of backpropagation. Section V contains
a summary and conclusions.
II. BACKPROPAGATION ALGORITHM
Consider a multilayer feedforward network, such as the three-layer network of Fig. 1. The net input to unit $i$ in layer $k+1$ is

$$ n^{k+1}(i) \;=\; \sum_{j=1}^{S^{k}} w^{k+1}(i,j)\,a^{k}(j) \;+\; b^{k+1}(i). \qquad (1) $$
The output of unit $i$ will be

$$ a^{k+1}(i) \;=\; f^{k+1}\!\left(n^{k+1}(i)\right). \qquad (2) $$

For an $M$-layer network the system equations in matrix form are given by

$$ a^{0} \;=\; p \qquad (3) $$

$$ a^{k+1} \;=\; f^{k+1}\!\left(W^{k+1} a^{k} + b^{k+1}\right), \qquad k = 0, 1, \ldots, M-1. \qquad (4) $$
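As a concrete illustration (not part of the original paper; the routine and variable names below are our own), the forward propagation described by (1)-(4) can be sketched in Python/NumPy as follows.

```python
import numpy as np

def forward_pass(p, weights, biases, transfer_fns):
    """Propagate one input vector p through an M-layer network.

    Implements the matrix-form system equations (3)-(4):
        a^0 = p
        a^{k+1} = f^{k+1}(W^{k+1} a^k + b^{k+1}),   k = 0, ..., M-1,
    where weights[k], biases[k], transfer_fns[k] play the roles of
    W^{k+1}, b^{k+1}, f^{k+1}. Returns [a^0, a^1, ..., a^M].
    """
    a = [np.asarray(p, dtype=float)]
    for W, b, f in zip(weights, biases, transfer_fns):
        n = W @ a[-1] + b      # net input, equation (1) in matrix form
        a.append(f(n))         # layer output, equation (2)
    return a

# Small two-layer example: sigmoid hidden layer, linear output unit.
sigmoid = lambda n: 1.0 / (1.0 + np.exp(-n))
linear = lambda n: n
rng = np.random.default_rng(0)
weights = [rng.normal(size=(10, 1)), rng.normal(size=(1, 10))]
biases = [rng.normal(size=10), rng.normal(size=1)]
outputs = forward_pass([0.5], weights, biases, [sigmoid, linear])
```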
The task of the network is to learn associations between a specified set of input-output pairs $\{(p_1, t_1), (p_2, t_2), \ldots, (p_Q, t_Q)\}$.
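A hedged sketch of how the corresponding batch-mode error could be accumulated over the $Q$ pairs, assuming the conventional sum-of-squared-errors measure (the performance index itself is introduced in the development that follows) and reusing the illustrative `forward_pass` routine above:

```python
def batch_sse(pairs, weights, biases, transfer_fns):
    """Sum of squared errors over all Q input/target pairs (batch mode).

    The weights are held fixed during the sweep; in batch training they
    would only be updated after the full pass through the training set.
    """
    total = 0.0
    for p, t in pairs:                        # (p_q, t_q), q = 1, ..., Q
        a_M = forward_pass(p, weights, biases, transfer_fns)[-1]
        e = np.asarray(t, dtype=float) - a_M  # error vector for this pair
        total += float(e @ e)
    return total
```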