
If the activation function $g$ is infinitely differentiable, we can prove that the required number of hidden nodes $\tilde{N} \leqslant N$. Strictly speaking, we have²
Theorem 2.1. Given a standard SLFN with $N$ hidden nodes and an activation function $g : \mathbb{R} \rightarrow \mathbb{R}$ which is infinitely differentiable in any interval, for $N$ arbitrary distinct samples $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i \in \mathbb{R}^n$ and $\mathbf{t}_i \in \mathbb{R}^m$, and for any $\mathbf{w}_i$ and $b_i$ randomly chosen from any intervals of $\mathbb{R}^n$ and $\mathbb{R}$, respectively, according to any continuous probability distribution, the hidden layer output matrix $\mathbf{H}$ of the SLFN is, with probability one, invertible and $\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\| = 0$.
Proof. Let us consider the vector $\mathbf{c}(b_i) = [g_i(\mathbf{x}_1), \ldots, g_i(\mathbf{x}_N)]^{\mathrm{T}} = [g(\mathbf{w}_i \cdot \mathbf{x}_1 + b_i), \ldots, g(\mathbf{w}_i \cdot \mathbf{x}_N + b_i)]^{\mathrm{T}}$, the $i$th column of $\mathbf{H}$, in Euclidean space $\mathbb{R}^N$, where $b_i \in (a, b)$ and $(a, b)$ is any interval of $\mathbb{R}$.
Following the same proof method of Tamura and Tateishi ([23], p. 252) and our previous work ([10], Theorem 2.1), it can easily be proved by contradiction that the vector $\mathbf{c}$ does not belong to any subspace whose dimension is less than $N$.
Since the $\mathbf{w}_i$ are randomly generated based on a continuous probability distribution, we can assume that $\mathbf{w}_i \cdot \mathbf{x}_k \neq \mathbf{w}_i \cdot \mathbf{x}_{k'}$ for all $k \neq k'$. Let us suppose that $\mathbf{c}$ belongs to a subspace of dimension $N - 1$. Then there exists a vector $\boldsymbol{\alpha}$ which is orthogonal to this subspace:

$(\boldsymbol{\alpha}, \mathbf{c}(b_i) - \mathbf{c}(a)) = \alpha_1\, g(b_i + d_1) + \alpha_2\, g(b_i + d_2) + \cdots + \alpha_N\, g(b_i + d_N) - z = 0, \quad (6)$
where $d_k = \mathbf{w}_i \cdot \mathbf{x}_k$, $k = 1, \ldots, N$, and $z = \boldsymbol{\alpha} \cdot \mathbf{c}(a)$, $\forall b_i \in (a, b)$. Assuming $\alpha_N \neq 0$, Eq. (6) can be further written as

$g(b_i + d_N) = -\sum_{p=1}^{N-1} \gamma_p\, g(b_i + d_p) + z/\alpha_N, \quad (7)$
where $\gamma_p = \alpha_p/\alpha_N$, $p = 1, \ldots, N-1$. Since $g(x)$ is infinitely differentiable in any interval, we have

$g^{(l)}(b_i + d_N) = -\sum_{p=1}^{N-1} \gamma_p\, g^{(l)}(b_i + d_p), \quad l = 1, 2, \ldots, N, N+1, \ldots, \quad (8)$
where $g^{(l)}$ is the $l$th derivative of the function $g$ with respect to $b_i$. However, there are only $N - 1$ free coefficients $\gamma_1, \ldots, \gamma_{N-1}$ available to satisfy the more than $N - 1$ linear equations derived in (8), which is a contradiction. Thus, the vector $\mathbf{c}$ does not belong to any subspace whose dimension is less than $N$.
Hence, from any interval $(a, b)$ it is possible to randomly choose $N$ bias values $b_1, \ldots, b_N$ for the $N$ hidden nodes such that the corresponding vectors $\mathbf{c}(b_1), \mathbf{c}(b_2), \ldots, \mathbf{c}(b_N)$ span $\mathbb{R}^N$. This means that for any weight vectors $\mathbf{w}_i$ and bias values $b_i$ chosen from any intervals of $\mathbb{R}^n$ and $\mathbb{R}$, respectively, according to any continuous probability distribution, the column vectors of $\mathbf{H}$ can, with probability one, be made full-rank. □
Such activation functions include the sigmoidal functions as well as the radial basis, sine, cosine, exponential, and many other nonregular functions, as shown in Huang and Babri [11].
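As an informal numerical check of Theorem 2.1 (a sketch with illustrative names and toy dimensions, not code from the paper), one can draw the input weights and biases at random, form the $N \times N$ hidden layer output matrix $\mathbf{H}$ with a sigmoid activation, and verify that it is full-rank, so that $\boldsymbol{\beta} = \mathbf{H}^{-1}\mathbf{T}$ drives $\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|$ down to round-off level:

```python
import numpy as np

rng = np.random.default_rng(0)

N, n, m = 20, 5, 2                              # samples, input dim, output dim
X = rng.uniform(-1.0, 1.0, size=(N, n))         # N arbitrary distinct samples x_i
T = rng.uniform(-1.0, 1.0, size=(N, m))         # targets t_i

# Input weights w_i and biases b_i, randomly chosen from continuous distributions
W = rng.uniform(-1.0, 1.0, size=(N, n))         # one weight vector per hidden node
b = rng.uniform(-1.0, 1.0, size=N)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hidden layer output matrix: H[j, i] = g(w_i . x_j + b_i)
H = sigmoid(X @ W.T + b)

print("rank(H) =", np.linalg.matrix_rank(H))    # expected: N (full rank)
beta = np.linalg.solve(H, T)                    # beta = H^{-1} T
print("||H beta - T|| =", np.linalg.norm(H @ beta - T))   # ~ 0 up to round-off
```

With probability one the printed rank equals $N$, in line with the theorem.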
Furthermore, we have

Theorem 2.2. Given any small positive value $\varepsilon > 0$ and an activation function $g : \mathbb{R} \rightarrow \mathbb{R}$ which is infinitely differentiable in any interval, there exists $\tilde{N} \leqslant N$ such that, for $N$ arbitrary distinct samples $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i \in \mathbb{R}^n$ and $\mathbf{t}_i \in \mathbb{R}^m$, and for any $\mathbf{w}_i$ and $b_i$ randomly chosen from any intervals of $\mathbb{R}^n$ and $\mathbb{R}$, respectively, according to any continuous probability distribution, $\|\mathbf{H}_{N \times \tilde{N}} \boldsymbol{\beta}_{\tilde{N} \times m} - \mathbf{T}_{N \times m}\| < \varepsilon$ holds with probability one.
Proof. The validity of the theorem is obvious; otherwise, one could simply choose $\tilde{N} = N$, which makes $\|\mathbf{H}_{N \times \tilde{N}} \boldsymbol{\beta}_{\tilde{N} \times m} - \mathbf{T}_{N \times m}\| < \varepsilon$ according to Theorem 2.1. □
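An informal way to see Theorem 2.2 at work numerically is to fix random hidden parameters, solve for $\boldsymbol{\beta}$ in the least-squares sense (anticipating Section 3), and watch the residual shrink as $\tilde{N}$ grows toward $N$. The sketch below does exactly that; the toy target, dimensions, and variable names are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

N, n = 50, 3
X = rng.uniform(-1.0, 1.0, size=(N, n))
T = np.sin(3.0 * X.sum(axis=1, keepdims=True))    # a toy target, m = 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for N_tilde in (5, 10, 25, 50):                   # N_tilde <= N hidden nodes
    W = rng.uniform(-1.0, 1.0, size=(N_tilde, n)) # random input weights
    b = rng.uniform(-1.0, 1.0, size=N_tilde)      # random biases
    H = sigmoid(X @ W.T + b)                      # N x N_tilde output matrix
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)  # least-squares output weights
    err = np.linalg.norm(H @ beta - T)
    print(f"N_tilde = {N_tilde:3d}  ||H beta - T|| = {err:.3e}")
```

The residual typically falls as $\tilde{N}$ increases and reaches numerical zero at $\tilde{N} = N$, so for any given $\varepsilon > 0$ some $\tilde{N} \leqslant N$ suffices.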
3. Proposed extreme learning machine (ELM)

Based on Theorems 2.1 and 2.2, we can propose in this section an extremely simple and efficient method to train SLFNs.
3.1. Conventional gradient-based solution of SLFNs
Traditionally, in order to train an SLFN, one may wish to find specific $\hat{\mathbf{w}}_i$, $\hat{b}_i$, $\hat{\boldsymbol{\beta}}$ ($i = 1, \ldots, \tilde{N}$) such that

$\|\mathbf{H}(\hat{\mathbf{w}}_1, \ldots, \hat{\mathbf{w}}_{\tilde{N}}, \hat{b}_1, \ldots, \hat{b}_{\tilde{N}})\, \hat{\boldsymbol{\beta}} - \mathbf{T}\| = \min_{\mathbf{w}_i, b_i, \boldsymbol{\beta}} \|\mathbf{H}(\mathbf{w}_1, \ldots, \mathbf{w}_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}})\, \boldsymbol{\beta} - \mathbf{T}\|, \quad (9)$
which is equivalent to minimizing the cost function

$E = \sum_{j=1}^{N} \left( \sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) - \mathbf{t}_j \right)^2. \quad (10)$
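As a concrete sanity check of this equivalence, the following sketch evaluates the cost (10) sample by sample and compares it with the squared Frobenius norm of the residual $\mathbf{H}\boldsymbol{\beta} - \mathbf{T}$ for arbitrary parameter values; the toy dimensions and variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

N, n, m, N_tilde = 30, 4, 2, 10
X = rng.normal(size=(N, n))
T = rng.normal(size=(N, m))
W = rng.normal(size=(N_tilde, n))       # input weights w_i
b = rng.normal(size=N_tilde)            # hidden biases b_i
beta = rng.normal(size=(N_tilde, m))    # output weights beta_i

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = sigmoid(X @ W.T + b)                # N x N_tilde hidden layer output matrix

# Cost of Eq. (10): sum over samples of the squared output error
E = sum(np.sum((H[j] @ beta - T[j]) ** 2) for j in range(N))

# Same quantity written as the matrix residual, cf. ||H beta - T||
print(E, np.linalg.norm(H @ beta - T, "fro") ** 2)   # the two agree up to round-off
```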
When $\mathbf{H}$ is unknown, gradient-based learning algorithms are generally used to search for the minimum of $\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|$. In this minimization procedure, the vector $\mathbf{W}$, which collects the weight ($\mathbf{w}_i$, $\boldsymbol{\beta}_i$) and bias ($b_i$) parameters, is iteratively adjusted as follows:

$\mathbf{W}_k = \mathbf{W}_{k-1} - \eta\, \frac{\partial E(\mathbf{W})}{\partial \mathbf{W}}, \quad (11)$

where $\eta$ is a learning rate.
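To make the conventional procedure concrete, here is a minimal sketch of the update (11) applied to the cost (10), under simplifying assumptions: scalar outputs, sigmoid activation, hand-derived gradients, and illustrative names such as eta, Win, b, beta. It spells out the generic gradient-based training described here, not the method proposed later in this paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: N samples, n inputs, scalar targets (m = 1 for simplicity)
N, n, N_tilde = 40, 2, 8
X = rng.uniform(-1.0, 1.0, size=(N, n))
t = np.sin(X[:, 0]) + 0.5 * X[:, 1]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The vector W of Eq. (11) collects all of: input weights, biases, output weights
Win = rng.normal(scale=0.5, size=(N_tilde, n))
b = rng.normal(scale=0.5, size=N_tilde)
beta = rng.normal(scale=0.5, size=N_tilde)

eta = 0.002                                   # deliberately small learning rate;
for k in range(5000):                         # larger values risk divergence
    A = X @ Win.T + b                         # pre-activations, N x N_tilde
    G = sigmoid(A)                            # hidden outputs g(w_i . x_j + b_i)
    e = G @ beta - t                          # per-sample errors, length N
    dG = G * (1.0 - G)                        # sigmoid derivative
    # Gradients of E = sum_j e_j^2 with respect to beta, Win and b
    grad_beta = 2.0 * G.T @ e
    grad_Win = 2.0 * (e[:, None] * dG * beta).T @ X
    grad_b = 2.0 * (e[:, None] * dG * beta).sum(axis=0)
    beta -= eta * grad_beta                   # W_k = W_{k-1} - eta * dE/dW
    Win -= eta * grad_Win
    b -= eta * grad_b

print("final E =", float(np.sum((sigmoid(X @ Win.T + b) @ beta - t) ** 2)))
```

Even on this toy problem the small learning rate requires thousands of iterations to bring $E$ down, which is exactly the kind of behaviour the issues listed next refer to.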
The popular learning algorithm used for feedforward neural networks is the BP learning algorithm, in which gradients can be computed efficiently by propagation from the output to the input. There are several issues with BP learning algorithms:

(1) When the learning rate $\eta$ is too small, the learning algorithm converges very slowly. However, when $\eta$ is too large, the algorithm becomes unstable and diverges.
²In fact, the theorem and its proof are also linearly valid for the case $g_i(\mathbf{x}) = g(\|\mathbf{x} - \mathbf{w}_i\|/b_i)$, $\mathbf{w}_i \in \mathbb{R}^n$, $b_i \in \mathbb{R}^+$.