2.1 ELM
For $N$ arbitrary distinct training samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, where $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ and $\mathbf{t}_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$, the output function of ELM with $L$ hidden neurons and activation function $g(\cdot)$ is mathematically modeled as (1) [23]:

$$\mathbf{t}_i = \sum_{j=1}^{L} \boldsymbol{\beta}_j \, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j), \qquad (1)$$
where $\mathbf{w}_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$ ($j = 1, 2, \ldots, L$) is the weight vector connecting the $j$-th hidden neuron and the input neurons, $\boldsymbol{\beta}_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^T$ is the weight vector connecting the $j$-th hidden neuron and the output neurons, and $b_j$ is the threshold of the $j$-th hidden neuron. In addition, $\mathbf{w}_j \cdot \mathbf{x}_i$ denotes the inner product of $\mathbf{w}_j$ and $\mathbf{x}_i$.
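As a concrete illustration, the sketch below evaluates (1) for a single input using randomly chosen hidden-layer parameters; the array names (W, b, beta) and the sigmoid activation are assumptions made for this example, not prescribed by the text.

```python
import numpy as np

def elm_output(x, W, b, beta, g=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Evaluate Eq. (1): t = sum_j beta_j * g(w_j . x + b_j).

    x    : (n,)   input vector
    W    : (L, n) row j is the input weight vector w_j
    b    : (L,)   hidden-neuron thresholds b_j
    beta : (L, m) row j is the output weight vector beta_j
    """
    h = g(W @ x + b)          # hidden-layer activations, shape (L,)
    return h @ beta           # output vector t, shape (m,)

# toy usage with random parameters (n = 4 inputs, L = 10 hidden neurons, m = 2 outputs)
rng = np.random.default_rng(0)
t = elm_output(rng.normal(size=4),
               rng.normal(size=(10, 4)),
               rng.normal(size=10),
               rng.normal(size=(10, 2)))
```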
The above N equations can be written compactly as:
$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \qquad (2)$$
where
$$\mathbf{H} = \begin{bmatrix} h(\mathbf{x}_1) \\ \vdots \\ h(\mathbf{x}_N) \end{bmatrix}_{N \times L} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L},$$

$$\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^T \\ \vdots \\ \boldsymbol{\beta}_L^T \end{bmatrix}_{L \times m}, \quad \text{and} \quad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix}_{N \times m}.$$
Here, $\mathbf{H}$ is called the hidden layer output matrix of the SLFN, and the $j$-th column of $\mathbf{H}$ is the $j$-th hidden node output with respect to the inputs $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$, where $j = 1, 2, \ldots, L$. In addition, $h(\cdot)$ is called the hidden layer feature mapping. The $i$-th row of $\mathbf{H}$, i.e., $h(\mathbf{x}_i)$, is the hidden layer feature mapping with respect to the $i$-th input $\mathbf{x}_i$, where $i = 1, 2, \ldots, N$.
According to the analysis in [24], and contrary to the common understanding that all the parameters of an SLFN need to be adjusted, the input weights $\mathbf{w}_j$ and the hidden layer biases $b_j$ of ELM need not be tuned and can be assigned randomly. Moreover, the orthogonal projection method can be used efficiently in ELM: $\mathbf{H}^{\dagger} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$ if $\mathbf{H}^T\mathbf{H}$ is nonsingular, where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$. In effect, the matrix $\mathbf{H}$ maps the data $\mathbf{x}_i$ from the input space to the hidden-layer feature space, and this feature mapping matrix $\mathbf{H}$ is independent of the targets $\mathbf{t}_i$.
Therefore, the solution for $\boldsymbol{\beta}$ is:

$$\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{T}. \qquad (3)$$
Because the hidden layer matrix $\mathbf{H}$ remains unchanged once random values have been assigned at the beginning of learning, Eq. (2) can be viewed as a linear system, and training the SLFN reduces to solving this linear system. In other words, training the SLFN is simply equivalent to finding a least-squares solution $\boldsymbol{\beta}$ of this linear system, and the minimum-norm least-squares solution given by (3) is unique.
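The following NumPy sketch puts Eqs. (2)–(3) together: it builds the hidden layer output matrix $\mathbf{H}$ from randomly assigned input weights and biases and then solves for the output weights by least squares. The function name, the tanh activation, and the use of np.linalg.lstsq (a numerically safer stand-in for the explicit pseudoinverse in (3)) are assumptions made for illustration.

```python
import numpy as np

def train_elm(X, T, L, seed=0, g=np.tanh):
    """Minimal ELM training sketch following Eqs. (2)-(3).

    X : (N, n) training inputs, T : (N, m) training targets, L : number of hidden neurons.
    Returns (W, b, beta) so that predictions are g(X @ W.T + b) @ beta.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(L, n))   # random input weights w_j
    b = rng.uniform(-1.0, 1.0, size=L)        # random hidden thresholds b_j
    H = g(X @ W.T + b)                        # hidden layer output matrix, (N, L)
    # Least-squares solution of H beta = T; equivalent to beta = pinv(H) @ T.
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W, b, beta

# usage: fit a toy regression problem
X = np.random.default_rng(1).normal(size=(100, 4))
T = np.sin(X[:, :1])                          # (100, 1) targets
W, b, beta = train_elm(X, T, L=30)
pred = np.tanh(X @ W.T + b) @ beta
```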
2.2 OSELM
In ELM, all the samples must be available before training. In some practical applications, however, it is difficult to obtain the whole data set at once. OSELM was therefore proposed to deal with this issue [16]. By dividing the data into several chunks for training, OSELM reduces the computational effort and improves the learning performance. Let $\mathbf{X}^{*}$ denote the newly arrived incremental training data. The effect of the incremental data is captured by a correction $\Delta\boldsymbol{\beta}$, which modifies the historical model $\boldsymbol{\beta}_0$ to form a new model $\boldsymbol{\beta}^{*}$ according to the following equation:
$$\boldsymbol{\beta}^{*} = \boldsymbol{\beta}_0 + \Delta\boldsymbol{\beta}(\mathbf{X}^{*}). \qquad (4)$$
In [16], a solution is provided to this model. Given an initial chunk of training data $\aleph_0 = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N_0}$ ($N_0 \geq L$), under the ELM scheme we can find:

$$\boldsymbol{\beta}_0 = \mathbf{K}_0^{-1}\mathbf{H}_0^T\mathbf{T}_0, \qquad (5)$$
where $\mathbf{K}_0 = \mathbf{H}_0^T\mathbf{H}_0$ and

$$\mathbf{H}_0 = \begin{bmatrix} h(\mathbf{x}_1) \\ \vdots \\ h(\mathbf{x}_{N_0}) \end{bmatrix}_{N_0 \times L}, \quad \mathbf{T}_0 = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_{N_0}^T \end{bmatrix}_{N_0 \times m},$$

$$h(\mathbf{x}_1) = [g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1), \ldots, g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L)]_{1 \times L}, \;\ldots,\; h(\mathbf{x}_{N_0}) = [g(\mathbf{w}_1 \cdot \mathbf{x}_{N_0} + b_1), \ldots, g(\mathbf{w}_L \cdot \mathbf{x}_{N_0} + b_L)]_{1 \times L}.$$
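As a sketch of this initialization phase, the code below computes $\mathbf{K}_0$ and $\boldsymbol{\beta}_0$ from an initial chunk according to (5); the function name oselm_init and the reuse of the fixed random mapping from the earlier ELM sketch are assumptions made for illustration.

```python
import numpy as np

def oselm_init(X0, T0, W, b, g=np.tanh):
    """Initial OSELM phase, Eq. (5): beta_0 = K_0^{-1} H_0^T T_0.

    X0 : (N0, n) initial chunk of inputs (N0 >= L so that K_0 is invertible)
    T0 : (N0, m) initial chunk of targets
    W, b : fixed random input weights (L, n) and thresholds (L,)
    Returns (K0, beta0), which are carried forward to later chunks.
    """
    H0 = g(X0 @ W.T + b)                      # hidden-layer output for the initial chunk, (N0, L)
    K0 = H0.T @ H0                            # (L, L)
    beta0 = np.linalg.solve(K0, H0.T @ T0)    # avoids forming K_0^{-1} explicitly
    return K0, beta0
```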
Suppose that we are given another chunk of data $\aleph_1 = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=N_0+1}^{N_0+N_1}$, where $N_1$ denotes the number of new samples in this chunk. Considering both training data sets $\aleph_0$ and $\aleph_1$, the output weight $\boldsymbol{\beta}_1$ becomes [16]:
$$\boldsymbol{\beta}_1 = \mathbf{K}_1^{-1}\begin{bmatrix}\mathbf{H}_0\\\mathbf{H}_1\end{bmatrix}^T\begin{bmatrix}\mathbf{T}_0\\\mathbf{T}_1\end{bmatrix}, \qquad (6)$$
where

$$\mathbf{K}_1 = \begin{bmatrix}\mathbf{H}_0\\\mathbf{H}_1\end{bmatrix}^T\begin{bmatrix}\mathbf{H}_0\\\mathbf{H}_1\end{bmatrix} = \begin{bmatrix}\mathbf{H}_0^T & \mathbf{H}_1^T\end{bmatrix}\begin{bmatrix}\mathbf{H}_0\\\mathbf{H}_1\end{bmatrix} = \mathbf{K}_0 + \mathbf{H}_1^T\mathbf{H}_1,$$
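A minimal sketch of the chunk update implied by (5)–(6): since $\mathbf{K}_1 = \mathbf{K}_0 + \mathbf{H}_1^T\mathbf{H}_1$ and, by (5), $\mathbf{K}_0\boldsymbol{\beta}_0 = \mathbf{H}_0^T\mathbf{T}_0$, the stacked product in (6) can be accumulated without revisiting the old chunk. The helper name oselm_update and the running accumulator are assumptions made for illustration, not the paper's notation.

```python
import numpy as np

def oselm_update(K_prev, beta_prev, X1, T1, W, b, g=np.tanh):
    """One OSELM chunk update consistent with Eqs. (5)-(6).

    Uses K_1 = K_0 + H_1^T H_1 and K_0 beta_0 = H_0^T T_0, so that
    beta_1 = K_1^{-1} (H_0^T T_0 + H_1^T T_1) = K_1^{-1} (K_0 beta_0 + H_1^T T_1).
    """
    H1 = g(X1 @ W.T + b)                      # hidden-layer output for the new chunk
    K_new = K_prev + H1.T @ H1                # K_1 = K_0 + H_1^T H_1
    rhs = K_prev @ beta_prev + H1.T @ T1      # equals H_0^T T_0 + H_1^T T_1
    beta_new = np.linalg.solve(K_new, rhs)
    return K_new, beta_new
```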