A Novel Method for Outlier Detection Using Extreme Learning Machines

Updated 2024-07-14 · PDF, 273 KB

"这篇研究论文探讨了一种基于极限学习机(Extreme Learning Machine, ELM)的离群值检测方法，称为ODELM(Outlier Detection based on Extreme Learning Machine)。作者指出传统的离群值检测方法通常依赖于计算点之间的距离或假设内类数据的分布，而这些方法可能受到度量方式的影响，且往往需要参数调整或消耗大量时间，有时还需要额外的标签信息。论文中，作者提出利用ELM的高效学习能力构建一个复制模型，然后将其应用于整个数据集，通过分析模型输出的多样性来识别离群值。" 离群值检测是数据分析中的一个重要任务，它涉及到寻找那些在数据集中显著偏离其他数据点的样本。离群值可能是由于测量错误、异常事件或数据收集过程中的问题导致的，它们可能对统计分析和机器学习模型的性能产生重大影响。许多现有的离群值检测方法基于距离度量，如欧氏距离、马氏距离等，或者假设数据遵循某种特定的分布，如高斯分布。然而，这些方法在处理复杂或非高斯分布的数据时可能效果不佳。极限学习机是一种快速的单层神经网络学习算法，它通过随机初始化隐藏层节点权重并一次性求解输出层权重来实现高效训练。ELM的独特之处在于其不需要反向传播或其他迭代优化步骤，这使得它在处理大数据集时具有速度优势，并且对过拟合有较好的抵抗能力。论文中提出的ODELM方法利用了ELM的这些特性。首先，ELM被用来训练一个复制模型，该模型试图学习数据集的整体行为。随后，这个模型被应用到所有数据点上，生成预测输出。如果某个数据点的预测输出与实际值差异较大，那么它可能被标记为离群值，因为它的行为与模型学习的“正常”模式不符。这种方法避免了对数据分布的假设，减少了对参数调整的需求，并且可以处理大规模数据。这篇论文提出了一个新的离群值检测策略，它利用了极限学习机的高效性和对数据分布不敏感的特性。这种方法有望在处理各种类型和规模的数据集时提供有效的离群值检测，特别是在时间和计算资源有限的情况下。

2.2 Extreme Learning Machine

Extreme Learning Machines (ELMs) were first proposed by G.-B. Huang in [26]. They were originally derived from single-hidden-layer feedforward neural networks (SLFNs) and then extended to "generalized" SLFNs. The essence of ELM is that, if the feature mapping of the hidden layer satisfies the universal approximation condition, the hidden layer of an SLFN need not be iteratively tuned, which differs from traditional SLFNs. In one typical implementation of ELM, the weight matrix between the input and the hidden layer is generated randomly, and the output weights are then calculated by the least-squares method.
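As a rough illustration of the typical implementation described above, here is a minimal NumPy sketch. The function names `elm_fit` and `elm_predict`, the tanh activation, the Gaussian random weights, the hidden-layer size, and the toy data are all assumptions made for this example; none of them is taken from the paper.

```python
import numpy as np

def elm_fit(X, T, n_hidden=50, seed=0):
    """Fit a basic ELM (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(size=(d, n_hidden))  # random input-to-hidden weights, never tuned
    b = rng.normal(size=n_hidden)       # random hidden thresholds, never tuned
    H = np.tanh(X @ W + b)              # hidden-layer outputs, shape (N, n_hidden)
    # Output weights come from a single least-squares solve, with no iterations.
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta

# Toy regression problem (assumed data, for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
T = np.sin(3 * X[:, :1]) + X[:, 1:] ** 2
W, b, beta = elm_fit(X, T)
pred = elm_predict(X, W, b, beta)
train_mse = float(np.mean((pred - T) ** 2))
print("training MSE:", train_mse)
```

Only `beta` is learned from the data. `W` and `b` stay at their random initial values, which is what removes the iterative tuning step.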

ELMs tend to reach not only the smallest training error but also the smallest norm of the output weights. According to neural network theory, for feedforward neural networks, the smaller the norm of the weights, the better the generalization performance the network tends to have.
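The combination of smallest training error and smallest weight norm can be illustrated numerically. The sketch below assumes the output weights are obtained with the Moore-Penrose pseudoinverse (`np.linalg.pinv`). The excerpt only says "least-square method", so this choice, the matrix names `H` and `T`, and the random toy data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))  # 5 samples, 8 hidden nodes: many exact solutions exist
T = rng.normal(size=(5, 1))

# Pseudoinverse solution: least-squares fit with the smallest Euclidean norm.
beta_min = np.linalg.pinv(H) @ T

# Adding a null-space direction of H leaves the fit unchanged but enlarges beta.
_, _, Vt = np.linalg.svd(H)
null_vec = Vt[-1].reshape(-1, 1)      # H @ null_vec is numerically zero
beta_other = beta_min + 2.0 * null_vec

err_min = np.linalg.norm(H @ beta_min - T)
err_other = np.linalg.norm(H @ beta_other - T)
norm_min = np.linalg.norm(beta_min)
norm_other = np.linalg.norm(beta_other)
print(err_min, err_other, norm_min, norm_other)
```

Both weight vectors fit the training targets equally well, but the pseudoinverse solution has the smaller norm. Under the generalization argument above, it is the preferred one.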

Figure 1. Illustration of SLFN. (Hidden nodes G(w_1, b_1, x), ..., G(w_i, b_i, x), ..., G(w_L, b_L, x) feed the output f(x).)

SLFN network functions with L hidden nodes can be expressed by

$$f(x) = \sum_{j=1}^{L} g_j(x)\,\beta_j = \sum_{j=1}^{L} G(w_j, b_j, x)\,\beta_j, \quad x \in \mathbb{R}^d,\ \beta_j \in \mathbb{R}^m \tag{4}$$

where $w_j = [w_{j1}, w_{j2}, \cdots, w_{jd}]^T$ is the weight vector connecting the jth hidden neuron and the input neurons, $\beta_j = [\beta_{j1}, \beta_{j2}, \cdots, \beta_{jm}]^T$ is the weight vector connecting the jth hidden neuron and the output neurons, $b_j$ is the threshold of the jth hidden neuron, and $g_j(x)$ denotes the output function $G(w_j, b_j, x)$ of the jth hidden node. For N arbitrary distinct samples $(x_i, t_i) \in \mathbb{R}^d \times \mathbb{R}^m$, SLFNs with L hidden nodes can approximate these N samples with zero error, which means that there exist $(w_j, b_j)$ and $\beta_j$ such that

$$\sum_{j=1}^{L} G(w_j, b_j, x_i)\,\beta_j = t_i, \quad i = 1, \cdots, N \tag{5}$$
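Eq. (5) can be checked numerically in the special case L = N. Stacking the hidden-node outputs G(w_j, b_j, x_i) into an N × N matrix turns Eq. (5) into a square linear system for the output weights. The sketch below assumes a tanh hidden node, Gaussian random weights, and random toy samples; the names `H`, `T`, and `beta` are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m = 10, 3, 2                      # samples, input dimension, output dimension
X = rng.uniform(-1, 1, size=(N, d))     # N distinct inputs x_i
T = rng.normal(size=(N, m))             # arbitrary targets t_i

L = N                                   # as many hidden nodes as samples
W = rng.normal(size=(d, L))             # column j plays the role of w_j
b = rng.normal(size=L)                  # b_j
H = np.tanh(X @ W + b)                  # H[i, j] = G(w_j, b_j, x_i), assumed tanh

# Eq. (5) in matrix form is H @ beta = T; H is square here, so solve it directly.
beta = np.linalg.solve(H, T)            # row j of beta plays the role of beta_j
max_abs_err = float(np.max(np.abs(H @ beta - T)))
print("largest residual:", max_abs_err)
```

In practice L is usually chosen much smaller than N. The system is then overdetermined and is solved in the least-squares sense rather than exactly.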

