of time complexity, memory complexity and the required number of
labelled records.
One-class Support Vector Machines (1SVMs) [8–10] are a
popular technique for unsupervised anomaly detection. Generally,
they aim to model the underlying distribution of normal data
while being insensitive to noise or anomalies in the training
records. A kernel function implicitly maps the input space to a
higher dimensional feature space to make a clearer separation
between normal and anomalous data. When properly applied, in
principle a kernel-based method is able to model any non-linear
pattern of normal behaviour. For clarity in the rest of the paper, the
notation 1SVM is used to denote an (unsupervised) one-class
SVM; lSVM — short for labelled SVM — to denote (supervised)
binary and multi-class SVM classifiers; and SVM when both
1SVMs and lSVMs are considered.
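To make the 1SVM setting concrete, the following minimal sketch (an illustration only, not the configuration evaluated in this paper) trains scikit-learn's OneClassSVM on synthetic "normal" records; the data, kernel, and nu value are all assumed for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# "Normal" training records drawn from a single Gaussian cluster;
# nu upper-bounds the fraction of training points treated as outliers.
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# The RBF kernel implicitly maps records to a higher-dimensional
# feature space, where normal data is separated from anomalies.
X_test = np.array([[0.1, -0.2],   # near the normal cluster
                   [8.0, 8.0]])   # an obvious anomaly
labels = clf.predict(X_test)      # +1 = normal, -1 = anomalous
print(labels)
```

Note that no labels are used during training; the model is fit on unlabelled records assumed to be mostly normal.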
SVMs are theoretically appealing for the following reasons
[11,12]: they provide good generalisation when the parameters are
appropriately configured, even if the training set has some bias;
they deliver a unique solution, since the loss function is convex;
and in principle they can model any training set, when an
appropriate kernel is chosen.
In practice, however, training SVMs is memory and time
intensive. SVMs are non-parametric learning models, whose
complexity grows quadratically with the number of records [13].
They are best suited to small datasets with many features, and so
far large-scale training on high-dimensional records (e.g.,
10^6 × 10^4) has been limited with SVMs [14]. Large numbers of
input features result in the curse of dimensionality phenomenon,
which causes the generalisation error of shallow architectures
(discussed in Section 2.1), such as SVMs, to increase with the
number of irrelevant and redundant features. The curse of
dimensionality implies that to obtain good generalisation, the
number of training samples must grow exponentially with the
number of features [14,4,15]. Furthermore, shallow architectures
have practical limitations for efficient representation of certain
types of function families [16]. To avoid these major issues, it is
essential to generate a model that can capture the large degree of
variation that occurs in the underlying data pattern, without
having to enumerate all of them. Therefore, a compact representation
of the data that captures most of the variation can
alleviate the curse of dimensionality as well as reduce the
computational complexity of the algorithm [16,17].
An alternative class of classification algorithms that have
emerged in recent years are Deep Belief Nets (DBNs), which have
been proposed as a multi-class classifier and dimensionality
reduction tool [18–20]. DBNs are multi-layer generative models
that learn one layer of features at a time from unlabelled data. The
extracted features are then treated as the input for training the
next layer. This efficient, greedy learning can be followed by fine-
tuning the weights to improve the generative or discriminative
performance of the whole network.
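The greedy layer-by-layer scheme can be sketched with stacked restricted Boltzmann machines, e.g., scikit-learn's BernoulliRBM; the layer sizes and hyperparameters below are illustrative assumptions, not values from this paper.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.random((200, 64))  # unlabelled records with features in [0, 1]

# Greedy layer-wise training: each RBM is fit on the features
# produced by the previous layer, using no labels at any stage.
representation = X
for n_hidden in (32, 16):
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=10, random_state=0)
    representation = rbm.fit_transform(representation)

print(representation.shape)  # (200, 16): compact deep features
```

In a full DBN these layer weights would subsequently be fine-tuned, generatively or discriminatively, as described above.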
DBNs have a deep architecture, composed of multiple layers of
parameterised non-linear modules. A range of advantageous
properties has been identified for DBNs [16]: they can
learn higher-level features that yield good classification accuracy;
they are parametric models, whose training time scales linearly
with the number of records; they can use unlabelled data to learn
from complex and high-dimensional datasets.
A major limitation of DBNs is that their loss function is non-
convex, therefore the model often converges on local minima and
there is no guarantee that the global minimum will be found. In
addition, DBN classifiers are semi-supervised algorithms that
require some labelled examples for discriminative fine-tuning;
hence unsupervised generative models of DBNs, known as
autoencoders, are used for anomaly detection.
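As one simple illustration of autoencoder-based anomaly detection (a stand-in, not the specific autoencoder evaluated later in the paper), a network trained to reconstruct its own input can flag records with large reconstruction error; the architecture and threshold below are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10))  # unlabelled normal records

# An autoencoder reproduces its input through a narrow hidden layer;
# here an MLP regressor with target == input serves as a stand-in.
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=500,
                  random_state=0).fit(X_train, X_train)

def reconstruction_error(model, X):
    # Mean squared error per record between input and reconstruction.
    return np.mean((model.predict(X) - X) ** 2, axis=1)

# Records reconstructed poorly, relative to a quantile of the
# training errors, are flagged as anomalies.
threshold = np.quantile(reconstruction_error(ae, X_train), 0.95)
X_test = np.vstack([X_train[:1], 10.0 * np.ones((1, 10))])
errors = reconstruction_error(ae, X_test)
print(errors > threshold)
```

The out-of-distribution record (all 10s) incurs a far larger reconstruction error than any training record and is therefore flagged.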
The open research problem we address is how to overcome the
limitations of one-class SVM architectures on complex, high-
dimensional datasets. We propose the use of DBNs as a feature
reduction stage for one-class SVMs, to give a hybrid anomaly
detection architecture. While a variety of feature reduction
methods — i.e., feature selection and feature extraction methods —
have been considered for SVMs (e.g., [21–25] — see [26] for a
survey) none have studied the use of DBNs as a method for deep
feature construction in the context of anomaly detection, i.e., with
a one-class SVM. In this paper, we design and evaluate a new
architecture for anomaly detection in high-dimensional domains.
To the best of our knowledge, this is the first method proposed for
combining DBNs with one-class SVMs to improve their performance
for anomaly detection.
The contributions of this paper are two-fold. The performance
of DBNs is evaluated against that of one-class SVMs for detecting
anomalies in complex, high-dimensional data; in contrast, the
results reported in the literature on DBN classification
performance cover only multi-class classification, e.g., [14,27–29]. A novel
unsupervised anomaly detection model is also proposed, which
combines the advantages of deep belief nets with one-class SVMs.
In our proposed model an unsupervised DBN is trained to extract
features that are reasonably insensitive to irrelevant variations in
the input, and a 1SVM is trained on the feature vectors produced
by the DBN. More specifically, for anomaly detection we show that
computationally expensive non-linear kernel machines can be
replaced by linear ones, when aggregated with a DBN. To the best
of our knowledge, this is the first time these frameworks have
been combined in this way. The results of experiments conducted on
several benchmark datasets demonstrate that our hybrid model
yields significant performance improvements over the stand-alone
systems. The hybrid DBN-1SVM avoids the complexity of
non-linear kernel machines, and reaches the accuracy of a
state-of-the-art autoencoder while considerably lowering
training and testing time.
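The hybrid pipeline can be sketched as two unsupervised stages; for brevity a single RBM stands in for the multi-layer DBN, and all hyperparameters are illustrative assumptions rather than the configuration used in the experiments.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.random((300, 32))  # unlabelled records, features in [0, 1]

# Stage 1: an unsupervised feature extractor (one RBM here, a deep
# stack in the DBN-1SVM architecture) yields a compact representation.
rbm = BernoulliRBM(n_components=8, n_iter=15, random_state=0).fit(X_train)

# Stage 2: a *linear* 1SVM is trained on the extracted features,
# replacing an expensive non-linear kernel in the raw input space.
svm = OneClassSVM(kernel="linear", nu=0.1).fit(rbm.transform(X_train))

labels = svm.predict(rbm.transform(X_train[:5]))
print(labels.shape)  # (5,): +1/-1 anomaly decisions
```

Because the 1SVM operates on the low-dimensional DBN features, its training cost no longer depends on evaluating a non-linear kernel over the original high-dimensional records.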
The remainder of the paper is organised as follows. Section 2
begins with an introduction to deep architectures and their strengths
and weaknesses compared to their shallow counterparts. It then
reviews some of the leading 1SVM methods, and motivates the
requirements for the hybrid model by considering the shortcomings
of SVMs for processing large datasets. Section 3 presents our
proposed unsupervised anomaly detection approach, DBN-1SVM. Section
4 describes the empirical analysis and provides a detailed statistical
comparison of the performance of autoencoder, 1SVM and DBN-
1SVM models on various real-world and synthetic datasets. It
demonstrates the advantages of the DBN-1SVM architecture in terms
of both accuracy and computational efficiency. Section 5 summarises
the paper and outlines future research.
2. Background
2.1. Shallow and deep architectures
Classification techniques with shallow architectures typically
comprise an input layer together with a single layer of processing.
Kernel machines such as SVMs, for example, comprise a layer of kernel
functions applied to the input, followed by a linear combination
of the kernel outputs. In contrast, deep architectures are
composed of several layers of non-linear processing nodes. The
most widely used form of the latter is the multi-layer
neural network with multiple hidden layers.
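The shallow kernel-machine form described above, a single layer of kernel evaluations followed by a linear combination, i.e., f(x) = sum_i alpha_i k(x, x_i) + b, can be written out directly; the toy coefficients and support vectors below are purely illustrative.

```python
import numpy as np

# A shallow kernel machine: one layer of kernel evaluations against
# stored support vectors, then a linear combination of the outputs.
def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

support_vectors = np.array([[0.0, 0.0], [1.0, 1.0]])
alphas = np.array([0.7, 0.3])  # illustrative coefficients
bias = -0.5

def decision(x):
    # f(x) = sum_i alpha_i * k(x, x_i) + b
    return sum(a * rbf(x, sv) for a, sv in zip(alphas, support_vectors)) + bias

print(round(decision(np.array([0.0, 0.0])), 4))  # 0.3104
```

All of the non-linearity lives in the single kernel layer; the output stage is purely linear, which is what makes the architecture shallow.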
While shallow architectures offer important advantages when
optimising the parameters of the model, such as using convex loss
functions, they suffer from limitations in terms of providing an
efficient representation for certain types of function families. In
Please cite this article as: S.M. Erfani, et al., High-dimensional and large-scale anomaly detection using a linear one-class SVM with
deep learning, Pattern Recognition (2016), http://dx.doi.org/10.1016/j.patcog.2016.03.028i