Density Weighted Support Vector Data Description: A Distribution-Free Anomaly Detection Method


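This document discusses density weighted support vector data description (DW-SVDD), a method for one-class classification (OCC) when statistically representative non-target data is unavailable. The work was published by Myungraee Cha, Jun Seok Kim, and Jun-Geol Baek in Expert Systems with Applications in 2014.

DW-SVDD extends support vector data description (SVDD), a widely used OCC technique. SVDD maps the target data into a high-dimensional feature space and finds a spherical description boundary there that separates target data from outliers. That boundary depends only on the kernel-based distance between each data point and the spherical description, not on the density distribution of the data, so points in high-density regions may end up outside the description. DW-SVDD addresses this by giving each data point a density weight, its relative density estimated with the k-nearest neighbor (k-NN) approach. Points in high-density regions then carry more influence when the description is fitted, and the boundary shifts toward those regions. Because real-world data is often non-uniformly distributed, this helps the model capture the local structure of the data set. Through the kernel trick, DW-SVDD can still handle nonlinear relationships without computing the high-dimensional mapping explicitly.

Compared with classifying purely by neighborhood, as a plain k-NN approach does, DW-SVDD yields an explicit decision boundary, which is valuable for anomaly detection. The cost is that the density of every data point has to be estimated, which adds a parameter to choose (k) and extra computation.

The paper gives both the formulation and an experimental evaluation of density weighted SVDD, offering a more accurate description for one-class classification tasks, particularly when typical non-target samples are lacking.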
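The kernel-based distance mentioned above can be made concrete with a small sketch. It assumes a Gaussian (RBF) kernel and takes the multipliers `alphas` as given; the function names, the bandwidth `sigma`, and the toy data are illustrative assumptions, not taken from the paper. The squared distance from a point z to the sphere centre a = Σ α_i φ(x_i) expands into kernel evaluations only.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is an assumed bandwidth parameter."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(z, points, alphas, kernel=rbf_kernel):
    """Squared feature-space distance from z to the centre sum_i alpha_i * phi(x_i)."""
    self_term = kernel(z, z)
    cross = sum(a * kernel(z, x) for a, x in zip(alphas, points))
    centre = sum(ai * aj * kernel(xi, xj)
                 for ai, xi in zip(alphas, points)
                 for aj, xj in zip(alphas, points))
    return self_term - 2.0 * cross + centre

# Hypothetical toy data with uniform multipliers (not a trained model).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
alphas = [0.25, 0.25, 0.25, 0.25]
near = kernel_distance_sq((0.5, 0.5), points, alphas)
far = kernel_distance_sq((5.0, 5.0), points, alphas)
```

A point is accepted as target data when this squared distance does not exceed the squared radius; here `near` comes out smaller than `far`, as one would expect for a point inside the cluster versus one far away from it.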

Density weighted support vector data description

Myungraee Cha, Jun Seok Kim, Jun-Geol Baek ⇑

School of Industrial Management Engineering, Korea University, Anam-dong, Seongbuk-gu, 136-701 Seoul, Republic of Korea

Article info

Keywords:
One-class classification (OCC)
Support vector data description (SVDD)
Density weighted SVDD (DW-SVDD)
k-Nearest neighbor approach

Abstract

One-class classification (OCC) has received a lot of attention because of its usefulness in the absence of statistically-representative non-target data. In this situation, the objective of OCC is to find the optimal description of the target data in order to better identify outlier or non-target data. An example of OCC, support vector data description (SVDD), is widely used for its flexible description boundaries without the need to make assumptions regarding data distribution. By mapping the target dataset into high-dimensional space, SVDD finds the spherical description boundary for the target data. In this process, SVDD considers only the kernel-based distance between each data point and the spherical description, not the density distribution of the data. Therefore, it may happen that data points in high-density regions are not included in the description, decreasing classification performance. To solve this problem, we propose a new SVDD introducing the notion of density weight, which is the relative density of each data point based on the density distribution of the target data using the k-nearest neighbor (k-NN) approach. Incorporating the new weight into the search for an optimal description using SVDD, this new method prioritizes data points in high-density regions, and eventually the optimal description shifts to these regions. We demonstrate the improved performance of the new SVDD by using various datasets from the UCI repository.
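To illustrate the idea of a k-NN based density weight, here is a minimal sketch. The excerpt does not state the paper's exact formula, so the form used here is an assumption: one minus the distance to the k-th nearest neighbor, divided by the largest such distance in the data set. The function name `knn_density_weights` and the toy points are hypothetical.

```python
import math

def knn_density_weights(points, k):
    """Relative density weight per point from its distance to its k-th nearest neighbour.

    Assumed form (not quoted from the paper): w_i = 1 - d_k(x_i) / max_j d_k(x_j).
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    kth = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(dists[k - 1])
    largest = max(kth)
    if largest == 0.0:
        return [1.0] * len(points)  # all points coincide
    return [1.0 - d / largest for d in kth]

# Hypothetical toy data: a tight cluster of four points plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
weights = knn_density_weights(points, k=2)
```

With this assumed form, points in the tight cluster get weights close to 1 and the isolated point gets weight 0. One plausible way to use such weights, again an assumption, is to scale each point's penalty for falling outside the description, so that excluding a high-density point costs more than excluding a sparse one.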

1. Introduction

One-class classification (OCC) is a response to the data classification problem in which there is an absence of suitable negative cases that can be used for training. The subject of a great deal of past research, OCC thus aims to find the best description of a data set using only objects from one class, known as the target data. If the target data is described accurately, it can be used to classify other classes when there are insufficient non-target data (Tax & Duin, 2002); thus, OCC has attracted attention for use in exceptional situations where it is difficult to gather datasets for other classes or where no other classes exist (Khan & Madden, 2010; Mazhelis, 2006).

Support vector data description (SVDD) is a widely used example of OCC. The objective of SVDD is to find a set of support vectors (SVs) describing the spherical boundary of the target data by mapping it into high-dimensional feature space. Since the process occurs in feature space, SVDD has a flexible description boundary. SVDD has been developed from support vector machines as a way to compensate for weaknesses in previous OCC research. Before the use of support vectors, many classification methods were based on the estimation of the probability distribution of the target data set, and this produced severe limitations for data sets that did not follow a specific distribution (Tax & Duin, 2002). In contrast, SVDD is easily applicable to data generated in the real world with no assumptions regarding the data distribution (Grinblat, Uzal, & Granitto, 2013; Sjöstrand, Hansen, Larsson, & Larsen, 2007).
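For reference, the spherical description is usually written as the optimization problem below. This is the standard SVDD formulation rather than text from this excerpt, and the symbols (radius R, centre a, slack variables ξ_i, trade-off constant C, feature map φ, kernel K) are introduced here for illustration.

```latex
% Primal: smallest sphere in feature space, with slack for points left outside
\min_{R,\,a,\,\xi}\; R^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
\lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

% Dual: only kernel evaluations K(x_i, x_j) = \phi(x_i)^\top \phi(x_j) appear
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i K(x_i, x_i)
 - \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j)
\quad \text{s.t.} \quad
\sum_{i=1}^{n} \alpha_i = 1,\quad 0 \le \alpha_i \le C
```

The points with α_i > 0 are the support vectors, and the centre is a = Σ α_i φ(x_i). Because the data enter the dual only through K, the boundary can take a flexible shape in the input space even though it is a sphere in feature space.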

In addition to not requiring assumptions of data distribution, SVDD is also used in various fields for its flexible description boundaries. In terms of feature extraction methodology, it can be used to produce a representative set of target data for image retrieval (Lai, Tax, Duin, Pękalska, & Paclík, 2004), facial images (Lee, Park, & Lee, 2006), and pattern recognition (Dong, Zhaohui, & Wanfeng, 2001; Zhao, Wang, & Xiao, 2013). SVDD has also been used in outlier detection for image sensory devices (Bovolo, Camps-Valls, & Bruzzone, 2010; Guo, Chen, & Tsai, 2009), intrusion detection (Kang, Jeong, & Kong, 2012), and mura inspection of thin-film transistor liquid–crystal displays (TFT-LCDs) (Liu, Lin, Hsueh, & Lee, 2009; Liu, Liu, & Chen, 2011). With outliers recognized as faults in the process, it is possible to identify faults in the dataset using SVDD (Liu, Liu, & Chen, 2010; Luo, Cui, & Wang, 2011; Zhang, Liu, Xie, & Li, 2009).

However, even though conventional SVDD has advantages in data domain description, a major limitation exists. To decide the optimal description of target data, SVDD takes into account only the kernel-based distance between the spherical boundary and the data points, not the distribution of the data. When SVDD sets the description boundary without considering the density distribution of the data, it is possible that the boundary will pass through

http://dx.doi.org/10.1016/j.eswa.2013.11.025

⇑ Corresponding author. Tel.: +82 2 3290 3396; fax: +82 2 929 5888.

E-mail address: jungeol@korea.ac.kr (J.-G. Baek).

Expert Systems with Applications 41 (2014) 3343–3350

